Last Updated on August 15, 2020
If you get serious about data analysis and machine learning in Python, you will make good use of IPython notebooks.
In this post we will review some takeaway points made by Fernando Perez, the creator of IPython, in a keynote presentation at SciPy 2013.
The title of the talk was IPython: from the shell to a book with a single tool; the method behind the madness.
Let’s get started.
Fernando opens the talk with an excellent quote by Richard Hamming (1962) from the preface of Numerical Methods for Scientists and Engineers that bears repeating:
the purpose of computing is insight, not numbers
Fernando presents what he calls a schematic for the lifecycle of a scientific idea, as follows:
- Individual: exploratory work
- Collaborative: development
- Parallel: production runs
- Publication: with reproducible results
- Education: sharing what was learned
- Go back to the first step (individual exploratory work)
He stresses that the process is not linear: you need to be able to move backward and forward through it. He comments that IPython was designed in October or November 2001 to address this requirement.
IPython started as a better Python shell. It developed to include live interactive plotting, then live interactive parallel computing and embedding in applications. Interactivity is important: it is the ‘I’ in IPython. The platform has been through six iterations and has arrived at the IPython Notebook.
IPython Notebooks allow you to have cells of executable Python code and markdown descriptions. This allows a single document to include the description, the computation (such as Python scriptlets and programs) and the artefacts (such as results and plots) from running the computation. This is a simple but very powerful communication tool.
Fernando describes this as Literate Computing, a step beyond Knuth’s Literate Programming.
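To make the idea of mixing description and computation in one document concrete, here is a minimal sketch of what a notebook file looks like on disk. It assumes the JSON layout used by version 4 of the notebook format (a list of cells, each tagged as markdown or code); the helper name make_notebook and the example cell contents are made up for illustration and are not from the talk.

```python
import json


def make_notebook(cells):
    """Build a minimal notebook dict from (cell_type, source) pairs.

    Hypothetical helper for illustration; assumes the version 4
    notebook JSON layout.
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry an execution counter and their outputs,
            # which is how results end up stored next to the code.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {
        "cells": nb_cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# One markdown cell (the description) followed by one code cell (the computation).
notebook = make_notebook([
    ("markdown", "# Mean of a sample\nThe cell below computes the mean."),
    ("code", "data = [1, 2, 3, 4]\nprint(sum(data) / len(data))"),
])

# Serialize to the text that would be saved in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

Writing notebook_json to a file with an .ipynb extension should give a document that a notebook server can open, with the outputs list filled in once the code cell is run.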
An important contribution is the IPython Notebook Viewer, which will render any notebook for you and present it on the web. This service, used in conjunction with open source Notebook files hosted on the web (such as on GitHub), is a powerful resource.
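As a rough illustration of pairing the viewer with a notebook hosted on GitHub, the sketch below builds a viewer link from a repository location. The URL pattern (nbviewer.org/github/user/repo/blob/branch/path) is an assumption about how the public viewer addresses GitHub-hosted notebooks and may change; the function name and the user, repository and file names are hypothetical.

```python
from urllib.parse import quote

# Assumed base address of the public notebook viewer for GitHub-hosted files.
NBVIEWER_BASE = "https://nbviewer.org/github"


def nbviewer_url(user, repo, branch, path):
    """Return a viewer link for a notebook stored in a GitHub repository.

    Hypothetical helper; the URL layout is an assumption, not an official API.
    """
    parts = [user, repo, "blob", branch] + path.split("/")
    # Percent-encode each path segment so spaces and similar characters are safe.
    return NBVIEWER_BASE + "/" + "/".join(quote(p) for p in parts)


# Placeholder repository details, not a real project.
url = nbviewer_url("example-user", "example-repo", "main",
                   "notebooks/My Analysis.ipynb")
print(url)
```

Sharing such a link lets a reader see the rendered notebook in a browser without installing anything.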
Fernando then provides some cornerstone notebook examples to highlight the benefits of the technology.
Reproducible Research Paper
The paper Collaborative cloud-enabled tools allow rapid, reproducible biological insights, and the associated materials.
This paper was developed and written as an IPython notebook. It includes the descriptions, computations, results and even the configuration to spin up a cluster and execute the computations on it in parallel. Completely reproducible research.
Notebook-based Technical Blogging
The blog Pythonic Perambulations, Musings and ramblings through the world of Python and beyond by Jake VanderPlas.
Jake blogs using IPython notebooks, which lets him combine descriptions, computation and the outputs of executed computations in the form of graphs.
Bayesian Methods for Hackers
The book Bayesian Methods for Hackers was developed by Cameron Davidson-Pilon as a series of IPython notebooks (one per chapter) that you can work through.
This is a high-quality book and an excellent use case and demonstration for the technology.
Fernando spends some time describing the impressive architecture of the IPython kernel and shell, and it is well worth the time to understand this material better.
For more information you can check out the IPython home page and this curated gallery of notable IPython Notebooks.
IPython (now Jupyter Notebooks) is an excellent tool and I’d dare to say it changed the way data scientists, data analysts and machine learning enthusiasts share, improve, and complement their knowledge. Many companies like Kaggle, Databricks, Zeppelin, and Skymind have adopted this interactive way of presenting work.
Have you used Jupyter Notebooks with another language besides Python? How was your experience? Thank you very much for your time and attention.
Keep up the good work!
I avoid them and generally recommend that students work from the command line instead, as notebooks can introduce environment issues and hide true errors. I got a lot of emails from confused beginners when I recommended using them.