The Python ecosystem is growing and may become the dominant platform for machine learning.
The primary rationale for adopting Python for machine learning is that it is a general-purpose programming language that you can use both for research and development and in production.
In this post you will discover the Python ecosystem for machine learning.
Discover how to prepare data with pandas, fit and evaluate models with scikit-learn, and more in my new book, with 16 step-by-step tutorials, 3 projects, and full Python code.
Let’s get started.
Python is a general purpose interpreted programming language. It is easy to learn and use primarily because the language focuses on readability.
The philosophy of Python is captured in the Zen of Python which includes phrases like:
- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Flat is better than nested.
- Sparse is better than dense.
- Readability counts.
You can see the full Zen of Python in your Python environment by typing:
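The Zen of Python ships with the standard library as the `this` module. As a minimal illustration, the statement below prints the full text the first time it is imported in a session:

```python
# Importing the standard-library module `this` prints the Zen of Python
# as a side effect of the first import in a session.
import this
```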
It is a popular language in general, consistently appearing in the top 10 programming languages in surveys on Stack Overflow (for example, the 2015 survey results). It is a dynamic language that is well suited to interactive development and quick prototyping, yet powerful enough to support the development of large applications.
It is also widely used for machine learning and data science because of its excellent library support and because it is a general-purpose programming language (unlike R or MATLAB). For example, see the results of the 2011 Kaggle platform survey and the 2015 KDnuggets tool survey.
This is a simple but very important consideration.
It means that you can perform your research and development (figuring out what models to use) in the same programming language that you use in operations, which greatly simplifies the transition from development to operations.
SciPy is an ecosystem of Python libraries for mathematics, science and engineering. It is an add-on to Python that you will need for machine learning.
The SciPy ecosystem includes the following core modules relevant to machine learning:
- NumPy: A foundation for SciPy that allows you to efficiently work with data in arrays.
- Matplotlib: Allows you to create 2D charts and plots from data.
- pandas: Tools and data structures to organize and analyze your data.
To be effective at machine learning in Python you must install and become familiar with SciPy. Specifically:
- You will use pandas to load, explore, and better understand your data.
- You will use Matplotlib (and wrappers of Matplotlib in other frameworks) to create plots and charts of your data.
- You will prepare your data as NumPy arrays for modeling in machine learning algorithms.
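The three roles listed above can be sketched in a few lines. This is only an illustration, not a recommended workflow: the column names and values are hypothetical toy data, and it assumes a reasonably recent pandas (for `DataFrame.to_numpy`) and a Matplotlib install with the non-interactive `Agg` backend available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [50.0, 58.0, 66.0, 77.0, 88.0],
    "label": [0, 0, 1, 1, 1],
})

# pandas: explore and summarize the data.
print(df.shape)
print(df.describe())

# Matplotlib: plot one column against another, colored by label.
fig, ax = plt.subplots()
ax.scatter(df["height_cm"], df["weight_kg"], c=df["label"])
ax.set_xlabel("height_cm")
ax.set_ylabel("weight_kg")

# NumPy: pull out plain arrays in the shape modeling code expects.
X = df[["height_cm", "weight_kg"]].to_numpy()  # features, shape (rows, columns)
y = df["label"].to_numpy()                     # target, shape (rows,)
col_means = np.mean(X, axis=0)                 # per-column mean of the features
print(X.shape, y.shape, col_means)
```

In practice the `DataFrame` would come from a file (for example via `pd.read_csv`) rather than an inline dictionary.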
You can learn more about pandas in the posts Prepare Data for Machine Learning in Python with Pandas and Quick and Dirty Data Analysis with Pandas.
The scikit-learn library is how you develop and practice machine learning in Python.
It is built upon and requires the SciPy ecosystem. The name “scikit” suggests that it is a SciPy plugin or toolkit. You can review a full list of available SciKits.
The focus of the library is machine learning algorithms for classification, regression, clustering and more. It also provides tools for related tasks such as evaluating models, tuning parameters and pre-processing data.
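To make those pieces concrete, here is a minimal sketch that combines pre-processing, a classifier, parameter tuning, and evaluation. The dataset, the choice of logistic regression, and the grid of `C` values are assumptions made for illustration; the `sklearn.model_selection` module assumes a reasonably recent scikit-learn release.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in classification dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pre-processing and the model are chained so that scaling is learned
# only from the training folds during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune the regularization strength C with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate the best model found on the held-out test set.
accuracy = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["clf__C"])
print("test accuracy: %.3f" % accuracy)
```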
Like Python and SciPy, scikit-learn is open source and commercially usable; it is released under the BSD license. This means that you can learn about machine learning, develop models, and put them into operations all with the same ecosystem and code, which is a powerful reason to use scikit-learn.
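One way to picture the move from development to operations is saving a fitted model and loading it back elsewhere. The sketch below uses the standard-library `pickle` module and a decision tree chosen only for illustration; it is not the only persistence option, and a pickled model should be loaded only from a trusted source and with a matching library version.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# "Development": fit a model on a built-in dataset.
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Serialize the fitted model to bytes (these could be written to a file).
blob = pickle.dumps(model)

# "Operations": restore the model and confirm it predicts identically.
restored = pickle.loads(blob)
same = (restored.predict(X) == model.predict(X)).all()
print(same)
```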
You can learn more about scikit-learn in the post A Gentle Introduction to scikit-learn.
Python Ecosystem Installation
There are multiple ways to install the Python ecosystem for machine learning. This section covers installing Python, SciPy, and scikit-learn in turn, followed by an easier all-in-one option.
How To Install Python
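The first step is to install Python. When this post was written I preferred and recommended Python 2.7; that version reached end of life in January 2020, so use a current Python 3 release for new work.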
Once installed, you can confirm the installation was successful. Open a command line and type `python --version`.
You should see a response that reports the version of the installed interpreter.
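The interpreter can also report its own version from inside Python, which helps when several installations coexist on one machine. A minimal sketch using only the standard `sys` module:

```python
import sys

# sys.version_info is a tuple-like object: (major, minor, micro, ...).
major, minor = sys.version_info[0], sys.version_info[1]
print("Python %d.%d" % (major, minor))

# Path of the interpreter that is actually running this code.
print(sys.executable)
```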
How To Install SciPy
There are many ways to install SciPy. For example, two popular ways are to use package management on your platform (e.g. yum on Red Hat or MacPorts on OS X) or to use a Python package management tool like pip.
The SciPy documentation is excellent and covers how-to instructions for many different platforms on the page Installing the SciPy Stack.
When installing SciPy, ensure that you install the following packages as a minimum:
- scipy
- numpy
- matplotlib
- pandas
Once installed, you can confirm that the installation was successful. Open the Python interactive environment by typing “python” at the command line, then type in and run the following Python code to print the versions of the installed libraries.
import scipy
print('scipy: %s' % scipy.__version__)
import numpy
print('numpy: %s' % numpy.__version__)
import matplotlib
print('matplotlib: %s' % matplotlib.__version__)
import pandas
print('pandas: %s' % pandas.__version__)
You should see one line per library, giving its name and installed version; the exact version numbers depend on your installation.
What output do you see? Post it in the comments.
If you have an error, you may need to consult the documentation for your platform.
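Before digging into platform documentation, it can help to find out which package is actually missing. The sketch below is one way to do that, assuming Python 3.4 or later (for `importlib.util.find_spec`); the helper name `find_missing` is hypothetical, introduced here for illustration.

```python
import importlib.util

# Packages checked in this section.
required = ["scipy", "numpy", "matplotlib", "pandas"]


def find_missing(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(required)
if missing:
    print("Not installed: %s" % ", ".join(missing))
else:
    print("All required packages were found.")
```

Any name reported as not installed is the one to reinstall with the same method you used for the others.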
How To Install scikit-learn
I would suggest that you use the same method to install scikit-learn as you used to install SciPy.
As with SciPy, you can confirm that scikit-learn was installed successfully. Start your Python interactive environment, then type and run the following code.
import sklearn
print('sklearn: %s' % sklearn.__version__)
It will print the version of the scikit-learn library installed; the exact number depends on your installation.
How To Install The Ecosystem: An Easier Way
If you are not confident installing software on your machine, there is an easier option for you.
There is a distribution called Anaconda that you can download and install for free.
It supports the three main platforms of Microsoft Windows, Mac OS X and Linux.
It includes Python, SciPy, and scikit-learn: everything you need to learn, practice, and use machine learning with the Python environment.
In this post you discovered the Python ecosystem for machine learning.
You learned about:
- Python and its rising use for machine learning.
- SciPy and the functionality it provides with NumPy, Matplotlib and pandas.
- scikit-learn, which provides the machine learning algorithms.
You also learned how to install the Python ecosystem for machine learning on your workstation.
Do you have any questions about Python for machine learning or this post? Ask your question in the comments and I will do my best to answer.