Python Ecosystem for Machine Learning

The Python ecosystem is growing and may become the dominant platform for machine learning.

The primary rationale for adopting Python for machine learning is that it is a general-purpose programming language that you can use both for research and development and in production.

In this post you will discover the Python ecosystem for machine learning.

Python Ecosystem for Machine Learning
Photo by Stewart Black, some rights reserved.

Python

Python is a general-purpose, interpreted programming language. It is easy to learn and use, primarily because the language focuses on readability.

The philosophy of Python is captured in the Zen of Python which includes phrases like:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.

You can see the full Zen of Python in your Python environment by typing:
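A minimal sketch of that command, assuming the standard-library module this, which prints the Zen of Python as a side effect of being imported:

```python
# Importing the standard-library module `this` prints the Zen of Python.
import this
```

The text is printed only on the first import in a given interpreter session.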

Python is a popular language in general, consistently appearing in the top 10 programming languages in surveys on Stack Overflow (for example the 2015 survey results). It is a dynamic language that is well suited to interactive development and quick prototyping, with the power to support the development of large applications.

It is also widely used for machine learning and data science because of the excellent library support and because it is a general-purpose programming language (unlike R or MATLAB). For example, see the Kaggle platform survey results from 2011 and the KDnuggets 2015 tool survey results.

This is a simple and very important consideration.

It means that you can perform your research and development (figuring out what models to use) in the same programming language that you use in operations, which greatly simplifies the transition from development to operations.

SciPy

SciPy is an ecosystem of Python libraries for mathematics, science and engineering. It is an add-on to Python that you will need for machine learning.

The SciPy ecosystem includes the following core modules relevant to machine learning:

  • NumPy: A foundation for SciPy that allows you to efficiently work with data in arrays.
  • Matplotlib: Allows you to create 2D charts and plots from data.
  • pandas: Tools and data structures to organize and analyze your data.

To be effective at machine learning in Python you must install and become familiar with SciPy. Specifically:

  • You will use Pandas to load, explore and better understand your data.
  • You will use Matplotlib (and wrappers of Matplotlib in other frameworks) to create plots and charts of your data.
  • You will prepare your data as NumPy arrays for modeling in machine learning algorithms.
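The first and last of these steps can be sketched as follows. This is a minimal illustration, not a recommended workflow: the CSV text, the column names and the variable names are all made up for the example, and plotting is left out to keep it short.

```python
import io

import pandas as pd

# Hypothetical toy data, invented purely for illustration.
csv_text = u"""height,weight,label
1.5,50.0,0
1.7,65.0,1
1.8,80.0,1
1.6,55.0,0
"""

# Load the data with pandas and summarize each numeric column.
data = pd.read_csv(io.StringIO(csv_text))
print(data.describe())

# Prepare NumPy arrays for modeling: inputs X and outputs y.
X = data[["height", "weight"]].values
y = data["label"].values
print(X.shape)
print(y.shape)
```

In practice the data would usually come from a file path passed to read_csv rather than an in-memory string.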

You can learn more about Pandas in the posts Prepare Data for Machine Learning in Python with Pandas and Quick and Dirty Data Analysis with Pandas.

scikit-learn

The scikit-learn library is how you can develop and practice machine learning in Python.

It is built upon and requires the SciPy ecosystem. The name “scikit” suggests that it is a SciPy plugin or toolkit. You can review a full list of available SciKits.

The focus of the library is machine learning algorithms for classification, regression, clustering and more. It also provides tools for related tasks such as evaluating models, tuning parameters and pre-processing data.
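As a rough sketch of what this looks like in practice, the example below fits a classifier and evaluates it on held-out data. The choice of the iris dataset, the k-nearest neighbors algorithm, the split size and the random seed are assumptions made for illustration, and the sklearn.model_selection module assumes scikit-learn 0.18 or later.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in classification dataset.
iris = load_iris()

# Hold back part of the data to evaluate the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=7)

# Fit a k-nearest neighbors classifier (an arbitrary choice of algorithm).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Evaluate the fitted model on the held-out data.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("accuracy: {:.3f}".format(accuracy))
```

Other algorithms in the library follow the same fit and predict pattern, so the classifier can be swapped without changing the rest of the code.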

Like Python and SciPy, scikit-learn is open source and commercially usable under the BSD license. This means that you can learn about machine learning, develop models and put them into operations all with the same ecosystem and code, which is a powerful reason to use scikit-learn.

You can learn more about scikit-learn in the post A Gentle Introduction to scikit-learn.

Python Ecosystem Installation

There are multiple ways to install the Python ecosystem for machine learning. This section covers how to install each part.

How To Install Python

The first step is to install Python. I prefer to use and recommend Python 2.7.

This will be specific to your platform. For instructions see Downloading Python in the Python Beginners Guide.

Once installed, you can confirm the installation was successful. Open a command line and type a version check such as python --version.

You should see a response that reports the installed Python version.
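The version can also be read from inside the interpreter. A minimal sketch, assuming only the standard-library sys module (the variable name version is illustrative):

```python
import sys

# sys.version_info holds the interpreter version as a tuple-like object;
# the first three fields are major, minor and micro.
version = "{}.{}.{}".format(*sys.version_info[:3])
print("Python " + version)
```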

How To Install SciPy

There are many ways to install SciPy. For example, two popular ways are to use the package manager on your platform (e.g. yum on RedHat or MacPorts on OS X) or to use a Python package management tool like pip.

The SciPy documentation is excellent and covers how-to instructions for many different platforms on the page Installing the SciPy Stack.

When installing SciPy, ensure that you install the following packages as a minimum:

  • scipy
  • numpy
  • matplotlib
  • pandas

Once installed, you can confirm that the installation was successful. Open the Python interactive environment by typing “python” at the command line, then type in and run the following Python code to print the versions of the installed libraries.
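A minimal sketch of such a check is given here. The helper name library_versions is hypothetical, and reporting a missing library instead of raising an error is a design choice of this sketch.

```python
import importlib


def library_versions(names):
    """Return a dict mapping each library name to its version string.

    A library that cannot be imported maps to None.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Most libraries expose their version as __version__.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


versions = library_versions(["scipy", "numpy", "matplotlib", "pandas"])
for name in ["scipy", "numpy", "matplotlib", "pandas"]:
    found = versions[name]
    print("{}: {}".format(name, found if found is not None else "not installed"))
```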

On my workstation at the time of posting I see the following output.

What output do you see? Post it in the comments.

If you have an error, you may need to consult the documentation for your platform.

How To Install scikit-learn

I would suggest that you use the same method to install scikit-learn as you used to install SciPy.

There are instructions for installing scikit-learn, but they are limited to using the Python pip and conda package managers.

Like SciPy, you can confirm that scikit-learn was installed successfully. Start your Python interactive environment and type and run the following code.
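A minimal sketch of that check, assuming scikit-learn is importable under its usual package name sklearn:

```python
# scikit-learn is imported under the package name `sklearn`.
import sklearn

print("sklearn: {}".format(sklearn.__version__))
```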

It will print the version of the scikit-learn library installed. On my workstation I see the following output:

How To Install The Ecosystem: An Easier Way

If you are not confident installing software on your machine, there is an easier option for you.

There is a distribution called Anaconda that you can download and install for free.

It supports the three main platforms of Microsoft Windows, Mac OS X and Linux.

It includes Python, SciPy and scikit-learn: everything you need to learn, practice and use machine learning with the Python environment.

Summary

In this post you discovered the Python ecosystem for machine learning.

You learned about:

  • Python and its rising use for machine learning.
  • SciPy and the functionality it provides with NumPy, Matplotlib and Pandas.
  • scikit-learn, which provides the machine learning algorithms.

You also learned how to install the Python ecosystem for machine learning on your workstation.

Do you have any questions about Python for machine learning or this post? Ask your question in the comments and I will do my best to answer.



15 Responses to Python Ecosystem for Machine Learning

  1. Antole May 2, 2016 at 5:54 am #

    Just got started with Python, data science and machine learning. Your resources and posts have made it very easy for me to have a structured way to study and a birds eye view of what i need to learn. This is just a note to say THANK YOU.

  2. pedro May 6, 2016 at 12:14 pm #

    I got these:

    scipy : 0.14.1
    numpy : 1.8.2
    matplotlib : 1.4.2
    pandas : 0.15.0

    it is fine if I just used these version? Or it is recommended to update? thanks

    • Jason Brownlee May 6, 2016 at 3:05 pm #

      It should be fine, give it a shot.

    • Lucky September 16, 2016 at 2:44 pm #

      Thanks a lot Jason Brownlee.. But you can help me to install anaconda.. Regards!

      • Jason Brownlee September 17, 2016 at 9:27 am #

        I believe it is straight forward to install anaconda.

        If you cannot setup the environment, my material might not be a good fit for you.

  3. Lucky September 16, 2016 at 3:03 pm #

    I got these,is it right?

    scipy : 0. 17 .1
    numpy : 1.1.11
    matplotlib : 1.5.1
    pandas : 0.18.1

  4. yxl from China December 14, 2016 at 7:00 pm #

    Jason,usfule for me to begin ML with Python, thanks a lot!

    scipy: 0.18.0
    numpy: 1.11.1
    matplotlib: 1.5.2
    pandas: 0.18.1

  5. Mitch Sanders January 28, 2017 at 8:01 am #

    Jason…. the pic above isn’t a python – but a boa. Particularly it looks like a Dumeril’s boa from Madagascar. Thought you’d want to know. [a former amateur herpetologist] 🙂

  6. asbi February 2, 2017 at 5:50 am #

    after running scipy code that is given above i got this…………

    Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
    Type “copyright”, “credits” or “license()” for more information.
    >>>

  7. Ed Sauer February 24, 2017 at 7:39 am #

    Thank you Jason for helping me to get going with Machine Learning. I chose to use Anaconda and have the following versions of the installed libraries.

    scipy: 0.18.1
    numpy: 1.12.0
    matplotlib: 2.0.0
    pandas: 0.19.2

    Cheers,
