Machine Learning Development Environment

The development environment that you use for machine learning may be just as important as the machine learning methods that you use to solve your predictive modeling problem.

A few times a week, I get a question such as:

What is your development environment for machine learning?

In this post, you will discover the development environment that I use and recommend for applied machine learning for developers.

After reading this post, you will know:

  • The important distinctions between the role of workstation and server hardware in machine learning.
  • How to ensure that your machine learning dependencies are installed and updated in a repeatable manner.
  • How to develop machine learning code and run it in a safe way that does not introduce new issues.

Let’s get started.

Machine Learning Development Environment
Photo by Mohamed Aymen Bettaieb, some rights reserved.

What does your machine learning development environment look like?
Let me know in the comments below.

Hardware for Machine Learning

Whether you are learning machine learning or are developing large models for operations, your workstation hardware does not matter that much.

Here’s why:

I do not recommend that you fit large models on your workstation.

Machine learning development involves lots of small tests to figure out preliminary answers to questions such as:

  • What data to use.
  • How to prepare data.
  • What models to use.
  • What configuration to use.

Ultimately, your goal on your workstation is to figure out what experiments to run. I call the small tests that get you there preliminary experiments. For your preliminary experiments, use less data: a small sample that will fit within your hardware capabilities.
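As a rough sketch of what "use a small sample" can look like in practice, the snippet below draws a fixed-size random sample of rows from a CSV file without loading the whole file into memory (reservoir sampling). The function name `sample_csv_rows`, the toy data, and the seed are assumptions made for illustration, not something from my own scripts. It uses only the Python standard library; if your data already fits in memory, Pandas' `DataFrame.sample` is a shorter route.

```python
import csv
import io
import random


def sample_csv_rows(lines, k, seed=1):
    """Reservoir-sample k data rows from an iterable of CSV lines.

    The first line is treated as the header and is returned separately.
    Hypothetical helper for illustration only.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return header, reservoir


if __name__ == "__main__":
    # Toy stand-in for a large file: 1,000 rows of x,y values.
    toy = io.StringIO("x,y\n" + "\n".join(f"{i},{i * 2}" for i in range(1000)))
    header, sample = sample_csv_rows(toy, k=10)
    print(header, len(sample))
```

To use it on a real file, pass an open file handle instead of the toy `StringIO` object. Fixing the seed means you get the same sample on every run, which makes preliminary results easier to compare.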

Larger experiments take minutes, hours, or even days to complete. They should be run on larger hardware, not on your workstation.

This may be a server environment, perhaps with GPU hardware if you are using deep learning methods. This hardware may be provided by your employer, or you can rent it cheaply from a cloud provider such as AWS.

It is true that the faster your workstation's CPU is and the more RAM it has, the more (or the larger) preliminary experiments you can run, and the more you can get out of your larger experiments. So, get the best hardware you can, but in general, work with what you have.

I myself like large Linux boxes with lots of RAM and lots of cores for serious R&D. For everyday work, I like an iMac, again with as many cores and as much RAM as I can get.

In summary:

  • Workstation. Work with a small sample of your data and figure out what large experiments to run.
  • Server(s). Run large experiments that take hours or days and help you figure out what model to use in operations.

Install Machine Learning Dependencies

You must install the library dependencies you need for machine learning development.

These are mainly the libraries you are using.

In Python, this may be Pandas, scikit-learn, Keras, and more. In R, this is whichever packages you use, perhaps including caret.

More than just installing the dependencies, you should have a repeatable process so that you can set up the development environment again in seconds, such as on new workstations and on new servers.

I recommend using a package manager and a script, such as a shell script, to install everything.

On my iMac, I use MacPorts to manage installed packages. I have two scripts: one to install all the packages I require on a new Mac (such as after an upgrade of workstation or laptop) and another script specifically to update the installed packages.

Libraries are always being updated with bug fixes, so this second script to update the specifically installed libraries (and their dependencies) is key.

These are shell scripts that I can run at any time and that I keep updated as I need to install new libraries.
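My own scripts are shell scripts built around MacPorts. As a hedged illustration of the same install/update idea, here is a minimal sketch that uses pip from Python instead. The package list, the `--update` and `--run` flags, and the function names are assumptions made for this example. By default it only prints the command it would run, so you can check it before anything is installed.

```python
import subprocess
import sys

# Assumed package list for illustration; replace with the libraries you use.
PACKAGES = ["pandas", "scikit-learn", "keras"]


def build_pip_command(packages, upgrade=False):
    """Build a pip command for the current interpreter.

    upgrade=False gives the "install" script, upgrade=True the "update" script.
    """
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    return command + list(packages)


def run(command, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        # check=True raises an error if pip exits with a failure code.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical flags: --update switches to upgrade mode,
    # --run actually executes the command instead of only printing it.
    upgrade = "--update" in sys.argv[1:]
    dry_run = "--run" not in sys.argv[1:]
    run(build_pip_command(PACKAGES, upgrade=upgrade), dry_run=dry_run)
```

Keeping the package list in one place means the install and update paths cannot drift apart, which is the main point of maintaining the two scripts together.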

If you need help setting up your environment, one of these tutorials may help:

You may wish to take things to the next level in terms of having a repeatable environment, for example by using a container technology such as Docker or by maintaining your own virtualized instance.

In summary:

  • Install Script. Maintain a script that you can use to reinstall everything needed for your development environment.
  • Update Script. Maintain a script to update all key dependencies for machine learning development and run it periodically.
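One simple way to confirm that an install or update did what you expected, or that a workstation and a server match, is to print the versions of your key libraries. The sketch below is one possible approach using the standard library's `importlib.metadata` (Python 3.8 or later). The library list and the function name are assumptions for illustration.

```python
from importlib import metadata

# Assumed list of key libraries; edit to match your own environment.
LIBRARIES = ["pandas", "scikit-learn", "keras"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(LIBRARIES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this on two machines and comparing the output is a quick check that both environments were set up the same way.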

Machine Learning Editor

I recommend a very simple editing environment.

The hard work with machine learning development is not writing code; it is instead dealing with the unknowns already mentioned. Unknowns such as:

  • What data to use.
  • How to prepare the data.
  • What algorithm(s) to use.
  • What configurations to use.

Writing code is the easy part, especially because you are very likely to use an existing algorithm implementation from a modern machine learning library.

For this reason, you do not need a fancy IDE; it will not help you get answers to these questions.

Instead, I recommend using a very simple text editor that offers basic code highlighting.

Personally, I use and recommend Sublime Text, but any similar text editor will work just as well.

Example of a Machine Learning Text Editor

Some developers like to use notebooks, such as Jupyter. I do not use or recommend them, as I have found these environments to be challenging for development; they can hide errors and introduce dependency strangeness.

For studying machine learning and for machine learning development, I recommend writing scripts or code that can be run directly from the command line or from a shell script.

For example, R scripts and Python scripts can be run directly using the respective interpreter.

Example of Running a Machine Learning Model
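To make the idea of a command-line script concrete, here is a minimal sketch of a Python experiment script. It is not one of my own experiments: the toy dataset, the majority-class baseline, and the `--rows` and `--seed` arguments are all assumptions chosen so that the example runs with the standard library alone. In a real script you would load your own data and fit a model from your library of choice in their place.

```python
"""Minimal command-line experiment script (illustrative sketch)."""
import argparse
import random
from collections import Counter


def make_dataset(n_rows, seed):
    """Create a toy binary dataset; a stand-in for loading real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(0.0, 1.0)
        # Noisy threshold rule so both classes appear.
        label = 1 if x + rng.gauss(0.0, 0.2) > 0.5 else 0
        rows.append((x, label))
    return rows


def evaluate_baseline(rows, test_fraction=0.3):
    """Score a majority-class baseline on a held-out split."""
    split = int(len(rows) * (1.0 - test_fraction))
    train, test = rows[:split], rows[split:]
    # Predict the most common training label for every test row.
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    correct = sum(1 for _, label in test if label == majority)
    return correct / len(test)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run one small experiment.")
    parser.add_argument("--rows", type=int, default=200)
    parser.add_argument("--seed", type=int, default=1)
    args = parser.parse_args(argv)
    accuracy = evaluate_baseline(make_dataset(args.rows, args.seed))
    print(f"rows={args.rows} seed={args.seed} baseline_accuracy={accuracy:.3f}")
    return accuracy


if __name__ == "__main__":
    main()
```

Saved under a name of your choosing, say `experiment.py`, it runs with `python experiment.py --rows 500 --seed 7`. Because the configuration comes in through arguments and the result is printed on one line, the same script can be called from a shell script on a server without changes.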

For more advice on how to run experiments from the command line, see the post:

Once you have a finalized model (or set of predictions), you can integrate it into your application using your standard development tools for your project.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this post, you discovered the hardware, dependencies, and editor to use for machine learning development.

Specifically, you learned:

  • The important distinctions between the role of workstation and server hardware in machine learning.
  • How to ensure that your machine learning dependencies are installed and updated in a repeatable manner.
  • How to develop machine learning code and run it in a safe way that does not introduce new issues.

What does your machine learning development environment look like?
Let me know in the comments below.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

4 Responses to Machine Learning Development Environment

  1. Bart April 11, 2018 at 5:40 am #

    Hello Jason,

    Important to mention Google Colab, Google’s free cloud service for AI developers. With Colab, you can develop deep learning applications on the GPU for free. (https://colab.research.google.com)

    It is free with a GPU available – mainly the Tesla K80 GPU.

    You can have a $200 laptop and still do machine learning!

    It just proves that the best things in this world have always been for free…

    • Jason Brownlee April 11, 2018 at 6:41 am #

      Very impressive, thanks for sharing.

    • Neha April 13, 2018 at 5:06 pm #

      Thanks for sharing

    • Solomon October 21, 2018 at 3:10 am #

      @Bart, do you mean I can use this to do my research in machine learning (deep learning) for free and without being locked out on a particular day, provided I adhere to their T&Cs?
