Linux is an excellent environment for machine learning development with Python.
The tools can be installed quickly and easily and you can develop and run large models directly.
In this tutorial, you will discover how to create and set up a Linux virtual machine for machine learning with Python.
After completing this tutorial, you will know:
- How to download and install VirtualBox for managing virtual machines.
- How to download and set up Fedora Linux.
- How to install a SciPy environment for machine learning in Python 3.
This tutorial is suitable if your base operating system is Windows, Mac OS X, or Linux.
Let’s get started.
Benefits of a Linux Virtual Machine
There are a number of reasons that you may want to use a Linux virtual machine for Python machine learning development.
For example, below is a list of 5 top benefits of using a virtual machine:
- To use tools not available on your system (if you’re on Windows).
- To install and use machine learning tools without impacting your local environment (e.g. use Python 3 tools).
- To have highly customized environments for different projects (Python2 and Python3).
- To save the state of the machine and pick up exactly where you left off (jump from machine to machine).
- To share a development environment with other developers (set up once and reuse many times).
Perhaps the most beneficial point is the first: being able to easily use machine learning tools that are not supported in your environment.
I’m an OS X user, and even though machine learning tools can be installed using brew and macports, I still find it easier to set up and use Linux virtual machines for machine learning development.
This tutorial is broken down into 3 parts:
- Download and Install VirtualBox.
- Download and Install Fedora Linux in a Virtual Machine.
- Install Python Machine Learning Environment
1. Download and Install VirtualBox
VirtualBox is a free open source platform for creating and managing virtual machines.
Once installed, you can create all the virtual machines you like, as long as you have the ISO images or CDs to install from.
- 3. Choose binaries for your workstation.
- 4. Install the software for your system and follow the installation instructions.
- 5. Open the VirtualBox software and confirm it works.
2. Download and Install Fedora Linux
I chose Fedora Linux because I think it is a kinder and gentler Linux than some.
It is a leading-edge version of Red Hat Linux intended for workstations and developers.
2.1 Download the Fedora ISO Image
Let’s start off by downloading the ISO for Fedora Linux. In this case, the 64-bit version of Fedora 25.
- 1. Visit GetFedora.org.
- 2. Click “Workstation” to access the Workstation page.
- 3. Click “Download now” to access the Downloads page.
- 4. Under “Other Downloads” click “64-bit 1.3GB Live image“
- 5. You should now have an ISO file with the name:
We are now ready to create the VM in VirtualBox.
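Before creating the VM, it can be worth checking that the download is intact. The sketch below computes the SHA-256 digest of a file so that it can be compared with the checksum published alongside the Fedora image. The function names and the file path are hypothetical, and the expected value is something you would copy from the checksum file yourself.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so a large ISO is never loaded fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("path/to/your.iso", "<published value>")` should return `True` for a clean download; both arguments here are placeholders.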
2.2 Create the Fedora Virtual Machine
Now, let’s create the Fedora virtual machine in VirtualBox.
- 1. Open the VirtualBox software.
- 2. Click the “New” button.
- 3. Select the Name and operating system.
- name: Fedora25
- type: Linux
- version: Fedora (64-bit)
- Click “Continue“
- 4. Configure the Memory Size
- 5. Configure the Hard Disk
- Create a virtual hard disk now
- Hard disk file type
- VDI (VirtualBox Disk Image)
- Storage on physical hard disk
- Dynamically allocated
- File location and size: 10GB
We are now ready to install Fedora from the ISO image.
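If you prefer to script the settings above rather than click through the wizard, VirtualBox also ships a command-line tool, VBoxManage. The sketch below only builds and prints the commands; it does not run them. The memory size and disk file name are assumptions (the steps above do not fix them), and the `Fedora_64` OS type should be checked against the output of `VBoxManage list ostypes` on your system. It also does not attach the disk or the ISO to the VM, which the wizard does for you.

```python
import shlex


def build_vm_commands(name="Fedora25", ostype="Fedora_64", memory_mb=2048,
                      disk_path="Fedora25.vdi", disk_mb=10 * 1024):
    """Build VBoxManage commands mirroring the wizard settings above.

    memory_mb and disk_path are illustrative assumptions, not values
    taken from the tutorial.
    """
    return [
        # Name, type and version of the virtual machine.
        ["VBoxManage", "createvm", "--name", name, "--ostype", ostype, "--register"],
        # Memory size in megabytes (assumed value).
        ["VBoxManage", "modifyvm", name, "--memory", str(memory_mb)],
        # 10GB VDI disk; the "Standard" variant is dynamically allocated.
        ["VBoxManage", "createmedium", "disk", "--filename", disk_path,
         "--size", str(disk_mb), "--format", "VDI", "--variant", "Standard"],
    ]


for command in build_vm_commands():
    # Print each command in a form that is safe to paste into a shell.
    print(" ".join(shlex.quote(part) for part in command))
```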
2.3 Install Fedora Linux
Now, let’s install Fedora Linux on the new virtual machine.
- 1. Select the new virtual machine and click the “Start” button.
- 2. Click the folder icon and choose the Fedora ISO file:
- 3. Click the “Start” button.
- 4. Select the first option “Start Fedora-Live-Workstation-Live 25” and press the Enter key.
- 5. Hit the “Esc” key to skip the check.
- 6. Select “Live System User“.
- 7. Select “Install to Hard Drive“.
- 8. Complete “Language Selection” (English)
- 9. Complete “Installation Destination” (“ATA VBOX HARDDISK“).
- You may need to wait one minute for the VM to create the hard disk.
- 10. Click “Begin Installation“.
- 11. Set root password.
- 12. Create a user for yourself.
- Note down the username and password (so that you can use it later).
- Tick the “Make this user administrator” (so you can install software).
- 13. Wait for the installation to complete… (5 minutes?)
- 14. Click “Quit”, then click the power icon in the top right and select power off.
2.4 Finalize Fedora Linux Installation
Fedora Linux has been installed; let’s finalize the installation and make it ready for use.
- 1. In VirtualBox with the Fedora25 VM selected, under “Storage“, click on “Optical Drive“.
- Select “Remove disk from virtual drive” to eject the ISO image.
- 2. Click the “Start” button to boot the newly installed Fedora Linux system.
- 3. Login as the user you created.
- 4. Finalize installation
- Choose language “English“
- Click “Next“
- Choose Keyboard “US“
- Click “Next“
- Configure Privacy
- Click “Next“
- Connect Your Online Accounts
- Click “Skip“
- Click “Start using Fedora“
- 5. Close the help system that starts automatically.
We now have a Fedora Linux virtual machine ready to install new software.
3. Install Python Machine Learning Environment
Fedora uses Gnome 3 as the desktop environment.
Gnome 3 is quite different to prior versions of Gnome; you can learn how to get around by using the built-in help system.
3.1 Install Python Environment
Let’s start off by installing the required Python libraries for machine learning development.
- 1. Open the terminal.
- Click “Activities“
- Type “terminal“
- Click icon or press enter
- 2. Confirm Python3 was installed.
- 3. Install the Python machine learning environment. Specifically:
DNF is the software installation system, formerly yum. The first time you run dnf, it will update the database of packages; this might take a minute.
sudo dnf install python3-numpy python3-scipy python3-scikit-learn python3-pandas python3-matplotlib python3-statsmodels
Enter your password when prompted.
Confirm the installation when prompted by pressing “y” and “enter“.
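One way to confirm step 2 above from inside Python itself is to inspect the interpreter's version information. This is a minimal sketch; the helper name `is_python3` is hypothetical, and it assumes you start the interpreter with the `python3` binary.

```python
import sys


def is_python3(version_info=None):
    """Return True if the given (or the running) interpreter is Python 3."""
    if version_info is None:
        version_info = sys.version_info
    return version_info[0] == 3


# Report the version of the interpreter that is running this code.
print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
print("Running under Python 3: %s" % is_python3())
```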
3.2 Confirm Python Environment
Now that the environment is installed, we can confirm it by printing the versions of each required library.
- 1. Open Gedit.
- Click “Activities“
- Type “gedit“
- Click icon or press enter
- 2. Type the following script and save it as versions.py in the home directory.
import scipy
import numpy
import matplotlib
import pandas
import sklearn
import statsmodels
print('scipy: %s' % scipy.__version__)
print('numpy: %s' % numpy.__version__)
print('matplotlib: %s' % matplotlib.__version__)
print('pandas: %s' % pandas.__version__)
print('sklearn: %s' % sklearn.__version__)
print('statsmodels: %s' % statsmodels.__version__)
By default, there is no copy-paste support between your workstation and the VM; you may want to open Firefox within the VM, navigate to this page, and copy and paste the script into your Gedit window.
- 3. Run the script in the terminal.
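If one of the libraries failed to install, a plain version script stops at the first error. As a sketch of a more defensive check, the variant below tries each library in turn and reports any that are missing. The function name `library_versions` is hypothetical; the library names are the same six used in this tutorial.

```python
import importlib

# Import names of the libraries installed in section 3.1.
LIBRARIES = ["scipy", "numpy", "matplotlib", "pandas", "sklearn", "statsmodels"]


def library_versions(names=LIBRARIES):
    """Map each library name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Record the failure and keep checking the remaining libraries.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in library_versions().items():
    print("%s: %s" % (name, version if version is not None else "NOT INSTALLED"))
```

Any line that reports NOT INSTALLED points at a package to reinstall with dnf.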
Tips For Using the VM
This section lists some tips for using the VM for machine learning development.
- Copy-paste and Folder Sharing. These features require the installation of “Guest Additions” in the Linux VM. I have not been able to get this to install correctly and therefore do not use these features. You can try if you like; let me know how you do in the comments.
- Use GitHub. I recommend storing all of your code in GitHub and checking the code in and out from the VM. It makes life a lot easier for getting code and assets in and out of the VM.
- Use Sublime. I think Sublime is a great text editor on Linux for development, better than Gedit at least.
- Use AWS for large jobs. You can use the same procedure to set up Fedora Linux on Amazon Web Services for running large models in the cloud.
- VM Tools. You can save the VM at any point by closing the window. You can also take a snapshot of the VM at any point and return to the snapshot. This can be helpful if you are making large changes to the file system.
- Python2. You can easily install Python2 alongside Python 3 in Linux and use the python (rather than python3) binary or use alternatives to switch between the two.
- Notebooks. Consider running a notebook server inside the VM and opening up the firewall so that you can connect and run from your main workstation outside of the VM.
Do you have any tips to share? Let me know in the comments.
Below are some resources for further reading if you are new to the tools used in this tutorial.
- VirtualBox User Manual
- Fedora Documentation
- Fedora Wiki (tons of help on common topics)
- SciPy Homepage
- Scikit-Learn Homepage
In this tutorial, you discovered how to set up a Linux virtual machine for Python machine learning development.
Specifically, you learned:
- How to download and install VirtualBox, free, open-source software for managing virtual machines.
- How to download and set up Fedora Linux, a friendly Linux distribution for developers.
- How to install and test a Python3 environment for machine learning development.
Did you complete the tutorial?
Let me know how it went in the comments below.