Google Colab for Machine Learning Projects

Last Updated on May 4, 2022

Have you ever wanted an easy-to-configure interactive environment for running your machine learning code that comes with free access to GPUs? Google Colab is the answer you’ve been looking for. It is a convenient and easy-to-use way to run Jupyter notebooks in the cloud, and its free tier comes with limited access to GPUs as well.

If you’re familiar with Jupyter notebooks, learning Colab will be a piece of cake, and we can even import Jupyter notebooks to run on Google Colab. But there are also a lot of nifty things that Colab can do, which we’re going to explore in this article. Let’s dive right in!

After completing this tutorial, you will know how to:

  • Speed up training using the GPU in Google Colab’s free tier
  • Use Google Colab’s extensions to save to Google Drive, present an interactive display for pandas DataFrames, etc.
  • Save your model’s progress when training with Google Colab

Let’s get started!

Google Colab for Machine Learning Projects
Photo by NASA and processing by Thomas Thomopoulos. Some rights reserved.

Overview

This tutorial is divided into five parts; they are:

  • What is Google Colab?
  • Google Colab quick start guide
  • Exploring your Colab environment
  • Useful Google Colab extensions
  • Example: Saving model progress on Google Drive

What Is Google Colab?

From the “Welcome to Colab” notebook:

Colab notebooks allow you to combine executable code and rich text in a single document, along with images, HTML, LaTeX and more. When you create your own Colab notebooks, they are stored in your Google Drive account. You can easily share your Colab notebooks with co-workers or friends, allowing them to comment on your notebooks or even edit them.

We can use Google Colab notebooks just like Jupyter notebooks. They are really convenient because Google hosts them, so we don’t use any of our own computer resources to run the notebook. We can also share these notebooks so other people can easily run our code, all with a standard environment since it’s not dependent on our own local machines. However, we might need to install some libraries in our environment during initialization.

Google Colab Quick Start Guide

To create your first Google Colab file, go to Google Drive, creating a Google Drive account if you do not have one. Then click on the “New” button at the top left corner of your Google Drive page, and click on More ▷ Google Colaboratory.

Creating a New Google Colab Notebook

You will then enter the page for your new Google Colab file:

New Google Colab Notebook

From here, you can share your Google Colab file with others using the Share button on the top right-hand corner or start coding!

The hotkeys in Colab are similar to those in Jupyter notebooks. These are some of the useful ones:

  • Run cell: Ctrl + Enter
  • Run cell and add new cell below: Alt + Enter
  • Run cell and go to cell below: Shift + Enter
  • Indent line by two spaces: Ctrl + ]
  • Unindent line by two spaces: Ctrl + [

Colab also has one extra hotkey that is pretty useful: it runs only the selected part of the code in a cell:

  • Run selected part of a cell: Ctrl + Shift + Enter

Just like in a Jupyter notebook, you can also write text in Markdown cells. But Colab has an additional feature that automatically generates a table of contents based on your Markdown content, and you can also hide parts of the code based on their headings in the Markdown cells.

Google Colab with Markdown and Table of Contents

If you run Jupyter on your own computer, you are limited to the hardware in that computer. But because Colab executes on Google’s cloud, you can change the runtime to include GPUs and TPUs in addition to CPUs. You can switch to a different runtime by going to Runtime ▷ Change runtime type:

Changing the Runtime Type for Google Colab

You can then select which hardware accelerator to equip your environment with.

Adding GPU/TPU to Google Colab Notebook Environment

Unlike your own computer, Google Colab does not provide you with a terminal for entering commands to manage your Python environment. To install Python libraries and other programs, we can use the ! character to run shell commands, just like in Jupyter notebooks, e.g., !pip install numpy (but as we’ll see later on, Colab already comes with a lot of the libraries we’ll need, such as NumPy, pre-installed).

Now that we know how to set up our Colab environment and start running some code, let’s do some exploration of the environment!

Exploring Your Colab Environment

Since we can run shell commands with !, the wget command is probably the easiest way to get some data. For example, running this will download a CSV file into the Colab environment:
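The download step can be sketched as follows. In a notebook cell, the shell form is !wget followed by the address of the file. The Python helper below does the same job with the standard library; the name fetch, its arguments, and the example.com address are placeholders for illustration, not part of Colab.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch(url, dest="data.csv"):
    """Download `url` to `dest` and return the local path.

    Notebook-cell equivalent: !wget -O data.csv <url>
    """
    local_path, _headers = urlretrieve(url, dest)
    return Path(local_path)


# Usage (replace the placeholder with the real address of your CSV file):
# csv_path = fetch("https://example.com/data.csv")
# print(csv_path.stat().st_size, "bytes downloaded")
```

The file lands in the notebook’s current working directory unless dest points somewhere else.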

To explore the current working directory of your Colab file on the virtual machine, click on the Files icon on the left-hand side of the screen. By default, Colab provides you with a directory named sample_data with a few files:

Files Tab of Google Colab Notebook

This is the current working directory for our Colab notebook. You can read one of these files in Python using code like this in the notebook:
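A minimal sketch of reading one of those files is shown below. The helper name preview_csv is our own, and the file name in the usage comment is an assumption about what the sample_data directory contains; check the Files tab for the names in your runtime.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return the header and the first `n_rows` data rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows


# Usage inside Colab (file name assumed, relative to the working directory):
# header, rows = preview_csv("sample_data/california_housing_train.csv")
# print(header)
# print(rows[0])
```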

Later we’ll see how to use Colab extensions to mount our Google Drive to this directory in order to store and access files on our Google Drive account.

By running shell commands using !, we can also look at the hardware configuration of our Colab environment. To take a look at the CPU, we can use:
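In a notebook cell, a shell command such as !cat /proc/cpuinfo or !lscpu prints the CPU details. As a sketch of the same check in plain Python, the helper below (describe_cpu is a name chosen here for illustration) gathers a few of those facts with the standard library and works outside Colab too.

```python
import os
import platform


def describe_cpu():
    """Collect basic CPU facts using only the standard library."""
    info = {
        "logical_cores": os.cpu_count(),  # may be None if it cannot be determined
        "architecture": platform.machine(),
    }
    # On Linux, /proc/cpuinfo lists the CPU model name.
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    info["model_name"] = line.split(":", 1)[1].strip()
                    break
    except OSError:
        pass  # not Linux, or the file is unreadable
    return info


print(describe_cpu())
```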

This gives the output for my environment as:

We can also check if we have a GPU attached to the runtime by using:
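On a runtime with an NVIDIA GPU, the usual notebook-cell check is !nvidia-smi. The sketch below wraps the same tool from Python so that it fails gracefully when no GPU is attached; gpu_report is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_report():
    """Return the text printed by nvidia-smi, or None if it is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


report = gpu_report()
print(report if report else "nvidia-smi is unavailable or reported no GPU.")
```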

This gives the output if you’ve got one:

These are just some examples of the shell commands that we can use to explore the Colab environment. There are also many others, such as !pip list to look at the libraries that the Colab environment has access to, the standard !ls to explore the files in the working directory, etc.
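If we want the same information as Python objects rather than printed text, the standard library offers rough equivalents of !pip list and !ls. This is only a sketch; the two function names are our own.

```python
import os
from importlib import metadata


def installed_packages():
    """Map each installed distribution name to its version (similar to !pip list)."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") if dist.metadata else None
        if name:  # skip distributions with missing metadata
            packages[name] = dist.version
    return dict(sorted(packages.items(), key=lambda kv: kv[0].lower()))


def list_working_directory(path="."):
    """Return the sorted entries of a directory (similar to !ls)."""
    return sorted(os.listdir(path))


print(len(installed_packages()), "packages installed")
print(list_working_directory())
```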

Useful Colab Extensions

Colab also comes with a lot of really useful extensions. One such extension allows us to mount our Google Drive to our working directory. We can do this using:
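A minimal sketch of the mount step uses drive.mount from the google.colab package. The mount point /content/gdrive and the wrapper mount_drive are choices made for this example. The fallback to a temporary folder exists only so that the snippet also runs outside Colab.

```python
import tempfile
from pathlib import Path


def mount_drive(mount_point="/content/gdrive"):
    """Mount Google Drive on Colab and return the path of its top-level folder."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        # Not on Colab: use a throwaway local folder as a stand-in.
        return Path(tempfile.mkdtemp(prefix="fake_gdrive_"))
    drive.mount(mount_point)  # triggers the permission prompt
    return Path(mount_point) / "MyDrive"


drive_root = mount_drive()
print("Drive files are available under", drive_root)
```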

Colab will then ask you to select a Google account and grant it permission to access your Google Drive files. After giving it the required permissions, we can see our Google Drive mounted in the Files tab on the left-hand side.

Google Drive Mounted onto Current Working Directory of Google Colab Notebook

Then, to write a file to our Google Drive, we can do the following:

This code snippet writes Hello World! to a test.txt file in the top level of your Google Drive. Similarly, we can read from a file in our Google Drive as well by using:
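The write and read steps described above can be sketched together as follows. DRIVE_ROOT assumes Drive was mounted at /content/gdrive, so that its top level appears under MyDrive; the two helper names are illustrative.

```python
from pathlib import Path

# Assumed top level of Google Drive after mounting at /content/gdrive.
DRIVE_ROOT = Path("/content/gdrive/MyDrive")


def write_text(name, text, root=DRIVE_ROOT):
    """Write `text` to a file called `name` under `root` and return its path."""
    path = Path(root) / name
    path.write_text(text)
    return path


def read_text(name, root=DRIVE_ROOT):
    """Read back the contents of the file called `name` under `root`."""
    return (Path(root) / name).read_text()


# Usage after mounting Drive:
# write_text("test.txt", "Hello World!")
# print(read_text("test.txt"))
```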

This outputs Hello World!, the text we wrote to the file in our earlier example.

Furthermore, Google Colab comes with some extensions that make for a better notebook experience. If we use pandas DataFrames a lot, there is an extension to display interactive tables. To use this, we can use magic functions:
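In a notebook cell, the magic for this is %load_ext google.colab.data_table, and %unload_ext google.colab.data_table turns it off again. The sketch below does the same through the data_table module’s Python functions. The wrapper set_interactive_tables is a hypothetical helper that simply returns False when the code is not running on Colab.

```python
def set_interactive_tables(enabled):
    """Turn Colab's interactive DataFrame display on or off.

    Returns True if the setting was applied and False when not running on
    Colab, where the google.colab package is unavailable.
    """
    try:
        from google.colab import data_table
    except ImportError:
        return False
    if enabled:
        data_table.enable_dataframe_formatter()
    else:
        data_table.disable_dataframe_formatter()
    return True


print("Interactive tables enabled:", set_interactive_tables(True))
```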

This enables the interactive display for DataFrames. Then, when we run:

This will show you the DataFrame as an interactive table, where we can filter based on columns, see the different rows in the table, etc.

Interactive Interface for pandas DataFrame in Google Colab

To disable this feature later on, we can run:

When we display the same DataFrame X again, we get the standard pandas DataFrame interface:

Standard Interface for pandas DataFrame

Example: Saving Model Progress on Google Drive

Google Colab is probably the easiest way to get powerful GPU resources for our machine learning projects. But in the free version of Colab, Google limits how long we can use a notebook in each session, and our kernel may be terminated without warning. We can restart our notebook and continue our work, but we lose everything held in memory. This is a problem if we need to train our model for a long time, since our Colab instance may terminate before the training is completed.

Using the Google Colab extension to mount our Google Drive together with the Keras ModelCheckpoint callback, we can save our model progress on Google Drive. This is particularly useful for working around Colab timeouts. The limits are more lenient for paid Pro and Pro+ users, but there is always a chance that model training terminates midway at an unpredictable time, and checkpointing means we don’t lose our partially trained model.

For this demonstration, we’ll use the LeNet-5 model on the MNIST dataset.
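One way to write a LeNet-5-style network for 28×28 MNIST digits in Keras is sketched below. The activation functions, the padding on the first convolution, and the optimizer are assumptions made for this example and may differ from the exact configuration used to produce the results in this tutorial.

```python
import tensorflow as tf


def build_lenet5(input_shape=(28, 28, 1), num_classes=10):
    """LeNet-5-style CNN sized for 28x28 grayscale images."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        # "same" padding keeps 28x28 feature maps, the size a 5x5 convolution
        # without padding would produce from the original 32x32 LeNet input.
        tf.keras.layers.Conv2D(6, 5, padding="same", activation="tanh"),
        tf.keras.layers.AveragePooling2D(2),
        tf.keras.layers.Conv2D(16, 5, activation="tanh"),
        tf.keras.layers.AveragePooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation="tanh"),
        tf.keras.layers.Dense(84, activation="tanh"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


model = build_lenet5()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```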

Then, to save model progress during training on Google Drive, first, we need to mount our Google Drive onto our Colab environment.

Afterward, we declare the callback that saves our model checkpoints to Google Drive.

Next, we begin training on the MNIST dataset with the checkpoint callbacks to ensure we can resume at the last epoch should our Colab session time out:
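The callback and training steps can be sketched as one helper. It assumes a model that is already compiled, weights-only checkpoints, and a file name pattern of our own choosing; the .weights.h5 suffix is there because newer Keras versions require it for weights-only saves. The Drive path in the usage comment is likewise an assumption.

```python
from pathlib import Path

import tensorflow as tf


def train_with_checkpoints(model, x, y, ckpt_dir, epochs=5, batch_size=32):
    """Fit a compiled model, saving its weights to `ckpt_dir` after every epoch."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        # {epoch:02d} is filled in by Keras with the 1-based epoch number.
        filepath=str(ckpt_dir / "epoch-{epoch:02d}.weights.h5"),
        save_weights_only=True,
        save_freq="epoch",
    )
    return model.fit(x, y, epochs=epochs, batch_size=batch_size,
                     callbacks=[checkpoint])


# Usage on Colab with Drive mounted (paths are assumptions):
# (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
# x_train = x_train[..., None].astype("float32") / 255.0
# train_with_checkpoints(model, x_train, y_train,
#                        "/content/gdrive/MyDrive/checkpoints")
```

Because the checkpoint directory lives on the mounted Drive, each epoch’s weights survive even if the session is terminated partway through training.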

This trains our model and gives the output:

And from the output, we can see that the checkpoints have been saved. Looking at my Google Drive folder, we can also see the checkpoints stored there.

Checkpoints Stored in Google Drive

A Colab instance runs in Google’s cloud environment. The machine it runs on has some storage, so we can install packages or download files onto it. However, we should not save our checkpoints there because we have no guarantee we will get them back after our session is terminated. That is why, in the above, we mount our Google Drive into the instance and save the checkpoints in our Google Drive. This is how we can be assured the checkpoint files remain accessible.

Here we attach the full code for the model training and saving to Google Drive:

If model training stops midway, we can resume by rebuilding and recompiling the model, loading the saved weights, and then continuing our training:
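A sketch of the resume step follows. It assumes weights-only checkpoints named epoch-NN.weights.h5 (a pattern chosen for this example) and a model that has already been rebuilt and compiled. Passing initial_epoch to fit makes the epoch counter pick up where training stopped instead of restarting at 1. Note that weights-only checkpoints do not restore the optimizer state.

```python
import re
from pathlib import Path

import tensorflow as tf


def latest_checkpoint(ckpt_dir):
    """Return (path, epoch) of the newest epoch-NN.weights.h5 file, or (None, 0)."""
    best_path, best_epoch = None, 0
    for path in Path(ckpt_dir).glob("epoch-*.weights.h5"):
        match = re.fullmatch(r"epoch-(\d+)\.weights\.h5", path.name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path, best_epoch


def resume_training(model, x, y, ckpt_dir, total_epochs, batch_size=32):
    """Load the newest weights, if any, and train up to `total_epochs` in total."""
    Path(ckpt_dir).mkdir(parents=True, exist_ok=True)
    path, epochs_done = latest_checkpoint(ckpt_dir)
    if path is not None:
        model.load_weights(str(path))
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        filepath=str(Path(ckpt_dir) / "epoch-{epoch:02d}.weights.h5"),
        save_weights_only=True,
    )
    return model.fit(x, y, epochs=total_epochs, initial_epoch=epochs_done,
                     batch_size=batch_size, callbacks=[checkpoint])


# Usage: rebuild and compile the model exactly as before, then
# resume_training(model, x_train, y_train,
#                 "/content/gdrive/MyDrive/checkpoints", total_epochs=5)
```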

Summary

In this tutorial, you learned what Google Colab is, how to get free access to GPUs through its free tier, how to use Google Colab with your Google Drive account, and how to save model progress to Google Drive during training in a Google Colab notebook.

Specifically, you learned:

  • What Google Colab is and how to start using it
  • How to explore your Google Colab notebook’s environment using shell commands with !
  • Which useful extensions come with Google Colab
  • How to save model progress to Google Drive during training

2 Responses to Google Colab for Machine Learning Projects

  1. Radek April 29, 2022 at 7:41 pm #

    Thanks for a great introduction to using the google colab capabilities, especially with GPU usage. But I’m not sure if the last code actually uses the registered checkpoints. From my tests, it seems that the whole training procedure was implemented from the beginning and not from 5 epochs. Am I wrong on this?

    • Zhe Ming Chng May 1, 2022 at 5:02 am #

      Hi Radek, thanks for your comment! If you load the weights, training resumes from the trained model where it stopped (with the trained weights). The epoch count starts again from 1, but the model is loaded with the trained weights; you can verify this by calling model.evaluate to check the model’s performance before you continue training. To store the full model state, I believe you will need to pass save_weights_only=False to ModelCheckpoint so that it saves the full model.
