Using Optimizers from PyTorch

Last Updated on December 7, 2022

Optimization is a process where we try to find the best possible set of parameters for a deep learning model. Optimizers generate new parameter values and evaluate them using a loss criterion to move toward better options. As an important part of neural network training, optimizers help determine the weights and biases that will produce the desired output.

There are many kinds of optimizers available in PyTorch, each with its own strengths and weaknesses. These include Adagrad, Adam, RMSProp and so on.

In the previous tutorials, we implemented all necessary steps of an optimizer to update the weights and biases during training. Here, you’ll learn about some PyTorch packages that make the implementation of the optimizers even easier. Particularly, you’ll learn:

  • How optimizers can be implemented using some packages in PyTorch.
  • How you can import the linear class and loss function from PyTorch’s nn package.
  • How Stochastic Gradient Descent and Adam (the most commonly used optimizer) can be implemented using the optim package in PyTorch.
  • How you can customize weights and biases of the model.

Note that we’ll use the same implementation steps in subsequent tutorials of our PyTorch series.

Let’s get started.

Using Optimizers from PyTorch.
Picture by Jean-Daniel Calame. Some rights reserved.

Overview

This tutorial is in five parts; they are:

  • Preparing Data
  • Build the Model and Loss Function
  • Train a Model with Stochastic Gradient Descent
  • Train a Model with Adam Optimizer
  • Plotting Graphs

Preparing Data

Let’s start by importing the libraries we’ll use in this tutorial.
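A minimal set of imports for this tutorial might look like the following (a sketch; the exact set depends on which snippets you run):

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset, DataLoader

# optional: fix the random seed so runs are reproducible
torch.manual_seed(42)
```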

We will use a custom data class. The data is a line with $x$ values from $-5$ to $5$, with slope and bias of $-5$ and $1$ respectively. We’ll also add Gaussian noise of the same shape as $x$ and train our model to estimate this line.
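A dataset class along these lines would match that description. The class name `Build_Data` and the noise scale of 0.4 are illustrative choices:

```python
import torch
from torch.utils.data import Dataset

class Build_Data(Dataset):
    """Points on the line y = -5x + 1 with Gaussian noise added."""
    def __init__(self):
        # x from -5 to 5 in steps of 0.1, shaped as a column vector
        self.x = torch.arange(-5, 5, 0.1).view(-1, 1)
        self.func = -5 * self.x + 1           # the true line: slope -5, bias 1
        # noise tensor with the same shape as x
        self.y = self.func + 0.4 * torch.randn(self.x.size())
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

    def __len__(self):
        return self.len
```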

Now let’s use it to create our dataset object and plot the data.

Data from the custom dataset object

Putting everything together, the following is the complete code to create the plot:
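A complete, runnable version of the data-and-plot code might be as follows (the dataset class is the sketch from above; marker styles are illustrative):

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset

class Build_Data(Dataset):
    """Points on the line y = -5x + 1 with Gaussian noise added."""
    def __init__(self):
        self.x = torch.arange(-5, 5, 0.1).view(-1, 1)
        self.func = -5 * self.x + 1           # the true line
        self.y = self.func + 0.4 * torch.randn(self.x.size())
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

    def __len__(self):
        return self.len

data_set = Build_Data()

# noisy samples as red crosses, the true line for reference
plt.plot(data_set.x.numpy(), data_set.y.numpy(), 'rx', label='noisy data')
plt.plot(data_set.x.numpy(), data_set.func.numpy(), label='true line')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```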

Build the Model and Loss Function

In the previous tutorials, we created some functions for our linear regression model and loss function. PyTorch allows us to do just that with only a few lines of code. Here’s how we’ll import our built-in linear regression model and its loss criterion from PyTorch’s nn package.
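The import and setup might look like the following; mean squared error is the usual loss criterion for linear regression:

```python
import torch
from torch import nn

# built-in linear regression model: one input feature, one output
model = nn.Linear(in_features=1, out_features=1)

# mean squared error as the loss criterion
criterion = nn.MSELoss()
```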

The model parameters are randomized at creation. We can verify this with the following:
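One way to inspect the parameters is to print them (a sketch; the exact values differ from run to run):

```python
import torch

model = torch.nn.Linear(1, 1)          # the linear model from above
print(list(model.parameters()))        # randomly initialized weight and bias
```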

which prints the randomly initialized weight and bias tensors. The exact values will differ from run to run.

While PyTorch will randomly initialize the model parameters, we can also customize them to use our own. We can set our weights and bias as follows. Note that we rarely need to do this in practice.
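Setting the parameters by hand can be done through the model’s state dict, which returns references to the underlying tensors. The values $-10$ and $-20$ below are arbitrary illustrations:

```python
import torch

model = torch.nn.Linear(1, 1)

# state_dict() returns references to the parameter tensors,
# so assigning through it overwrites them in place
model.state_dict()['weight'][0] = -10
model.state_dict()['bias'][0] = -20
print(model.state_dict())
```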

Before we start the training, let’s create a DataLoader object to load our dataset into the pipeline.
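Creating the loader might look like this. Here `batch_size=1` feeds one sample at a time, which gives true stochastic gradient descent; a compact stand-in for the custom dataset is included so the snippet runs on its own:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-in for the Build_Data dataset described earlier
x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 1 + 0.4 * torch.randn(x.size())
data_set = TensorDataset(x, y)

# batch_size=1: one sample per step, i.e. stochastic gradient descent
train_loader = DataLoader(dataset=data_set, batch_size=1)
```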

Train a Model with Stochastic Gradient Descent

To use the optimizer of our choice, we can import the optim package from PyTorch. It includes several state-of-the-art parameter optimization algorithms that can be implemented with only a single line of code. As an example, stochastic gradient descent (SGD) is available as follows.
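Assuming the linear model from the previous section, creating the optimizer takes a single line. The learning rate of 0.01 is an illustrative choice:

```python
import torch

model = torch.nn.Linear(1, 1)   # the linear model from the previous section
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```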

As an input, we provided model.parameters() to the constructor to denote what to optimize. We also defined the step size or learning rate (lr).

To help visualize the optimizer’s progress later, we create an empty list to store the loss and let our model train for 20 epochs.
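The training loop might look like this. The data, model, and criterion are reproduced in brief so the snippet stands on its own; the learning rate and noise scale are illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)

# stand-ins for the dataset, model, and criterion built earlier
x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 1 + 0.4 * torch.randn(x.size())
train_loader = DataLoader(TensorDataset(x, y), batch_size=1)
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss_SGD = []          # store the loss at every step for plotting later
n_epochs = 20
for epoch in range(n_epochs):
    for xb, yb in train_loader:
        y_pred = model(xb)                # forward pass: predict
        loss = criterion(y_pred, yb)      # compute the loss
        loss_SGD.append(loss.item())
        optimizer.zero_grad()             # clear old gradients
        loss.backward()                   # backward pass: compute gradients
        optimizer.step()                  # update the parameters
```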

In the above, we feed the data samples into the model for prediction and calculate the loss. Gradients are computed during the backward pass, and the parameters are updated. While in previous tutorials we used extra lines of code to update the parameters and zero the gradients, PyTorch’s optimizer provides the zero_grad() and step() methods to make the process concise.

You may increase the batch_size argument in the DataLoader object above for mini-batch gradient descent.

Together, the steps above make up the complete code for training with stochastic gradient descent.

Train a Model with Adam Optimizer

Adam is one of the most widely used optimizers for training deep learning models. It is fast and quite efficient when you have a lot of training data. Adam combines momentum with adaptive per-parameter learning rates, which can make it perform better than SGD when the model is complex, as in most deep learning cases.

In PyTorch, replacing the SGD optimizer above with the Adam optimizer is simple. While all other steps stay the same, we only need to replace the SGD() constructor with Adam() to implement the algorithm.
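The swap might look like the following; as before, the learning rate is an illustrative choice:

```python
import torch

model = torch.nn.Linear(1, 1)   # the same linear model as before
# the only change from the SGD version: swap the optimizer constructor
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```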

Similarly, we’ll define the number of iterations and an empty list to store the model loss. Then we can run our training.
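Run under the same illustrative setup as the SGD loop (data, model, and learning rate reproduced so the snippet stands alone), the Adam training might look like:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)

# same illustrative setup as in the SGD section
x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 1 + 0.4 * torch.randn(x.size())
train_loader = DataLoader(TensorDataset(x, y), batch_size=1)
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

loss_Adam = []         # store the loss at every step for plotting later
n_epochs = 20
for epoch in range(n_epochs):
    for xb, yb in train_loader:
        y_pred = model(xb)
        loss = criterion(y_pred, yb)
        loss_Adam.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```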

Apart from the choice of optimizer, the complete training code is the same as in the SGD section.

Plotting Graphs

We have successfully implemented the SGD and Adam optimizers for model training. Let’s visualize how the model loss decreases with each algorithm during training, using the values stored in the lists loss_SGD and loss_Adam.

You can see that SGD converges faster than Adam in the examples above. This is because we are training a linear regression model, for which Adam’s extra machinery is overkill.

Putting everything together, the following is the complete code.
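Assembled from the snippets above, a complete runnable version might look like the following sketch. The training steps are factored into a helper function here for brevity, and note that a fresh model and optimizer are created for each run, so both optimizers start from the same kind of initialization:

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(42)

class Build_Data(Dataset):
    """Points on the line y = -5x + 1 with Gaussian noise added."""
    def __init__(self):
        self.x = torch.arange(-5, 5, 0.1).view(-1, 1)
        self.func = -5 * self.x + 1
        self.y = self.func + 0.4 * torch.randn(self.x.size())
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

    def __len__(self):
        return self.len

data_set = Build_Data()
train_loader = DataLoader(dataset=data_set, batch_size=1)
criterion = torch.nn.MSELoss()

def train(optimizer_cls, lr, n_epochs=20):
    """Train a fresh linear model with the given optimizer class."""
    model = torch.nn.Linear(in_features=1, out_features=1)
    optimizer = optimizer_cls(model.parameters(), lr=lr)
    losses = []
    for epoch in range(n_epochs):
        for xb, yb in train_loader:
            y_pred = model(xb)
            loss = criterion(y_pred, yb)
            losses.append(loss.item())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return losses

loss_SGD = train(torch.optim.SGD, lr=0.01)
loss_Adam = train(torch.optim.Adam, lr=0.01)

plt.plot(loss_SGD, label="Stochastic Gradient Descent")
plt.plot(loss_Adam, label="Adam")
plt.xlabel("Iterations")
plt.ylabel("Cost / total loss")
plt.legend()
plt.show()
```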

Summary

In this tutorial, you implemented optimization algorithms using some built-in packages in PyTorch. Particularly, you learned:

  • How optimizers can be implemented using some packages in PyTorch.
  • How you can import the linear class and loss function from PyTorch’s nn package.
  • How Stochastic Gradient Descent and Adam (the most commonly used optimizer) can be implemented using the optim package in PyTorch.
  • How you can customize weights and biases of the model.
