Activation Functions in PyTorch

As neural networks become increasingly popular in machine learning, it is important to understand the role that activation functions play in them. In this article, you’ll explore activation functions, which are applied to the output of each neuron in a neural network to introduce non-linearity into the model. Without activation functions, a neural network would be nothing more than a composition of linear transformations, which is itself a linear transformation, and this would limit its ability to learn complex patterns and relationships in data.

PyTorch offers a variety of activation functions, each with its own unique properties and use cases. Some common activation functions in PyTorch include ReLU, sigmoid, and tanh. Choosing the right activation function for a particular problem can be an important consideration for achieving optimal performance in a neural network. You will see how to train a neural network in PyTorch with different activation functions and analyze their performance.

In this tutorial, you’ll learn:

  • About various activation functions that are used in neural network architectures.
  • How activation functions can be implemented in PyTorch.
  • How activation functions actually compare with each other in a real problem.

Let’s get started.

Image generated by Adrian Tam using stable diffusion. Some rights reserved.


This tutorial is divided into four parts; they are:

  • Logistic activation function
  • Tanh activation function
  • ReLU activation function
  • Exploring activation functions in a neural network

Logistic Activation Function

You’ll start with the logistic function, a commonly used activation function in neural networks that is also known as the sigmoid function. It is defined as $\sigma(x) = 1/(1+e^{-x})$ and maps any real input to a value between 0 and 1, which can be interpreted as a probability. This makes it particularly useful for binary classification tasks, where the network needs to predict the probability of an input belonging to one of two classes.

One of the main advantages of the logistic function is that it is differentiable, which means that it can be used in backpropagation algorithms to train the neural network. Additionally, its gradient is smooth and bounded (it never exceeds 0.25), which can help avoid issues such as exploding gradients. However, the gradient approaches zero for inputs far from 0, so the function can also introduce vanishing gradients during training.
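The behavior of the gradient can be checked directly with autograd. The following is a minimal sketch; the sample inputs are arbitrary values chosen for illustration. The gradient is largest at 0, where it equals 0.25, and shrinks rapidly as the input moves away from 0.

```python
import torch

# Sample inputs (arbitrary, for illustration); requires_grad lets autograd track them
x = torch.tensor([-10.0, -2.0, 0.0, 2.0, 10.0], requires_grad=True)
y = torch.sigmoid(x)

# Summing gives a scalar, so backward() fills x.grad with the
# derivative of the logistic function at each input
y.sum().backward()

for xi, gi in zip(x.detach().tolist(), x.grad.tolist()):
    print(f"x = {xi:6.1f}   gradient = {gi:.6f}")
```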

Now, let’s apply the logistic function to a tensor using PyTorch and plot it to see what it looks like.
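A minimal sketch of this step is given here. The input range of -10 to 10, the number of points, and the line color are assumptions made for illustration.

```python
import torch
import matplotlib.pyplot as plt

# 100 evenly spaced inputs on [-10, 10] (range chosen for illustration)
x = torch.linspace(-10, 10, 100)

# Apply the logistic (sigmoid) function element-wise
y = torch.sigmoid(x)

# Plot input against output with a custom line color
plt.plot(x.numpy(), y.numpy(), color="purple")
plt.xlabel("Input")
plt.ylabel("Output")
plt.title("Logistic Activation Function")
plt.show()
```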

In the example above, you have used the torch.sigmoid() function from the PyTorch library to apply the logistic activation function to a tensor x. You have used the matplotlib library to create the plot with a custom color.

Tanh Activation Function

Next, you will investigate the tanh activation function, which outputs values between $-1$ and $1$ and is zero-centered: inputs distributed symmetrically around 0 produce outputs with a mean of 0. This can help ensure that the output of a neural network layer remains centered around 0, which acts as a mild form of normalization for the next layer. Tanh is a smooth and continuous activation function, which makes it easier to optimize with gradient descent.

Like the logistic activation function, the tanh function can be susceptible to the vanishing gradient problem, especially for deep neural networks with many layers. This is because the slope of the function becomes very small for inputs of large magnitude, whether positive or negative, making it difficult for gradients to propagate through the network.

Also, because it is computed from exponential functions, tanh is more expensive to evaluate than simpler functions such as ReLU, which can matter for large tensors or deep neural networks with many layers.

Here is how to apply tanh on a tensor and visualize it.
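A minimal sketch, again assuming inputs on the range -10 to 10 and an arbitrary line color:

```python
import torch
import matplotlib.pyplot as plt

# 100 evenly spaced inputs on [-10, 10] (range chosen for illustration)
x = torch.linspace(-10, 10, 100)

# Apply tanh element-wise; outputs lie between -1 and 1
y = torch.tanh(x)

plt.plot(x.numpy(), y.numpy(), color="green")
plt.xlabel("Input")
plt.ylabel("Output")
plt.title("Tanh Activation Function")
plt.show()
```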

ReLU Activation Function

ReLU (Rectified Linear Unit) is another commonly used activation function in neural networks. Unlike the sigmoid and tanh functions, ReLU does not saturate for positive inputs, which means that it does not become flat as the input grows large. ReLU simply outputs the input value if it is positive, and 0 otherwise.

This simple, piecewise linear function has several advantages over sigmoid and tanh activation functions. First, it is computationally more efficient, making it well-suited for large-scale neural networks. Second, ReLU has been shown to be less susceptible to the vanishing gradient problem, as its slope is exactly 1 for every positive input. Plus, because it outputs exactly 0 for negative inputs, ReLU can help sparsify the activation of neurons in a network, which can lead to better generalization.
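Two of these properties, the constant slope for positive inputs and the sparse activations, can be checked on random data. This is only an illustrative sketch; the random inputs and the seed are assumptions and not part of the tutorial's experiment.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Standard normal inputs, so roughly half of them are negative
x = torch.randn(1000, requires_grad=True)
y = torch.relu(x)

# backward() on the sum fills x.grad with the slope of ReLU at each input
y.sum().backward()

# Fraction of outputs that are exactly zero (the "sparse" activations)
sparsity = (y == 0).float().mean().item()
print(f"Fraction of zero activations: {sparsity:.2f}")

# The slope is 1 for positive inputs and 0 for negative inputs
print("Distinct gradient values:", torch.unique(x.grad).tolist())
```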

Here’s an example of how to apply the ReLU activation function to a PyTorch tensor x and plot the results.
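A minimal sketch, assuming the same input range of -10 to 10 and an arbitrary line color:

```python
import torch
import matplotlib.pyplot as plt

# 100 evenly spaced inputs on [-10, 10] (range chosen for illustration)
x = torch.linspace(-10, 10, 100)

# ReLU keeps positive values and replaces negative values with 0
y = torch.relu(x)

plt.plot(x.numpy(), y.numpy(), color="blue")
plt.xlabel("Input")
plt.ylabel("Output")
plt.title("ReLU Activation Function")
plt.show()
```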

Below is the complete code to plot all the activation functions discussed above.
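One way to put the three functions together is sketched here. The side-by-side layout, the input range, and the colors are assumptions made for illustration.

```python
import torch
import matplotlib.pyplot as plt

# Shared inputs for all three functions (range chosen for illustration)
x = torch.linspace(-10, 10, 100)

# Name of each activation function mapped to the PyTorch function that applies it
activations = {
    "Logistic": torch.sigmoid,
    "Tanh": torch.tanh,
    "ReLU": torch.relu,
}
colors = ["purple", "green", "blue"]

# One panel per activation function
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
outputs = {}
for ax, color, (name, fn) in zip(axes, colors, activations.items()):
    outputs[name] = fn(x)
    ax.plot(x.numpy(), outputs[name].numpy(), color=color)
    ax.set_xlabel("Input")
    ax.set_ylabel("Output")
    ax.set_title(f"{name} Activation Function")

plt.tight_layout()
plt.show()
```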

Exploring Activation Functions in a Neural Network

Activation functions play a vital role in the training of deep learning models, as they introduce non-linearity into the network, enabling it to learn complex patterns.

Let’s take the popular MNIST dataset, which consists of 70,000 grayscale 28×28-pixel images of handwritten digits (60,000 for training and 10,000 for testing). You’ll create a simple feedforward neural network to classify these digits, and experiment with different activation functions like ReLU, Sigmoid, Tanh, and Leaky ReLU.

Let’s create a NeuralNetwork class that inherits from nn.Module. This class has three linear layers and an activation function as an input parameter. The forward method defines the forward pass of the network, applying the activation function after each linear layer except the last one.
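A sketch of such a class follows. The constructor arguments other than the activation function, the layer names, and the hidden size of 128 are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, activation_function):
        super().__init__()
        # Three linear layers: input -> hidden -> hidden -> classes
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, num_classes)
        self.activation_function = activation_function

    def forward(self, x):
        # Flatten each 28x28 image into a vector
        x = x.view(x.size(0), -1)
        x = self.activation_function(self.layer1(x))
        x = self.activation_function(self.layer2(x))
        # No activation after the last layer: it returns raw class scores (logits)
        return self.layer3(x)


# Example: 28*28 = 784 inputs, 128 hidden units (assumed), 10 digit classes
model = NeuralNetwork(28 * 28, 128, 10, nn.ReLU())
logits = model(torch.randn(4, 1, 28, 28))
print(logits.shape)
```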

You’ve added an activation_function parameter to the NeuralNetwork class, which allows you to plug in any activation function you’d like to experiment with.

Training and Testing the Model with Different Activation Functions

Let’s create functions to help the training. The train() function trains the network for one epoch. It iterates through the training data loader, computes the loss, and performs backpropagation and optimization. The test() function evaluates the network on the test dataset, computing the test loss and accuracy.

To compare them, let’s create a dictionary of activation functions and iterate over them. For each activation function, you instantiate the NeuralNetwork class, define the criterion (CrossEntropyLoss), and set up the optimizer (Adam). Then, train the model for a specified number of epochs, calling the train() and test() functions in each epoch to evaluate the model’s performance. You store the training loss, testing loss, and testing accuracy for each epoch in the results dictionary.
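The loop described here can be sketched as a function. To keep the sketch independent of any particular implementation, the model constructor and the train and test helpers are passed in as arguments; the name compare_activations, the argument order, the learning rate of 0.001, and the default of 5 epochs are all assumptions.

```python
import torch.nn as nn
import torch.optim as optim

# Activation functions to compare, keyed by a display name
activation_functions = {
    "ReLU": nn.ReLU(),
    "Sigmoid": nn.Sigmoid(),
    "Tanh": nn.Tanh(),
    "LeakyReLU": nn.LeakyReLU(),
}


def compare_activations(build_model, train_fn, test_fn,
                        train_loader, test_loader, epochs=5, lr=0.001):
    """Train one model per activation function and collect per-epoch metrics.

    build_model(activation) must return a fresh nn.Module;
    train_fn(model, loader, criterion, optimizer) must return the training loss;
    test_fn(model, loader, criterion) must return (test loss, test accuracy).
    """
    results = {}
    for name, activation in activation_functions.items():
        print(f"Training with {name} activation function...")
        model = build_model(activation)
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(model.parameters(), lr=lr)

        history = {"train_loss": [], "test_loss": [], "test_accuracy": []}
        for epoch in range(epochs):
            train_loss = train_fn(model, train_loader, criterion, optimizer)
            test_loss, test_accuracy = test_fn(model, test_loader, criterion)
            history["train_loss"].append(train_loss)
            history["test_loss"].append(test_loss)
            history["test_accuracy"].append(test_accuracy)
            print(f"Epoch [{epoch + 1}/{epochs}], Test Loss: {test_loss:.4f}, "
                  f"Test Accuracy: {test_accuracy:.2f}%")
        results[name] = history
    return results
```

With a network class that takes sizes and an activation function, build_model could be as simple as `lambda act: NeuralNetwork(784, 128, 10, act)`, where the sizes are again illustrative.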

When you run the above, it prints:

You may use Matplotlib to create plots comparing the performance of each activation function. You can create three separate plots to visualize the training loss, testing loss, and testing accuracy for each activation function over the epochs.
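A plotting helper along these lines is sketched here. It assumes the results are stored as a dictionary keyed by activation name, each holding lists under the keys train_loss, test_loss, and test_accuracy; these key names and the function name plot_results are hypothetical. The three plots are drawn as panels of one figure, which is a layout choice.

```python
import matplotlib.pyplot as plt


def plot_results(results):
    """Draw training loss, testing loss, and testing accuracy per activation."""
    metrics = [
        ("train_loss", "Training Loss"),
        ("test_loss", "Testing Loss"),
        ("test_accuracy", "Testing Accuracy"),
    ]
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    for ax, (key, title) in zip(axes, metrics):
        # One line per activation function in each panel
        for name, history in results.items():
            epochs = range(1, len(history[key]) + 1)
            ax.plot(epochs, history[key], label=name)
        ax.set_xlabel("Epoch")
        ax.set_ylabel(title)
        ax.set_title(title)
        ax.legend()
    fig.tight_layout()
    return fig
```

Calling `plot_results(results)` followed by `plt.show()` displays the figure.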

These plots provide a visual comparison of the performance of each activation function. By analyzing the results, you can determine which activation function works best for the specific task and dataset used in this example.


In this tutorial, you have implemented some of the most popular activation functions in PyTorch. You also saw how to train a neural network in PyTorch with different activation functions, using the popular MNIST dataset. You explored ReLU, Sigmoid, Tanh, and Leaky ReLU activation functions and analyzed their performance by plotting the training loss, testing loss, and testing accuracy.
As you can see, the choice of activation function plays an essential role in model performance. However, keep in mind that the optimal activation function may vary depending on the task and dataset.
