As neural networks become increasingly popular in machine learning, it is important to understand the role that activation functions play in them. In this article, you’ll explore activation functions, which are applied to the output of each neuron in a neural network to introduce non-linearity into the model. Without activation functions, a neural network would simply be a series of linear transformations, which would limit its ability to learn complex patterns and relationships in data.
PyTorch offers a variety of activation functions, each with its own unique properties and use cases. Some common activation functions in PyTorch include ReLU, sigmoid, and tanh. Choosing the right activation function for a particular problem can be an important consideration for achieving optimal performance in a neural network. You will see how to train a neural network in PyTorch with different activation functions and analyze their performance.
In this tutorial, you’ll learn:
- About various activation functions that are used in neural network architectures.
- How activation functions can be implemented in PyTorch.
- How different activation functions compare with each other on a real problem.
Let’s get started.

Activation Functions in PyTorch. Image generated by Adrian Tam using Stable Diffusion. Some rights reserved.
Overview
This tutorial is divided into four parts; they are:
- Logistic activation function
- Tanh activation function
- ReLU activation function
- Exploring activation functions in a neural network
Logistic Activation Function
You’ll start with the logistic function, a commonly used activation function in neural networks that is also known as the sigmoid function. It takes any input and maps it to a value between 0 and 1, which can be interpreted as a probability. This makes it particularly useful for binary classification tasks, where the network needs to predict the probability of an input belonging to one of two classes.
One of the main advantages of the logistic function is that it is differentiable, which means it can be used with backpropagation algorithms to train the neural network. It also has a smooth, bounded gradient, which can help avoid issues such as exploding gradients. However, because the function saturates for large positive or negative inputs, it can also introduce vanishing gradients during training.
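In formula form, the logistic function and its derivative are $\sigma(x) = \frac{1}{1 + e^{-x}}$ and $\sigma'(x) = \sigma(x)\left(1 - \sigma(x)\right)$. The derivative is at most $\frac{1}{4}$ and approaches 0 for inputs far from zero, so gradients shrink each time they pass through a sigmoid layer, which is why deep stacks of sigmoid activations suffer from vanishing gradients.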
Now, let’s apply the logistic function to a tensor using PyTorch and plot it to see what it looks like.
# importing the libraries
import torch
import matplotlib.pyplot as plt

# create a PyTorch tensor
x = torch.linspace(-10, 10, 100)

# apply the logistic activation function to the tensor
y = torch.sigmoid(x)

# plot the results with a custom color
plt.plot(x.numpy(), y.numpy(), color='purple')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Logistic Activation Function')
plt.show()
In the example above, you have used the torch.sigmoid() function from the PyTorch library to apply the logistic activation function to the tensor x, and the matplotlib library to create the plot with a custom color.
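Note that the same activation is also available as a module, nn.Sigmoid, which is handy when you want to place it inside an nn.Sequential model or pass it around as an object, as you will do later in this tutorial. A minimal sketch showing that the functional and module forms agree:

import torch
import torch.nn as nn

x = torch.linspace(-10, 10, 100)

# the functional form and the module form compute the same values
sigmoid_module = nn.Sigmoid()
print(torch.allclose(torch.sigmoid(x), sigmoid_module(x)))   # True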
Tanh Activation Function
Next, you will investigate the tanh activation function which outputs values between $-1$ and $1$, with a mean output of 0. This can help ensure that the output of a neural network layer remains centered around 0, making it useful for normalization purposes. Tanh is a smooth and continuous activation function, which makes it easier to optimize during the process of gradient descent.
Like the logistic activation function, the tanh function can be susceptible to the vanishing gradient problem, especially in deep neural networks with many layers. This is because the slope of the function becomes very small for inputs far from zero, making it difficult for gradients to propagate through the network.
Also, due to the use of exponential functions, tanh can be computationally expensive, especially for large tensors or when used in deep neural networks with many layers.
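Tanh is in fact a rescaled and shifted version of the logistic function, since $\tanh(x) = 2\sigma(2x) - 1$. You can verify this identity numerically with a quick sketch:

import torch

x = torch.linspace(-10, 10, 100)

# verify that tanh(x) equals 2 * sigmoid(2x) - 1
lhs = torch.tanh(x)
rhs = 2 * torch.sigmoid(2 * x) - 1
print(torch.allclose(lhs, rhs, atol=1e-6))   # True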
Here is how to apply tanh to a tensor and visualize it, reusing the tensor x and the matplotlib import from the previous example.
# apply the tanh activation function to the tensor
y = torch.tanh(x)

# plot the results with a custom color
plt.plot(x.numpy(), y.numpy(), color='blue')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Tanh Activation Function')
plt.show()
ReLU Activation Function
ReLU (Rectified Linear Unit) is another commonly used activation function in neural networks. Unlike the sigmoid and tanh functions, ReLU does not saturate for positive inputs: it simply outputs the input value if it is positive, or 0 if it is negative.
This simple, piecewise linear function has several advantages over the sigmoid and tanh activation functions. First, it is computationally cheaper, making it well-suited for large-scale neural networks. Second, ReLU has been shown to be less susceptible to the vanishing gradient problem, because its gradient does not shrink for positive inputs. Plus, ReLU produces exact zeros for negative inputs, which sparsifies the activations in a network and can lead to better generalization.
Here’s an example of how to apply the ReLU activation function to a PyTorch tensor x and plot the results.
# apply the ReLU activation function to the tensor
y = torch.relu(x)

# plot the results with a custom color
plt.plot(x.numpy(), y.numpy(), color='green')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.show()
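As a small illustration of the sparsity mentioned above, you can count how many activations ReLU zeroes out. Reusing the tensor x from the earlier examples, roughly half of its values are negative, so roughly half of the outputs come out exactly zero:

# fraction of outputs that ReLU sets to exactly zero
y = torch.relu(x)
sparsity = (y == 0).float().mean()
print(f"Fraction of zero activations: {sparsity.item():.2f}")   # about 0.50 for x in [-10, 10]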
Below is the complete code to plot all the activation functions discussed above.
# importing the libraries
import torch
import matplotlib.pyplot as plt

# create a PyTorch tensor
x = torch.linspace(-10, 10, 100)

# apply the logistic activation function to the tensor and plot
y = torch.sigmoid(x)
plt.plot(x.numpy(), y.numpy(), color='purple')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Logistic Activation Function')
plt.show()

# apply the tanh activation function to the tensor and plot
y = torch.tanh(x)
plt.plot(x.numpy(), y.numpy(), color='blue')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Tanh Activation Function')
plt.show()

# apply the ReLU activation function to the tensor and plot
y = torch.relu(x)
plt.plot(x.numpy(), y.numpy(), color='green')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.show()
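If you would rather compare the three curves directly, one possible variation of the code above overlays them on a single set of axes:

# importing the libraries
import torch
import matplotlib.pyplot as plt

# create a PyTorch tensor
x = torch.linspace(-10, 10, 100)

# overlay the three activation functions on one figure
plt.plot(x.numpy(), torch.sigmoid(x).numpy(), color='purple', label='Sigmoid')
plt.plot(x.numpy(), torch.tanh(x).numpy(), color='blue', label='Tanh')
plt.plot(x.numpy(), torch.relu(x).numpy(), color='green', label='ReLU')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Activation Functions')
plt.legend()
plt.show()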
Exploring Activation Functions in a Neural Network
Activation functions play a vital role in the training of deep learning models, as they introduce non-linearity into the network, enabling it to learn complex patterns.
Let’s take the popular MNIST dataset, which consists of 70,000 grayscale images of handwritten digits, each 28×28 pixels. You’ll create a simple feedforward neural network to classify these digits and experiment with different activation functions: ReLU, Sigmoid, Tanh, and Leaky ReLU.
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Load the MNIST dataset
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='data/', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='data/', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
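If you want to double-check what the loaders produce, you can pull one batch and inspect its shape. Each image is 1×28×28, which flattens to the 784 input features used by the network later; this is just a quick sanity check and is not required for the rest of the tutorial:

# grab one batch from the training loader and inspect its shape
images, labels = next(iter(train_loader))
print(images.shape)                             # torch.Size([64, 1, 28, 28])
print(images.view(images.shape[0], -1).shape)   # torch.Size([64, 784])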
Let’s create a NeuralNetwork class that inherits from nn.Module. This class has three linear layers and takes an activation function as a constructor parameter. The forward method defines the forward pass of the network, applying the activation function after each linear layer except the last one.
import torch
import torch.nn as nn
import torch.optim as optim

class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, activation_function):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, num_classes)
        self.activation_function = activation_function

    def forward(self, x):
        x = self.activation_function(self.layer1(x))
        x = self.activation_function(self.layer2(x))
        x = self.layer3(x)
        return x
You’ve added an activation_function parameter to the NeuralNetwork class, which allows you to plug in any activation function you’d like to experiment with.
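For example, you can plug in ReLU when instantiating the network. The sizes below match the MNIST setup used in the next section (784 input features, 128 hidden units, 10 classes):

# example: build the network with ReLU as its activation function
model = NeuralNetwork(input_size=784, hidden_size=128, num_classes=10,
                      activation_function=nn.ReLU())
print(model)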
Training and Testing the Model with Different Activation Functions
Let’s create functions to help the training. The train() function trains the network for one epoch. It iterates through the training data loader, computes the loss, and performs backpropagation and optimization. The test() function evaluates the network on the test dataset, computing the test loss and accuracy.
def train(network, data_loader, criterion, optimizer, device):
    network.train()
    running_loss = 0.0

    for data, target in data_loader:
        data, target = data.to(device), target.to(device)
        data = data.view(data.shape[0], -1)

        optimizer.zero_grad()
        output = network(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * data.size(0)

    return running_loss / len(data_loader.dataset)


def test(network, data_loader, criterion, device):
    network.eval()
    correct = 0
    total = 0
    test_loss = 0.0

    with torch.no_grad():
        for data, target in data_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.shape[0], -1)

            output = network(data)
            loss = criterion(output, target)
            test_loss += loss.item() * data.size(0)

            _, predicted = torch.max(output.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()

    return test_loss / len(data_loader.dataset), 100 * correct / total
To compare them, let’s create a dictionary of activation functions and iterate over them. For each activation function, you instantiate the NeuralNetwork class, define the criterion (CrossEntropyLoss), and set up the optimizer (Adam). Then, train the model for a specified number of epochs, calling the train() and test() functions in each epoch to evaluate the model’s performance. You store the training loss, testing loss, and testing accuracy for each epoch in the results dictionary.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

input_size = 784
hidden_size = 128
num_classes = 10
num_epochs = 10
learning_rate = 0.001

activation_functions = {
    'ReLU': nn.ReLU(),
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'LeakyReLU': nn.LeakyReLU()
}

results = {}

# Train and test the model with different activation functions
for name, activation_function in activation_functions.items():
    print(f"Training with {name} activation function...")
    model = NeuralNetwork(input_size, hidden_size, num_classes, activation_function).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    train_loss_history = []
    test_loss_history = []
    test_accuracy_history = []

    for epoch in range(num_epochs):
        train_loss = train(model, train_loader, criterion, optimizer, device)
        test_loss, test_accuracy = test(model, test_loader, criterion, device)

        train_loss_history.append(train_loss)
        test_loss_history.append(test_loss)
        test_accuracy_history.append(test_accuracy)

        print(f"Epoch [{epoch+1}/{num_epochs}], Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%")

    results[name] = {
        'train_loss_history': train_loss_history,
        'test_loss_history': test_loss_history,
        'test_accuracy_history': test_accuracy_history
    }
When you run the above, it prints output similar to the following (your exact numbers will differ because of random weight initialization and data shuffling):
Training with ReLU activation function...
Epoch [1/10], Test Loss: 0.1589, Test Accuracy: 95.02%
Epoch [2/10], Test Loss: 0.1138, Test Accuracy: 96.52%
Epoch [3/10], Test Loss: 0.0886, Test Accuracy: 97.15%
Epoch [4/10], Test Loss: 0.0818, Test Accuracy: 97.50%
Epoch [5/10], Test Loss: 0.0783, Test Accuracy: 97.47%
Epoch [6/10], Test Loss: 0.0754, Test Accuracy: 97.80%
Epoch [7/10], Test Loss: 0.0832, Test Accuracy: 97.56%
Epoch [8/10], Test Loss: 0.0783, Test Accuracy: 97.78%
Epoch [9/10], Test Loss: 0.0789, Test Accuracy: 97.75%
Epoch [10/10], Test Loss: 0.0735, Test Accuracy: 97.99%
Training with Sigmoid activation function...
Epoch [1/10], Test Loss: 0.2420, Test Accuracy: 92.81%
Epoch [2/10], Test Loss: 0.1718, Test Accuracy: 94.99%
Epoch [3/10], Test Loss: 0.1339, Test Accuracy: 96.06%
Epoch [4/10], Test Loss: 0.1141, Test Accuracy: 96.42%
Epoch [5/10], Test Loss: 0.1004, Test Accuracy: 97.00%
Epoch [6/10], Test Loss: 0.0909, Test Accuracy: 97.10%
Epoch [7/10], Test Loss: 0.0846, Test Accuracy: 97.28%
Epoch [8/10], Test Loss: 0.0797, Test Accuracy: 97.42%
Epoch [9/10], Test Loss: 0.0785, Test Accuracy: 97.58%
Epoch [10/10], Test Loss: 0.0795, Test Accuracy: 97.58%
Training with Tanh activation function...
Epoch [1/10], Test Loss: 0.1660, Test Accuracy: 95.17%
Epoch [2/10], Test Loss: 0.1152, Test Accuracy: 96.47%
Epoch [3/10], Test Loss: 0.1057, Test Accuracy: 96.86%
Epoch [4/10], Test Loss: 0.0865, Test Accuracy: 97.21%
Epoch [5/10], Test Loss: 0.0760, Test Accuracy: 97.61%
Epoch [6/10], Test Loss: 0.0856, Test Accuracy: 97.23%
Epoch [7/10], Test Loss: 0.0735, Test Accuracy: 97.66%
Epoch [8/10], Test Loss: 0.0790, Test Accuracy: 97.67%
Epoch [9/10], Test Loss: 0.0805, Test Accuracy: 97.47%
Epoch [10/10], Test Loss: 0.0834, Test Accuracy: 97.82%
Training with LeakyReLU activation function...
Epoch [1/10], Test Loss: 0.1587, Test Accuracy: 95.14%
Epoch [2/10], Test Loss: 0.1084, Test Accuracy: 96.37%
Epoch [3/10], Test Loss: 0.0861, Test Accuracy: 97.22%
Epoch [4/10], Test Loss: 0.0883, Test Accuracy: 97.06%
Epoch [5/10], Test Loss: 0.0870, Test Accuracy: 97.37%
Epoch [6/10], Test Loss: 0.0929, Test Accuracy: 97.26%
Epoch [7/10], Test Loss: 0.0824, Test Accuracy: 97.54%
Epoch [8/10], Test Loss: 0.0785, Test Accuracy: 97.77%
Epoch [9/10], Test Loss: 0.0908, Test Accuracy: 97.92%
Epoch [10/10], Test Loss: 0.1012, Test Accuracy: 97.76%
You may use Matplotlib to create plots comparing the performance of each activation function. You can create three separate plots to visualize the training loss, testing loss, and testing accuracy for each activation function over the epochs.
import matplotlib.pyplot as plt

# Plot the training loss
plt.figure()
for name, data in results.items():
    plt.plot(data['train_loss_history'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Training Loss')
plt.legend()
plt.show()

# Plot the testing loss
plt.figure()
for name, data in results.items():
    plt.plot(data['test_loss_history'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Testing Loss')
plt.legend()
plt.show()

# Plot the testing accuracy
plt.figure()
for name, data in results.items():
    plt.plot(data['test_accuracy_history'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Testing Accuracy')
plt.legend()
plt.show()
These plots provide a visual comparison of the performance of each activation function. By analyzing the results, you can determine which activation function works best for the specific task and dataset used in this example.
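To summarize the runs in one place, you can also print the final test accuracy reached with each activation function directly from the results dictionary:

# print the final test accuracy for each activation function
for name, data in results.items():
    final_accuracy = data['test_accuracy_history'][-1]
    print(f"{name}: final test accuracy = {final_accuracy:.2f}%")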
Summary
In this tutorial, you have explored some of the most popular activation functions available in PyTorch. You also saw how to train a neural network in PyTorch with different activation functions, using the popular MNIST dataset. You compared ReLU, Sigmoid, Tanh, and Leaky ReLU activation functions and analyzed their performance by plotting the training loss, testing loss, and testing accuracy.
As you can see, the choice of activation function plays an essential role in model performance. However, keep in mind that the optimal activation function may vary depending on the task and dataset.