Last Updated on April 8, 2023
While a logistic regression classifier is used for binary classification, a softmax classifier is a supervised learning algorithm used mostly when multiple classes are involved.
A softmax classifier works by assigning a probability to each class such that all probabilities are non-negative and sum to 1. The class receiving the highest probability is taken as the prediction.
Concretely, the softmax function transforms the raw outputs of the neurons into a probability distribution over the classes. It has the following properties:
- It is related to the logistic sigmoid, which is used in probabilistic modeling and has similar properties.
- It takes values between 0 and 1, with 0 corresponding to an impossible event and 1 corresponding to an event that is certain to occur.
- Its output can be interpreted as how likely it is that a particular class will be selected, given an input x, as the sketch below illustrates.
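To make these properties concrete, here is a quick sketch that applies torch.softmax to an arbitrary vector of raw scores and verifies that the result is a valid probability distribution (the score values are made up purely for illustration):

import torch

# raw scores (logits) for four classes; values are arbitrary
z = torch.tensor([1.0, 2.0, 0.5, -1.0])

# softmax turns them into a probability distribution
probs = torch.softmax(z, dim=0)
print(probs)        # every entry lies between 0 and 1
print(probs.sum())  # tensor(1.)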
In this tutorial, we’ll build a one-dimensional softmax classifier and explore its functionality. Particularly, we’ll learn:
- How you can use a Softmax classifier for multiclass classification.
- How to build and train a Softmax classifier in PyTorch.
- How to analyze the results of the model on test data.
Kick-start your project with my book Deep Learning with PyTorch. It provides self-study tutorials with working code.
Let’s get started.

Introduction to Softmax Classifier in PyTorch.
Picture by Julia Caesar. Some rights reserved.
Overview
This tutorial is in four parts; they are:
- Preparing Dataset
- Load Dataset into DataLoader
- Build the Model with nn.Module
- Training the Classifier
Preparing Dataset
First, let’s build our dataset class to generate some data samples. Unlike the previous experiments, you will generate data for multiple classes. Then you will train the softmax classifier on these data samples and later use it to make predictions on test data.
Below, we generate data for four classes based on a single input variable:
import torch
from torch.utils.data import Dataset

class toy_data(Dataset):
    "The data for multi-class classification"
    def __init__(self):
        # single input
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        # multi-class output
        self.y = torch.zeros(self.x.shape[0])
        self.y[(self.x > -2.0)[:, 0] * (self.x < 0.0)[:, 0]] = 1
        self.y[(self.x >= 0.0)[:, 0] * (self.x < 2.0)[:, 0]] = 2
        self.y[(self.x >= 2.0)[:, 0]] = 3
        self.y = self.y.type(torch.LongTensor)
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        "accessing one element in the dataset by index"
        return self.x[idx], self.y[idx]

    def __len__(self):
        "size of the entire dataset"
        return self.len
Let’s create the data object and check the first ten data samples and their labels.
# Create the dataset object and check a few samples
data = toy_data()
print("first ten data samples: ", data.x[0:10])
print("first ten data labels: ", data.y[0:10])
This prints:
first ten data samples:  tensor([[-3.0000],
        [-2.9000],
        [-2.8000],
        [-2.7000],
        [-2.6000],
        [-2.5000],
        [-2.4000],
        [-2.3000],
        [-2.2000],
        [-2.1000]])
first ten data labels:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
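Since the labels are derived from thresholds on x, you can quickly confirm that all four classes actually appear in the dataset. A minimal check using torch.bincount on the data object created above:

# count how many samples fall into each of the four classes
print("samples per class:", torch.bincount(data.y))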
Building the Softmax Model with nn.Module
You will employ nn.Module from PyTorch to build a custom softmax module. It is similar to the custom module you built in previous tutorials for logistic regression. So, what's the difference here? First, previously you used 1 in place of n_outputs for binary classification, while here we'll define four output classes for multi-class classification. Second, in the forward() function, the model doesn't apply the logistic function for prediction; it returns raw logits, because PyTorch's CrossEntropyLoss applies the softmax (in log form) internally during training.
class Softmax(torch.nn.Module):
    "custom softmax module"
    def __init__(self, n_inputs, n_outputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, n_outputs)

    def forward(self, x):
        pred = self.linear(x)
        return pred
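Since forward() returns raw logits, you can apply torch.softmax yourself whenever you want explicit class probabilities. A minimal sketch using a fresh (untrained) instance of the module and an arbitrary input value:

# logits for a single input; an untrained module gives arbitrary values
with torch.no_grad():
    logits = Softmax(1, 4)(torch.tensor([[1.0]]))

# dim=1 normalizes across the four classes
probs = torch.softmax(logits, dim=1)
print(probs)        # shape (1, 4), entries between 0 and 1
print(probs.sum())  # sums to 1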
Now, let’s create the model object. It takes a one-dimensional vector as input and predicts for four different classes. Let’s also check how parameters are initialized.
# call Softmax Classifier
model_softmax = Softmax(1, 4)
model_softmax.state_dict()
This prints (your values may differ, since the parameters are randomly initialized):
OrderedDict([('linear.weight', tensor([[-0.0075],
        [ 0.5364],
        [-0.8230],
        [-0.7359]])),
        ('linear.bias', tensor([-0.3852,  0.2682, -0.0198,  0.7929]))])
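As a quick sanity check of the model structure, you can also print the parameter shapes: the weight is of shape (4, 1), one row per output class, and the bias is of shape (4,). For example:

# inspect the shape of each trainable parameter
for name, param in model_softmax.named_parameters():
    print(name, param.shape)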
Training the Model
You will use cross entropy loss combined with stochastic gradient descent for model training, with the learning rate set to 0.01. You'll load the data into a DataLoader and set the batch size to 2.
...
from torch.utils.data import DataLoader

# define loss, optimizer, and dataloader
optimizer = torch.optim.SGD(model_softmax.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
train_loader = DataLoader(dataset=data, batch_size=2)
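Before training, it can help to pull a single batch from the loader to confirm the shapes. A minimal sketch (the shapes follow from the batch size of 2 set above; note that DataLoader does not shuffle by default):

# fetch one batch to verify shapes
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape)  # torch.Size([2, 1]) -- two samples, one feature each
print(y_batch.shape)  # torch.Size([2]) -- two integer class labels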
Now that everything is set, let’s train our model for 100 epochs.
1 2 3 4 5 6 7 8 9 10 11 12 |
# Train the model Loss = [] epochs = 100 for epoch in range(epochs): for x, y in train_loader: optimizer.zero_grad() y_pred = model_softmax(x) loss = criterion(y_pred, y) Loss.append(loss) loss.backward() optimizer.step() print("Done!") |
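Because the per-batch loss values are collected in the Loss list, you can plot them to inspect convergence. A minimal sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt

# plot the per-batch cross entropy loss recorded during training
plt.plot(Loss)
plt.xlabel("iteration")
plt.ylabel("cross entropy loss")
plt.show()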
After the training loop completes, you call the max() method on the model output to make predictions. The argument 1 asks for the maximum along dimension 1, i.e., it returns the index of the largest value in each row, which is the predicted class for each sample.
# Make predictions on test data
pred_model = model_softmax(data.x)
_, y_pred = pred_model.max(1)
print("model predictions on test data:", y_pred)
From above, you should see:
model predictions on test data: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
These are the model predictions on test data.
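The trained classifier works the same way on inputs it has never seen. A minimal sketch with a few hypothetical test points (the values are made up, one per expected class region):

# new one-dimensional inputs, shaped (n_samples, 1)
x_new = torch.tensor([[-2.5], [-1.0], [1.0], [2.5]])

with torch.no_grad():
    logits = model_softmax(x_new)
_, classes = logits.max(1)
# if training converged, this should print tensor([0, 1, 2, 3])
print("predicted classes:", classes)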
Let’s also check the model accuracy.
# check model accuracy
correct = (data.y == y_pred).sum().item()
acc = correct / len(data)
print("model accuracy: ", acc)
In this case, you may see
model accuracy:  0.9833333333333333
In this simple model, you can see the accuracy approach 1 if you train it longer.
Putting everything together, the following is the complete code:
import torch
from torch.utils.data import Dataset, DataLoader

class toy_data(Dataset):
    "The data for multi-class classification"
    def __init__(self):
        # single input
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        # multi-class output
        self.y = torch.zeros(self.x.shape[0])
        self.y[(self.x > -2.0)[:, 0] * (self.x < 0.0)[:, 0]] = 1
        self.y[(self.x >= 0.0)[:, 0] * (self.x < 2.0)[:, 0]] = 2
        self.y[(self.x >= 2.0)[:, 0]] = 3
        self.y = self.y.type(torch.LongTensor)
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        "accessing one element in the dataset by index"
        return self.x[idx], self.y[idx]

    def __len__(self):
        "size of the entire dataset"
        return self.len

# Create the dataset object and check a few samples
data = toy_data()
print("first ten data samples: ", data.x[0:10])
print("first ten data labels: ", data.y[0:10])

class Softmax(torch.nn.Module):
    "custom softmax module"
    def __init__(self, n_inputs, n_outputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, n_outputs)

    def forward(self, x):
        pred = self.linear(x)
        return pred

# call Softmax Classifier
model_softmax = Softmax(1, 4)
model_softmax.state_dict()

# define loss, optimizer, and dataloader
optimizer = torch.optim.SGD(model_softmax.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
train_loader = DataLoader(dataset=data, batch_size=2)

# Train the model
Loss = []
epochs = 100
for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        y_pred = model_softmax(x)
        loss = criterion(y_pred, y)
        Loss.append(loss.item())
        loss.backward()
        optimizer.step()
print("Done!")

# Make predictions on test data
pred_model = model_softmax(data.x)
_, y_pred = pred_model.max(1)
print("model predictions on test data:", y_pred)

# check model accuracy
correct = (data.y == y_pred).sum().item()
acc = correct / len(data)
print("model accuracy: ", acc)
Summary
In this tutorial, you learned how to build a simple one-dimensional softmax classifier. Particularly, you learned:
- How you can use a Softmax classifier for multiclass classification.
- How to build and train a Softmax classifier in PyTorch.
- How to analyze the results of the model on test data.