Last Updated on April 8, 2023
While a logistic regression classifier is used for binary classification, a softmax classifier is a supervised learning algorithm used mostly when multiple classes are involved.
A softmax classifier works by assigning a probability to each class such that all probabilities are non-negative and sum to 1. The class receiving the highest probability is taken as the prediction.
Concretely, the softmax function transforms the raw outputs of the neurons into a probability distribution over the classes. It has the following properties:
- It is related to the logistic sigmoid, which is used in probabilistic modeling and has similar properties.
- It takes values between 0 and 1, with 0 corresponding to an impossible event and 1 corresponding to an event that is certain to occur.
- Its output can be interpreted as how likely it is that a particular class will be selected, given an input x, as the sketch below illustrates.
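To make these properties concrete, here is a quick sketch that applies torch.softmax to an arbitrary vector of raw scores and verifies that the result is a valid probability distribution (the score values are made up purely for illustration):

import torch

# raw scores (logits) for four classes; values are arbitrary
z = torch.tensor([1.0, 2.0, 0.5, -1.0])

# softmax turns them into a probability distribution
probs = torch.softmax(z, dim=0)
print(probs)        # every entry lies between 0 and 1
print(probs.sum())  # tensor(1.)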
In this tutorial, we’ll build a one-dimensional softmax classifier and explore its functionality. Particularly, we’ll learn:
- How you can use a Softmax classifier for multiclass classification.
- How to build and train a Softmax classifier in PyTorch.
- How to analyze the results of the model on test data.
Kick-start your project with my book Deep Learning with PyTorch. It provides self-study tutorials with working code.
Let’s get started.

Introduction to Softmax Classifier in PyTorch.
Picture by Julia Caesar. Some rights reserved.
Overview
This tutorial is in four parts; they are:
- Preparing Dataset
- Load Dataset into DataLoader
- Build the Model with nn.Module
- Training the Classifier
Preparing Dataset
First, let’s build our dataset class to generate some data samples. Unlike the previous experiments, you will generate data for multiple classes. Then you will train the softmax classifier on these data samples and later use it to make predictions on test data.
Below, we generate data for four classes based on a single input variable:
import torch
from torch.utils.data import Dataset

class toy_data(Dataset):
    "The data for multi-class classification"
    def __init__(self):
        # single input
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        # multi-class output
        self.y = torch.zeros(self.x.shape[0])
        self.y[(self.x > -2.0)[:, 0] * (self.x < 0.0)[:, 0]] = 1
        self.y[(self.x >= 0.0)[:, 0] * (self.x < 2.0)[:, 0]] = 2
        self.y[(self.x >= 2.0)[:, 0]] = 3
        self.y = self.y.type(torch.LongTensor)
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        "accessing one element in the dataset by index"
        return self.x[idx], self.y[idx]

    def __len__(self):
        "size of the entire dataset"
        return self.len
Let’s create the data object and check the first ten data samples and their labels.
# Create the dataset object and check a few samples
data = toy_data()
print("first ten data samples: ", data.x[0:10])
print("first ten data labels: ", data.y[0:10])
This prints:
first ten data samples:  tensor([[-3.0000],
        [-2.9000],
        [-2.8000],
        [-2.7000],
        [-2.6000],
        [-2.5000],
        [-2.4000],
        [-2.3000],
        [-2.2000],
        [-2.1000]])
first ten data labels:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
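Since the labels are derived from thresholds on x, you can quickly confirm that all four classes actually appear in the dataset. A minimal check using torch.bincount on the data object created above:

# count how many samples fall into each of the four classes
print("samples per class:", torch.bincount(data.y))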
Building the Softmax Model with nn.Module
You will employ nn.Module from PyTorch to build a custom softmax module. It is similar to the custom module you built in previous tutorials for logistic regression. So, what's the difference here? First, previously you used 1 in place of n_outputs for binary classification, while here we'll define four output classes for multi-class classification. Second, in the forward() function, the model doesn't apply the logistic function for prediction; it returns raw logits, because PyTorch's CrossEntropyLoss applies the softmax (in log form) internally during training.
class Softmax(torch.nn.Module):
    "custom softmax module"
    def __init__(self, n_inputs, n_outputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, n_outputs)

    def forward(self, x):
        pred = self.linear(x)
        return pred
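Since forward() returns raw logits, you can apply torch.softmax yourself whenever you want explicit class probabilities. A minimal sketch using a fresh (untrained) instance of the module and an arbitrary input value:

# logits for a single input; an untrained module gives arbitrary values
with torch.no_grad():
    logits = Softmax(1, 4)(torch.tensor([[1.0]]))

# dim=1 normalizes across the four classes
probs = torch.softmax(logits, dim=1)
print(probs)        # shape (1, 4), entries between 0 and 1
print(probs.sum())  # sums to 1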
Now, let’s create the model object. It takes a one-dimensional vector as input and predicts for four different classes. Let’s also check how parameters are initialized.
# call Softmax Classifier
model_softmax = Softmax(1, 4)
model_softmax.state_dict()
This prints (your values may differ, since the parameters are randomly initialized):
OrderedDict([('linear.weight', tensor([[-0.0075],
        [ 0.5364],
        [-0.8230],
        [-0.7359]])),
        ('linear.bias', tensor([-0.3852,  0.2682, -0.0198,  0.7929]))])
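As a quick sanity check of the model structure, you can also print the parameter shapes: the weight is of shape (4, 1), one row per output class, and the bias is of shape (4,). For example:

# inspect the shape of each trainable parameter
for name, param in model_softmax.named_parameters():
    print(name, param.shape)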
Training the Model
You will use cross entropy loss combined with stochastic gradient descent for model training, with the learning rate set to 0.01. You'll load the data into a DataLoader and set the batch size to 2.
...
from torch.utils.data import DataLoader

# define loss, optimizer, and dataloader
optimizer = torch.optim.SGD(model_softmax.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
train_loader = DataLoader(dataset=data, batch_size=2)
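Before training, it can help to pull a single batch from the loader to confirm the shapes. A minimal sketch (the shapes follow from the batch size of 2 set above; note that DataLoader does not shuffle by default):

# fetch one batch to verify shapes
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape)  # torch.Size([2, 1]) -- two samples, one feature each
print(y_batch.shape)  # torch.Size([2]) -- two integer class labels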
Now that everything is set, let’s train our model for 100 epochs.
1 2 3 4 5 6 7 8 9 10 11 12 |
# Train the model Loss = [] epochs = 100 for epoch in range(epochs): for x, y in train_loader: optimizer.zero_grad() y_pred = model_softmax(x) loss = criterion(y_pred, y) Loss.append(loss) loss.backward() optimizer.step() print("Done!") |
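Because the per-batch loss values are collected in the Loss list, you can plot them to inspect convergence. A minimal sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt

# plot the per-batch cross entropy loss recorded during training
plt.plot(Loss)
plt.xlabel("iteration")
plt.ylabel("cross entropy loss")
plt.show()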
After the training loop completes, you call the max() method on the model output to make predictions. The argument 1 asks for the maximum along dimension 1, i.e., it returns the index of the largest value in each row, which is the predicted class for each sample.
# Make predictions on test data
pred_model = model_softmax(data.x)
_, y_pred = pred_model.max(1)
print("model predictions on test data:", y_pred)
From above, you should see:
model predictions on test data: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
These are the model predictions on test data.
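The trained classifier works the same way on inputs it has never seen. A minimal sketch with a few hypothetical test points (the values are made up, one per expected class region):

# new one-dimensional inputs, shaped (n_samples, 1)
x_new = torch.tensor([[-2.5], [-1.0], [1.0], [2.5]])

with torch.no_grad():
    logits = model_softmax(x_new)
_, classes = logits.max(1)
# if training converged, this should print tensor([0, 1, 2, 3])
print("predicted classes:", classes)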
Let’s also check the model accuracy.
# check model accuracy
correct = (data.y == y_pred).sum().item()
acc = correct / len(data)
print("model accuracy: ", acc)
In this case, you may see
model accuracy:  0.9833333333333333
In this simple model, you can see the accuracy approach 1 if you train it longer.
Putting everything together, the following is the complete code:
import torch
from torch.utils.data import Dataset, DataLoader

class toy_data(Dataset):
    "The data for multi-class classification"
    def __init__(self):
        # single input
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        # multi-class output
        self.y = torch.zeros(self.x.shape[0])
        self.y[(self.x > -2.0)[:, 0] * (self.x < 0.0)[:, 0]] = 1
        self.y[(self.x >= 0.0)[:, 0] * (self.x < 2.0)[:, 0]] = 2
        self.y[(self.x >= 2.0)[:, 0]] = 3
        self.y = self.y.type(torch.LongTensor)
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        "accessing one element in the dataset by index"
        return self.x[idx], self.y[idx]

    def __len__(self):
        "size of the entire dataset"
        return self.len

# Create the dataset object and check a few samples
data = toy_data()
print("first ten data samples: ", data.x[0:10])
print("first ten data labels: ", data.y[0:10])

class Softmax(torch.nn.Module):
    "custom softmax module"
    def __init__(self, n_inputs, n_outputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, n_outputs)

    def forward(self, x):
        pred = self.linear(x)
        return pred

# call Softmax Classifier
model_softmax = Softmax(1, 4)
model_softmax.state_dict()

# define loss, optimizer, and dataloader
optimizer = torch.optim.SGD(model_softmax.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
train_loader = DataLoader(dataset=data, batch_size=2)

# Train the model
Loss = []
epochs = 100
for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        y_pred = model_softmax(x)
        loss = criterion(y_pred, y)
        Loss.append(loss.item())
        loss.backward()
        optimizer.step()
print("Done!")

# Make predictions on test data
pred_model = model_softmax(data.x)
_, y_pred = pred_model.max(1)
print("model predictions on test data:", y_pred)

# check model accuracy
correct = (data.y == y_pred).sum().item()
acc = correct / len(data)
print("model accuracy: ", acc)
Summary
In this tutorial, you learned how to build a simple one-dimensional softmax classifier. Particularly, you learned:
- How you can use a Softmax classifier for multiclass classification.
- How to build and train a Softmax classifier in PyTorch.
- How to analyze the results of the model on test data.