PyTorch Tutorial: How to Develop Deep Learning Models with Python

By Jason Brownlee on May 1, 2023 in Deep Learning with PyTorch

Predictive modeling with deep learning is a skill that modern developers need to know.

PyTorch is the premier open-source deep learning framework developed and maintained by Facebook.

At its core, PyTorch is a mathematical library that allows you to perform efficient computation and automatic differentiation on graph-based models. Achieving this directly is challenging, although thankfully, the modern PyTorch API provides classes and idioms that allow you to easily develop a suite of deep learning models.

In this tutorial, you will discover a step-by-step guide to developing deep learning models in PyTorch.

After completing this tutorial, you will know:

The difference between Torch and PyTorch and how to install and confirm PyTorch is working.
The five-step life-cycle of PyTorch models and how to define, fit, and evaluate models.
How to develop PyTorch deep learning models for regression, classification, and predictive modeling tasks.

Kick-start your project with my book Deep Learning with PyTorch. It provides self-study tutorials with working code.

Let’s get started.

PyTorch Tutorial – How to Develop Deep Learning Models
Photo by Dimitry B., some rights reserved.

PyTorch Tutorial Overview

The focus of this tutorial is on using the PyTorch API for common deep learning model development tasks; we will not be diving into the math and theory of deep learning. For that, I recommend starting with this excellent book.

The best way to learn deep learning in Python is by doing. Dive in. You can circle back for more theory later.

I have designed each code example to use best practices and to be standalone so that you can copy and paste it directly into your project and adapt it to your specific needs. This will give you a massive head start over trying to figure out the API from official documentation alone.

It is a large tutorial, and as such, it is divided into three parts; they are:

How to Install PyTorch
1. What Are Torch and PyTorch?
2. How to Install PyTorch
3. How to Confirm PyTorch Is Installed
PyTorch Deep Learning Model Life-Cycle
1. Step 1: Prepare the Data
2. Step 2: Define the Model
3. Step 3: Train the Model
4. Step 4: Evaluate the Model
5. Step 5: Make Predictions
How to Develop PyTorch Deep Learning Models
1. How to Develop an MLP for Binary Classification
2. How to Develop an MLP for Multiclass Classification
3. How to Develop an MLP for Regression
4. How to Develop a CNN for Image Classification

You Can Do Deep Learning in Python!

Work through this tutorial. It will take you 60 minutes, max!

You do not need to understand everything (at least not right now). Your goal is to run through the tutorial end-to-end and get a result. List your questions as you go, and make heavy use of the API documentation to learn about the functions that you’re using.

You do not need to know the math first. Math is a compact way of describing how algorithms work, specifically tools from linear algebra, probability, and calculus. These are not the only tools that you can use to learn how algorithms work. You can also use code and explore algorithm behavior with different inputs and outputs. Knowing the math will not tell you what algorithm to choose or how to best configure it. You can only discover that through carefully controlled experiments.

You do not need to know how the algorithms work. It is important to know about the limitations and how to configure deep learning algorithms. But learning about algorithms can come later. You need to build up this algorithm knowledge slowly over a long period of time. Today, start by getting comfortable with the platform.

You do not need to be a Python programmer. The syntax of the Python language can be intuitive even if you are new to it. Just like other languages, focus on function calls (e.g. function()) and assignments (e.g. a = "b"). This will get you most of the way. You are a developer; you know how to pick up the basics of a language really fast. Just get started and dive into the details later.

You do not need to be a deep learning expert. You can learn about the benefits and limitations of various algorithms later, and there are plenty of tutorials that you can read to brush up on the steps of a deep learning project.

1. How to Install PyTorch

In this section, you will discover what PyTorch is, how to install it, and how to confirm that it is installed correctly.

1.1. What Are Torch and PyTorch?

PyTorch is an open-source Python library for deep learning developed and maintained by Facebook.

The project started in 2016 and quickly became a popular framework among developers and researchers.

Torch (Torch7) is an open-source project for deep learning written in C and generally used via the Lua interface. It was a precursor project to PyTorch and is no longer actively developed. PyTorch includes “Torch” in the name, acknowledging the prior torch library with the “Py” prefix indicating the Python focus of the new project.

The PyTorch API is simple and flexible, making it a favorite for academics and researchers in the development of new deep learning models and applications. The extensive use has led to many extensions for specific applications (such as text, computer vision, and audio data), and many pre-trained models that can be used directly. As such, it may be the most popular library used by academics.

The flexibility of PyTorch comes at the cost of ease of use, especially for beginners, as compared to simpler interfaces like Keras. Choosing PyTorch over Keras means accepting a slightly steeper learning curve and more code in exchange for more flexibility, and perhaps a more vibrant academic community.

1.2. How to Install PyTorch

Before installing PyTorch, ensure that you have Python installed, such as Python 3.6 or higher.

If you don’t have Python installed, you can install it using Anaconda. This tutorial will show you how:

How to Setup Your Python Environment for Machine Learning With Anaconda

There are many ways to install the PyTorch open-source deep learning library.

The most common, and perhaps simplest, way to install PyTorch on your workstation is by using pip.

For example, on the command line, you can type:

sudo pip install torch

Perhaps the most popular application of deep learning is for computer vision, and the PyTorch computer vision package is called “torchvision.”

Installing torchvision is also highly recommended and it can be installed as follows:

sudo pip install torchvision

If you prefer to use an installation method more specific to your platform or package manager, you can see a complete list of installation instructions here:

PyTorch Installation Guide

There is no need to set up the GPU now.

All examples in this tutorial will work just fine on a modern CPU. If you want to configure PyTorch for your GPU, you can do that after completing this tutorial. Don’t get distracted!

1.3. How to Confirm PyTorch Is Installed

Once PyTorch is installed, it is important to confirm that the library was installed successfully and that you can start using it.

Don’t skip this step.

If PyTorch is not installed correctly or raises an error on this step, you won’t be able to run the examples later.

Create a new file called versions.py and copy and paste the following code into the file.

# check pytorch version
import torch
print(torch.__version__)

Save the file, then open your command line and change directory to where you saved the file.

Then type:

python versions.py

You should then see output like the following:

1.3.1

This confirms that PyTorch is installed correctly and that we are all using the same version.

This also shows you how to run a Python script from the command line. I recommend running all code from the command line in this manner, and not from a notebook or an IDE.

2. PyTorch Deep Learning Model Life-Cycle

In this section, you will discover the life-cycle for a deep learning model and the PyTorch API that you can use to define models.

A model has a life-cycle, and this very simple knowledge provides the backbone for both modeling a dataset and understanding the PyTorch API.

The five steps in the life-cycle are as follows:

1. Prepare the Data.
2. Define the Model.
3. Train the Model.
4. Evaluate the Model.
5. Make Predictions.

Let’s take a closer look at each step in turn.

Note: There are many ways to achieve each of these steps using the PyTorch API, although I have aimed to show you the simplest, or most common, or most idiomatic.

If you discover a better approach, let me know in the comments below.

Step 1: Prepare the Data

The first step is to load and prepare your data.

Neural network models require numerical input data and numerical output data.

You can use standard Python libraries to load and prepare tabular data, like CSV files. For example, Pandas can be used to load your CSV file, and tools from scikit-learn can be used to encode categorical data, such as class labels.

PyTorch provides the Dataset class that you can extend and customize to load your dataset.

For example, the constructor of your dataset object can load your data file (e.g. a CSV file). You can then override the __len__() function that can be used to get the length of the dataset (number of rows or samples), and the __getitem__() function that is used to get a specific sample by index.

When loading your dataset, you can also perform any required transforms, such as scaling or encoding.

A skeleton of a custom Dataset class is provided below.

# dataset definition
class CSVDataset(Dataset):
    # load the dataset
    def __init__(self, path):
        # store the inputs and outputs
        self.X = ...
        self.y = ...

    # number of rows in the dataset
    def __len__(self):
        return len(self.X)

    # get a row at an index
    def __getitem__(self, idx):
        return [self.X[idx], self.y[idx]]
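
As a minimal sketch of how the skeleton might be filled in, the example below backs a Dataset with in-memory NumPy arrays instead of a CSV file. The class name ArrayDataset and the synthetic data are assumptions made for illustration only; they are not part of the worked examples in this tutorial.

```python
# a minimal sketch: a Dataset backed by in-memory arrays (hypothetical example)
import numpy as np
from torch.utils.data import Dataset

class ArrayDataset(Dataset):
    # store the inputs and outputs as float32 arrays
    def __init__(self, X, y):
        self.X = np.asarray(X, dtype='float32')
        self.y = np.asarray(y, dtype='float32').reshape((-1, 1))

    # number of rows in the dataset
    def __len__(self):
        return len(self.X)

    # get one (input, target) pair by index
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# synthetic data: 10 rows, 3 input columns, binary target
X = np.arange(30).reshape((10, 3))
y = np.array([0, 1] * 5)
dataset = ArrayDataset(X, y)
print(len(dataset))
inputs, target = dataset[0]
print(inputs.shape, target.shape)
```

Keeping the target as a column of shape (n, 1) mirrors what a model with a single output node produces, which makes the loss calculation straightforward later on.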

Once loaded, PyTorch provides the DataLoader class to navigate a Dataset instance during the training and evaluation of your model.

A DataLoader instance can be created for the training dataset, test dataset, and even a validation dataset.

The random_split() function can be used to split a dataset into train and test sets. Once split, a selection of rows from the Dataset can be provided to a DataLoader, along with the batch size and whether the data should be shuffled every epoch.

For example, we can define a DataLoader by passing in a selected sample of rows in the dataset.

...
# create the dataset
dataset = CSVDataset(...)
# select rows from the dataset (random_split takes a list of split lengths)
train, test = random_split(dataset, [..., ...])
# create a data loader for train and test sets
train_dl = DataLoader(train, batch_size=32, shuffle=True)
test_dl = DataLoader(test, batch_size=1024, shuffle=False)
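
To make the split and the loaders concrete, here is a small runnable sketch. It assumes synthetic random data wrapped in a TensorDataset (my choice for illustration, not something used elsewhere in this tutorial) and a 67/33 split; the variable names train_size and test_size are likewise just illustrative.

```python
# a minimal sketch: split a dataset and wrap each part in a DataLoader
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# synthetic data: 100 rows, 4 inputs, 1 target (assumed for illustration)
X = torch.rand(100, 4)
y = torch.rand(100, 1)
dataset = TensorDataset(X, y)

# random_split takes the dataset and a list of split lengths
test_size = round(0.33 * len(dataset))
train_size = len(dataset) - test_size
train, test = random_split(dataset, [train_size, test_size])

# shuffle the training data each epoch, keep the test data in order
train_dl = DataLoader(train, batch_size=32, shuffle=True)
test_dl = DataLoader(test, batch_size=1024, shuffle=False)

# each iteration yields one batch of inputs and targets
for i, (inputs, targets) in enumerate(train_dl):
    print(i, inputs.shape, targets.shape)
```

Note that the last batch can be smaller than batch_size when the number of rows is not an exact multiple of it.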

Once defined, a DataLoader can be enumerated, yielding one batch worth of samples each iteration.

...
# train the model
for i, (inputs, targets) in enumerate(train_dl):
	...

Step 2: Define the Model

The next step is to define a model.

The idiom for defining a model in PyTorch involves defining a class that extends the Module class.

The constructor of your class defines the layers of the model and the forward() function is the override that defines how to forward propagate input through the defined layers of the model.

Many layers are available, such as Linear for fully connected layers, Conv2d for convolutional layers, and MaxPool2d for pooling layers.

Activation functions can also be defined as layers, such as ReLU, Softmax, and Sigmoid.

Below is an example of a simple MLP model with one layer.

# model definition
class MLP(Module):
    # define model elements
    def __init__(self, n_inputs):
        super(MLP, self).__init__()
        self.layer = Linear(n_inputs, 1)
        self.activation = Sigmoid()

    # forward propagate input
    def forward(self, X):
        X = self.layer(X)
        X = self.activation(X)
        return X
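
As a quick sanity check, a model like this can be instantiated and called on a batch of random inputs. The sketch below repeats the one-layer definition so that it runs on its own; the batch of 8 rows with 4 inputs is an arbitrary assumption.

```python
# a minimal sketch: instantiate the one-layer model and run a forward pass
import torch
from torch.nn import Module, Linear, Sigmoid

class MLP(Module):
    # define model elements
    def __init__(self, n_inputs):
        super().__init__()
        self.layer = Linear(n_inputs, 1)
        self.activation = Sigmoid()

    # forward propagate input
    def forward(self, X):
        return self.activation(self.layer(X))

model = MLP(4)
# a batch of 8 rows, each with 4 input values (arbitrary sizes)
batch = torch.rand(8, 4)
# calling the model object invokes forward()
yhat = model(batch)
print(yhat.shape)
# count the learnable parameters: 4 weights plus 1 bias
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

Calling model(batch) rather than model.forward(batch) is the usual idiom, as it lets PyTorch run any registered hooks around the forward pass.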

The weights of a given layer can also be initialized after the layer is defined in the constructor.

Common examples include the Xavier and He weight initialization schemes. For example:

...
xavier_uniform_(self.layer.weight)
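
The sketch below applies both schemes to standalone layers and compares the largest weight against the sampling bound of each scheme. The layer sizes (34 to 10 to 1) are assumptions borrowed from the binary classification example later in this tutorial, and the bound formulas are written out by hand for illustration.

```python
# a minimal sketch: He (Kaiming) and Xavier uniform weight initialization
import math
from torch.nn import Linear
from torch.nn.init import kaiming_uniform_, xavier_uniform_

# hidden layer intended for a ReLU activation: He uniform
hidden = Linear(34, 10)
kaiming_uniform_(hidden.weight, nonlinearity='relu')

# output layer intended for a sigmoid activation: Xavier uniform
output = Linear(10, 1)
xavier_uniform_(output.weight)

# both schemes sample weights from a uniform range (-bound, bound)
# He uniform for ReLU: bound = sqrt(2) * sqrt(3 / fan_in)
he_bound = math.sqrt(2.0) * math.sqrt(3.0 / 34)
# Xavier uniform: bound = sqrt(6 / (fan_in + fan_out))
xavier_bound = math.sqrt(6.0 / (10 + 1))

print(hidden.weight.abs().max().item(), he_bound)
print(output.weight.abs().max().item(), xavier_bound)
```

The trailing underscore in the function names indicates that the weights are modified in place. The bias values are left at their defaults.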

Step 3: Train the Model

The training process requires that you define a loss function and an optimization algorithm.

Common loss functions include the following:

BCELoss: Binary cross-entropy loss for binary classification.
CrossEntropyLoss: Categorical cross-entropy loss for multi-class classification.
MSELoss: Mean squared loss for regression.
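
Each of these loss functions expects its inputs in a slightly different form, which is a common source of confusion. The sketch below evaluates all three on tiny hand-made tensors; the values are made up purely for illustration.

```python
# a minimal sketch: what each loss function expects as input
import torch
from torch.nn import BCELoss, CrossEntropyLoss, MSELoss

# BCELoss expects probabilities in [0, 1] (e.g. sigmoid output) and float targets
bce = BCELoss()(torch.tensor([[0.5]]), torch.tensor([[1.0]]))

# CrossEntropyLoss expects raw scores (logits), one per class,
# and integer class indices as targets
ce = CrossEntropyLoss()(torch.zeros(1, 3), torch.tensor([0]))

# MSELoss expects real-valued predictions and targets of the same shape
mse = MSELoss()(torch.tensor([[1.0], [2.0]]), torch.tensor([[1.0], [4.0]]))

print(bce.item(), ce.item(), mse.item())
```

CrossEntropyLoss applies a log-softmax internally, so if you use it, the raw outputs of the final layer are normally passed to the loss rather than the outputs of a Softmax layer.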

For more on loss functions generally, see the tutorial:

Loss and Loss Functions for Training Deep Learning Neural Networks

Stochastic gradient descent is used for optimization, and the standard algorithm is provided by the SGD class, although other versions of the algorithm are available, such as Adam.

# define the optimization
criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

Training the model involves enumerating the DataLoader for the training dataset.

First, a loop is required for the number of training epochs. Then an inner loop is required for the mini-batches for stochastic gradient descent.

...
# enumerate epochs
for epoch in range(100):
    # enumerate mini batches
    for i, (inputs, targets) in enumerate(train_dl):
        ...

Each update to the model involves the same general pattern comprised of:

Clearing the last error gradient.
A forward pass of the input through the model.
Calculating the loss for the model output.
Backpropagating the error through the model.
Updating the model in an effort to reduce loss.

For example:

...
# clear the gradients
optimizer.zero_grad()
# compute the model output
yhat = model(inputs)
# calculate loss
loss = criterion(yhat, targets)
# credit assignment
loss.backward()
# update model weights
optimizer.step()
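
Putting the epoch loop and the update steps together, a complete training run might look like the sketch below. To keep it self-contained, it assumes a toy regression problem (y = 2x + 1) and a single Linear layer as the model; both are stand-ins chosen for illustration, not the models developed later in the tutorial.

```python
# a minimal sketch: a full training loop on a toy regression problem
import torch
from torch.nn import Linear, MSELoss
from torch.optim import SGD
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(1)

# toy data: 64 points on the line y = 2x + 1 (assumed for illustration)
X = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * X + 1
train_dl = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# a single linear layer stands in for the model
model = Linear(1, 1)
criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

# loss on the whole dataset before training
with torch.no_grad():
    loss_before = criterion(model(X), y).item()

# enumerate epochs
for epoch in range(100):
    # enumerate mini batches
    for i, (inputs, targets) in enumerate(train_dl):
        # clear the gradients
        optimizer.zero_grad()
        # compute the model output
        yhat = model(inputs)
        # calculate loss
        loss = criterion(yhat, targets)
        # backpropagate the error
        loss.backward()
        # update model weights
        optimizer.step()

# loss on the whole dataset after training
with torch.no_grad():
    loss_after = criterion(model(X), y).item()
print(loss_before, loss_after)
```

Comparing the loss before and after training is a cheap way to confirm that the loop is wired up correctly before moving on to a real dataset.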

Step 4: Evaluate the Model

Once the model is fit, it can be evaluated on the test dataset.

This can be achieved by using the DataLoader for the test dataset and collecting the predictions for the test set, then comparing the predictions to the expected values of the test set and calculating a performance metric.

...
for i, (inputs, targets) in enumerate(test_dl):
    # evaluate the model on the test set
    yhat = model(inputs)
    ...
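
A fuller version of this evaluation loop is sketched below. It assumes a stand-in "fit" model with hand-set weights and four made-up test rows so that the result is easy to check by hand. It also wraps the loop in torch.no_grad() and calls model.eval(), which are optional additions of mine rather than steps used in the listings in this tutorial.

```python
# a minimal sketch: collect predictions on a test set and compute accuracy
import numpy as np
import torch
from torch.nn import Linear, Sigmoid, Sequential
from torch.utils.data import TensorDataset, DataLoader

# stand-in for a fit model: sigmoid(1 * x + 0), weights set by hand
model = Sequential(Linear(1, 1), Sigmoid())
with torch.no_grad():
    model[0].weight.fill_(1.0)
    model[0].bias.fill_(0.0)

# made-up test set: the last row is deliberately labeled to be misclassified
X_test = torch.tensor([[-2.0], [-1.0], [1.0], [2.0]])
y_test = torch.tensor([[0.0], [0.0], [1.0], [0.0]])
test_dl = DataLoader(TensorDataset(X_test, y_test), batch_size=2, shuffle=False)

# switch layers such as dropout to inference behavior (no effect on this model)
model.eval()
predictions, actuals = [], []
# gradients are not needed for evaluation
with torch.no_grad():
    for i, (inputs, targets) in enumerate(test_dl):
        yhat = model(inputs)
        # round probabilities to class values 0 or 1
        predictions.append(yhat.numpy().round())
        actuals.append(targets.numpy())
predictions, actuals = np.vstack(predictions), np.vstack(actuals)

# fraction of rows where the predicted class matches the expected class
acc = float((predictions == actuals).mean())
print('Accuracy: %.3f' % acc)
```

Three of the four rows are classified correctly here, so the reported accuracy is 0.750.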

Step 5: Make Predictions

A fit model can be used to make a prediction on new data.

For example, you might have a single image or a single row of data and want to make a prediction.

This requires that you wrap the data in a PyTorch Tensor data structure.

A Tensor is just the PyTorch version of a NumPy array for holding data. It also allows you to perform the automatic differentiation tasks in the model graph, like calling backward() when training the model.
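
The relationship between a Tensor and a NumPy array, and the automatic differentiation that a Tensor adds, can be seen in a few lines. This is a small illustrative sketch with made-up values.

```python
# a minimal sketch: Tensors, NumPy arrays, and automatic differentiation
import numpy as np
import torch

# a Tensor can be created from a NumPy array and converted back
array = np.array([1.0, 2.0, 3.0], dtype='float32')
tensor = torch.from_numpy(array)
print(tensor.numpy())

# a Tensor can also track operations so gradients can be computed
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
# backward() computes dy/dx and stores it in x.grad (2 * x = 6 at x = 3)
y.backward()
print(x.grad.item())
```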

The prediction too will be a Tensor, although you can retrieve the NumPy array by detaching the Tensor from the automatic differentiation graph and calling the numpy() function.

...
# convert row to data
row = Tensor([row])
# make prediction
yhat = model(row)
# retrieve numpy array
yhat = yhat.detach().numpy()
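
A self-contained version of this step is sketched below. The untrained three-input model and the row of values are placeholders I have assumed for illustration; in practice you would use your fit model and a real row of data.

```python
# a minimal sketch: make a prediction for a single row of data
import torch
from torch.nn import Linear, Sigmoid, Sequential

# placeholder for a fit model with 3 inputs and 1 output
model = Sequential(Linear(3, 1), Sigmoid())

# one row of data as a plain Python list
row = [0.1, 0.2, 0.3]
# wrap the row in a list so the tensor has shape (1, n_inputs)
row = torch.tensor([row], dtype=torch.float32)

# make prediction without tracking gradients
model.eval()
with torch.no_grad():
    yhat = model(row)
# retrieve numpy array
yhat = yhat.numpy()
print(yhat.shape)
```

The model expects a batch dimension even for one sample, which is why the row is wrapped in an extra list before being converted to a Tensor.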

Now that we are familiar with the PyTorch API at a high-level and the model life-cycle, let’s look at how we can develop some standard deep learning models from scratch.

3. How to Develop PyTorch Deep Learning Models

In this section, you will discover how to develop, evaluate, and make predictions with standard deep learning models, including Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN).

A Multilayer Perceptron model, or MLP for short, is a standard fully connected neural network model.

It is comprised of layers of nodes where each node is connected to all outputs from the previous layer and the output of each node is connected to all inputs for nodes in the next layer.

An MLP is a model with one or more fully connected layers. This model is appropriate for tabular data, that is, data as it looks in a table or spreadsheet with one column for each variable and one row for each observation. There are three predictive modeling problems you may want to explore with an MLP; they are binary classification, multiclass classification, and regression.

Let’s fit a model on a real dataset for each of these cases.

Note: The models in this section are effective, but not optimized. See if you can improve their performance. Post your findings in the comments below.

3.1. How to Develop an MLP for Binary Classification

We will use the Ionosphere binary (two class) classification dataset to demonstrate an MLP for binary classification.

This dataset involves predicting whether there is a structure in the atmosphere or not given radar returns.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

We will use a LabelEncoder to encode the string labels to integer values 0 and 1. The model will be fit on 67 percent of the data, and the remaining 33 percent will be used for evaluation, split using the random_split() function.

It is a good practice to use ReLU activation with He uniform weight initialization. This combination goes a long way to overcome the problem of vanishing gradients when training deep neural network models. For more on ReLU, see the tutorial:

A Gentle Introduction to the Rectified Linear Unit (ReLU)

The model predicts the probability of class 1 and uses the sigmoid activation function. The model is optimized using stochastic gradient descent and seeks to minimize the binary cross-entropy loss.

The complete example is listed below.

# pytorch mlp for binary classification
from numpy import vstack
from pandas import read_csv
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.utils.data import random_split
from torch import Tensor
from torch.nn import Linear
from torch.nn import ReLU
from torch.nn import Sigmoid
from torch.nn import Module
from torch.optim import SGD
from torch.nn import BCELoss
from torch.nn.init import kaiming_uniform_
from torch.nn.init import xavier_uniform_

# dataset definition
class CSVDataset(Dataset):
    # load the dataset
    def __init__(self, path):
        # load the csv file as a dataframe
        df = read_csv(path, header=None)
        # store the inputs and outputs
        self.X = df.values[:, :-1]
        self.y = df.values[:, -1]
        # ensure input data is floats
        self.X = self.X.astype('float32')
        # label encode target and ensure the values are floats
        self.y = LabelEncoder().fit_transform(self.y)
        self.y = self.y.astype('float32')
        self.y = self.y.reshape((len(self.y), 1))

    # number of rows in the dataset
    def __len__(self):
        return len(self.X)

    # get a row at an index
    def __getitem__(self, idx):
        return [self.X[idx], self.y[idx]]

    # get indexes for train and test rows
    def get_splits(self, n_test=0.33):
        # determine sizes
        test_size = round(n_test * len(self.X))
        train_size = len(self.X) - test_size
        # calculate the split
        return random_split(self, [train_size, test_size])

# model definition
class MLP(Module):
    # define model elements
    def __init__(self, n_inputs):
        super(MLP, self).__init__()
        # input to first hidden layer
        self.hidden1 = Linear(n_inputs, 10)
        kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
        self.act1 = ReLU()
        # second hidden layer
        self.hidden2 = Linear(10, 8)
        kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
        self.act2 = ReLU()
        # third hidden layer and output
        self.hidden3 = Linear(8, 1)
        xavier_uniform_(self.hidden3.weight)
        self.act3 = Sigmoid()

    # forward propagate input
    def forward(self, X):
        # input to first hidden layer
        X = self.hidden1(X)
        X = self.act1(X)
        # second hidden layer
        X = self.hidden2(X)
        X = self.act2(X)
        # third hidden layer and output
        X = self.hidden3(X)
        X = self.act3(X)
        return X

# prepare the dataset
def prepare_data(path):
    # load the dataset
    dataset = CSVDataset(path)
    # calculate split
    train, test = dataset.get_splits()
    # prepare data loaders
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)
    return train_dl, test_dl

# train the model
def train_model(train_dl, model):
    # define the optimization
    criterion = BCELoss()
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
    # enumerate epochs
    for epoch in range(100):
        # enumerate mini batches
        for i, (inputs, targets) in enumerate(train_dl):
            # clear the gradients
            optimizer.zero_grad()
            # compute the model output
            yhat = model(inputs)
            # calculate loss
            loss = criterion(yhat, targets)
            # credit assignment
            loss.backward()
            # update model weights
            optimizer.step()

# evaluate the model
def evaluate_model(test_dl, model):
    predictions, actuals = list(), list()
    for i, (inputs, targets) in enumerate(test_dl):
        # evaluate the model on the test set
        yhat = model(inputs)
        # retrieve numpy array
        yhat = yhat.detach().numpy()
        actual = targets.numpy()
        actual = actual.reshape((len(actual), 1))
        # round to class values
        yhat = yhat.round()
        # store
        predictions.append(yhat)
        actuals.append(actual)
    predictions, actuals = vstack(predictions), vstack(actuals)
    # calculate accuracy
    acc = accuracy_score(actuals, predictions)
    return acc

# make a class prediction for one row of data
def predict(row, model):
    # convert row to data
    row = Tensor([row])
    # make prediction
    yhat = model(row)
    # retrieve numpy array
    yhat = yhat.detach().numpy()
    return yhat

# prepare the data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
train_dl, test_dl = prepare_data(path)
print(len(train_dl.dataset), len(test_dl.dataset))
# define the network
model = MLP(34)
# train the model
train_model(train_dl, model)
# evaluate the model
acc = evaluate_model(test_dl, model)
print('Accuracy: %.3f' % acc)
# make a single prediction (expect class=1)
row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]
# convert the 1x1 array to a Python float before formatting
yhat = predict(row, model).item()
print('Predicted: %.3f (class=%d)' % (yhat, round(yhat)))

Running the example first reports the number of rows in the train and test datasets, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Summary

In this tutorial, you discovered a step-by-step guide to developing deep learning models in PyTorch.

Specifically, you learned:

The difference between Torch and PyTorch and how to install and confirm PyTorch is working.
The five-step life-cycle of PyTorch models and how to define, fit, and evaluate models.
How to develop PyTorch deep learning models for regression, classification, and predictive modeling tasks.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

130 Responses to PyTorch Tutorial: How to Develop Deep Learning Models with Python

Kashif March 23, 2020 at 5:39 am #

Nice tutorial. May I ask if you are migrating from Keras to Pytorch and any solid grounds for doing so? I know its a research favourite but what else.

- Jason Brownlee March 23, 2020 at 6:16 am #
  
  No plan to migrate to pytorch at this stage, I’m just showing how to get started.
  
  - Hoson March 27, 2020 at 6:40 pm #
    
    Keras API is still looking simpler than Pytorch API, but Tensorflow’s version 2 is relatively more powerful and production-ready than Pytorch. Any reason behind in creating this tutorial or any plan/reason in switching to Pytorch?
    
    I did evaluate Pytorch more than a year ago but did not switch finally. Another one was Julia but also sticking with TF and Keras until now, especially after having version 2 eager mode.
    
    - Jason Brownlee March 28, 2020 at 6:15 am #
      
      Made on request.
      
      Agreed, Keras API is better for beginners by a mile. No plans to switch.
      
Anon March 24, 2020 at 3:10 am #

Sir, I am running a simple binary classification program using a DNN with PyTorch. There is also a discussion going on in the PyTorch forum that has not been solved yet. Can you help out? I will post the link.

Reply
- Jason Brownlee March 24, 2020 at 6:09 am #
  
  See the MLP example above for a “DNN with pytorch”.
  
  Reply
Vince March 24, 2020 at 8:27 am #

Thank you for your tutorial. It is very nice. Do you have a tutorial like this for TensorFlow, or can you recommend one from another source (book, web…)?
Thank you!

Reply
- Jason Brownlee March 24, 2020 at 8:31 am #
  
  Yes, right here:
  https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/
  
  Reply
Vince March 24, 2020 at 9:33 am #

What I meant is: do you have a tutorial like this for TensorFlow only, without Keras?
Furthermore, I really like the “prepare dataset” step in your PyTorch tutorial because it makes it easy to customize my dataset. The TensorFlow tutorial does not do this; it uses a library function instead, which is difficult to customize. Can you explain why you did that?
Thank you!

Reply
- Jason Brownlee March 24, 2020 at 1:43 pm #
  
  Yes, many examples here:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Perhaps start here:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  I used best practices for pytorch in this tutorial – that is why.
  
  Reply
Benjamin March 25, 2020 at 2:51 am #

Great tutorial.

I am hoping you’ll continue this series and cover more examples in Pytorch

Reply
- Jason Brownlee March 25, 2020 at 6:36 am #
  
  Thanks for the suggestion.
  
  No plans at this stage.
  
  Reply
ZCoder March 25, 2020 at 7:28 am #

Jason,

This is an amazing tutorial. Your end-to-end examples with very clear explanations take the prize! Thank you very much!

My favorite tutorial on your site is “How to Use the Keras Functional API for Deep Learning”, especially its part 5 “Multiple Input and Output Models”. That Keras tutorial helped me to develop far more sophisticated models.

While you’ve mentioned that you have no current plans to expand on PyTorch tutorials, it would be fantastic to see a similar PyTorch tutorial covering different multiple input/multiple output cases sometime in the future.

Thanks!

Reply
- Jason Brownlee March 25, 2020 at 11:24 am #
  
  Thanks!
  
  Great suggestion.
  
  Reply
Kyla March 26, 2020 at 5:48 am #

Thanks as always Jason! I was looking forward to a good pytorch tutorial. this nailed it.

Reply
- Jason Brownlee March 26, 2020 at 8:02 am #
  
  Thanks!
  
  Reply
Abraham March 26, 2020 at 7:01 pm #

Hi Jason,
In my recent experience, PyTorch may offer more flexibility than Keras for deep learning problems, i.e. you can get deeper insight into what the model does and improve your coding skills along the way. We look forward to new PyTorch tutorials.

Reply
- Jason Brownlee March 27, 2020 at 6:08 am #
  
  Thanks.
  
  Reply
Ashutosh March 27, 2020 at 5:58 am #

This is an excellent tutorial. Thanks, Jason for sharing.

Reply
- Jason Brownlee March 27, 2020 at 6:21 am #
  
  Thank you!
  
  Reply

Anthony The Koala March 27, 2020 at 1:46 pm #

Dear Dr Jason,
I thought I’d share this with you.
For those having difficulties installing PyTorch, I present another way of installing it.

Even when installing PyTorch with pip (in its CUDA or non-CUDA variations) seems to go fine, you may get errors during installation, or DLL errors when trying to import it from within Python.

The following installation was straightforward and produced no errors.

If you are having trouble with the PyTorch installation, particularly if you get errors during the installation, try the following:

Uninstall all pytorch versions including pytorch packages with *.whl
Go to https://pytorch.org/get-started/locally/
Select the stable version, e.g. 1.4
Select OS: Linux, Mac, Win
Select source: conda, pip, libtorch, source
Select language: python, c++/java
Select cuda version: 9.1, 10.1 or none (relying on cpu)
For me: I selected stable, Win, pip, python, none.
As a result I copied the following for my particular choice:
pip install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
Opened a command window and pasted the above
Once installed, opened python and did this:
import torch
torch.__version__
result: '1.4.0+cpu'

Thank you,
Anthony of Sydney

Jason Brownlee March 28, 2020 at 6:09 am #

Thanks for sharing!

Reply
- Sean benhur November 2, 2020 at 12:39 am #
  
  This is an excellent tutorial. Please make more on PyTorch.
  
  Reply
  - Jason Brownlee November 2, 2020 at 6:41 am #
    
    Thanks!
    
    Reply

asko April 2, 2020 at 1:17 am #

Thank you very much,

I still do not understand why we extend Dataset when we create a custom dataset class, I know Dataset is an abstract class but what will happen if we will not extend it ??

Thank you

Reply
- Jason Brownlee April 2, 2020 at 6:00 am #
  
  This is the convention in the PyTorch library.
  
  What happens if you don’t follow it? Probably your code won’t work, because you are not meeting the expectations of the library.
  
  Reply
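
On the question of why a custom dataset extends Dataset: a minimal sketch, assuming the key expectation is that the object is sized and indexable, i.e. that it implements __len__ and __getitem__. The ListDataset class and the iterate_batches() helper are hypothetical names used for illustration; the helper is a toy stand-in and not PyTorch's DataLoader.

```python
class ListDataset:
    """Minimal dataset-like object: it is sized and indexable."""

    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        # number of rows in the dataset
        return len(self.X)

    def __getitem__(self, idx):
        # one row: [inputs, target]
        return [self.X[idx], self.y[idx]]


def iterate_batches(dataset, batch_size):
    """Toy stand-in for a data loader: relies only on len() and indexing."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        rows = [dataset[i] for i in range(start, stop)]
        inputs = [r[0] for r in rows]
        targets = [r[1] for r in rows]
        yield inputs, targets


ds = ListDataset([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 1, 0])
batches = list(iterate_batches(ds, batch_size=2))
for inputs, targets in batches:
    print(inputs, targets)
```

Extending Dataset documents that the class follows this contract, which is what the rest of the library is written against.
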
Sushant Gautam April 4, 2020 at 3:55 am #

This is a very detailed tutorial. Your tutorials always help me to learn more. Thanks, Jason, for sharing.

Reply
- Jason Brownlee April 4, 2020 at 6:26 am #
  
  I’m happy it helps!
  
  Reply
Ando Ki April 27, 2020 at 12:55 pm #

It is a nice tutorial and thank you.
Could you enlighten me on how to prepare a new image to feed to ‘forward(x)’?
Thanks again.
Ando

Reply
- Ando Ki April 27, 2020 at 10:37 pm #
  
  The previous question is about MNIST: how do I prepare an input image from a PNG or JPG file to feed to forward(x)?
  
  Reply
  - Jason Brownlee April 28, 2020 at 6:45 am #
    
    Load the image as per normal, then scale pixels/resize in an identical manner as the training dataset.
    
    Reply
- Jason Brownlee April 28, 2020 at 6:39 am #
  
  You’re welcome.
  
  New images must be prepared in an identical way as the training data.
  
  Reply
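
To make “prepared in an identical way” concrete: a minimal sketch, assuming the training images were grayscale MNIST digits scaled from 0-255 to 0-1 and then standardized. The mean 0.1307 and standard deviation 0.3081 are commonly quoted MNIST statistics and are assumptions here, as is the helper name prepare_pixels(). Plain nested lists stand in for the tensor a real pipeline would build.

```python
def prepare_pixels(pixels, mean=0.1307, std=0.3081):
    """Scale 0-255 grayscale pixel rows to [0, 1], then standardize.

    The mean and std defaults are assumed values; use whatever statistics
    were applied to the training data.
    """
    return [[(p / 255.0 - mean) / std for p in row] for row in pixels]


# tiny stand-in for a 28x28 grayscale image loaded from a PNG or JPG file
raw = [[0, 128], [255, 64]]
prepared = prepare_pixels(raw)
for row in prepared:
    print(['%.3f' % v for v in row])
```

With torchvision, the same two steps correspond to the ToTensor and Normalize transforms; the prepared image also needs a batch dimension before it is passed to the model.
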
Xu Zhang April 29, 2020 at 8:41 am #

Thank you so much for your great post.
Recently, more and more new models are being written in PyTorch. However, I am familiar with Keras. Do you have any ideas on how to convert PyTorch models to Keras models? Many thanks.

Reply
- Jason Brownlee April 29, 2020 at 12:05 pm #
  
  You’re welcome.
  
  No, not offhand, sorry.
  
  Reply
Akshay Tiwari May 10, 2020 at 12:31 am #

As always, a very useful tutorial, Jason. Thanks.
Am I right in assuming that PyTorch is more useful for people who want complete control over their model, i.e. researchers? Having said that, Keras does almost everything I need.

Reply
- Jason Brownlee May 10, 2020 at 6:12 am #
  
  You’re welcome.
  
  Perhaps. Both tensorflow and pytorch give complete control, but for those interested in control, pytorch appears more popular – e.g. academics/researchers developing new methods rather than engineers solving problems.
  
  Just my observation, not a truth.
  
  Reply
Tuan May 14, 2020 at 12:35 am #

Thank you for a great post. I am just curious: what do the numbers that go with MLP mean? For example MLP(13) and MLP(1). I see that you use different numbers in different examples.

Reply
- Jason Brownlee May 14, 2020 at 5:52 am #
  
  The number of inputs to the model.
  
  Reply
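
The number passed to MLP() can be read straight off the data. A minimal sketch, assuming a CSV file with one column per input variable and the class label in the last column; the two rows below are from the iris dataset and are used purely for illustration.

```python
import csv
import io

# two sample rows: four input columns followed by the class label
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"

rows = list(csv.reader(io.StringIO(sample)))
n_inputs = len(rows[0]) - 1  # every column except the label
print('Number of inputs: %d' % n_inputs)
```

For a dataset with 13 input columns the same count gives 13, which is the value passed as MLP(13).
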
Tenchu May 23, 2020 at 2:06 am #

Thank you so much… wonderful post. I learned a lot

Reply
- Jason Brownlee May 23, 2020 at 6:25 am #
  
  Thanks!
  
  Reply
Hoang May 26, 2020 at 12:54 pm #

Hello, I ran the regression code on my side, but I noticed the output is always the same for any sets of inputs. Did anyone flag this bug yet?

Ex: Try running
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
yhat = predict(row, model)
print('Predicted: %.3f' % yhat)

row = [1,30,2.310,0,2,6.5750,80,4.0900,1,296.0,20,500,4.98]
yhat = predict(row, model)
print('Predicted: %.3f' % yhat)

They always are the same answer…

Reply
- Jason Brownlee May 26, 2020 at 1:23 pm #
  
  Perhaps double check your code does not have a bug after you made your modifications.
  
  Reply
  - Hoang May 26, 2020 at 9:13 pm #
    
    Actually, I took the code as is from your site and just ran it. If I expand the predictions to:
    row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
    yhat = predict(row, model)
    print('Predicted: %.3f' % yhat)  # should be near 24.00
    
    row2 = [0.17505,0.00,5.960,0,0.4990,5.9660,30.20,3.8473,5,279.0,19.20,393.43,10.13]
    yhat2 = predict(row2, model)
    print('Predicted: %.3f' % yhat2)  # should be near 24.70
    
    row3 = [2.77974,0.00,19.580,0,0.8710,4.9030,97.80,1.3459,5,403.0,14.70,396.90,29.29]
    yhat3 = predict(row3, model)
    print('Predicted: %.3f' % yhat3)  # should be near 11.80
    
    row4 = [0.07503,33.00,2.180,0,0.4720,7.4200,71.90,3.0992,7,222.0,18.40,396.90,6.47]
    yhat4 = predict(row4, model)
    print('Predicted: %.3f' % yhat4)  # should be near 33.40
    
    I get:
    
    339 167
    MSE: 85.617, RMSE: 9.253
    Predicted: 22.318
    Predicted: 22.318
    Predicted: 22.318
    Predicted: 22.318
    
    Can you retest it on your side see if you get the same results? Again, I ran the same code with the same dataset of housing prices.
    
    Thanks!
    
    Reply
    - Carlos May 27, 2020 at 5:20 am #
      
      The SGD optimizer was getting stuck at a local minimum, changing it for the Adam optimizer works a lot better and you’ll see a noticeable response to different inputs.
      
      Reply
      - Hoang May 27, 2020 at 5:39 am #
        
        Thank you so much Carlos! Check out his blog here: http://cgoliver.com/blog
      - Jason Brownlee May 27, 2020 at 8:04 am #
        
        Great blog!
      - Jason Brownlee May 27, 2020 at 8:03 am #
        
        Well done!
    - Jason Brownlee May 27, 2020 at 7:49 am #
      
      Yes, I see the same problem.
      
      Perhaps the model has over fit the training data, perhaps try changing the model architecture or learning hyperparameters?
      
      E.g. smaller learning rate or fewer epochs.
      
      Reply
      - Carlos May 27, 2020 at 9:24 am #
        
        Thanks!
        
        But wouldn’t the issue be under-fitting? I would expect very high variance at inference time for a model that over-fit, but we see the opposite.
      - Jason Brownlee May 27, 2020 at 1:29 pm #
        
        It could be either over or under fit. Both could give similar behavior.
    - X Yang June 8, 2022 at 5:37 am #
      
      I just tried this using the provided MLP regression code/data and got the same result as Hoang: All 4 test cases outputting the same value. If I plot all the predicted vs. actual in the test set, every output for the prediction is the same value.
      
      I did the following modification to sanity check: I removed layers 2 and 3, and I just kept a single linear layer with the xavier uniform weight initialization, and it seems like I’m able to get an RMSE of 6.5 and with predictions on par with the actual for Hoang’s 4 cases. Plotting the predicted and actual in the test set also gave similar curves.
      
      My hypothesis is that the Boston housing data is pretty much linear and that adding multiple layers overfits the data.
      Could someone else corroborate this test and hypothesis?
      
      Reply
Hoang Ngo May 27, 2020 at 2:02 pm #

BTW Jason, great site you got going here, I’m a software engineer too and starting to learn ML and I love your content. Keep up the great work, you’re doing a blessing to all devs wanting to get in the ML train ride!

Reply
- Jason Brownlee May 28, 2020 at 6:07 am #
  
  Thanks!
  
  Reply
Bruce D. Sidlinger June 6, 2020 at 7:26 am #

>>> # check pytorch version
>>> import torch
>>> print(torch.__version__)
1.5.0
>>>
(base) MacBookAir81-2:~ sidlinger$ nano bindemo.py
(base) MacBookAir81-2:~ sidlinger$ python bindemo.py
235 116
Accuracy: 0.931
Predicted: 0.999 (class=1)
(base) MacBookAir81-2:~ sidlinger$

Reply
- Jason Brownlee June 6, 2020 at 8:00 am #
  
  Great work!
  
  Reply
Hemanth V June 21, 2020 at 11:07 pm #

Jason,

I plotted actuals vs. predictions for the regression example and see a constant line. I tried changing the learning rate, epochs, ReLU, and the number of hidden layers, but it did not help. This is true for other datasets as well, not just the Boston Housing dataset. It looks like something else needs to be changed in the program. Any ideas?

Thanks
Hemanth

Reply
- Jason Brownlee June 22, 2020 at 6:14 am #
  
  Sorry to hear that you are having trouble, this may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Hemanth V June 22, 2020 at 3:40 pm #
    
    Jason,
    
    The code is unmodified; I just added a plot to the evaluate_model function, please see below. My torch version is 1.5.1, running on Anaconda on Windows 10. All code is run from the command line. Could you check on your system whether you also see a constant line for the predictions?
    
    Thanks
    Hemanth
    
    def evaluate_model(test_dl, model):
        predictions, actuals = list(), list()
        for i, (inputs, targets) in enumerate(test_dl):
            # evaluate the model on the test set
            yhat = model(inputs)
            # retrieve numpy array
            yhat = yhat.detach().numpy()
            actual = targets.numpy()
            actual = actual.reshape((len(actual), 1))
            # store
            predictions.append(yhat)
            actuals.append(actual)
        predictions, actuals = vstack(predictions), vstack(actuals)
        # calculate mse
        mse = mean_squared_error(actuals, predictions)
        pyplot.plot(actuals)
        pyplot.plot(predictions, color='red')
        pyplot.show()
        return mse
    
    Reply
    - Jason Brownlee June 23, 2020 at 6:15 am #
      
      It is possible that the model is not skilful and requires further tuning on the dataset.
      
      Reply
      - Hemanth V June 23, 2020 at 11:37 am #
        
        Can you suggest how to do further training
      - Jason Brownlee June 23, 2020 at 1:28 pm #
        
        Here are some suggestions for improving neural nets:
        https://machinelearningmastery.com/improve-deep-learning-performance/
SJ July 10, 2020 at 4:07 am #

Thank you Jason,
I tried to solve the NYC taxi problem
https://www.kaggle.com/c/new-york-city-taxi-fare-prediction

I was able to crack it even though it had categorical variables. However, I couldn’t create a “prepare data” function, as the model requires many inputs:

class TabularModel(nn.Module):
    def __init__(self, embedding_size, num_numerical_cols, output_size, layers, p=0.4):

Can you please suggest how to write the “prepare data” function when there are both categorical and numerical values?

Sample code for your reference – https://stackabuse.com/introduction-to-pytorch-for-classification/

Reply
- Jason Brownlee July 10, 2020 at 6:07 am #
  
  Sorry, I don’t have data preparation tutorials for pytorch, I cannot give you good advice off the cuff.
  
  Reply
Bunga July 12, 2020 at 9:27 am #

Hi
Thanks for the great tutorial.

I copied the MLP multiclass classification example and ran it in Spyder. However, I got the following error message.

runfile(‘E:/Exercises/master_MLPMultiClassIris.py’, wdir=’E:/Exercises’)
100 50
Traceback (most recent call last):

File “E:\Exercises\master_MLPMultiClassIris.py”, line 158, in
train_model(train_dl, model)

File “E:\Exercises\master_MLPMultiClassIris.py”, line 113, in train_model
loss = criterion(yhat, targets)

File “C:\Anaconda3\lib\site-packages\torch\nn\modules\module.py”, line 477, in __call__
result = self.forward(*input, **kwargs)

File “C:\Anaconda3\lib\site-packages\torch\nn\modules\loss.py”, line 862, in forward
ignore_index=self.ignore_index, reduction=self.reduction)

File “C:\Anaconda3\lib\site-packages\torch\nn\functional.py”, line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)

File “C:\Anaconda3\lib\site-packages\torch\nn\functional.py”, line 1407, in nll_loss
return torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: Expected object of type torch.LongTensor but found type torch.IntTensor for argument #2 ‘target’

Reply
- Bunga July 12, 2020 at 9:29 am #
  
  Sorry, I accidentally pressed enter before finishing my message. Would you mind suggesting what I can do to resolve this problem?
  
  Reply
- Jason Brownlee July 12, 2020 at 11:29 am #
  
  I’m sorry to hear that you’re having trouble, I have some suggestions here that might help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Bunga July 14, 2020 at 7:48 pm #
    
    Thanks a lot, Jason…
    
    I am truly sorry for jumping straight into questions. The previous problem has disappeared now, and I am dealing with another problem in the same program.
    
    I will recheck it again though.
    
    Reply
    - Jason Brownlee July 15, 2020 at 8:15 am #
      
      No problem.
      
      Reply
      - Waiming July 20, 2020 at 6:47 pm #
        
        Jason,
        
        I got the same error as Bunga, only for Multi-class code. Can you help?
        
        RuntimeError Traceback (most recent call last)
        in
        149 model = MLP(4)
        150 # train the model
        –> 151 train_model(train_dl, model)
        152 # evaluate the model
        153 acc = evaluate_model(test_dl, model)
        
        in train_model(train_dl, model)
        104 yhat = model(inputs)
        105 # calculate loss
        –> 106 loss = criterion(yhat, targets)
        107 # credit assignment
        108 loss.backward()
      - Jason Brownlee July 21, 2020 at 5:59 am #
        
        Sorry to hear that, this will help:
        https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    - Waiming July 20, 2020 at 5:32 pm #
      
      Hi Bunga,
      
      I ran into the same problem as you.
      How did you solve the first issue?
      
      Reply
- Waiming July 20, 2020 at 7:04 pm #
  
  Bunga,
  
  How did you solve the runtime error? Can you share the fix with me? I got the same error.
  
  Reply
  - Edy March 1, 2021 at 8:20 pm #
    
    Hi Bunga and Waiming,
    
    The problem is that the targets variable must be of type long, so change
    loss = criterion(yhat, targets) to loss = criterion(yhat, targets.long())
    
    # train the model
    def train_model(train_dl, model):
        # define the optimization
        criterion = CrossEntropyLoss()
        optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
        # enumerate epochs
        for epoch in range(500):
            # enumerate mini batches
            for i, (inputs, targets) in enumerate(train_dl):
                # clear the gradients
                optimizer.zero_grad()
                # compute the model output
                yhat = model(inputs)
                # calculate loss and change targets to long()
                loss = criterion(yhat, targets.long())
                # credit assignment
                loss.backward()
                # update model weights
                optimizer.step()
    
    Reply
    - Gaurav August 26, 2021 at 11:58 pm #
      
      Changing the data type worked. Thanks.
      
      Reply
L. Wolf July 24, 2020 at 3:46 am #

Nice tutorial, Jason!
Do you have an ebook on PyTorch?

Reply
- Jason Brownlee July 24, 2020 at 6:36 am #
  
  Not at this stage.
  
  Reply
Haider August 24, 2020 at 1:00 am #

Greetings, I hope you are doing great. I am getting this error while trying to run your code for the MLP multiclass classification problem:

runfile(‘C:/Users/haide/OneDrive/바탕 화면/temp.py’, wdir=’C:/Users/haide/OneDrive/바탕 화면’)
100 50
Traceback (most recent call last):

File “C:\Users\haide\OneDrive\바탕 화면\temp.py”, line 150, in
train_model(train_dl, model)

File “C:\Users\haide\OneDrive\바탕 화면\temp.py”, line 105, in train_model
loss = criterion(yhat, targets)

File “C:\Users\haide\anaconda3\lib\site-packages\torch\nn\modules\module.py”, line 532, in __call__
result = self.forward(*input, **kwargs)

File “C:\Users\haide\anaconda3\lib\site-packages\torch\nn\modules\loss.py”, line 915, in forward
return F.cross_entropy(input, target, weight=self.weight,

File “C:\Users\haide\anaconda3\lib\site-packages\torch\nn\functional.py”, line 2021, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)

File “C:\Users\haide\anaconda3\lib\site-packages\torch\nn\functional.py”, line 1838, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: expected scalar type Long but found Int

Any idea how to remove this error, please?

Reply
- Jason Brownlee August 24, 2020 at 6:28 am #
  
  I’m sorry to hear that.
  
  This may have some helpful ideas:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
German Mandrini August 26, 2020 at 4:25 am #

Great tutorial, very clear. I run the MLP for Regression and when I change the predictor values of the “row” I get always the same that. Is it possible that the model is predicting only one value?
Thanks

Reply
- German Mandrini August 26, 2020 at 4:27 am #
  
  *the same “yhat”. The dictionary changed it.
  
  Reply
- Jason Brownlee August 26, 2020 at 6:54 am #
  
  It is possible, in which case perhaps the model requires further tuning.
  
  Reply
JC September 6, 2020 at 5:41 pm #

Hi Jason,

I am soon enrolling in a course on DL at work and am doing some exercises in advance to better understand the logic of, e.g., perceptrons. Running your MLP for multiclass classification, I get the error shown below (just from copying and pasting into my IDE). What am I doing wrong? I get the 100 and 50, but the accuracy is never reported because the error occurs first.

Traceback (most recent call last):
File “C:/Users/jcst/PycharmProjects/Deep_Learning_Projects/MLP_for_Multiclass_Classification.py”, line 166, in
train_model(train_dl, model)
File “C:/Users/jcst/PycharmProjects/Deep_Learning_Projects/MLP_for_Multiclass_Classification.py”, line 121, in train_model
loss = criterion(yhat, targets)
File “C:\Users\jcst\PycharmProjects\Deep_Learning_Projects\venv\lib\site-packages\torch\nn\modules\module.py”, line 722, in _call_impl
result = self.forward(*input, **kwargs)
File “C:\Users\jcst\PycharmProjects\Deep_Learning_Projects\venv\lib\site-packages\torch\nn\modules\loss.py”, line 948, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File “C:\Users\jcst\PycharmProjects\Deep_Learning_Projects\venv\lib\site-packages\torch\nn\functional.py”, line 2422, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File “C:\Users\jcst\PycharmProjects\Deep_Learning_Projects\venv\lib\site-packages\torch\nn\functional.py”, line 2218, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: expected scalar type Long but found Int

Reply
- Jason Brownlee September 7, 2020 at 8:28 am #
  
  Sorry to hear that, some of these suggestions may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Let me know how you go.
  
  Reply
- NN May 8, 2021 at 2:23 am #
  
  Hi JC,
  You might want to try this on Google Colab. I got the same message (“RuntimeError: expected scalar type Long but found Int”) when I tried it in my Jupyter notebook, but it worked on Google Colab.
  
  Reply
Joseph Kim September 11, 2020 at 11:35 pm #

I am always thankful for your valuable articles like this!

Just one minor comment for the loss function:
The cross entropy loss (i.e., torch.nn.CrossEntropyLoss) in PyTorch combines nn.LogSoftmax() and nn.NLLLoss() in one single class, which means that you don’t need the softmax function in the case of multiclass classification problem (e.g., see the comments from ‘Oli’, ‘god_sp33d’, and so on). In fact, many PyTorch tutorials on multiclass classification do not use the softmax function for the said reason.

Reply
- Jason Brownlee September 12, 2020 at 6:15 am #
  
  Thanks!
  
  Reply
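
The point about the combined loss can be checked numerically without any framework. A minimal sketch in plain Python, assuming a single sample with three class scores; the function names are hypothetical and only mirror the log-softmax followed by negative log-likelihood computation described above.

```python
import math


def log_softmax(logits):
    """Numerically stable log of the softmax of a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def softmax(logits):
    return [math.exp(v) for v in log_softmax(logits)]


def cross_entropy_from_logits(logits, target):
    """Log-softmax followed by negative log-likelihood for one sample."""
    return -log_softmax(logits)[target]


logits = [2.0, 0.5, -1.0]  # raw model outputs for one sample
target = 0                 # index of the true class

# loss computed on raw outputs, as the combined loss expects
loss_once = cross_entropy_from_logits(logits, target)
# loss computed after the model has already applied softmax
loss_twice = cross_entropy_from_logits(softmax(logits), target)

print('raw outputs:   %.4f' % loss_once)
print('after softmax: %.4f' % loss_twice)
```

Applying softmax in the model and again inside the loss squashes the scores into the range 0 to 1 before they are normalized a second time, so the two values differ.
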
George November 9, 2020 at 5:14 pm #

Hi Jason,
How do I tune parameters such as the number of epochs, the activation function, etc. in PyTorch?
Thanks

Reply
- Jason Brownlee November 10, 2020 at 6:36 am #
  
  Perhaps start with a simple for-loop over the values you want to compare.
  
  Reply
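
The “simple for-loop” suggestion might look like the following. This is a sketch under stated assumptions: fit_and_score() is a hypothetical function that here trains a one-weight model on toy data, where in practice it would build, train, and evaluate the PyTorch model for the given settings. Learning rate and epochs are used as the example hyperparameters; an activation function could be looped over in the same way.

```python
from itertools import product

# toy data: y = 2x
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [2.0 * x for x in X]


def fit_and_score(lr, n_epochs):
    """Hypothetical stand-in: train with the given settings, return the MSE."""
    w = 0.0
    for _ in range(n_epochs):
        for x, y in zip(X, Y):
            w -= lr * 2.0 * (w * x - y) * x  # gradient step on squared error
    return sum((w * x - y) ** 2 for x, y in zip(X, Y)) / len(X)


# values to compare for each hyperparameter
grid = {'lr': [0.001, 0.01], 'n_epochs': [5, 50]}

results = []
for lr, n_epochs in product(grid['lr'], grid['n_epochs']):
    mse = fit_and_score(lr, n_epochs)
    results.append((mse, lr, n_epochs))
    print('lr=%.3f, epochs=%d, MSE=%.6f' % (lr, n_epochs, mse))

# lowest error wins
best_mse, best_lr, best_epochs = min(results)
print('Best: lr=%.3f, epochs=%d' % (best_lr, best_epochs))
```

Scores should be computed on data not used for training, and repeated runs per configuration help separate real differences from noise.
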
Nur December 27, 2020 at 10:58 pm #

This is a very detailed tutorial. Your tutorials always help me to learn more. I am having trouble with something and have shared it on github. Can you review it?

https://stackoverflow.com/questions/65459540/indexerror-dimension-out-of-range-expected-to-be-in-range-of-2-1-but-got?noredirect=1#comment115731790_65459540

Reply
- Nur December 28, 2020 at 12:13 am #
  
  Data:
  
  https://stackoverflow.com/questions/65466114/transpose-on-pytorch-indexerror-dimension-out-of-range-expected-to-be-in-ran
  
  Reply
- Jason Brownlee December 28, 2020 at 5:58 am #
  
  Thanks!
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-comment-on-my-stackoverflow-question
  
  Reply
Robnald Stauffer January 12, 2021 at 8:45 am #

Jason:
I am learning PyTorch through an on-line academic course and found your tutorial immensely helpful (as are all of your books). Please consider expanding your library with one or more Deep Learning books focused on PyTorch. You will find a very receptive audience.

Reply
- Jason Brownlee January 12, 2021 at 10:33 am #
  
  Thanks, I’m happy to hear that.
  
  Great suggestion!
  
  Reply
Tom February 3, 2021 at 10:24 pm #

Hi Jason,

Please can you explain this line: for i, (inputs, targets) in enumerate(train_dl).

Regards,
Tom

Reply
- Jason Brownlee February 4, 2021 at 6:18 am #
  
  Sure. It is a for loop: we enumerate “train_dl”, and on each iteration we get a temporary variable i holding the iteration number and a tuple with the inputs and targets retrieved as the i’th item from “train_dl”.
  
  I hope that helps.
  
  Reply
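
The same loop can be tried on ordinary Python data. In this sketch a list of (inputs, targets) pairs stands in for the data loader, which is an assumption made so the example runs without PyTorch; the unpacking pattern is identical.

```python
# stand-in for a data loader: an iterable that yields (inputs, targets) pairs
train_dl = [
    ([[0.1, 0.2], [0.3, 0.4]], [0, 1]),  # batch 0: two rows and their targets
    ([[0.5, 0.6], [0.7, 0.8]], [1, 0]),  # batch 1
]

seen = []
for i, (inputs, targets) in enumerate(train_dl):
    # i is the batch number; the pair is unpacked into inputs and targets
    print('batch %d: %d rows, targets=%s' % (i, len(inputs), targets))
    seen.append((i, len(inputs), targets))
```

enumerate() supplies the counter, and the parentheses around inputs and targets unpack each pair in the same statement.
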
Persian February 13, 2021 at 3:38 am #

Hi Jason,

This is a really great job.
Very well organized and clear.
Helped me a lot.

Many Thanks

Reply
- Jason Brownlee February 13, 2021 at 6:10 am #
  
  Thank you!
  
  Reply
rosasha March 21, 2021 at 5:05 pm #

Extremely important information.

Reply
- Jason Brownlee March 22, 2021 at 5:28 am #
  
  Thanks.
  
  Reply
Peter April 11, 2021 at 3:23 am #

Thanks Jason, you seem to have a tutorial on everything I am ever looking for, and each of them is 10x better than anything else out there.

I’ll definitely be donating once I get a job, I really owe you.

Reply
- Jason Brownlee April 11, 2021 at 4:55 am #
  
  Thanks!
  
  Reply
Mac May 16, 2021 at 2:41 am #

Jason, I guess those who want to work with libraries like transformers should eventually move from keras to pytorch?

Or is it possible to use the TensorFlow transformers libraries in Keras?

Reply
- Jason Brownlee May 16, 2021 at 5:35 am #
  
  I believe the TF library provides transformers. I hope to write about the topic in the future.
  
  Reply
Vijaya Yadav July 19, 2021 at 3:36 pm #

Hello Jason, will you provide me some PyTorch functions that will help me in machine learning projects?

Reply
- Jason Brownlee July 20, 2021 at 5:33 am #
  
  Perhaps in the future.
  
  Reply
Human August 17, 2021 at 2:00 am #

Why was the reshaping of the target values (y) done in CSVDataset() for binary classification but not for multiclass classification?

Reply
- Adrian Tam August 17, 2021 at 8:07 am #
  
  Because the model is different. The LabelEncoder always outputs a vector of values, but if the model requires a matrix, you need to reshape a vector into a Nx1 matrix.
  
  Reply
Human August 17, 2021 at 5:32 pm #

How do I know whether the model requires the target values as a matrix or a vector?
Does it depend on the loss function or on the outputs of the output layer?

Reply
- Adrian Tam August 18, 2021 at 2:29 am #
  
  That depends on how the model is designed and on the library used. It is best to consult the documentation of the functions you use. But I believe a matrix is more likely.
  
  Reply
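
As a rough illustration of the vector versus matrix distinction discussed above, assume the binary model ends in a single sigmoid unit trained with BCELoss, which expects targets shaped like the predictions, and the multiclass model is trained with CrossEntropyLoss, which expects one class index per row. Plain Python lists stand in for tensors here, and shape() is a hypothetical helper.

```python
def shape(nested):
    """Return the shape of a non-empty rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


labels = [0, 1, 1, 0]  # what a label encoder returns: a flat vector

# Binary case: one output per row gives predictions of shape (N, 1),
# so the targets are reshaped into an N x 1 column of floats to match.
binary_targets = [[float(v)] for v in labels]

# Multiclass case: the loss takes one integer class index per row,
# so the flat vector of length N is used as is.
multiclass_targets = [int(v) for v in labels]

print(shape(binary_targets))
print(shape(multiclass_targets))
```

In both cases the deciding factor is the target shape documented for the loss function in use.
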
Ale September 27, 2021 at 4:36 pm #

Nice,
I want to implement a model on a GPU and also detect persons in a video.
Can you help me with how I can do that?

Reply
- Adrian Tam September 28, 2021 at 9:35 am #
  
  PyTorch uses CUDA for GPU support. Please see examples here: https://pytorch.org/docs/stable/notes/cuda.html
  
  Reply
Winry October 12, 2021 at 2:19 am #

What is the “path” argument in __init__(self, path)?

I’m lost on where you’re actually importing the data. I see the read_csv line, but how does python know what data to import here when all you have written is “path”?

I’m wondering because I’m applying this tutorial to my own data, and when I import the data with read_csv using the actual file location (C:/Users/…), the argument “path” in __init__(self, path) is unused and I’m confused as to why it’s even there.

I also tried putting the actual file location in place of the argument “path” in __init__(self, path), but it underlines it and says “formal argument expected.”

Thanks!

Reply
Winry October 12, 2021 at 3:51 am #

Oh wow — nevermind, I did not see the path definition at the bottom of the script. I’m used to MATLAB where everything needs to be defined in a certain order.

Thanks!

Reply
- Adrian Tam October 13, 2021 at 7:29 am #
  
  Good that you found it!
  
  Reply
menahil javeed April 26, 2022 at 6:30 am #

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x0 and 34x10)
Sir, could you kindly tell me how to solve this error? Thanks.

Reply
Nawel Ben Chaabane June 7, 2022 at 10:21 pm #

Hello,
Thanks for this great post.
I don’t understand how the flatten step was performed:
# flatten
X = X.view(-1, 4*4*50)
Thanks.
Best

Reply
- James Carmichael June 8, 2022 at 4:10 am #
  
  Hi Nawel…Thank you for the feedback!
  
  The following should hopefully clarify:
  
  https://wandb.ai/ayush-thakur/dl-question-bank/reports/An-Introduction-To-The-PyTorch-View-Function–VmlldzoyMDM0Nzg
  
  Reply
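
The size in the view() call can also be worked out by hand. The sketch below assumes a 28x28 input, two 3x3 convolutions with no padding and stride 1, each followed by 2x2 max pooling, and 32 output channels; these layer settings are assumptions for illustration and should be replaced with those of the actual model. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Output width/height of a pooling layer (stride defaults to kernel)."""
    stride = stride or kernel
    return (size - kernel) // stride + 1


size = 28                 # assumed input width and height
size = conv_out(size, 3)  # first 3x3 convolution
size = pool_out(size, 2)  # first 2x2 max pool
size = conv_out(size, 3)  # second 3x3 convolution
size = pool_out(size, 2)  # second 2x2 max pool

channels = 32             # assumed number of feature maps
flat = channels * size * size
print('feature map: %dx%d, flattened length: %d' % (size, size, flat))
print('4*4*50 = %d' % (4 * 4 * 50))
```

Under these assumptions each image becomes a vector of 800 values, the same number as 4*4*50, and the -1 in view(-1, 4*4*50) tells PyTorch to infer the batch dimension.
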
Alex June 21, 2022 at 4:25 am #

Good tutorial, however for some reason my Regression model is returning the following

Accuracy: 0.000
Single row prediction: 1.000
MSE: 0.000, RMSE: 0.000

when prompted to provide a single row prediction. Any help?

Thanks!

Reply
- James Carmichael June 21, 2022 at 9:43 am #
  
  Hi Alex…The following resources may help add clarity:
  
  https://stackoverflow.com/questions/58277179/accuracy-is-zero-all-the-time
  
  You may be working on a regression problem and achieve zero prediction errors.
  
  Alternately, you may be working on a classification problem and achieve 100% accuracy.
  
  This is unusual and there are many possible reasons for this, including:
  
  You are evaluating model performance on the training set by accident.
  Your hold out dataset (test or validation) is too small or unrepresentative.
  You have introduced a bug into your code and it is doing something different from what you expect.
  Your prediction problem is easy or trivial and may not require machine learning.
  
  The most common reason is that your hold out dataset is too small or not representative of the broader problem.
  
  This can be addressed by:
  
  Using k-fold cross-validation to estimate model performance instead of a train/test split.
  Gather more data.
  Use a different split of data for train and test, such as 50/50.
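
The k-fold idea mentioned above can be sketched without any dependencies. The helper name kfold_indices and the row count are hypothetical and only illustrate the splitting logic; in practice scikit-learn's KFold class does the same job.

```python
import random

def kfold_indices(n_rows, k, seed=1):
    """Hypothetical helper: yield (train, test) index lists for k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one row.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

splits = list(kfold_indices(n_rows=10, k=5))
for train_idx, test_idx in splits:
    # Train a fresh model on train_idx and score it on test_idx here,
    # then average the k scores to estimate performance.
    print(len(train_idx), len(test_idx))
```

Every row is used for testing exactly once, so the averaged score depends much less on one lucky or unlucky split.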
  
Winry July 15, 2022 at 4:12 am #

Hello,

I am receiving an error in my regression example when calling the prepare_data method. The issue is at:

train_dl = DataLoader(train, batch_size=32, shuffle=True)

The error is:

TypeError: Instance and class checks can only be used with @runtime_checkable protocols

Thank you

Marilyn Gambone February 20, 2023 at 12:34 am #

What operating system are you using for this tutorial? Windows or Linux? That’s kind of the first thing I was looking for.

- James Carmichael February 20, 2023 at 10:02 am #
  
  Hi Marilyn…We do not promote any particular OS or Python environment. Are you having issues with the code running such that we may help provide suggestions?
  
Hafiz September 7, 2023 at 2:45 am #

Thank you for this. I want to know how deep learning with PyTorch can be used for pit detection.

- James Carmichael September 7, 2023 at 9:15 am #
  
  Hi Hafiz…Please elaborate on what is meant by “pit detection” so that we can better assist you.
  
Liaqat January 21, 2024 at 10:08 am #

Hi Jason,

Do you have any blog posts on Federated Learning?

- James Carmichael January 21, 2024 at 10:29 am #
  
  Hi Liaqat…We do not currently have content devoted to that topic. The following resource may be of interest:
  
  https://www.analyticsvidhya.com/blog/2021/05/federated-learning-a-beginners-guide/
  
Nik V. September 26, 2024 at 6:32 am #

Hello Jason,
First, thank you so much for this great post! I love your posts; they are a great help to me. This is, however, my first time writing to you.

I have a question w.r.t. this line: model = MLP(34). I understand that 34 is the input for the first layer, right? But how did you determine that it should be 34? I did not understand that.

Furthermore, when I tried using other numbers, I got an error. For example, doing MLP(33), I get the following error: “RuntimeError: mat1 and mat2 shapes cannot be multiplied (32×34 and 33×10)”

- James Carmichael September 26, 2024 at 7:24 am #
  
  Hi Nik…You are very welcome! You’re correct that the 34 in model = MLP(34) represents the number of input features for the first layer of the Multi-Layer Perceptron (MLP). This value is derived directly from the number of features (columns) in your dataset, excluding the target (label) column.
  
  ### Why 34?
  In your code, the dataset is loaded from the CSV file using this line:
  
      df = read_csv(path, header=None)
  
  This loads all the columns in the CSV file, including the target column. The code then separates the features and the target like this:
  
      self.X = df.values[:, :-1]  # Features (all columns except the last one)
      self.y = df.values[:, -1]   # Target (the last column)
  
  – self.X contains all the columns except the last one, meaning these are the features used for the model input.
  – In the ionosphere.csv dataset you’re using, there are 35 columns in total (based on how the data is structured), where 34 are the input features (columns) and the 35th is the label (target). Hence, the input layer of the model expects 34 features, so you need to pass 34 as the argument to the MLP when initializing it: MLP(34).
  
  ### Error with Other Input Sizes
  When you use a different number, like MLP(33), you get the following error:
  
  RuntimeError: mat1 and mat2 shapes cannot be multiplied (32×34 and 33×10)
  
  This error occurs because the shape of the input does not match the number of features the model was built for. Specifically, each batch has shape 32×34 (32 rows, each with 34 features), so the second dimension of the input tensor is 34, but MLP(33) builds a first layer whose weight matrix is 33×10.
  
  In PyTorch, matrix multiplication (which happens during forward propagation) requires that the number of columns in the input tensor matches the number of rows in the weight matrix. If there’s a mismatch, as in your case when you pass 33 instead of 34, the operation fails because the dimensions don’t align.
  
  ### Solution:
  You need to ensure that the input to the MLP matches the number of features in your dataset. Since the dataset you’re using has 34 input features, the input to the first layer should be 34 in the MLP class:
  
      model = MLP(34)
  
  If you change the input feature count, you also need to adjust the dataset to match the new number of features accordingly.
  
  Let me know if you’d like further clarification or have additional questions!
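
The shape rule described above can be reproduced in isolation. This is a minimal sketch in which a random tensor stands in for one batch from the DataLoader (an assumption for illustration) and a single Linear layer stands in for the first layer of the MLP; reading the feature count from the data avoids hard-coding 34.

```python
import torch
from torch.nn import Linear

# Hypothetical stand-in for one batch: 32 rows, 34 feature columns
# (35 CSV columns minus the label column).
X = torch.rand(32, 34)
n_inputs = X.shape[1]  # read the feature count from the data

good = Linear(n_inputs, 10)  # (32x34) times (34x10) is valid
out = good(X)
print(out.shape)  # torch.Size([32, 10])

bad = Linear(33, 10)  # (32x34) times (33x10) is not valid
try:
    bad(X)
    mismatch = None
except RuntimeError as err:
    mismatch = str(err)
print(mismatch)
```

The second print shows the same kind of "shapes cannot be multiplied" message reported in the question.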
  
  - Nik V. October 3, 2024 at 8:20 am #
    
    Thank you so much, James! I sincerely appreciate your detailed answer!
    Thanks a million again!
    
Oliver Ford January 29, 2025 at 3:41 am #

Great tutorial. One note: with the latest NumPy, to avoid a deprecation warning, change the last line to:

print('Predicted: %.3f (class=%d)' % (yhat.item(), yhat.round().item()))
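
A small sketch of why this change helps, assuming the prediction comes back as a one-element NumPy array (the value below is made up for illustration). Recent NumPy versions deprecate implicitly converting an array with one or more dimensions to a Python scalar, which is what the % formatting would otherwise do; .item() makes the conversion explicit.

```python
import numpy as np

# Hypothetical prediction: a 1x1 array, as a single-row forward pass
# followed by .detach().numpy() would typically produce.
yhat = np.array([[0.9987]])

prob = yhat.item()           # explicit array -> Python float
label = yhat.round().item()  # 1.0 after rounding

message = 'Predicted: %.3f (class=%d)' % (prob, label)
print(message)  # Predicted: 0.999 (class=1)
```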

