Understand Model Behavior During Training by Visualizing Metrics

By Adrian Tam on April 8, 2023 in Deep Learning with PyTorch

You can learn a lot about neural networks and deep learning models by observing their performance over time during training. For example, if you see the training accuracy get worse as the training epochs go by, you know you have an issue with the optimization; your learning rate is probably too high. In this post, you will discover how you can review and visualize the performance of PyTorch models over time during training. After completing this post, you will know:

What metrics to collect during training
How to plot the metrics on training and validation datasets from training
How to interpret the plot to tell about the model and training progress

Let’s get started.

Photo by Alison Pang. Some rights reserved.

Overview

This post is in two parts; they are:

Collecting Metrics from a Training Loop
Plotting the Training History

Collecting Metrics from a Training Loop

In deep learning, training a model with the gradient descent algorithm means taking a forward pass to compute the loss metric from the input using the model and a loss function, then a backward pass to compute the gradient from the loss metric, and finally an update step that applies the gradient to the model parameters. While these are the basic steps you must take, you can do a bit more along the way to collect additional information.

When a model is trained correctly, you should expect the loss metric to decrease, since the loss is the objective to optimize. Which loss metric to use depends on the problem.

For regression problems, the closer the model’s prediction is to the actual value, the better. Therefore you want to keep track of the mean square error (MSE), or sometimes the root mean square error (RMSE), mean absolute error (MAE), or mean absolute percentage error (MAPE). Although it is not used as a loss metric, you may also be interested in the maximum error produced by your model.
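As a minimal sketch of the regression metrics named above, the helper below computes them from a pair of tensors. The function name `regression_metrics` and the toy values are made up for illustration and are not part of the tutorial's code.

```python
import torch

def regression_metrics(y_pred, y_true):
    """Return common regression metrics as plain Python floats (hypothetical helper)."""
    err = y_pred - y_true
    mse = torch.mean(err ** 2)
    return {
        "mse": float(mse),
        "rmse": float(torch.sqrt(mse)),
        "mae": float(torch.mean(torch.abs(err))),
        # MAPE is undefined when a target is zero; this sketch assumes none are
        "mape": float(torch.mean(torch.abs(err / y_true))),
        "max_error": float(torch.max(torch.abs(err))),
    }

# Toy values, made up for illustration only
y_true = torch.tensor([[2.0], [4.0], [5.0], [10.0]])
y_pred = torch.tensor([[1.0], [4.0], [7.0], [10.0]])
metrics = regression_metrics(y_pred, y_true)
print(metrics)
```

Note that RMSE and MAE are in the same unit as the target, which is why they are easier to read side by side than MSE.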

For classification problems, the loss metric is usually cross entropy. But the value of cross entropy is not very intuitive. Therefore you may also want to keep track of the prediction accuracy, true positive rate, precision, recall, F1 score, and so on.
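For a binary classifier, these metrics can be derived from the counts of true positives, false positives, and false negatives. The following is only a sketch: the helper name `binary_classification_metrics`, the 0.5 threshold, and the toy logits are assumptions for illustration.

```python
import torch

def binary_classification_metrics(logits, labels, threshold=0.5):
    """Accuracy, precision, recall (true positive rate), and F1 for 0/1 labels."""
    preds = (torch.sigmoid(logits) >= threshold).float()
    tp = float(((preds == 1) & (labels == 1)).sum())  # true positives
    fp = float(((preds == 1) & (labels == 0)).sum())  # false positives
    fn = float(((preds == 0) & (labels == 1)).sum())  # false negatives
    accuracy = float((preds == labels).float().mean())
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy model outputs (logits) and ground-truth labels, made up for illustration
logits = torch.tensor([2.0, -1.0, 0.5, -3.0, 1.5, -0.5])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 1.0])
cls_metrics = binary_classification_metrics(logits, labels)
print(cls_metrics)
```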

Collecting these metrics from a training loop is trivial. Let’s start with a basic regression example of deep learning using PyTorch with the California housing dataset:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read data
data = fetch_california_housing()
X, y = data.data, data.target

# train-test split for model evaluation
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# Standardizing data
scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Convert to 2D PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define the model
model = nn.Sequential(
    nn.Linear(8, 24),
    nn.ReLU(),
    nn.Linear(24, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function and optimizer
loss_fn = nn.MSELoss()  # mean square error
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 100   # number of epochs to run
batch_size = 32  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

for epoch in range(n_epochs):
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()

This implementation is primitive, but at each step you obtain loss as a tensor, which provides hints to the optimizer to improve the model. To follow the progress of the training, you can, of course, print this loss metric at every step. But you can also save the value so you can visualize it later. When you do that, beware that you do not want to save the tensor but simply its value. This is because the PyTorch tensor here remembers how it arrived at its value so that automatic differentiation can be done. This additional data occupies memory, but you do not need it.
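The difference between the loss tensor and its value can be seen in a small experiment. This is a sketch with a made-up one-layer model and random data, not the housing model from this post.

```python
import torch
import torch.nn as nn

# Tiny stand-in model and random data, for illustration only
model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
X = torch.randn(8, 3)
y = torch.randn(8, 1)

loss = loss_fn(model(X), y)
print(loss.requires_grad)  # the tensor is attached to the autograd graph
print(loss.grad_fn)        # records the operation that produced it

value = float(loss)        # same as loss.item() for a one-element tensor
print(type(value))         # a plain Python float, with no graph attached

detached = loss.detach()   # still a tensor, but cut off from the graph
print(detached.requires_grad)
```

Appending `value` to a list keeps only a number per step, whereas appending `loss` itself would keep each step's computation graph referenced for as long as the list exists.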

Hence you can modify the training loop to the following:

mse_history = []

for epoch in range(n_epochs):
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        mse_history.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()

When training a model, you should evaluate it with a test set that is kept separate from the training set. Usually this is done once per epoch, after all the training steps in that epoch. The test result can also be saved for visualization later. In fact, you can obtain multiple metrics from the test set if you want to. Hence you can add to the training loop as follows:

mae_fn = nn.L1Loss()  # create a function to compute MAE
train_mse_history = []
test_mse_history = []
test_mae_history = []

for epoch in range(n_epochs):
    model.train()
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        train_mse_history.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()
    # validate model on test set
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)
        mse = loss_fn(y_pred, y_test)
        mae = mae_fn(y_pred, y_test)
        test_mse_history.append(float(mse))
        test_mae_history.append(float(mae))

You can define your own function to compute the metrics or use one already implemented in the PyTorch library. It is good practice to switch the model to evaluation mode before evaluating. It is also good practice to run the evaluation under the no_grad() context, in which you explicitly tell PyTorch that you have no intention of running automatic differentiation on the tensors.
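One possible way to package these practices is a small evaluation helper that accepts both built-in and user-defined metric functions. The names `evaluate` and `max_abs_error` are hypothetical, and the model below includes a dropout layer only to show a case where evaluation mode changes the output.

```python
import torch
import torch.nn as nn

def max_abs_error(y_pred, y_true):
    """A user-defined metric: the largest absolute error in the batch."""
    return torch.max(torch.abs(y_pred - y_true))

def evaluate(model, X, y, metric_fns):
    """Compute each metric in eval mode without building an autograd graph."""
    was_training = model.training
    model.eval()                # e.g., disables dropout
    with torch.no_grad():       # no gradient bookkeeping during evaluation
        y_pred = model(X)
        results = {name: float(fn(y_pred, y)) for name, fn in metric_fns.items()}
    model.train(was_training)   # restore the previous mode
    return results

# Stand-in model and random data, for illustration only
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Dropout(0.5), nn.Linear(4, 1))
X_demo = torch.randn(16, 8)
y_demo = torch.randn(16, 1)

metric_fns = {"mse": nn.MSELoss(), "mae": nn.L1Loss(), "max_error": max_abs_error}
scores = evaluate(model, X_demo, y_demo, metric_fns)
print(scores)
```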

However, there is a problem in the code above: the MSE from the training set is computed once per training step, based on one batch, while the metrics from the test set are computed once per epoch, based on the entire test set. They are not directly comparable. In fact, if you look at the MSE from the training steps, you will find it very noisy. A better way is to summarize the MSE from the same epoch into one number (e.g., their mean) so you can compare it to the test set’s data.

Making this change, the following is the complete code:
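One detail to be aware of when averaging: if the training set size is not a multiple of the batch size, the last batch is smaller, and a simple mean of per-batch MSE gives that batch's samples extra weight. The sketch below, using made-up fixed predictions rather than a model under training, shows that weighting each batch by its size recovers the MSE over all samples. For a large dataset with small batches the difference is usually negligible.

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
torch.manual_seed(0)
y_true = torch.randn(10, 1)   # made-up targets
y_pred = torch.randn(10, 1)   # made-up, fixed predictions
batch_size = 4                # gives batches of 4, 4, and 2 samples

batch_losses, batch_sizes = [], []
for start in range(0, len(y_true), batch_size):
    yb_true = y_true[start:start+batch_size]
    yb_pred = y_pred[start:start+batch_size]
    batch_losses.append(float(loss_fn(yb_pred, yb_true)))
    batch_sizes.append(len(yb_true))

# Simple mean treats every batch equally, regardless of its size
simple_mean = sum(batch_losses) / len(batch_losses)
# Weighted mean gives every sample equal weight
weighted_mean = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)
full_mse = float(loss_fn(y_pred, y_true))

print(simple_mean, weighted_mean, full_mse)
```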

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read data
data = fetch_california_housing()
X, y = data.data, data.target

# train-test split for model evaluation
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# Standardizing data
scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Convert to 2D PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define the model
model = nn.Sequential(
    nn.Linear(8, 24),
    nn.ReLU(),
    nn.Linear(24, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function, metrics, and optimizer
loss_fn = nn.MSELoss()  # mean square error
mae_fn = nn.L1Loss()  # mean absolute error
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 100   # number of epochs to run
batch_size = 32  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

train_mse_history = []
test_mse_history = []
test_mae_history = []

for epoch in range(n_epochs):
    model.train()
    epoch_mse = []
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        epoch_mse.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()
    mean_mse = sum(epoch_mse) / len(epoch_mse)
    train_mse_history.append(mean_mse)
    # validate model on test set
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)
        mse = loss_fn(y_pred, y_test)
        mae = mae_fn(y_pred, y_test)
        test_mse_history.append(float(mse))
        test_mae_history.append(float(mae))

Plotting the Training History

In the code above, you collected metrics in Python lists, one value per epoch. Therefore, it is trivial to plot them as a line graph using matplotlib. Below is an example:

import matplotlib.pyplot as plt
import numpy as np

plt.plot(np.sqrt(train_mse_history), label="Train RMSE")
plt.plot(np.sqrt(test_mse_history), label="Test RMSE")
plt.plot(test_mae_history, label="Test MAE")
plt.xlabel("epochs")
plt.legend()
plt.show()

It plots, for example, the following:

Plots like this can provide an indication of useful things about the training of the model, such as:

Its speed of convergence over epochs (slope)
Whether the model may have already converged (plateau of the line)
Whether the model may be over-learning the training data (inflection for validation line)

In a regression example like the above, the metrics MAE and MSE should both decrease as the model gets better. In a classification example, however, the accuracy metric should increase while the cross entropy loss should decrease as training progresses. This is what you should expect to see in the plot.

These curves should eventually flatten, meaning you cannot improve the model any further based on the current dataset, model design, and algorithms. You want this to happen as soon as possible, so that your model converges faster and your training is efficient. You also want the metric to flatten at a high accuracy or low loss region, so your model is effective in prediction.

The other property to watch for in the plots is how different the metrics from training and validation are. In the above, you see the training set’s RMSE is higher than the test set’s RMSE at the beginning, but very soon the curves cross and the test set’s RMSE is higher at the end. This is expected, as eventually the model will fit better to the training set, but it is the test set that can predict how the model performs on future, unseen data.

You need to be careful when interpreting the curves or metrics at a microscopic scale. In the plot above, you see that the training set’s RMSE is extremely large compared to that of the test set in epoch 0. Their difference may not be that drastic, but since you collected the training set’s RMSE by averaging the MSE of each step during the first epoch, your model was probably not doing well in the first few steps but much better in the last few steps of the epoch. Taking the average across all the steps may not be a fair comparison, as the MSE from the test set is based on the model after the last step.

Your model is overfit if you see that the training set’s metric is much better than that of the test set. This can hint that you should stop your training at an earlier epoch or that your model’s design needs some regularization, such as a dropout layer.

In the plot above, you collected the mean square error (MSE) for the regression problem but plotted the root mean square error (RMSE) instead, so that you can compare it to the mean absolute error (MAE) on the same scale. You should probably collect the MAE of the training set as well. The two MAE curves should behave similarly to the RMSE curves.
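If you want a training metric that is directly comparable to the test metric, one option is to evaluate the whole training set once more at the end of each epoch, at the cost of an extra forward pass. The following sketch uses a small synthetic dataset and model as stand-ins; the names `running_mse_history` and `end_of_epoch_mse_history` are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Synthetic stand-in data: the target is the sum of the 8 input features
torch.manual_seed(42)
X_train = torch.randn(256, 8)
y_train = X_train.sum(dim=1, keepdim=True)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
batch_size = 32

running_mse_history = []       # mean of per-step MSE while the weights change
end_of_epoch_mse_history = []  # MSE of the final weights of each epoch

for epoch in range(5):
    model.train()
    epoch_mse = []
    for start in range(0, len(X_train), batch_size):
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        loss = loss_fn(model(X_batch), y_batch)
        epoch_mse.append(float(loss))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    running_mse_history.append(sum(epoch_mse) / len(epoch_mse))
    # Re-evaluate the full training set with the model as it is after the last step
    model.eval()
    with torch.no_grad():
        end_of_epoch_mse_history.append(float(loss_fn(model(X_train), y_train)))

for epoch, (running, final) in enumerate(zip(running_mse_history, end_of_epoch_mse_history)):
    print(f"epoch {epoch}: running mean MSE {running:.4f}, end-of-epoch MSE {final:.4f}")
```

The end-of-epoch number is measured on the same model state as the test metric, so the gap between the two curves in early epochs reflects generalization rather than the timing of the measurement.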
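Once the test metric is stored per epoch, you can use the history to decide where training could have stopped. Below is a minimal sketch of a patience-based rule; the function `early_stopping_epoch`, the patience of 3, and the list of MSE values are all made up for illustration and are not results from the model in this post.

```python
def early_stopping_epoch(history, patience=3):
    """Return (stop_epoch, best_epoch) for a metric where lower is better.

    Training would stop at the first epoch that is `patience` epochs
    past the best value seen so far.
    """
    best_value = float("inf")
    best_epoch = -1
    for epoch, value in enumerate(history):
        if value < best_value:
            best_value, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Never ran out of patience: training runs to the last epoch
    return len(history) - 1, best_epoch

# Hypothetical test MSE per epoch: improves, then slowly gets worse
test_mse = [0.90, 0.62, 0.51, 0.47, 0.45, 0.46, 0.48, 0.50, 0.53]
stop_epoch, best_epoch = early_stopping_epoch(test_mse, patience=3)
print(stop_epoch, best_epoch)  # → 7 4
```

In a real training loop you would typically also keep a copy of the model weights from the best epoch so they can be restored after stopping.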

Putting everything together, the following is the complete code:

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read data
data = fetch_california_housing()
X, y = data.data, data.target

# train-test split for model evaluation
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# Standardizing data
scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Convert to 2D PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define the model
model = nn.Sequential(
    nn.Linear(8, 24),
    nn.ReLU(),
    nn.Linear(24, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function, metrics, and optimizer
loss_fn = nn.MSELoss()  # mean square error
mae_fn = nn.L1Loss()  # mean absolute error
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 100   # number of epochs to run
batch_size = 32  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

train_mse_history = []
test_mse_history = []
test_mae_history = []

for epoch in range(n_epochs):
    model.train()
    epoch_mse = []
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        epoch_mse.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()
    mean_mse = sum(epoch_mse) / len(epoch_mse)
    train_mse_history.append(mean_mse)
    # validate model on test set
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)
        mse = loss_fn(y_pred, y_test)
        mae = mae_fn(y_pred, y_test)
        test_mse_history.append(float(mse))
        test_mae_history.append(float(mae))

plt.plot(np.sqrt(train_mse_history), label="Train RMSE")
plt.plot(np.sqrt(test_mse_history), label="Test RMSE")
plt.plot(test_mae_history, label="Test MAE")
plt.xlabel("epochs")
plt.legend()
plt.show()

Summary

In this post, you discovered the importance of collecting and reviewing metrics while training your deep learning models. You learned:

What metrics to look for during model training
How to compute and collect metrics in a PyTorch training loop
How to visualize the metrics from a training loop
How to interpret the metrics to infer details about the training experience
