Use PyTorch Deep Learning Models with scikit-learn

By Adrian Tam on April 8, 2023 in Deep Learning with PyTorch

The most popular deep learning libraries in Python for research and development are TensorFlow/Keras and PyTorch, due to their simplicity. The scikit-learn library, however, is the most popular library for general machine learning in Python. In this post, you will discover how to use deep learning models from PyTorch with the scikit-learn library in Python. This will allow you to leverage the power of the scikit-learn library for tasks like model evaluation and model hyper-parameter optimization. After completing this lesson you will know:

How to wrap a PyTorch model for use with the scikit-learn machine learning library
How to easily evaluate PyTorch models using cross-validation in scikit-learn
How to tune PyTorch model hyperparameters using grid search in scikit-learn

Let’s get started.

Use PyTorch Deep Learning Models with scikit-learn
Photo by Priyanka Neve. Some rights reserved.

Overview

This post is in four parts; they are:

Overview of skorch
Evaluate Deep Learning Models with Cross-Validation
Running k-Fold Cross-validation with scikit-learn
Grid Search Deep Learning Model Parameters

Overview of skorch

PyTorch is a popular library for deep learning in Python, but the focus of the library is deep learning, not all of machine learning. In fact, it strives for minimalism, focusing on only what you need to quickly and simply define and build deep learning models. The scikit-learn library in Python is built upon the SciPy stack for efficient numerical computation. It is a fully featured library for general purpose machine learning and provides many useful utilities in developing deep learning models. Not least of which are:

Evaluation of models using resampling methods like k-fold cross-validation
Efficient search and evaluation of model hyperparameters
Connecting multiple steps of a machine learning workflow into a pipeline

PyTorch cannot work with scikit-learn directly. But thanks to the duck-typing nature of the Python language, it is easy to adapt a PyTorch model for use with scikit-learn. Indeed, the skorch module is built for this purpose. With skorch, you can make your PyTorch model behave just like a scikit-learn model, which you may find easier to use.

In the following sections, you will work through examples of using the NeuralNetClassifier wrapper for a classification neural network created in PyTorch and used in the scikit-learn library. The test problem is the Sonar dataset. This is a small dataset with all numerical attributes that is easy to work with.

The following examples assume you have successfully installed PyTorch, skorch, and scikit-learn. If you use the pip for your Python modules, you may install them with:

pip install torch skorch scikit-learn

Evaluate Deep Learning Models with Cross-Validation

The NeuralNet class, or the more specialized NeuralNetClassifier, NeuralNetBinaryClassifier, and NeuralNetRegressor classes in skorch, are factory wrappers for PyTorch models. They take an argument module, which is a class or a function to call to get your model. In return, these wrapper classes allow you to specify the loss function and optimizer, and then the training loop comes for free. This is the convenience compared to using PyTorch directly.

Below is a simple example of training a binary classifier on the Sonar dataset:

import pandas as pd
import torch
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder
from skorch import NeuralNetBinaryClassifier

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to PyTorch tensors (X is 2D, y is a 1D vector)
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)

# Define the model
class SonarClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.output(x)
        return x

# create the skorch wrapper
model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10
)

# run
model.fit(X, y)

In this model, you used torch.nn.BCEWithLogitsLoss as the loss function (that is indeed the default of NeuralNetBinaryClassifier). It combines the sigmoid function with binary cross entropy loss, so that you don’t need to put the sigmoid function at the output of the model. It is often preferred because it provides better numerical stability.
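To see why folding the sigmoid into the loss helps numerically, the following is a minimal sketch in plain Python. The helper names bce_naive and bce_with_logits are hypothetical and exist only for this illustration, and the stable formula shown is one common formulation, not necessarily the exact expression PyTorch uses internally.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_naive(z, y):
    """Binary cross entropy computed from the sigmoid output."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(z, y):
    """Same loss computed directly from the logit z in a stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# For a moderate logit, both versions agree
moderate_naive = bce_naive(2.0, 1.0)
moderate_stable = bce_with_logits(2.0, 1.0)

# For a large logit with the opposite label, sigmoid(z) rounds to exactly 1.0,
# so the naive version ends up evaluating log(0) and fails
try:
    bce_naive(40.0, 0.0)
    naive_failed = False
except ValueError:
    naive_failed = True

large_stable = bce_with_logits(40.0, 0.0)  # stays finite, close to 40
print(moderate_naive, moderate_stable, naive_failed, large_stable)
```

The naive version breaks once the sigmoid saturates in floating point, while the logit-based version never takes the logarithm of a value that can round to zero.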

In addition, you specified the training parameters such as the number of epochs and batch size in the skorch wrapper. Then you just need to call the fit() function with the input features and target. The wrapper will initialize a model and train it for you.

Running the above will produce the following:

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6952       0.5476        0.6921  0.0135
      2        0.6930       0.5476        0.6920  0.0114
      3        0.6925       0.5476        0.6919  0.0104
      4        0.6922       0.5238        0.6918  0.0118
      5        0.6919       0.5238        0.6917  0.0112
...
    146        0.2942       0.4524        0.9425  0.0115
    147        0.2920       0.4524        0.9465  0.0123
    148        0.2899       0.4524        0.9495  0.0112
    149        0.2879       0.4524        0.9544  0.0121
    150        0.2859       0.4524        0.9583  0.0118

Note that skorch is positioned as a wrapper that adapts PyTorch models to the scikit-learn interface. Therefore, you should use the model as if it were a scikit-learn model. For example, to train your binary classification model, the target is expected to be a vector rather than an $n\times 1$ matrix. And to run the model for inference, you should use model.predict(X) or model.predict_proba(X). This is also why you should use NeuralNetBinaryClassifier, so that the classification-related scikit-learn functions are provided as model methods.

Running k-Fold Cross-validation with scikit-learn

Using a wrapper over your PyTorch model already saves you a lot of boilerplate code for building your own training loop. But the entire suite of machine learning functions from scikit-learn is the real productivity boost.

One example is the model selection functions from scikit-learn. Let’s say you want to evaluate this model design with k-fold cross-validation. Normally, this means taking a dataset, splitting it into $k$ portions, then running a loop that selects one of these portions as the test set and the rest as the training set, trains a model from scratch, and obtains an evaluation score. It is not difficult to do, but you need to write several lines of code to implement it.
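The manual splitting described above can be sketched in plain Python as follows. This is only an illustration: the function name kfold_indices is hypothetical, and unlike scikit-learn’s StratifiedKFold it neither shuffles the data nor balances the classes in each fold.

```python
def kfold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first few folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# The Sonar dataset has 208 samples
folds = list(kfold_indices(208, 5))
test_sizes = [len(test) for _, test in folds]
print(test_sizes)
```

With 208 samples and $k=5$, the test folds have 42, 42, 42, 41, and 41 samples. In a full evaluation loop, you would train a fresh model on each train_idx and score it on the matching test_idx.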

Instead, you can make use of the k-fold and cross-validation functions from scikit-learn, as follows:

from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

kfold = StratifiedKFold(n_splits=5, shuffle=True)
results = cross_val_score(model, X, y, cv=kfold)
print(results)

The parameter verbose=False in NeuralNetBinaryClassifier stops the display of progress while the model is trained, since cross-validation would otherwise print a training log for every fold. The above code will print the validation scores, as follows:

[0.76190476 0.76190476 0.78571429 0.75609756 0.75609756]

These are the evaluation scores. Because it is a binary classification model, each score is the accuracy on one held-out fold. There are five of them because they are obtained from a k-fold cross-validation with $k=5$, each for a different test set. Usually you evaluate a model with the mean and standard deviation of the cross-validation scores:

print("mean = %.3f; std = %.3f" % (results.mean(), results.std()))

which is

mean = 0.764; std = 0.011

A good model should produce a high score (in this case, accuracy close to 1) and low standard deviation. A high standard deviation means the model is not very consistent with different test sets.
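As a quick check of the summary statistics, the following sketch recomputes them from the five scores printed earlier using only the Python standard library. It also shows that the std() method of a NumPy array defaults to the population standard deviation (dividing by $n$), which differs slightly from the sample standard deviation (dividing by $n-1$).

```python
import statistics

# The five fold accuracies reported above
scores = [0.76190476, 0.76190476, 0.78571429, 0.75609756, 0.75609756]

mean = statistics.mean(scores)
pop_std = statistics.pstdev(scores)    # divides by n, like NumPy's default std()
sample_std = statistics.stdev(scores)  # divides by n - 1

print("mean = %.3f; population std = %.3f; sample std = %.3f"
      % (mean, pop_std, sample_std))
```

With only five folds, the choice between the two definitions is visible in the third decimal place, so it is worth stating which one you report.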

Putting everything together, the following is the complete code:

import pandas as pd
import torch
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder
from skorch import NeuralNetBinaryClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to PyTorch tensors (X is 2D, y is a 1D vector)
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)

# Define the model
class SonarClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.output(x)
        return x

# create the skorch wrapper
model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

# k-fold
kfold = StratifiedKFold(n_splits=5, shuffle=True)
results = cross_val_score(model, X, y, cv=kfold)
print("mean = %.3f; std = %.3f" % (results.mean(), results.std()))

In comparison, the following is an equivalent implementation with a neural network model in scikit-learn:

import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

# load dataset
data = pd.read_csv("sonar.csv", header=None)
# split into input (X) and output (y) variables, as NumPy arrays
X = data.iloc[:, 0:60].values
y = data.iloc[:, 60].values

# binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# create model
model = MLPClassifier(hidden_layer_sizes=(60,60,60), activation='relu',
                      max_iter=150, batch_size=10, verbose=False)

# evaluate using 5-fold cross validation
# (random_state is omitted because no seed is defined in this example)
kfold = StratifiedKFold(n_splits=5, shuffle=True)
results = cross_val_score(model, X, y, cv=kfold)
print("mean = %.3f; std = %.3f" % (results.mean(), results.std()))

From this, you can see how skorch makes a PyTorch model a drop-in replacement for a scikit-learn model.

Grid Search Deep Learning Model Parameters

The previous example showed how easy it is to wrap your deep learning model from PyTorch and use it in functions from the scikit-learn library. In this example, you will go a step further. The function that you specify to the module argument when creating the NeuralNetBinaryClassifier or NeuralNetClassifier wrapper can take many arguments. You can use these arguments to further customize the construction of the model. In addition, you can provide arguments to the fit() function.

In this example, you will use grid search to evaluate different configurations for your neural network model and report on the combination that provides the best estimated performance. To make it interesting, let’s modify the PyTorch model such that it takes a parameter to decide how deep you want it to be:

class SonarClassifier(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        self.layers = []
        self.acts = []
        for i in range(n_layers):
            self.layers.append(nn.Linear(60, 60))
            self.acts.append(nn.ReLU())
            self.add_module(f"layer{i}", self.layers[-1])
            self.add_module(f"act{i}", self.acts[-1])
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        for layer, act in zip(self.layers, self.acts):
            x = act(layer(x))
        x = self.output(x)
        return x

In this design, we hold the hidden layers and their activation functions in Python lists. Because these PyTorch components are not direct attributes of the class, you would not see them in model.parameters(), so the optimizer would never update them during training. This is mitigated by using self.add_module() to register the components. An alternative is to use nn.ModuleList() instead of a Python list, which gives PyTorch enough clues to find the components of the model.
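The nn.ModuleList() alternative mentioned above could look like the following sketch. The class name SonarClassifierModuleList is made up for this illustration, and reusing a single ReLU instance is a simplification of the original design, which created one activation object per layer.

```python
import torch
import torch.nn as nn

class SonarClassifierModuleList(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        # nn.ModuleList registers each layer, so parameters() can find them
        self.layers = nn.ModuleList([nn.Linear(60, 60) for _ in range(n_layers)])
        self.act = nn.ReLU()  # ReLU has no state, so one instance can be reused
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        for layer in self.layers:
            x = self.act(layer(x))
        return self.output(x)

net = SonarClassifierModuleList(n_layers=2)
n_param_tensors = len(list(net.parameters()))  # weight and bias per Linear layer
out = net(torch.zeros(4, 60))
print(n_param_tensors, tuple(out.shape))
```

With two hidden layers plus the output layer, there are three Linear layers and therefore six parameter tensors, all visible to the optimizer without any call to self.add_module().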

The skorch wrapper is still the same. With it, you have a model compatible with scikit-learn. Since there are parameters to set up the deep learning model as well as training parameters, such as the learning rate (lr), specified in the wrapper, you have many possible variations. The GridSearchCV function from scikit-learn provides grid search with cross-validation. You can provide a list of values for each parameter and ask scikit-learn to try out all combinations and report the best set of parameters according to the metric you specified. An example is as follows:

from sklearn.model_selection import GridSearchCV

model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

param_grid = {
    'module__n_layers': [1, 3, 5], 
    'lr': [0.1, 0.01, 0.001, 0.0001],
    'max_epochs': [100, 150],
}

grid_search = GridSearchCV(model, param_grid, scoring='accuracy', verbose=1, cv=3)
result = grid_search.fit(X, y)

You passed in model to GridSearchCV(), which is a skorch wrapper. You also passed in param_grid, which specified to vary:

the parameter n_layers in the PyTorch model (i.e., the SonarClassifier class), which controls the depth of the neural network
the parameter lr in the wrapper, which controls the learning rate of the optimizer
the parameter max_epochs in the wrapper, which controls the number of training epochs to run

Note the use of double underscore to pass on parameters to the PyTorch model. In fact, this allows you to configure other parameters too. For example, you can set optimizer__weight_decay to pass the weight_decay parameter on to the Adam optimizer (which sets up L2 regularization).

Running this can take a while to compute because it tries all combinations, each evaluated with 3-fold cross validation. You do not want to run this often but it can be useful for you to design models.
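To get a feel for the cost, the following sketch enumerates the same grid with itertools.product and counts the model fits. Iterating over the keys in sorted order is an assumption made here so that the enumeration is deterministic; it is not a statement about how scikit-learn builds its candidate list internally.

```python
from itertools import product

param_grid = {
    'module__n_layers': [1, 3, 5],
    'lr': [0.1, 0.01, 0.001, 0.0001],
    'max_epochs': [100, 150],
}

# Enumerate every combination, iterating over the keys in sorted order
keys = sorted(param_grid)
candidates = [dict(zip(keys, values))
              for values in product(*(param_grid[k] for k in keys))]

n_folds = 3
n_fits = len(candidates) * n_folds
print(len(candidates), "candidates,", n_fits, "cross-validation fits")
```

That is 24 candidates and 72 training runs, plus one final refit of the best candidate on the full dataset, which GridSearchCV performs by default. Adding one more value to any list multiplies the cost accordingly.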

After the grid search is finished, the performance and combination of configurations for the best model are displayed, followed by the performance of all combinations of parameters, as below:

print("Best: %f using %s" % (result.best_score_, result.best_params_))
means = result.cv_results_['mean_test_score']
stds = result.cv_results_['std_test_score']
params = result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

It gives:

Best: 0.649551 using {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 1}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 100, 'module__n_layers': 1}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 100, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 100, 'module__n_layers': 5}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 150, 'module__n_layers': 1}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 150, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 150, 'module__n_layers': 5}
0.644651 (0.062160) with: {'lr': 0.01, 'max_epochs': 100, 'module__n_layers': 1}
0.567495 (0.049728) with: {'lr': 0.01, 'max_epochs': 100, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.01, 'max_epochs': 100, 'module__n_layers': 5}
0.615804 (0.061966) with: {'lr': 0.01, 'max_epochs': 150, 'module__n_layers': 1}
0.620290 (0.078243) with: {'lr': 0.01, 'max_epochs': 150, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.01, 'max_epochs': 150, 'module__n_layers': 5}
0.635335 (0.108412) with: {'lr': 0.001, 'max_epochs': 100, 'module__n_layers': 1}
0.582126 (0.058072) with: {'lr': 0.001, 'max_epochs': 100, 'module__n_layers': 3}
0.563423 (0.136916) with: {'lr': 0.001, 'max_epochs': 100, 'module__n_layers': 5}
0.649551 (0.075676) with: {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 1}
0.558178 (0.071443) with: {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 3}
0.567909 (0.088623) with: {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 5}
0.557971 (0.041416) with: {'lr': 0.0001, 'max_epochs': 100, 'module__n_layers': 1}
0.587026 (0.079951) with: {'lr': 0.0001, 'max_epochs': 100, 'module__n_layers': 3}
0.606349 (0.092394) with: {'lr': 0.0001, 'max_epochs': 100, 'module__n_layers': 5}
0.563147 (0.099652) with: {'lr': 0.0001, 'max_epochs': 150, 'module__n_layers': 1}
0.534023 (0.057187) with: {'lr': 0.0001, 'max_epochs': 150, 'module__n_layers': 3}
0.634921 (0.057235) with: {'lr': 0.0001, 'max_epochs': 150, 'module__n_layers': 5}

This might take about 5 minutes to complete on your workstation when executed on the CPU (rather than GPU). In the results above, you can see that the grid search discovered that using a learning rate of 0.001 with 150 epochs and only a single hidden layer achieved the best cross-validation score of approximately 65% on this problem.

In fact, you can see if you can improve the result by first standardizing the input features. Since the wrapper allows you to use a PyTorch model with scikit-learn, you can also apply scikit-learn’s standardizer on the fly by creating a machine learning pipeline:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('float32', FunctionTransformer(func=lambda X: torch.tensor(X, dtype=torch.float32),
                                    validate=False)),
    ('sonarmodel', model.initialize()),
])

The new object pipe you created is another scikit-learn model that works just like the model object, except a standard scaler is applied before the data is passed on to the neural network. Therefore you can run a grid search on this pipeline, with a little tweak on the way parameters are specified:

param_grid = {
    'sonarmodel__module__n_layers': [1, 3, 5], 
    'sonarmodel__lr': [0.1, 0.01, 0.001, 0.0001],
    'sonarmodel__max_epochs': [100, 150],
}

grid_search = GridSearchCV(pipe, param_grid, scoring='accuracy', verbose=1, cv=3)
result = grid_search.fit(X, y)
print("Best: %f using %s" % (result.best_score_, result.best_params_))
means = result.cv_results_['mean_test_score']
stds = result.cv_results_['std_test_score']
params = result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Two key points to note here: First, PyTorch models run on 32-bit floats by default, but NumPy arrays are usually 64-bit floats. These data types are not aligned, and scikit-learn’s scaler always returns a NumPy array. Therefore, you need to do type conversion in the middle of the pipeline, using a FunctionTransformer object.

Moreover, in a scikit-learn pipeline, each step is referred to by a name, such as scaler and sonarmodel. Therefore, the parameters set for the pipeline need to carry the name as well. In the example above, we use sonarmodel__module__n_layers as a parameter for grid search. This refers to the sonarmodel part of the pipeline (which is your skorch wrapper), the module part therein (which is your PyTorch model), and its n_layers parameter. Note the use of double underscore for hierarchy separation.
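The hierarchy encoded in such a name can be illustrated with a toy model in plain Python. The nested dictionary and the helper set_nested below are hypothetical stand-ins for the pipeline and its parameter handling; this is not how scikit-learn or skorch implement it, only a picture of how the name is read from left to right.

```python
def set_nested(config, name, value):
    """Set a value in a nested dict using a double-underscore path."""
    *parents, leaf = name.split("__")
    node = config
    for key in parents:
        node = node[key]  # descend one level per name component
    node[leaf] = value

# Toy stand-in for the pipeline: step name -> wrapper parameters -> module parameters
config = {
    "scaler": {},
    "sonarmodel": {"lr": 0.0001, "max_epochs": 150, "module": {"n_layers": 3}},
}

set_nested(config, "sonarmodel__module__n_layers", 5)
set_nested(config, "sonarmodel__lr", 0.01)
print(config["sonarmodel"])
```

Each double underscore moves one level down: first to the pipeline step, then to the wrapper’s module, and finally to the parameter being set.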

Putting everything together, the following is the complete code:

import pandas as pd
import torch
import torch.nn as nn
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler, LabelEncoder
from skorch import NeuralNetBinaryClassifier

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to PyTorch tensors (X is 2D, y is a 1D vector)
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)


class SonarClassifier(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        self.layers = []
        self.acts = []
        for i in range(n_layers):
            self.layers.append(nn.Linear(60, 60))
            self.acts.append(nn.ReLU())
            self.add_module(f"layer{i}", self.layers[-1])
            self.add_module(f"act{i}", self.acts[-1])
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        for layer, act in zip(self.layers, self.acts):
            x = act(layer(x))
        x = self.output(x)
        return x

model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('float32', FunctionTransformer(func=lambda X: torch.tensor(X, dtype=torch.float32),
                                    validate=False)),
    ('sonarmodel', model.initialize()),
])

param_grid = {
    'sonarmodel__module__n_layers': [1, 3, 5], 
    'sonarmodel__lr': [0.1, 0.01, 0.001, 0.0001],
    'sonarmodel__max_epochs': [100, 150],
}

grid_search = GridSearchCV(pipe, param_grid, scoring='accuracy', verbose=1, cv=3)
result = grid_search.fit(X, y)
print("Best: %f using %s" % (result.best_score_, result.best_params_))
means = result.cv_results_['mean_test_score']
stds = result.cv_results_['std_test_score']
params = result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Summary

In this chapter, you discovered how to wrap your PyTorch deep learning models and use them in the scikit-learn general machine learning library. You learned:

How to wrap PyTorch models so that they can be used with the scikit-learn machine learning library.
How to use a wrapped PyTorch model as part of evaluating model performance in scikit-learn.
How to perform hyperparameter tuning in scikit-learn using a wrapped PyTorch model.

You can see that using scikit-learn for standard machine learning operations such as model evaluation and hyperparameter optimization can save a lot of time over implementing these schemes yourself. Wrapping your model lets you use powerful tools from scikit-learn and fit your deep learning models into your general machine learning process.

10 Responses to Use PyTorch Deep Learning Models with scikit-learn

james February 14, 2023 at 1:50 pm #

Thanks for the tutorial.

I faced the following error when executing the code.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[15], line 7
      4 y = encoder.transform(y)
      6 # Convert to 2D PyTorch tensors
----> 7 X = torch.tensor(X.values, dtype=torch.float32)
      8 y = torch.tensor(y, dtype=torch.float32)

TypeError: can’t convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

- Adrian Tam February 15, 2023 at 4:29 am #
  
  Check what was X before this line. Probably you read some non-number into it.
  
Geoff Hardy March 26, 2023 at 2:04 am #

Hi James!
Thanks for the tutorial… just a typo comment… FunctionTransformer should be imported from sklearn.preprocessing ???

- James Carmichael March 26, 2023 at 10:29 am #
  
  Hi Geoff…You are correct! Thank you for the feedback!
  
zhao hongwei April 15, 2023 at 11:27 pm #

How can I use GridSearchCV with NeuralNetClassifier to tune weight_decay and lr_decay?

- James Carmichael April 16, 2023 at 9:26 am #
  
  Hi Zhao…You can learn more here:
  
  https://machinelearningmastery.com/how-to-grid-search-hyperparameters-for-pytorch-models/
  
WuGang February 15, 2024 at 7:08 pm #

Why does MLPClassifier, trained with the same hyperparameters, perform so much better than the PyTorch model?

- James Carmichael February 16, 2024 at 10:32 am #
  
  Hi WuGang…What are the differences in accuracy? Is this difference seen on average of multiple executions of training?
  
shadow_ February 18, 2024 at 10:09 pm #

It seems the pipeline does not need to call the model's initialize() method. Writing ('sonarmodel', model) directly is enough, because GridSearchCV initializes the model automatically. In addition, calling initialize() can cause serialization problems in some cases.

- James Carmichael February 19, 2024 at 8:21 am #
  
  Hi shadow_…Thank you for your feedback!
