Use PyTorch Deep Learning Models with scikit-learn

The most popular deep learning libraries in Python for research and development are TensorFlow/Keras and PyTorch, thanks to their simplicity. The scikit-learn library, however, is the most popular library for general machine learning in Python. In this post, you will discover how to use deep learning models from PyTorch with the scikit-learn library in Python. This will allow you to leverage the power of the scikit-learn library for tasks like model evaluation and model hyperparameter optimization. After completing this post, you will know:

  • How to wrap a PyTorch model for use with the scikit-learn machine learning library
  • How to easily evaluate PyTorch models using cross-validation in scikit-learn
  • How to tune PyTorch model hyperparameters using grid search in scikit-learn


Let’s get started.


Overview

This post is in four parts; they are:

  • Overview of skorch
  • Evaluate Deep Learning Models with Cross-Validation
  • Running k-Fold Cross-validation with scikit-learn
  • Grid Search Deep Learning Model Parameters

Overview of skorch

PyTorch is a popular library for deep learning in Python, but its focus is deep learning, not machine learning in general. In fact, it strives for minimalism, providing only what you need to define and build deep learning models quickly and simply. The scikit-learn library in Python is built upon the SciPy stack for efficient numerical computation. It is a fully featured library for general-purpose machine learning and provides many utilities that are useful when developing deep learning models, not least of which are:

  • Evaluation of models using resampling methods like k-fold cross-validation
  • Efficient search and evaluation of model hyperparameters
  • Connecting multiple steps of a machine learning workflow into a pipeline

PyTorch cannot work with scikit-learn directly. But thanks to Python's duck typing, it is easy to adapt a PyTorch model for use with scikit-learn. Indeed, the skorch module is built for this purpose. With skorch, you can make your PyTorch model behave just like a scikit-learn model, which you may find easier to use.

In the following sections, you will work through examples of using the NeuralNetClassifier wrapper for a classification neural network created in PyTorch and used in the scikit-learn library. The test problem is the Sonar dataset. This is a small dataset with all numerical attributes that is easy to work with.

The following examples assume you have successfully installed PyTorch, skorch, and scikit-learn. If you use pip for your Python modules, you may install them with the command pip install torch skorch scikit-learn.

Evaluate Deep Learning Models with Cross-Validation

The NeuralNet class, or the more specialized NeuralNetClassifier, NeuralNetBinaryClassifier, and NeuralNetRegressor classes in skorch, are factory wrappers for PyTorch models. They take an argument module, which is a class or a function to call to get your model. In return, these wrapper classes allow you to specify the loss function and optimizer, and the training loop comes for free. This is the convenience compared to using PyTorch directly.

Below is a simple example of training a binary classifier on the Sonar dataset:

In this model, you used torch.nn.BCEWithLogitsLoss as the loss function (which is indeed the default of NeuralNetBinaryClassifier). It combines the sigmoid function with binary cross-entropy loss, so you don’t need to put a sigmoid function at the output of the model. It is sometimes preferred because it provides better numerical stability.
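To make the relationship concrete, here is a small sketch comparing the fused loss with a separate sigmoid followed by torch.nn.BCELoss. The tensors are made-up values for illustration only; the second half uses an extreme logit to hint at why the fused form is considered more stable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8)                       # raw model outputs, no sigmoid
targets = torch.randint(0, 2, (8,)).float()   # binary labels as floats

# For ordinary values, the fused loss and the two-step loss agree
fused = nn.BCEWithLogitsLoss()(logits, targets)
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(fused, two_step, atol=1e-5))

# With an extreme logit, sigmoid saturates in float32 and the two-step
# version can no longer recover the true magnitude of the loss
big_logit = torch.tensor([200.0])
label = torch.tensor([0.0])
stable = nn.BCEWithLogitsLoss()(big_logit, label)
saturated = nn.BCELoss()(torch.sigmoid(big_logit), label)
print(stable.item(), saturated.item())
```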

In addition, you specified the training parameters, such as the number of epochs and the batch size, in the skorch wrapper. Then you just need to call the fit() function with the input features and target. The wrapper will initialize a model and train it for you.

Running the above will produce the following:

Note that skorch is positioned as a wrapper that adapts PyTorch models to the scikit-learn interface. Therefore, you should use the model as if it were a scikit-learn model. For example, to train your binary classification model, the target is expected to be a vector rather than an $n\times 1$ matrix. And to run the model for inference, you should use model.predict(X) or model.predict_proba(X). This is also why you should use NeuralNetBinaryClassifier: the classification-related scikit-learn functions are provided as model methods.

Running k-Fold Cross-validation with scikit-learn

Using a wrapper over your PyTorch model already saves you a lot of boilerplate code for building your own training loop. But the entire suite of machine learning functions from scikit-learn is the real productivity boost.

One example is the model selection functions from scikit-learn. Let’s say you want to evaluate this model design with k-fold cross-validation. Normally, this means taking a dataset, splitting it into $k$ portions, then running a loop that selects one of these portions as the test set and the rest as the training set, trains a model from scratch, and obtains an evaluation score. It is not difficult to do, but you need to write several lines of code to implement it.
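For reference, the manual loop described above might look like the sketch below. To keep it short, it assumes a LogisticRegression as a stand-in estimator and synthetic data; any model with fit() and score() would follow the same pattern.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Assumption: synthetic data and a simple stand-in estimator
X, y = make_classification(n_samples=208, n_features=60, random_state=42)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in kfold.split(X, y):
    clf = LogisticRegression(max_iter=1000)   # fresh model for every fold
    clf.fit(X[train_idx], y[train_idx])       # train on k-1 portions
    scores.append(clf.score(X[test_idx], y[test_idx]))  # accuracy on the rest

print(scores)
```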

Instead, you can make use of the k-fold and cross-validation functions from scikit-learn, as follows:

The parameter verbose=False in NeuralNetBinaryClassifier stops the display of training progress, of which there would otherwise be a lot. The above code will print the validation scores, as follows:

These are the evaluation scores. Because it is a binary classification model, each score is the accuracy on one held-out fold. There are five of them because they are obtained from a k-fold cross-validation with $k=5$, each for a different test set. Usually you evaluate a model with the mean and standard deviation of the cross-validation scores.

A good model should produce a high score (in this case, accuracy close to 1) and low standard deviation. A high standard deviation means the model is not very consistent with different test sets.

Putting everything together, the following is the complete code:

In comparison, the following is an equivalent implementation with a neural network model in scikit-learn:
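One way such a scikit-learn counterpart could look is sketched below with MLPClassifier. The hyperparameters are assumptions chosen to resemble a single 60-unit hidden layer trained with Adam, and the data is synthetic rather than the Sonar file.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Assumption: synthetic stand-in for the Sonar data
X, y = make_classification(n_samples=208, n_features=60, random_state=42)

model = MLPClassifier(
    hidden_layer_sizes=(60,),     # one hidden layer of 60 units
    activation="relu",
    solver="adam",
    learning_rate_init=0.001,
    batch_size=16,
    max_iter=100,                 # upper bound on the number of epochs
    random_state=42,
)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = cross_val_score(model, X, y, cv=kfold)
print(results)
print(f"mean = {results.mean():.4f}; std = {results.std():.4f}")
```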

From this, you can see how skorch lets a PyTorch model act as a drop-in replacement for a scikit-learn model.

Grid Search Deep Learning Model Parameters

The previous example showed how easy it is to wrap your deep learning model from PyTorch and use it in functions from the scikit-learn library. In this example, you will go a step further. The class or function that you pass as the module argument when creating the NeuralNetBinaryClassifier or NeuralNetClassifier wrapper can take many arguments. You can use these arguments to further customize the construction of the model. In addition, you know you can provide arguments to the fit() function.

In this example, you will use grid search to evaluate different configurations for your neural network model and report on the combination that provides the best estimated performance. To make it interesting, let’s modify the PyTorch model such that it takes a parameter to decide how deep you want it to be:

In this design, the hidden layers and their activation functions are held in Python lists. Because the PyTorch components are then not immediate attributes of the class, you will not see them in model.parameters(). That is a problem for training, because the optimizer never receives those parameters. This can be mitigated by using self.add_module() to register the components. An alternative is to use nn.ModuleList() instead of a Python list, which tells PyTorch where to find the components of the model.
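The difference can be checked by counting parameters. The two classes below are hypothetical illustrations: one keeps the hidden layers in a plain Python list, the other in nn.ModuleList, and only the latter exposes them through parameters().

```python
import torch.nn as nn


class PlainListNet(nn.Module):
    """Hidden layers in a plain list: PyTorch does not register them."""

    def __init__(self, n_layers=3):
        super().__init__()
        self.layers = [nn.Linear(60, 60) for _ in range(n_layers)]
        self.output = nn.Linear(60, 1)


class ModuleListNet(nn.Module):
    """Hidden layers in nn.ModuleList: every layer is registered."""

    def __init__(self, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(60, 60) for _ in range(n_layers)])
        self.act = nn.ReLU()
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        for layer in self.layers:
            x = self.act(layer(x))
        return self.output(x)


def count_parameters(module):
    """Total number of scalar parameters visible to an optimizer."""
    return sum(p.numel() for p in module.parameters())


plain_count = count_parameters(PlainListNet())     # only the output layer
registered_count = count_parameters(ModuleListNet())
print(plain_count, registered_count)
```

Each Linear(60, 60) layer holds 3,660 parameters and the output layer holds 61, so the plain-list version reports 61 while the ModuleList version reports 11,041.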

The skorch wrapper is still the same. With it, you have a model compatible with scikit-learn. Because there are parameters to set up the deep learning model as well as training parameters, such as the learning rate (lr), specified in the wrapper, you have many possible variations. The GridSearchCV class from scikit-learn provides grid search with cross-validation. You can provide a list of values for each parameter and ask scikit-learn to try out all combinations and report the best set of parameters according to the metric you specified. An example is as follows:

You passed model, which is a skorch wrapper, to GridSearchCV(). You also passed in param_grid, which specifies what to vary:

  • the parameter n_layers in the PyTorch model (i.e., the SonarClassifier class), which controls the depth of the neural network
  • the parameter lr in the wrapper, which controls the learning rate of the optimizer
  • the parameter max_epochs in the wrapper, which controls the number of training epochs to run

Note the use of a double underscore to pass parameters on to the PyTorch model. In fact, this allows you to configure other parameters too. For example, you can set optimizer__weight_decay to pass the weight_decay parameter to the Adam optimizer (which sets up L2 regularization).

Running this can take a while because it tries all combinations, each evaluated with 3-fold cross-validation. You do not want to run this often, but it can be useful when designing models.

After the grid search is finished, the performance and combination of configurations for the best model are displayed, followed by the performance of all combinations of parameters, as below:

It gives:

This might take about 5 minutes to complete on your workstation when executed on the CPU (rather than a GPU). In the results, you can see that the grid search discovered that using a learning rate of 0.001 with 150 epochs and only a single hidden layer achieved the best cross-validation score of approximately 65% on this problem.

In fact, you can check whether standardizing the input features first improves the result. Since the wrapper allows you to use a PyTorch model with scikit-learn, you can also apply scikit-learn’s standardizer on the fly and create a machine learning pipeline:

The new object pipe you created is another scikit-learn model that works just like the model object, except a standard scaler is applied before the data is passed on to the neural network. Therefore you can run a grid search on this pipeline, with a little tweak on the way parameters are specified:

There are two key points to note here. First, PyTorch models run on 32-bit floats by default, but NumPy arrays are usually 64-bit floats. These data types are not aligned, and scikit-learn’s scaler always returns a NumPy array. Therefore, you need to do a type conversion in the middle of the pipeline, using a FunctionTransformer object.
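The mismatch is easy to see in isolation. The sketch below uses random 64-bit data as an assumption and a hypothetical helper, to_float32, wrapped in FunctionTransformer (imported from sklearn.preprocessing).

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Assumption: random 64-bit data, as NumPy produces by default
X = np.random.default_rng(0).normal(size=(10, 60))


def to_float32(data):
    """Hypothetical helper: cast an array to 32-bit floats."""
    return data.astype(np.float32)


scaled = StandardScaler().fit_transform(X)
converted = FunctionTransformer(to_float32).fit_transform(scaled)
print(scaled.dtype, converted.dtype)   # 64-bit before the cast, 32-bit after
```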

Second, in a scikit-learn pipeline, each step is referred to by a name, such as scaler and sonarmodel. Therefore, the parameters set for the pipeline need to carry the name as well. In the example above, you use sonarmodel__module__n_layers as a parameter for grid search. This refers to the sonarmodel part of the pipeline (which is your skorch wrapper), the module part therein (which is your PyTorch model), and its n_layers parameter. Note the use of a double underscore for hierarchy separation.

Putting everything together, the following is the complete code:

Summary

In this post, you discovered how to wrap your PyTorch deep learning models and use them in the scikit-learn general machine learning library. You learned:

  • Specifically how to wrap PyTorch models so that they can be used with the scikit-learn machine learning library.
  • How to use a wrapped PyTorch model as part of evaluating model performance in scikit-learn.
  • How to perform hyperparameter tuning in scikit-learn using a wrapped PyTorch model.

You can see that using scikit-learn for standard machine learning operations such as model evaluation and model hyperparameter optimization can save a lot of time over implementing these schemes yourself. Wrapping your model allowed you to leverage powerful tools from scikit-learn to fit your deep learning models into your general machine learning process.

10 Responses to Use PyTorch Deep Learning Models with scikit-learn

  1. james February 14, 2023 at 1:50 pm

    Thanks for the tutorial.

    I faced the following error when executing the code.

    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    Cell In[15], line 7
    4 y = encoder.transform(y)
    6 # Convert to 2D PyTorch tensors
    ----> 7 X = torch.tensor(X.values, dtype=torch.float32)
    8 y = torch.tensor(y, dtype=torch.float32)

    TypeError: can’t convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

    • Adrian Tam February 15, 2023 at 4:29 am

      Check what was X before this line. Probably you read some non-number into it.

  2. Geoff Hardy March 26, 2023 at 2:04 am

    Hi James!
    Thanks for the tutorial… just a typo comment… FunctionTransformer should be imported from sklearn.preprocessing ???

    • James Carmichael March 26, 2023 at 10:29 am

      Hi Geoff…You are correct! Thank you for the feedback!

  3. zhao hongwei April 15, 2023 at 11:27 pm

    how to use the gridsearchcv and neuralnetclassifier optimize the weight_decay and lr_decay

  4. WuGang February 15, 2024 at 7:08 pm

    Why is the same hyperparameter trained by MLPClassifier so much better than by torch

    • James Carmichael February 16, 2024 at 10:32 am

      Hi WuGang…What are the differences in accuracy? Is this difference seen on average of multiple executions of training?

  5. shadow_ February 18, 2024 at 10:09 pm

    It seems that pipe does not need to call the initialization method of model, directly written (‘sonarmodel’, model) can be, because GridSearchCV has automatic initialization, plus initialize some cases will be prone to serialization problems

    • James Carmichael February 19, 2024 at 8:21 am

      Hi shadow_…Thank you for your feedback!
