Gaussian Processes for Classification With Python

The Gaussian Processes Classifier is a classification machine learning algorithm.

Gaussian Processes are a generalization of the Gaussian probability distribution and can be used as the basis for sophisticated non-parametric machine learning algorithms for classification and regression.

They are a type of kernel model, like SVMs. Unlike SVMs, they can predict well-calibrated class membership probabilities, although choosing and configuring the kernel at the heart of the method can be challenging.

In this tutorial, you will discover the Gaussian Processes Classifier machine learning algorithm.

After completing this tutorial, you will know:

  • The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.
  • How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.
  • How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.

Let’s get started.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Gaussian Processes for Classification
  2. Gaussian Processes With Scikit-Learn
  3. Tune Gaussian Processes Hyperparameters

Gaussian Processes for Classification

Gaussian Processes, or GP for short, are a generalization of the Gaussian probability distribution (e.g. the bell-shaped function).

Gaussian probability distribution functions summarize the distribution of random variables, whereas Gaussian processes describe a distribution over functions, i.e. which functions are plausible before and after seeing data. As such, you can think of Gaussian processes as one level of abstraction or indirection above Gaussian functions.

A Gaussian process is a generalization of the Gaussian probability distribution. Whereas a probability distribution describes random variables which are scalars or vectors (for multivariate distributions), a stochastic process governs the properties of functions.

— Page 2, Gaussian Processes for Machine Learning, 2006.
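
To make the idea of a distribution over functions concrete, the sketch below draws a few random functions from a Gaussian process prior using NumPy alone. It is only an illustration: the helper name rbf_kernel, the input grid, the length scale and the jitter value are assumptions chosen for this example and do not come from the book or from scikit-learn.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs (illustrative helper)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

rng = np.random.default_rng(1)

# evaluate the prior on a grid of 50 input points
x = np.linspace(-5.0, 5.0, 50)

# covariance matrix of the function values; a small jitter keeps it numerically positive definite
K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))

# each sample is one function drawn from the zero-mean prior, evaluated on the grid
L = np.linalg.cholesky(K)
samples = (L @ rng.standard_normal((len(x), 3))).T
print(samples.shape)
```

Each row of samples is one candidate function; nearby inputs receive similar values because the kernel assigns them a high covariance.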

Gaussian processes can be used as a machine learning algorithm for classification predictive modeling.

Gaussian processes are a type of kernel method, like SVMs, although they are able to predict highly calibrated probabilities, unlike SVMs.

Gaussian processes require specifying a kernel that controls how examples relate to each other; specifically, it defines the covariance function of the data. The kernel defines a prior over an unobserved function of the inputs, which is called the latent function or the “nuisance” function.

The latent function f plays the role of a nuisance function: we do not observe values of f itself (we observe only the inputs X and the class labels y) and we are not particularly interested in the values of f …

— Page 40, Gaussian Processes for Machine Learning, 2006.

The way that examples are grouped using the kernel controls how the model “perceives” the examples, given that it assumes that examples that are “close” to each other have the same class label.

Therefore, it is important to test both different kernel functions for the model and different configurations of the more sophisticated kernel functions.

… a covariance function is the crucial ingredient in a Gaussian process predictor, as it encodes our assumptions about the function which we wish to learn.

— Page 79, Gaussian Processes for Machine Learning, 2006.

Gaussian process classification also requires a link function that interprets the internal representation and predicts the probability of class membership. The logistic function can be used, allowing the modeling of a Binomial probability distribution for binary classification.

For the binary discriminative case one simple idea is to turn the output of a regression model into a class probability using a response function (the inverse of a link function), which “squashes” its argument, which can lie in the domain (−inf, inf), into the range [0, 1], guaranteeing a valid probabilistic interpretation.

— Page 35, Gaussian Processes for Machine Learning, 2006.
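
As a small illustration of the “squashing” described in the quote, the sketch below applies the logistic function to a handful of made-up latent values. The function name logistic and the input values are assumptions for this example only.

```python
import numpy as np

def logistic(f):
    """Map unbounded latent values to probabilities between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-f))

# hypothetical latent function values, from strongly negative to strongly positive
latent = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probs = logistic(latent)
print(probs)
```

Large negative latent values map to probabilities near 0, large positive values map to probabilities near 1, and a latent value of zero maps to exactly 0.5.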

Gaussian processes, and Gaussian processes for classification in particular, are complex topics.

To learn more, see the text cited above, Gaussian Processes for Machine Learning, 2006.

Gaussian Processes With Scikit-Learn

The Gaussian Processes Classifier is available in the scikit-learn Python machine learning library via the GaussianProcessClassifier class.

The class allows you to specify the kernel to use via the “kernel” argument and defaults to 1 * RBF(1.0), i.e. an RBF kernel.

Once a kernel is specified, the model will attempt to optimize the kernel's hyperparameters on the training dataset.

This is controlled via the “optimizer” argument and via “n_restarts_optimizer”, the number of times the optimization process is repeated in an attempt to overcome local optima. A separate argument, “max_iter_predict”, sets the maximum number of iterations of Newton's method used to approximate the posterior when making predictions.

By default, a single optimization run is performed, and this can be turned off by setting “optimizer” to None.
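
A minimal sketch of how these arguments might be passed is shown below. The values used for the first model are, to the best of my knowledge, the library defaults written out explicitly; treat them as an assumption and check the documentation for your installed version.

```python
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# kernel hyperparameters are tuned during fit() using the named optimizer
model = GaussianProcessClassifier(
    kernel=1.0 * RBF(1.0),
    optimizer="fmin_l_bfgs_b",  # assumed default optimizer name
    n_restarts_optimizer=0,     # no extra restarts, so a single optimization run
    max_iter_predict=100,       # cap on Newton iterations when approximating the posterior
)

# passing optimizer=None keeps the kernel hyperparameters fixed at the values given
fixed_model = GaussianProcessClassifier(kernel=1.0 * RBF(1.0), optimizer=None)

print(model.get_params()["n_restarts_optimizer"])
```

Increasing n_restarts_optimizer trades extra fitting time for a better chance of escaping a poor local optimum.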

We can demonstrate the Gaussian Processes Classifier with a worked example.

First, let’s define a synthetic classification dataset.

We will use the make_classification() function to create a dataset with 100 examples, each with 20 input variables.

The example below creates and summarizes the dataset.
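
The original listing is not reproduced here, so the following is a reconstruction under stated assumptions: only the 100 examples and 20 input variables come from the text, while the split into informative and redundant features and the random seed are illustrative choices.

```python
# create and summarize a synthetic binary classification dataset
from sklearn.datasets import make_classification

# n_informative, n_redundant and random_state are assumptions, not taken from the text
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)

# summarize the shape of the input and output arrays
print(X.shape, y.shape)
```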

Running the example creates the dataset and confirms the number of rows and columns of the dataset.

We can fit and evaluate a Gaussian Processes Classifier model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness.

We will use the default configuration.

The complete example of evaluating the Gaussian Processes Classifier model for the synthetic binary classification task is listed below.
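
A sketch of such an evaluation, assuming the same illustrative dataset settings as before (the informative/redundant split and the random seeds are assumptions), could look like this:

```python
# evaluate a Gaussian process classifier with repeated stratified k-fold cross-validation
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.gaussian_process import GaussianProcessClassifier

# define the dataset (illustrative settings)
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)

# define the model with its default configuration
model = GaussianProcessClassifier()

# 10 folds, three repeats
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# one accuracy score per fold per repeat
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)

# summarize the result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```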

Running the example evaluates the Gaussian Processes Classifier algorithm on the synthetic dataset and reports the average accuracy across the three repeats of 10-fold cross-validation.

Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

In this case, we can see that the model achieved a mean accuracy of about 79.0 percent.

We may decide to use the Gaussian Processes Classifier as our final model and make predictions on new data.

This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data.

We can demonstrate this with a complete example listed below.
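
The sketch below shows one way this could look. Because no genuinely new data is available here, the first row of the training data stands in for a new observation; that stand-in, like the dataset settings, is an assumption made for illustration.

```python
# fit a Gaussian process classifier on all data and predict one row
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier

# define the dataset (illustrative settings)
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)

# fit the model on all available data
model = GaussianProcessClassifier()
model.fit(X, y)

# stand-in for a new row: a 2-D array with one row and 20 columns
row = X[:1]

# predict the class label and the class membership probabilities
yhat = model.predict(row)
proba = model.predict_proba(row)
print('Predicted Class: %d' % yhat[0])
print('Class Probabilities:', proba[0])
```

Note that predict() expects a two-dimensional array, so a single row must be wrapped accordingly.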

Running the example fits the model and makes a class label prediction for a new row of data.

Next, we can look at configuring the model hyperparameters.

Tune Gaussian Processes Hyperparameters

The hyperparameters for the Gaussian Processes Classifier method must be configured for your specific dataset.

Perhaps the most important hyperparameter is the kernel controlled via the “kernel” argument. The scikit-learn library provides many built-in kernels that can be used.

Perhaps some of the more common examples include:

  • RBF
  • DotProduct
  • Matern
  • RationalQuadratic
  • WhiteKernel
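
As a quick illustration, each of these kernels can be constructed with default arguments and printed. Multiplying a kernel by a number, as in 1 * RBF(), combines it with a constant kernel, so the printed form spells out the constant and the kernel's hyperparameters rather than echoing the expression as typed.

```python
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern, RationalQuadratic, WhiteKernel

# each entry is a product of a constant kernel and the named kernel with default arguments
kernels = [1 * RBF(), 1 * DotProduct(), 1 * Matern(), 1 * RationalQuadratic(), 1 * WhiteKernel()]

# printing a kernel shows its structure and current hyperparameter values
for kernel in kernels:
    print(kernel)
```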

You can learn more about the kernels offered by the library in the scikit-learn documentation for the sklearn.gaussian_process.kernels module.

We will evaluate the performance of the Gaussian Processes Classifier with each of these common kernels, using default arguments.

The example below demonstrates this using the GridSearchCV class with a grid of values we have defined.
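
A sketch of such a grid search follows. The dataset settings and random seeds are the same illustrative assumptions used earlier, and the grid lists each kernel multiplied by a constant of 1.

```python
# grid search over kernels for a Gaussian process classifier
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern, RationalQuadratic, WhiteKernel

# define the dataset (illustrative settings)
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)

# define the model and the evaluation procedure
model = GaussianProcessClassifier()
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# define the grid: one candidate kernel per entry, each with default arguments
grid = dict()
grid['kernel'] = [1 * RBF(), 1 * DotProduct(), 1 * Matern(), 1 * RationalQuadratic(), 1 * WhiteKernel()]

# run the search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv)
results = search.fit(X, y)

# summarize the best configuration
print('Best Mean Accuracy: %.3f' % results.best_score_)
print('Best Config: %s' % results.best_params_)

# summarize every configuration that was evaluated
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean_score, param in zip(means, params):
    print('>%.3f with: %r' % (mean_score, param))
```

Some kernels may emit convergence warnings during fitting; the search still reports a score for each candidate.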

Running the example will evaluate each combination of configurations using repeated cross-validation.

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.

In this case, we can see that the RationalQuadratic kernel achieved a lift in performance with an accuracy of about 91.3 percent as compared to 79.0 percent achieved with the RBF kernel in the previous section.

Summary

In this tutorial, you discovered the Gaussian Processes Classifier machine learning algorithm.

Specifically, you learned:

  • The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.
  • How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.
  • How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

16 Responses to Gaussian Processes for Classification With Python

  1. Anthony The Koala, October 2, 2020 at 3:29 pm

    Dear Dr Jason,
    Could you elaborate please on the dictionary used for the grid search
    In the code above, the grid is defined as:

    What does 1*RBF(), 1*DotProduct() mean? Yes, I know that RBF and DotProduct are functions defined earlier in the code.
    Yet when I print the grid, I get this, which does not look like the definition.

    Before you had say for RBF:

    When you print the grid you get additional information such as 1**2*RBF with parameters set to length_scale = 1. Where did the extra information come from?

    In sum:
    When setting RBF in the grid, what is the meaning of

    When printing the grid, you get the extra information

    Thank you,
    Anthony of Sydney

  2. shashank, October 4, 2020 at 3:18 am

    hey thanks for this informative blog
    i really like this and I learned a lot

  3. DIPTENDU ROY, October 13, 2020 at 8:24 pm

    Dear Dr. Jason,

    Here you have shown a classification problem using gaussian process regression module of scikit learn. Could you please elaborate a regression project including code using same module sklearn of python.

  4. Amelie, December 11, 2020 at 9:22 am

    Hi Mr Jason,
    Can we use the Gaussian process for time series clustering?
    In the case that possible, can you explain how can we do it.
    thks

  5. Ron, January 18, 2021 at 11:36 pm

    Hi Jason,

    Check this article: https://medium.com/ai-in-plain-english/gaussian-processes-for-classification-cdd6e25a37e0
    No credit is given to you. It is exact copy of this blog.

    • Jason Brownlee, January 19, 2021 at 6:37 am

      That is very disappointing. And very common. Some people have no shame.

  6. SULAIMAN KHAN, February 4, 2021 at 5:56 pm

    how to create gaussian process for multi class problem?

  7. Ono Teas, August 26, 2021 at 8:34 pm

    Would we be able to utilize a Binomial perception probability for multi-class issue? Particularly when the quantity of classes is distinctive for various examples.
