The **Gaussian Processes Classifier** is a classification machine learning algorithm.

Gaussian Processes are a generalization of the Gaussian probability distribution and can be used as the basis for sophisticated non-parametric machine learning algorithms for classification and regression.

They are a type of kernel model, like SVMs. Unlike SVMs, they are capable of predicting well-calibrated class membership probabilities, although choosing and configuring the kernel at the heart of the method can be challenging.

In this tutorial, you will discover the Gaussian Processes Classifier machine learning algorithm.

After completing this tutorial, you will know:

- The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.
- How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.
- How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- Gaussian Processes for Classification
- Gaussian Processes With Scikit-Learn
- Tune Gaussian Processes Hyperparameters

## Gaussian Processes for Classification

Gaussian Processes, or GP for short, are a generalization of the Gaussian probability distribution (e.g. the bell-shaped function).

Gaussian probability distribution functions summarize the distribution of random variables, whereas Gaussian processes summarize the properties of the functions, e.g. the parameters of the functions. As such, you can think of Gaussian processes as one level of abstraction or indirection above Gaussian functions.

A Gaussian process is a generalization of the Gaussian probability distribution. Whereas a probability distribution describes random variables which are scalars or vectors (for multivariate distributions), a stochastic process governs the properties of functions.

— Page 2, Gaussian Processes for Machine Learning, 2006.
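To make the idea of a distribution over functions concrete, the following is a minimal sketch that draws three sample functions from a zero-mean Gaussian process prior using NumPy. The `rbf_kernel()` helper, the input grid, and the jitter value are assumptions chosen for illustration and are not part of the tutorial.

```python
# sketch: draw sample functions from a zero-mean gaussian process prior
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # squared exponential covariance between two sets of 1-D inputs
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

# inputs at which the sampled functions are evaluated
x = np.linspace(-5.0, 5.0, 50)
# covariance matrix, with a small jitter on the diagonal for numerical stability
K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))
# each draw from this multivariate Gaussian is one function evaluated at x
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
# one row per sampled function, one column per input location
print(samples.shape)
```

Each row of `samples` is a smooth curve over `x`; plotting the rows is an easy way to see how the kernel shapes the functions the model considers plausible.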

Gaussian processes can be used as a machine learning algorithm for classification predictive modeling.

Gaussian processes are a type of kernel method, like SVMs, although they are able to predict highly calibrated probabilities, unlike SVMs.

Gaussian processes require specifying a kernel that controls how examples relate to each other; specifically, it defines the covariance function of the data. The Gaussian process places a prior over an unobserved function of the inputs, which is called the latent function or the “*nuisance*” function.

The latent function f plays the role of a nuisance function: we do not observe values of f itself (we observe only the inputs X and the class labels y) and we are not particularly interested in the values of f …

— Page 40, Gaussian Processes for Machine Learning, 2006.

The way that examples are grouped using the kernel controls how the model “*perceives*” the examples, given that it assumes that examples that are “*close*” to each other have the same class label.

Therefore, it is important to test both different kernel functions for the model and different configurations of the more sophisticated kernel functions.

… a covariance function is the crucial ingredient in a Gaussian process predictor, as it encodes our assumptions about the function which we wish to learn.

— Page 79, Gaussian Processes for Machine Learning, 2006.
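As a rough illustration of how a kernel encodes closeness, the sketch below evaluates the scikit-learn `RBF` kernel on three made-up one-dimensional examples. The data points and the two length scales are assumptions chosen only to show the effect.

```python
# sketch: a kernel assigns high covariance to examples that are close together
import numpy as np
from sklearn.gaussian_process.kernels import RBF

# three one-dimensional examples: the first two are close, the third is far away
X = np.array([[0.0], [0.1], [3.0]])
# calling a kernel on a dataset returns the covariance matrix between all pairs
K = RBF(length_scale=1.0)(X)
print(K.round(3))
# a larger length scale treats distant examples as more closely related
K_wide = RBF(length_scale=10.0)(X)
print(K_wide.round(3))
```

The entry for the two nearby examples is close to 1, whereas the entries involving the distant example are close to 0 with the short length scale and much larger with the long one. This is why both the kernel and its configuration matter.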

A Gaussian process classifier also requires a link function that interprets the internal representation and predicts the probability of class membership. The logistic function can be used, allowing the modeling of a Binomial probability distribution for binary classification.

For the binary discriminative case one simple idea is to turn the output of a regression model into a class probability using a response function (the inverse of a link function), which “squashes” its argument, which can lie in the domain (−inf, inf), into the range [0, 1], guaranteeing a valid probabilistic interpretation.

— Page 35, Gaussian Processes for Machine Learning, 2006.
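The squashing behavior described in the quote can be sketched with the logistic function. The latent values below are hypothetical and are used only to show the mapping from the real line to a probability.

```python
# sketch: the logistic response function squashes latent values into probabilities
import numpy as np

def logistic(f):
    # maps any real value to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-f))

# hypothetical latent function values for five examples
f = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p = logistic(f)
print(p.round(3))
```

A latent value of zero maps to a probability of 0.5, large negative values approach 0, and large positive values approach 1.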

Gaussian processes, and Gaussian processes for classification in particular, are a complex topic.

To learn more, see the text Gaussian Processes for Machine Learning, 2006, listed under Further Reading.

## Gaussian Processes With Scikit-Learn

The Gaussian Processes Classifier is available in the scikit-learn Python machine learning library via the GaussianProcessClassifier class.

The class allows you to specify the kernel to use via the “*kernel*” argument and defaults to 1 * RBF(1.0), i.e. an RBF kernel scaled by a constant.

```python
...
# define model
model = GaussianProcessClassifier(kernel=1*RBF(1.0))
```

Given that a kernel is specified, the model will attempt to best configure the kernel for the training dataset.

This is controlled by setting the “*optimizer*” argument, and the number of repeats of this optimization process, performed in an attempt to overcome local optima, is set via “*n_restarts_optimizer*”. The separate “*max_iter_predict*” argument limits the number of Newton iterations used to approximate the posterior when making predictions.

By default, a single optimization run is performed, and this can be turned off by setting “*optimizer*” to *None*.

```python
...
# define model
model = GaussianProcessClassifier(optimizer=None)
```
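To see what the optimizer changes, the sketch below fits one model with the optimizer turned off and one with two additional restarts, then compares the fitted kernels through the `kernel_` and `log_marginal_likelihood_value_` attributes. It assumes the same synthetic dataset used in the worked example that follows; the choice of two restarts and the `random_state` value are arbitrary.

```python
# sketch: compare a fixed kernel against an optimized kernel
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# kernel hyperparameters stay fixed at their initial values
fixed = GaussianProcessClassifier(kernel=1*RBF(1.0), optimizer=None).fit(X, y)
# kernel hyperparameters are optimized, with two extra restarts from random starting points
tuned = GaussianProcessClassifier(kernel=1*RBF(1.0), n_restarts_optimizer=2, random_state=1).fit(X, y)
# kernel_ holds the kernel that is actually used for prediction
print('Fixed kernel: %s' % fixed.kernel_)
print('Tuned kernel: %s' % tuned.kernel_)
# a higher log marginal likelihood indicates a better fit to the training data
print('Fixed LML: %.3f' % fixed.log_marginal_likelihood_value_)
print('Tuned LML: %.3f' % tuned.log_marginal_likelihood_value_)
```

The tuned kernel should report a log marginal likelihood at least as high as the fixed kernel, since the first optimization run starts from the same initial values.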

We can demonstrate the Gaussian Processes Classifier with a worked example.

First, let’s define a synthetic classification dataset.

We will use the make_classification() function to create a dataset with 100 examples, each with 20 input variables.

The example below creates and summarizes the dataset.

```python
# test classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# summarize the dataset
print(X.shape, y.shape)
```

Running the example creates the dataset and confirms the number of rows and columns of the dataset.

```
(100, 20) (100,)
```

We can fit and evaluate a Gaussian Processes Classifier model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness.

We will use the default configuration.

```python
...
# create the model
model = GaussianProcessClassifier()
```

The complete example of evaluating the Gaussian Processes Classifier model for the synthetic binary classification task is listed below.

```python
# evaluate a gaussian process classifier model on the dataset
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.gaussian_process import GaussianProcessClassifier
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define model
model = GaussianProcessClassifier()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# summarize result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```

Running the example evaluates the Gaussian Processes Classifier algorithm on the synthetic dataset and reports the average accuracy across the three repeats of 10-fold cross-validation.

Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

In this case, we can see that the model achieved a mean accuracy of about 79.0 percent.

```
Mean Accuracy: 0.790 (0.101)
```

We may decide to use the Gaussian Processes Classifier as our final model and make predictions on new data.

This can be achieved by fitting the model on all available data and calling the *predict()* function, passing in a new row of data.

We can demonstrate this with a complete example listed below.

```python
# make a prediction with a gaussian process classifier model on the dataset
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define model
model = GaussianProcessClassifier()
# fit model
model.fit(X, y)
# define new data
row = [2.47475454,0.40165523,1.68081787,2.88940715,0.91704519,-3.07950644,4.39961206,0.72464273,-4.86563631,-6.06338084,-1.22209949,-0.4699618,1.01222748,-0.6899355,-0.53000581,6.86966784,-3.27211075,-6.59044146,-2.21290585,-3.139579]
# make a prediction (predict() returns an array with one label per row)
yhat = model.predict([row])
# summarize prediction
print('Predicted Class: %d' % yhat[0])
```

Running the example fits the model and makes a class label prediction for a new row of data.

```
Predicted Class: 0
```
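Because the model can also report class membership probabilities, a probability prediction may be more useful than a hard label. The sketch below uses the *predict_proba()* function; reusing the first training row as a stand-in for new data is an assumption made only to keep the example short.

```python
# sketch: predict class membership probabilities with a gaussian process classifier
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define and fit model
model = GaussianProcessClassifier()
model.fit(X, y)
# reuse the first training row as a stand-in for new data
row = X[0]
# one row of output per input row, one column per class
proba = model.predict_proba([row])
# columns follow the order of the labels in classes_
print('Class order: %s' % model.classes_)
print('Predicted probabilities: %s' % proba[0])
```

The two probabilities sum to one, and the column order matches the class labels stored in `model.classes_`.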

Next, we can look at configuring the model hyperparameters.

## Tune Gaussian Processes Hyperparameters

The hyperparameters for the Gaussian Processes Classifier method must be configured for your specific dataset.

Perhaps the most important hyperparameter is the kernel controlled via the “*kernel*” argument. The scikit-learn library provides many built-in kernels that can be used.

Perhaps some of the more common examples include:

- RBF
- DotProduct
- Matern
- RationalQuadratic
- WhiteKernel

You can learn more about the kernels offered by the library in the Gaussian Process Kernels API documentation, listed under Further Reading.

We will evaluate the performance of the Gaussian Processes Classifier with each of these common kernels, using default arguments.

```python
...
# define grid
grid = dict()
grid['kernel'] = [1*RBF(), 1*DotProduct(), 1*Matern(), 1*RationalQuadratic(), 1*WhiteKernel()]
```

The example below demonstrates this using the GridSearchCV class with a grid of values we have defined.

```python
# grid search kernel for gaussian process classifier
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process.kernels import DotProduct
from sklearn.gaussian_process.kernels import Matern
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.gaussian_process.kernels import WhiteKernel
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define model
model = GaussianProcessClassifier()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['kernel'] = [1*RBF(), 1*DotProduct(), 1*Matern(), 1*RationalQuadratic(), 1*WhiteKernel()]
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize best
print('Best Mean Accuracy: %.3f' % results.best_score_)
print('Best Config: %s' % results.best_params_)
# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print(">%.3f with: %r" % (mean, param))
```

Running the example will evaluate each combination of configurations using repeated cross-validation.

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.

In this case, we can see that the *RationalQuadratic* kernel achieved a lift in performance with an accuracy of about 91.3 percent as compared to 79.0 percent achieved with the RBF kernel in the previous section.

```
Best Mean Accuracy: 0.913
Best Config: {'kernel': 1**2 * RationalQuadratic(alpha=1, length_scale=1)}
>0.790 with: {'kernel': 1**2 * RBF(length_scale=1)}
>0.800 with: {'kernel': 1**2 * DotProduct(sigma_0=1)}
>0.830 with: {'kernel': 1**2 * Matern(length_scale=1, nu=1.5)}
>0.913 with: {'kernel': 1**2 * RationalQuadratic(alpha=1, length_scale=1)}
>0.510 with: {'kernel': 1**2 * WhiteKernel(noise_level=1)}
```
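The printed kernels look different from the way they were written in the grid, which can be confusing. A minimal sketch of what appears to be happening: multiplying a kernel by a number wraps the number in a `ConstantKernel` and combines the two into a `Product` kernel, and the constant is displayed as a squared value along with the default arguments of the second kernel.

```python
# sketch: inspect what the expression 1*RBF() creates
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, Product

# multiplying a kernel by a number yields a product of two kernels
kernel = 1 * RBF()
print(type(kernel).__name__)
# the printed form shows the constant as a squared value and the default length scale
print(kernel)
# the two factors are available as k1 (the constant) and k2 (the RBF kernel)
print(kernel.k1, '|', kernel.k2)
# the same kernel written out explicitly
explicit = ConstantKernel(1.0) * RBF(length_scale=1.0)
print(explicit)
```

Both the constant and the length scale are hyperparameters that the optimizer adjusts during fitting, so the value of 1 is a starting point and not a fixed setting.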

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Books

- Gaussian Processes for Machine Learning, 2006.
- Gaussian Processes for Machine Learning, Homepage.
- Machine Learning: A Probabilistic Perspective, 2012.
- Pattern Recognition and Machine Learning, 2006.

### APIs

- sklearn.gaussian_process.GaussianProcessClassifier API.
- sklearn.gaussian_process.GaussianProcessRegressor API.
- Gaussian Processes, Scikit-Learn User Guide.
- Gaussian Process Kernels API.

## Summary

In this tutorial, you discovered the Gaussian Processes Classifier machine learning algorithm.

Specifically, you learned:

- The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.
- How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.
- How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

Dear Dr Jason,

Could you elaborate please on the dictionary used for the grid search

In the code above, the grid is defined as:

what does 1*RBF(), 1*DotProduct() mean. Yes I know that RBF and DotProduct are functions defined earlier in the code.

Yet when I print the grid, I get this, which does not look like the definition.

Before you had say for RBF:

When you print the grid you get additional information, such as 1**2*RBF with the parameter length_scale = 1. Where did the extra information come from?

In sum:

When setting RBF in the grid, what is the meaning of

When printing the grid, you get the extra information

Thank you,

Anthony of Sydney

Good question, you can learn more about the kernels used within GP here:

https://scikit-learn.org/stable/modules/gaussian_process.html#kernels-for-gaussian-processes

hey thanks for this informative blog

i really like this and I learned a lot

You’re welcome!

Dear Dr. Jason,

Here you have shown a classification problem using gaussian process regression module of scikit learn. Could you please elaborate a regression project including code using same module sklearn of python.

Thanks for the suggestion!

Hi Mr Jason,

Can we use the Gaussian process for time series clustering?

If it is possible, can you explain how we can do it?

thks

Perhaps, I don’t have an example sorry.

Hi Jason,

Check this article: https://medium.com/ai-in-plain-english/gaussian-processes-for-classification-cdd6e25a37e0

No credit is given to you. It is exact copy of this blog.

That is very disappointing. And very common. Some people have no shame.

how to create gaussian process for multi class problem?

Perhaps you can try an OVR or OVO approach:

https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/
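As a rough sketch of the one-vs-rest idea, the scikit-learn `GaussianProcessClassifier` class has a “*multi_class*” argument that accepts 'one_vs_rest' or 'one_vs_one'. The three-class dataset below is an assumption made for illustration and is not used elsewhere in the tutorial.

```python
# sketch: gaussian process classifier on a three-class problem using one-vs-rest
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
# define a synthetic dataset with three classes
X, y = make_classification(n_samples=150, n_features=20, n_informative=15, n_redundant=5, n_classes=3, random_state=1)
# fit one binary gaussian process classifier per class
model = GaussianProcessClassifier(multi_class='one_vs_rest')
model.fit(X, y)
# predict probabilities for the first row, one column per class
proba = model.predict_proba(X[:1])
print(model.classes_)
print(proba)
```

Note that probability predictions are available with 'one_vs_rest' but not with 'one_vs_one', which only supports class label predictions.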

Can we use a Binomial observation likelihood for multi-class problem? Especially when the number of classes is different for different samples.

Not really, use multinomial distribution for more than two classes.