Gaussian Processes for Classification With Python

By Jason Brownlee on August 3, 2020 in Python Machine Learning

The Gaussian Processes Classifier is a classification machine learning algorithm.

Gaussian Processes are a generalization of the Gaussian probability distribution and can be used as the basis for sophisticated non-parametric machine learning algorithms for classification and regression.

They are a type of kernel model, like SVMs, but unlike SVMs they can predict well-calibrated class membership probabilities, although choosing and configuring the kernel at the heart of the method can be challenging.

In this tutorial, you will discover the Gaussian Processes Classifier machine learning algorithm.

After completing this tutorial, you will know:

The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.
How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.
How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.

Let’s get started.

Gaussian Processes for Classification With Python
Photo by Mark Kao, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

Gaussian Processes for Classification
Gaussian Processes With Scikit-Learn
Tune Gaussian Processes Hyperparameters

Gaussian Processes for Classification

Gaussian Processes, or GP for short, are a generalization of the Gaussian probability distribution (i.e. the bell-shaped curve).

Gaussian probability distributions describe random variables, whereas Gaussian processes describe distributions over functions themselves: any finite set of values of the function is jointly Gaussian. As such, you can think of Gaussian processes as one level of abstraction or indirection above the Gaussian distribution.

A Gaussian process is a generalization of the Gaussian probability distribution. Whereas a probability distribution describes random variables which are scalars or vectors (for multivariate distributions), a stochastic process governs the properties of functions.

— Page 2, Gaussian Processes for Machine Learning, 2006.
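The idea that a process governs functions, rather than scalars or vectors, can be made concrete with a short sketch. The code below (an illustration, not part of the original tutorial) builds a covariance matrix from scikit-learn's RBF kernel over a grid of one-dimensional inputs and draws three random functions from a zero-mean Gaussian process prior. The grid, length scale, seed, and small jitter term are arbitrary choices.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

# a grid of 50 one-dimensional input points
X = np.linspace(-5, 5, 50).reshape(-1, 1)

# the kernel turns the inputs into a 50 x 50 covariance matrix
kernel = RBF(length_scale=1.0)
K = kernel(X)

# each draw from a zero-mean multivariate Gaussian with covariance K is one
# whole function evaluated at the 50 grid points; the tiny jitter on the
# diagonal keeps K numerically positive definite
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(np.zeros(len(X)), K + 1e-8 * np.eye(len(X)), size=3)

print(K.shape)        # (50, 50)
print(samples.shape)  # (3, 50): three sampled functions
```

Each row of samples is one smooth random function; a shorter length_scale would make the sampled functions vary more quickly.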

Gaussian processes can be used as a machine learning algorithm for classification predictive modeling.

Like SVMs, Gaussian processes are a type of kernel method, but unlike SVMs they can predict well-calibrated probabilities.

Gaussian processes require specifying a kernel that controls how examples relate to each other; specifically, it defines the covariance function of a latent function over the data. In classification, the values of this latent function are never observed directly, which is why it is called a “nuisance” function.

The latent function f plays the role of a nuisance function: we do not observe values of f itself (we observe only the inputs X and the class labels y) and we are not particularly interested in the values of f …

— Page 40, Gaussian Processes for Machine Learning, 2006.

The kernel controls how the model “perceives” the examples, because the model assumes that examples that are “close” to each other under the kernel tend to have the same class label.

Therefore, it is important to test both different kernel functions and different configurations of the more sophisticated kernel functions.

… a covariance function is the crucial ingredient in a Gaussian process predictor, as it encodes our assumptions about the function which we wish to learn.

— Page 79, Gaussian Processes for Machine Learning, 2006.
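To illustrate how a kernel encodes “closeness”, the sketch below evaluates scikit-learn's RBF kernel, k(x, x') = exp(-||x - x'||^2 / (2 * length_scale^2)), on a pair of nearby points and on a pair of distant points. The points and length scales are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

kernel = RBF(length_scale=1.0)

# three 2D examples: a and b are close together, c is far from a
a = np.array([[0.0, 0.0]])
b = np.array([[0.5, 0.0]])
c = np.array([[3.0, 0.0]])

# calling a kernel on two arrays returns their covariance matrix
k_ab = kernel(a, b)[0, 0]
k_ac = kernel(a, c)[0, 0]
print(round(k_ab, 4))  # 0.8825: nearby points are strongly correlated
print(round(k_ac, 4))  # 0.0111: distant points are nearly independent

# a shorter length scale makes the covariance fall off faster with distance
print(round(RBF(length_scale=0.25)(a, b)[0, 0], 4))  # 0.1353
```

Changing the kernel, or its hyperparameters such as length_scale, changes which examples the model treats as “close”, which is why kernel choice matters so much.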

It also requires a link function that maps the latent function value to a probability of class membership (strictly, the squashing function is the inverse of the link function, called the response function). The logistic function can be used, allowing the modeling of a Bernoulli (single-trial binomial) probability distribution for binary classification.

For the binary discriminative case one simple idea is to turn the output of a regression model into a class probability using a response function (the inverse of a link function), which “squashes” its argument, which can lie in the domain (−inf, inf), into the range [0, 1], guaranteeing a valid probabilistic interpretation.

— Page 35, Gaussian Processes for Machine Learning, 2006.
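The squashing step in the quote can be sketched in a few lines. Below, a hypothetical logistic() helper maps made-up latent values f to probabilities of the positive class. The scikit-learn documentation states that its GaussianProcessClassifier uses the logistic link, but this helper is only an illustration, not the library's internal code.

```python
import numpy as np

def logistic(f):
    """Response function: squash a latent value f in (-inf, inf) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-f))

# hypothetical latent function values for four examples
f = np.array([-4.0, -1.0, 0.0, 2.5])

# probability that each example belongs to class 1
p = logistic(f)

for fi, pi in zip(f, p):
    print('f=%5.1f  ->  p(y=1)=%.3f' % (fi, pi))
```

A latent value of 0 maps to a probability of 0.5, and larger latent values map to probabilities closer to 1.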

Gaussian processes, and Gaussian processes for classification in particular, are a complex topic.

To learn more, see the text:

Gaussian Processes for Machine Learning, 2006.

Gaussian Processes With Scikit-Learn

The Gaussian Processes Classifier is available in the scikit-learn Python machine learning library via the GaussianProcessClassifier class.

The class allows you to specify the kernel to use via the “kernel” argument, which defaults to 1 * RBF(1.0), i.e. an RBF kernel scaled by a constant of 1.

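from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
# define model with an RBF kernel scaled by a constant of 1
model = GaussianProcessClassifier(kernel=1*RBF(1.0))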


Given a kernel, the model will attempt to tune the kernel's hyperparameters to best fit the training dataset by maximizing the log-marginal likelihood.

This is controlled via the “optimizer” argument, and the number of times the optimization is restarted from different initial values, in an attempt to overcome local optima, is set via “n_restarts_optimizer”. A separate argument, “max_iter_predict”, sets the maximum number of iterations of Newton's method used when approximating the posterior; it does not set the number of optimizer iterations.

By default, a single optimization run is performed, and this can be turned off by setting “optimizer” to None.

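from sklearn.gaussian_process import GaussianProcessClassifier
# define model without kernel hyperparameter optimization
model = GaussianProcessClassifier(optimizer=None)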

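To see what the optimizer changes, the sketch below (an illustration, not part of the original tutorial) fits the same 1*RBF(1.0) kernel twice on the synthetic dataset used in this tutorial: once with optimizer=None and once with the default optimizer plus two restarts. The kernel used after fitting is available via the kernel_ attribute and its log-marginal likelihood via log_marginal_likelihood_value_; the restart count and random_state are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# synthetic binary classification dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)

# kernel hyperparameters held at their initial values
fixed = GaussianProcessClassifier(kernel=1*RBF(1.0), optimizer=None).fit(X, y)

# kernel hyperparameters tuned by maximizing the log-marginal likelihood,
# with two extra restarts from randomly sampled initial values
tuned = GaussianProcessClassifier(kernel=1*RBF(1.0), n_restarts_optimizer=2, random_state=1).fit(X, y)

# kernel_ holds the kernel actually used after fitting
print('Fixed: %s (LML %.3f)' % (fixed.kernel_, fixed.log_marginal_likelihood_value_))
print('Tuned: %s (LML %.3f)' % (tuned.kernel_, tuned.log_marginal_likelihood_value_))
```

The fixed model keeps the constant and length scale at 1, while the tuned model reports whatever values the optimizer found; more restarts cost more fitting time.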

We can demonstrate the Gaussian Processes Classifier with a worked example.

First, let’s define a synthetic classification dataset.

We will use the make_classification() function to create a dataset with 100 examples, each with 20 input variables.

The example below creates and summarizes the dataset.

# test classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# summarize the dataset
print(X.shape, y.shape)


Running the example creates the dataset and confirms the number of rows and columns of the dataset.

(100, 20) (100,)


We can fit and evaluate a Gaussian Processes Classifier model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness.

We will use the default configuration.

from sklearn.gaussian_process import GaussianProcessClassifier
# create the model with the default configuration
model = GaussianProcessClassifier()

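As a quick check of the test harness, the sketch below (not part of the original tutorial) shows that 10 folds and three repeats yield 30 train/test splits, and therefore 30 accuracy scores, and prints the class counts of the full dataset and of the first test fold to show the effect of stratification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold

# same synthetic dataset and evaluation procedure as in the tutorial
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# 10 folds x 3 repeats = 30 train/test splits
print('Splits: %d' % cv.get_n_splits())

# stratification keeps each test fold's class balance close to the full dataset's
train_ix, test_ix = next(cv.split(X, y))
print('Train: %d, Test: %d' % (len(train_ix), len(test_ix)))
print('Class counts (all data):', np.bincount(y))
print('Class counts (test fold):', np.bincount(y[test_ix]))
```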

The complete example of evaluating the Gaussian Processes Classifier model for the synthetic binary classification task is listed below.

# evaluate a gaussian process classifier model on the dataset
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.gaussian_process import GaussianProcessClassifier
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define model
model = GaussianProcessClassifier()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# summarize result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))


Running the example evaluates the Gaussian Processes Classifier algorithm on the synthetic dataset and reports the average accuracy across the three repeats of 10-fold cross-validation.

Your specific results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the model achieved a mean accuracy of about 79.0 percent.

Mean Accuracy: 0.790 (0.101)


We may decide to use the Gaussian Processes Classifier as our final model and make predictions on new data.

This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data.

We can demonstrate this with a complete example listed below.

# make a prediction with a gaussian process classifier model on the dataset
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define model
model = GaussianProcessClassifier()
# fit model
model.fit(X, y)
# define new data
row = [2.47475454,0.40165523,1.68081787,2.88940715,0.91704519,-3.07950644,4.39961206,0.72464273,-4.86563631,-6.06338084,-1.22209949,-0.4699618,1.01222748,-0.6899355,-0.53000581,6.86966784,-3.27211075,-6.59044146,-2.21290585,-3.139579]
# make a prediction
yhat = model.predict([row])
# summarize prediction (predict() returns an array with one label per row)
print('Predicted Class: %d' % yhat[0])


Running the example fits the model and makes a class label prediction for a new row of data.

Predicted Class: 0

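Because the introduction highlights that Gaussian processes can predict class membership probabilities, here is a minimal sketch (not part of the original tutorial) of retrieving them with scikit-learn's predict_proba(). It reuses the synthetic dataset and default model from above; predicting on the first three training rows is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier

# same dataset and default model as in the example above
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
model = GaussianProcessClassifier()
model.fit(X, y)

# one row per example, one column per class (ordered as in model.classes_)
probs = model.predict_proba(X[:3])
print('Classes:', model.classes_)
print(np.round(probs, 3))
```

Each row sums to 1, so the second column can be read directly as the predicted probability of class 1.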

Next, we can look at configuring the model hyperparameters.

Tune Gaussian Processes Hyperparameters

The hyperparameters for the Gaussian Processes Classifier method must be configured for your specific dataset.

Perhaps the most important hyperparameter is the kernel, controlled via the “kernel” argument. The scikit-learn library provides many built-in kernels that can be used.

Perhaps some of the more common examples include:

RBF
DotProduct
Matern
RationalQuadratic
WhiteKernel

You can learn more about the kernels offered by the library here:

Kernels for Gaussian Processes, Scikit-Learn User Guide.

We will evaluate the performance of the Gaussian Processes Classifier with each of these common kernels, using default arguments.

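from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern
from sklearn.gaussian_process.kernels import RationalQuadratic, WhiteKernel
# define grid of kernels, each scaled by a constant of 1
grid = dict()
grid['kernel'] = [1*RBF(), 1*DotProduct(), 1*Matern(), 1*RationalQuadratic(), 1*WhiteKernel()]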

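Each kernel in the grid is written as 1*Kernel(). In scikit-learn, multiplying a kernel by a number creates a Product of a ConstantKernel and that kernel; the constant scales the overall covariance and is itself a hyperparameter the optimizer can tune. This also explains the 1**2 * ... form seen when the kernels are printed: ConstantKernel displays the square root of its value, squared. The sketch below checks this.

```python
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, Product

# multiplying a kernel by a number wraps the number in a ConstantKernel
kernel = 1 * RBF()
print(type(kernel).__name__)     # Product
print(type(kernel.k1).__name__)  # ConstantKernel
print(kernel)                    # 1**2 * RBF(length_scale=1)

# the same kernel written out explicitly
explicit = ConstantKernel(constant_value=1.0) * RBF(length_scale=1.0)
print(explicit)                  # 1**2 * RBF(length_scale=1)
```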

The example below demonstrates this using the GridSearchCV class with a grid of values we have defined.

# grid search kernel for gaussian process classifier
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process.kernels import DotProduct
from sklearn.gaussian_process.kernels import Matern
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.gaussian_process.kernels import WhiteKernel
# define dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define model
model = GaussianProcessClassifier()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['kernel'] = [1*RBF(), 1*DotProduct(), 1*Matern(),  1*RationalQuadratic(), 1*WhiteKernel()]
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize best
print('Best Mean Accuracy: %.3f' % results.best_score_)
print('Best Config: %s' % results.best_params_)
# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print(">%.3f with: %r" % (mean, param))


Running the example will evaluate each combination of configurations using repeated cross-validation.

Your specific results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Try running the example a few times.

In this case, we can see that the RationalQuadratic kernel achieved a lift in performance with an accuracy of about 91.3 percent as compared to 79.0 percent achieved with the RBF kernel in the previous section.

Best Mean Accuracy: 0.913
Best Config: {'kernel': 1**2 * RationalQuadratic(alpha=1, length_scale=1)}
>0.790 with: {'kernel': 1**2 * RBF(length_scale=1)}
>0.800 with: {'kernel': 1**2 * DotProduct(sigma_0=1)}
>0.830 with: {'kernel': 1**2 * Matern(length_scale=1, nu=1.5)}
>0.913 with: {'kernel': 1**2 * RationalQuadratic(alpha=1, length_scale=1)}
>0.510 with: {'kernel': 1**2 * WhiteKernel(noise_level=1)}

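Note that best_params_ reports the kernel as it was specified in the grid, i.e. its starting hyperparameters; within each fold the optimizer tunes those values. The sketch below (an illustration, not part of the original tutorial) refits the best-performing kernel on all of the data and compares the initial kernel with the fitted kernel_.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RationalQuadratic

# same synthetic dataset as in the grid search
X, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)

# refit the best-performing kernel from the grid search on all data
model = GaussianProcessClassifier(kernel=1*RationalQuadratic())
model.fit(X, y)

# kernel is the starting point; kernel_ holds the hyperparameters after optimization
print('Initial: %s' % model.kernel)
print('Fitted:  %s' % model.kernel_)
```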

Summary

In this tutorial, you discovered the Gaussian Processes Classifier machine learning algorithm.

Specifically, you learned:

The Gaussian Processes Classifier is a non-parametric algorithm that can be applied to binary classification tasks.
How to fit, evaluate, and make predictions with the Gaussian Processes Classifier model with Scikit-Learn.
How to tune the hyperparameters of the Gaussian Processes Classifier algorithm on a given dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

16 Responses to Gaussian Processes for Classification With Python

Anthony The Koala October 2, 2020 at 3:29 pm #

Dear Dr Jason,
Could you elaborate please on the dictionary used for the grid search
In the code above, the grid is defined as:

grid = dict()
grid['kernel'] = [1*RBF(), 1*DotProduct(), 1*Matern(),  1*RationalQuadratic(), 1*WhiteKernel()]


what does 1*RBF(), 1*DotProduct() mean. Yes I know that RBF and DotProduct are functions defined earlier in the code.
Yet when I print the grid, I get this, which does not look like the definition.

grid
{'kernel': [1**2 * RBF(length_scale=1), 1**2 * DotProduct(sigma_0=1), 1**2 * Matern(length_scale=1, nu=1.5), 1**2 * RationalQuadratic(alpha=1, length_scale=1), 1**2 * WhiteKernel(noise_level=1)]}


Before you had say for RBF:

1*RBF()


When you print the grid, you get additional information such as 1**2*RBF with parameters set to length_scale = 1. Where did the extra information come from?

1**2 * RBF(length_scale=1)


In sum:
When setting RBF in the grid, what is the meaning of

1*RBF()


When printing the grid, you get the extra information

1**2 * RBF(length_scale=1)


Thank you,
Anthony of Sydney

Jason Brownlee October 3, 2020 at 6:04 am #

Good question, you can learn more about the kernels used within GP here:
https://scikit-learn.org/stable/modules/gaussian_process.html#kernels-for-gaussian-processes


shashank October 4, 2020 at 3:18 am #

hey thanks for this informative blog
i really like this and I learned a lot

- Jason Brownlee October 4, 2020 at 6:53 am #
  
  You’re welcome!
  
DIPTENDU ROY October 13, 2020 at 8:24 pm #

Dear Dr. Jason,

Here you have shown a classification problem using gaussian process regression module of scikit learn. Could you please elaborate a regression project including code using same module sklearn of python.

- Jason Brownlee October 14, 2020 at 6:16 am #
  
  Thanks for the suggestion!
  
Amelie December 11, 2020 at 9:22 am #

Hi Mr Jason,
Can we use the Gaussian process for time series clustering?
In the case that possible, can you explain how can we do it.
thks

- Jason Brownlee December 11, 2020 at 1:30 pm #
  
  Perhaps, I don’t have an example sorry.
  
Ron January 18, 2021 at 11:36 pm #

Hi Jason,

Check this article: https://medium.com/ai-in-plain-english/gaussian-processes-for-classification-cdd6e25a37e0
No credit is given to you. It is exact copy of this blog.

- Jason Brownlee January 19, 2021 at 6:37 am #
  
  That is very disappointing. And very common. Some people have no shame.
  
SULAIMAN KHAN February 4, 2021 at 5:56 pm #

how to create gaussian process for multi class problem?

- Jason Brownlee February 5, 2021 at 5:36 am #
  
  Perhaps you can try a OVR or OVO approach:
  https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/
  
  - Isha July 26, 2021 at 2:33 am #
    
    Can we use a Binomial observation likelihood for multi-class problem? Especially when the number of classes is different for different samples.
    
    - Jason Brownlee July 26, 2021 at 5:31 am #
      
      Not really, use multinomial distribution for more than two classes.
      
Ono Teas August 26, 2021 at 8:34 pm #

Would we be able to utilize a Binomial perception probability for multi-class issue? Particularly when the quantity of classes is distinctive for various examples.

- Adrian Tam August 27, 2021 at 6:08 am #
  
  If you consider OvR (one-vs-rest) classification, you can use binomial probability. See this post for the concept: https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/
  