LOOCV for Evaluating Machine Learning Algorithms

By Jason Brownlee on August 26, 2020 in Python Machine Learning

The Leave-One-Out Cross-Validation, or LOOCV, procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model.

It is a computationally expensive procedure to perform, although it results in a reliable, low-bias estimate of model performance. Although it is simple to use and has no configuration to specify, there are times when the procedure should not be used, such as when you have a very large dataset or a computationally expensive model to evaluate.

In this tutorial, you will discover how to evaluate machine learning models using leave-one-out cross-validation.

After completing this tutorial, you will know:

The leave-one-out cross-validation procedure is appropriate when you have a small dataset or when an accurate estimate of model performance is more important than the computational cost of the method.
How to use the scikit-learn machine learning library to perform the leave-one-out cross-validation procedure.
How to evaluate machine learning algorithms for classification and regression using leave-one-out cross-validation.

Let’s get started.

Tutorial Overview

This tutorial is divided into three parts; they are:

LOOCV Model Evaluation
LOOCV Procedure in Scikit-Learn
LOOCV to Evaluate Machine Learning Models
1. LOOCV for Classification
2. LOOCV for Regression

LOOCV Model Evaluation

Cross-validation, or k-fold cross-validation, is a procedure used to estimate the performance of a machine learning algorithm when making predictions on data not used during the training of the model.

The procedure has a single hyperparameter “k” that controls the number of subsets that a dataset is split into. Once split, each subset is given the opportunity to be used as a test set while all other subsets together are used as a training dataset.

This means that k-fold cross-validation involves fitting and evaluating k models. This, in turn, provides k estimates of a model’s performance on the dataset, which can be reported using summary statistics such as the mean and standard deviation. This score can then be used to compare and ultimately select a model and configuration to use as the “final model” for a dataset.

Typical values for k are k=3, k=5, and k=10, with 10 representing the most common value. This is because, given extensive testing, 10-fold cross-validation provides a good balance of low computational cost and low bias in the estimate of model performance as compared to other k values and a single train-test split.
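As a rough illustration of the procedure described above, the sketch below runs 10-fold cross-validation and reports the mean and standard deviation of the k scores. The synthetic dataset from make_classification() and the LogisticRegression model are assumptions chosen only to keep the example small and fast; they are not part of this tutorial's worked examples.

```python
# minimal sketch of k-fold cross-validation (k=10) on an assumed synthetic dataset
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
# create an illustrative dataset (an assumption, not one of the tutorial datasets)
X, y = make_classification(n_samples=200, n_features=10, random_state=1)
# split the dataset into k=10 folds
cv = KFold(n_splits=10, shuffle=True, random_state=1)
# fit and evaluate one model per fold
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
# k scores are returned, one per fold
print('Folds: %d' % len(scores))
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```

Changing n_splits changes how many models are fit, which is the trade-off between computational cost and bias discussed above.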

For more on k-fold cross-validation, see the tutorial:

A Gentle Introduction to k-fold Cross-Validation

Leave-one-out cross-validation, or LOOCV, is a configuration of k-fold cross-validation where k is set to the number of examples in the dataset.
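One way to check this relationship is to compare the splits produced by scikit-learn's LeaveOneOut class with those produced by an unshuffled KFold where n_splits is set to the number of rows. The tiny six-row array below is an assumption used only for illustration.

```python
# sketch: LOOCV produces the same splits as k-fold with k equal to the number of rows
from numpy import arange
from numpy import array_equal
from sklearn.model_selection import KFold
from sklearn.model_selection import LeaveOneOut
# tiny illustrative dataset with 6 rows and 2 columns
X = arange(12).reshape(6, 2)
n = X.shape[0]
# enumerate the splits for both procedures
loo_splits = list(LeaveOneOut().split(X))
kfold_splits = list(KFold(n_splits=n).split(X))
# one split per row in both cases
print(len(loo_splits), len(kfold_splits))
# compare the train and test indices of each pair of splits
same = all(
	array_equal(a_train, b_train) and array_equal(a_test, b_test)
	for (a_train, a_test), (b_train, b_test) in zip(loo_splits, kfold_splits)
)
print(same)
```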

LOOCV is an extreme version of k-fold cross-validation that has the maximum computational cost. It requires one model to be created and evaluated for each example in the training dataset.

The benefit of fitting and evaluating so many models is a more robust estimate of model performance, as each row of data is given an opportunity to represent the entirety of the test dataset.

Given the computational cost, LOOCV is not appropriate for very large datasets, such as those with more than tens or hundreds of thousands of examples, or for models that are costly to fit, such as neural networks.

Don’t Use LOOCV: Large datasets or costly models to fit.

Given the improved estimate of model performance, LOOCV is appropriate when an accurate estimate of model performance is critical. This is particularly the case when the dataset is small, such as fewer than thousands of examples, which can lead to model overfitting during training and biased estimates of model performance.

Further, given that no sampling of the training dataset is used, this estimation procedure is deterministic, unlike train-test splits and other k-fold cross-validation configurations that provide a stochastic estimate of model performance.
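The sketch below illustrates the point about determinism for the splits themselves: two separate LOOCV runs enumerate identical folds, whereas shuffled k-fold splits depend on the random seed. The helper name fold_indices() and the ten-row array are hypothetical, introduced only for this example. Note that a stochastic model, such as a random forest without a fixed seed, can still give varying scores even when the splits are fixed.

```python
# sketch: LOOCV splits are fixed, whereas shuffled k-fold splits depend on the seed
from numpy import arange
from sklearn.model_selection import KFold
from sklearn.model_selection import LeaveOneOut

# hypothetical helper: collect the test indices of each fold as a list of lists
def fold_indices(cv, X):
	return [test_ix.tolist() for _, test_ix in cv.split(X)]

# tiny illustrative dataset with 10 rows
X = arange(20).reshape(10, 2)
# LOOCV: two separate runs give identical folds
loo_a = fold_indices(LeaveOneOut(), X)
loo_b = fold_indices(LeaveOneOut(), X)
print(loo_a == loo_b)
# shuffled 5-fold: different seeds generally give different folds
kf_a = fold_indices(KFold(n_splits=5, shuffle=True, random_state=1), X)
kf_b = fold_indices(KFold(n_splits=5, shuffle=True, random_state=2), X)
print(kf_a == kf_b)
```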

Use LOOCV: Small datasets or when estimated model performance is critical.

Once models have been evaluated using LOOCV and a final model and configuration chosen, a final model is then fit on all available data and used to make predictions on new data.
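A minimal sketch of this last step is shown below, assuming a random forest was the chosen model. The synthetic dataset and the new row of data are made up for illustration; in practice you would use your own full dataset and real new observations.

```python
# sketch: after choosing a model via LOOCV, fit it on all available data
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
# illustrative dataset with two input features (an assumption for this example)
X, y = make_blobs(n_samples=100, random_state=1)
# fit the chosen model configuration on every row
model = RandomForestClassifier(random_state=1)
model.fit(X, y)
# predict the class for a new, made-up row with the same two features
row = [[-1.5, 4.0]]
yhat = model.predict(row)
print('Predicted class: %d' % yhat[0])
```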

Now that we are familiar with the LOOCV procedure, let’s look at how we can use the method in Python.

LOOCV Procedure in Scikit-Learn

The scikit-learn Python machine learning library provides an implementation of LOOCV via the LeaveOneOut class.

The method has no configuration; therefore, no arguments are provided to create an instance of the class.

...
# create loocv procedure
cv = LeaveOneOut()

Once created, the split() function can be called and provided the dataset to enumerate.

Each iteration will return the row indices that can be used for the train and test sets from the provided dataset.

...
for train_ix, test_ix in cv.split(X):
	...
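To make the returned indices concrete, the sketch below enumerates the splits for a tiny array with four rows and prints the train and test indices for each iteration. The array is an assumption used only for illustration.

```python
# sketch: inspect the indices returned by split() on a tiny array
from numpy import arange
from sklearn.model_selection import LeaveOneOut
# four rows, two columns
X = arange(8).reshape(4, 2)
cv = LeaveOneOut()
splits = list()
for train_ix, test_ix in cv.split(X):
	# store and report the row indices for this split
	splits.append((train_ix.tolist(), test_ix.tolist()))
	print('train=%s test=%s' % (train_ix.tolist(), test_ix.tolist()))
```

Running the sketch prints one line per row of the dataset, with a single test index in each:

```
train=[1, 2, 3] test=[0]
train=[0, 2, 3] test=[1]
train=[0, 1, 3] test=[2]
train=[0, 1, 2] test=[3]
```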

These indices can be used on the input (X) and output (y) columns of the dataset array to split the dataset.

...
# split data
X_train, X_test = X[train_ix, :], X[test_ix, :]
y_train, y_test = y[train_ix], y[test_ix]

The training set can be used to fit a model and the test set can be used to evaluate it by first making a prediction and calculating a performance metric on the predicted values versus the expected values.

...
# fit model
model = RandomForestClassifier(random_state=1)
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)

Scores can be saved from each evaluation and a final mean estimate of model performance can be presented.

We can tie this together and demonstrate how to use LOOCV to evaluate a RandomForestClassifier model for a synthetic classification dataset created with the make_blobs() function, which generates three classes by default.

The complete example is listed below.

# loocv to manually evaluate the performance of a random forest classifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import LeaveOneOut
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# create dataset
X, y = make_blobs(n_samples=100, random_state=1)
# create loocv procedure
cv = LeaveOneOut()
# enumerate splits
y_true, y_pred = list(), list()
for train_ix, test_ix in cv.split(X):
	# split data
	X_train, X_test = X[train_ix, :], X[test_ix, :]
	y_train, y_test = y[train_ix], y[test_ix]
	# fit model
	model = RandomForestClassifier(random_state=1)
	model.fit(X_train, y_train)
	# evaluate model
	yhat = model.predict(X_test)
	# store
	y_true.append(y_test[0])
	y_pred.append(yhat[0])
# calculate accuracy
acc = accuracy_score(y_true, y_pred)
print('Accuracy: %.3f' % acc)

Running the example manually estimates the performance of the random forest classifier on the synthetic dataset.

Given that the dataset has 100 examples, it means that 100 train/test splits of the dataset were created, with each single row of the dataset given an opportunity to be used as the test set. Similarly, 100 models are created and evaluated.

The classification accuracy across all predictions is then reported, in this case as 99 percent.

Accuracy: 0.990

A downside of enumerating the folds manually is that it is slow and involves a lot of code that could introduce bugs.

An alternative way to evaluate a model using LOOCV, without enumerating the folds manually, is to use the cross_val_score() function.

This function takes the model, the dataset, and the instantiated LOOCV object set via the “cv” argument. A sample of accuracy scores is then returned that can be summarized by calculating the mean and standard deviation.

We can also set the “n_jobs” argument to -1 to use all CPU cores, which can greatly reduce the wall-clock time needed to fit and evaluate so many models.

The example below demonstrates evaluating the RandomForestClassifier using LOOCV on the same synthetic dataset using the cross_val_score() function.

# loocv to automatically evaluate the performance of a random forest classifier
from numpy import mean
from numpy import std
from sklearn.datasets import make_blobs
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
# create dataset
X, y = make_blobs(n_samples=100, random_state=1)
# create loocv procedure
cv = LeaveOneOut()
# create model
model = RandomForestClassifier(random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

Running the example automatically estimates the performance of the random forest classifier on the synthetic dataset.

The mean classification accuracy across all folds matches our manual estimate previously.

Accuracy: 0.990 (0.099)
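The standard deviation follows directly from the fact that each LOOCV fold has a single test row, so each fold's accuracy is either 0 or 1. The sketch below reproduces the reported numbers from a hand-built array of 99 correct and 1 incorrect fold scores; the array is constructed for illustration rather than taken from the model.

```python
# sketch: reproduce the reported mean and standard deviation from 0/1 fold scores
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
# 99 correct predictions and 1 incorrect prediction, one per fold
scores = array([1.0] * 99 + [0.0])
p = mean(scores)
print('Mean: %.3f' % p)
print('Std: %.3f' % std(scores))
# for 0/1 scores the standard deviation is sqrt(p * (1 - p))
print('sqrt(p*(1-p)): %.3f' % sqrt(p * (1 - p)))
```

In other words, for classification accuracy under LOOCV the standard deviation is determined entirely by the mean, so it adds little information beyond the accuracy itself.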

Now that we are familiar with how to use the LeaveOneOut class, let’s look at how we can use it to evaluate a machine learning model on real datasets.

LOOCV to Evaluate Machine Learning Models

In this section, we will explore using the LOOCV procedure to evaluate machine learning models on standard classification and regression predictive modeling datasets.

LOOCV for Classification

We will demonstrate how to use LOOCV to evaluate a random forest algorithm on the sonar dataset.

The sonar dataset is a standard machine learning dataset comprising 208 rows of data with 60 numerical input variables and a target variable with two class values, i.e. binary classification.

The dataset involves predicting whether sonar returns indicate a rock or simulated mine.

No need to download the dataset; we will download it automatically as part of our worked examples.

The example below downloads the dataset and summarizes its shape.

# summarize the sonar dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

Running the example downloads the dataset and splits it into input and output elements. As expected, we can see that there are 208 rows of data with 60 input variables.

(208, 60) (208,)

We can now evaluate a model using LOOCV.

First, the loaded dataset must be split into input and output components.

...
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

Next, we define the LOOCV procedure.

...
# create loocv procedure
cv = LeaveOneOut()

We can then define the model to evaluate.

...
# create model
model = RandomForestClassifier(random_state=1)

Then use the cross_val_score() function to enumerate the folds, fit models, then make and evaluate predictions. We can then report the mean and standard deviation of model performance.

...
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

Tying this together, the complete example is listed below.

# loocv evaluate random forest on the sonar dataset
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
# create loocv procedure
cv = LeaveOneOut()
# create model
model = RandomForestClassifier(random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

Running the example first loads the dataset and confirms the number of rows in the input and output elements.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The model is then evaluated using LOOCV and the estimated performance when making predictions on new data has an accuracy of about 82.2 percent.

(208, 60) (208,)
Accuracy: 0.822 (0.382)

LOOCV for Regression

We will demonstrate how to use LOOCV to evaluate a random forest algorithm on the housing dataset.

The housing dataset is a standard machine learning dataset comprising 506 rows of data with 13 numerical input variables and a numerical target variable.

The dataset involves predicting the house price given details of the house’s suburb in the American city of Boston.

No need to download the dataset; we will download it automatically as part of our worked examples.

The example below downloads and loads the dataset as a Pandas DataFrame and summarizes the shape of the dataset.

# load and summarize the housing dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)

Running the example confirms the 506 rows of data, with 13 input variables and a single numeric target variable (14 columns in total).

(506, 14)

We can now evaluate a model using LOOCV.

First, the loaded dataset must be split into input and output components.

...
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

Next, we define the LOOCV procedure.

...
# create loocv procedure
cv = LeaveOneOut()

We can then define the model to evaluate.

...
# create model
model = RandomForestRegressor(random_state=1)

Then use the cross_val_score() function to enumerate the folds, fit models, then make and evaluate predictions. We can then report the mean and standard deviation of model performance.

In this case, we use the mean absolute error (MAE) performance metric appropriate for regression.

...
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force positive
scores = absolute(scores)
# report performance
print('MAE: %.3f (%.3f)' % (mean(scores), std(scores)))
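The reason for the "force positive" step is that scikit-learn scorers follow a higher-is-better convention, so the MAE is returned as a negated value. The sketch below shows this on a small synthetic regression problem; the make_regression() dataset and the LinearRegression model are assumptions chosen to keep the example fast, not the housing data or the random forest used in this section.

```python
# sketch: scikit-learn negates MAE so that larger scores are always better
from numpy import absolute
from numpy import mean
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score
# small illustrative regression dataset (an assumption, not the housing data)
X, y = make_regression(n_samples=30, n_features=3, noise=5.0, random_state=1)
# evaluate with LOOCV, one negated absolute error per row
scores = cross_val_score(LinearRegression(), X, y, scoring='neg_mean_absolute_error', cv=LeaveOneOut())
# raw scores are zero or negative
print('Largest raw score: %.3f' % scores.max())
# take the absolute value to report a conventional positive MAE
mae = absolute(scores)
print('MAE: %.3f' % mean(mae))
```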

Tying this together, the complete example is listed below.

# loocv evaluate random forest on the housing dataset
from numpy import mean
from numpy import std
from numpy import absolute
from pandas import read_csv
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
# create loocv procedure
cv = LeaveOneOut()
# create model
model = RandomForestRegressor(random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force positive
scores = absolute(scores)
# report performance
print('MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

Running the example first loads the dataset and confirms the number of rows in the input and output elements.

The model is evaluated using LOOCV and the performance of the model when making predictions on new data is a mean absolute error of about 2.180 (thousands of dollars).

(506, 13) (506,)
MAE: 2.180 (2.346)

Summary

In this tutorial, you discovered how to evaluate machine learning models using leave-one-out cross-validation.

Specifically, you learned:

The leave-one-out cross-validation procedure is appropriate when you have a small dataset or when an accurate estimate of model performance is more important than the computational cost of the method.
How to use the scikit-learn machine learning library to perform the leave-one-out cross-validation procedure.
How to evaluate machine learning algorithms for classification and regression using leave-one-out cross-validation.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

51 Responses to LOOCV for Evaluating Machine Learning Algorithms

Asad Khan July 30, 2020 at 1:38 am #

Are JACKKNIFE and LOOCV the same for Classification? If both are different what is the main difference between them?

Reply
- ue July 30, 2020 at 6:25 am #
  
  Fantastic question!
  
  They sound the same superficially. The main difference appears to be estimating using the in-sample data (jackknife) vs estimating using the out-of-sample data (LOOCV).
  
  More here:
  https://en.wikipedia.org/wiki/Jackknife_resampling
  
  Reply
Kingsley Udeh July 31, 2020 at 9:47 am #

Hi Dr. Jason,

Is there a source one could refer to for using LOOCV with time series?

Thanks

Reply
- Jason Brownlee July 31, 2020 at 1:40 pm #
  
  Yes, it is called walk forward validation and I have 100s of examples on the blog.
  
  Perhaps start here:
  https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
  
  Reply
Charles Brauer August 1, 2020 at 12:13 am #

You wrote:

“The model is evaluated using LOOCV and the performance of the model when making predictions on new data is a mean absolute error of about 2.180 (thousands of dollars).”

Did you really mean “thousands of dollars”?

Charles

Reply
- Jason Brownlee August 1, 2020 at 6:11 am #
  
  Yes, the target is defined as:
  
  MEDV Median value of owner-occupied homes in $1000's
  
  Reply
Jason Brownlee October 2, 2020 at 5:55 am #

Thanks.

Reply
madhu krishna October 7, 2020 at 9:37 pm #

your machine learning blog is very good

Reply
- Jason Brownlee October 8, 2020 at 8:30 am #
  
  Thanks!
  
  Reply
Nelson Cárdenas October 12, 2020 at 11:54 pm #

Hi, excellent article and blog.

I have a question. I have a dataset split into train (60%), crossv (20%) and test (20%). I use crossv to estimate hyperparameters, and my strategy for train and crossv is a stratified 4-fold cross-validation using scikit-learn. That way I don’t use the test set in the hyperparameter tuning.

I want to use the LOOCV to evaluate the model with the test set in this way:

If I have 120 training examples, 40 crossv examples and 40 test examples, I want to run LOOCV 40 times, leaving out just one example from the test set each time and using the other 199 examples for training. This is some kind of hybrid cross-validation. I can implement it myself, but without scikit-learn I lose the parallelization. Is there a way to make my strategy work with scikit-learn?

Thanks in advance.

Reply
- Jason Brownlee October 13, 2020 at 6:36 am #
  
  Thanks!
  
  You would use LOOCV instead of a train/test split.
  
  Hyperparameter tuning can be performed within each fold, called nested cross-validation:
  https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/
  
  Reply
Maria Basic October 17, 2020 at 1:20 am #

Hello! So in the case of small datasets (e.g. less than 1000 datapoints) you suggest to skip the classic train-test procedure (where cross validation, in this case LOOCV, would be performed on the train subset and the chosen model then additionally tested on the test subset) and instead use LOOCV on the whole dataset, because of its small size?

Reply
- Jason Brownlee October 17, 2020 at 6:09 am #
  
  For 1000 data points some type of CV should be used. LOOCV might be appropriate.
  
  Reply
MD MAHMUDUL HASAN November 3, 2020 at 10:50 pm #

Hi Jason, could you kindly help me with how I can do LOOCV for a NN in Keras? Thanks. Your website helped me a lot in my Master’s thesis.

Reply
- Jason Brownlee November 4, 2020 at 6:42 am #
  
  Perhaps adapt the example here:
  https://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/
  
  Reply
Taseer Suleman November 28, 2020 at 11:44 pm #

What are you storing in :

y_true.append(y_test[0])
y_pred.append(yhat[0])

Kindly guide me. What are these locations pointing to?

Reply
- Jason Brownlee November 29, 2020 at 8:14 am #
  
  We are collecting the single predicted and expected values into lists.
  
  Reply
Felipe Araya January 27, 2021 at 11:44 pm #

Hello,

Amazing tutorial as always! Just a few quick questions if you don’t mind please:

1. In the coding example using “cross_val_score” you are essentially doing a nested cross-validation using LOOCV rather than the typical K-folds, right?

2. In the example using the “split()” method, why not using a “train_test_split” to create a train and test set, and then apply the “split()” method on the train set, so you can test performance on a validation set and model generalization on a test set?

3. If you do two “split()” one in the whole data, and one in the training data, would be essentially the same as using nested cross validation or the “cross_val_score”, right?

Thank you very much! Awesome tutorials!

Reply
- Jason Brownlee January 28, 2021 at 5:58 am #
  
  Thanks!
  
  No, we are doing CV, not nested CV.
  
  We are not splitting the dataset, we are doing LOOCV manually in that example. A train/test split would not preserve the LOOCV ordering.
  
  Reply
  - Felipe Araya January 28, 2021 at 7:27 am #
    
    Hi Dr. Jason,
    
    But why not doing a train/test split and then do LOOCV on the training data, that would basically generate a validation set of 1 observation, n times and then we can put those against a test set to see which one generalizes the best. I know that is not what you did, but is that a reasonable approach?
    
    Reply
    - Jason Brownlee January 28, 2021 at 8:03 am #
      
      You can divide the train set within the LOOCV into train/val for hyperparameter tuning if you like.
      
      This tutorial does not do that.
      
      Reply
Fati March 5, 2021 at 3:05 am #

Very interesting

Reply
- Jason Brownlee March 5, 2021 at 5:35 am #
  
  Thanks!
  
  Reply
Ulisses Braga-Neto April 30, 2021 at 12:39 pm #

Leave-one-out is very bad as an estimator of classification accuracy. It has much higher variance than other CV estimators (which all tend to be variable, but not as much as leave-one-out). The problem is especially acute with small sample sizes. Leave-one-out is the least biased of all CV estimators, but the most variable (and variance is more important in small-sample cases). The main reason is that your test folds are too small (one point!). You end up with an all-or-nothing estimate out of each fold.

Reply
- Jason Brownlee May 1, 2021 at 6:01 am #
  
  Thank you for sharing.
  
  Reply
Noor June 1, 2021 at 8:38 am #

Thanks for your blog, i tried it with my data and i keep having nan as accuracy?!
i suppose it’s because of my data, i do have 8 variables in my input features and one numerical target and i don’t have null values!!

Reply
- Jason Brownlee June 2, 2021 at 5:35 am #
  
  Ouch, perhaps the model does not work with your data. Try a single standalone fit and prediction with your model to investigate.
  
  Reply
Martin June 18, 2021 at 1:22 am #

I have a question:
My dataset is not really certain. This means the data does not come from experiments but from real cases, and the solution of real cases is not known for sure. I used LOOCV with SVM and GBM on my roughly 300 examples with over 1600 variables to identify errors in the training set data. I eliminated the wrong classifications (about 10%) from the training data. But now I’m not sure whether I eventually overfitted the models, or chose only the “best” data for training, so that the model no longer fits new data?

Reply
- Jason Brownlee June 18, 2021 at 5:44 am #
  
  Generally, you should choose an evaluation procedure that gives you confidence in the model on your specific dataset.
  
  Perhaps LOOCV is not sufficient for you.
  
  Reply
Mohit August 15, 2021 at 12:58 am #

Hi Jason,

Which cross validation technique should we choose between K fold and LOOCV? Are there some specific scenarios where we should choose one technique over the other?

Reply
- Adrian Tam August 17, 2021 at 7:17 am #
  
  The reason for cross validation is to check your model with unseen data. Do you see any weakness in choosing one technique over another? That depends on the problem and data, surely, but also we might consider the speed, memory footprint and other practical metrics in choosing one.
  
  Reply
Andriy September 21, 2021 at 12:40 am #

Your code in “LOOCV for Regression” is not working and gives an error message:

UnicodeEncodeError: ‘ascii’ codec can’t encode characters in position 18-23: ordinal not in range(128)

Reply
- Adrian Tam September 21, 2021 at 9:30 am #
  
  Try adding encoding="utf-8" to the read_csv() function.
  
  Reply
John November 23, 2021 at 6:58 am #

Hi Jason,

Quick question about loocv: Does the testing accuracy necessarily equal the mean training accuracy (i.e., mean accuracy for the n models)? In other words, does training and testing accuracy converge with loocv?

Thanks!

Reply
- Adrian Tam November 23, 2021 at 1:39 pm #
  
  No. If you see test much worse than training, you run into overfitting problem. We expect a good model should have test and training accuracy roughly the same.
  
  Reply
Markus February 12, 2022 at 1:17 am #

Hi Jason,

I have a question regarding the model that is created. Is the model trained inside the cross_val_score function? How can I get access to that model, eg. to dump it or read out the coefficients etc.?

Thanks!

Reply
- James Carmichael February 12, 2022 at 12:49 pm #
  
  Hi Markus…In this case, the model is trained and evaluated within the cross_val_score. You would need to create a model class object in order to capture the model details.
  
  Reply
Markus February 14, 2022 at 6:30 pm #

Hi James,

I see. Thanks! That clarifies a lot, I just don’t understand, why there is no in-built function to get this model… Any hint to find ideas on how to create such a class?

Reply
- Markus February 14, 2022 at 7:18 pm #
  
  Hi James,
  
  don’t bother, I just found some info on that and more theoretical background on loocv.
  
  Thanks!
  
  Reply
Nicola March 10, 2022 at 9:12 am #

Hi James,

quick questions, would make sense to use LOO as outer loop in a nested CV, while still using a k-fold for the parameter optimization (inner loop)? Is there a better strategy to still have a nested CV using LOO?

Thanks so much
Nicola

Reply
- James Carmichael March 10, 2022 at 10:19 am #
  
  Hi Nicola…Yes, that is a reasonable strategy. Work through it and let us know what you find.
  
  Reply
  - Nicola March 11, 2022 at 5:01 am #
    
    Great, thanks!
    
    Reply
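
The strategy discussed in this thread can be sketched as follows: a GridSearchCV with a k-fold inner loop is itself evaluated with LeaveOneOut as the outer loop. The dataset, the model, and the grid of C values are assumptions chosen to keep the example small.

```python
# Minimal sketch: nested CV with LOOCV outside and k-fold inside.
# The dataset, model, and parameter grid are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, KFold, LeaveOneOut,
                                     cross_val_score)

X, y = make_classification(n_samples=40, n_features=8, n_informative=4,
                           n_redundant=2, random_state=1)

# inner loop: 3-fold CV to choose the hyperparameter
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.1, 1.0, 10.0]},
                      scoring="accuracy", cv=inner_cv)

# outer loop: LOOCV evaluates the whole search procedure on each held-out row
outer_cv = LeaveOneOut()
scores = cross_val_score(search, X, y, scoring="accuracy", cv=outer_cv)
print("Nested LOOCV accuracy: %.3f" % scores.mean())
```

The cost grows quickly: every held-out row triggers a full grid search, so this is only practical for small datasets and small grids.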
Jeetech Academy March 14, 2022 at 3:16 pm #

Every update is exciting. This one, however, is exceptionally exciting!

Reply
Ahmad Syukri June 23, 2022 at 7:10 pm #

LOOCV for (multi-)linear regression is very cheap though? There’s a closed form solution for it.

Reply
- James Carmichael June 24, 2022 at 8:31 am #
  
  Hi Ahmad…the following may be of interest to you:
  
  https://www.statology.org/leave-one-out-cross-validation/
  
  Reply
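
The closed form Ahmad refers to is a standard result for ordinary least squares: the leave-one-out residual for row i equals the ordinary residual divided by (1 - h_ii), where h_ii is the i-th diagonal entry of the hat matrix. The sketch below, on an assumed synthetic dataset, compares that shortcut with the explicit LOOCV loop in scikit-learn.

```python
# Minimal sketch: closed-form LOOCV for ordinary least squares,
# checked against refitting one model per held-out row.
# The dataset is an assumption made for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=50, n_features=4, noise=10.0, random_state=1)

# design matrix with an intercept column
A = np.column_stack([np.ones(len(X)), X])

# hat matrix H = A (A^T A)^-1 A^T, computed via the pseudo-inverse
H = A @ np.linalg.pinv(A)
h = np.diag(H)

# ordinary residuals from a single fit, rescaled to leave-one-out residuals
residuals = y - H @ y
loo_residuals = residuals / (1.0 - h)
shortcut_mae = np.mean(np.abs(loo_residuals))

# the same quantity by refitting the model n times
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_absolute_error", cv=LeaveOneOut())
refit_mae = -scores.mean()

print("Shortcut MAE: %.6f" % shortcut_mae)
print("Refit MAE:    %.6f" % refit_mae)
```

The shortcut needs a single fit instead of n fits, but it only applies to linear least squares and a few related models; for other algorithms the explicit loop is still required.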
Miguel Senra August 25, 2022 at 11:14 pm #

Hello James, how are you? I have some doubts:

Does LOOCV, or any other kind of CV, give the prediction interval of my model on unseen data? Or just one or the other? I mean, can I say that any new prediction is supposed to fall between +MAE and -MAE (as found by LOOCV) around the predicted value?

Besides, what does it mean when my model has a much better LOOCV score than training score?

I mean, when I compare the predicted values for my training data with the actual values, the result is worse than the LOOCV score.

I understand that a training score much better than the LOOCV score (or any other CV technique) indicates overfitting, but what about the opposite?

Thanks!

Reply
- James Carmichael August 26, 2022 at 6:41 am #
  
  Hi Miguel…You may find the following resource of interest:
  
  https://towardsdatascience.com/what-is-cross-validation-622d5a962231
  
  Reply
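
On the first question, note that the MAE is an average error, not a bound, so some predictions will miss by more than the MAE. A rough way to see this is to collect the held-out predictions with cross_val_predict() and look at the spread of the absolute errors. The sketch below uses an assumed synthetic dataset, and the empirical quantile it prints is only a rough description of the held-out errors, not a formal prediction interval.

```python
# Minimal sketch: look at the spread of LOOCV errors, not just their mean.
# The dataset and model are assumptions made for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=1)

# each prediction comes from a model that never saw that row
y_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
abs_err = np.abs(y - y_pred)

mae = abs_err.mean()
within_mae = np.mean(abs_err <= mae)   # fraction of rows with error <= MAE
q90 = np.quantile(abs_err, 0.9)        # 90% of held-out errors are below this

print("MAE: %.3f" % mae)
print("Fraction of errors within +/- MAE: %.2f" % within_mae)
print("90th percentile of absolute error: %.3f" % q90)
```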
Simon October 21, 2022 at 6:30 am #

Greetings, Jason, very interesting and helpful article.

Following your example I used “scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)” (with a different set of data) and was confused to see that two different scoring methods gave me the same results.

My model is a linear regression, and the two scoring methods I tried are “neg_mean_absolute_error” and “neg_root_mean_squared_error” and for all of my 35 results I got the same scores for both methods, scores from 0.178 to 2.28, so it’s more than simple coincidence that these scores are the same.

Am I doing something wrong? Or is this to be expected under this model and validation system?

Reply
- James Carmichael October 21, 2022 at 7:43 am #
  
  Hi Simon…The following discussion may prove very insightful:
  
  https://stats.stackexchange.com/questions/526316/why-i-get-same-predictions-values-for-diferent-input-data
  
  Reply
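
The identical scores Simon describes are expected with LOOCV. Each fold contains a single test row, so the mean absolute error for that fold is |e| and the root mean squared error is sqrt(e^2), which is the same number. The sketch below shows this on an assumed synthetic dataset with 35 rows, and also shows one way to get an overall RMSE by averaging the squared errors before taking the square root.

```python
# Minimal sketch: per-fold MAE and RMSE coincide under LOOCV.
# The dataset is an assumption made for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=35, n_features=3, noise=5.0, random_state=1)
model = LinearRegression()
cv = LeaveOneOut()

mae = cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=cv)
rmse = cross_val_score(model, X, y, scoring="neg_root_mean_squared_error", cv=cv)

# with one test row per fold, |e| == sqrt(e^2), so the arrays match
print(np.allclose(mae, rmse))

# an overall RMSE: average the squared errors first, then take the root
mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=cv)
overall_rmse = np.sqrt(np.mean(-mse))
print("Mean of per-fold MAE: %.3f" % -mae.mean())
print("Overall RMSE:         %.3f" % overall_rmse)
```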
Haj July 26, 2023 at 6:24 pm #

Hi Jason,

Thanks for this! It’s helpful. I have a question to double check my understanding with you. The cross_val_score function doesn’t actually fit the model, correct? So we can’t necessarily retrieve the model/classifier we pass as the first argument for later use. That being said, I keep getting ‘model has not been fitted’ when I attempt to use it after following your tutorial. Does that mean I have to run cross_val_score on all of X and y, then split the data into train and test (let’s say 70/30), fit the model, and use that on the unseen (test) data? I am struggling to capture the final model/classifier so I can use it later with other data, without ending up with N classifiers, N being the number of data points. Awaiting your response and thanks very much!

Reply
- James Carmichael July 27, 2023 at 9:18 am #
  
  Hi Haj…You may find the following resource of interest:
  
  https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/
  
  Reply
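
On Haj's question: cross_val_score() fits internal copies of the model, so the object you pass in stays unfitted, which would explain the "not fitted" error. A common approach, sketched below under the assumption of a synthetic dataset and a logistic regression model, is to use LOOCV only to estimate performance and then fit one final model on all available data for later use.

```python
# Minimal sketch: estimate performance with LOOCV, then fit one final model.
# The dataset and model are assumptions made for illustration.
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=50, n_features=6, n_informative=3,
                           n_redundant=1, random_state=1)
model = LogisticRegression(max_iter=1000)

# step 1: estimate how well this kind of model performs on unseen data
scores = cross_val_score(model, X, y, scoring="accuracy", cv=LeaveOneOut())
print("Estimated accuracy: %.3f" % scores.mean())

# the original object was never fitted, so predicting with it fails
try:
    model.predict(X[:1])
    fitted_before = True
except NotFittedError:
    fitted_before = False
print("Fitted after cross_val_score:", fitted_before)

# step 2: fit a single final model on all of the data
model.fit(X, y)

# step 3: use it on new rows (the first three rows stand in for new data)
predictions = model.predict(X[:3])
print(predictions)
```

This yields a single classifier rather than N of them; the LOOCV score is then read as an estimate of how that final model is expected to perform.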
