Last Updated on August 26, 2020

The performance of a machine learning model can be characterized in terms of the **bias** and the **variance** of the model.

A model with high bias makes strong assumptions about the form of the unknown underlying function that maps inputs to outputs in the dataset, such as linear regression. A model with high variance is highly dependent upon the specifics of the training dataset, such as unpruned decision trees. We desire models with low bias and low variance, although there is often a trade-off between these two concerns.

The bias-variance trade-off is a useful conceptualization for selecting and configuring models, although it generally cannot be computed directly as it requires full knowledge of the problem domain, which we do not have. Nevertheless, in some cases, we can estimate the error of a model and break the error down into bias and variance components, which may provide insight into a given model’s behavior.

In this tutorial, you will discover how to calculate the bias and variance for a machine learning model.

After completing this tutorial, you will know:

- Model error consists of model variance, model bias, and irreducible error.
- We seek models with low bias and variance, although typically reducing one results in a rise in the other.
- How to decompose mean squared error into model bias and variance terms.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- Bias, Variance, and Irreducible Error
- Bias-Variance Trade-off
- Calculate the Bias and Variance

## Bias, Variance, and Irreducible Error

Consider a machine learning model that makes predictions for a predictive modeling task, such as regression or classification.

The performance of the model on the task can be described in terms of the prediction error on all examples not used to train the model. We will refer to this as the model error.

- Error(Model)

The model error can be decomposed into three sources of error: the **variance** of the model, the **bias** of the model, and the variance of the **irreducible error** in the data.

- Error(Model) = Variance(Model) + Bias(Model) + Variance(Irreducible Error)
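
For squared error loss, this decomposition can be written more precisely. The sketch below assumes notation that is not in the original text: a true function $f$, a fitted model $\hat{f}$, and observations $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$. The expectation is taken over training datasets and noise. Note that under these assumptions the bias enters as a squared term.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```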

Let’s take a closer look at each of these three terms.

### Model Bias

The bias is a measure of how closely the model can capture the mapping function between inputs and outputs.

It captures the rigidity of the model: the strength of the assumption the model has about the functional form of the mapping between inputs and outputs.

This reflects how close the functional form of the model can get to the true relationship between the predictors and the outcome.

— Page 97, Applied Predictive Modeling, 2013.

A model with high bias is helpful when the bias matches the true but unknown underlying mapping function for the predictive modeling problem. Yet, a model with a large bias will be completely useless when the functional form for the problem is mismatched with the assumptions of the model, e.g. assuming a linear relationship for data with a highly non-linear relationship.

- **Low Bias**: Weak assumptions regarding the functional form of the mapping of inputs to outputs.
- **High Bias**: Strong assumptions regarding the functional form of the mapping of inputs to outputs.

The bias term in the error decomposition is never negative.
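
To make the idea of a mismatched assumption concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical noise-free target, y = x**2, and fits a straight line to it. Because there is no noise and the training data covers the whole input range, the error that remains comes from the linear assumption alone.

```python
# Illustrative sketch: a straight line fit to noise-free quadratic data.
# The target y = x**2 is an assumption chosen for this demonstration.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# 201 evenly spaced inputs on [-1, 1]; no noise is added
xs = [i / 100 for i in range(-100, 101)]
ys = [x ** 2 for x in xs]

# fit the rigid (linear) model and measure the error it cannot remove
intercept, slope = fit_line(xs, ys)
residual_mse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print('Intercept: %.3f' % intercept)
print('Slope: %.3f' % slope)
print('MSE of best line: %.3f' % residual_mse)
```

The best available line is essentially flat, and its error stays well above zero no matter how many points are added, which is the signature of high bias.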

### Model Variance

The variance of the model is the amount the performance of the model changes when it is fit on different training data.

It captures how strongly the specifics of the training data influence the model.

Variance refers to the amount by which [the model] would change if we estimated it using a different training data set.

— Page 34, An Introduction to Statistical Learning with Applications in R, 2014.

A model with high variance will change a lot with small changes to the training dataset. Conversely, a model with low variance will change little with small or even large changes to the training dataset.

- **Low Variance**: Small changes to the model with changes to the training dataset.
- **High Variance**: Large changes to the model with changes to the training dataset.

The variance is never negative.

### Irreducible Error

On the whole, the error of a model consists of reducible error and irreducible error.

- Model Error = Reducible Error + Irreducible Error

The reducible error is the element that we can improve. It is the quantity that we reduce when the model is learning on a training dataset and we try to get this number as close to zero as possible.

The irreducible error is the error that we can not remove with our model, or with any model.

The error is caused by elements outside our control, such as statistical noise in the observations.

… usually called “irreducible noise” and cannot be eliminated by modeling.

— Page 97, Applied Predictive Modeling, 2013.

As such, although we may be able to squash the reducible error to a very small value close to zero, or even zero in some cases, we will always have some irreducible error. It defines a lower bound on the error that any model can achieve on a problem.

It is important to keep in mind that the irreducible error will always provide an upper bound on the accuracy of our prediction for Y. This bound is almost always unknown in practice.

— Page 19, An Introduction to Statistical Learning with Applications in R, 2014.

It is a reminder that no model is perfect.
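
A small simulation can illustrate this lower bound. The sketch below uses only the standard library and assumes a hypothetical true function (a sine curve) and a noise level of 0.5, both chosen purely for illustration. Even a "model" that knows the true function exactly is left with a mean squared error close to the noise variance.

```python
# Illustrative sketch: the error of a perfect model equals the noise variance.
import math
import random

random.seed(1)

NOISE_STD = 0.5  # assumed noise level; irreducible error is NOISE_STD ** 2

def true_function(x):
    """Hypothetical ground-truth mapping, known only because we simulate it."""
    return math.sin(2 * math.pi * x)

n = 100000
total_squared_error = 0.0
for _ in range(n):
    x = random.random()
    # observed target = true function + statistical noise
    y = true_function(x) + random.gauss(0.0, NOISE_STD)
    # the "oracle" prediction uses the true function directly
    prediction = true_function(x)
    total_squared_error += (y - prediction) ** 2

oracle_mse = total_squared_error / n
print('Noise variance: %.3f' % NOISE_STD ** 2)
print('MSE of oracle model: %.3f' % oracle_mse)
```

In practice we never have access to the true function, so this quantity is unknown; the simulation only works because the data generator is made up.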

## Bias-Variance Trade-off

The bias and the variance of a model’s performance are connected.

Ideally, we would prefer a model with low bias and low variance, although in practice, this is very challenging. In fact, this could be described as the goal of applied machine learning for a given predictive modeling problem.

Reducing the bias can easily be achieved by increasing the variance. Conversely, reducing the variance can easily be achieved by increasing the bias.

This is referred to as a trade-off because it is easy to obtain a method with extremely low bias but high variance […] or a method with very low variance but high bias …

— Page 36, An Introduction to Statistical Learning with Applications in R, 2014.

This relationship is generally referred to as the **bias-variance trade-off**. It is a conceptual framework for thinking about how to choose models and model configuration.

We can choose a model based on its bias or variance. Simple models, such as linear regression and logistic regression, generally have a high bias and a low variance. Complex models, such as unpruned decision trees, generally have a low bias but a high variance.

We may also choose model configurations based on their effect on the bias and variance of the model. The k hyperparameter in k-nearest neighbors controls the bias-variance trade-off. Small values, such as k=1, result in a low bias and a high variance, whereas large k values, such as k=21, result in a high bias and a low variance.
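
The effect of k can be checked on simulated data, where the true function is known and the bias and variance can therefore be estimated directly. The sketch below is a dependency-free illustration, not a reference implementation: the sine target, noise level, sample sizes, and the helper names `knn_predict` and `bias_and_variance` are all assumptions made for this example.

```python
# Illustrative sketch: squared bias and variance of k-NN regression for two k values.
import math
import random

random.seed(1)

NOISE_STD = 0.3   # assumed noise level
N_TRAIN = 50      # training points per simulated dataset
N_ROUNDS = 200    # number of independent training datasets
TEST_XS = [(i + 0.5) / 20 for i in range(20)]  # fixed test inputs in (0, 1)

def true_function(x):
    """Hypothetical ground-truth mapping used to generate the data."""
    return math.sin(2 * math.pi * x)

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def bias_and_variance(k):
    """Estimate squared bias and variance of k-NN over fresh training sets."""
    preds = [[] for _ in TEST_XS]
    for _ in range(N_ROUNDS):
        # draw a new training dataset from the same process
        train = []
        for _ in range(N_TRAIN):
            x = random.random()
            train.append((x, true_function(x) + random.gauss(0.0, NOISE_STD)))
        for i, x in enumerate(TEST_XS):
            preds[i].append(knn_predict(train, x, k))
    bias_sq = 0.0
    variance = 0.0
    for x, p in zip(TEST_XS, preds):
        mean_p = sum(p) / len(p)
        # squared gap between the average prediction and the truth
        bias_sq += (mean_p - true_function(x)) ** 2
        # spread of predictions around their own average
        variance += sum((v - mean_p) ** 2 for v in p) / len(p)
    return bias_sq / len(TEST_XS), variance / len(TEST_XS)

results = {k: bias_and_variance(k) for k in (1, 21)}
for k, (bias_sq, variance) in results.items():
    print('k=%d  bias^2: %.3f  variance: %.3f' % (k, bias_sq, variance))
```

Under these assumptions, k=1 should show the larger variance and k=21 the larger squared bias, matching the description above.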

High bias is not always bad, nor is high variance, but they can lead to poor results.

We often must test a suite of different models and model configurations in order to discover what works best for a given dataset. A model with a large bias may be too rigid and underfit the problem. Conversely, a model with a large variance may be too flexible and overfit the problem.

We may decide to increase the bias or the variance as long as it decreases the overall estimate of model error.

## Calculate the Bias and Variance

I get this question all the time:

How can I calculate the bias-variance trade-off for my algorithm on my dataset?

Technically, we cannot perform this calculation.

We cannot calculate the actual bias and variance for a predictive modeling problem.

This is because we do not know the true mapping function for a predictive modeling problem.

Instead, we use the bias, variance, irreducible error, and the bias-variance trade-off as tools to help select models, configure models, and interpret results.

In a real-life situation in which f is unobserved, it is generally not possible to explicitly compute the test MSE, bias, or variance for a statistical learning method. Nevertheless, one should always keep the bias-variance trade-off in mind.

— Page 36, An Introduction to Statistical Learning with Applications in R, 2014.

Even though the bias-variance trade-off is a conceptual tool, we can estimate it in some cases.

The mlxtend library by Sebastian Raschka provides the bias_variance_decomp() function that can estimate the bias and variance for a model over multiple bootstrap samples.

First, you must install the mlxtend library; for example:

```
sudo pip install mlxtend
```

The example below loads the Boston housing dataset directly via URL, splits it into train and test sets, then estimates the mean squared error (MSE) for a linear regression as well as the bias and variance for the model error over 200 bootstrap samples.

```python
# estimate the bias and variance for a regression model
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from mlxtend.evaluate import bias_variance_decomp
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# separate into inputs and outputs
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# define the model
model = LinearRegression()
# estimate bias and variance
mse, bias, var = bias_variance_decomp(model, X_train, y_train, X_test, y_test, loss='mse', num_rounds=200, random_seed=1)
# summarize results
print('MSE: %.3f' % mse)
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)
```

Running the example reports the estimated error as well as the estimated bias and variance for the model error.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the model has a high bias and a low variance. This is to be expected given that we are using a linear regression model. We can also see that the sum of the estimated bias and variance equals the estimated error of the model, e.g. 20.726 + 1.761 = 22.487.

```
MSE: 22.487
Bias: 20.726
Variance: 1.761
```
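
To see why the bias and variance terms add up to the error, it can help to work through the bootstrap procedure by hand. The sketch below is a dependency-free approximation of this kind of decomposition for squared error, not the library's own code: it assumes synthetic one-dimensional data rather than the housing dataset, and the helper names `fit_line` and `make_data` are hypothetical. In this convention the bias term is measured against the observed (noisy) test targets, so it also absorbs the irreducible noise.

```python
# Illustrative sketch: bootstrap decomposition of MSE into bias and variance terms.
import random

random.seed(1)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def make_data(n):
    """Synthetic regression data; the coefficients and noise are assumptions."""
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 + 2.0 * x + random.gauss(0.0, 2.0) for x in xs]
    return xs, ys

x_train, y_train = make_data(200)
x_test, y_test = make_data(100)

num_rounds = 200
all_preds = []  # one row of test predictions per bootstrap round
for _ in range(num_rounds):
    # bootstrap: sample training rows with replacement
    idx = random.choices(range(len(x_train)), k=len(x_train))
    intercept, slope = fit_line([x_train[i] for i in idx], [y_train[i] for i in idx])
    all_preds.append([intercept + slope * x for x in x_test])

n_test = len(x_test)
# average squared error over all rounds and all test points
mse = sum((p - y) ** 2 for row in all_preds for p, y in zip(row, y_test)) / (num_rounds * n_test)
# average prediction for each test point across rounds
main_pred = [sum(row[j] for row in all_preds) / num_rounds for j in range(n_test)]
# bias term: squared gap between the average prediction and the observed target
bias = sum((m - y) ** 2 for m, y in zip(main_pred, y_test)) / n_test
# variance term: spread of each round's predictions around the average prediction
var = sum((p - m) ** 2 for row in all_preds for p, m in zip(row, main_pred)) / (num_rounds * n_test)

print('MSE: %.3f' % mse)
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)
print('Bias + Variance: %.3f' % (bias + var))
```

With these definitions the identity MSE = Bias + Variance holds exactly, up to floating-point rounding, which is why the two reported components always sum to the reported error.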

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Books

- An Introduction to Statistical Learning with Applications in R, 2014.
- Applied Predictive Modeling, 2013.

## Summary

In this tutorial, you discovered how to calculate the bias and variance for a machine learning model.

Specifically, you learned:

- Model error consists of model variance, model bias, and irreducible error.
- We seek models with low bias and variance, although typically reducing one results in a rise in the other.
- How to decompose mean squared error into model bias and variance terms.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

Hi Jason,

It’s a wonderful article. Liked it very much. Never knew we could get bias and variance from a model.

One question I have: you used the mlxtend library to calculate the bias and variance and compared them with the MSE for linear regression. Is it also possible to get the bias and variance for classifier models (logistic regression or tree-based ones)? If so, against which error term should we compare them?

Thanks,

Nalla

Thank you!

It is possible I believe, but not straightforward. Perhaps mlxtend has an implementation for classification too?

Perhaps you can check the literature and try implementing it?

Hi Jason,

It is a wonderful article. Liked the way you explained it.

I have a small doubt. You have written that reducing the bias cannot easily be achieved by increasing the variance. Shouldn’t it be that reducing the bias can easily be achieved by increasing the variance?

You’re right, it looks like a typo. Fixed, thank you!

Hi Jason,

Great article and thank you for all the articles and books!

I tried the bias_variance_decomp() on an XGBRegressor() that I hypertuned and used early_stopping. Below are some metrics. Could we say we are overfitting the model and if so what would be some techniques to lower the bias and increase the variance? Less hypertuning or early_stopping?

Best Params: {‘subsample’: 0.7, ‘n_estimators’: 250, ‘max_depth’: 10, ‘learning_rate’: 0.2, ‘colsample_bynode’: 0.6}

RMSE 1.5669 MSE 2.4553 MAE 1.0522 R2 0.8716

MSE: 4.2508 Bias: 3.2046 Variance: 1.0463

Thanks,

Bill

You’re welcome Bill!

Not sure we can use the tool this way. I need to think about it.

You are overfitting if improved performance on the training set negatively impacts performance on the hold out/test set.

Fewer trees would increase variance, although typically with xgboost we want more bias, not more variance. E.g. try more regularization.

Hi Jason,

I am following your script, but I got this error:

```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
in ()
      3 from sklearn.model_selection import train_test_split
      4 from sklearn.linear_model import LinearRegression
----> 5 from mlxtend.evaluate import bias_variance_decomp

ImportError: cannot import name 'bias_variance_decomp'
---------------------------------------------------------------------------
```

Any advice about it?

Thanks in advance, JL

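It looks like mlxtend was found but the function could not be imported from it, which suggests an older version of the library is installed. Try upgrading mlxtend to the latest version.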

Dear Dr Jason,

I was not quite sure what to get from determining the mse, bias and variance from one model from the mlxtend package.

So what I did was to look at various models at another of your tutorials at https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/ where I modified the code to include the measure of bias, variance and mse.

The code is from your site.

The dataset used was not the housing.csv, but synthetically generated data.

I included these models.

I did some further modifications to include into the evaluate_models

I had the following results of not only the mean(scores), std(scores), but also the mse, variance and bias.

A table of the models based on highest to lowest scores is presented:

From the above, SVM had the highest mean score. It also had the lowest stddev, mse,bias and variance.

At the same time, there appears to be no tradeoff between bias and variance.

In the end, you cannot choose a model on bias, variance and MSE alone. You also have to look at other scoring methods, otherwise LR would have been a candidate, but LR has a lower score than SVM.

Thank you,

Anthony of Sydney

Great work.

Agreed, choosing a model based on bias/variance would not be sufficient. MSE should be the focus.

Hi Jason,

with keras sequential model

Keras_fit = Sequential()

Keras_fit.add….

Keras_fit.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])

mse, bias, var = bias_variance_decomp(Keras_fit, X_train, y_train, X_valid, y_valid, loss='mse', num_rounds=200, random_seed=1)

As keras_fit doesn’t have the model, If I only pass the keras_fit to, I get

AttributeError: 'History' object has no attribute 'predict'

Even passing model, it doesn’t work

Keras_model = Keras_fit.fit(X_train, y_train, validation_data=(X_valid, y_valid), batch_size = batch_size, epochs = epoch, verbose = 2)

May I know how to get that?

Thanks

The fit() function returns history, not the model.

Call fit() on the model object itself.

Here’s an example:

https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

For an SVM model your “loss” cannot be “mse”. It should be “0-1” or “hinge loss”. SVM is not solving a regression problem but a classification problem, so it is more justifiable to use “0-1 loss” or “hinge loss” (I am not sure whether hinge loss is available in this library).

There is a version of SVM for regression called SVR, you can learn more here:

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html