Gentle Introduction to the Bias-Variance Trade-Off in Machine Learning

Supervised machine learning algorithms can best be understood through the lens of the bias-variance trade-off.

In this post, you will discover the Bias-Variance Trade-Off and how to use it to better understand machine learning algorithms and get better performance on your data.

Let’s get started.

Photo by Matt Biddulph, some rights reserved.

Overview of Bias and Variance

In supervised machine learning an algorithm learns a model from training data.

The goal of any supervised machine learning algorithm is to best estimate the mapping function (f) for the output variable (Y) given the input data (X). The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.

The prediction error for any machine learning algorithm can be broken down into three parts:

  • Bias Error
  • Variance Error
  • Irreducible Error

The irreducible error cannot be reduced regardless of what algorithm is used. It is the error introduced from the chosen framing of the problem and may be caused by factors like unknown variables that influence the mapping of the input variables to the output variable.

In this post, we will focus on the two parts we can influence with our machine learning algorithms. The bias error and the variance error.

Bias Error

Bias is the simplifying assumptions made by a model to make the target function easier to learn.

Generally, parametric algorithms have a high bias, making them fast to learn and easier to understand, but generally less flexible. In turn, they have lower predictive performance on complex problems that fail to meet the simplifying assumptions of the algorithm's bias.

  • Low Bias: Suggests fewer assumptions about the form of the target function.
  • High Bias: Suggests more assumptions about the form of the target function.

Examples of low-bias machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.

Examples of high-bias machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.

Variance Error

Variance is the amount that the estimate of the target function will change if different training data were used.

The target function is estimated from the training data by a machine learning algorithm, so we should expect the algorithm to have some variance. Ideally, it should not change too much from one training dataset to the next, meaning that the algorithm is good at picking out the hidden underlying mapping between the inputs and the output variables.

Machine learning algorithms that have a high variance are strongly influenced by the specifics of the training data. This means that the specifics of the training data have influenced the number and types of parameters used to characterize the mapping function.

  • Low Variance: Suggests small changes to the estimate of the target function with changes to the training dataset.
  • High Variance: Suggests large changes to the estimate of the target function with changes to the training dataset.

Generally, nonparametric machine learning algorithms that have a lot of flexibility have a high variance. For example, decision trees have a high variance, which is even higher if the trees are not pruned before use.

Examples of low-variance machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.

Examples of high-variance machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
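The contrast above can be checked empirically: refitting a model on repeated bootstrap resamples of the training data and measuring how much its predictions move is a rough proxy for variance. The sketch below does this for linear regression versus an unpruned decision tree; the synthetic sine dataset, the `prediction_spread` helper, and the number of resampling rounds are all illustrative choices of mine, not from the post.

```python
# Sketch: comparing model variance under resampled training data.
# Dataset and helper names here are illustrative, not from the post.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)

X_test = np.linspace(0, 10, 50).reshape(-1, 1)

def prediction_spread(model_factory, n_rounds=30):
    """Fit the model on bootstrap samples of the training data and
    measure how much its predictions vary across training sets
    (a higher spread suggests a higher-variance algorithm)."""
    preds = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        model = model_factory().fit(X[idx], y[idx])
        preds.append(model.predict(X_test))
    return np.std(preds, axis=0).mean()

print("Linear regression spread:", prediction_spread(LinearRegression))
print("Unpruned tree spread:   ", prediction_spread(DecisionTreeRegressor))
```

On data like this, the unpruned tree's predictions typically move far more between resamples than the linear model's, matching the low-variance and high-variance groupings above.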

Bias-Variance Trade-Off

The goal of any supervised machine learning algorithm is to achieve low bias and low variance. In turn, the algorithm should achieve good prediction performance.

You can see a general trend in the examples above:

  • Parametric or linear machine learning algorithms often have a high bias but a low variance.
  • Non-parametric or non-linear machine learning algorithms often have a low bias but a high variance.

The parameterization of machine learning algorithms is often a battle to balance out bias and variance.

Below are two examples of configuring the bias-variance trade-off for specific algorithms:

  • The k-nearest neighbors algorithm has low bias and high variance, but the trade-off can be changed by increasing the value of k, which increases the number of neighbors that contribute to the prediction and in turn increases the bias of the model.
  • The support vector machine algorithm has low bias and high variance, but the trade-off can be changed by increasing the C parameter that influences the number of violations of the margin allowed in the training data, which increases the bias but decreases the variance.
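The k-nearest neighbors example can be explored directly: sweeping k and scoring each setting with cross-validation shows the trade-off being tuned. This is a minimal sketch using a synthetic dataset of my own choosing (via scikit-learn's `make_classification`), so the particular accuracies you see will depend on the data.

```python
# Sketch: shifting the bias-variance balance in k-nearest neighbors
# by increasing k. The dataset is a synthetic stand-in, not from the post.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 500 samples with some label noise, so k=1 is prone to overfitting
X, y = make_classification(n_samples=500, n_features=10,
                           flip_y=0.1, random_state=0)

for k in (1, 5, 25, 125):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k={k:>3}  cv accuracy={score:.3f}")
```

Small k fits the training data very closely (low bias, high variance); large k averages over many neighbors (higher bias, lower variance). The best k sits somewhere in between and is found empirically, as here.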

There is no escaping the relationship between bias and variance in machine learning.

  • Increasing the bias will decrease the variance.
  • Increasing the variance will decrease the bias.

There is a trade-off at play between these two concerns, and the algorithms you choose and the way you choose to configure them find different balances in this trade-off for your problem.

In reality, we cannot calculate the real bias and variance error terms because we do not know the actual underlying target function. Nevertheless, as a framework, bias and variance provide the tools to understand the behavior of machine learning algorithms in the pursuit of predictive performance.
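While the true bias and variance cannot be calculated on real problems, they can be estimated in a simulation where the target function is known. The sketch below picks sin(x) as an arbitrary known target, fits a deliberately simple model on many independent training sets, and decomposes the error empirically; every choice here (target, noise level, model depth) is mine for illustration.

```python
# Sketch: estimating bias and variance empirically when the true
# target function is known (only possible in a simulation like this).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
f = np.sin                      # the known target function (arbitrary choice)
x_test = np.linspace(0, 6, 40)

preds = []
for _ in range(200):            # many independent training sets
    x = rng.uniform(0, 6, 100)
    y = f(x) + rng.normal(0, 0.4, 100)          # irreducible noise
    model = DecisionTreeRegressor(max_depth=2)  # simple model: high bias
    model.fit(x.reshape(-1, 1), y)
    preds.append(model.predict(x_test.reshape(-1, 1)))

preds = np.array(preds)
avg_pred = preds.mean(axis=0)
bias_sq = np.mean((avg_pred - f(x_test)) ** 2)  # squared bias
variance = np.mean(preds.var(axis=0))           # variance
print(f"squared bias: {bias_sq:.3f}  variance: {variance:.3f}")
```

Increasing `max_depth` in this sketch shrinks the squared bias and grows the variance, which is the trade-off in miniature.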

Further Reading

This section lists some recommended resources if you are looking to learn more about bias, variance and the bias-variance trade-off.

Summary

In this post, you discovered bias, variance and the bias-variance trade-off for machine learning algorithms.

You now know that:

  • Bias is the simplifying assumptions made by the model to make the target function easier to approximate.
  • Variance is the amount that the estimate of the target function will change given different training data.
  • The trade-off is the tension between the error introduced by the bias and the variance.

Do you have any questions about bias, variance or the bias-variance trade-off? Leave a comment and ask your question and I will do my best to answer.




23 Responses to Gentle Introduction to the Bias-Variance Trade-Off in Machine Learning

  1. Akash Ansari March 28, 2016 at 9:21 pm #

    I have just read a nice and delicate blogpost about bias-variance tradeoff. Looking forward to learn more Machine Learning Algorithms in a simpler fashion.

  2. R M Jain September 15, 2016 at 7:44 pm #

    Hi, I want to check bias-variance tradeoff for iris dataset.. Is anyone knows how to find it>?????????
    please tell the solution…

  3. R M Jain September 15, 2016 at 7:45 pm #

    Hi, I required to find using r programming functions.. please reply…

  4. R Karthik February 2, 2017 at 5:12 pm #

    There is a typo i guess…. High bias means more assumptions for the target function. Eg: Linear regression.

    But in the article it is specified as opposite,

    ” High-Bias: Suggests less assumptions about the form of the target function. “

  5. Fei Du February 25, 2017 at 12:31 pm #

    Hi, less assumptions probably mean less complex model, so I guess high-bias may suggest less complex model and less assumptions.

    • Jason Brownlee February 26, 2017 at 5:28 am #

      A high bias assumes a strong assumption or strong restrictions on the model.

      • Massi December 10, 2017 at 8:28 am #

        I agree with Fei Du

  6. Soghra. July 6, 2017 at 4:27 am #

    This is perfect explanation. Thanks for your efforts.

  7. sam August 1, 2017 at 10:14 pm #

    Very simple explanation. thanks.

  8. Raj September 6, 2017 at 8:46 am #

    Good explanation, Thank you!

  9. Ashwin Agrawal October 13, 2017 at 10:12 pm #

    Nice work!!! I have one query if we decrease the variance then we observe the bias increases and vice versa, but is the rate of fall and rise of these parameters is same or constant or it is dependent of specific algorithms used. Can we tune a model based on bias-variance trade off?

    • Jason Brownlee October 14, 2017 at 5:45 am #

      They are tied together.

      Yes, one must tune the trade-off on a specific problem to get the right level of generalization required.

  10. soumya December 7, 2017 at 1:30 am #

    while designing any model which must be considered to minimize first bias or variance so as to get a better model ?

    • Jason Brownlee December 7, 2017 at 8:05 am #

      Perhaps start with something really high bias and slowly move toward higher variance?

  11. soumya December 7, 2017 at 1:35 am #

    Is there any limit or a scale to know the errors are minimum or maximum in bias and variance?

  12. Keval December 8, 2017 at 8:01 am #

    What are some the measures for understanding bias and variance in our model? How can we quantify bias-variance trade-off? Thanks.

    • Jason Brownlee December 8, 2017 at 2:28 pm #

      Good question, it may be possible for specific algorithms, such as in knn increasing k from 1 to n (number of patterns) and plotting model skill in a test set.
