Why Do Machine Learning Algorithms Work on Data That They Have Not Seen Before?

The superpower of machine learning is generalization.

I recently got the question:

“How can a machine learning model make accurate predictions on data that it has not seen before?”

The answer is generalization, and this is the capability that we seek when we apply machine learning to challenging problems.

In this post, you will discover generalization, the superpower of machine learning.

After reading this post, you will know:

  • That machine learning algorithms all seek to learn a mapping from inputs to outputs.
  • That simpler skillful machine learning models are easier to understand and more robust.
  • That machine learning is only suitable when the problem requires generalization.

Let’s get started.

Why Do Machine Learning Algorithms Work on Data They Have Not Seen?

Photo by gnuckx, some rights reserved.

What Do Machine Learning Algorithms Do?

When we fit a machine learning algorithm, we require a training dataset.

This training dataset includes a set of input patterns and the corresponding output patterns. The goal of the machine learning algorithm is to learn a reasonable approximation of the mapping from input patterns to output patterns.

Here are some examples to make this concrete:

  • Mapping from emails to whether they are spam or not for email spam classification.
  • Mapping from house details to house sale price for house sale price regression.
  • Mapping from photograph to text to describe the photo in photo caption generation.

The list could go on.

We can summarize this mapping that machine learning algorithms learn as a function (f) that predicts the output (y) given the input (X), or restated:

y = f(X)

Our goal in fitting a machine learning algorithm is to find the best possible f() for our purposes.

We are training the model to make predictions on future inputs for cases where we do not have the outputs, that is, where the outputs are unknown. This requires that the algorithm learn, in general, how to take observations from the domain and make a prediction, not just the specifics of the training data.

This is called generalization.
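As a concrete sketch (plain Python with made-up numbers, not any particular library): fit a straight-line approximation of f() from a handful of training pairs, then apply it to an input the model has never seen.

```python
# Training data: input patterns and their corresponding output patterns.
# Here the outputs happen to follow the "true" mapping y = 3x + 2.
train_X = [1.0, 2.0, 3.0, 4.0]
train_y = [5.0, 8.0, 11.0, 14.0]

# Ordinary least squares for a single input variable.
n = len(train_X)
mean_x = sum(train_X) / n
mean_y = sum(train_y) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_X, train_y)) \
    / sum((x - mean_x) ** 2 for x in train_X)
b = mean_y - a * mean_x

def f(x):
    """The learned approximation of the mapping from input to output."""
    return a * x + b

# An input that does not appear anywhere in the training data.
print(f(10.0))
```

Because the model captured the underlying mapping rather than memorizing the four training pairs, it can answer for inputs it never saw.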

Generalization is Hard, but Powerful

A machine learning algorithm must generalize from the training data to the entire domain of unseen observations so that it can make accurate predictions when you use the model.

This is really hard.

Generalization requires that the data we use to train the model (X) be a good and reliable sample of the observations in the mapping we want the algorithm to learn. The higher the quality and the more representative the sample, the easier it will be for the model to learn the unknown, underlying “true” mapping from inputs to outputs.
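One common way to check how well a model has generalized is to hold some observations out of training and measure skill only on those. A minimal sketch, assuming synthetic data drawn from a made-up mapping y = 2x + 1 with a little noise:

```python
import random

random.seed(42)
# Noisy samples from an assumed underlying mapping y = 2x + 1.
data = [(x, 2 * x + 1 + random.uniform(-0.2, 0.2)) for x in range(30)]
random.shuffle(data)
train, test = data[:20], data[20:]  # hold the last 10 observations out

# Fit a least-squares line on the training portion only.
xs = [x for x, _ in train]
mx = sum(xs) / len(xs)
my = sum(y for _, y in train) / len(train)
a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# Skill on data the model never saw approximates real-world skill.
mae = sum(abs((a * x + b) - y) for x, y in test) / len(test)
print(round(mae, 3))
```

A low error on the held-out portion suggests the training sample was representative enough for the model to recover the underlying mapping.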

To generalize means to go from something specific to something broad.

It is the way we humans learn.

  • We don’t memorize specific roads when we learn to drive; we learn to drive in general so that we can drive on any road or set of conditions.
  • We don’t memorize specific computer programs when learning to code; we learn general ways to solve problems with code for any business case that might come up.
  • We don’t memorize the specific word order in natural language; we learn general meanings for words and put them together in new sequences as needed.

The list could go on.

Machine learning algorithms are procedures to automatically generalize from historical observations. And they can generalize on more data than a human could consider, faster than a human could consider it.

It is the speed and scale at which these automated generalization machines operate that makes machine learning so exciting.

We Prefer Simpler Models

The machine learning model is the result of the automated generalization procedure called the machine learning algorithm.

The model could be said to be a generalization of the mapping from training inputs to training outputs.

There may be many ways to map inputs to outputs for a specific problem and we can navigate these ways by testing different algorithms, different algorithm configurations, different training data, and so on.

We cannot know beforehand which approach will result in the most skillful model. Therefore, we must test a suite of approaches, configurations, and framings of the problem to discover what works and what the limits of learning are before selecting a final model to use.

The skill of the model at making predictions determines the quality of the generalization and can help as a guide during the model selection process.

Out of the millions of possible mappings, we prefer simpler mappings over complex mappings. Put another way, we prefer the simplest possible hypothesis that explains the data. This is one way to choose models and comes from Occam’s razor.

The simpler model is often (but not always) easier to understand and maintain, and more robust. In practice, you may want to choose the simplest model that performs well.
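This preference can be made mechanical. A sketch (hypothetical candidate models and a made-up tolerance, not a standard recipe): score each candidate on held-out data, then keep the simplest model whose error is close to the best.

```python
import random

random.seed(7)
# Synthetic data from an assumed mapping y = 3x plus noise.
data = [(x, 3 * x + random.uniform(-1, 1)) for x in range(40)]
random.shuffle(data)
train, valid = data[:30], data[30:]

def fit_constant(train):
    """Simplest hypothesis: always predict the mean training output."""
    c = sum(y for _, y in train) / len(train)
    return lambda x: c

def fit_line(train):
    """A slightly more complex hypothesis: least-squares straight line."""
    xs = [x for x, _ in train]
    mx = sum(xs) / len(xs)
    my = sum(y for _, y in train) / len(train)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

# Candidates ordered from simplest to most complex.
candidates = [("constant", fit_constant(train)), ("line", fit_line(train))]
scores = [(name, sum(abs(m(x) - y) for x, y in valid) / len(valid))
          for name, m in candidates]

# Keep the first (simplest) model within 10% of the best error.
best = min(err for _, err in scores)
chosen = next(name for name, err in scores if err <= best * 1.1)
print(chosen)
```

Here the constant model is far less skillful, so the line is chosen; had both scored similarly, the ordering would have favored the constant model, in the spirit of Occam's razor.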

Machine Learning is Not for Every Problem

The ability to automatically learn by generalization is powerful, but is not suitable for all problems.

  • Some problems require a precise solution, such as arithmetic on a bank account balance.
  • Some problems can be solved by generalization, but simpler solutions exist, such as calculating the square root of positive numbers.
  • Some problems look like they could be solved by generalization, but there is no structured underlying relationship in the data to generalize from, or such a mapping is too complex, such as predicting security prices.

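The square root case above is a good illustration of the second point: a short, exact procedure already exists, so there is nothing for a learning algorithm to add. A sketch using Newton's iteration:

```python
def sqrt_newton(n, tolerance=1e-10):
    """Square root of a non-negative number by Newton's iteration.

    A few lines of deterministic code: no training data, no
    approximation learned from examples, and the same exact answer
    every time.
    """
    x = n
    while abs(x * x - n) > tolerance:
        x = (x + n / x) / 2
    return x

print(sqrt_newton(2.0))
```

When such a direct solution exists, it is simpler, faster, and more trustworthy than anything learned by generalization.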
Key to the effective use of machine learning is learning where it can and cannot (or should not) be used.

Sometimes this is obvious, but often it is not. Again, you must use experience and experimentation to help tease out whether a problem is a good fit for being solved by generalization.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this post, you discovered generalization, the key capability that underlies all supervised machine learning algorithms.

Specifically, you learned:

  • That machine learning algorithms all seek to learn a mapping from inputs to outputs.
  • That simpler skillful machine learning models are easier to understand and more robust.
  • That machine learning is only suitable when the problem requires generalization.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
