What is Machine Learning: A Tour of Authoritative Definitions and a Handy One-Liner You Can Use

In this post I want to address the simple question: What is Machine Learning? 

So you’re interested in Machine Learning and maybe you dabble in it a little. If you talk about Machine Learning with a friend or colleague, you run the risk of someone actually asking you, “So, what is machine learning?” The goal of this post is to give you a few definitions to think about and a handy one-liner definition that is easy to remember.

We will start out by getting a feeling for the standard definitions of Machine Learning taken from authoritative textbooks in the field. We’ll finish up by working out a programmer’s definition of machine learning and a handy one-liner that we can use anytime we’re asked: What is Machine Learning?

Authoritative Definitions

Let’s start out by looking at four textbooks on Machine Learning that are commonly used in university-level courses. These are our authoritative definitions, and they lay the foundation for deeper thought on the subject. I chose these four definitions to highlight some useful and varied perspectives on the field. Through experience we’ll learn that the field really is a mess of methods and that choosing a perspective is key to making progress.

Mitchell’s Machine Learning

Tom Mitchell in his book Machine Learning (affiliate link) provides a definition in the opening line of the preface:

The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.

I like this short and sweet definition, and it is the basis for the programmer’s definition we come up with at the end of the post. Note the mention of “computer programs” and the reference to “automated improvement”. Write programs that improve themselves: it’s provocative!

In his introduction he provides a short formalism that you’ll see much repeated:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Don’t let the definition of terms scare you off; this is a very useful formalism. We can use it as a template: put E, T, and P at the top of columns in a table and list out complex problems with less ambiguity. It can be used as a design tool to help us think clearly about what data to collect (E), what decisions the software needs to make (T) and how we will evaluate its results (P). This power is why it is so often repeated as a standard definition. Keep it in your back pocket.
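To make the table idea concrete, here is a minimal sketch in Python. The class name `LearningProblem`, the helper `format_table` and the two example rows are my own illustrative assumptions, not something taken from Mitchell's book; the point is only that each problem gets one row with an E, a T and a P.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningProblem:
    """One row of an E/T/P table (hypothetical helper for illustration)."""
    name: str
    experience: str   # E: what data we collect
    task: str         # T: what decision the software needs to make
    performance: str  # P: how we evaluate its results


# Example rows, written here purely for illustration.
problems = [
    LearningProblem(
        name="spam filter",
        experience="emails labelled spam or not spam",
        task="decide whether a new email is spam",
        performance="percentage of emails classified correctly",
    ),
    LearningProblem(
        name="checkers player",
        experience="practice games played against itself",
        task="choose the next move in a game of checkers",
        performance="percentage of games won",
    ),
]


def format_table(rows):
    """Render the problems as a plain-text table with E, T and P columns."""
    header = ("Problem", "E (experience)", "T (task)", "P (performance)")
    cells = [header] + [(r.name, r.experience, r.task, r.performance) for r in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    )


print(format_table(problems))
```

Filling in a row like this before writing any code is a cheap way to notice that E, T or P is still vague.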

Elements of Statistical Learning

The Elements of Statistical Learning: Data Mining, Inference, and Prediction (affiliate link) was written by three Stanford statisticians, who describe it as a statistical framework to organize their field of inquiry. In the preface they write:

Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and to understand “what the data says”. We call this learning from data.

I understand the job of a statistician is to use the tools of statistics to interpret data in the context of the domain. The authors seem to include all of the field of Machine Learning as aids in that pursuit. Interestingly, they chose to include “Data Mining” in the subtitle of the book.

Statisticians learn from data, but software does too, and we learn from the things that the software learns: from the decisions made and the results achieved by various machine learning methods.

Pattern Recognition

Bishop in the preface of his book Pattern Recognition and Machine Learning (affiliate link) comments:

Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field…

Reading this, you get the impression that Bishop came at the field from an engineering perspective and later learned and leveraged the Computer Science take on the same methods. This is a mature approach and one we should emulate. More broadly, regardless of the field that lays claim to a method, if it suits our needs by getting us closer to an insight or a result by “learning from data”, then we can decide to call it machine learning.

An Algorithmic Perspective

Marsland adopts the Mitchell definition of Machine Learning in his book Machine Learning: An Algorithmic Perspective (affiliate link). He provides a cogent note in his prologue that motivates his writing the book:

One of the most interesting features of machine learning is that it lies on the boundary of several different academic disciplines, principally computer science, statistics, mathematics, and engineering. …machine learning is usually studied as part of artificial intelligence, which puts it firmly into computer science …understanding why these algorithms work requires a certain amount of statistical and mathematical sophistication that is often missing from computer science undergraduates.

This is insightful and instructive. Firstly, he underscores the multidisciplinary nature of the field. We were getting a feeling for that from the definitions above, but he draws a big red underline for us. Machine Learning draws from all manner of information sciences. Secondly, he underscores the danger of sticking to a given perspective too tightly. Specifically, he points to the case of the algorithmist who shies away from the mathematical inner workings of a method. No doubt, the counter case of the statistician who shies away from the practical concerns of implementation and deployment is just as limiting.

Venn Diagram

Drew Conway created a nice Venn Diagram in September 2010 that I find helpful. In his explanation he comments that Machine Learning is Hacking + Math & Statistics.

Data Science Venn Diagram. Credited to Drew Conway, Creative Commons licensed as Attribution-NonCommercial.

He also describes the Danger Zone as Hacking Skills + Expertise. Here, he is referring to those people who know enough to be dangerous. They can access and structure data, they know the domain, and they can run a method and present results, but they don’t understand what the results mean. I think this is what Marsland may have been hinting at.

Programmer’s Definition

We now turn to the need to break all of this down to nuts and bolts for us programmers. We first look at complex problems that resist our decomposition and procedural solutions. This frames the power of machine learning. We then work out a definition that sits well with us programmers that we can use whenever we’re asked, “So, What is Machine Learning?” by other programmers.

Complex Problems

As a programmer, you will eventually encounter classes of problems that stubbornly resist a logical and procedural solution. What I mean is, there are classes of problems where it is not feasible or cost-effective to sit down and write out all the if statements needed to solve the problem.

“Sacrilege!” I hear your programmer brain shout.

It’s true. Take the everyday case of the decision problem of discriminating spam email from non-spam email. This is an example used all the time when introducing machine learning. How would you write a program to filter emails as they come into your email account and decide to put them in the spam folder or the inbox folder?

You’d probably start out by collecting some examples, having a look at them and having a deep think about them. You’d look for patterns in the emails that are spam and those that are not. You’d think about abstracting those patterns so that your heuristics would work with new cases in the future. You’d ignore odd emails that will never be seen again. You’d go for easy wins to get your accuracy up and craft special things for the edge cases. You’d review the email frequently over time and think about abstracting new patterns to improve the decision making.

There’s a machine learning algorithm in there, amongst all that, except it was executed by you the programmer rather than the computer. This manually derived, hard-coded system would only be as good as the programmer’s ability to extract rules from the data and implement them in the program.

It could be done, but it would take a lot of resources and be a maintenance nightmare.
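To give a feel for what that hand-built system looks like, here is a deliberately naive sketch. The function name `is_spam_by_hand` and every rule in it are assumptions I made up for illustration; a real hand-written filter would need far more rules than this, which is exactly the problem.

```python
def is_spam_by_hand(subject: str, body: str) -> bool:
    """Hand-written heuristics; each rule is a guess made by the programmer."""
    text = f"{subject} {body}".lower()

    # Rule 1: phrases we noticed while reading example spam (illustrative only).
    if "free money" in text or "act now" in text:
        return True

    # Rule 2: a longish subject line written entirely in capitals.
    if len(subject) > 10 and subject.isupper():
        return True

    # Rule 3: an unusual number of exclamation marks.
    if text.count("!") >= 5:
        return True

    # ...and so on: one more rule for every new pattern we spot.
    return False


print(is_spam_by_hand("WIN A PRIZE TODAY", "click here"))          # True
print(is_spam_by_hand("Lunch tomorrow?", "Are you free at noon?"))  # False
```

Every threshold and phrase above was chosen by a person staring at examples, and each one has to be revisited by hand when the spam changes.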

Machine Learning

In the example above, I’m sure your programmer brain, that part of your brain that ruthlessly seeks to automate, could see the opportunity for automating and optimizing the meta process of extracting patterns from examples. Machine learning methods are this automated process.

In our spam/non-spam example, the experience (E) is the set of emails we have collected. The task (T) is a decision problem (called classification) of marking each email as spam or not, and putting it in the correct folder. Our performance measure (P) would be something like accuracy as a percentage (correct decisions divided by total decisions made, multiplied by 100), ranging between 0% (worst) and 100% (best).
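The accuracy calculation described above is small enough to write out. This is a minimal sketch; the function name `accuracy_percent` and the labels "spam" and "ham" are my own choices for the example.

```python
def accuracy_percent(predicted, actual):
    """Correct decisions divided by total decisions made, multiplied by 100."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one decision is needed to compute accuracy")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


# Made-up decisions for four emails; "ham" is shorthand for non-spam.
predicted = ["spam", "ham", "spam", "ham"]
actual = ["spam", "ham", "ham", "ham"]

print(accuracy_percent(predicted, actual))  # 3 of 4 decisions correct: 75.0
```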

Preparing a decision-making program like this is typically called training, where the collected examples are called the training set and the program is referred to as a model, as in a model of the problem of classifying spam from non-spam. As programmers, we like this terminology: a model has state and needs to be persisted, training is a process that is performed once and is maybe rerun as needed, and classification is the task performed. It all makes sense to us.

We can see that some of the terminology used in the above definitions does not sit well with programmers. Technically, all the programs we write are automations, so commenting that machine learning automatically learns is not meaningful.
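Here is how that vocabulary might map onto code. This is a toy sketch under my own assumptions: the class `WordCountSpamModel`, its word-counting rule and the four training examples are invented for illustration and are not a recommended spam filter. What matters is the shape: a model holds state, training fills that state from a training set, the state can be persisted, and classification uses it.

```python
import json
from collections import Counter


class WordCountSpamModel:
    """Toy model whose state is two word counts learned from examples."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()

    def train(self, examples):
        """Learn word counts from (text, is_spam) pairs: the training set."""
        for text, is_spam in examples:
            target = self.spam_words if is_spam else self.ham_words
            target.update(text.lower().split())

    def classify(self, text):
        """Return True if the words were seen more often in spam than in non-spam."""
        words = text.lower().split()
        spam_score = sum(self.spam_words[w] for w in words)
        ham_score = sum(self.ham_words[w] for w in words)
        return spam_score > ham_score

    def to_json(self):
        """Persist the learned state as a JSON string."""
        return json.dumps({"spam": self.spam_words, "ham": self.ham_words})

    @classmethod
    def from_json(cls, payload):
        """Rebuild a model from state saved by to_json."""
        state = json.loads(payload)
        model = cls()
        model.spam_words.update(state["spam"])
        model.ham_words.update(state["ham"])
        return model


# A made-up training set of (text, is_spam) pairs.
training_set = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting notes attached", False),
    ("lunch at noon tomorrow", False),
]

model = WordCountSpamModel()
model.train(training_set)

# Save the state, then restore it as a separately deployed program would.
restored = WordCountSpamModel.from_json(model.to_json())

print(restored.classify("claim your free money"))   # True
print(restored.classify("notes from the meeting"))  # False
```

Training happens once in `train`; after that, only the saved state and `classify` are needed to make decisions.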

Handy One-liner

So, let’s see if we can use these pieces and construct a programmer’s definition of machine learning. How about:

Machine Learning is the training of a model from data that generalizes a decision against a performance measure.

Training a model suggests training examples. A model suggests state acquired through experience. Generalizes a decision suggests the capability to make a decision based on inputs and anticipating unseen inputs in the future for which a decision will be required. Finally, against a performance measure suggests a targeted need and directed quality to the model being prepared.
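The “generalizes” and “against a performance measure” parts can be sketched by training on one set of examples and measuring on examples the model has never seen. Everything here is an assumption made for illustration: the single feature (number of exclamation marks), the threshold rule, the function names and the data are all invented.

```python
def train_threshold(examples):
    """Pick the threshold on one numeric feature that maximises training accuracy.

    examples is a list of (feature_value, is_spam) pairs.
    The resulting model predicts spam when feature_value >= threshold.
    """
    candidates = sorted({value for value, _ in examples})

    def correct(threshold):
        return sum((value >= threshold) == label for value, label in examples)

    return max(candidates, key=correct)


def held_out_accuracy(threshold, examples):
    """Accuracy as a percentage for the threshold rule on the given examples."""
    correct = sum((value >= threshold) == label for value, label in examples)
    return 100.0 * correct / len(examples)


# Made-up data: (number of exclamation marks in the email, is_spam).
training_set = [(0, False), (1, False), (2, False), (4, True), (6, True), (7, True)]
unseen_set = [(0, False), (3, True), (5, True), (9, True)]

threshold = train_threshold(training_set)

print(threshold)                                   # 4
print(held_out_accuracy(threshold, training_set))  # 100.0
print(held_out_accuracy(threshold, unseen_set))    # 75.0
```

The model is perfect on the examples it was trained on and less than perfect on unseen ones, which is why the performance measure should be taken on inputs the model did not learn from.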

I’m no poet. Can you come up with a more accurate or more succinct programmer’s definition of Machine Learning? Leave a comment.

Resources

I’ve linked to resources throughout this post, but I have listed some useful resources below if you thirst for more reading.

Books

The following are the four textbooks from which definitions were taken. These book links are Amazon affiliate links, which means I will get a few cents if you decide to buy one.

Also, Drew Conway has a book in collaboration with John Myles White, titled Machine Learning for Hackers, that is practical and fun to read.

Question and Answer Sites

There are some interesting discussions on Q&A websites about what exactly machine learning is. Below are some picks.

I’ve thought hard about all of this, and my definition is coloured by the books I’ve read and the experiences I’ve had. Let me know if it’s useful.

Leave a comment and let us all know how you understand the field. What is Machine Learning to you? Do you know of any further resources we could fall back to? Leave a note.

10 Responses to What is Machine Learning: A Tour of Authoritative Definitions and a Handy One-Liner You Can Use

  1. Vikash January 2, 2014 at 12:50 pm #

    Thanks for collecting the quotes and coming up with your own in the process. Insightful and enjoyed it. Thanks for posting.

  2. qnaguru February 17, 2014 at 12:34 am #

    Wonderful introduction to Machine Learning – Programmers get that!

  3. hugo June 30, 2015 at 1:41 am #

    It’s great to have people who like to share knowledge. Thank you.

  4. hugo June 30, 2015 at 1:50 am #

    Just one more thing. What is the relationship between machine learning and statistics?

  5. Idrees April 2, 2016 at 1:29 am #

    The stress of PhD research and research assistantship had taken me off guard. I took my time to go through your post today. Honestly, this piece opened my mind to the gift of knowledge. Thanks, Dr Jason.

  6. AnalyticAscent May 7, 2016 at 3:51 pm #

    This is good to know, been struggling to explain to my family what my career path is in terms they can understand 🙂

  7. Yap May 11, 2016 at 8:51 pm #

    Here is my handy one-liner: ML is a decision problem that needs to be explored from data against a measure outcome.

  8. Jim Kitzmiller July 19, 2016 at 4:50 am #

    Machine learning is the art and science of creating computer software that gets more accurate results after being used repeatedly.
