What is Machine Learning?

By Jason Brownlee on August 16, 2020 in Start Machine Learning

You’re interested in Machine Learning and maybe you dabble in it a little.

If you talk about Machine Learning with a friend or colleague one day, you run the risk of someone actually asking you:

“So, what is machine learning?”

The goal of this post is to give you a few definitions to think about and a handy one-liner definition that is easy to remember.

We will start out by getting a feeling for the standard definitions of Machine Learning taken from authoritative textbooks in the field. We’ll finish up by working out a developer’s definition of machine learning and a handy one-liner that we can use anytime we’re asked: What is Machine Learning?

Authoritative Definitions

Let’s start out by looking at four textbooks on Machine Learning that are commonly used in university-level courses.

These are our authoritative definitions and lay our foundation for deeper thought on the subject.

I chose these four definitions to highlight some useful and varied perspectives on the field. Through experience, we’ll learn that the field really is a mess of methods and choosing a perspective is key to making progress.

Mitchell’s Machine Learning

Tom Mitchell in his book Machine Learning provides a definition in the opening line of the preface:

The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.

I like this short and sweet definition, and it is the basis for the developer’s definition we come up with at the end of the post.

Note the mention of “computer programs” and the reference to “automated improvement”. Writing programs that improve themselves: it’s provocative!

In his introduction he provides a short formalism that you’ll see repeated often:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Don’t let the definition of terms scare you off; this is a very useful formalism.

We can use this formalism as a template and put E, T, and P at the top of columns in a table and list out complex problems with less ambiguity. It could be used as a design tool to help us think clearly about what data to collect (E), what decisions the software needs to make (T) and how we will evaluate its results (P). This power is why it is oft repeated as a standard definition. Keep it in your back pocket.
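As a rough illustration of using the formalism as a design template, the sketch below records E, T, and P as columns of a small table. The class name `LearningProblem` and the house pricing row are hypothetical, made up here for illustration; only the spam row follows the example used later in this post.

```python
from dataclasses import dataclass


@dataclass
class LearningProblem:
    """One row of an E/T/P design table (hypothetical helper)."""
    name: str
    experience: str   # E: what data we collect
    task: str         # T: what decision the software needs to make
    performance: str  # P: how we evaluate its results


problems = [
    LearningProblem(
        name="Spam filtering",
        experience="Emails we have collected, marked spam or not spam",
        task="Decide whether a new email is spam",
        performance="Percentage of emails classified correctly",
    ),
    LearningProblem(
        name="House pricing",  # illustrative row, not from the post
        experience="Past sales with house features and sale prices",
        task="Predict the sale price of a house",
        performance="Average absolute error of the predicted price",
    ),
]

# Print the table, one problem per line.
for p in problems:
    print(f"{p.name}: E={p.experience} | T={p.task} | P={p.performance}")
```

Filling in a row like this before writing any code forces a decision about what to collect, what to predict, and how success will be judged.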

Elements of Statistical Learning

The Elements of Statistical Learning: Data Mining, Inference, and Prediction was written by three Stanford statisticians, who describe it as a statistical framework to organize their field of inquiry.

The preface reads:

Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and to understand “what the data says”. We call this learning from data.

I understand the job of a statistician is to use the tools of statistics to interpret data in the context of the domain. The authors seem to include all of the field of Machine Learning as aids in that pursuit. Interestingly, they chose to include “Data Mining” in the subtitle of the book.

Statisticians learn from data, but software does too, and we learn from the things that the software learns: from the decisions made and the results achieved by various machine learning methods.

Pattern Recognition

Bishop in the preface of his book Pattern Recognition and Machine Learning comments:

Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field…

Reading this, you get the impression that Bishop came at the field from an engineering perspective and later learned and leveraged the Computer Science take on the same methods. Pattern recognition is an engineering or signal processing term.

This is a mature approach and one we should emulate. More broadly, regardless of the field that lays claim to a method, if it suits our needs by getting us closer to an insight or a result by “learning from data”, then we can decide to call it machine learning.

An Algorithmic Perspective

Marsland adopts the Mitchell definition of Machine Learning in his book Machine Learning: An Algorithmic Perspective.

He provides a cogent note in his prologue that motivates his writing the book:

One of the most interesting features of machine learning is that it lies on the boundary of several different academic disciplines, principally computer science, statistics, mathematics, and engineering. …machine learning is usually studied as part of artificial intelligence, which puts it firmly into computer science …understanding why these algorithms work requires a certain amount of statistical and mathematical sophistication that is often missing from computer science undergraduates.

This is insightful and instructive.

Firstly, he underscores the multidisciplinary nature of the field. We were getting a feeling for that from the above definition, but he draws a big red underline for us. Machine Learning draws from all manner of information sciences.

Secondly, he underscores the danger of sticking to a given perspective too tightly. Specifically, the case of the algorithmist who shies away from the mathematical inner workings of a method.

No doubt, the counter case of the statistician who shies away from the practical concerns of implementation and deployment is just as limiting.

Venn Diagram

Drew Conway created a nice Venn Diagram in September 2010 that might help.

In his explanation, he comments: Machine Learning = Hacking + Math & Statistics

Data Science Venn Diagram. Credited to Drew Conway, Creative Commons licensed as Attribution-NonCommercial.

He also describes the Danger Zone as Hacking Skills + Expertise.

Here, he is referring to those people that know enough to be dangerous. They can access and structure data, they know the domain and they can run a method and present results, but don’t understand what the results mean. I think this is what Marsland may have been hinting at.

Developer’s Definition of Machine Learning

We now turn to the need to break all of this down to nuts and bolts for us developers.

We first look at complex problems that resist our decomposition and procedural solutions. This frames the power of machine learning. We then work out a definition that sits well with us developers that we can use whenever we’re asked, “So, What is Machine Learning?” by other developers.

Complex Problems

As a developer, you will eventually encounter classes of problems that stubbornly resist a logical and procedural solution.

What I mean is, there are classes of problems where it is not feasible or cost-effective to sit down and write out all the if statements needed to solve the problem.

“Sacrilege!” I hear your developer’s brain shout.

It’s true.

Take the everyday case of the decision problem of discriminating spam email from non-spam email. This is an example used all the time when introducing machine learning. How would you write a program to filter emails as they come into your email account and decide to put them in the spam folder or the inbox folder?

You’d probably start out by collecting some examples and having a look at them and a deep think about them. You’d look for patterns in the emails that are spam and those that are not. You’d think about abstracting those patterns so that your heuristics would work with new cases in the future. You’d ignore odd emails that will never be seen again. You’d go for easy wins to get your accuracy up and craft special things for the edge cases. You’d review the email frequently over time and think about abstracting new patterns to improve the decision making.

There’s a machine learning algorithm in there, amongst all that, except it was executed by you, the programmer, rather than the computer. This manually derived, hardcoded system would only be as good as the programmer’s ability to extract rules from the data and implement them in the program.

It could be done, but it would take a lot of resources and be a maintenance nightmare.
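To make the maintenance burden concrete, here is a minimal sketch of what such a hand-coded filter might look like. The function name `is_spam_by_hand`, the phrase list, and the thresholds are all assumptions invented for illustration; the point is only that every rule is guessed and maintained by a person.

```python
def is_spam_by_hand(email_text: str) -> bool:
    """Hand-written heuristics: every rule below was chosen by a human."""
    text = email_text.lower()

    # Patterns the programmer noticed while reading example emails.
    suspicious_phrases = ["free money", "act now", "winner", "click here"]
    hits = sum(phrase in text for phrase in suspicious_phrases)

    # A special case crafted for one kind of edge case.
    too_many_exclamations = text.count("!") >= 3

    # Each new spam trick means another rule added here by hand.
    return hits >= 2 or too_many_exclamations


print(is_spam_by_hand("You are a WINNER! Click here for free money"))
print(is_spam_by_hand("Meeting moved to 3pm, see agenda attached"))
```

The first call returns `True` and the second `False`, but only because the examples happen to match the rules the programmer thought of.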

Machine Learning

In the example above, I’m sure your developer brain, that part of your brain that ruthlessly seeks to automate, could see the opportunity for automating and optimizing the meta-process of extracting patterns from examples.

Machine learning methods are this automated process.

In our spam/non-spam example, the examples (E) are emails we have collected. The task (T) is a decision problem (called classification) of marking each email as spam or not, and putting it in the correct folder. Our performance measure (P) would be something like accuracy as a percentage (correct decisions divided by total decisions made, multiplied by 100) between 0% (worst) and 100% (best).
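The accuracy measure described above can be sketched in a few lines. The function name `accuracy_percent` and the sample labels are made up for illustration.

```python
def accuracy_percent(predictions, actual):
    """Correct decisions divided by total decisions made, multiplied by 100."""
    if len(predictions) != len(actual):
        raise ValueError("predictions and actual must have the same length")
    if not actual:
        raise ValueError("need at least one decision to measure accuracy")
    correct = sum(p == a for p, a in zip(predictions, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels for five emails: what they really were vs. what we decided.
actual = ["spam", "not_spam", "not_spam", "spam", "not_spam"]
predictions = ["spam", "not_spam", "spam", "spam", "not_spam"]

print(accuracy_percent(predictions, actual))  # 4 of 5 decisions correct -> 80.0
```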

Preparing a decision-making program like this is typically called training, where collected examples are called the training set and the program is referred to as a model, as in a model of the problem of classifying spam from non-spam. As developers, we like this terminology: a model has state and needs to be persisted, training is a process that is performed once and maybe rerun as needed, and classification is the task performed. It all makes sense to us.

We can see that some of the terminology used in the above definitions does not sit well with programmers. Technically, all the programs we write are automations, so commenting that machine learning automatically learns is not meaningful.
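The terminology can be mapped onto code with a deliberately tiny sketch. It uses a toy word-count scorer, which is an assumption chosen to keep the example short rather than a method this post prescribes; the function names `train` and `classify` and the four training emails are likewise invented for illustration.

```python
import pickle
from collections import Counter


def train(training_set):
    """Training: build a model (per-label word counts) from labeled emails."""
    model = {"spam": Counter(), "not_spam": Counter()}
    for text, label in training_set:
        model[label].update(text.lower().split())
    return model


def classify(model, text):
    """Classification: pick the label whose training emails used these words most."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words)
              for label, counts in model.items()}
    return max(scores, key=scores.get)


# The training set: collected examples with known answers.
training_set = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "not_spam"),
    ("lunch on monday", "not_spam"),
]

model = train(training_set)      # training is performed once
saved = pickle.dumps(model)      # the model has state, so it can be persisted
restored = pickle.loads(saved)   # later: reload it without retraining

print(classify(restored, "free money prize"))
print(classify(restored, "agenda for lunch"))
```

The state lives entirely in the word counts, which is why the model can be saved and reloaded separately from the code that produced it.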

Handy One-liner

So, let’s see if we can use these pieces and construct a developer’s definition of machine learning. How about:

Machine Learning is the training of a model from data that generalizes a decision against a performance measure.

Training a model suggests training examples. A model suggests state acquired through experience. Generalizes a decision suggests the capability to make a decision based on inputs and anticipating unseen inputs in the future for which a decision will be required. Finally, against a performance measure suggests a targeted need and directed quality to the model being prepared.

I’m no poet. Can you come up with a more accurate or more succinct developer’s definition of Machine Learning?
Share your definition in the comments below.

Quora is well suited to high-level questions like this; have a browse through some. My picks are: What is machine learning in layman’s terms? and What is data science?
Cross Validated has some great discussions on this higher-level question. See The Two Cultures: statistics vs. machine learning? Two resources mentioned in this discussion are the blog post Statistics vs. Machine Learning, fight! and the paper Statistical Modeling: The Two Cultures.
Stack Overflow also has some discussion; for example, check out What is machine learning?

I’ve thought hard about all of this, and my definition is coloured by the books I’ve read and the experiences I’ve had. Let me know if it’s useful.

Leave a comment and let us all know how you understand the field. What is Machine Learning to you? Do you know of any further resources we could fall back to?
Let me know in the comments below.

70 Responses to What is Machine Learning?

Vikash January 2, 2014 at 12:50 pm #

Thanks for collecting the quotes and coming up with your own in the process. Insightful and enjoyed it. Thanks for posting.

- James Carmichael January 7, 2022 at 7:55 am #
  
  You’re very welcome, Vikash!
  
qnaguru February 17, 2014 at 12:34 am #

Wonderful introduction to Machine Learning – Programmers get that!

hugo June 30, 2015 at 1:41 am #

It’s great to have people who like to share knowledge. Thank you.

hugo June 30, 2015 at 1:50 am #

Just one more thing. What is the relationship between machine learning and statistics?

- Lucas Lopes January 19, 2018 at 10:33 am #
  
  As described in the Venn diagram section, it is possible to develop applications that learn from data without necessarily mastering statistics. But the chances of creating a model that fails to extract the real information from the data are high.
  
  - Jason Brownlee January 20, 2018 at 8:15 am #
    
    Not sure I agree.
    
    I think you can drive a car without understanding how an engine works, or solve a business problem with code without understanding the theory of computation.
    
    - steveK November 21, 2019 at 5:43 am #
      
      True, but sometimes understanding helps. I “hacked” my Kia Sorento to get incredible mileage over hilly terrain. It was the last two of eight hours of driving that day to get my daughter to a writer’s retreat. I drove the plateau and slight up hills at 45 mph, a bit below the speed limit. When I got near where to go downhill, I slowed down and used the clutchless shift to go down. At one point, a truck was behind me, and that was just where there was a second driving lane, so I let him by. Otherwise, I found that by downshifting on the hills, I would approach the curves at a more reasonable speed, without braking, and I wouldn’t overcompensate on the curve. I got 26.5 mpg on an AWD that is rated for 24 highway. Part of my success was gravity, but using engine friction was part of my success! So, that’s not a lot of understanding of the engine, but I did understand the system.
      
      - Jason Brownlee November 21, 2019 at 6:13 am #
        
        Yes, but you did not need that understanding to get started, or even drive across country.
        
        It’s a great next step once the basics are covered and value is being delivered.
Idrees April 2, 2016 at 1:29 am #

The stress of PhD research and a research assistantship had caught me off guard. I took my time to go through your post today. Honestly, this piece lightened my head with the gift of knowledge. Thanks, Dr. Jason.

- Jason Brownlee April 8, 2016 at 1:38 pm #
  
  You’re very welcome.
  
AnalyticAscent May 7, 2016 at 3:51 pm #

This is good to know, been struggling to explain to my family what my career path is in terms they can understand 🙂

Yap May 11, 2016 at 8:51 pm #

Here is my handy one-liner: ML is a decision problem that needs to be explored from data against a measured outcome.

Jim Kitzmiller July 19, 2016 at 4:50 am #

Machine learning is the art and science of creating computer software that gets more accurate results after being used repeatedly.

- Jason Brownlee July 19, 2016 at 5:25 am #
  
  Thanks Jim
  
Yasmina October 20, 2016 at 8:04 am #

It’s very well explained, Dr. Jason. I work on predicting the carbonation depth of concrete using artificial neural networks. I’ve prepared an algorithm in collaboration with my colleague, but computer programming is not easy for a civil engineer. I’ll read your course in future, thanks.

- Jason Brownlee October 20, 2016 at 8:41 am #
  
  I’m glad you found it useful Yasmina.
  
Van-Hau Nguyen November 7, 2016 at 5:17 pm #

Thanks for the great posts. You have been doing a lot of interesting work.
How do you keep up your enthusiasm for ML?

- Jason Brownlee November 8, 2016 at 9:51 am #
  
  Thanks Van-Hau Nguyen.
  
  It keeps me interested because every day is a new challenge. There are no “right” answers and there are always more things to learn and improve upon.
  
  Also, I love to help beginners get started and see how easy it is to apply.
  
Julien March 22, 2017 at 1:22 pm #

This is great! As a non-programmer, my one-liner might be something like: Machine Learning is using data to create a model and then using that model to make predictions.

Am I understanding it correctly or is there something missing in my one-liner?

- Jason Brownlee March 23, 2017 at 8:45 am #
  
  I love it Julien. Clear, simple and useful.
  
Gokul Krishnaraj March 28, 2017 at 11:08 pm #

Solid ML material for references, started with a search on predictive analytics query and I am here motivated to look all the way to the 101 of your posts. Great work, Thanks.

- Jason Brownlee March 29, 2017 at 9:07 am #
  
  Thanks.
  
Bolanle April 14, 2017 at 12:12 am #

my one-liner will be making a better prediction by using computer algorithms to train data for maximum accuracy.

- Jason Brownlee April 14, 2017 at 8:46 am #
  
  Nice Bolanle.
  
Avornyo Charles May 23, 2017 at 5:54 am #

Thank you very much! I love it.
I would like to know if you need to be a very good Python programmer in order to use machine learning techniques.
I realized this when I was doing my Master’s thesis. My supervisor asked me to implement a model myself, and I needed to modify a Python package to make the model work. However, I struggled a lot because I am not good at object-oriented programming in Python.
I would like to do a PhD in Machine Learning applied to Health. However, I am currently wondering whether I will survive, given my mediocre programming skills.
What do you think I should do?

- Jason Brownlee May 23, 2017 at 8:00 am #
  
  I do not think you need to be an excellent programmer to be able to deliver useful and valuable results with applied machine learning.
  
  See this post:
  https://machinelearningmastery.com/machine-learning-for-programmers/
  
  Generally, I would recommend you focus on learning how to get good at working through predictive modeling problems end to end and delivering a result using libraries like sklearn and tools like Weka.
  
- Rafael Espericueta August 23, 2017 at 8:53 pm #
  
  Practice your Python programming skills, and they will improve.
  And it’s even fun! 🙂
  
  - Jason Brownlee August 24, 2017 at 6:33 am #
    
    Seconded.
    
Phillip June 21, 2017 at 6:27 am #

Thank you Jason,
I’m just starting my journey in ML and your articles are very enlightening and easy to understand.

- Jason Brownlee June 21, 2017 at 8:19 am #
  
  Thanks Phillip, I’m glad to hear that.
  
Ankit Mistry August 23, 2017 at 12:30 pm #

Check out this video on a layman’s understanding of machine learning:

https://www.youtube.com/watch?v=RaDFiMd-Amg

- Jason Brownlee August 23, 2017 at 4:24 pm #
  
  Thanks for sharing.
  
Gaurav Jain December 8, 2017 at 3:14 am #

Thanks for such a beautiful blog! 🙂

- Jason Brownlee December 8, 2017 at 5:43 am #
  
  You’re welcome.
  
Jesús December 13, 2017 at 2:42 am #

What a great article, Jason! Congratulations!

- Jason Brownlee December 13, 2017 at 5:42 am #
  
  Thanks.
  
Ayyappan February 6, 2018 at 7:30 am #

Sometimes I think that in traditional programming the “program” is the developer’s primary focus, but in machine learning the focus shifts to data.
A traditional program works the same no matter how much data is tunneled through it, but a machine learning program becomes smarter the more data we tunnel through it.

- Jason Brownlee February 6, 2018 at 9:26 am #
  
  That is a really good observation!
  
Pravin February 27, 2018 at 12:46 am #

Thanks for the lovely introduction!

- Jason Brownlee February 27, 2018 at 6:33 am #
  
  You’re welcome.
  
Shravan May 1, 2018 at 11:53 am #

A wonderful kick-start to understanding machine learning, covering a lot of material. I think that with experience, everything said here will make more sense.

- Jason Brownlee May 2, 2018 at 5:37 am #
  
  I’m glad it helped.
  
Christopher July 3, 2018 at 4:46 am #

Very nice piece!!!

You asked for feedback:

Your final definition seems to me to lack any reference to computers or programming. Though implied for those who think in computer jargon, I’d add it in: “Machine Learning is computer training of a model from data that generalizes a decision against a performance measure.”

Then, I’d ask, is this limited to labeled data? A pre-existing model? If not, wouldn’t it be, “training TO a model”? I think that would include both cases.??

Finally, in Conway’s Venn Diagram, I see no mention of ethics. This may be the biggest danger for mechanics (developers) and for the future, and deserves mention in every article on AI. IMHO Thanks!

- Jason Brownlee July 3, 2018 at 6:28 am #
  
  Thanks.
  
Jesus Suniaga July 12, 2018 at 5:27 am #

My concept is that machine learning is a field to create useful mathematical models from data. (the word useful involves measure )

- Jason Brownlee July 12, 2018 at 6:30 am #
  
  Nice.
  
Efstathios Chatzikyriakidis December 10, 2018 at 8:29 pm #

“Machine Learning (ML) is the science of learning from seen data in order to create models that will either extract from it hidden information or recognize, generate and predict unseen data.”

What do you think?

It leads to supervised / unsupervised methods, discriminative / generative models, classification / regression tasks.

- Jason Brownlee December 11, 2018 at 7:42 am #
  
  Very nice!
  
Sridhar February 1, 2019 at 10:54 pm #

Thank You, Jason. I am new to Data Science and was looking for a better way to learn Machine Learning. Your blog is nice and actually has shown me a well-structured way to start looking into ML.

My definition is: Machine Learning is the science of generalizing a model based on the data available and using that model to predict future patterns.

What do you think?

- Jason Brownlee February 2, 2019 at 6:17 am #
  
  Sounds great Sridhar.
  
Rushil Patel May 24, 2019 at 7:15 am #

Hi, read your machine blog. It’s really nice.
I have also started writing a blog on machine learning. Can you please check it and give feedback so I can improve.
link: https://learn-ml.com/

- Jason Brownlee May 24, 2019 at 8:07 am #
  
  Well done!
  
Debesh Choudhury September 4, 2019 at 12:31 pm #

Simple and very good introduction of ML indeed .. I like Jim Kitzmiller’s comment – “Machine learning is the art and science of creating computer software that gets more accurate results after being used repeatedly”.

- Jason Brownlee September 4, 2019 at 1:45 pm #
  
  Thanks.
  
Abdul-Aziz January 6, 2020 at 3:39 am #

I really discovered how fascinating machine learning is after reading your blog.
Thanks for this amazing introduction.

- Jason Brownlee January 6, 2020 at 7:13 am #
  
  You’re welcome!
  
Sumit Das April 22, 2020 at 3:33 pm #

Great Post

- Jason Brownlee April 23, 2020 at 5:56 am #
  
  Thank you!
  
Prabuddha Dissanyakae May 11, 2020 at 11:21 am #

Great stuff, I just wonder your thinking towards the others, its amazing.

- Jason Brownlee May 11, 2020 at 1:36 pm #
  
  Thanks.
  
Al_Hasib November 9, 2020 at 4:01 pm #

Well written, brother…

- Jason Brownlee November 10, 2020 at 6:36 am #
  
  Thanks!
  
Fernando Augusto Deheza Zambrana February 26, 2021 at 1:43 pm #

Definitions, as relative approximations, offer diverse points of view that constitute important information for the purpose of building a multidimensional conceptual model. Thanks for sharing.

- Jason Brownlee February 27, 2021 at 5:59 am #
  
  Thanks for sharing.
  
Edy Gianez Silva June 12, 2022 at 7:52 pm #

Hi, Jason! How are you?

It’s been two years since I decided to mess with data science, and only now that I’ve found your lessons do I think I’ve found a way to go through it.

So, probably you’ll see me quite a lot here 🙂

My definition of ML is:

“ML is when we use a set of resources (computational and theoretical) to build a tool that helps us make decisions in the face of complex problems”

I try to address the following concerns with this definition:

Complex problems
– the reason why we need a machine
– sets the boundaries of the learning space

Computational resources
– We basically (and roughly speaking) use computers to store and manipulate data. When trying to make a computer learn, we raise these fundamental things to a higher power
– store data -> big data -> distributed storage, etc.
– manipulate data -> use of GPUs, parallel processing, quantum processing, etc.
– better algorithms to handle computations

Theoretical resources
– Reveals the multidisciplinarity of the field: math, stats, information theory, software engineering; they all come to help us learn from data

Build a tool
– Machine learning results in a model
– A model is made of a representation of the problem we’re handling, which will be evaluated and optimized (I’m thinking of Palacio’s framework), since it’s an approximate picture of the real problem-world
– We can use several techniques to figure out the hidden structure that lies behind the data set (our problem-world distilled into numbers, texts, images, etc.).
– We can call these techniques learning algorithms, objective functions and optimization functions

Make a decision
– the goal of all this is to infer a response that guides us to a better choice when a new and complex situation is presented to us
– our model will help us to make that generalization from a known situation to this new one

Sorry to take so much space,

Best regards

Edy

- James Carmichael June 13, 2022 at 11:23 am #
  
  Hi Edy…please narrow your query to a single question so that we may better assist you.
  
  - Edy June 28, 2022 at 4:20 am #
    
    Hi, James! How are you?
    
    I was thinking about this definition of ML:
    
    “ML is when we use a set of resources (computational and theoretical) to build a tool that helps us make decisions in the face of complex problems”
    
    Does it make sense?
    
    Thanks
    
    - James Carmichael June 28, 2022 at 12:49 pm #
      
      Hi Edy…That is a fantastic definition! The following is a great starting point for your machine learning journey:
      
      https://machinelearningmastery.com/start-here/
      
Edy Gianez Silva June 13, 2022 at 4:13 pm #

Hello,

So, I was wondering whether this definition of ML is OK:

“ML is when we use a set of resources (computational and theoretical) to build a tool that helps us make decisions in the face of complex problems”

Thanks
