Resources for Getting Started With Probability in Machine Learning

Last Updated on September 25, 2019

Machine learning is a field of computer science concerned with developing systems that can learn from data.

Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty.

Many aspects of machine learning are uncertain, including, most critically, observations from the problem domain and the relationships learned by models from that data. As such, a machine learning practitioner needs some understanding of probability and of its tools and methods to be effective; perhaps not initially, but certainly in the long run.

In this post, you will discover some of the key resources that you can use to learn about the parts of probability required for machine learning.

After reading this post, you will know:

  • References that you can use to discover topics on probability.
  • Books, chapters, and sections that cover probability in the context of machine learning.
  • A division between foundational probability topics and machine learning methods that leverage probability.


Let’s get started.

This tutorial is divided into three parts; they are:

  1. Topics in Probability for Machine Learning
  2. Probability Covered in Machine Learning Books
  3. Foundation Probability vs. Machine Learning With Probability

Topics in Probability for Machine Learning

Probability is a large field of mathematics with many fascinating findings and useful tools.

Although much of the field of probability may be interesting to a machine learning practitioner, not all of it is directly relevant. Therefore, it is important to narrow the scope to the aspects of probability that can directly help a practitioner.

One approach might be to review the topics in probability and select those that might be helpful or relevant.

Wikipedia has many good overview articles on the field that could be used as a starting point. For example:

Probability Wikipedia Articles

Another source of topics might be those covered by top textbooks on probability written for advanced undergraduates and graduate students.

For example:

Probability Textbooks

This is a good start, but it is challenging: how can the wealth of interesting topics be filtered down to those most relevant to applied machine learning?

The risk of this approach is that too much time would be spent learning probability, developing too broad a foundation in the field (i.e., the approach is inefficient).

An approach that I prefer is to review the coverage of the field of probability by top machine learning books.

The authors of these books are experts in machine learning and have used this expertise to filter the field of probability down to the points most salient to machine learning.

Probability Covered in Machine Learning Books

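There are many excellent machine learning textbooks, but in this post, we will review some of the more popular books that you may own or have access to, so that you can consult the relevant sections. They are:

  • “Machine Learning” by Tom Mitchell
  • “Pattern Recognition and Machine Learning” by Christopher Bishop
  • “Data Mining: Practical Machine Learning Tools and Techniques” by Witten and Frank, et al.
  • “Machine Learning: A Probabilistic Perspective” by Kevin Murphy
  • “Deep Learning” by Ian Goodfellow, et al.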

Let’s take a closer look at each in turn.

“Machine Learning”

“Machine Learning” is Tom Mitchell’s seminal 1997 book that defined the field for many practitioners and for the books that followed.

Probability is the focus of the following chapter of this book:

  • Chapter 6: Bayesian Learning

This chapter is dedicated to Bayesian methods relevant to machine learning, including:

  • Section 6.2. Bayes Theorem
  • Section 6.3. Bayes Theorem and Concept Learning
  • Section 6.4. Maximum Likelihood and Least-squares Error Hypothesis
  • Section 6.5. Maximum Likelihood Hypothesis for Predicting Probabilities
  • Section 6.6. Minimum Description Length Principle
  • Section 6.7. Bayes Optimal Classifier
  • Section 6.8. Gibbs Algorithm
  • Section 6.9. Naive Bayes Classifier
  • Section 6.10. An Example: Learning to Classify Text
  • Section 6.11. Bayesian Belief Networks
  • Section 6.12. The EM Algorithm

“Pattern Recognition and Machine Learning”

“Pattern Recognition and Machine Learning” is Christopher Bishop’s masterpiece on machine learning, building on and broadening his prior book, Neural Networks for Pattern Recognition.

It is very likely the book used by many modern practitioners who came out of a graduate program in machine learning.

Probability is the focus of the following chapters of this book:

  • Chapter 1: Introduction
    • Section 1.2. Probability Theory
    • Section 1.5. Decision Theory
    • Section 1.6. Information Theory
  • Chapter 2: Probability Distributions

Chapter 2 is dedicated to probability distributions and sets up density estimation, covering the following topics:

  • 2.1. Binary Variables
  • 2.2. Multinomial Variables
  • 2.3. Gaussian Distribution
  • 2.4. The Exponential Family
  • 2.5. Nonparametric Methods

“Data Mining: Practical Machine Learning Tools and Techniques”

“Data Mining: Practical Machine Learning Tools and Techniques” by Witten and Frank (and others) has had many editions and, because of its practical nature and the Weka platform, has been many practitioners’ entry point into the field.

Probability is the focus of the following section and chapter of this book:

  • Section 4.2: Simple Probabilistic Modeling
  • Chapter 9: Probabilistic Methods

Section 4.2 provides an introduction, but Chapter 9 goes into depth and covers the following topics:

  • 9.1. Foundations
  • 9.2. Bayesian Networks
  • 9.3. Clustering and Point Density Estimation
  • 9.4. Hidden Variable Models
  • 9.5. Bayesian Estimation and Prediction
  • 9.6. Graphical Models and Factor Graphs
  • 9.7. Conditional Probability Models
  • 9.8. Sequential and Temporal Models
  • 9.9. Further Reading and Bibliographic Notes
  • 9.10. Weka Implementations

“Machine Learning: A Probabilistic Perspective”

“Machine Learning: A Probabilistic Perspective” by Kevin Murphy from 2013 is a textbook that focuses on teaching machine learning through the lens of probability.

Probability is the focus of the following chapters of this book:

  • Chapter 2: Probability
  • Chapter 5: Bayesian Statistics
  • Chapter 6: Frequentist Statistics

Chapters 5 and 6 focus on machine learning methods built on Bayesian and frequentist statistics, with an emphasis on estimating distributions.

Chapter 2 focuses more on the required foundations of probability, including the following sections:

  • Section 2.1. Introduction
  • Section 2.2. A brief review of probability theory
  • Section 2.3. Some common discrete distributions
  • Section 2.4. Some common continuous distributions
  • Section 2.5. Joint probability distributions
  • Section 2.6. Transforms of random variables
  • Section 2.7. Monte Carlo approximation
  • Section 2.8. Information theory

“Deep Learning”

“Deep Learning” is the seminal 2016 textbook by Ian Goodfellow et al. on the emerging field of deep learning.

Part I of this book is titled “Applied Math and Machine Learning Basics” and covers a range of important foundational topics required to become productive with deep learning neural networks, including probability.

Probability is the focus of the following chapter of this book:

  • Chapter 3: Probability and Information Theory

This chapter is divided into the following subsections:

  • 3.1. Why Probability?
  • 3.2. Random Variables
  • 3.3. Probability Distributions
  • 3.4. Marginal Probability
  • 3.5. Conditional Probability
  • 3.6. The Chain Rule of Conditional Probabilities
  • 3.7. Independence and Conditional Independence
  • 3.8. Expectation, Variance and Covariance
  • 3.9. Common Probability Distributions
  • 3.10. Useful Properties of Common Functions
  • 3.11. Bayes’ Rule
  • 3.12. Technical Details of Continuous Variables
  • 3.13. Information Theory
  • 3.14. Structured Probabilistic Models

Foundation Probability vs. Machine Learning With Probability

Reviewing the chapters and sections covered in the top machine learning books, it is clear that there are two main aspects to probability in machine learning.

There are the foundational topics that a practitioner should be familiar with in order to be effective at machine learning generally. We might call this “probability theory for machine learning.”

Then there are machine learning methods that are explicitly constructed from tools and techniques from the field of probability. We might call this “probabilistic methods for machine learning.”

The division is not clean, as there is a lot of overlap, but it is a useful basis for organizing the topics.

Foundation Probability Topics

These are the topics covered in books like “Deep Learning.” They are also the basis for cheat sheets and refreshers for machine learning courses like the “Probabilities and Statistics refresher” from Stanford.

Some of the topics in probability theory for machine learning might include: probability axioms, probability distributions, probability moments, Bayes theorem, joint, marginal and conditional probability, etc.
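To make some of this vocabulary concrete, here is a minimal sketch using only the Python standard library. It computes marginal and conditional probabilities from a small joint distribution over two binary variables and checks that Bayes’ theorem recovers the same conditional. The variables and numbers are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution P(A, B) over two binary variables,
# e.g. A = "has condition", B = "test is positive" (illustrative numbers only).
joint = {
    (True, True): Fraction(8, 100),
    (True, False): Fraction(2, 100),
    (False, True): Fraction(9, 100),
    (False, False): Fraction(81, 100),
}

def marginal_a(a):
    """P(A=a): sum the joint probability over all values of B."""
    return sum(p for (x, _), p in joint.items() if x == a)

def marginal_b(b):
    """P(B=b): sum the joint probability over all values of A."""
    return sum(p for (_, y), p in joint.items() if y == b)

def conditional_a_given_b(a, b):
    """P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
    return joint[(a, b)] / marginal_b(b)

def conditional_b_given_a(b, a):
    """P(B=b | A=a) = P(A=a, B=b) / P(A=a)."""
    return joint[(a, b)] / marginal_a(a)

def bayes_a_given_b(a, b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return conditional_b_given_a(b, a) * marginal_a(a) / marginal_b(b)

# Probability axioms: every entry is non-negative and the total is one.
assert all(p >= 0 for p in joint.values())
assert sum(joint.values()) == 1

print(marginal_a(True))                   # 1/10
print(conditional_a_given_b(True, True))  # 8/17, read directly from the joint
print(bayes_a_given_b(True, True))        # 8/17, the same value via Bayes' theorem
```

Exact fractions are used here so that the two routes to P(A | B) compare equal without floating-point rounding.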

This might also include more advanced and related topics such as: likelihood functions, maximum likelihood estimation, entropy from information theory, Monte Carlo methods and Gibbs sampling for approximating distributions, and parameter estimation.
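Several of these topics fit in a few lines of code. The sketch below assumes only the Python standard library. It computes closed-form maximum likelihood estimates for a Gaussian, the Shannon entropy of a discrete distribution, and a simple Monte Carlo estimate of an expectation. The data is synthetic, and the function names are our own illustrations rather than any library’s API.

```python
import math
import random

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of the data under a Gaussian N(mu, sigma^2)."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

def gaussian_mle(data):
    """Closed-form maximum likelihood estimates for a Gaussian: the sample
    mean and the square root of the biased (divide-by-n) sample variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)

def entropy_bits(probs):
    """Shannon entropy H(p) = -sum p * log2(p), in bits; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def monte_carlo_mean(f, sampler, n=100_000):
    """Approximate E[f(X)] by averaging f over n independent draws of X."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(1)
sample = [random.gauss(5.0, 2.0) for _ in range(1000)]  # synthetic data
mu_hat, sigma_hat = gaussian_mle(sample)
print(mu_hat, sigma_hat)  # close to the true values 5.0 and 2.0

# The MLE should score higher than nearby parameter values.
best = gaussian_log_likelihood(sample, mu_hat, sigma_hat)
print(best > gaussian_log_likelihood(sample, mu_hat + 0.1, sigma_hat))  # True

print(entropy_bits([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy_bits([0.9, 0.1]))  # biased coin: about 0.47 bits

# E[X^2] for X ~ Uniform(0, 1) is exactly 1/3.
mc_estimate = monte_carlo_mean(lambda x: x * x, random.random)
print(mc_estimate)  # close to 0.333
```

The Gaussian is used because its maximum likelihood estimates have a closed form; for most other models the likelihood has to be maximized numerically.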

Machine Learning With Probability

These are the topics covered in the later chapters of “Machine Learning: A Probabilistic Perspective.”

Some topics in probabilistic methods for machine learning might include: density estimation, kernel density estimation, divergence estimation, etc.
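As one example from this side of the split, the following is a minimal kernel density estimation sketch in plain Python. It assumes a Gaussian kernel and a common normal-reference rule of thumb for the bandwidth (h = 1.06 * std * n^(-1/5)). Both are illustrative choices, and this rule tends to oversmooth multimodal data. The bimodal sample is synthetic.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(data, bandwidth):
    """Return f(x) = 1/(n*h) * sum_i K((x - x_i) / h): an average of kernels
    centred on each observation."""
    n = len(data)
    def density(x):
        return sum(gaussian_kernel((x - xi) / bandwidth) for xi in data) / (n * bandwidth)
    return density

random.seed(0)
# Synthetic bimodal sample: a 50/50 mixture of two Gaussians.
sample = ([random.gauss(-2.0, 0.5) for _ in range(500)]
          + [random.gauss(3.0, 1.0) for _ in range(500)])

# Normal-reference rule of thumb for the bandwidth (one choice among many).
n = len(sample)
mean = sum(sample) / n
std = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
h = 1.06 * std * n ** (-1 / 5)
density = kde(sample, h)

for x in (-2, 0.5, 3):
    print(x, round(density(x), 3))  # high near the modes, low in between

# Riemann-sum check that the estimate integrates to roughly one.
step = 0.05
area = sum(density(-15 + i * step) * step for i in range(600))
print(round(area, 2))
```

No parametric form is assumed for the distribution, which is what distinguishes this nonparametric approach from fitting a single Gaussian by maximum likelihood.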

This would also include techniques such as Naive Bayes and graphical models such as Bayesian belief networks.
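To show how Naive Bayes builds directly on Bayes’ theorem, here is a minimal sketch of a categorical Naive Bayes classifier using only the Python standard library. It assumes Laplace (add-alpha) smoothing and works in log space. The helper names and the toy weather data are hypothetical, and this is not the implementation used by any particular library.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies.
    rows: list of tuples of categorical feature values; labels: class per row."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = number of rows of class c whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen for each feature j
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, feature_counts, feature_values, alpha

def predict(model, row):
    """Return the class maximising log P(c) + sum_j log P(x_j | c), i.e. Bayes'
    theorem with the 'naive' assumption that features are conditionally
    independent given the class."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(row):
            k = len(feature_values[j])
            # Smoothed likelihood P(x_j = v | c); unseen values get alpha / (n_c + alpha*k).
            score += math.log((feature_counts[c][j][v] + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy data (hypothetical): (outlook, windy) -> decision.
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "yes"), ("overcast", "no"), ("sunny", "no")]
labels = ["play", "play", "stay", "stay", "play", "play"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "yes")))  # stay
print(predict(model, ("sunny", "no")))   # play
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow when there are many features.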

What do you think?
What topics would you place on either side of this split?

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

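Machine Learning Books

  • “Machine Learning” by Tom Mitchell
  • “Pattern Recognition and Machine Learning” by Christopher Bishop
  • “Data Mining: Practical Machine Learning Tools and Techniques” by Witten and Frank, et al.
  • “Machine Learning: A Probabilistic Perspective” by Kevin Murphy
  • “Deep Learning” by Ian Goodfellow, et al.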

Probability Textbooks



Summary

In this post, you discovered some of the key resources that you can use to learn about the parts of probability required for machine learning.

Specifically, you learned:

  • References that you can use to discover topics on probability.
  • Books, chapters, and sections that cover probability in the context of machine learning.
  • A division between foundational probability topics and machine learning methods that leverage probability.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
