Machine Learning is a field of computer science concerned with developing systems that can learn from data.

Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty.

Many aspects of machine learning are uncertain, including, most critically, observations from the problem domain and the relationships learned by models from that data. As such, some understanding of probability, and of the tools and methods used in the field, is required for a machine learning practitioner to be effective. Perhaps not initially, but certainly in the long run.

In this post, you will discover some of the key resources that you can use to learn about the parts of probability required for machine learning.

After reading this post, you will know:

- References that you can use to discover topics on probability.
- Books, chapters, and sections that cover probability in the context of machine learning.
- A division between foundational probability topics and machine learning methods that leverage probability.

Let’s get started.

## Overview

This tutorial is divided into three parts; they are:

- Topics in Probability for Machine Learning
- Probability Covered in Machine Learning Books
- Foundation Probability vs. Machine Learning With Probability

## Topics in Probability for Machine Learning

Probability is a large field of mathematics with many fascinating findings and useful tools.

Although much of the field of probability may be interesting to a machine learning practitioner, not all of it is directly relevant. Therefore, it is important to narrow the scope of the field of probability to the aspects that can directly help a practitioner.

One approach might be to review the topics in probability and select those that might be helpful or relevant.

Wikipedia has many good overview articles on the field that could be used as a starting point. For example:

### Probability Wikipedia Articles

- Probability, Wikipedia.
- Probability theory, Wikipedia.
- List of probability topics, Wikipedia.
- Catalog of articles in probability theory, Wikipedia.
- Notation in probability and statistics, Wikipedia.

Another source of topics might be those covered by top textbooks on probability written for advanced undergraduates and graduate students.

For example:

### Probability Textbooks

- Probability Theory: The Logic of Science, 2003.
- Introduction to Probability, Second Edition, 2019.
- Introduction to Probability, Second Edition, 2008.

This is a good start, but it raises a challenge: how can the wealth of interesting topics be effectively filtered down to those most relevant to applied machine learning?

The risk of this approach is that too much time would be spent learning probability and developing too broad a foundation in the field (i.e. it is inefficient).

An approach that I prefer is to review the coverage of the field of probability by top machine learning books.

The authors of these books are experts in the field of machine learning and have used this expertise to filter the field of probability down to the points most salient to machine learning.

## Probability Covered in Machine Learning Books

There are many excellent machine learning textbooks, but in this post, we will review some of the more popular books that you may own or have access to, so that you can reference the relevant sections. They are:

- Machine Learning, 1997.
- Pattern Recognition and Machine Learning, 2006.
- Data Mining: Practical Machine Learning Tools and Techniques, 4th edition, 2016.
- Machine Learning: A Probabilistic Perspective, 2012.
- Deep Learning, 2016.

Let’s take a closer look at each in turn.

### “Machine Learning”

“Machine Learning” is Tom Mitchell’s seminal 1997 book that defined the field for many practitioners and books that followed.

Probability is the focus of the following chapters of this book:

- Chapter 6: Bayesian Learning

This chapter is dedicated to Bayesian methods relevant to machine learning, including:

- Section 6.2. Bayes Theorem
- Section 6.3. Bayes Theorem and Concept Learning
- Section 6.4. Maximum Likelihood and Least-squares Error Hypothesis
- Section 6.5. Maximum Likelihood Hypothesis for Predicting Probabilities
- Section 6.6. Minimum Description Length Principle
- Section 6.7. Bayes Optimal Classifier
- Section 6.8. Gibbs Algorithm
- Section 6.9. Naive Bayes Classifier
- Section 6.10. An Example: Learning to Classify Text
- Section 6.11. Bayesian Belief Networks
- Section 6.12. The EM Algorithm

### “Pattern Recognition and Machine Learning”

“Pattern Recognition and Machine Learning” is Christopher Bishop’s masterpiece on machine learning, building on and broadening his prior book, Neural Networks for Pattern Recognition.

It is very likely the book used by many modern practitioners who came out of a graduate degree program on machine learning.

Probability is the focus of the following chapters of this book:

- Chapter 1: Introduction
  - Section 1.2. Probability Theory
  - Section 1.5. Decision Theory
  - Section 1.6. Information Theory
- Chapter 2: Probability Distributions

The second chapter is dedicated to probability distributions and sets up density estimation, covering the following topics:

- 2.1. Binary Variables
- 2.2. Multinomial Variables
- 2.3. Gaussian Distribution
- 2.4. The Exponential Family
- 2.5. Nonparametric Methods

### “Data Mining: Practical Machine Learning Tools and Techniques”

“Data Mining: Practical Machine Learning Tools and Techniques” by Witten and Frank (and others) has had many editions, and because of its practical nature and the Weka platform, has been many practitioners’ entry point into the field.

Probability is the focus of the following chapters and sections of this book:

- Section 4.2: Simple Probabilistic Modeling
- Chapter 9: Probabilistic Methods

Section 4.2 provides an introduction, but Chapter 9 goes into depth and covers the following topics:

- 9.1. Foundations
- 9.2. Bayesian Networks
- 9.3. Clustering and Probability Density Estimation
- 9.4. Hidden Variable Models
- 9.5. Bayesian Estimation and Prediction
- 9.6. Graphical Models and Factor Graphs
- 9.7. Conditional Probability Models
- 9.8. Sequential and Temporal Models
- 9.9. Further Reading and Bibliographic Notes
- 9.10. Weka Implementations

### “Machine Learning: A Probabilistic Perspective”

“Machine Learning: A Probabilistic Perspective” by Kevin Murphy from 2012 is a textbook that focuses on teaching machine learning through the lens of probability.

Probability is the focus of the following chapters of this book:

- Chapter 2: Probability
- Chapter 5: Bayesian Statistics
- Chapter 6: Frequentist Statistics

Chapters 5 and 6 focus on machine learning methods that build on Bayesian and frequentist statistics, e.g. distribution estimation.

Chapter 2 is more focused on the required foundations in probability, including the subsections:

- Section 2.1. Introduction
- Section 2.2. A brief review of probability theory
- Section 2.3. Some common discrete distributions
- Section 2.4. Some common continuous distributions
- Section 2.5. Joint probability distributions
- Section 2.6. Transforms of random variables
- Section 2.7. Monte Carlo approximation
- Section 2.8. Information theory

### “Deep Learning”

“Deep Learning” is the seminal 2016 textbook by Ian Goodfellow et al. on the emerging field of deep learning.

Part I of this book is titled “*Applied Math and Machine Learning Basics*” and covers a range of important foundation topics required to become productive with deep learning neural networks, including probability.

Probability is the focus of the following chapters of this book:

- Chapter 3: Probability and Information Theory

This chapter is divided into the following subsections:

- 3.1. Why Probability?
- 3.2. Random Variables
- 3.3. Probability Distributions
- 3.4. Marginal Probability
- 3.5. Conditional Probability
- 3.6. The Chain Rule of Conditional Probabilities
- 3.7. Independence and Conditional Independence
- 3.8. Expectation, Variance and Covariance
- 3.9. Common Probability Distributions
- 3.10. Useful Properties of Common Functions
- 3.11. Bayes’ Rule
- 3.12. Technical Details of Continuous Variables
- 3.13. Information Theory
- 3.14. Structured Probabilistic Models

## Foundation Probability vs. Machine Learning With Probability

Reviewing the chapters and sections covered in the top machine learning books, it is clear that there are two main aspects to probability in machine learning.

There are the foundational topics that a practitioner should be familiar with in order to be effective at machine learning generally. We might call this “*probability theory for machine learning*.”

Then there are machine learning methods that are explicitly constructed from tools and techniques from the field of probability. We might call this “*probabilistic methods for machine learning*.”

The division is not a clean one, as there is a lot of overlap, but it is a useful way to organize the topics.

### Foundation Probability Topics

These are the topics covered in books like “*Deep Learning*.” They are also the basis for cheat sheets and refreshers for machine learning courses like the “Probabilities and Statistics refresher” from Stanford.

Some of the topics in probability theory for machine learning might include: probability axioms, probability distributions, moments of distributions, Bayes’ theorem, joint, marginal and conditional probability, etc.
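To make a few of these foundational ideas concrete, here is a minimal sketch that computes marginal and conditional probabilities from a small joint distribution and then applies Bayes’ theorem. The variables (weather and traffic), the probabilities, and the `marginal` helper are all made up for illustration; they do not come from any of the books above.

```python
# Hypothetical joint distribution P(weather, traffic); the numbers are made up.
joint = {
    ("rain", "heavy"): 0.24,
    ("rain", "light"): 0.06,
    ("sun", "heavy"): 0.21,
    ("sun", "light"): 0.49,
}


def marginal(joint, index):
    """Sum the joint over the other variable to get a marginal distribution."""
    out = {}
    for outcome, p in joint.items():
        key = outcome[index]
        out[key] = out.get(key, 0.0) + p
    return out


p_weather = marginal(joint, 0)  # P(weather)
p_traffic = marginal(joint, 1)  # P(traffic)

# Conditional probability: P(heavy | rain) = P(rain, heavy) / P(rain)
p_heavy_given_rain = joint[("rain", "heavy")] / p_weather["rain"]

# Bayes' theorem: P(rain | heavy) = P(heavy | rain) * P(rain) / P(heavy)
p_rain_given_heavy = p_heavy_given_rain * p_weather["rain"] / p_traffic["heavy"]

print("P(weather):", p_weather)
print("P(traffic):", p_traffic)
print("P(heavy | rain):", round(p_heavy_given_rain, 3))
print("P(rain | heavy):", round(p_rain_given_heavy, 3))
```

The joint table is the only input; every other quantity is derived from it, which is the relationship between joint, marginal, and conditional probability in miniature.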

This might also include more advanced and related topics such as: likelihood functions, maximum likelihood estimation, entropy from information theory, Monte Carlo and Gibbs Sampling for distributions, and parameter estimation.
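As a rough illustration of three of these topics, the sketch below estimates a coin’s bias by maximum likelihood, computes the entropy of a discrete distribution, and approximates an expectation with Monte Carlo sampling. The true bias of 0.7, the sample sizes, and the function names are assumptions chosen for the example.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated coin flips from a coin with a hypothetical true bias of 0.7.
true_p = 0.7
flips = [1 if random.random() < true_p else 0 for _ in range(10_000)]

# Maximum likelihood estimate of a Bernoulli parameter: the sample mean.
p_hat = sum(flips) / len(flips)


def bernoulli_log_likelihood(p, data):
    """Log-likelihood of i.i.d. Bernoulli data under parameter p (0 < p < 1)."""
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(p) + tails * math.log(1 - p)


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Monte Carlo approximation of E[X^2] for X ~ Uniform(0, 1); the exact value is 1/3.
samples = [random.random() ** 2 for _ in range(100_000)]
mc_estimate = sum(samples) / len(samples)

print("MLE of the coin bias:", p_hat)
print("Log-likelihood at the MLE:", bernoulli_log_likelihood(p_hat, flips))
print("Entropy of a fair coin (bits):", entropy_bits([0.5, 0.5]))
print("Monte Carlo estimate of E[X^2]:", mc_estimate)
```

The log-likelihood is highest at `p_hat`, which is what makes the sample mean the maximum likelihood estimate for this model.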

### Machine Learning With Probability

These are the topics covered in the later chapters of “*Machine Learning: A Probabilistic Perspective*.”

Some topics in probabilistic methods for machine learning might include: density estimation, kernel density estimation, divergence estimation, etc.
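Kernel density estimation, for instance, can be sketched in a few lines. The example below assumes a Gaussian kernel and a hand-picked bandwidth of 0.5; the `gaussian_kde` function and the data are hypothetical, and in practice a library implementation with automatic bandwidth selection would normally be used.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density function that averages a Gaussian kernel over the data."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Made-up one-dimensional sample with two clusters, near 1 and near 4.
data = [1.0, 1.2, 0.8, 3.9, 4.1, 4.0]
pdf = gaussian_kde(data, bandwidth=0.5)

for x in (1.0, 2.5, 4.0):
    print(f"estimated density at {x}: {pdf(x):.4f}")
```

The estimated density is high near each cluster and low in the gap between them, which is the behavior a density estimator is expected to recover from the sample.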

This would also include techniques such as Naive Bayes and graphical models such as Bayesian belief networks.
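As a small illustration of a method built directly on probability, here is a sketch of a Naive Bayes text classifier. It assumes a multinomial word model with add-one (Laplace) smoothing, and the tiny training set and the `predict` function are invented for the example rather than taken from any of the books.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set: (words in the message, label).
train = [
    (["win", "cash", "now"], "spam"),
    (["cheap", "cash", "offer"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]

label_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
for words, label in train:
    word_counts[label].update(words)
vocab = {w for words, _ in train for w in words}


def predict(words):
    """Return the label with the highest log posterior, up to a shared constant."""
    scores = {}
    for label, n in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / len(train))  # log prior P(label)
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen word/label pairs from zeroing the product.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


print(predict(["cash", "offer"]))
print(predict(["lunch", "noon"]))
```

The “naive” part is the assumption that words are conditionally independent given the label, which is what lets the log probabilities simply be added together.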

**What do you think?**

What topics would you place on either side of this split?

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Machine Learning Books

- Machine Learning, 1997.
- Pattern Recognition and Machine Learning, 2006.
- Data Mining: Practical Machine Learning Tools and Techniques, 4th edition, 2016.
- Machine Learning: A Probabilistic Perspective, 2012.
- Deep Learning, 2016.

### Probability Textbooks

- Probability Theory: The Logic of Science, 2003.
- Introduction to Probability, 2nd edition, 2019.
- Introduction to Probability, 2nd edition, 2008.

### Articles

- Probability, Wikipedia.
- Probability theory, Wikipedia.
- List of probability topics, Wikipedia.
- Catalog of articles in probability theory, Wikipedia.
- Notation in probability and statistics, Wikipedia.
- Probabilities and Statistics refresher, Stanford.

## Summary

In this post, you discovered some of the key resources that you can use to learn about the parts of probability required for machine learning.

Specifically, you learned:

- References that you can use to discover topics on probability.
- Books, chapters, and sections that cover probability in the context of machine learning.
- A division between foundational probability topics and machine learning methods that leverage probability.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

Nice Article.

This post is really very informative and knowledgeable.

Thanks for sharing this amazing post.

I am glad to have found your fantastic blog.

Keep Blogging !!

Thanks for your support John!

Would you love to write a book on probability?

Thanks, I am in fact!

Are there specific topics that you’d love for me to cover?

Great! What about Bayesian statistics?

Great suggestion!

I think it might be a whole different topic.

Nevertheless, resources on the topic I like:

– Probabilistic Graphical Models: Principles and Techniques https://amzn.to/324l0tT

– Bayesian Reasoning and Machine Learning https://amzn.to/2YoHbgV

– Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference https://amzn.to/2Khk3bq

I’d like to add Betancourt’s: https://betanalpha.github.io/assets/case_studies/probability_theory.html

Thanks for sharing!

This is very resourceful. Thank you.

Thanks, I’m glad it’s helpful.

Thanks for this information it was well structured ! I only wished that you included some free references. Anyway, thanks a lot !

Thanks.

I hope to offer 30+ blog tutorials on the topic over the next few months.

Thanks a lot for the information. I see, unlike many people, you insist on knowing probability.

I don’t insist, but I recommend it at some point.