Archive | Statistics

Hypothesis Test for Comparing Machine Learning Algorithms

By Jason Brownlee on September 1, 2020 in Statistics 43

Machine learning models are chosen based on their mean performance, often calculated using k-fold cross-validation. The algorithm with the best mean performance is expected to be better than those algorithms with worse mean performance. But what if the difference in the mean performance is caused by a statistical fluke? The solution is to use a […]

A Gentle Introduction to Degrees of Freedom in Machine Learning

By Jason Brownlee on August 19, 2020 in Statistics 14

Degrees of freedom is an important concept from statistics and engineering. It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic or in a statistical hypothesis test. In machine learning, the degrees of freedom may refer to the number of parameters in the […]

Arithmetic, Geometric, and Harmonic Means for Machine Learning

By Jason Brownlee on August 19, 2020 in Statistics 27

Calculating the average of a variable or a list of numbers is a common operation in machine learning. It is an operation you may use every day either directly, such as when summarizing data, or indirectly, such as a smaller step in a larger procedure when fitting a model. The average is a synonym for […]

17 Statistical Hypothesis Tests in Python (Cheat Sheet)

By Jason Brownlee on November 7, 2021 in Statistics 99

Quick-reference guide to the 17 statistical hypothesis tests that you need in applied machine learning, with sample code in Python. Although there are hundreds of statistical hypothesis tests that you could use, there is only a small subset that you may need to use in a machine learning project. In this post, you will discover […]

Statistics for Machine Learning (7-Day Mini-Course)

By Jason Brownlee on August 8, 2019 in Statistics 328

Statistics for Machine Learning Crash Course. Get on top of the statistics used in machine learning in 7 days. Statistics is a field of mathematics that is universally agreed to be a prerequisite for a deeper understanding of machine learning. Although statistics is a large field with many esoteric theories and findings, the nuts and […]

How to Code the Student’s t-Test from Scratch in Python

By Jason Brownlee on August 8, 2019 in Statistics 52

Perhaps one of the most widely used statistical hypothesis tests is the Student’s t-test. Because you may use this test yourself someday, it is important to have a deep understanding of how the test works. As a developer, this understanding is best achieved by implementing the hypothesis test yourself from scratch. In this tutorial, […]

How to Calculate McNemar’s Test to Compare Two Machine Learning Classifiers

By Jason Brownlee on August 8, 2019 in Statistics 99

The choice of a statistical hypothesis test is a challenging open problem for interpreting machine learning results. In his widely cited 1998 paper, Thomas Dietterich recommended McNemar’s test in those cases where it is expensive or impractical to train multiple copies of classifier models. This describes the current situation with deep learning models that […]

The Role of Randomization to Address Confounding Variables in Machine Learning

By Jason Brownlee on July 31, 2020 in Statistics 8

A large part of applied machine learning is about running controlled experiments to discover which algorithm or algorithm configuration to use on a predictive modeling problem. A challenge is that there are aspects of the problem and the algorithm, called confounding variables, that cannot be controlled (held constant) and must instead be controlled for. An example is […]

All of Statistics for Machine Learning

By Jason Brownlee on August 8, 2019 in Statistics 12

A foundation in statistics is required to be effective as a machine learning practitioner. The book “All of Statistics” was written specifically to provide a foundation in probability and statistics for computer science undergraduates who may have an interest in data mining and machine learning. As such, it is often recommended as a book to […]

A Gentle Introduction to Statistical Power and Power Analysis in Python

By Jason Brownlee on April 24, 2020 in Statistics 70

The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true effect present to detect. Power can be calculated and reported for a completed experiment to comment on the confidence one might have in the conclusions drawn from the results of the study. It can also be […]
