Author Archive | Jason Brownlee

How to Generate Random Numbers in Python

The use of randomness is an important part of the configuration and evaluation of machine learning algorithms. From the random initialization of weights in an artificial neural network, to the splitting of data into random train and test sets, to the random shuffling of a training dataset in stochastic gradient descent, generating random numbers and […]
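
As a minimal sketch of the ideas in this excerpt, the example below uses Python's standard-library `random` module to seed a generator, draw a few values, and shuffle and split a toy dataset. The helper `shuffle_split` and the 70/30 split are illustrative assumptions, not code from the full post.

```python
import random

# Seeding the module-level generator makes the sequence of draws repeatable.
random.seed(1)
first = [random.random() for _ in range(3)]  # uniform floats in [0.0, 1.0)
random.seed(1)
second = [random.random() for _ in range(3)]
print(first == second)  # True: same seed, same sequence

# Other common draws from the same generator.
print(random.randint(0, 10))   # integer in [0, 10], both ends inclusive
print(random.gauss(0.0, 1.0))  # Gaussian with mean 0.0 and standard deviation 1.0


def shuffle_split(data, train_fraction=0.7, seed=None):
    """Hypothetical helper: shuffle a copy of `data` and split it into train and test lists."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


train, test = shuffle_split(range(10), train_fraction=0.7, seed=42)
print(train, test)
```

Using a separate `random.Random` instance for the split is a design choice: the split stays reproducible without resetting the global generator that other code may rely on.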

Statistics for Evaluating Machine Learning Models

Tom Mitchell’s classic 1997 book “Machine Learning” provides a chapter dedicated to statistical methods for evaluating machine learning models. Statistics provides an important set of tools used at each step of a machine learning project. A practitioner cannot effectively evaluate the skill of a machine learning model without using statistical methods. Unfortunately, statistics is an […]
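
One standard statistical tool for model evaluation, and one that Mitchell's chapter discusses, is a confidence interval for a classifier's true error rate based on its error on an independent test set. A minimal sketch of the normal-approximation interval follows; the function name, the 1.96 multiplier for roughly 95% confidence, and the example counts are illustrative assumptions.

```python
import math


def error_confidence_interval(errors, n, z=1.96):
    """Normal-approximation interval for the true error rate.

    `errors` is the number of misclassified examples out of `n` test examples.
    The interval is sample_error +/- z * sqrt(sample_error * (1 - sample_error) / n);
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    sample_error = errors / n
    half_width = z * math.sqrt(sample_error * (1.0 - sample_error) / n)
    return sample_error - half_width, sample_error + half_width


# Example: 12 mistakes on 40 held-out examples.
low, high = error_confidence_interval(12, 40)
print(f"sample error {12 / 40:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

The approximation assumes the test examples are drawn independently of each other and of the training data, and it becomes unreliable when the test set is small or the error rate is close to 0 or 1.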

The Close Relationship Between Applied Statistics and Machine Learning

Machine learning practitioners have a tradition rooted in algorithms and a pragmatic focus on results and model skill above other concerns, such as model interpretability. Statisticians work on much the same type of modeling problems under the names of applied statistics and statistical learning. Coming from a mathematical background, they have more of a focus […]

Controlled Experiments in Machine Learning

Systematic experimentation is a key part of applied machine learning. Machine learning methods are complex enough to resist formal analysis, so we must learn empirically how algorithms behave on our specific problems. We do this using controlled experiments. In this tutorial, you will discover the important role that controlled experiments play […]
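
A controlled experiment changes one independent variable while holding everything else fixed, and repeats each configuration to average out random variation. The sketch below assumes a hypothetical `run_trial` function standing in for a full train-and-evaluate run; here it only estimates a population mean from a random sample, with sample size as the independent variable and absolute error as the dependent variable.

```python
import random
import statistics


def run_trial(sample_size, seed):
    """Hypothetical stand-in for one train-and-evaluate run.

    Estimates the mean of a Gaussian population (true mean 5.0, sd 2.0) from
    `sample_size` draws and returns the absolute error of the estimate.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(5.0, 2.0) for _ in range(sample_size)]
    return abs(statistics.mean(sample) - 5.0)


def controlled_experiment(levels, seeds):
    """Vary one factor across `levels`, keep everything else fixed, and repeat
    each level with the same seeds. Returns {level: (mean score, std dev)}."""
    results = {}
    for level in levels:
        scores = [run_trial(level, seed) for seed in seeds]
        results[level] = (statistics.mean(scores), statistics.stdev(scores))
    return results


results = controlled_experiment(levels=[10, 100, 1000], seeds=range(30))
for level, (mean_err, sd_err) in results.items():
    print(f"n={level:5d}  mean abs error={mean_err:.4f}  sd={sd_err:.4f}")
```

Reusing the same seeds for every level gives each level the same random streams, so differences between levels are less likely to be artifacts of luck.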

Statistical Significance Tests for Comparing Machine Learning Algorithms

Comparing machine learning methods and selecting a final model is a common operation in applied machine learning. Models are commonly evaluated using resampling methods like k-fold cross-validation, from which mean skill scores are calculated and compared directly. Although simple, this approach can be misleading, as it is hard to know whether the difference between mean […]
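
To make the idea of testing a difference in mean scores concrete, the sketch below applies an exact paired sign-flip permutation test to per-fold scores from two models evaluated on the same folds. This is a swapped-in technique chosen because it needs only the standard library; it is not necessarily one of the tests the full post covers, and the scores are made-up numbers for illustration only.

```python
import itertools
import statistics


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired per-fold scores.

    Under the null hypothesis that both models perform equally well, each
    per-fold difference is as likely to be positive as negative. The test
    enumerates every assignment of signs and returns the fraction whose mean
    difference is at least as extreme as the observed one (the p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.mean(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = abs(statistics.mean([s * d for s, d in zip(signs, diffs)]))
        if flipped >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


# Made-up per-fold accuracies for two models on the same 10 folds (illustration only).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.79, 0.81, 0.78, 0.79]
p_value = paired_permutation_test(model_a, model_b)
print(f"p-value: {p_value:.4f}")
```

Enumerating all 2^k sign assignments is only practical for a small number of folds. Cross-validation folds also share training data, so per-fold scores are not fully independent, and the resulting p-value should be read with that caveat in mind.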

A Gentle Introduction to Statistical Sampling and Resampling

Data is the currency of applied machine learning. Therefore, it is important that it is both collected and used effectively. Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter. Data resampling, by contrast, refers to methods for economically using a collected dataset to improve the […]
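
The bootstrap is one widely used resampling method: it draws many samples with replacement from a single collected dataset to estimate how much a statistic would vary. A minimal percentile-bootstrap sketch follows; the function name, its defaults, and the synthetic Gaussian data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, statistic=statistics.mean, n_resamples=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for `statistic`.

    Resamples `data` with replacement `n_resamples` times, recomputes the
    statistic on each resample, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of those values.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower_idx = round((alpha / 2) * (n_resamples - 1))
    upper_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return estimates[lower_idx], estimates[upper_idx]


# Synthetic sample of 40 observations, generated here only for the example.
data_rng = random.Random(7)
sample = [data_rng.gauss(50.0, 5.0) for _ in range(40)]
low, high = bootstrap_ci(sample)
print(f"mean {statistics.mean(sample):.2f}, 95% bootstrap interval [{low:.2f}, {high:.2f}]")
```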
