The Role of Randomization to Address Confounding Variables in Machine Learning

A large part of applied machine learning is about running controlled experiments to discover what algorithm or algorithm configuration to use on a predictive modeling problem. A challenge is that some aspects of the problem and the algorithm, called confounding variables, cannot be controlled (held constant) and must instead be controlled for. An example is […]

All of Statistics for Machine Learning

A foundation in statistics is required to be effective as a machine learning practitioner. The book “All of Statistics” was written specifically to provide a foundation in probability and statistics for computer science undergraduates who may have an interest in data mining and machine learning. As such, it is often recommended as a book to […]

A Gentle Introduction to Effect Size Measures in Python

Statistical hypothesis tests report on the likelihood of the observed results given an assumption, such as no association between variables or no difference between groups. Hypothesis tests do not comment on the size of the effect if the association or difference is statistically significant. This highlights the need for standard ways of calculating and reporting […]
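
As a rough illustration of what an effect size measure looks like, the sketch below computes Cohen's d, one common standardized measure of the difference between two group means. The function name `cohens_d` and the sample values are assumptions made for this example and are not taken from the post itself.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized difference between two group means (Cohen's d).

    Uses the pooled sample standard deviation, which assumes the two
    groups have roughly similar variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores from two model configurations.
scores_a = [2.0, 4.0, 6.0]
scores_b = [1.0, 3.0, 5.0]
d = cohens_d(scores_a, scores_b)
print(f"Cohen's d: {d:.3f}")
```

The value is expressed in units of the pooled standard deviation, so it can be reported alongside a p-value to describe how large a difference is, not only whether it is statistically significant.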

How to Generate Random Numbers in Python

The use of randomness is an important part of the configuration and evaluation of machine learning algorithms. From the random initialization of weights in an artificial neural network, to the splitting of data into random train and test sets, to the random shuffling of a training dataset in stochastic gradient descent, generating random numbers and […]
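
One of the uses mentioned above, splitting data into random train and test sets, can be sketched with Python's standard `random` module. The helper name `split_train_test`, the 25% test fraction, and the seed value are assumptions chosen for this illustration only.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=1):
    """Shuffle rows with a seeded generator and split into train/test lists."""
    # A dedicated Random instance keeps the split reproducible without
    # touching the module-level generator used elsewhere in a program.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(20))
print(len(train), len(test))
```

Fixing the seed makes the split repeatable from run to run, while changing it gives a different random partition of the same data.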

Statistics for Evaluating Machine Learning Models

Tom Mitchell’s classic 1997 book “Machine Learning” provides a chapter dedicated to statistical methods for evaluating machine learning models. Statistics provides an important set of tools used at each step of a machine learning project. A practitioner cannot effectively evaluate the skill of a machine learning model without using statistical methods. Unfortunately, statistics is an […]

The Close Relationship Between Applied Statistics and Machine Learning

The machine learning practitioner has a tradition of algorithms and a pragmatic focus on results and model skill above other concerns such as model interpretability. Statisticians work on much the same type of modeling problems under the names of applied statistics and statistical learning. Coming from a mathematical background, they have more of a focus […]
