## Statistics for Evaluating Machine Learning Models

Tom Mitchell’s classic 1997 book “Machine Learning” provides a chapter dedicated to statistical methods for evaluating machine learning models. Statistics provides an important set of tools used at each step of a machine learning project. A practitioner cannot effectively evaluate the skill of a machine learning model without using statistical methods. Unfortunately, statistics is an […]

## What is Statistics (and why is it important in machine learning)?

Statistics is a collection of tools that you can use to get answers to important questions about data. You can use descriptive statistical methods to transform raw observations into information that you can understand and share. You can use inferential statistical methods to reason from small samples of data to whole domains. In this post, […]
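To make the descriptive/inferential distinction concrete, here is a minimal sketch using only Python's standard library. The sample values are hypothetical. The descriptive part summarizes the observations we have. The inferential part estimates the population mean with a 95% confidence interval, assuming roughly normal data and using the t critical value for 9 degrees of freedom (about 2.262).

```python
import math
import statistics

# Hypothetical sample of 10 observations (illustrative values only).
sample = [4.1, 5.3, 4.8, 5.0, 6.2, 5.5, 4.9, 5.7, 5.1, 4.6]

# Descriptive statistics: summarize the observations we actually have.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: reason from the sample to the population mean.
# Assumes approximately normal data; 2.262 is the two-sided 95% t value for df = 9.
t_crit = 2.262
stderr = stdev / math.sqrt(len(sample))
ci_low, ci_high = mean - t_crit * stderr, mean + t_crit * stderr

print(f"mean={mean:.3f} median={median:.3f} stdev={stdev:.3f}")
print(f"95% CI for the population mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

With a larger sample, the standard error shrinks and the interval narrows, which is the sense in which more data supports stronger claims about the whole domain.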

## A Gentle Introduction to Estimation Statistics for Machine Learning

Statistical hypothesis tests can be used to indicate whether the difference between two samples is due to random chance, but cannot comment on the size of the difference. A group of methods referred to as “new statistics” are seeing increased use instead of or in addition to p-values in order to quantify the magnitude of […]
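As one illustration of quantifying the size of a difference rather than only its significance, the sketch below computes Cohen's d, a standard effect-size measure: the difference in means divided by the pooled standard deviation. The scores are hypothetical and are treated as two independent samples for simplicity; this is an example of estimation statistics, not a description of the specific methods the full post covers.

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_sd = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical scores from two models, treated as independent samples.
model_a = [0.81, 0.83, 0.80, 0.84, 0.82]
model_b = [0.78, 0.80, 0.77, 0.79, 0.81]

d = cohens_d(model_a, model_b)
print(f"Cohen's d = {d:.2f}")
```

A p-value would only say whether the 0.03 gap in mean score is unlikely under the null hypothesis; d expresses that gap in units of the data's spread.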

## Statistics Books for Machine Learning

Statistical methods are used at each step in an applied machine learning project. This means it is important to have a strong grasp of the key findings from statistics and a working knowledge of relevant statistical methods. Unfortunately, statistics is not covered in many computer science and software engineering degree programs. Even […]

## Crash Course in Statistics for Machine Learning

You do not need to know statistics before you can start learning and applying machine learning. You can start today. Nevertheless, knowing some statistics can be very helpful for understanding the language used in machine learning. Knowing some statistics will eventually be required when you want to start making strong claims about your results. In […]

## How to Create Custom Data Transforms for Scikit-Learn

The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often applied to the data manually, requiring […]
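A minimal sketch of wrapping one such cleaning step as a reusable scikit-learn transform, assuming "few observations" is interpreted as "few distinct values". The class name `DropLowCardinalityColumns` and its `min_unique` parameter are hypothetical; the sketch relies only on scikit-learn's `BaseEstimator` and `TransformerMixin`, which supply `get_params` and `fit_transform`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class DropLowCardinalityColumns(BaseEstimator, TransformerMixin):
    """Hypothetical transform: drop columns with fewer than `min_unique` distinct values."""

    def __init__(self, min_unique=2):
        self.min_unique = min_unique

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Count distinct values per column using the training data only.
        n_unique = np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])
        self.keep_ = n_unique >= self.min_unique
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Apply the column mask learned in fit().
        return X[:, self.keep_]

X = np.array([[1.0, 7.0, 0.5],
              [2.0, 7.0, 0.1],
              [3.0, 7.0, 0.9]])
Xt = DropLowCardinalityColumns(min_unique=2).fit_transform(X)
print(Xt.shape)  # (3, 2): the constant middle column is removed
```

Because the column mask is learned in `fit()`, the same columns are dropped from test data, so the step can sit inside a `Pipeline` instead of being applied by hand.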

## Framework for Data Preparation Techniques in Machine Learning

There is a vast number of data preparation techniques that could be used on a predictive modeling project. In some cases, the distribution of the data or the requirements of a machine learning model may suggest the data preparation needed, although this is rarely the case given the complexity and high dimensionality of […]

## 4 Automatic Outlier Detection Algorithms in Python

The presence of outliers in a classification or regression dataset can result in a poor fit and lower predictive modeling performance. Identifying and removing outliers is challenging with simple statistical methods for most machine learning datasets given the large number of input variables. Instead, automatic outlier detection methods can be used in the modeling pipeline […]
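As an illustration of automatic outlier removal, here is a sketch using one such method, scikit-learn's `IsolationForest`, on synthetic data. The data, the `contamination=0.05` setting, and the choice of this particular detector are assumptions for the example. `fit_predict` returns `1` for inliers and `-1` for outliers; the detector is applied to training data only, so the test set is never filtered.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Synthetic training data: 200 rows, 3 inputs; the first 5 rows are shifted far away.
X = rng.normal(0.0, 1.0, size=(200, 3))
X[:5] += 10.0
y = X.sum(axis=1) + rng.normal(0.0, 0.1, size=200)

# Flag roughly 5% of the training rows as outliers (an assumed setting).
detector = IsolationForest(contamination=0.05, random_state=1)
mask = detector.fit_predict(X) == 1  # True for inliers

# Keep inputs and targets aligned when removing rows.
X_clean, y_clean = X[mask], y[mask]
print(X.shape, "->", X_clean.shape)
```

Removing rows from both `X` and `y` with the same mask keeps each input paired with its target; a model would then be fit on `X_clean, y_clean`.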

## Data Preparation for Machine Learning (7-Day Mini-Course)

Data Preparation for Machine Learning Crash Course. Get on top of data preparation with Python in 7 days. Data preparation involves transforming raw data into a form that is more appropriate for modeling. Preparing data may be the most important part of a predictive modeling project and the most time-consuming, although it seems to be […]