Archive | Statistics

A Gentle Introduction to the Bootstrap Method

The bootstrap method is a resampling technique used to estimate statistics on a population by sampling a dataset with replacement. It can be used to estimate summary statistics such as the mean or standard deviation. It is used in applied machine learning to estimate the skill of machine learning models when making predictions on data […]
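
Here is a minimal sketch of the idea in plain Python. It assumes the statistic of interest is the mean and uses a made-up sample of ten values. Each bootstrap resample is drawn with replacement and has the same size as the original dataset, and the spread of the resampled means gives a percentile confidence interval. The function names `bootstrap_means` and `percentile_interval` and the choice of 1,000 resamples are illustrative assumptions, not taken from the post.

```python
import random
import statistics


def bootstrap_means(data, n_iterations=1000, seed=1):
    """Return the mean of each of `n_iterations` bootstrap resamples of `data`.

    Every resample has the same size as `data` and is drawn with replacement,
    so some observations appear several times and others not at all.
    """
    rng = random.Random(seed)
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_iterations)]


def percentile_interval(estimates, alpha=0.05):
    """Percentile bootstrap confidence interval (95% when alpha=0.05)."""
    ordered = sorted(estimates)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper


if __name__ == "__main__":
    sample = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # made-up data
    means = bootstrap_means(sample)
    print("bootstrap estimate of the mean:", round(statistics.mean(means), 3))
    print("95% percentile interval:", percentile_interval(means))
```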

A Gentle Introduction to k-fold Cross-Validation

Cross-validation is a statistical method used to estimate the skill of machine learning models. It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem because it is easy to understand, easy to implement, and results in skill estimates that generally have a lower bias than […]
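
The following sketch makes the procedure concrete using only the Python standard library. The "model" is a hypothetical baseline that always predicts the mean of its training targets, and skill is measured as mean absolute error. Both choices, and the helper names, are assumptions for illustration. Each of the k folds is held out once as the test set while the model is fit on the other k-1 folds, and the k scores are then averaged into a single skill estimate.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=1):
    """Shuffle the row indices once, then deal them into k folds whose
    sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(y, k=5, seed=1):
    """Return one mean-absolute-error score per fold for a hypothetical baseline
    model that always predicts the mean of its training targets."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, which is held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        prediction = statistics.mean(y[j] for j in train_idx)  # "fit" the baseline
        scores.append(statistics.mean(abs(y[j] - prediction) for j in test_idx))
    return scores


if __name__ == "__main__":
    y = [float(v) for v in range(20)]  # made-up targets
    scores = cross_validate(y, k=5)
    print("per-fold MAE:", [round(s, 2) for s in scores])
    print("estimated skill (mean MAE):", round(statistics.mean(scores), 2))
```

Shuffling before splitting guards against any ordering in the data, and every row is used for testing exactly once.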

How to Calculate Nonparametric Statistical Hypothesis Tests in Python

In applied machine learning, we often need to determine whether two data samples have the same or different distributions. We can answer this question using statistical significance tests, which quantify how likely the observed samples would be if they were drawn from the same distribution. If the data does not have the familiar Gaussian distribution, we must resort to nonparametric […]
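
As an illustration, not necessarily the approach the post takes, the Mann-Whitney U test is one common nonparametric test for whether two independent samples come from the same distribution. The sketch below computes U from the pooled ranks and a two-sided p-value using the normal approximation. It omits tie and continuity corrections, so treat it as an approximation for moderately large samples. The function names and the made-up samples are assumptions. In practice, a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math
import random


def rank_with_ties(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation
    (no tie or continuity correction). Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


if __name__ == "__main__":
    # Made-up samples: the second is shifted upward by 1.5.
    rng = random.Random(1)
    a = [rng.gauss(0, 1) for _ in range(50)]
    b = [rng.gauss(1.5, 1) for _ in range(50)]
    u, p = mann_whitney_u(a, b)
    print(f"U={u:.1f}, p={p:.4f}")
    print("different distributions (reject H0)" if p <= 0.05
          else "no evidence of a difference (fail to reject H0)")
```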

A Gentle Introduction to Normality Tests in Python

An important decision point when working with a sample of data is whether to use parametric or nonparametric statistical methods. Parametric statistical methods assume that the data has a known and specific distribution, often a Gaussian distribution. If a data sample is not Gaussian, then the assumptions of parametric statistical tests are violated and nonparametric […]
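
One simple way to check whether a sample looks Gaussian is the Jarque-Bera test. It compares the sample skewness and kurtosis with the values 0 and 3 expected for a Gaussian. It is used here only because it fits in a few lines of standard-library Python. It is a swapped-in example rather than a test the post is confirmed to use, and its chi-squared approximation is reliable only for large samples. The sample sizes, seed, and 0.05 significance level are assumptions.

```python
import math
import random


def jarque_bera(data):
    """Jarque-Bera normality test based on sample skewness and kurtosis.

    Under the null hypothesis that the data are Gaussian, the statistic
    approximately follows a chi-squared distribution with 2 degrees of freedom,
    whose survival function is exp(-x / 2). Returns (statistic, p-value).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5  # about 0 for Gaussian data
    kurtosis = m4 / m2 ** 2    # about 3 for Gaussian data
    stat = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return stat, math.exp(-stat / 2)


if __name__ == "__main__":
    rng = random.Random(1)
    gaussian = [rng.gauss(0, 1) for _ in range(1000)]
    skewed = [rng.expovariate(1.0) for _ in range(1000)]
    for name, sample in [("gaussian", gaussian), ("exponential", skewed)]:
        stat, p = jarque_bera(sample)
        verdict = ("consistent with Gaussian (fail to reject H0)" if p > 0.05
                   else "not Gaussian (reject H0)")
        print(f"{name}: JB={stat:.2f}, p={p:.3f} -> {verdict}")
```

A low p-value suggests a nonparametric method may be the safer choice. A high p-value does not prove the data is Gaussian, only that this test found no evidence against it.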

Statistics Books for Machine Learning

Statistical methods are used at each step of an applied machine learning project. This means it is important to have a strong grasp of the fundamentals and key findings of statistics and a working knowledge of the relevant statistical methods. Unfortunately, statistics is not covered in many computer science and software engineering degree programs. Even […]