Archive | Statistics

A Gentle Introduction to Statistical Sampling and Resampling

Data is the currency of applied machine learning, so it is important that it is both collected and used effectively. Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter, whereas data resampling refers to methods for economically using a collected dataset to improve the […]

How to Calculate Critical Values for Statistical Hypothesis Testing with Python

It is common, if not standard, to interpret the results of statistical hypothesis tests using a p-value. Not all implementations of statistical tests return p-values. In some cases, you must use alternatives, such as critical values. In addition, critical values are used when estimating the expected intervals for observations from a population, such as in […]
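
As a rough illustration of what a critical value is, the sketch below looks up the cutoff of a standard Gaussian test statistic for a chosen significance level, using only Python's standard library. The function name `gaussian_critical_value` and the choice of a Gaussian distribution are assumptions made for this example, not taken from the post itself.

```python
from statistics import NormalDist


def gaussian_critical_value(alpha, two_sided=True):
    """Return the standard Gaussian critical value for significance level alpha.

    Hypothetical helper for illustration: a test statistic beyond this value
    (in absolute terms, for a two-sided test) would be judged significant.
    """
    # A two-sided test splits alpha evenly between the two tails.
    tail = alpha / 2 if two_sided else alpha
    # inv_cdf is the percent point function (inverse of the CDF).
    return NormalDist().inv_cdf(1 - tail)


two_sided_value = gaussian_critical_value(0.05)
one_sided_value = gaussian_critical_value(0.05, two_sided=False)
print(f"two-sided critical value: {two_sided_value:.3f}")
print(f"one-sided critical value: {one_sided_value:.3f}")
```

Other test statistics, such as Student's t or chi-squared, follow the same pattern but need the inverse CDF of their own distribution.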

[Figure: Line Plot of the Chi-Squared Probability Density Function]

A Gentle Introduction to Statistical Data Distributions

A sample of data will form a distribution, and by far the most well-known distribution is the Gaussian distribution, often called the Normal distribution. The distribution provides a parameterized mathematical function that can be used to calculate the probability for any individual observation from the sample space. This distribution describes the grouping or the density […]

[Figure: Scatter Plot of Dataset With Linear Model and Prediction Interval]

Prediction Intervals for Machine Learning

A prediction from a machine learning perspective is a single point that hides the uncertainty of that prediction. Prediction intervals provide a way to quantify and communicate the uncertainty in a prediction. They differ from confidence intervals, which instead seek to quantify the uncertainty in a population parameter such as a mean or standard […]

Confidence Intervals for Machine Learning

Much of machine learning involves estimating the performance of a machine learning algorithm on unseen data. Confidence intervals are a way of quantifying the uncertainty of an estimate. They can be used to add bounds or a likelihood on a population parameter, such as a mean, estimated from a sample of independent observations from the […]
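
One common way to put bounds on an estimated model skill is the Gaussian approximation to a binomial proportion, sketched below for classification accuracy. The function name `accuracy_interval` and the example figures (88% accuracy on 50 test examples) are assumptions chosen for illustration, and the approximation is only reasonable when the test set is not too small.

```python
from math import sqrt
from statistics import NormalDist


def accuracy_interval(accuracy, n, confidence=0.95):
    """Approximate confidence interval for a classification accuracy.

    Hypothetical helper: uses the Gaussian approximation
    accuracy +/- z * sqrt(accuracy * (1 - accuracy) / n),
    where n is the number of independent test examples.
    """
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(accuracy * (1 - accuracy) / n)
    # Accuracy is a proportion, so clip the bounds to [0, 1].
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


lower, upper = accuracy_interval(0.88, 50)
print(f"95% interval: {lower:.3f} to {upper:.3f}")
```

A larger test set shrinks the interval, since the half width falls with the square root of `n`.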

A Gentle Introduction to the Bootstrap Method

The bootstrap method is a resampling technique used to estimate statistics on a population by sampling a dataset with replacement. It can be used to estimate summary statistics such as the mean or standard deviation. It is used in applied machine learning to estimate the skill of machine learning models when making predictions on data […]
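
The idea of sampling a dataset with replacement can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the post's own code: the function name `bootstrap_means`, the toy dataset, the number of resamples, and the fixed seed are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=1):
    """Return the mean of each bootstrap resample of data.

    Hypothetical helper: every resample has the same size as the
    original dataset and is drawn with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # choices() samples with replacement, so values can repeat.
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    return means


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]  # toy dataset (assumed)
means = bootstrap_means(data)
estimate = statistics.fmean(means)      # bootstrap estimate of the mean
std_error = statistics.stdev(means)     # spread of the resampled means
print(f"estimated mean: {estimate:.3f}, standard error: {std_error:.3f}")
```

The spread of the resampled means gives a sense of how much the estimate would vary across different samples of the same size.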
