Author Archive | Jason Brownlee

A Gentle Introduction to Critical Values for Statistical Hypothesis Testing

How to Calculate Critical Values for Statistical Hypothesis Testing with Python

By Jason Brownlee on September 24, 2019 in Statistics 13

It is common, if not standard, to interpret the results of statistical hypothesis tests using a p-value. Not all implementations of statistical tests return p-values. In some cases, you must use alternatives, such as critical values. In addition, critical values are used when estimating the expected intervals for observations from a population, such as in […]
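As a rough illustration of the idea in this excerpt, the sketch below calculates a critical value for a standard Gaussian distribution using only the Python standard library. The function name `gaussian_critical_value` and the choice of a 5% significance level are assumptions made for this example, not code from the tutorial itself.

```python
from statistics import NormalDist

def gaussian_critical_value(alpha, two_sided=True):
    """Critical value of a standard Gaussian for significance level alpha.

    Hypothetical helper written for this illustration.
    """
    # A two-sided test splits alpha evenly between the two tails.
    tail = alpha / 2.0 if two_sided else alpha
    # The inverse CDF (percent point function) maps a probability to a value.
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - tail)

# Assumed significance level of 5% for a two-sided test.
critical = gaussian_critical_value(0.05)
print(f"critical value: {critical:.3f}")
```

A test statistic whose absolute value is greater than or equal to this critical value would lead you to reject the null hypothesis at the chosen significance level.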

Line Plot of the Chi-Squared Probability Density Function

A Gentle Introduction to Statistical Data Distributions

By Jason Brownlee on August 8, 2019 in Statistics 12

A sample of data will form a distribution, and by far the most well-known distribution is the Gaussian distribution, often called the Normal distribution. The distribution provides a parameterized mathematical function that can be used to calculate the probability for any individual observation from the sample space. This distribution describes the grouping or the density […]

A Gentle Introduction to Data Visualization Methods in Python

By Jason Brownlee on August 23, 2019 in Statistics 18

Sometimes data does not make sense until you can look at it in a visual form, such as with charts and plots. Being able to quickly visualize your data samples for yourself and others is an important skill both in applied statistics and in applied machine learning. In this tutorial, you will discover the five types […]

A Gentle Introduction to Estimation Statistics for Machine Learning

By Jason Brownlee on August 8, 2019 in Statistics 4

Statistical hypothesis tests can be used to indicate whether the difference between two samples is due to random chance, but cannot comment on the size of the difference. A group of methods referred to as “new statistics” are seeing increased use instead of or in addition to p-values in order to quantify the magnitude of […]

Error Bar Plot of Tolerance Interval vs Sample Size

A Gentle Introduction to Statistical Tolerance Intervals in Machine Learning

By Jason Brownlee on August 8, 2019 in Statistics 20

It can be useful to have an upper and lower limit on data. These bounds can be used to help identify anomalies and set expectations for future observations. A bound on observations from a population is called a tolerance interval. A tolerance interval comes from the field of estimation statistics. A tolerance interval is […]

Scatter Plot of Dataset With Linear Model and Prediction Interval

Prediction Intervals for Machine Learning

By Jason Brownlee on February 17, 2021 in Statistics 70

A prediction from a machine learning perspective is a single point that hides the uncertainty of that prediction. Prediction intervals provide a way to quantify and communicate the uncertainty in a prediction. They are different from confidence intervals, which instead seek to quantify the uncertainty in a population parameter such as a mean or standard […]

Confidence Intervals for Machine Learning

By Jason Brownlee on August 8, 2019 in Statistics 68

Much of machine learning involves estimating the performance of a machine learning algorithm on unseen data. Confidence intervals are a way of quantifying the uncertainty of an estimate. They can be used to add bounds or a likelihood on a population parameter, such as a mean, estimated from a sample of independent observations from the […]

A Gentle Introduction to the Bootstrap Method

By Jason Brownlee on August 8, 2019 in Statistics 106

The bootstrap method is a resampling technique used to estimate statistics on a population by sampling a dataset with replacement. It can be used to estimate summary statistics such as the mean or standard deviation. It is used in applied machine learning to estimate the skill of machine learning models when making predictions on data […]
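The resampling idea described in this excerpt can be sketched in a few lines. The example below is a minimal illustration only: the data values are made up, and the helper names `bootstrap_means` and `percentile_interval` are hypothetical. It draws samples with replacement, records the mean of each resample, and then takes a simple percentile interval over those means.

```python
import random
from statistics import mean

def bootstrap_means(data, n_resamples=1000, seed=1):
    """Return the mean of each bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n = len(data)
    return [mean(rng.choices(data, k=n)) for _ in range(n_resamples)]

def percentile_interval(stats, alpha=0.05):
    """Simple percentile interval over a list of bootstrap statistics."""
    ordered = sorted(stats)
    lower_idx = int((alpha / 2.0) * len(ordered))
    upper_idx = int((1.0 - alpha / 2.0) * len(ordered)) - 1
    return ordered[lower_idx], ordered[upper_idx]

# Made-up sample, used only for illustration.
data = [2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.4, 5.0, 5.3, 6.2]
means = bootstrap_means(data)
lower, upper = percentile_interval(means)
print(f"sample mean: {mean(data):.2f}")
print(f"95% percentile interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

The percentile interval is one of several ways to summarize bootstrap statistics. It was chosen here because it is the simplest to read.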

A Gentle Introduction to k-fold Cross-Validation

By Jason Brownlee on October 4, 2023 in Statistics 301

Cross-validation is a statistical method used to estimate the skill of machine learning models. It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem because it is easy to understand, easy to implement, and results in skill estimates that generally have a lower bias than […]
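To make the procedure concrete, here is a minimal sketch of how a dataset could be split into k folds, using only the standard library. The generator name `k_fold_indices` is an assumption for this example. In practice you would more likely use a library implementation such as scikit-learn's `KFold` class.

```python
import random

def k_fold_indices(n_samples, k, seed=1):
    """Yield (train, test) index lists for each of k folds.

    Hypothetical helper written for this illustration.
    """
    indices = list(range(n_samples))
    # Shuffle once so each fold is a random subset of the data.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Every fold other than the held-out one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print(f"train: {sorted(train)}, test: {sorted(test)}")
```

Each observation is held out for testing exactly once and used for training in the other k-1 folds.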

How to Transform Data to Better Fit The Normal Distribution

By Jason Brownlee on August 8, 2019 in Statistics 62

A large portion of the field of statistics is concerned with methods that assume a Gaussian distribution: the familiar bell curve. If your data has a Gaussian distribution, the parametric methods are powerful and well understood. This gives some incentive to use them if possible, even if your data does not have a Gaussian distribution. […]
