Search results for "Model Risk"

Nested Cross-Validation for Machine Learning with Python

By Jason Brownlee on November 20, 2021 in Python Machine Learning 160

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and […]

How to Create Custom Data Transforms for Scikit-Learn

By Jason Brownlee on July 19, 2020 in Data Preparation 24

The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often performed manually on the data, requiring […]

8 Top Books on Data Cleaning and Feature Engineering

By Jason Brownlee on June 30, 2020 in Data Preparation 21

Data preparation is the transformation of raw data into a form that is more appropriate for modeling. It is a challenging topic to discuss as the data differs in form, type, and structure from project to project. Nevertheless, there are common data preparation tasks across projects. It is a huge field of study and goes […]

Feature Engineering and Selection (Book Review)

By Jason Brownlee on June 30, 2020 in Data Preparation 20

Data preparation is the process of transforming raw data into a form suitable for learning algorithms. In some cases, data preparation is a required step in order to provide the data to an algorithm in its required input format. In other cases, the most appropriate representation of the input data is not known and must be explored in a […]

A Gentle Introduction to Degrees of Freedom in Machine Learning

By Jason Brownlee on August 19, 2020 in Statistics 14

Degrees of freedom is an important concept from statistics and engineering. It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic or in a statistical hypothesis test. In machine learning, the degrees of freedom may refer to the number of parameters in the […]

Standard Machine Learning Datasets for Imbalanced Classification

By Jason Brownlee on January 5, 2021 in Imbalanced Classification 14

An imbalanced classification problem is one that involves predicting a class label where the distribution of class labels in the training dataset is skewed. Many real-world classification problems have an imbalanced class distribution; therefore, it is important for machine learning practitioners to become familiar with these types of problems. In this tutorial, […]

A Gentle Introduction to Markov Chain Monte Carlo for Probability

By Jason Brownlee on September 25, 2019 in Probability 15

Probabilistic inference involves estimating an expected value or density using a probabilistic model. Often, directly inferring values is not tractable with probabilistic models, and instead, approximation methods must be used. Markov Chain Monte Carlo sampling provides a class of algorithms for systematic random sampling from high-dimensional probability distributions. Unlike Monte Carlo sampling methods that are […]

Probability for Machine Learning

By Jason Brownlee on September 8, 2023

Probability for Machine Learning: Discover How To Harness Uncertainty With Python. Machine Learning DOES NOT MAKE SENSE Without Probability. What is Probability? …it’s about handling uncertainty. Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, […]

What Is Probability?

By Jason Brownlee on November 29, 2019 in Probability 4

Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, and risk. Probability is a field of mathematics that gives us the language and tools to quantify the uncertainty of events and reason in a principled […]

Resources for Getting Started With Probability in Machine Learning

By Jason Brownlee on September 25, 2019 in Probability 16

Machine Learning is a field of computer science concerned with developing systems that can learn from data. Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty. Many aspects of machine learning are uncertain, including, most critically, observations from the problem […]
