The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and […]
Search results for "Model Risk"
How to Create Custom Data Transforms for Scikit-Learn
The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often performed manually on the data, requiring […]
8 Top Books on Data Cleaning and Feature Engineering
Data preparation is the transformation of raw data into a form that is more appropriate for modeling. It is a challenging topic to discuss as the data differs in form, type, and structure from project to project. Nevertheless, there are common data preparation tasks across projects. It is a huge field of study and goes […]
Feature Engineering and Selection (Book Review)
Data preparation is the process of transforming raw data into a form suitable for learning algorithms. In some cases, data preparation is a required step in order to provide the data to an algorithm in its required input format. In other cases, the most appropriate representation of the input data is not known and must be explored in a […]
A Gentle Introduction to Degrees of Freedom in Machine Learning
Degrees of freedom is an important concept from statistics and engineering. It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic or in a statistical hypothesis test. In machine learning, the degrees of freedom may refer to the number of parameters in the […]
Standard Machine Learning Datasets for Imbalanced Classification
An imbalanced classification problem is a problem that involves predicting a class label where the distribution of class labels in the training dataset is skewed. Many real-world classification problems have an imbalanced class distribution; therefore, it is important for machine learning practitioners to become familiar with these types of problems. In this tutorial, […]
A Gentle Introduction to Markov Chain Monte Carlo for Probability
Probabilistic inference involves estimating an expected value or density using a probabilistic model. Often, directly inferring values is not tractable with probabilistic models, and instead, approximation methods must be used. Markov Chain Monte Carlo sampling provides a class of algorithms for systematic random sampling from high-dimensional probability distributions. Unlike Monte Carlo sampling methods that are […]
Probability for Machine Learning
Discover how to harness uncertainty with Python. Machine learning does not make sense without probability. What is probability? …it’s about handling uncertainty. Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, […]
What Is Probability?
Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, and risk. Probability is a field of mathematics that gives us the language and tools to quantify the uncertainty of events and reason in a principled […]
Resources for Getting Started With Probability in Machine Learning
Machine Learning is a field of computer science concerned with developing systems that can learn from data. Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty. Many aspects of machine learning are uncertain, including, most critically, observations from the problem […]