Machine learning model selection and configuration may be the biggest challenge in applied machine learning. Controlled experiments must be performed in order to discover what works best for a given classification or regression predictive modeling task. This can feel overwhelming given the large number of data preparation schemes, learning algorithms, and model hyperparameters that could […]

# Search results for "Value At Risk"

## A Gentle Introduction to Computational Learning Theory

Computational learning theory, or statistical learning theory, refers to mathematical frameworks for quantifying learning tasks and algorithms. These are sub-fields of machine learning that a machine learning practitioner does not need to know in great depth in order to achieve good results on a wide range of problems. Nevertheless, they are sub-fields where having […]

## Nested Cross-Validation for Machine Learning with Python

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and […]

## How to Create Custom Data Transforms for Scikit-Learn

The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often performed manually on the data, requiring […]

## 8 Top Books on Data Cleaning and Feature Engineering

Data preparation is the transformation of raw data into a form that is more appropriate for modeling. It is a challenging topic to discuss as the data differs in form, type, and structure from project to project. Nevertheless, there are common data preparation tasks across projects. It is a huge field of study and goes […]

## Feature Engineering and Selection (Book Review)

Data preparation is the process of transforming raw data into a form suitable for learning algorithms. In some cases, data preparation is a required step in order to provide the data to an algorithm in its required input format. In other cases, the most appropriate representation of the input data is not known and must be explored in a […]

## A Gentle Introduction to Degrees of Freedom in Machine Learning

Degrees of freedom is an important concept from statistics and engineering. It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic or in a statistical hypothesis test. In machine learning, the degrees of freedom may refer to the number of parameters in the […]

## Standard Machine Learning Datasets for Imbalanced Classification

An imbalanced classification problem is a problem that involves predicting a class label where the distribution of class labels in the training dataset is skewed. Many real-world classification problems have an imbalanced class distribution; therefore, it is important for machine learning practitioners to get familiar with working with these types of problems. In this tutorial, […]

## A Gentle Introduction to Model Selection for Machine Learning

Given easy-to-use machine learning libraries like scikit-learn and Keras, it is straightforward to fit many different machine learning models on a given predictive modeling dataset. The challenge of applied machine learning, therefore, becomes how to choose among a range of different models that you can use for your problem. Naively, you might believe that model […]

## A Gentle Introduction to Markov Chain Monte Carlo for Probability

Probabilistic inference involves estimating an expected value or density using a probabilistic model. Often, directly inferring values is not tractable with probabilistic models, and instead, approximation methods must be used. Markov Chain Monte Carlo sampling provides a class of algorithms for systematic random sampling from high-dimensional probability distributions. Unlike Monte Carlo sampling methods that are […]