Archive | Imbalanced Classification

How to Calibrate Probabilities for Imbalanced Classification

How to Calibrate Probabilities for Imbalanced Classification

Many machine learning models are capable of predicting a probability or probability-like scores for class membership. Probabilities provide a required level of granularity for evaluating and comparing models, especially on imbalanced classification problems where tools like ROC Curves are used to interpret predictions and the ROC AUC metric is used to compare model performance, both […]

Continue Reading 4
A Gentle Introduction to the Fbeta-Measure for Machine Learning

A Gentle Introduction to the Fbeta-Measure for Machine Learning

Fbeta-measure is a configurable single-score metric for evaluating a binary classification model based on the predictions made for the positive class. The Fbeta-measure is calculated using precision and recall. Precision is a metric that calculates the percentage of correct predictions for the positive class. Recall calculates the percentage of correct predictions for the positive class […]

Continue Reading 0
Histogram of Each Variable in the Oil Spill Dataset

How to Develop an Imbalanced Classification Model to Detect Oil Spills

Many imbalanced classification tasks require a skillful model that predicts a crisp class label, where both classes are equally important. An example of an imbalanced classification problem where a class label is required and both classes are equally important is the detection of oil spills or slicks in satellite images. The detection of a spill […]

Continue Reading 14
Box and Whisker Plot of Probabilistic Models on the Haberman Breast Cancer Survival Dataset

How to Develop a Probabilistic Model of Breast Cancer Patient Survival

Developing a probabilistic model is challenging in general, although it is made more so when there is skew in the distribution of cases, referred to as an imbalanced dataset. The Haberman Dataset describes the five year or greater survival of breast cancer patient patients in the 1950s and 1960s and mostly contains patients that survive. […]

Continue Reading 8
Scatter Plots of an Imbalanced Classification Dataset With Different Numbers of Clusters

Why Is Imbalanced Classification Difficult?

Imbalanced classification is primarily challenging as a predictive modeling task because of the severely skewed class distribution. This is the cause for poor performance with traditional machine learning models and evaluation metrics that assume a balanced class distribution. Nevertheless, there are additional properties of a classification dataset that are not only challenging for predictive modeling […]

Continue Reading 8
A Gentle Introduction to Threshold-Moving for Imbalanced Classification

A Gentle Introduction to Threshold-Moving for Imbalanced Classification

Classification predictive modeling typically involves predicting a class label. Nevertheless, many machine learning algorithms are capable of predicting a probability or scoring of class membership, and this must be interpreted before it can be mapped to a crisp class label. This is achieved by using a threshold, such as 0.5, where all values equal or […]

Continue Reading 14
Cost-Sensitive Learning for Imbalanced Classification

Cost-Sensitive Learning for Imbalanced Classification

Most machine learning algorithms assume that all misclassification errors made by a model are equal. This is often not the case for imbalanced classification problems where missing a positive or minority class case is worse than incorrectly classifying an example from the negative or majority class. There are many real-world examples, such as detecting spam […]

Continue Reading 6