Archive | Imbalanced Classification

Cost-Sensitive Logistic Regression for Imbalanced Classification

Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the logistic regression model must be modified to take the skewed distribution into account. This can be achieved by specifying a class weighting configuration that influences how much the logistic regression coefficients are updated during training. The […]
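As a minimal sketch of this idea, assuming scikit-learn is available: the `class_weight` argument of `LogisticRegression` accepts either an explicit `{class: weight}` mapping or the string `"balanced"`. The synthetic 1:100 dataset, the 1:100 weighting, and the ROC AUC comparison below are illustrative choices, not settings taken from the article.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Illustrative dataset (an assumption): two features, about 1 positive per 100 negatives.
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.99],
                           flip_y=0, random_state=2)
print(Counter(y))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1)

models = {
    # Unweighted: every training example contributes equally to the loss.
    "plain": LogisticRegression(),
    # Explicit costs: an error on class 1 is weighted 100x an error on class 0.
    "1:100": LogisticRegression(class_weight={0: 1, 1: 100}),
    # "balanced": weight = n_samples / (n_classes * count of that class).
    "balanced": LogisticRegression(class_weight="balanced"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]
    print(f"{name}: ROC AUC = {roc_auc_score(y_test, probs):.3f}")

# The per-class weights that "balanced" derives from the training labels.
balanced_weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_train)
print(balanced_weights)
```

The weights only rescale each example's contribution to the loss; the solver and model form are unchanged, so the minority class simply pulls harder on the coefficients during fitting.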

Tour of Data Sampling Methods for Imbalanced Classification

Machine learning techniques often fail or give misleadingly optimistic performance on classification datasets with an imbalanced class distribution. The reason is that many machine learning algorithms are designed to operate on classification data with an equal number of observations for each class. When this is not the case, algorithms can learn that very few examples […]
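To make the "misleadingly optimistic" point concrete, here is a minimal sketch using hypothetical labels (990 majority and 10 minority examples, an assumed 1:99 split): a degenerate model that always predicts the majority class reaches 99% accuracy while never identifying a single minority example.

```python
import numpy as np

# Hypothetical labels: 990 majority (0) and 10 minority (1) examples.
y_true = np.array([0] * 990 + [1] * 10)

# A degenerate "model" that has learned to always predict the majority class.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
minority_recall = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
print(f"accuracy={accuracy:.3f}, minority recall={minority_recall:.3f}")
# accuracy=0.990, minority recall=0.000
```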

How to Combine Oversampling and Undersampling for Imbalanced Classification

Resampling methods are designed to add or remove examples from the training dataset in order to change the class distribution. Once the class distributions are more balanced, the suite of standard machine learning classification algorithms can be fit successfully on the transformed datasets. Oversampling methods duplicate or create new synthetic examples in the minority class, […]
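A minimal NumPy sketch of the combination, assuming a binary problem: first randomly duplicate minority rows until the minority class is 10% the size of the majority, then randomly delete majority rows until the minority is 50% the size of the majority. The ratios and the helper names `random_oversample` and `random_undersample` are hypothetical; imbalanced-learn's `RandomOverSampler` and `RandomUnderSampler` express the same ratios through a float `sampling_strategy`.

```python
import numpy as np

def random_oversample(X, y, minority, ratio, rng):
    """Hypothetical helper: duplicate random minority rows until n_min / n_maj == ratio."""
    min_idx = np.flatnonzero(y == minority)
    n_target = int(round(ratio * np.sum(y != minority)))
    n_extra = n_target - len(min_idx)
    if n_extra <= 0:
        return X, y
    extra = rng.choice(min_idx, size=n_extra, replace=True)  # sample with replacement
    return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])

def random_undersample(X, y, minority, ratio, rng):
    """Hypothetical helper: delete random majority rows until n_min / n_maj == ratio."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    n_keep = int(round(len(min_idx) / ratio))
    if n_keep >= len(maj_idx):
        return X, y
    kept_maj = rng.choice(maj_idx, size=n_keep, replace=False)  # sample without replacement
    keep = np.sort(np.concatenate([min_idx, kept_maj]))
    return X[keep], y[keep]

rng = np.random.default_rng(1)
X = rng.normal(size=(10000, 2))           # hypothetical features
y = np.array([0] * 9900 + [1] * 100)      # 1:99 imbalance

# Step 1: oversample the minority class up to 10% of the majority.
X_over, y_over = random_oversample(X, y, minority=1, ratio=0.1, rng=rng)
# Step 2: undersample the majority class down to twice the minority.
X_res, y_res = random_undersample(X_over, y_over, minority=1, ratio=0.5, rng=rng)
print(np.bincount(y), np.bincount(y_over), np.bincount(y_res))
```

Doing a modest amount of oversampling first means less majority data has to be discarded afterwards. Either way, resampling should be applied to the training split only, never to the data used for evaluation.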

Undersampling Algorithms for Imbalanced Classification

Resampling methods are designed to change the composition of a training dataset for an imbalanced classification task. Most of the attention in resampling for imbalanced classification goes to oversampling the minority class. Nevertheless, a suite of techniques has been developed for undersampling the majority class that can be used in conjunction with effective […]
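One such technique is the condensed nearest neighbor (CNN) rule, which keeps only the majority examples that a 1-nearest-neighbor classifier needs in order to classify the retained data correctly. The following is a simplified NumPy sketch under stated assumptions: binary labels, Euclidean distance, a store seeded with all minority examples plus one random majority example, and hypothetical Gaussian blobs as data. The function name is hypothetical; imbalanced-learn provides a maintained `CondensedNearestNeighbour` class.

```python
import numpy as np

def condensed_nearest_neighbor(X, y, minority, rng):
    """Simplified CNN undersampling sketch (hypothetical helper, binary labels).

    Every minority example is kept. The store starts as the minority class plus
    one random majority example; a majority example is added whenever its
    nearest neighbor in the current store has a different label. Passes over
    the majority class repeat until a full pass adds nothing.
    """
    in_store = y == minority               # boolean mask of retained rows
    maj_idx = np.flatnonzero(y != minority)
    in_store[rng.choice(maj_idx)] = True   # seed with one random majority example
    changed = True
    while changed:
        changed = False
        for i in rng.permutation(maj_idx):
            if in_store[i]:
                continue
            store_idx = np.flatnonzero(in_store)
            dists = np.sum((X[store_idx] - X[i]) ** 2, axis=1)  # squared Euclidean
            if y[store_idx[np.argmin(dists)]] != y[i]:  # 1-NN misclassifies example i
                in_store[i] = True
                changed = True
    return X[in_store], y[in_store]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(1000, 2)),   # hypothetical majority blob
               rng.normal(3.0, 0.5, size=(20, 2))])    # hypothetical minority blob
y = np.array([0] * 1000 + [1] * 20)
X_cnn, y_cnn = condensed_nearest_neighbor(X, y, minority=1, rng=rng)
print(np.bincount(y), np.bincount(y_cnn))
```

Majority examples far from the minority class are classified correctly by the seed and are dropped, so the retained majority examples tend to lie near the class boundary.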

SMOTE for Imbalanced Classification with Python

Imbalanced classification involves developing predictive models on classification datasets that have a severe class imbalance. The challenge of working with imbalanced datasets is that most machine learning techniques will ignore, and in turn have poor performance on, the minority class, although typically it is performance on the minority class that is most important. One approach […]
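SMOTE (Synthetic Minority Oversampling Technique) creates new minority examples by interpolation rather than duplication: each synthetic example lies on the line segment between a minority example and one of its k nearest minority neighbors. Below is a minimal NumPy sketch of that core step, assuming a small hypothetical minority sample and k=5 (the value commonly used by default). The function name is hypothetical, and a real pipeline would more likely use imbalanced-learn's `SMOTE` class.

```python
import numpy as np

def smote(X_min, n_new, k, rng):
    """Minimal SMOTE sketch (hypothetical helper): synthesize n_new minority rows.

    Each synthetic row is x + gap * (neighbor - x), where x is a random minority
    example, neighbor is one of its k nearest minority neighbors, and gap ~ U[0, 1).
    """
    # Pairwise squared distances among minority examples only.
    d = np.sum((X_min[:, None, :] - X_min[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d, np.inf)                    # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]       # indices of k nearest neighbors
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # one interpolation factor per row
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

rng = np.random.default_rng(7)
X_min = rng.normal(loc=3.0, scale=0.5, size=(20, 2))   # hypothetical minority class
synthetic = smote(X_min, n_new=80, k=5, rng=rng)
print(synthetic.shape)  # (80, 2)
```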

Imbalanced Classification With Python (7-Day Mini-Course)

Imbalanced Classification Crash Course. Get on top of imbalanced classification in 7 days. Classification predictive modeling is the task of assigning a label to an example. Imbalanced classification tasks are those in which the distribution of examples across the classes is not equal. Practical imbalanced classification requires the use of a suite of specialized techniques, […]

What Is the Naive Classifier for Each Imbalanced Classification Metric?

A common mistake made by beginners is to apply machine learning algorithms to a problem without establishing a performance baseline. A performance baseline provides a minimum score above which a model is considered to have skill on the dataset. It also provides a point of relative improvement for all models evaluated on the dataset. A […]
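As a sketch of how such baselines can be computed, assuming scikit-learn and hypothetical test labels with a 1% positive rate, the snippet below scores a few naive strategies: majority class for accuracy, always-minority for F1, a constant score for ROC AUC, and the class prior for the Brier score. These pairings are a common convention, offered here as an illustration rather than as the article's own recommendations.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score, roc_auc_score

# Hypothetical test labels: 1% positive (minority) class.
y_true = np.array([0] * 990 + [1] * 10)
p = y_true.mean()  # positive-class prior

baselines = {
    # Accuracy: always predict the majority class.
    "accuracy": accuracy_score(y_true, np.zeros_like(y_true)),
    # F1 on the minority class: always predict the minority class.
    "f1": f1_score(y_true, np.ones_like(y_true)),
    # ROC AUC: a constant score cannot rank positives above negatives.
    "roc_auc": roc_auc_score(y_true, np.full(len(y_true), p)),
    # Brier score: always predict the class prior as the probability.
    "brier": brier_score_loss(y_true, np.full(len(y_true), p)),
}
for name, score in baselines.items():
    print(f"{name}: {score:.4f}")
```

With a positive rate p, the accuracy baseline is 1 - p, the F1 baseline is 2p / (1 + p), the Brier baseline is p(1 - p), and ROC AUC is 0.5 regardless of p; a model has skill on a metric only if it beats the corresponding value.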