Undersampling Algorithms for Imbalanced Classification

Resampling methods are designed to change the composition of a training dataset for an imbalanced classification task. Most of the attention on resampling for imbalanced classification is focused on oversampling the minority class. Nevertheless, a suite of techniques has been developed for undersampling the majority class that can be used in conjunction with effective […]
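As a minimal sketch of the idea (assuming the imbalanced-learn library and a synthetic dataset from scikit-learn), the majority class might be undersampled like this:

```python
# A minimal sketch (assumptions: imbalanced-learn is installed; data is synthetic)
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

# synthetic binary dataset with roughly a 1:100 class distribution
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.99], flip_y=0,
                           random_state=1)
print(Counter(y))

# remove majority-class examples until the classes are balanced
undersample = RandomUnderSampler(sampling_strategy='majority', random_state=1)
X_under, y_under = undersample.fit_resample(X, y)
print(Counter(y_under))
```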

SMOTE Oversampling for Imbalanced Classification with Python

Imbalanced classification involves developing predictive models on classification datasets that have a severe class imbalance. The challenge of working with imbalanced datasets is that most machine learning techniques will ignore the minority class and, in turn, perform poorly on it, although it is typically performance on the minority class that matters most. One approach […]
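A minimal sketch of one such approach (assuming the imbalanced-learn library and a synthetic dataset) is to let SMOTE synthesize new minority-class examples until the classes are balanced:

```python
# A minimal sketch (assumptions: imbalanced-learn is installed; data is synthetic)
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# synthetic binary dataset with roughly a 1:100 class distribution
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.99], flip_y=0,
                           random_state=1)
print(Counter(y))

# SMOTE synthesizes new minority-class examples between nearest neighbors
oversample = SMOTE(random_state=1)
X_smote, y_smote = oversample.fit_resample(X, y)
print(Counter(y_smote))
```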

Imbalanced Classification With Python (7-Day Mini-Course)

Imbalanced Classification Crash Course. Get on top of imbalanced classification in 7 days. Classification predictive modeling is the task of assigning a label to an example. Imbalanced classification refers to those classification tasks where the distribution of examples across the classes is not equal. Practical imbalanced classification requires the use of a suite of specialized techniques, […]

What Is the Naive Classifier for Each Imbalanced Classification Metric?

A common mistake made by beginners is to apply machine learning algorithms to a problem without establishing a performance baseline. A performance baseline provides a minimum score above which a model is considered to have skill on the dataset. It also provides a point of relative improvement for all models evaluated on the dataset. A […]
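As a rough illustration (assuming scikit-learn and a synthetic dataset), a naive baseline for classification accuracy can be obtained with a DummyClassifier that always predicts the majority class:

```python
# A rough sketch (assumptions: scikit-learn; synthetic 1:100 dataset)
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)

# the naive classifier for accuracy predicts the majority class for every example
baseline = DummyClassifier(strategy='most_frequent')
scores = cross_val_score(baseline, X, y, scoring='accuracy', cv=5)
print('Baseline accuracy: %.3f' % scores.mean())  # ~0.99 purely from the skew
```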

A Gentle Introduction to Probability Metrics for Imbalanced Classification

Classification predictive modeling involves predicting a class label for examples, although some problems require the prediction of a probability of class membership. For these problems, crisp class labels are not required; instead, the likelihood of each example belonging to each class is required and later interpreted. As such, small relative probabilities can carry […]
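As a small sketch (assuming scikit-learn, a logistic regression model, and a synthetic dataset), probability metrics such as log loss and the Brier score are computed on the predicted probabilities rather than on crisp labels:

```python
# A small sketch (assumptions: scikit-learn; logistic regression; synthetic data)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, brier_score_loss

X, y = make_classification(n_samples=10000, weights=[0.95], flip_y=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)
model = LogisticRegression(solver='lbfgs').fit(X_train, y_train)

# score the predicted probabilities rather than the crisp class labels
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class
print('Log loss: %.4f' % log_loss(y_test, probs))
print('Brier score: %.4f' % brier_score_loss(y_test, probs))
```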

ROC Curves and Precision-Recall Curves for Imbalanced Classification

Most imbalanced classification problems involve two classes: a negative case with the majority of examples and a positive case with a minority of examples. Two diagnostic tools that help in the interpretation of binary (two-class) classification predictive models are ROC Curves and Precision-Recall curves. Plots from the curves can be created and used to understand […]
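A minimal sketch, assuming scikit-learn, matplotlib, and a synthetic dataset, of computing both curves from a model's predicted probabilities:

```python
# A minimal sketch (assumptions: scikit-learn, matplotlib; synthetic 1:100 dataset)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, precision_recall_curve
from matplotlib import pyplot

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)
probs = LogisticRegression(solver='lbfgs').fit(X_train, y_train).predict_proba(X_test)[:, 1]

# ROC curve: false positive rate vs true positive rate across thresholds
fpr, tpr, _ = roc_curve(y_test, probs)
# precision-recall curve: precision vs recall across thresholds
precision, recall, _ = precision_recall_curve(y_test, probs)

fig, (ax1, ax2) = pyplot.subplots(1, 2, figsize=(10, 4))
ax1.plot(fpr, tpr); ax1.set_title('ROC Curve')
ax1.set_xlabel('False Positive Rate'); ax1.set_ylabel('True Positive Rate')
ax2.plot(recall, precision); ax2.set_title('Precision-Recall Curve')
ax2.set_xlabel('Recall'); ax2.set_ylabel('Precision')
pyplot.show()
```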

How to Calculate Precision, Recall, and F-Measure for Imbalanced Classification

Classification accuracy is the total number of correct predictions divided by the total number of predictions made for a dataset. As a performance measure, accuracy is inappropriate for imbalanced classification problems. The main reason is that the sheer number of examples from the majority class (or classes) will overwhelm the number of examples in the […]
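As a quick illustration (assuming scikit-learn and a small hypothetical set of labels and predictions), the metrics can be computed as follows:

```python
# A quick sketch (assumptions: scikit-learn; small hypothetical label vectors)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # 8 negatives, 2 positives
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # one false positive, one false negative

# accuracy = correct predictions / total predictions (misleading under imbalance)
print('Accuracy : %.3f' % accuracy_score(y_true, y_pred))   # 0.800
# precision = TP / (TP + FP); recall = TP / (TP + FN)
print('Precision: %.3f' % precision_score(y_true, y_pred))  # 0.500
print('Recall   : %.3f' % recall_score(y_true, y_pred))     # 0.500
# F-measure is the harmonic mean of precision and recall
print('F-Measure: %.3f' % f1_score(y_true, y_pred))         # 0.500
```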
