Archive | Imbalanced Classification

Cost-Sensitive SVM for Imbalanced Classification

By Jason Brownlee on August 21, 2020 in Imbalanced Classification

The Support Vector Machine algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The SVM algorithm finds a hyperplane decision boundary that best splits the examples into two classes. The split is made soft through the use of a margin that allows some points to be misclassified. By default, […]

Cost-Sensitive Decision Trees for Imbalanced Classification

By Jason Brownlee on August 21, 2020 in Imbalanced Classification

The decision tree algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The split points of the tree are chosen to best separate examples into two groups with minimum mixing. When both groups are dominated by examples from one class, the criterion used to select a split point will […]

Cost-Sensitive Logistic Regression for Imbalanced Classification

By Jason Brownlee on October 26, 2020 in Imbalanced Classification

Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the logistic regression model must be modified to take the skewed distribution into account. This can be achieved by specifying a class weighting configuration that influences how much the logistic regression coefficients are updated during training. The […]
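As a rough illustration of what a class weighting can look like, the sketch below computes inverse-frequency weights and applies them to the log loss. The excerpt does not say which weighting the full post uses; the formula here (n_samples / (n_classes * class_count)) is one common heuristic, the same one scikit-learn applies for `class_weight='balanced'`. The function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_log_loss(y_true, y_prob, weights):
    """Mean log loss with each example scaled by the weight of its class."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, 1e-15), 1 - 1e-15)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


# Toy 1:99 dataset and a model that predicts "almost certainly class 0" everywhere.
labels = [0] * 99 + [1]
probs = [0.01] * len(labels)

weights = balanced_class_weights(labels)
plain = weighted_log_loss(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, probs, weights)
print(weights, plain, weighted)
```

With the weights applied, the single missed minority example dominates the loss, so a gradient-based fit would adjust the coefficients more strongly in response to it.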

Tour of Data Sampling Methods for Imbalanced Classification

By Jason Brownlee on January 14, 2020 in Imbalanced Classification

Machine learning techniques often fail or give misleadingly optimistic performance on classification datasets with an imbalanced class distribution. The reason is that many machine learning algorithms are designed to operate on classification data with an equal number of observations for each class. When this is not the case, algorithms can learn that very few examples […]

How to Combine Oversampling and Undersampling for Imbalanced Classification

By Jason Brownlee on May 11, 2021 in Imbalanced Classification

Resampling methods are designed to add or remove examples from the training dataset in order to change the class distribution. Once the class distributions are more balanced, the suite of standard machine learning classification algorithms can be fit successfully on the transformed datasets. Oversampling methods duplicate or create new synthetic examples in the minority class, […]
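The idea of adding or removing examples to change the class distribution can be sketched with the standard library alone. This is a minimal illustration of random oversampling (duplicating minority examples) and random undersampling (dropping majority examples), not the code from the full post; the function names and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)  # sample with replacement
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy dataset with a 95:5 class distribution.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.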

Undersampling Algorithms for Imbalanced Classification

By Jason Brownlee on January 27, 2021 in Imbalanced Classification

Resampling methods are designed to change the composition of a training dataset for an imbalanced classification task. Most attention in resampling for imbalanced classification goes to oversampling the minority class. Nevertheless, a suite of techniques has been developed for undersampling the majority class that can be used in conjunction with effective […]

SMOTE for Imbalanced Classification with Python

By Jason Brownlee on March 17, 2021 in Imbalanced Classification

Imbalanced classification involves developing predictive models on classification datasets that have a severe class imbalance. The challenge of working with imbalanced datasets is that most machine learning techniques will ignore, and in turn have poor performance on, the minority class, although typically it is performance on the minority class that is most important. One approach […]

Imbalanced Classification With Python (7-Day Mini-Course)

By Jason Brownlee on January 5, 2021 in Imbalanced Classification

Imbalanced Classification Crash Course. Get on top of imbalanced classification in 7 days. Classification predictive modeling is the task of assigning a label to an example. Imbalanced classification tasks are those where the distribution of examples across the classes is not equal. Practical imbalanced classification requires the use of a suite of specialized techniques, […]

Random Oversampling and Undersampling for Imbalanced Classification

By Jason Brownlee on January 5, 2021 in Imbalanced Classification

Imbalanced datasets are those where there is a severe skew in the class distribution, such as a 1:100 or 1:1000 ratio of minority-class to majority-class examples. This bias in the training dataset can influence many machine learning algorithms, leading some to ignore the minority class entirely. This is a problem as it is […]

What Is the Naive Classifier for Each Imbalanced Classification Metric?

By Jason Brownlee on August 27, 2020 in Imbalanced Classification

A common mistake made by beginners is to apply machine learning algorithms to a problem without establishing a performance baseline. A performance baseline provides a minimum score above which a model is considered to have skill on the dataset. It also provides a point of relative improvement for all models evaluated on the dataset. A […]
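To make the idea of a baseline concrete, here is a small sketch of one naive strategy: always predict the majority class. It shows why accuracy alone can look strong on a skewed dataset while the minority class is never detected. The helper names and the 990:10 toy labels are assumptions for illustration; the full post may use different naive strategies for different metrics.

```python
from collections import Counter


def majority_class(y_train):
    """Return the most frequent label in the training labels."""
    return Counter(y_train).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives among all examples of the positive class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / sum(1 for t in y_true if t == positive)


# Toy labels with a 990:10 class distribution.
y = [0] * 990 + [1] * 10

# Naive model: predict the majority class for every example.
y_pred = [majority_class(y)] * len(y)

baseline_accuracy = accuracy(y, y_pred)
minority_recall = recall(y, y_pred, positive=1)
print(baseline_accuracy, minority_recall)
```

A model only shows skill on this data if it beats these baseline scores on the chosen metric.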
