The Fbeta-measure is a configurable single-score metric for evaluating a binary classification model based on the predictions made for the positive class. The Fbeta-measure is calculated using precision and recall. Precision is a metric that calculates the percentage of positive predictions that are correct. Recall calculates the percentage of correct predictions for the positive class […]
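As a minimal sketch of how the beta parameter trades off precision and recall, the snippet below scores a hypothetical set of predictions with scikit-learn's fbeta_score (the labels are made up for illustration; beta values below 1 favor precision, values above 1 favor recall, and beta=1 recovers the F1-measure):

```python
# Minimal sketch: computing the Fbeta-measure with scikit-learn.
# The labels below are hypothetical, purely for illustration.
from sklearn.metrics import fbeta_score, precision_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]

# precision: correct positive predictions / all positive predictions made
# recall: correct positive predictions / all actual positive examples
print('precision=%.3f recall=%.3f' % (precision_score(y_true, y_pred),
                                      recall_score(y_true, y_pred)))

# beta < 1 weights precision more, beta > 1 weights recall more
for beta in [0.5, 1.0, 2.0]:
    print('beta=%.1f Fbeta=%.3f' % (beta, fbeta_score(y_true, y_pred, beta=beta)))
```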
How to Develop an Imbalanced Classification Model to Detect Oil Spills
Many imbalanced classification tasks require a skillful model that predicts a crisp class label, where both classes are equally important. One such example is the detection of oil spills or slicks in satellite images. The detection of a spill […]
How to Develop a Probabilistic Model of Breast Cancer Patient Survival
Developing a probabilistic model is challenging in general, and it is made more so when there is skew in the distribution of cases, referred to as an imbalanced dataset. The Haberman Dataset describes the five-year-or-greater survival of breast cancer patients in the 1950s and 1960s and mostly contains patients that survived. […]
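The focus here is on predicted probabilities rather than crisp labels; as an illustrative sketch of that workflow (synthetic data standing in for the Haberman Dataset, and not the article's exact pipeline), probabilities can be obtained with predict_proba and evaluated with the Brier score, where lower is better:

```python
# Illustrative sketch: predicting probabilities on a skewed dataset and
# evaluating them with the Brier score (not the article's exact pipeline).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# synthetic stand-in for an imbalanced survival dataset (roughly 3:1 skew)
X, y = make_classification(n_samples=306, weights=[0.75], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

model = LogisticRegression(solver='lbfgs')
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the minority class

print('Brier score: %.4f' % brier_score_loss(y_test, probs))  # lower is better
```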
Why Is Imbalanced Classification Difficult?
Imbalanced classification is primarily challenging as a predictive modeling task because of the severely skewed class distribution. This skew is the cause of poor performance with traditional machine learning models and evaluation metrics that assume a balanced class distribution. Moreover, there are additional properties of a classification dataset that are not only challenging for predictive modeling […]
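To make the skew concrete, here is a minimal sketch with synthetic data and a 1:100 class distribution: a naive model that only ever predicts the majority class still achieves about 99 percent classification accuracy, which is exactly why accuracy misleads on imbalanced problems:

```python
# Minimal sketch: on a 1:100 dataset, always predicting the majority
# class already achieves ~99% accuracy while learning nothing.
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)
print(Counter(y))  # roughly 9,900 majority to 100 minority examples

model = DummyClassifier(strategy='most_frequent')
model.fit(X, y)
print('Accuracy: %.3f' % accuracy_score(y, model.predict(X)))
```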
One-Class Classification Algorithms for Imbalanced Datasets
Outliers or anomalies are rare examples that do not fit in with the rest of the data. Identifying them is referred to as outlier or anomaly detection, and the subfield of machine learning focused on this problem is known as one-class classification. These are unsupervised learning algorithms that attempt to model “normal” […]
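As one illustrative example of the idea (synthetic data; parameter choices are my own), scikit-learn's OneClassSVM can be fit on the majority-class examples only and then used to flag anything that does not look "normal":

```python
# Illustrative sketch: a one-class SVM trained only on "normal"
# (majority-class) examples, then used to flag outliers.
from sklearn.datasets import make_classification
from sklearn.svm import OneClassSVM

X, y = make_classification(n_samples=1000, weights=[0.99], flip_y=0, random_state=4)

# fit on majority-class examples only; the minority class is never seen
model = OneClassSVM(gamma='scale', nu=0.01)
model.fit(X[y == 0])

# predict() returns +1 for inliers ("normal") and -1 for outliers
yhat = model.predict(X)
print('Flagged %d examples as outliers' % (yhat == -1).sum())
```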
Bagging and Random Forest for Imbalanced Classification
Bagging is an ensemble algorithm that fits multiple models on different subsets of a training dataset, then combines the predictions from all models. Random forest is an extension of bagging that also randomly selects subsets of features used in each data sample. Both bagging and random forests have proven effective on a wide range of […]
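As a simplified sketch of applying such an ensemble to an imbalanced dataset, the snippet below uses plain scikit-learn with a class-weighted random forest (the article also covers dedicated balanced-bagging variants; this weighting approach is just one option):

```python
# Simplified sketch: a class-weighted random forest on an imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=7)

# class_weight='balanced' reweights classes inversely to their frequency
model = RandomForestClassifier(n_estimators=100, class_weight='balanced')
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=5)
print('Mean ROC AUC: %.3f' % scores.mean())
```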
A Gentle Introduction to Threshold-Moving for Imbalanced Classification
Classification predictive modeling typically involves predicting a class label. Nevertheless, many machine learning algorithms are capable of predicting a probability or score of class membership, and this must be interpreted before it can be mapped to a crisp class label. This is achieved by using a threshold, such as 0.5, where all values equal or […]
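A minimal sketch of threshold-moving with synthetic data (the model and metric are my own choices): fit a probabilistic classifier, sweep candidate thresholds over the predicted probabilities, and keep whichever maximizes the F-measure instead of defaulting to 0.5:

```python
# Minimal sketch: searching for the decision threshold that maximizes
# the F-measure, rather than using the default of 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95], flip_y=0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=3)

model = LogisticRegression(solver='lbfgs').fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

# map probabilities to labels at each candidate threshold, score each
thresholds = np.arange(0.001, 1.0, 0.001)
scores = [f1_score(y_test, (probs >= t).astype(int)) for t in thresholds]
best = int(np.argmax(scores))
print('Best threshold=%.3f, F-measure=%.3f' % (thresholds[best], scores[best]))
```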
Cost-Sensitive Learning for Imbalanced Classification
Most machine learning algorithms assume that all misclassification errors made by a model are equal. This is often not the case for imbalanced classification problems where missing a positive or minority class case is worse than incorrectly classifying an example from the negative or majority class. There are many real-world examples, such as detecting spam […]
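As a minimal sketch (the 1:100 weighting below is illustrative, not prescriptive), many scikit-learn classifiers expose a class_weight argument that makes minority-class errors cost more during training:

```python
# Minimal sketch: cost-sensitive logistic regression via class_weight.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=2)

# errors on the minority class (1) are penalized 100x more heavily
model = LogisticRegression(solver='lbfgs', class_weight={0: 1, 1: 100})
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=5)
print('Mean ROC AUC: %.3f' % scores.mean())
```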
How to Configure XGBoost for Imbalanced Classification
The XGBoost algorithm is effective for a wide range of regression and classification predictive modeling problems. It is an efficient implementation of the stochastic gradient boosting algorithm and offers a range of hyperparameters that give fine-grained control over the model training procedure. Although the algorithm performs well in general, even on imbalanced classification datasets, it […]
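The relevant hyperparameter is scale_pos_weight; a common heuristic (a rule of thumb, not a guarantee) sets it to the ratio of negative to positive examples. A minimal sketch with synthetic data:

```python
# Minimal sketch: weighting the positive class in XGBoost.
from collections import Counter
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=7)

# heuristic: scale_pos_weight = count(negative) / count(positive)
counts = Counter(y)
ratio = counts[0] / counts[1]

model = XGBClassifier(scale_pos_weight=ratio)
model.fit(X, y)
print('scale_pos_weight=%.1f' % ratio)
```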
How to Develop a Cost-Sensitive Neural Network for Imbalanced Classification
Deep learning neural networks are a flexible class of machine learning algorithms that perform well on a wide range of problems. Neural networks are trained using the backpropagation of error algorithm that involves calculating errors made by the model on the training dataset and updating the model weights in proportion to those errors. The limitation […]
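As an illustrative sketch of the cost-sensitive remedy (the architecture and the 1:100 weighting are my own choices), Keras lets fit() take a class_weight mapping so that errors on the minority class are scaled up during backpropagation:

```python
# Illustrative sketch: a class-weighted neural network in Keras.
from sklearn.datasets import make_classification
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           weights=[0.99], flip_y=0, random_state=4)

model = Sequential()
model.add(Dense(10, activation='relu', input_dim=2))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

# errors on the minority class (1) contribute 100x more to the loss
model.fit(X, y, epochs=10, batch_size=32, class_weight={0: 1, 1: 100}, verbose=0)
```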