Archive | Imbalanced Classification

Imbalanced Multiclass Classification with the Glass Identification Dataset

Multiclass classification problems are those where a label must be predicted, but there are more than two labels that may be predicted. These are challenging predictive modeling problems because a sufficiently representative number of examples of each class is required for a model to learn the problem. The problem becomes more challenging when the number of […]

Imbalanced Classification with the Fraudulent Credit Card Transactions Dataset

Fraud is a major problem for credit card companies, both because of the large volume of transactions that are completed each day and because many fraudulent transactions look a lot like normal transactions. Identifying fraudulent credit card transactions is a common type of imbalanced binary classification where the focus is on the positive class (is […]

Step-By-Step Framework for Imbalanced Classification Projects

Classification predictive modeling problems involve predicting a class label for a given set of inputs. It is a challenging problem in general, especially if little is known about the dataset, as there are tens, if not hundreds, of machine learning algorithms to choose from. The problem is made significantly more difficult if the distribution of […]

Predictive Model for the Phoneme Imbalanced Classification Dataset

Many binary classification tasks do not have an equal number of examples from each class, i.e. the class distribution is skewed or imbalanced. Nevertheless, accuracy is equally important in both classes. An example is the classification of vowel sounds from European languages as either nasal or oral in speech recognition, where there are many more […]

Imbalanced Classification Model to Detect Mammography Microcalcifications

Cancer detection is a popular example of an imbalanced classification problem because there are often significantly more cases of non-cancer than actual cancer. A standard imbalanced classification dataset is the mammography dataset that involves detecting breast cancer from radiological scans, specifically the presence of clusters of microcalcifications that appear bright on a mammogram. This dataset […]

How to Calibrate Probabilities for Imbalanced Classification

Many machine learning models are capable of predicting a probability or a probability-like score for class membership. Probabilities provide a required level of granularity for evaluating and comparing models, especially on imbalanced classification problems where tools like ROC Curves are used to interpret predictions and the ROC AUC metric is used to compare model performance, both […]

A Gentle Introduction to the Fbeta-Measure for Machine Learning

The Fbeta-measure is a configurable single-score metric for evaluating a binary classification model based on the predictions made for the positive class. The Fbeta-measure is calculated using precision and recall. Precision is the percentage of positive-class predictions that are correct. Recall calculates the percentage of correct predictions for the positive class […]
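As a rough illustration of how precision and recall combine into a single score, the sketch below computes the Fbeta-measure from raw counts using the standard definition, (1 + beta^2) * precision * recall / (beta^2 * precision + recall). The function name `fbeta_from_counts` and the example counts are made up for this sketch and do not come from the linked tutorial.

```python
def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """Fbeta-measure from true positives, false positives, and false negatives.

    Hypothetical helper for illustration only; the counts are assumed to
    refer to the positive class of a binary classification problem.
    """
    # Precision: fraction of positive predictions that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: fraction of actual positives that were predicted as positive.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # Convention used here: the score is zero when both components are zero.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives,
# giving precision = 0.8 and recall = 0.5.
f_half = fbeta_from_counts(8, 2, 8, beta=0.5)  # weights precision more heavily
f_one = fbeta_from_counts(8, 2, 8, beta=1.0)   # balances precision and recall
f_two = fbeta_from_counts(8, 2, 8, beta=2.0)   # weights recall more heavily
print(round(f_half, 3), round(f_one, 3), round(f_two, 3))
```

With precision higher than recall, as in these counts, the score drops as beta grows, because larger beta values place more weight on recall.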