Scatter Plot of Binary Classification Dataset With 1 to 100 Class Distribution

Failure of Classification Accuracy for Imbalanced Class Distributions

Classification accuracy is a metric that summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions. It is easy to calculate and intuitive to understand, making it the most common metric used for evaluating classifier models. This intuition breaks down when the distribution of examples […]
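To make the definition concrete, here is a minimal sketch of accuracy as correct predictions divided by total predictions. It assumes a hypothetical dataset with a 1:100 class ratio (10 minority and 1,000 majority examples, chosen purely for illustration) and a naive model that always predicts the majority class. The function name `accuracy` is our own.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by total predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical 1:100 dataset: 1,000 majority (0) and 10 minority (1) examples.
y_true = [0] * 1000 + [1] * 10
# A naive model that always predicts the majority class.
y_pred = [0] * len(y_true)

score = accuracy(y_true, y_pred)
# Count how many minority examples the model actually identified.
minority_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"Accuracy: {score:.3f}")
print(f"Minority examples found: {minority_found} of 10")
```

The score is about 99 percent even though not a single minority example is identified, which is why accuracy can mislead on skewed data.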

Standard Machine Learning Datasets for Imbalanced Classification

An imbalanced classification problem is a problem that involves predicting a class label where the distribution of class labels in the training dataset is skewed. Many real-world classification problems have an imbalanced class distribution, so it is important for machine learning practitioners to become familiar with these types of problems. In this tutorial, […]

Scatter Plot of Binary Classification Dataset With Provided Class Distribution

Develop an Intuition for Severely Skewed Class Distributions

An imbalanced classification problem is a problem that involves predicting a class label where the distribution of class labels in the training dataset is not equal. A challenge for beginners working with imbalanced classification problems is understanding what a specific skewed class distribution means. For example, what is the difference between, and what are the implications of, a 1:10 vs. […]
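One way to build intuition for a ratio is to convert it into example counts. The sketch below assumes a hypothetical dataset of 10,000 examples and compares 1:10 with 1:100; the second ratio is an assumed comparison point, and the helper name `class_counts` is our own.

```python
def class_counts(total, minority, majority):
    """Split `total` examples according to a minority:majority ratio."""
    n_minority = round(total * minority / (minority + majority))
    return n_minority, total - n_minority


total = 10_000  # hypothetical dataset size
for ratio in [(1, 10), (1, 100)]:
    n_min, n_maj = class_counts(total, *ratio)
    share = 100 * n_min / total
    print(f"{ratio[0]}:{ratio[1]} -> {n_min} minority, {n_maj} majority "
          f"({share:.2f}% minority)")
```

Under these assumptions, moving from 1:10 to 1:100 shrinks the minority class from roughly 9 percent of the data to roughly 1 percent.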

A Gentle Introduction to Imbalanced Classification

Classification predictive modeling involves predicting a class label for a given observation. An imbalanced classification problem is an example of a classification problem where the distribution of examples across the known classes is biased or skewed. The distribution can vary from a slight bias to a severe imbalance where there is one example in the […]

Use the ColumnTransformer for Numerical and Categorical Data in Python

You must prepare your raw data using data transforms prior to fitting a machine learning model. This is required to ensure that you best expose the structure of your predictive modeling problem to the learning algorithms. Applying data transforms like scaling or encoding categorical variables is straightforward when all input variables are the same type. […]
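When the input variables have mixed types, scikit-learn's `ColumnTransformer` can apply a different transform to each group of columns. The following is a minimal sketch, assuming a hypothetical toy dataset with two numerical columns and one categorical column; the column names and values are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numerical columns and one categorical column.
data = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000, 52000, 88000, 61000],
    "color": ["red", "green", "blue", "green"],
})

# Scale the numerical columns and one-hot encode the categorical column.
transformer = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age", "income"]),
        ("cat", OneHotEncoder(), ["color"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

prepared = transformer.fit_transform(data)
print(prepared.shape)  # 2 scaled columns + 3 one-hot columns
```

Each transformer only sees the columns listed for it, and the outputs are concatenated column-wise in the order the transformers are declared.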

Learning Curves of Cross-Entropy Loss for a Deep Learning Model

TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras

Predictive modeling with deep learning is a skill that modern developers need to know. TensorFlow is the premier open-source deep learning framework developed and maintained by Google. Although using TensorFlow directly can be challenging, the modern tf.keras API brings the simplicity and ease of use of Keras to the TensorFlow project. Using tf.keras allows you […]

Results for Standard Classification and Regression Machine Learning Datasets

It is important that beginner machine learning practitioners practice on small real-world datasets. So-called standard machine learning datasets contain actual observations, fit into memory, and are well studied and well understood. As such, they can be used by beginner practitioners to quickly test, explore, and practice data preparation and modeling techniques. A practitioner can confirm […]

How to Transform Target Variables for Regression With Scikit-Learn

Data preparation is a big part of applied machine learning. Correctly preparing your training data can mean the difference between mediocre and extraordinary results, even with very simple linear algorithms. Performing data preparation operations, such as scaling, is relatively straightforward for input variables and has been made routine in Python via the Pipeline scikit-learn class. […]
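As a rough illustration of the idea, the sketch below scales the input variable inside a `Pipeline` and scales the target with scikit-learn's `TransformedTargetRegressor`. The excerpt is cut off before it names a technique, so the choice of `TransformedTargetRegressor` here is our assumption about one way to do this, and the toy data (an exactly linear relationship) is hypothetical.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data with an exactly linear relationship: y = 3x + 50.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 50.0

# Scale the input variable inside a Pipeline.
pipeline = Pipeline(steps=[
    ("scale", MinMaxScaler()),
    ("model", LinearRegression()),
])

# Scale the target before fitting and invert the scaling on predictions.
model = TransformedTargetRegressor(
    regressor=pipeline,
    transformer=MinMaxScaler(),
)
model.fit(X, y)

predictions = model.predict(X)
print(predictions[:3])
```

Because the wrapper inverts the target transform when predicting, the predictions come back in the original units of `y` rather than in the scaled range.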
