Probability Decision Surface for Logistic Regression on a Binary Classification Task

Plot a Decision Surface for Machine Learning Algorithms in Python

Classification algorithms learn how to assign class labels to examples, although their decisions can appear opaque. A popular diagnostic for understanding the decisions made by a classification algorithm is the decision surface. This is a plot that shows how a fitted machine learning algorithm predicts class labels over a coarse grid spanning the input feature space. A decision […]

Continue Reading
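The grid-prediction idea described in the excerpt above can be sketched briefly. This is a minimal illustration, not the tutorial's own code: it assumes a synthetic two-feature dataset from `make_blobs` and a logistic regression model, predicts a label for every point on a coarse grid, and leaves the plotting calls commented out.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature binary dataset (stand-in example data)
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=1)

# Fit the classifier whose decision surface we want to inspect
model = LogisticRegression()
model.fit(X, y)

# Build a coarse grid spanning the input feature space
x1 = np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.1)
x2 = np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.1)
xx, yy = np.meshgrid(x1, x2)
grid = np.c_[xx.ravel(), yy.ravel()]

# Predict a class label for every grid point, then reshape to the grid
zz = model.predict(grid).reshape(xx.shape)

# Plot with matplotlib (commented out so the sketch runs headless):
# import matplotlib.pyplot as plt
# plt.contourf(xx, yy, zz, cmap='Paired')
# plt.scatter(X[:, 0], X[:, 1], c=y, cmap='Paired')
# plt.show()
```

Any classifier with a `predict` method can be swapped in; the grid spacing (0.1 here) controls how coarse or fine the surface appears.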
A Gentle Introduction to Computational Learning Theory

Computational learning theory, or statistical learning theory, refers to mathematical frameworks for quantifying learning tasks and algorithms. These are sub-fields of machine learning that a machine learning practitioner does not need to know in great depth in order to achieve good results on a wide range of problems. Nevertheless, it is a sub-field where having […]

Continue Reading
Histogram of Examples in Each Class in the Glass Multi-Class Classification Dataset

Multi-Class Imbalanced Classification

Imbalanced classification refers to those prediction tasks where the distribution of examples across class labels is not equal. Most imbalanced classification examples focus on binary classification tasks, yet many of the tools and techniques for imbalanced classification also directly support multi-class classification problems. In this tutorial, you will discover how to use the tools of imbalanced […]

Continue Reading
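One way the excerpt's point can be illustrated: a tool usually shown for binary imbalance, scikit-learn's `class_weight='balanced'`, applies unchanged to a multi-class problem. The dataset and the 80/15/5 class weights below are hypothetical, a minimal sketch rather than the tutorial's own example.

```python
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class imbalanced dataset (hypothetical skew: 80/15/5)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=3, weights=[0.80, 0.15, 0.05],
                           random_state=1)
counts = Counter(y)
print(counts)  # class counts are heavily skewed toward class 0

# class_weight='balanced' reweights errors inversely to class frequency;
# it works for multi-class problems exactly as it does for binary ones
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X, y)
```

The same pattern holds for other techniques, such as stratified resampling, which likewise generalizes beyond two classes.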
Line Plot of Expected vs. Births Predicted Using XGBoost

How to Use XGBoost for Time Series Forecasting

XGBoost is an efficient implementation of gradient boosting for classification and regression problems. It is fast, performs well, if not best, on a wide range of predictive modeling tasks, and is a favorite among data science competition winners, such as those on Kaggle. XGBoost can also be used for time series […]

Continue Reading
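Using a gradient boosting model for forecasting, as the excerpt describes, requires framing the series as supervised learning with lag features and evaluating with walk-forward validation. The sketch below illustrates that framing on a made-up series; it uses scikit-learn's `GradientBoostingRegressor` as a stand-in so the example has no dependency on the `xgboost` package (`xgboost.XGBRegressor` would slot into the same place).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for xgboost.XGBRegressor

# Hypothetical univariate series; a real tutorial would load e.g. daily births
series = np.sin(np.arange(120) * 0.3) + np.arange(120) * 0.01

def series_to_supervised(data, n_lags=3):
    """Frame a series as supervised learning: lag values -> next value."""
    X, y = [], []
    for i in range(n_lags, len(data)):
        X.append(data[i - n_lags:i])
        y.append(data[i])
    return np.array(X), np.array(y)

X, y = series_to_supervised(series)

# Walk-forward validation: refit on all history, predict one step ahead
preds = []
for i in range(len(X) - 12, len(X)):
    model = GradientBoostingRegressor(n_estimators=100)
    model.fit(X[:i], y[:i])
    preds.append(model.predict(X[i:i + 1])[0])

mae = np.mean(np.abs(np.array(preds) - y[-12:]))
```

Walk-forward validation matters here because ordinary k-fold shuffling would let the model train on future observations.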
Box and Whisker Plots of Classification Accuracy vs Repeats for k-Fold Cross-Validation

Repeated k-Fold Cross-Validation for Model Evaluation in Python

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the k-fold cross-validation procedure may result in a noisy estimate of model performance. Different splits of the data may yield very different results. Repeated k-fold cross-validation provides a […]

Continue Reading
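The repetition the excerpt describes is a one-line change in scikit-learn: swap `KFold` for `RepeatedKFold`. A minimal sketch on a synthetic dataset, assuming 10 folds repeated 3 times (so 30 scores to average):

```python
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic example dataset and a simple model to evaluate
X, y = make_classification(n_samples=200, n_features=10, random_state=1)
model = LogisticRegression(max_iter=1000)

# 10-fold CV repeated 3 times, with a different split on each repeat
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```

Averaging over the repeats smooths out the split-to-split noise that a single k-fold run is subject to.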
Nested Cross-Validation for Machine Learning with Python

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and […]

Continue Reading
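The nesting the excerpt refers to can be sketched as a `GridSearchCV` (the inner, hyperparameter-tuning loop) passed as the estimator to `cross_val_score` (the outer, evaluation loop). The dataset and the `C` grid below are illustrative assumptions, not the tutorial's own configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic example dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# Inner loop: tune the regularization strength C on each training fold
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {'C': [0.1, 1.0, 10.0]}, cv=inner_cv)

# Outer loop: estimate how the whole tune-then-fit procedure generalizes
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(search, X, y, cv=outer_cv)
```

Because tuning happens only on the outer training folds, the outer scores are not biased by the hyperparameter search, which is exactly the leakage that reusing one cross-validation loop for both jobs would introduce.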