Archive | Data Preparation


How to Selectively Scale Numerical Input Variables for Machine Learning

Many machine learning models perform better when input variables are carefully transformed or scaled prior to modeling. It is convenient, and therefore common, to apply the same data transforms, such as standardization and normalization, equally to all input variables. This can achieve good results on many problems. Nevertheless, better results may be achieved by carefully […]
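As a minimal sketch of the idea, assuming NumPy and a hypothetical two-column dataset (one continuous column and one 0/1 flag), the helper functions below standardize only the columns you list. They learn the mean and standard deviation from the training rows and reuse those statistics for new rows. The function names and data are illustrative only; scikit-learn's `ColumnTransformer` is the usual library route for the same per-column treatment.

```python
import numpy as np

def fit_selective_standardizer(X_train, columns):
    """Learn mean and std for the chosen columns only, from training data."""
    mean = X_train[:, columns].mean(axis=0)
    std = X_train[:, columns].std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def apply_selective_standardizer(X, columns, mean, std):
    """Return a copy of X with only the chosen columns standardized."""
    X_out = np.array(X, dtype=float, copy=True)
    X_out[:, columns] = (X_out[:, columns] - mean) / std
    return X_out

rng = np.random.default_rng(1)
# Hypothetical data: column 0 is continuous, column 1 is a 0/1 flag
X_train = np.column_stack([rng.normal(50, 10, 200), rng.integers(0, 2, 200)])
X_test = np.column_stack([rng.normal(50, 10, 50), rng.integers(0, 2, 50)])

cols = [0]  # scale only the continuous column; leave the flag as 0/1
mean, std = fit_selective_standardizer(X_train, cols)
X_train_s = apply_selective_standardizer(X_train, cols, mean, std)
X_test_s = apply_selective_standardizer(X_test, cols, mean, std)
```

Leaving the binary flag unscaled keeps it interpretable as 0/1, and fitting the statistics on the training split only avoids leaking information from the test rows.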


6 Dimensionality Reduction Algorithms With Python

Dimensionality reduction is an unsupervised learning technique. Nevertheless, it can be used as a data transform pre-processing step for supervised learning algorithms on classification and regression predictive modeling datasets. There are many dimensionality reduction algorithms to choose from and no single best algorithm for all cases. Instead, it is a good […]
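A minimal sketch of dimensionality reduction used as a pre-processing step, assuming NumPy and using PCA (one common choice, computed here directly with an SVD rather than a library class). The projection is learned from the training rows only and then applied to both splits, so the reduced inputs can be handed to any supervised model. The helper names and the synthetic data are hypothetical; in scikit-learn, the `PCA` class placed inside a `Pipeline` plays the same role.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Fit PCA on training data via the SVD of the centered matrix."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained_var = s ** 2 / (X_train.shape[0] - 1)
    return mean, vt[:n_components], explained_var[:n_components]

def transform_pca(X, mean, components):
    """Project data onto the learned components (the reduced inputs)."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# Hypothetical data: 10 inputs that really vary along only 3 directions, plus noise
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 10))
X_train, X_test = X[:200], X[200:]

mean, components, var = fit_pca(X_train, n_components=3)
Z_train = transform_pca(X_train, mean, components)
Z_test = transform_pca(X_test, mean, components)
print(Z_train.shape, Z_test.shape)  # (200, 3) (100, 3)
```

`Z_train` and `Z_test` would then replace the original 10 inputs when fitting and evaluating a classification or regression model.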


4 Automatic Outlier Detection Algorithms in Python

The presence of outliers in a classification or regression dataset can result in a poor fit and lower predictive modeling performance. Identifying and removing outliers is challenging with simple statistical methods for most machine learning datasets given the large number of input variables. Instead, automatic outlier detection methods can be used in the modeling pipeline […]
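To illustrate automatic, multivariate outlier removal, the sketch below swaps in a simple k-nearest-neighbour distance score, chosen for brevity and not necessarily one of the four algorithms the post covers. Each row is scored by its mean distance to its `k` nearest neighbours, and the highest-scoring fraction (`contamination`, a hypothetical parameter here) is dropped from the training data before a model is fit. It assumes NumPy; the function names and the planted data are illustrative.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each row by its mean distance to its k nearest neighbours."""
    # Pairwise Euclidean distances, shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)

def remove_outliers(X, y, contamination=0.05, k=5):
    """Drop roughly the fraction `contamination` of rows with the highest scores."""
    scores = knn_outlier_scores(X, k)
    threshold = np.quantile(scores, 1.0 - contamination)
    keep = scores <= threshold
    return X[keep], y[keep], keep

rng = np.random.default_rng(42)
# Hypothetical training data: 200 inliers plus 5 planted far-away rows
inliers = rng.normal(size=(200, 4))
planted = np.array([[8, 0, 0, 0], [0, 8, 0, 0], [0, 0, 8, 0],
                    [0, 0, 0, 8], [-8, 0, 0, 0]], dtype=float)
X_train = np.vstack([inliers, planted])
y_train = rng.integers(0, 2, size=len(X_train))

X_clean, y_clean, keep = remove_outliers(X_train, y_train, contamination=0.05)
print(keep[200:])  # the planted rows are expected to be flagged (False)
```

Only the training split is filtered; test rows are left as they are so that evaluation reflects the data the model will actually see. The full pairwise distance matrix costs O(n²) memory, so this sketch suits small datasets only.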
