Search results for "MinMaxScaler"

Box and Whisker Plot of MAE Distributions for Individual Models and Data Transform Ensemble

Develop a Bagging Ensemble with Different Data Transformations

By Jason Brownlee on April 27, 2021 in Ensemble Learning

Bootstrap aggregation, or bagging, is an ensemble where each model is trained on a different sample of the training dataset. The idea of bagging can be generalized to other techniques for changing the training dataset and fitting the same model on each changed version of the data. One approach is to use data transforms that […]
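To make the idea concrete, here is a minimal, dependency-free sketch. It illustrates the idea and is not the article's code. The same model, a hypothetical k-nearest-neighbors regressor, is fit on three differently transformed copies of the training data (raw, min-max scaled, and standardized), and the three predictions are averaged. All function names and the toy data are assumptions for illustration.

```python
from statistics import mean, pstdev


def fit_identity(rows):
    # No-op transform: the model sees the raw inputs.
    return lambda row: list(row)


def fit_minmax(rows):
    # Learn per-column min and max, then rescale each value into [0, 1].
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return lambda row: [(v - l) / (h - l) if h > l else 0.0
                        for v, l, h in zip(row, lo, hi)]


def fit_standard(rows):
    # Learn per-column mean and standard deviation, then center and scale.
    cols = list(zip(*rows))
    mu, sd = [mean(c) for c in cols], [pstdev(c) for c in cols]
    return lambda row: [(v - m) / s if s > 0 else 0.0
                        for v, m, s in zip(row, mu, sd)]


def knn_predict(X, y, query, k):
    # Average the targets of the k training rows closest to the query.
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(X[i], query))
    nearest = sorted(range(len(X)), key=sq_dist)[:k]
    return mean(y[i] for i in nearest)


def transform_ensemble_predict(X, y, query, k=3,
                               fitters=(fit_identity, fit_minmax, fit_standard)):
    # Fit the same model on each transformed copy of the data, then average.
    preds = []
    for fit in fitters:
        transform = fit(X)  # statistics come from the training rows only
        X_t = [transform(row) for row in X]
        preds.append(knn_predict(X_t, y, transform(query), k))
    return mean(preds)


X = [[1, 10], [2, 40], [3, 20], [4, 80], [5, 50]]
y = [11, 42, 23, 84, 55]
print(transform_ensemble_predict(X, y, [3, 30]))
```

Because each transform changes how much each column contributes to the distance, the three members can pick different neighbors, which is where the ensemble's diversity comes from.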

Radius Neighbors Classifier Algorithm With Python

By Jason Brownlee on August 3, 2020 in Python Machine Learning

Radius Neighbors Classifier is a classification machine learning algorithm. It is an extension to the k-nearest neighbors algorithm that makes predictions using all examples in the radius of a new example rather than the k-closest neighbors. As such, the radius-based approach to selecting neighbors is more appropriate for sparse data, preventing examples that are far […]
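Here is a minimal sketch of the prediction rule described above, written in plain Python rather than with scikit-learn's `RadiusNeighborsClassifier`. Every training example within `radius` of the query votes, and a fallback `outlier_label` is returned when none fall in range. The function name, the fallback behavior, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def radius_neighbors_predict(X, y, query, radius, outlier_label=None):
    # Collect the labels of every training example within `radius` of the query.
    labels = [label for row, label in zip(X, y)
              if math.dist(row, query) <= radius]
    if not labels:
        # No neighbors in range: return a fallback label rather than guessing.
        return outlier_label
    # Majority vote over all in-radius neighbors (ties go to the label seen first).
    return Counter(labels).most_common(1)[0][0]


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
y = ["a", "a", "a", "b", "b"]
print(radius_neighbors_predict(X, y, (0.5, 0.5), radius=1.0))  # a
print(radius_neighbors_predict(X, y, (10, 10), radius=1.0))    # None
```

Unlike k-nearest neighbors, the number of voters varies from query to query, so a point far from all training data gets no vote at all instead of being matched to distant examples.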

Line Plot of Accuracy vs. Hill Climb Optimization Iteration for the Diabetes Dataset

How to Hill Climb the Test Set for Machine Learning

By Jason Brownlee on September 27, 2020 in Data Preparation

Hill climbing the test set is an approach to achieving good or perfect predictions in a machine learning competition without touching the training set or even developing a predictive model. As an approach to machine learning competitions, it is rightly frowned upon, and it is important that most competition platforms impose limitations to prevent it. Nevertheless, […]
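For illustration only, here is a small simulation of the idea. It assumes a hypothetical `score` function that plays the role of a leaderboard: it reports accuracy on hidden labels but never reveals the labels themselves. The sketch flips one prediction at a time and keeps the flip only if the score goes up. No model or training data is involved, which is exactly why the practice is frowned upon.

```python
import random


def hill_climb_test_set(score, n, iterations, seed=1):
    # `score` stands in for a leaderboard: it only reveals accuracy, not labels.
    rng = random.Random(seed)
    preds = [rng.randint(0, 1) for _ in range(n)]
    best = score(preds)
    for _ in range(iterations):
        if best == 1.0:
            break
        # Flip one randomly chosen prediction; keep it only if the score improves.
        candidate = preds.copy()
        i = rng.randrange(n)
        candidate[i] = 1 - candidate[i]
        s = score(candidate)
        if s > best:
            preds, best = candidate, s
    return preds, best


# Simulated hidden test labels; in a real competition these are never seen.
_rng = random.Random(0)
hidden = [_rng.randint(0, 1) for _ in range(20)]


def accuracy(preds):
    return sum(p == h for p, h in zip(preds, hidden)) / len(hidden)


preds, best = hill_climb_test_set(accuracy, len(hidden), iterations=2000)
print(best)
```

Limiting the number of submissions per day, as many platforms do, directly caps how many of these score queries a competitor can make.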

Histogram of Each Variable in the Diabetes Classification Dataset

How to Selectively Scale Numerical Input Variables for Machine Learning

By Jason Brownlee on August 17, 2020 in Data Preparation

Many machine learning models perform better when input variables are carefully transformed or scaled prior to modeling. It is convenient, and therefore common, to apply the same data transforms, such as standardization and normalization, equally to all input variables. This can achieve good results on many problems. Nevertheless, better results may be achieved by carefully […]
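Here is a minimal sketch of selective scaling, assuming rows are plain Python lists. Min-max scaling is learned and applied only to the listed column indices, and every other column passes through unchanged. The helper name and data are hypothetical. In scikit-learn the usual tool for this is `ColumnTransformer` with `remainder="passthrough"`, which, unlike this sketch, moves the passed-through columns to the right-hand side of the output.

```python
def fit_selective_minmax(rows, columns):
    # Learn min and max only for the selected column indices.
    lo = {c: min(row[c] for row in rows) for c in columns}
    hi = {c: max(row[c] for row in rows) for c in columns}

    def transform(row):
        out = list(row)  # unselected columns pass through unchanged
        for c in columns:
            span = hi[c] - lo[c]
            out[c] = (row[c] - lo[c]) / span if span else 0.0
        return out

    return transform


rows = [[1, 100, 0], [3, 300, 1], [5, 200, 0]]
scale = fit_selective_minmax(rows, columns=[1])  # scale column 1 only
print([scale(r) for r in rows])
# [[1, 0.0, 0], [3, 1.0, 1], [5, 0.5, 0]]
```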

Box and Whisker Plot of Classification Accuracy for Different Data Transforms on the Wine Classification Dataset

How to Grid Search Data Preparation Techniques

By Jason Brownlee on August 17, 2020 in Data Preparation

Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data […]
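The grid-search idea can be sketched in plain Python under a few simplifying assumptions: a single train/test split in place of the cross-validation a real evaluation would use, a 1-nearest-neighbor classifier as the fixed model, and three candidate transforms. All names and the toy data are hypothetical. Each transform is fit on the training rows only, and the best-scoring one is returned.

```python
from statistics import mean, pstdev


def fit_identity(rows):
    return lambda row: list(row)


def fit_minmax(rows):
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return lambda row: [(v - l) / (h - l) if h > l else 0.0
                        for v, l, h in zip(row, lo, hi)]


def fit_standard(rows):
    cols = list(zip(*rows))
    mu, sd = [mean(c) for c in cols], [pstdev(c) for c in cols]
    return lambda row: [(v - m) / s if s > 0 else 0.0
                        for v, m, s in zip(row, mu, sd)]


def one_nn(X, y, query):
    # Predict the label of the single closest training row.
    i = min(range(len(X)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], query)))
    return y[i]


def grid_search_transforms(X_train, y_train, X_test, y_test, candidates):
    # Score each candidate transform with the same model; return the best one.
    scores = {}
    for name, fit in candidates.items():
        transform = fit(X_train)  # fit on training rows only
        X_t = [transform(row) for row in X_train]
        hits = sum(one_nn(X_t, y_train, transform(q)) == label
                   for q, label in zip(X_test, y_test))
        scores[name] = hits / len(y_test)
    return max(scores, key=scores.get), scores


# Feature 0 carries the label; feature 1 is large-scale noise.
X_train = [[0, 0], [0, 1000], [1, 500], [1, 1500]]
y_train = [0, 0, 1, 1]
X_test, y_test = [[0, 480], [1, 20]], [0, 1]
candidates = {"identity": fit_identity, "minmax": fit_minmax,
              "standard": fit_standard}
best, scores = grid_search_transforms(X_train, y_train, X_test, y_test, candidates)
print(best, scores)
```

On this toy data the raw inputs let the noisy second column dominate the distance, so the untransformed candidate scores worst, while both scaling transforms recover the signal.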

Machine Learning Data Preparation Framework

Framework for Data Preparation Techniques in Machine Learning

By Jason Brownlee on July 17, 2020 in Data Preparation

There is a vast number of different types of data preparation techniques that could be used on a predictive modeling project. In some cases, the distribution of the data or the requirements of a machine learning model may suggest the data preparation needed, although this is rarely the case given the complexity and high dimensionality of […]

How to Use Feature Extraction on Tabular Data for Data Preparation

How to Use Feature Extraction on Tabular Data for Machine Learning

By Jason Brownlee on August 17, 2020 in Data Preparation

Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data […]
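One common way to use data transforms for feature extraction is to apply several of them to the same inputs in parallel and concatenate the results into one wider feature vector, in the spirit of scikit-learn's `FeatureUnion`. The plain-Python sketch below assumes three illustrative transforms (raw values, min-max scaling, and `log1p`); the helper names are hypothetical.

```python
import math


def fit_identity(rows):
    return lambda row: list(row)


def fit_minmax(rows):
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return lambda row: [(v - l) / (h - l) if h > l else 0.0
                        for v, l, h in zip(row, lo, hi)]


def fit_log1p(rows):
    # Stateless transform: nothing is learned from `rows`.
    return lambda row: [math.log1p(v) for v in row]


def fit_feature_union(rows, fitters):
    # Fit each transform on the same data, then concatenate their outputs.
    transforms = [fit(rows) for fit in fitters]
    return lambda row: [v for t in transforms for v in t(row)]


rows = [[0, 10], [9, 20]]
union = fit_feature_union(rows, [fit_identity, fit_minmax, fit_log1p])
print(union([9, 20]))  # 2 raw + 2 scaled + 2 log features = 6 values
```

A downstream model then sees several views of each input at once and can weight whichever view is most useful.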

Data Preparation for Machine Learning (7-Day Mini-Course)

By Jason Brownlee on June 30, 2020 in Data Preparation

Data Preparation for Machine Learning Crash Course. Get on top of data preparation with Python in 7 days. Data preparation involves transforming raw data into a form that is more appropriate for modeling. Preparing data may be the most important part of a predictive modeling project and the most time-consuming, although it seems to be […]

How to Avoid Data Leakage When Performing Data Preparation

By Jason Brownlee on August 17, 2020 in Data Preparation

Data preparation is the process of transforming raw data into a form that is appropriate for modeling. A naive approach to preparing data applies the transform on the entire dataset before evaluating the performance of the model. This results in a problem referred to as data leakage, where knowledge of the hold-out test set leaks […]
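Here is a minimal sketch of the leakage-free pattern, assuming a single numeric column and a hypothetical `fit_minmax_1d` helper: split first, learn the scaling statistics from the training rows only, then reuse them on the test rows. In scikit-learn the same idea is usually expressed by placing the scaler inside a `Pipeline` that is fit on the training data.

```python
def fit_minmax_1d(values):
    # Learn min and max from the given values only.
    lo, hi = min(values), max(values)
    span = hi - lo
    return lambda v: (v - lo) / span if span else 0.0


data = [2.0, 4.0, 6.0, 8.0, 10.0, 20.0]
train, test = data[:4], data[4:]

# Leaky: statistics computed on all data, including the test rows.
leaky = fit_minmax_1d(data)

# Correct: statistics computed on training rows only, then reused on test.
scale = fit_minmax_1d(train)
train_scaled = [scale(v) for v in train]
test_scaled = [scale(v) for v in test]
print(train_scaled, test_scaled)
```

Test values outside [0, 1] are expected here and are not a bug: they show the test set is being treated as genuinely unseen data, whereas the leaky scaler quietly used the test maximum.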

Histogram of Skewed Gaussian Data After Power Transform

How to Use Power Transforms for Machine Learning

By Jason Brownlee on August 28, 2020 in Data Preparation

Machine learning algorithms like Linear Regression and Gaussian Naive Bayes assume the numerical variables have a Gaussian probability distribution. Your data may not have a Gaussian distribution and instead may have a Gaussian-like distribution (e.g. nearly Gaussian but with outliers or a skew) or a totally different distribution (e.g. exponential). As such, you may be […]
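As a rough illustration, the Box-Cox power transform maps a positive value x to (x^λ - 1)/λ when λ is not 0, and to log(x) when λ = 0. The sketch below picks λ from a small grid by minimizing the absolute sample skewness of the output. That selection rule is a simplification assumed for illustration; tools such as scikit-learn's `PowerTransformer` estimate λ by maximum likelihood instead. Function names and data are hypothetical.

```python
import math
from statistics import fmean


def box_cox(x, lam):
    # Box-Cox is defined for strictly positive x only.
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def skewness(values):
    # Sample skewness: third central moment over the second to the power 1.5.
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)
    m3 = fmean((v - mu) ** 3 for v in values)
    return m3 / m2 ** 1.5


def pick_lambda(values, grid=(-1.0, -0.5, 0.0, 0.5, 1.0, 2.0)):
    # Simplification: choose the lambda whose output is least skewed.
    return min(grid, key=lambda lam: abs(skewness([box_cox(v, lam) for v in values])))


data = [math.exp(k / 2) for k in range(10)]  # right-skewed, exponential-like
lam = pick_lambda(data)
print(lam, skewness(data), skewness([box_cox(v, lam) for v in data]))
```

For exponential-like data the log case (λ = 0) turns the values back into an evenly spaced, symmetric sequence, which is why it wins here.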
