Archive | Python Machine Learning

Use the ColumnTransformer for Numerical and Categorical Data in Python

You must prepare your raw data using data transforms prior to fitting a machine learning model. This is required to ensure that you best expose the structure of your predictive modeling problem to the learning algorithms. Applying data transforms like scaling or encoding categorical variables is straightforward when all input variables are the same type. […]

Results for Standard Classification and Regression Machine Learning Datasets

It is important that beginner machine learning practitioners practice on small real-world datasets. So-called standard machine learning datasets contain actual observations, fit into memory, and are well studied and well understood. As such, they can be used by beginner practitioners to quickly test, explore, and practice data preparation and modeling techniques. A practitioner can confirm […]

How to Transform Target Variables for Regression With Scikit-Learn

Data preparation is a big part of applied machine learning. Correctly preparing your training data can mean the difference between mediocre and extraordinary results, even with very simple linear algorithms. Performing data preparation operations, such as scaling, is relatively straightforward for input variables and has been made routine in Python via the scikit-learn Pipeline class. […]

Tune Hyperparameters for Classification Machine Learning Algorithms

Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset. Hyperparameters are different from parameters, which are the internal coefficients or weights for a model found by the learning algorithm. Unlike parameters, hyperparameters are specified by the practitioner when configuring the model. Typically, it is challenging […]
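
The distinction between parameters and hyperparameters can be illustrated with a minimal sketch. It assumes a one-variable ridge regression written in plain Python; the function name `fit_ridge_1d` and the toy data are hypothetical and are not taken from the post. Here `alpha` is a hyperparameter set by the practitioner, while `slope` and `intercept` are parameters found from the data.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Fit y ~ slope * x + intercept with an L2 penalty on the slope.

    alpha is a hyperparameter: chosen by the practitioner before fitting.
    slope and intercept are parameters: found from the data by the fit.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums used by the closed-form solution.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # a larger alpha shrinks the slope toward 0
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (hypothetical): exactly y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Same data and algorithm; only the hyperparameter changes.
for alpha in (0.0, 10.0):
    slope, intercept = fit_ridge_1d(xs, ys, alpha)
    print(f"alpha={alpha}: slope={slope:.3f}, intercept={intercept:.3f}")
```

With the same data, changing only `alpha` changes the learned parameters, which is why hyperparameter values are usually compared on held-out data rather than fixed in advance.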

How to Develop Super Learner Ensembles in Python

Selecting a machine learning algorithm for a predictive modeling problem involves evaluating many different models and model configurations using k-fold cross-validation. The super learner is an ensemble machine learning algorithm that combines all of the models and model configurations that you might investigate for a predictive modeling problem and uses them to make a prediction […]

How to Use Out-of-Fold Predictions in Machine Learning

Machine learning algorithms are typically evaluated using resampling techniques such as k-fold cross-validation. During the k-fold cross-validation process, predictions are made on test sets composed of data not used to train the model. These predictions are referred to as out-of-fold predictions, a type of out-of-sample prediction. Out-of-fold predictions play an important role in machine learning […]
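
As a rough illustration of the idea, the sketch below collects out-of-fold predictions in plain Python. It assumes contiguous folds without shuffling and a deliberately trivial "model" that predicts the mean of the training targets; the helper names `k_fold_indices` and `out_of_fold_predictions` are hypothetical and are not from the post.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous, near-equal test folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(ys, k):
    """Return one prediction per sample, each made by a model that
    was fit without that sample (its fold was held out)."""
    oof = [None] * len(ys)
    for test_idx in k_fold_indices(len(ys), k):
        held_out = set(test_idx)
        train_ys = [y for i, y in enumerate(ys) if i not in held_out]
        # "Fit": the stand-in model is just the mean of the training targets.
        model_mean = sum(train_ys) / len(train_ys)
        # "Predict" only on the held-out fold.
        for i in test_idx:
            oof[i] = model_mean
    return oof


ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof = out_of_fold_predictions(ys, k=3)
mae = sum(abs(y - p) for y, p in zip(ys, oof)) / len(ys)
print(oof)  # [4.5, 4.5, 3.5, 3.5, 2.5, 2.5]
print(round(mae, 3))
```

Every sample receives exactly one prediction, so the pooled out-of-fold predictions can be scored together or passed on as inputs to another model.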

How to Choose a Feature Selection Method For Machine Learning

Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between […]

How to Perform Feature Selection with Categorical Data

Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature selection is often straightforward when working with real-valued data, such as using Pearson’s correlation coefficient, but can be challenging when working with categorical data. The two most commonly used feature selection […]
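
For the real-valued case mentioned above, a minimal sketch of ranking features by the absolute value of Pearson's correlation with the target might look like the following. The feature names and values are made up for illustration, and the helper `pearson_r` is hypothetical; this is not the method used in the post for categorical data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: three real-valued features and a target.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],  # perfectly linear with the target
    "feature_b": [2.0, 1.0, 4.0, 3.0, 5.0],  # related, but noisier
    "feature_c": [3.0, 1.0, 3.0, 1.0, 3.0],  # essentially unrelated
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

# Score each feature, then rank by strength of linear relationship.
scores = {name: pearson_r(values, target) for name, values in features.items()}
ranking = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)

for name in ranking:
    print(f"{name}: r = {scores[name]:.2f}")
```

Selecting the top-ranked names from `ranking` gives a simple filter-style selection. Pearson's coefficient only captures linear relationships between numerical variables, which is part of why categorical inputs need different statistics.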
