Archive | Python Machine Learning


How to Choose a Feature Selection Method For Machine Learning

Feature selection is the process of reducing the number of input variables when developing a predictive model. Reducing the number of input variables lowers the computational cost of modeling and, in some cases, improves the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between […]
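As a rough illustration of reducing the number of input variables, the sketch below scores each input against the target and keeps the highest-scoring ones. It assumes scikit-learn is installed; the synthetic dataset, the ANOVA F-statistic (`f_classif`), and `k=3` are illustrative choices, not taken from the post.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic classification data: 10 numeric inputs, 3 of them informative (assumed setup).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=1
)

# Score every input variable against the target and keep the 3 highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

print("original shape:", X.shape)
print("reduced shape:", X_selected.shape)
print("kept column indices:", selector.get_support(indices=True))
```

Other score functions may suit other data types better; this one assumes numeric inputs and a categorical target.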

Bar Chart of the Input Features (x) vs The Chi Squared Feature Importance (y)

How to Perform Feature Selection with Categorical Data

Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature selection is often straightforward when working with real-valued data, for example by using Pearson’s correlation coefficient, but can be challenging when working with categorical data. The two most commonly used feature selection […]
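The figure caption for this post mentions chi-squared feature importance, so here is a minimal sketch of chi-squared scoring on categorical inputs. It assumes scikit-learn and NumPy; the toy columns (`informative`, `noise`) and their category labels are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, size=n)  # binary target

# Hypothetical categorical input that mostly follows the target (about 10% flipped).
informative = np.where(y == 1, "high", "low")
flip = rng.random(n) < 0.1
informative = np.where(flip, np.where(informative == "high", "low", "high"), informative)

# Hypothetical categorical input that is unrelated to the target.
noise = rng.choice(["red", "green", "blue"], size=n)

X = np.column_stack([informative, noise])

# chi2 needs non-negative numeric inputs, so encode the categories as integers first.
X_enc = OrdinalEncoder().fit_transform(X)

selector = SelectKBest(score_func=chi2, k=1)
X_sel = selector.fit_transform(X_enc, y)

for name, score in zip(["informative", "noise"], selector.scores_):
    print(f"{name}: chi2 = {score:.2f}")
```

The column that tracks the target should receive a much larger score than the unrelated one.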

Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem

How to Develop a Framework to Spot-Check Machine Learning Algorithms in Python

Spot-checking algorithms is a technique in applied machine learning designed to quickly and objectively provide a first set of results on a new predictive modeling problem. Unlike grid searching and other types of algorithm tuning that seek the optimal algorithm or optimal configuration for an algorithm, spot-checking is intended to evaluate a diverse set of […]
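A minimal sketch of the spot-checking idea, assuming scikit-learn: evaluate a few different algorithms with default settings on the same data and compare their cross-validated scores. The three models, the synthetic dataset, and 5-fold accuracy are illustrative assumptions, not the framework developed in the post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (assumed for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# A small, deliberately varied set of algorithms with near-default settings.
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "cart": DecisionTreeClassifier(random_state=1),
}

# Evaluate every model with the same 5-fold cross-validation and metric.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring="accuracy", cv=5)
    results[name] = scores.mean()

# Report the models from best to worst mean accuracy.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The goal is a quick first ranking, not tuned results, so no hyperparameter search is run here.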

Scatter plot of Moons Test Classification Problem

How to Generate Test Datasets in Python with scikit-learn

Test datasets are small contrived datasets that let you test a machine learning algorithm or test harness. The data from test datasets have well-defined properties, such as linearity or non-linearity, that allow you to explore specific algorithm behavior. The scikit-learn Python library provides a suite of functions for generating samples from configurable test problems for […]
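As a small example of such a generator, the sketch below uses scikit-learn's `make_moons`, the two-class "moons" problem named in the figure caption for this post. The sample count, noise level, and seed are arbitrary choices made for illustration.

```python
from collections import Counter

from sklearn.datasets import make_moons

# Generate a small, non-linearly separable two-class problem.
# n_samples, noise, and random_state are illustrative values.
X, y = make_moons(n_samples=100, noise=0.1, random_state=1)

print("inputs:", X.shape)   # two numeric features per sample
print("labels:", Counter(y.tolist()))  # samples per class
```

Raising `noise` makes the two half-circles overlap more, which is one way to make the problem harder for a classifier.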
