Archive | Data Preparation

How to Choose a Feature Selection Method For Machine Learning

Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between […]

Continue Reading

How to Perform Feature Selection with Categorical Data

Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature selection is often straightforward when working with real-valued data, such as using Pearson’s correlation coefficient, but can be challenging when working with categorical data. The two most commonly used feature selection […]

Continue Reading
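
As a rough illustration of scoring categorical features against a categorical target, the sketch below computes the Pearson chi-squared statistic from scratch and ranks two features by it. The function name `chi_squared_statistic` and the toy columns `color`, `noise`, and `label` are assumptions made up for this example; they do not come from the post itself.

```python
from collections import Counter


def chi_squared_statistic(feature, target):
    """Pearson chi-squared statistic for two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # joint counts per (feature, target) pair
    feature_totals = Counter(feature)         # row totals of the contingency table
    target_totals = Counter(target)           # column totals of the contingency table
    statistic = 0.0
    for f_value, f_count in feature_totals.items():
        for t_value, t_count in target_totals.items():
            # Count expected in this cell if feature and target were independent.
            expected = f_count * t_count / n
            diff = observed.get((f_value, t_value), 0) - expected
            statistic += diff * diff / expected
    return statistic


# Hypothetical toy data: `color` tracks the label exactly, `noise` does not.
color = ["red", "red", "blue", "blue", "red", "blue", "red", "blue"]
noise = ["a", "b", "a", "b", "a", "b", "b", "a"]
label = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]

scores = {
    "color": chi_squared_statistic(color, label),
    "noise": chi_squared_statistic(noise, label),
}

# Higher statistic suggests a stronger dependence on the target.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

A larger statistic suggests a stronger dependence between a feature and the target, so sorting by it gives a simple ranking. In practice a library routine would normally be used instead of a hand-rolled loop.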

Why One-Hot Encode Data in Machine Learning?

Getting started in applied machine learning can be difficult, especially when working with real-world data. Often, machine learning tutorials will recommend or require that you prepare your data in specific ways before fitting a machine learning model. One good example is to use a one-hot encoding on categorical data. Why is a one-hot encoding required? […]

Continue Reading
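
For readers who have not seen one, a one-hot encoding turns each category into its own binary column. The sketch below is a minimal hand-written version; the helper `one_hot_encode` and the `colors` list are hypothetical names chosen for this illustration.

```python
def one_hot_encode(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)  # one column per distinct category
        row[index[value]] = 1        # mark the column for this value's category
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for row in encoded:
    print(row)
```

Each row contains exactly one 1, so no artificial ordering between the categories is implied, which is the usual motivation for preferring this over integer codes.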

How to Handle Missing Data with Python

Real-world data often has missing values. Data can have missing values due to unrecorded observations, incorrect or inconsistent data entry, and more. Many machine learning algorithms do not support data with missing values. So handling missing data is important for accurate data analysis and building robust models. In this tutorial, you will learn how to […]

Continue Reading
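
One simple way to handle missing values is to replace them with the mean of the observed values. The sketch below assumes missing entries are stored as NaN; the function `impute_mean` and the `ages` column are hypothetical, and mean imputation is only one of several possible strategies.

```python
import math
from statistics import mean


def impute_mean(column):
    """Replace NaN entries with the mean of the non-missing entries."""
    observed = [value for value in column if not math.isnan(value)]
    fill = mean(observed)
    return [fill if math.isnan(value) else value for value in column]


# Hypothetical numeric column with two missing entries marked as NaN.
ages = [22.0, float("nan"), 30.0, 26.0, float("nan")]

missing_count = sum(math.isnan(value) for value in ages)
imputed = impute_mean(ages)

print(missing_count, imputed)
```

Counting the missing entries first is a useful habit, since a column that is mostly missing may be better dropped than imputed.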

Data Leakage in Machine Learning

Data leakage is a big problem in machine learning when developing predictive models. Data leakage is when information from outside the training dataset is used to create the model. In this post, you will discover the problem of data leakage in predictive modeling. After reading this post, you will know: What data leakage is […]

Continue Reading
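
One common form of leakage is computing data preparation statistics on the whole dataset before splitting it. The sketch below fits standardization parameters on the training split only and reuses them on the test split. The helpers `fit_scaler` and `transform` and the tiny `train` and `test` lists are assumptions for illustration, not taken from the post.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Learn standardization parameters from the training split only."""
    return mean(train_values), pstdev(train_values)


def transform(values, mu, sigma):
    """Standardize values using previously learned parameters."""
    return [(value - mu) / sigma for value in values]


# Hypothetical split: the test value is far from the training range.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0]

# Leakage-free: parameters come from `train` and are reused on `test`.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)

# Leaky alternative, shown for contrast: the test value shifts the mean.
leaky_mu = mean(train + test)

print(mu, leaky_mu, test_scaled)
```

The gap between `mu` and `leaky_mu` shows how a single held-out value can change what the model sees during training when preparation is done on the combined data.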

An Introduction to Feature Selection

Which features should you use to create a predictive model? This is a difficult question that may require deep knowledge of the problem domain. It is possible to automatically select those features in your data that are most useful or most relevant for the problem you are working on. This is a process called feature […]

Continue Reading