Archive | Data Preparation

How to Choose Data Preparation Methods for Machine Learning

By Jason Brownlee on July 15, 2020 in Data Preparation

Data preparation is an important part of a predictive modeling project. Correct application of data preparation will transform raw data into a representation that allows learning algorithms to get the most out of the data and make skillful predictions. The problem is choosing a transform or sequence of transforms that results in a useful representation […]

8 Top Books on Data Cleaning and Feature Engineering

By Jason Brownlee on June 30, 2020 in Data Preparation

Data preparation is the transformation of raw data into a form that is more appropriate for modeling. It is a challenging topic to discuss as the data differs in form, type, and structure from project to project. Nevertheless, there are common data preparation tasks across projects. It is a huge field of study and goes […]

Data Preparation for Machine Learning (7-Day Mini-Course)

By Jason Brownlee on June 30, 2020 in Data Preparation

Data Preparation for Machine Learning Crash Course. Get on top of data preparation with Python in 7 days. Data preparation involves transforming raw data into a form that is more appropriate for modeling. Preparing data may be the most important part of a predictive modeling project and the most time-consuming, although it seems to be […]

Feature Engineering and Selection (Book Review)

By Jason Brownlee on June 30, 2020 in Data Preparation

Data preparation is the process of transforming raw data into a form suitable for learning algorithms. In some cases, data preparation is a required step in order to provide the data to an algorithm in its required input format. In other cases, the most appropriate representation of the input data is not known and must be explored in a […]

kNN Imputation for Missing Values in Machine Learning

By Jason Brownlee on August 17, 2020 in Data Preparation

Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify and replace missing values for each column in your input data prior to modeling your prediction task. This is called missing data imputation, or imputing for short. A popular approach to missing […]
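As a rough illustration of the idea in this excerpt, the sketch below fills each missing value with the mean of that column over the k most similar rows. It is a minimal pure-Python version written for this listing, not the code from the tutorial; the function name `knn_impute`, the use of `None` to mark missing values, and the toy dataset are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Hypothetical helper for illustration only. Distances are computed on
    the columns that are observed in both rows being compared.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no columns in common, treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]  # copy so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows where column j is observed.
            candidates = [
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy data (made up): the second row is missing its middle value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
print(filled[1])  # middle value is filled with roughly 2.1
```

The two rows closest to the incomplete one supply the replacement value, so the distant fourth row has no influence on it. The choice of k and of the distance measure are the main things to tune in practice.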

How to Avoid Data Leakage When Performing Data Preparation

By Jason Brownlee on August 17, 2020 in Data Preparation

Data preparation is the process of transforming raw data into a form that is appropriate for modeling. A naive approach to preparing data applies the transform on the entire dataset before evaluating the performance of the model. This results in a problem referred to as data leakage, where knowledge of the hold-out test set leaks […]
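To make the leakage problem concrete, the sketch below contrasts the naive approach described in this excerpt with a leak-free one, using standardization as the transform. It is an illustrative example under stated assumptions rather than the tutorial's own code: the helper names `fit_standardizer` and `apply_standardizer` and the toy train and test values are hypothetical.

```python
import statistics


def fit_standardizer(values):
    """Estimate the mean and (population) standard deviation of a column."""
    return statistics.mean(values), statistics.pstdev(values)


def apply_standardizer(values, mean, stdev):
    """Standardize values using previously estimated statistics."""
    return [(v - mean) / stdev for v in values]


# Toy split (made up): the test values are deliberately far from the train values.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [100.0, 200.0]

# Naive (leaky): statistics are estimated from train AND test together,
# so information about the hold-out set shapes the training data.
leaky_mean, leaky_stdev = fit_standardizer(train + test)

# Leak-free: statistics come from the training set only and are then
# reused, unchanged, to transform the test set.
mean, stdev = fit_standardizer(train)
train_scaled = apply_standardizer(train, mean, stdev)
test_scaled = apply_standardizer(test, mean, stdev)

print(leaky_mean)  # 45.0, pulled upward by the test values
print(mean)        # 3.0, reflects the training data only
```

The same pattern applies to any transform that learns parameters from data: fit it on the training portion only, then apply it to both portions.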

Tour of Data Preparation Techniques for Machine Learning

By Jason Brownlee on June 30, 2020 in Data Preparation

Predictive modeling projects, such as classification and regression, always involve some form of data preparation. The specific data preparation required for a dataset depends on the specifics of the data, such as the variable types, as well as on the algorithms that will be used to model it, which may impose expectations or requirements […]

What Is Data Preparation in a Machine Learning Project

By Jason Brownlee on June 30, 2020 in Data Preparation

Data preparation may be one of the most difficult steps in any machine learning project. The reason is that each dataset is different and highly specific to the project. Nevertheless, there are enough commonalities across predictive modeling projects that we can define a loose sequence of steps and subtasks that you are likely to perform. […]

Why Data Preparation Is So Important in Machine Learning

By Jason Brownlee on June 30, 2020 in Data Preparation

On a predictive modeling project, machine learning algorithms learn a mapping from input variables to a target variable. The most common form of predictive modeling project involves so-called structured data or tabular data. This is data as it looks in a spreadsheet or a matrix, with rows of examples and columns of features for each […]

Ordinal and One-Hot Encodings for Categorical Data

By Jason Brownlee on August 17, 2020 in Data Preparation

Many machine learning models require all input and output variables to be numeric. This means that if your data contains categorical data, you must encode it to numbers before you can fit and evaluate a model. The two most popular techniques are an Ordinal Encoding and a One-Hot Encoding. In this tutorial, you will discover how […]
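The two encodings named in this excerpt can be sketched in a few lines of plain Python. This is an illustration written for this listing, not the tutorial's code; the function names `ordinal_encode` and `one_hot_encode`, the explicit `order` argument, and the sample categories are assumptions.

```python
def ordinal_encode(values, order):
    """Map each category to its integer position in a known ordering."""
    mapping = {category: index for index, category in enumerate(order)}
    return [mapping[v] for v in values]


def one_hot_encode(values):
    """Map each category to a binary vector with a single 1.

    Columns follow the sorted list of distinct categories, which is
    returned alongside the encoded rows.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Ordinal encoding suits categories with a natural order.
sizes = ["small", "large", "medium", "small"]
ordinal = ordinal_encode(sizes, order=["small", "medium", "large"])
print(ordinal)  # [0, 2, 1, 0]

# One-hot encoding suits categories with no natural order.
colors = ["red", "green", "blue", "green"]
categories, one_hot = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

An ordinal encoding implies that "large" is greater than "small", which is only appropriate when that ordering is real. A one-hot encoding avoids implying any order, at the cost of one column per category.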
