Data preparation can make or break the predictive ability of your model. In Chapter 3 of their book Applied Predictive Modeling, Kuhn and Johnson introduce the process of data preparation. They refer to it as the addition, deletion or transformation of training set data. In this post you will discover the data pre-processing steps that […]
Archive | Data Preparation
Rescaling Data for Machine Learning in Python with Scikit-Learn
Your data must be prepared before you can build models. The data preparation process can involve three steps: data selection, data preprocessing and data transformation. In this post you will discover two simple data transformation methods you can apply to your data in Python using scikit-learn. Let’s get started. Update: See this post for a […]
Data Cleaning: Turn Messy Data into Tidy Data
Data preparation is difficult because the process is not objective, or at least it does not feel that way. Questions like “what is the best form of the data to describe the problem?” are not objective. You have to think from the perspective of the problem you want to solve and try a few different […]
How to Identify Outliers in your Data
Bojan Miletic asked a question about outlier detection in datasets when working with machine learning algorithms. This post is in answer to his question. If you have a question about machine learning, sign up for the newsletter and reply to an email, or use the contact form and ask. I will answer your question and may […]
How to Prepare Data For Machine Learning
Machine learning algorithms learn from data. It is critical that you feed them the right data for the problem you want to solve. Even if you have good data, you need to make sure that it is in a useful scale and format, and even that meaningful features are included. In this post you will learn […]