Archive | Data Preparation

Data Leakage in Machine Learning

Data leakage is a major problem when developing predictive models in machine learning. It occurs when information from outside the training dataset is used to create the model. In this post you will discover the problem of data leakage in predictive modeling. After reading this post you will know: What data leakage is […]
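As a rough illustration of that definition, here is a minimal sketch of one common way leakage can creep in: computing scaling statistics on the full dataset before splitting. The toy numbers and the `standardize` helper are hypothetical and not taken from the post; only the Python standard library is used.

```python
import statistics


def standardize(values, mean, stdev):
    """Scale values using the supplied mean and standard deviation."""
    return [(v - mean) / stdev for v in values]


# Hypothetical toy data: the last two points are held out as the test set.
data = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, test = data[:4], data[4:]

# Leaky: statistics are computed on ALL rows, so the held-out rows
# influence the features the model is trained on.
leaky_mean = statistics.mean(data)
leaky_stdev = statistics.pstdev(data)
leaky_train = standardize(train, leaky_mean, leaky_stdev)

# Leak-free: statistics come from the training rows only and are then
# reused unchanged on the held-out rows.
train_mean = statistics.mean(train)
train_stdev = statistics.pstdev(train)
clean_train = standardize(train, train_mean, train_stdev)
clean_test = standardize(test, train_mean, train_stdev)
```

In the leaky version the training features already carry information about the held-out values, which is the kind of outside information the excerpt describes.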

An Introduction to Feature Selection

Which features should you use to create a predictive model? This is a difficult question that may require deep knowledge of the problem domain. It is possible to automatically select those features in your data that are most useful or most relevant for the problem you are working on. This is a process called feature […]

How to Identify Outliers in your Data

Bojan Miletic asked a question about outlier detection in datasets when working with machine learning algorithms. This post answers his question. If you have a question about machine learning, sign up for the newsletter and reply to an email, or use the contact form to ask. I will answer your question and may […]
