Preparing data is required to get the best results from machine learning algorithms. In this post you will discover how to transform your data in order to best expose its structure to machine learning algorithms in R using the caret package. You will work through 8 popular and powerful data transforms with recipes that you can […]
Search results for "Principal Component"
Better Understand Your Data in R Using Visualization (10 recipes you can use today)
You must understand your data to get the best results from machine learning algorithms. Data visualization is perhaps the fastest and most useful way to summarize and learn more about your data. In this post you will discover exactly how you can use data visualization to better understand your data for machine learning using R. […]
Use Random Forest: Testing 179 Classifiers on 121 Datasets
If you don’t know what algorithm to use on your problem, try a few. Alternatively, you could just try Random Forest and maybe a Gaussian SVM. In a recent study these two algorithms were demonstrated to be the most effective when raced against nearly 200 other algorithms averaged over more than 100 data sets. In […]
An Introduction to Feature Selection
Which features should you use to create a predictive model? This is a difficult question that may require deep knowledge of the problem domain. It is possible to automatically select those features in your data that are most useful or most relevant for the problem you are working on. This is a process called feature […]
Discover Feature Engineering, How to Engineer Features and How to Get Good at It
Feature engineering is an informal topic, but one that is widely agreed to be key to success in applied machine learning. In creating this guide I went wide and deep and synthesized all of the material I could. You will discover what feature engineering is, what problem it solves, why it matters, how […]
How To Get Started With Machine Learning Algorithms in R
R is the most popular platform for applied machine learning. When you want to get serious with applied machine learning you will find your way into R. It is very powerful because so many machine learning algorithms are provided. A problem is that the algorithms are all provided by third parties, which makes their usage […]
Improve Model Accuracy with Data Pre-Processing
Data preparation can make or break the predictive ability of your model. In Chapter 3 of their book Applied Predictive Modeling, Kuhn and Johnson introduce the process of data preparation. They refer to it as the addition, deletion or transformation of training set data. In this post you will discover the data pre-processing steps that […]
Linear Regression in R
In this post you will discover 4 recipes for linear regression for the R platform. You can copy and paste the recipes in this post to get a jump-start on your own problem or to learn and practice with linear regression in R. Let’s get started. Each example in this post uses the longley dataset […]
Data Science Screencasts: A Data Origami Review
Data Origami is a new website by Cameron Davidson-Pilon that provides data science screencasts. It is a cool idea and a cool site. Cameron was kind enough to give me access to the site so that I could review it. I watched all of the videos I could and wrote up all my notes, and […]
A Gentle Introduction to Scikit-Learn: A Python Machine Learning Library
If you are a Python programmer, or you are looking for a robust library that you can use to bring machine learning into a production system, then a library you will want to seriously consider is scikit-learn. In this post you will get an overview of the scikit-learn library and useful references of where you […]