Last Updated on December 13, 2019
How do you get started with machine learning in R?
R is a large and complex platform. It is also one of the most popular platforms among top data scientists.
In this post you will discover the step-by-step process that you can use to get started using machine learning for predictive modeling on the R platform.
The steps are practical and so simple that you could build accurate predictive models after one weekend.
The process assumes that you are a developer, know a little machine learning and will actually do the work. In return, it delivers results.
Let’s get started.
Learn R The Wrong Way
Here is how I DON’T think you should study machine learning in R.
- Step 1: Get really good at R programming and R syntax.
- Step 2: Know the deep theory of every possible algorithm you could use in R.
- Step 3: Study in great detail how to use each machine learning algorithm in R.
- Step 4: Only lightly touch on how to evaluate models.
I think this is the wrong way.
- It teaches you that you need to spend all your time learning how to use individual machine learning algorithms.
- It does not teach you the process of building predictive machine learning models in R that you can actually use in practice to make predictions.
Sadly, this is the approach to teaching machine learning in R that I see in almost all books and online courses on the topic.
You don’t want to be a badass at R or even at machine learning algorithms in R. You want to be a badass at building accurate predictive models using R. This is the context.
You can take time to learn individual machine learning algorithms in great detail, so long as it aids you in building more accurate predictive models, more reliably.
Good Background For Machine Learning in R
You can just dive into R. Go for it.
In my opinion, though, you will get a lot more out of it if you have some background.
R is an advanced platform, and you can get a lot out of it as a beginner. But if you have a little machine learning and a little programming as a foundation, R becomes a superpower for building accurate predictive models very quickly.
Here are some suggestions for getting the most out of your first steps with machine learning in R. I think these are reasonable for a modern developer interested in machine learning.
A developer who knows how to program. This helps because it won't be a big deal to pick up the syntax of R, which can at times be a little odd. It is also helpful to know how to whip up scripts or script-lets (mini scripts) to do this or that task. R is a programming language, after all.
Interested in predictive modeling machine learning. Machine learning is a big field that covers a variety of interesting algorithms. Predictive modeling is the subset concerned only with building models that make predictions on new data, not with explaining the relationships in the data or with learning from data in general. I think predictive modeling is where R really shines as a platform for machine learning.
Familiar with machine learning basics. You understand machine learning as an induction problem, where every algorithm is really just trying to estimate an underlying mapping function from an input space to an output space. All predictive machine learning makes sense through this lens, as do strategies for searching for the best machine learning algorithms, algorithm parameters and data transforms.
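Written as a formula, using common textbook notation rather than notation defined in this post, the induction view says that an output $Y$ is produced from inputs $X$ by an unknown function $f$ plus irreducible noise $\epsilon$, and that an algorithm fitted to training data gives an estimate $\hat{f}$ used to make predictions $\hat{Y}$:

$$
Y = f(X) + \epsilon, \qquad \hat{Y} = \hat{f}(X)
$$

Searching over algorithms, algorithm parameters and data transforms is then a search for the $\hat{f}$ whose predictions $\hat{Y}$ are most accurate on data that was not used to fit it.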
The approach I lay out in the next section also makes some assumptions about your background.
You are not an absolute beginner in machine learning. You could be, and the approach may still work for you, but you will get a lot more out of it if you have some of the suggested background.
You want to use a top-down approach to studying machine learning. This is the approach I teach: rather than starting with theory and principles and eventually touching on practical machine learning if there is time, you start with the goal of working through a project end-to-end and research details as you need them in order to deliver better results.
You are familiar with the steps in a predictive modeling machine learning project. Specifically:
- Define Problem
- Prepare Data
- Evaluate Algorithms
- Improve Results
- Present Results
You can learn more about this process and these steps here:
- How to Use a Machine Learning Checklist to Get Accurate Predictions, Reliably (even if you are a beginner)
- Process for working through Machine Learning Problems
You are at least familiar with some machine learning algorithms, or you know how to pick them up quickly, for example using the algorithm description template method. I think learning the details of how and why machine learning algorithms work is a separate task from learning how to use those algorithms on a machine learning platform like R. The two are often conflated in books and courses, to the detriment of learning.
You can learn more about how to learn any machine learning algorithm using the template method here:
- How to Learn a Machine Learning Algorithm
- 5 Techniques To Understand Machine Learning Algorithms Without the Background in Mathematics
How To Learn Machine Learning in R
This section lays out a process that you can use to get started with building machine learning predictive models on the R platform.
It is divided into two parts:
- Map the tasks of a machine learning project onto the R platform.
- Work through predictive modeling projects using standard datasets.
1. Map Machine Learning Tasks Onto R
You need to know how to do the specific tasks of a machine learning project on the R platform. Once you know how to complete a discrete task using the platform and get a result reliably, you can do it again and again, project after project.
This process is straightforward:
- List out all of the discrete tasks of a predictive modeling machine learning project.
- Create recipes to complete the task reliably that you can copy-paste as a starting point on future projects.
- Add to and maintain the recipes as your understanding of the platform and of machine learning improves.
Predictive Modeling Tasks
Below is a minimal list of predictive modeling tasks that you may want to map onto the R platform and create recipes for. It is not complete, but it does cover the broad strokes of the platform:
- Overview of R syntax
- Prepare Data
  - Loading Data
  - Working With Data
  - Data Summarization
  - Data Visualization
  - Data Cleaning
  - Feature Selection
  - Data Transforms
- Evaluate Algorithms
  - Resampling Methods
  - Evaluation Metrics
  - Spot-Check Algorithms
  - Model Selection
- Improve Results
  - Algorithm Tuning
  - Ensemble Methods
- Present Results
  - Finalize Model
  - Make New Predictions
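As an illustration of what a recipe for the first Prepare Data tasks (Loading Data and Data Summarization) might look like, here is a minimal sketch in base R. To stay self-contained it writes the built-in iris data frame to a temporary CSV file first; that step is an assumption for the example, and in a real project you would point `read.csv()` at your own file.

```r
# Recipe: load a CSV file and summarize it (base R only)

# Write a small CSV so the recipe runs anywhere; replace with your own path
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Loading Data: read the CSV, turning text columns into factors
dataset <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

# Data Summarization
print(dim(dataset))            # number of rows and columns
print(sapply(dataset, class))  # type of each attribute
print(head(dataset, 5))        # first few rows
print(summary(dataset))        # per-attribute distribution summary

# Class distribution as counts and percentages
counts <- table(dataset$Species)
print(cbind(count = counts, percent = round(prop.table(counts) * 100, 1)))
```

Once a recipe like this works, it can be copied into the next project with only the file path and column names changed.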
You will notice the first task is an overview of R syntax. As a developer, you need to know the basics of the language before you can do anything else, such as assignment, data structures, flow control, and creating and calling functions.
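A minimal sketch of those basics in base R follows. The variable and function names (such as `mean_squared`) are illustrative only.

```r
# Assignment
x <- 10
name <- "iris"

# Data structures
v <- c(1.5, 2.5, 3.5)                            # numeric vector
l <- list(id = 1, tags = c("a", "b"))            # list holding mixed types
m <- matrix(1:6, nrow = 2)                       # 2 x 3 matrix
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # data frame

# Flow control: add values above 2, subtract the rest
total <- 0
for (value in v) {
  if (value > 2) {
    total <- total + value
  } else {
    total <- total - value
  }
}

# Creating and calling a function
mean_squared <- function(values) {
  mean(values^2)
}
result <- mean_squared(v)

print(total)
print(result)
```

Running it prints 4.5 for `total` and about 6.917 for `result`.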
Library of Standalone Recipes
I recommend creating recipes that are standalone. That means that each recipe is a complete program that has everything it needs to achieve the task and produce an output. This means that you can copy it directly into a future predictive modeling project.
You can store the recipes in a directory or on GitHub.
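A standalone recipe for the Resampling Methods task might look like the sketch below. It uses only base R to estimate the RMSE of a linear regression on the built-in mtcars data with 10-fold cross-validation. The dataset, the model formula and k = 10 are illustrative assumptions; packages such as caret offer richer versions of the same idea.

```r
# Standalone recipe: estimate model RMSE with k-fold cross-validation (base R)
set.seed(7)                      # reproducible fold assignment
data(mtcars)
k <- 10

# Assign each row to one of k folds at random
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

fold_rmse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit  <- lm(mpg ~ wt + hp, data = train)          # fit on the other k - 1 folds
  pred <- predict(fit, newdata = test)             # predict the held-out fold
  fold_rmse[i] <- sqrt(mean((test$mpg - pred)^2))  # error on the held-out fold
}

cat(sprintf("RMSE: %.3f (sd %.3f) over %d folds\n",
            mean(fold_rmse), sd(fold_rmse), k))
```

Because everything it needs is inside the script, it can be dropped into a new project and pointed at a different data frame and formula.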
2. Small Predictive Modeling Projects
Recipes for common predictive modeling tasks with machine learning are not enough.
Again, this is where most books and courses stop. They leave it to you to piece together the recipes into end-to-end projects.
You need to piece the recipes together into end-to-end projects. This will show you how to actually deliver a result using the platform. I recommend using only small, well-understood machine learning datasets from the UCI Machine Learning Repository.
These datasets are available for free as CSV downloads, and most are available directly in R by loading third-party packages. These datasets are excellent for practicing because:
- They are small, meaning they fit into memory and algorithms can model them in reasonable time.
- They are well behaved, meaning you often don’t need to do a lot of feature engineering to get a good result.
- They are standard, meaning that many people have used them before and you can get ideas for good algorithms to try and good results you should expect.
I recommend at least three projects:
- Hello World Project (iris flowers). This is a quick pass through the project steps without much tuning or optimizing on a dataset that is widely used as the hello world of machine learning (more on the iris flowers dataset).
- Binary Classification end-to-end. Work through each step on a binary classification problem (e.g. the Pima Indians diabetes dataset (csv file)).
- Regression end-to-end. Work through each step of the process with a regression problem (e.g. the Boston housing dataset).
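For the Hello World project, a first quick pass might look like the sketch below. It assumes the MASS package (a recommended package distributed with standard R installations) for linear discriminant analysis, and uses a simple random 80/20 train/validation split. Both are illustrative choices rather than the only way to do it.

```r
# Hello World: iris flowers, a quick pass with no tuning
library(MASS)                    # provides lda()

data(iris)
set.seed(7)

# Prepare Data: hold back 20% of the rows as a validation set
train_idx  <- sample(nrow(iris), size = round(0.8 * nrow(iris)))
train      <- iris[train_idx, ]
validation <- iris[-train_idx, ]

# Evaluate Algorithms: fit linear discriminant analysis on the training rows
fit <- lda(Species ~ ., data = train)

# Present Results: confusion matrix and accuracy on the held-out rows
predicted <- predict(fit, newdata = validation)$class
accuracy  <- mean(predicted == validation$Species)
print(table(predicted = predicted, actual = validation$Species))
cat(sprintf("Validation accuracy: %.3f\n", accuracy))
```

The binary classification and regression projects follow the same shape, swapping in a different dataset, model (for example `glm()` or `lm()`) and evaluation metric.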
Add and Maintain Recipes
Machine learning with R does not stop at working through a few small standard datasets. You need to take on more and different challenges.
- Standard Datasets: You could practice on additional standard datasets from the UCI Machine Learning repository, overcoming the challenges of different problem types.
- Competition Datasets: You could try working through some more challenging datasets, such as those from past Kaggle competitions or those from past KDDCup challenges.
- Your Own Projects: Ideally, you need to start working through your own projects.
All the while you will be dipping into help, adapting your scripts and learning how to get more out of machine learning on R.
It is important that you fold this knowledge back into your catalog of machine learning recipes. This will let you leverage this knowledge quickly on new projects and contribute greatly to your skill and speed at developing predictive models.
Your Outcomes From This Process
You could work through this process in one weekend. By the end of that weekend, you will have the recipes and project templates that you can use to start modeling your own problems using machine learning in R.
You will go from a developer that is interested in machine learning on R to a developer who has the resources and capability to work through a new dataset end-to-end using R and develop a predictive model to be presented and deployed.
Specifically, you will know:
- How to achieve the subtasks of a predictive modeling problem in R.
- How to learn new and different subtasks in R.
- How to get help with R.
- How to work through a small to medium sized dataset end-to-end.
- How to deliver a model that can make predictions on new unseen data.
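Delivering a model that predicts on unseen data usually comes down to the Finalize Model and Make New Predictions tasks: fit a final model on all of the training data, save it, and load it again when new data arrives. Below is a base R sketch. It assumes a linear regression on the built-in mtcars data, and the two "new" cars are made-up inputs for illustration.

```r
# Finalize Model: fit on all available data, save it, reload it, predict
data(mtcars)
final_model <- lm(mpg ~ wt + hp, data = mtcars)

# Save the fitted model (a temporary file here; use a real path in practice)
model_path <- tempfile(fileext = ".rds")
saveRDS(final_model, model_path)

# Later, perhaps in another R session: reload the model
loaded_model <- readRDS(model_path)

# Make New Predictions on hypothetical unseen rows
new_cars <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
predictions <- predict(loaded_model, newdata = new_cars)
print(predictions)
```

`saveRDS()` and `readRDS()` store a single R object, which keeps the saved model separate from whatever else is in the workspace.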
From here you can start to dive into the specifics of the functions, techniques and algorithms used with the goal of learning how to use them better in order to deliver more accurate predictive models, more reliably in less time.
In this post you discovered a step-by-step process that you can use to study and get started with machine learning in R.
The three high-level steps of the process are:
- Map the steps of a predictive modeling process onto the R platform with recipes that you can reuse.
- Work through small standard machine learning datasets to piece the recipes together into projects.
- Work through more and different datasets, ideally your own, and add to your library of recipes.
You also discovered the philosophy behind the process and the reasons why I think it is the best process for you.
Do you want to get started in machine learning with R?
- Download and install R right now.
- Use the process outlined above, limit yourself to one weekend and go as far as you can.
- Report back. Leave a comment. I would love to hear how you went.
Do you have a question about this process? Leave a comment, I’ll do my best to answer it.
What book do you recommend for learning the maths behind machine learning? What book did you read?
Is it possible to survive in ML without mathematical knowledge?
Yes, just focus on delivering results.
The math makes sense after you have context, e.g. you know how to work through problems end to end.
A good book is The Elements of Statistical Learning.