Last Updated on August 22, 2019
R is one of the most popular platforms for applied machine learning. When you want to get serious with applied machine learning, you will likely find your way to R.
It is very powerful because so many machine learning algorithms are provided. A problem is that the algorithms are all provided by third parties, which makes their usage very inconsistent. This slows you down a lot, because you have to learn how to model data and how to make predictions with each algorithm in each package, again and again.
In this post, you will discover how you can overcome this difficulty with machine learning algorithms in R, with pre-prepared recipes that follow a consistent structure.
Let’s get started.
Lots of Algorithms, Little Consistency
The R ecosystem is enormous. Open source third party packages provide this power, allowing academics and professionals to get the most powerful algorithms available into the hands of us practitioners.
A problem that I experienced when starting out with R was that the usage of each algorithm differs from package to package. This inconsistency also extends to the documentation, with some packages providing worked examples for classification but ignoring regression, and others not providing examples at all.
All this means that if you want to try a few different algorithms from different packages, you must spend time figuring out how to fit and make predictions with each method in turn. This takes a lot of time, especially with the spotty examples and vignettes.
I summarize these difficulties as follows:
- Inconsistent: Algorithm implementations vary in the way a model is fit to data and the way a model is used to generate predictions. This means that you have to study each package and each algorithm implementation just to put together a working example, let alone adapt it to your problem.
- Decentralized: Algorithms are implemented across different packages and it can be hard to locate which packages provide an implementation of the algorithm you need, let alone which package provides the most popular implementation. Additionally, the documentation for one package may be spread across multiple help files, websites and vignettes. This means you have to do a lot of searching just to locate an algorithm, let alone compile a list of algorithms from which you can choose.
- Incomplete: Algorithm documentation is almost always incomplete. An example usage may or may not be provided, and if it is, it may or may not be demonstrated on a canonical problem. This means you have no obvious way to quickly understand how to use an implementation.
- Complexity: Algorithms vary in the complexity of their implementation and description. This can take its toll on you as you jump from package to package. You want to focus on how to get the most from the algorithm and its parameters, and not burn energy on parsing reams of PDFs just to get a hello world.
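To make the inconsistency point concrete, here is a minimal sketch that fits two classifiers to the built-in iris dataset. The choice of the MASS and rpart packages and of iris is my assumption for illustration; both packages normally ship with R as recommended packages. Notice that getting class labels requires a different call for each model.

```r
# Illustrative sketch: two classifiers, two different prediction interfaces.
library(MASS)   # provides lda()
library(rpart)  # provides rpart()

# load data
data(iris)

# Linear discriminant analysis: predict() returns a list,
# and the predicted labels are in the $class element
lda_fit <- lda(Species ~ ., data = iris)
lda_pred <- predict(lda_fit, newdata = iris)$class

# CART: predict() returns a matrix of class probabilities
# unless you explicitly ask for type = "class"
cart_fit <- rpart(Species ~ ., data = iris)
cart_pred <- predict(cart_fit, newdata = iris, type = "class")

# Both results are factors of predicted species, reached via different calls
print(table(lda_pred, iris$Species))
print(table(cart_pred, iris$Species))
```

Neither interface is wrong, but you have to read the help page for each one to find out how to get the same kind of output.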
Build an Algorithm Recipe Book
You could get a lot more done if you had an algorithm recipe book: a place to look up examples of machine learning algorithms in R that you could copy-and-paste and adapt for your specific problem.
For the recipe book approach to work, it would have to conform to some key principles:
- Standalone: Each code example must be standalone, complete and ready to execute.
- Just Code: Each recipe must focus on the code with minimal exposition on machine learning theory (there are amazing books for that, don’t mix these concerns).
- Simplicity: Each recipe must be presented in the most common use case, which is probably what you are looking to do when you look it up. You want to consult the official documentation only to look up the parameters so that you can get the most from the algorithm.
- Portable: All recipes must be provided in a single reference that can be searched and printed, browsed and looked up (a recipe book).
- Consistent: All code examples are presented consistently and follow the same code structure and style conventions (load data, fit model, make prediction).
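As a rough sketch of what a recipe following these principles might look like, here is ordinary least squares regression laid out in the load data, fit model, make prediction structure. The longley dataset and the mean squared error summary are my choices for illustration, and the example uses only functions from base R.

```r
# Recipe sketch: ordinary least squares regression

# load data
data(longley)

# fit model
fit <- lm(Employed ~ ., data = longley)

# summarize the fit
print(fit)

# make predictions
predictions <- predict(fit, newdata = longley)

# summarize accuracy (mean squared error on the training data)
mse <- mean((longley$Employed - predictions)^2)
print(mse)
```

The recipe predicts on the same data it was fit on to keep the example short and standalone. On a real problem you would evaluate on data held out from training.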
An algorithm recipe book would give you the ability to wield the R platform for machine learning and solve complex problems.
- You could apply algorithms and features directly.
- You could discover the code you need.
- You could understand what is going on with a glance.
- You could own the recipes and use and organize them the way you want.
- You could get the most out of the algorithms and features.
Algorithm Recipes in R
I have already blocked out examples of what these recipes could look like.
I have provided example machine learning recipes in R, grouped by algorithm type or similarity, as follows:
- Linear Regression: Ordinary Least Squares Regression, Stepwise Regression, Principal Component Regression and Partial Least Squares Regression.
- Penalized-Linear Regression: Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO) and ElasticNet.
- Non-Linear Regression: Multivariate Adaptive Regression Splines (MARS), Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Neural Network.
- Non-Linear Decision Tree Regression: Classification and Regression Trees (CART), Conditional Decision Trees, Model Trees, Rule Systems, Bagging CART, Random Forest, Gradient Boosted Machines (GBM) and Cubist.
- Linear Classification: Logistic Regression, Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis.
- Non-Linear Classification: Mixture Discriminant Analysis (MDA), Quadratic Discriminant Analysis (QDA), Regularized Discriminant Analysis (RDA), Neural Network, Flexible Discriminant Analysis (FDA), Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Naive Bayes.
- Non-Linear Decision Tree Classification: Classification and Regression Trees (CART), C4.5, PART, Bagging CART, Random Forest, Gradient Boosted Machines (GBM) and Boosted C5.0.
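For a flavor of how a recipe from one of these groups could read, here is a hedged sketch of logistic regression from the linear classification group, using the same load, fit, predict structure. The mtcars dataset, the two predictors and the 0.5 probability cutoff are assumptions I picked for illustration, not a recommendation for your problem.

```r
# Recipe sketch: logistic regression (linear classification)

# load data
data(mtcars)

# fit model: am is 0 (automatic) or 1 (manual)
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial(link = "logit"))

# summarize the fit
print(fit)

# make predictions: type = "response" returns probabilities
probabilities <- predict(fit, newdata = mtcars, type = "response")
predictions <- ifelse(probabilities > 0.5, 1, 0)

# summarize accuracy as a confusion matrix
print(table(predictions, mtcars$am))
```

Because the structure matches the regression case, moving to another algorithm in the list mostly means changing the fit and predict lines.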
I think these recipes really fit the bill of this mission.
In this post, you discovered the popularity and power of machine learning in R, and that the cost of that power is the time required to harness it.
You discovered that one approach to addressing this limitation in R is to devise a recipe book of complete and standalone machine learning algorithms that you can look up and apply to your specific problems, as needed.
Finally, you saw examples of machine learning algorithm recipes in R for a wide range of algorithm types.
If you found this approach useful, I’d love to hear about it.
Thanks Jason, this article is really useful.
Very helpful for understanding algorithms in R. Thanks.
Dear Jason, As a fellow practitioner, let me say thank you very much! As you well know, it is impossible for people to know everything about everything – so your examples are great for people who understand statistics and just need a brush-up on the syntax of R.
It is one thing to know how an engine theoretically works. It is another to know how to fix the engine and use specific tools with their specific syntax.
So your recommendations are spot on and for a quick-dirty-dive into building models like an onion – you rock! Start simple and add complexity after the basics are implemented.
Thank you for your time and effort. Truly appreciate your hard work.
Very kind of you to say Rob, I’m happy you’re able to put the examples to good use.
Which machine learning algorithms are best suited to heart disease prediction?
Test a large number of algorithms and see what works best on your specific data.