R is a powerful platform for data analysis and machine learning. It is my main workhorse for things like competitions and consulting work. The reason is the large number of powerful algorithms available, all on the one platform.
In this post I want to point out some resources you can use to get started in R for machine learning.
Introduction to R
You might want to familiarize yourself with the platform and language before you start addressing your problems with machine learning.
I think the best way to familiarize yourself is to start addressing problems. The trial of real work will force you to learn what you must learn to solve your problem. A good reference can help you answer your “how do I…” questions.
R in a Nutshell
The book I read cover-to-cover when first starting out with R was R in a Nutshell. It walks you through the platform, from installation and basic operations to data analysis and even some machine learning algorithms. I highly recommend it.
I chose it because it was a broad reference. I wanted to know a little bit of everything in the platform, so I would know where to look when I had a specific question.
There is a wealth of machine learning algorithms implemented in R, many by the academics and their teams that actually developed them in the first place. This alone is a compelling reason to get started in R. Additionally, the data handling/manipulation and graphing tools are very powerful (although Python’s SciPy stack is catching up).
CRAN: Machine Learning and Statistical Learning
Not a book, but a great place you can start out is the Machine Learning and Statistical Learning view on CRAN maintained by Torsten Hothorn. It lists most of the R packages you can use for machine learning, grouped by algorithm and algorithm types.
It is a great place to start, but one thing that I think it could do better is point out canonical packages and elaborate more on some of the wrapper packages available, like caret.
Applied Predictive Modeling
Max Kuhn, one of the authors of this book, is the creator of the famous caret package. Applied Predictive Modeling is very practical and opens in the first part with a description of the predictive analytics process and case studies. Parts 2 and 3 look at regression and classification algorithms, and the final part covers more advanced topics like feature selection.
An Introduction to Statistical Learning: with Applications in R
This is the more accessible version of the classic “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” and includes two of the same authors.
An Introduction to Statistical Learning opens with an introduction to statistical learning and concerns such as model accuracy and the bias-variance tradeoff. Chapters 3 and 4 look at linear regression and some simpler classification algorithms. The following chapters look at cross-validation and model selection before moving into non-linear regression, decision trees and SVM, and finish up with unsupervised methods.
The book is also available online for free from the authors’ webpage.
Practical Data Science with R
Practical Data Science with R has more of a data science spin than machine learning. Part 1 is introductory, looking at loading data into R. Part 2 starts off with model evaluation and works through models of increasing complexity: k-NN, Naive Bayes, Linear Regression, clustering, association rules and SVM. Part 3 works through advanced issues like self-documenting scripts and presenting results.
It provides a good introduction with solid practical advice.
Machine Learning with R
Machine Learning with R provides an overview of machine learning in R without going into detail or theory. It also heavily uses case studies to demonstrate each algorithm. It opens with a brief introduction to machine learning and R, and to data management in R. It goes on in subsequent chapters to cover k-NN, Naive Bayes, Decision Trees, Regression, Neural Networks, Apriori, and Clustering.
It finishes up with chapters on model evaluation, algorithm tuning and other advanced topics. A good feature of this text is the step-by-step sequence provided in each chapter, which gives an actionable framework around the case studies.
Data Mining with R: Learning with Case Studies
After a quick introduction to R in the first chapter, Data Mining with R presents case study after case study. These include predicting algae blooms, stock market returns and fraudulent transactions, and classifying microarray samples. Each study explores various data preparation, model building and model evaluation methods.
It’s a dense but valuable book if you’re looking to get a feel for working through real problems.
Data Mining and Business Analytics with R
Data Mining and Business Analytics with R provides worked examples using R, but the examples are more business focused than the scientifically focused examples in some other books. The chapters work through the key machine learning methods using R with smaller case studies throughout. The book finishes with some larger case studies on sentiment analysis in text and modeling network data.
Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!)
Data Mining with Rattle and R provides an introduction to machine learning algorithms, although the twist is that it uses the Rattle graphical environment. After the introductory material on loading and handling data in part 1, the standard machine learning algorithms are covered in part 2.
What I do like about the presentation of the algorithms is the standardized description that includes a tutorial, parameter tuning and command summary. I’m a big fan of the consistent, structured presentation of algorithms.
We have covered 7 popular machine learning books that focus on using the R platform.
The best advice I can give is to pick one and read it. Read it cover to cover, take notes and do the exercises. Like programming, using R is a practical skill that you can only build by practicing. Practice machine learning in R.
Have I missed a Machine Learning book on R? Leave a comment and let me know.