In this post, you will discover 8 recipes for non-linear regression with decision trees in R.
Each example in this post uses the longley dataset provided in the datasets package that comes with R.
The longley dataset describes 7 economic variables observed from 1947 to 1962 used to predict the number of people employed yearly.
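If you have not worked with the longley dataset before, you can take a quick look at it in R before running the recipes; this uses only base R functions:

# load and inspect the longley dataset
data(longley)
str(longley)
summary(longley$Employed)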
Kick-start your project with my new book Machine Learning Mastery With R, including step-by-step tutorials and the R source code files for all examples.
Let’s get started.

Decision Tree. Photo by Katie Walker, some rights reserved.
Classification and Regression Trees
Classification and Regression Trees (CART) split attributes based on values that minimize a loss function, such as sum of squared errors.
The following recipe demonstrates the recursive partitioning decision tree method on the longley dataset.
# load the package
library(rpart)
# load data
data(longley)
# fit model
fit <- rpart(Employed~., data=longley, control=rpart.control(minsplit=5))
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley[,1:6])
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
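Beyond minsplit, rpart also reports a cross-validated complexity parameter table that you can use to prune the fitted tree back. A minimal sketch, where the cp value is illustrative rather than a recommendation:

# inspect the complexity parameter table of the fitted tree
printcp(fit)
# prune back to a chosen complexity value (illustrative cp)
pruned <- prune(fit, cp=0.05)
print(pruned)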
Learn more about the rpart function and the rpart package.
Conditional Decision Trees
Conditional Decision Trees are created using statistical tests to select split points on attributes rather than a loss function.
The following recipe demonstrates the conditional inference trees method on the longley dataset.
# load the package
library(party)
# load data
data(longley)
# fit model
fit <- ctree(Employed~., data=longley, controls=ctree_control(minsplit=2, minbucket=2, testtype="Univariate"))
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley[,1:6])
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
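One convenience of the party package is that the fitted tree can be plotted directly, which helps when interpreting the chosen split points:

# visualize the conditional inference tree fit above
plot(fit)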
Learn more about the ctree function and the party package.
Model Trees
Model Trees create a decision tree and use a linear model at each leaf to make a prediction rather than using an average value.
The following recipe demonstrates the M5P Model Tree method on the longley dataset.
# load the package
library(RWeka)
# load data
data(longley)
# fit model
fit <- M5P(Employed~., data=longley)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley[,1:6])
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
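The Weka learners wrapped by RWeka can be tuned through Weka_control(). For example, M5P accepts a minimum number of instances per leaf; the value below is illustrative only:

# M = minimum number of instances allowed at a leaf (illustrative value)
fit <- M5P(Employed~., data=longley, control=Weka_control(M=4))
print(fit)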
Learn more about the M5P function and the RWeka package.
Rule System
Rule Systems can be created by extracting and simplifying the rules from a decision tree.
The following recipe demonstrates the M5Rules Rule System on the longley dataset.
# load the package
library(RWeka)
# load data
data(longley)
# fit model
fit <- M5Rules(Employed~., data=longley)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley[,1:6])
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
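If you want to see the extracted rules themselves rather than the fit statistics, printing the fitted model displays the rule list:

# print the extracted rules of the fitted M5Rules model
print(fit)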
Learn more about the M5Rules function and the RWeka package.
Bagging CART
Bootstrapped Aggregation (Bagging) is an ensemble method that creates multiple models of the same type from different sub-samples of the same dataset. The predictions from the separate models are combined to provide a superior result. This approach has proven particularly effective for high-variance methods such as decision trees.
The following recipe demonstrates bagging applied to the recursive partitioning decision tree.
# load the packages
library(ipred)
library(rpart) # for rpart.control
# load data
data(longley)
# fit model
fit <- bagging(Employed~., data=longley, control=rpart.control(minsplit=5))
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley[,1:6])
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
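The bagging function in ipred also lets you set the number of bootstrap replicates and request an out-of-bag error estimate. A minimal sketch; nbagg=25 is simply shown explicitly here and the settings are illustrative:

# 25 bootstrap replicates with an out-of-bag error estimate
fit <- bagging(Employed~., data=longley, nbagg=25, coob=TRUE, control=rpart.control(minsplit=5))
print(fit)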
Learn more about the bagging function and the ipred package.
Random Forest
Random Forest is a variation on bagging of decision trees in which the attributes available at each split point are reduced to a random sub-sample. This further increases the variance of the individual trees, so more trees are required.
# load the package
library(randomForest)
# load data
data(longley)
# fit model
fit <- randomForest(Employed~., data=longley)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley[,1:6])
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
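A useful by-product of random forest is an estimate of variable importance, which the randomForest package exposes directly:

# request permutation-based importance in addition to node impurity
fit <- randomForest(Employed~., data=longley, importance=TRUE)
# tabulate and plot variable importance
importance(fit)
varImpPlot(fit)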
Learn more about the randomForest function and the randomForest package.
Gradient Boosted Machine
Boosting is an ensemble method developed for classification to reduce bias, in which models are added sequentially to correct the errors of the existing models. It has been generalized and adapted as Gradient Boosted Machines (GBM) for use with CART decision trees for classification and regression.
# load the package
library(gbm)
# load data
data(longley)
# fit model
fit <- gbm(Employed~., data=longley, distribution="gaussian")
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley)
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
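In practice gbm is usually tuned by setting the number of trees, tree depth, and learning rate explicitly rather than relying on defaults, and predict() accepts an n.trees argument to choose how many trees are used. A sketch with illustrative parameter values, not tuned for the longley data:

# illustrative settings; not tuned for this dataset
fit <- gbm(Employed~., data=longley, distribution="gaussian",
           n.trees=500, interaction.depth=2, shrinkage=0.01, n.minobsinnode=3)
# predict using all 500 trees
predictions <- predict(fit, longley, n.trees=500)
mse <- mean((longley$Employed - predictions)^2)
print(mse)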
Learn more about the gbm function and the gbm package.
Cubist
Cubist decision trees are another ensemble method. They are constructed like model trees but involve a boosting-like procedure called committees, in which rule-based models are built iteratively.
# load the package
library(Cubist)
# load data
data(longley)
# fit model
fit <- cubist(longley[,1:6], longley[,7])
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley[,1:6])
# summarize accuracy
mse <- mean((longley$Employed - predictions)^2)
print(mse)
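Cubist exposes the number of committees as an argument when fitting, and its predict method has a neighbors argument for instance-based correction. A minimal sketch with illustrative values:

# 10 committees (boosting-like iterations); value is illustrative
fit <- cubist(x=longley[,1:6], y=longley[,7], committees=10)
# adjust predictions using the 5 nearest training cases (illustrative)
predictions <- predict(fit, longley[,1:6], neighbors=5)
mse <- mean((longley$Employed - predictions)^2)
print(mse)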
Learn more about the cubist function and the Cubist package.
Summary
In this post you discovered 8 recipes for decision trees for non-linear regression in R. Each recipe is ready for you to copy-and-paste into your own workspace and modify for your needs.
For more information, see Chapter 8 of Applied Predictive Modeling by Kuhn and Johnson, which provides an excellent introduction to non-linear regression with decision trees in R for beginners.
Please tell me about genetic algorithm code in R, like the examples you mention above.
Thanks a lot for this guide.
You’re welcome!
Should RMSE here be sqrt(mean((actual-predicted)^2))?
Hi! I would just like to ask if what decision tree is best for use when the data is highly quantitative? For example, a weather data set. Thanks!
Great guide & website!
There’s a tiny typo in this sentence (crated -> created):
“Rule Systems can be crated by extracting and simplifying the rules from a decision tree.”
I need some clarification on ranking decision trees. I have features X1, X2, X3 that can be labelled with Y. After building the model, I have to predict the ranking of trees based on feature X1. Can you please suggest some good methods for this?
Thank you so much 🙂
Short, but very useful and comprehensive
But this is not nonlinear regression – if I’m not mistaken, isn’t this just multiple linear modelling?
What is the connection between tree-methods (including RF, boosting, etc) and actual nonlinear regression such as Michaelis-Menton model used in enzyme kinetics, models used in PK/PD modelling, nonlinear synergy models?
Look forward to hearing your thoughts.
“Will May 15, 2015 at 11:33 am #
Shouldn’t RMSE here be: sqrt(mean((actual-predicted)^2))?”
Will’s comment (above this one) is absolutely correct! Too bad the author of this otherwise great article has not answered and corrected this mistake in the RMSE calculation. Using the correct RMSE formula (above) returns a completely different RMSE value. Be aware.
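For reference, the recipes above report the mean squared error (MSE); if you want the root mean squared error (RMSE), take the square root, as the commenters suggest:

# predictions and longley come from any of the recipes above
mse <- mean((longley$Employed - predictions)^2)
rmse <- sqrt(mse)
print(mse)
print(rmse)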
Hey Jason, many thanks for your examples. How would you deal with the fact that most of these models do not generalize well to new validation data ? – using the option subset = in your examples. Only rule systems and model trees seem to generalize correctly to my data:
data <- cbind(1:10000, -1:-10000,c(2:10001)+runif(10000,min=0,max=0.1))
data <- cbind(1:10000, c(-1:-10000)+runif(10000,min=0,max=0.1))
data <- cbind(data, 1000*c(1:10000)/10*sin(data[,1])+(data[,2]^2)/10+runif(10000,min=0,max=0.1)/100 ) #
colnames(data) <- c("x1","x2","y")
It is really data dependent.
If you find a subset of methods that work well on your data, then double down on them.
Hey Jason,
Great Guide and thanks for making us more awesome.
Quick question: How do I extract rules/path from random forest tree in R for predicted rows?
Thanks
Thanks.
I’m not sure that this would be tractable given the vast number of individual decisions.
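If you do want to look inside a fitted forest, the randomForest package can at least dump the structure of an individual tree, though turning hundreds of trees into readable rules is another matter. A minimal sketch:

library(randomForest)
data(longley)
fit <- randomForest(Employed~., data=longley)
# structure of the first tree: split variables, split points, and leaf predictions
head(getTree(fit, k=1, labelVar=TRUE))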
thanks a lot
You’re welcome.