# Linear Regression in R


In this post you will discover 4 recipes for linear regression for the R platform.

You can copy and paste the recipes in this post to make a jump-start on your own problem or to learn and practice with linear regression in R.


Let’s get started.

Each example in this post uses the longley dataset provided in the datasets package that comes with R. The longley dataset contains 7 economic variables observed yearly from 1947 to 1962 (16 observations). The examples use the other six variables to predict Employed, the number of people employed each year.
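As a quick orientation, the dataset can be loaded and inspected with base R alone. This is a minimal sketch that assumes only the built-in datasets package.

```r
# load the longley dataset that ships with R
data(longley)

# 16 yearly observations of 7 variables
print(dim(longley))

# column names; Employed is the variable predicted in this post
print(names(longley))

# first few rows
head(longley)
```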

## Ordinary Least Squares Regression

Ordinary Least Squares (OLS) regression is a linear model that finds the coefficients of a line or hyperplane that minimise the sum of the squared errors between the predictions and the observed values.
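A minimal sketch of an OLS recipe on the longley data, using `lm()` from base R. It fits on all rows and scores on the same rows, so the error is an in-sample figure. The variable names `fit`, `predictions` and `mse` are illustrative choices.

```r
# load data
data(longley)

# fit Employed against all other variables
fit <- lm(Employed ~ ., data = longley)

# summarize the fit (coefficients, residual error, R-squared)
print(summary(fit))

# make predictions on the training data
predictions <- predict(fit, longley)

# mean squared error; wrap in sqrt() to get RMSE instead
mse <- mean((longley$Employed - predictions)^2)
print(mse)
```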


## Stepwise Linear Regression

Stepwise Linear Regression is a method that uses linear regression to discover which subset of attributes in the dataset results in the best-performing model. It is stepwise because each iteration of the method makes a change to the set of attributes and creates a model to evaluate the performance of that set.
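One way to sketch this in base R is with `step()`, which starts from a full `lm()` model and, by default, drops one attribute at a time while the AIC keeps improving. Treat this as an illustration under that assumption (AIC-based backward elimination) rather than the only form of stepwise selection.

```r
# load data
data(longley)

# start from the full model with every attribute
base <- lm(Employed ~ ., data = longley)

# backward stepwise selection by AIC; prints each step as it goes
fit <- step(base)

# summarize the selected model
print(summary(fit))

# make predictions on the training data
predictions <- predict(fit, longley)

# mean squared error of the reduced model
mse <- mean((longley$Employed - predictions)^2)
print(mse)
```

Because the selected model uses a subset of the attributes in the full model, its training error cannot be lower than that of the full OLS fit. The selection trades a little fit for fewer attributes, as measured by AIC.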

## Principal Component Regression

Principal Component Regression (PCR) fits a linear regression model on the components produced by a Principal Component Analysis (PCA) of the predictors, rather than on the original predictors. PCR is useful when the data has highly correlated predictors.
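The idea can be sketched with base R alone by running `prcomp()` on the predictors and then regressing the response on the leading component scores. This is a hand-rolled illustration, not a dedicated PCR implementation, and the number of components `k` is an arbitrary choice for the example.

```r
# load data
data(longley)

# separate the predictors from the response
X <- longley[, names(longley) != "Employed"]

# PCA on centered and scaled predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# keep the first k component scores (k = 3 is an illustrative assumption)
k <- 3
scores <- as.data.frame(pca$x[, 1:k])
scores$Employed <- longley$Employed

# regress the response on the retained components
fit <- lm(Employed ~ ., data = scores)
print(summary(fit))

# make predictions on the training data
predictions <- predict(fit, scores)

# mean squared error
mse <- mean((longley$Employed - predictions)^2)
print(mse)
```

In practice `k` would be chosen by cross-validation or from the variance explained, which `summary(pca)` reports.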

## Partial Least Squares Regression

Partial Least Squares (PLS) Regression fits a linear model in a lower-dimensional projection of the predictors. Unlike PCR, the projection is chosen using the response as well as the predictors. Like PCR, PLS is appropriate for data with highly correlated predictors.
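A minimal sketch using `plsr()` from the pls package, which is not part of base R, so this assumes it has been installed with `install.packages("pls")`. The choice of `ncomp = 6` at prediction time is illustrative; it uses every available component.

```r
# assumption: the pls package is installed
library(pls)

# load data
data(longley)

# fit a PLS model with cross-validation to help choose the number of components
fit <- plsr(Employed ~ ., data = longley, validation = "CV")

# summarize the fit, including cross-validated error per component count
summary(fit)

# predict with 6 components; as.numeric() flattens the returned array
predictions <- as.numeric(predict(fit, longley, ncomp = 6))

# mean squared error
mse <- mean((longley$Employed - predictions)^2)
print(mse)
```

The cross-validated errors in the summary are a reasonable guide for choosing a smaller `ncomp` than the maximum.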

## Summary

In this post you discovered 4 recipes for creating linear regression models in R and making predictions using those models.

Chapter 6 of Applied Predictive Modeling by Kuhn and Johnson provides an excellent introduction to linear regression with R for beginners. Practical Regression and Anova using R (PDF) by Faraway provides a more in-depth treatment.


### 8 Responses to Linear Regression in R

1. Piyush R January 1, 2015 at 12:33 pm #

Is there any reason why SLR is not performing better than OLSR? Again, the question is how to judge ‘better’. I notice that rmse(SLR) > rmse(OLSR). If we are aiming for a better fit, shouldn’t the rmse decrease?

2. Sam October 2, 2015 at 6:28 am #

I think the rmse calculation step missed taking the square root.

rmse <- sqrt(mean((longley$Employed - predictions)^2))

• anika January 22, 2016 at 5:38 pm #

Yes thank you, this was something I noticed as well and was unsure of.

• Sibanjan May 17, 2017 at 7:00 pm #

Jason calculated mse not rmse. So there is no square root.

3. ML December 15, 2017 at 12:59 pm #

Anyone, please confirm or correct my understanding. Are the following attributes the ones found best by the “Stepwise Linear Regression” algorithm?

GNP + Unemployed + Armed.Forces + Year

• Jason Brownlee December 15, 2017 at 3:33 pm #

We cannot know what algorithm will be best for your data, try a suite of methods and compare the skill of each.

• ML December 16, 2017 at 3:55 am #

Sorry, I was not clear. I am not asking which algorithm is best. I am simply running the example code above for “Stepwise Linear Regression” and trying to understand what gets printed on screen (so the data is “longley”).
I see that three steps/iterations are performed.
My understanding:
1. After the first iteration “GNP.deflator” is discarded because its AIC was lowest(-35.163)
2. After the second iteration “Population” is discarded because its AIC was lowest(-36.799)
3. After 3rd step GNP should be discarded. ( Its AIC is -31.879)
Is my understanding correct?

• Jason Brownlee December 16, 2017 at 5:35 am #

Ah, I see. The result is the variables suggested or chosen for the final model.