
Penalized Regression in R

In this post, you will discover 3 recipes for penalized regression on the R platform.

You can copy and paste the recipes in this post to jump-start your own problem, or to learn and practice penalized linear regression in R.

Kick-start your project with my new book Machine Learning Mastery With R, including step-by-step tutorials and the R source code files for all examples.

Let’s get started.

Penalized Regression. Photo by Bay Area Bias, some rights reserved.

Each example in this post uses the longley dataset provided in the datasets package that comes with R. The longley dataset describes 7 economic variables observed yearly from 1947 to 1962; the task is to predict the number of people employed from the other 6 variables.
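
To inspect the data before modeling, a couple of base R calls suffice:

# load and take a first look at the longley data
data(longley)
head(longley) # first rows, showing all 7 column headers
dim(longley)  # 16 rows, 7 columns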

Ridge Regression

Ridge Regression creates a linear regression model that is penalized with the L2-norm, which is the sum of the squared coefficients. This has the effect of shrinking the coefficient values (and the complexity of the model), allowing coefficients with a minor contribution to the response to get close to zero.
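
The recipe below is a minimal sketch using the glmnet function on the longley data; the lambda value of 0.001 is an illustrative choice, not a tuned setting.

# load the package
library(glmnet)
# load data
data(longley)
x <- as.matrix(longley[,1:6])
y <- as.matrix(longley[,7])
# fit the model (alpha=0 gives a pure L2 penalty, i.e. ridge regression)
fit <- glmnet(x, y, family="gaussian", alpha=0, lambda=0.001)
# summarize the fit
summary(fit)
# make predictions on the training data
predictions <- predict(fit, x, type="link")
# summarize accuracy with the mean squared error
mse <- mean((y - predictions)^2)
print(mse)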

Learn about the glmnet function in the glmnet package.


Least Absolute Shrinkage and Selection Operator

Least Absolute Shrinkage and Selection Operator (LASSO) creates a regression model that is penalized with the L1-norm, which is the sum of the absolute coefficients. This has the effect of shrinking coefficient values (and the complexity of the model), allowing some coefficients with a minor effect on the response to become exactly zero.
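
The recipe below is a minimal sketch using the lars function on the longley data; picking the step with the minimum training RSS is an illustrative shortcut (cross-validation, e.g. via cv.lars, would be more rigorous).

# load the package
library(lars)
# load data
data(longley)
x <- as.matrix(longley[,1:6])
y <- as.matrix(longley[,7])
# fit the model
fit <- lars(x, y, type="lasso")
# summarize the fit
summary(fit)
# select a step with a minimum training error (illustrative shortcut)
best_step <- fit$df[which.min(fit$RSS)]
# make predictions
predictions <- predict(fit, x, s=best_step, type="fit")$fit
# summarize accuracy with the mean squared error
mse <- mean((y - predictions)^2)
print(mse)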

Learn about the lars function in the lars package.

Elastic Net

Elastic Net creates a regression model that is penalized with both the L1-norm and the L2-norm. This has the effect of both shrinking coefficients (as in ridge regression) and setting some coefficients to zero (as in LASSO).
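
The recipe below is a minimal sketch, again with glmnet; alpha=0.5 weights the L1 and L2 penalties equally, and lambda=0.001 remains an illustrative choice.

# load the package
library(glmnet)
# load data
data(longley)
x <- as.matrix(longley[,1:6])
y <- as.matrix(longley[,7])
# fit the model (alpha=0.5 mixes the L1 and L2 penalties equally)
fit <- glmnet(x, y, family="gaussian", alpha=0.5, lambda=0.001)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, x, type="link")
# summarize accuracy with the mean squared error
mse <- mean((y - predictions)^2)
print(mse)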

Learn about the glmnet function in the glmnet package.

Summary

In this post you discovered 3 recipes for penalized regression in R.

Penalization is a powerful method for attribute selection and for improving the accuracy of predictive models. For more information, see Chapter 6 of Applied Predictive Modeling by Kuhn and Johnson, which provides an excellent introduction to linear regression with R for beginners.


29 Responses to Penalized Regression in R

  1. Hrvoje, July 25, 2014 at 10:53 pm

    Nice article, but the first and third code examples are the same 🙂 What’s the difference?

    • jasonb, July 26, 2014 at 7:40 am

      Almost. Note the value of alpha (the elastic net mixing parameter).
      A great thing about the glmnet function is that it can do ridge, lasso, and a hybrid of both. In the first example, we used glmnet with an alpha of 0, which results in ridge regression (only L2). If alpha were set to 1, it would be lasso (only L1). Note that in the third example alpha is set to 0.5; this is the elastic net mixture of L1 and L2 at a 50% mix.
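
      For instance (a minimal illustration; x and y as in the recipes above, lambda held fixed):

      # the same glmnet call, varying only the alpha mixing parameter
      fit_ridge <- glmnet(x, y, family="gaussian", alpha=0, lambda=0.001)  # pure L2: ridge
      fit_lasso <- glmnet(x, y, family="gaussian", alpha=1, lambda=0.001)  # pure L1: lasso
      fit_enet  <- glmnet(x, y, family="gaussian", alpha=0.5, lambda=0.001) # 50/50 mix: elastic net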
      I hope that is clearer.

      • Hrvoje, July 26, 2014 at 11:47 pm

        It’s much clearer now. Thanks a lot 🙂

  2. TropoSco, August 2, 2014 at 7:50 pm

    Thanks for the post,

    I was wondering if you knew the differences (computational and statistical performance) between using the lars package and the glmnet package with alpha=1 for performing a LASSO regression?

    Thank you for your time and keep up the good work!

  3. JOY, May 12, 2015 at 10:04 pm

    Nice article. I want to know how to use R for regression analysis. Could you send the step-by-step method to my e-mail? I already have the R package on my laptop.

  4. Christine, March 16, 2016 at 9:08 pm

    Hi Jason, thank you so much for the very clear tutorials,
    It may be a silly question to ask, but how do I interpret the goodness-of-fit from the elastic net? I’ve obtained a value after:
    # summarize the fit
    summary(fit)
    # make predictions
    predictions <- predict(fit, x, type="link")
    # summarize accuracy
    rmse <- mean((y - predictions)^2)
    print(rmse)

    but not sure how I should interpret it.
    Thanks so much!

    • keval, March 24, 2017 at 2:02 pm

      I was wondering the same: how do I interpret the mse?

      • Jason Brownlee, March 25, 2017 at 7:32 am

        Take the square root and the results are in the same units as the original data.
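
        For example (a sketch reusing the quantities from the comment above):

        # mean squared error, then its square root
        mse <- mean((y - predictions)^2)
        rmse <- sqrt(mse) # now in the same units as y
        print(rmse)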

  5. Hans, June 2, 2017 at 12:25 am

    What do ‘longley[,1:6]’ and ‘longley[,7]’ mean, respectively?

    How can we replace data(longley) with our own CSV data accurately?

    • Jason Brownlee, June 2, 2017 at 1:01 pm

      It specifies columns in the data.

      • Hans, June 2, 2017 at 8:14 pm

        Which columns are specified with ‘longley[,1:6]’ for example?

        When I click ‘x’ in the inspector of R Studio it gives me a table with headers:
        GNP.deflator
        GNP
        Unemployed
        Armed.Forces
        Population
        Year

        When I click y it gives me a table with header “V1”

        Could it be described in words like ‘longley[datastart,dataend:datarows]’?
        Is it a kind of subsetting?

        • Hans, June 2, 2017 at 8:25 pm

          Alternatively, if it is meant to be

          longley[,1:6] = longley[,col 1 to col 6]

          and

          longley[,7] = longley[,col 7]

          what is the part before the ‘,’? Is it a kind of wildcard?

          • Hans, June 6, 2017 at 9:16 am

            got it…
            dataset[10:12,1:3] = dataset[startRow:endRow,startColumn:endColumn]
            dataset[,1:3] = dataset[allRows,startColumn:endColumn]

  6. Hans, June 2, 2017 at 8:27 pm

    Is there a way in RStudio to easily see how the original table from ‘data(longley)’ is structured (all headers)?

    • Jason Brownlee, June 3, 2017 at 7:24 am

      I do not use RStudio and cannot give you advice about it sorry.

    • srepho, July 5, 2017 at 2:49 pm

      In RStudio you can use View(df) and it shows the dataframe/tibble in the Viewer

      View(longley)

      Alternatively, within the dplyr package you can do glimpse(df) to get a list of the column names and the data types.

      dplyr::glimpse(longley)

  7. Hans, June 6, 2017 at 9:05 am

    How do you predict one step ahead on unseen data with the above code?

    • Hans, June 6, 2017 at 11:11 am

      Do we have to use training and test data to predict unseen data with glmnet?
      And if so, should we use the last observations of the prediction with ‘test_data’ as newx for a prediction of new unseen data?

      • Jason Brownlee, June 7, 2017 at 7:08 am

        You can use any test harness you like to estimate the skill of the model on unseen data.
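
        For example, a minimal hold-out sketch with glmnet (the 12/4 chronological split and the alpha/lambda values are illustrative choices, not recommendations):

        # fit on the first 12 years, evaluate on the last 4
        library(glmnet)
        data(longley)
        train <- 1:12
        x_train <- as.matrix(longley[train, 1:6])
        y_train <- as.matrix(longley[train, 7])
        x_test <- as.matrix(longley[-train, 1:6])
        y_test <- as.matrix(longley[-train, 7])
        fit <- glmnet(x_train, y_train, family="gaussian", alpha=0.5, lambda=0.001)
        # predictions for the unseen rows
        predictions <- predict(fit, x_test, type="link")
        print(mean((y_test - predictions)^2))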

  8. hiya, July 11, 2017 at 5:31 pm

    Internally, what method is used: least squares or maximum likelihood?

    • Jason Brownlee, July 12, 2017 at 9:41 am

      I would recommend reading the package documentation for the specific methods.

  9. Fakhra, December 25, 2017 at 1:34 am

    What if I have to use a proposed penalty rather than the built-in penalty of LASSO or elastic net? How can it be used?

    • Jason Brownlee, December 25, 2017 at 5:25 am

      Good question, you might need to implement it yourself.

  10. Munira, April 6, 2018 at 2:58 am

    Hi Jason,
    I have started your Machine Learning course, which seems to be very useful. I have a question regarding an analysis I am planning to conduct. I have briefly described my research setting and my questions about R packages below. If you can help me with this, that would be wonderful.

    I have been trying to analyze high-dimensional data (p exceeds n) with limited observations (n=50). I want to use the lasso method for variable selection and parameter estimation to develop a prediction model. As my data are count observations, the model has to be Poisson or negative binomial. I have explored several R packages and papers, and finally decided to use glmnet. Now I have some questions.

    I know the “glmnet” and “penalized” packages use different algorithms. As “mpath” uses coordinate descent like glmnet, do “mpath” and “glmnet” provide comparable results? The only reason I am interested in “mpath” is that it allows NB regression, which “glmnet” doesn’t.

    A few of the packages allow post-selection inference (p-values and confidence intervals), for example “selectiveInference” and “hdi”. However, I couldn’t find anything for Poisson or NB models. Is there any package that can help me?

    Thanks in advance.

    • Jason Brownlee, April 6, 2018 at 6:34 am

      There might be, I’m not sure off the cuff sorry.

      Perhaps try posting to the R user list?

  11. niebieska_biedronka, April 20, 2018 at 6:24 pm

    As I understand from the glmnet package description (https://cran.r-project.org/web/packages/glmnet/glmnet.pdf), it does not fit ridge regression, only lasso or elastic net. I guess it is because of the penalty definition (it never reduces to the ridge penalty definition).

    • Jason Brownlee, April 21, 2018 at 6:44 am

      With elastic net you can do ridge, lasso, and both (i.e. the elastic net blend), depending on alpha.

  12. william, July 28, 2018 at 8:20 pm

    Is lasso a good method to use for feature selection on a high-dimensional dataset in a regression ML problem?

    That is, I’m trying to determine the best algorithm to select the best features for my output variable, which is continuous, so I’ve been using lasso. I’m just not sure how effective it is compared to others. Any suggestions for feature selection methods for regression-focused ML problems?

    Thanks,
