Linear Classification in R

In this post you will discover recipes for 3 linear classification algorithms in R.

All recipes in this post use the iris flowers dataset provided with R in the datasets package. The dataset describes the measurements of iris flowers and requires classification of each observation into one of three flower species.
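
As a quick orientation before the recipes, the short sketch below loads the dataset and inspects its shape. It uses only base R and the built-in datasets package; nothing here is specific to the recipes that follow.

```r
# load the iris dataset that ships with R (datasets package)
data(iris)

# four numeric measurements plus the Species factor to be predicted
str(iris)

# number of observations per species
table(iris$Species)
```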

Red vs Blue
Photo by Robert Couse-Baker, some rights reserved

Logistic Regression

Logistic regression is a classification method that models the probability of an observation belonging to one of two classes. As such, it is normally demonstrated on binary classification problems (2 classes). It can also be extended to problems with more than two classes (multinomial logistic regression), as in this case.

This recipe demonstrates the multinomial logistic regression method on the iris dataset.
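
A minimal sketch of the recipe is shown below. It assumes the VGAM package is installed, and it evaluates on the training data purely for illustration. The formula `Species ~ .` models Species using all other columns as inputs. The variable names (fit, probabilities, predictions) are illustrative choices. On this dataset the fit may emit numerical warnings, most likely because the classes are almost perfectly separable.

```r
# load the package (assumes it is installed: install.packages("VGAM"))
library(VGAM)

# load data
data(iris)

# fit a multinomial logistic regression model: Species from all other columns
fit <- vglm(Species ~ ., family = multinomial, data = iris)

# summarize the fit
summary(fit)

# predicted class probabilities, one column per species
probabilities <- predict(fit, iris[, 1:4], type = "response")

# pick the most probable species for each observation
predictions <- levels(iris$Species)[apply(probabilities, 1, which.max)]

# confusion matrix: predicted vs. actual species
table(predictions, iris$Species)
```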

Learn more about the vglm function in the VGAM package.

Linear Discriminant Analysis

LDA is a classification method that finds a linear combination of data attributes that best separates the data into classes.

This recipe demonstrates the LDA method on the iris dataset.

Learn more about the lda function in the MASS package.
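
A minimal sketch of the recipe follows, assuming the MASS package is available (it is distributed with standard R installations). As with the other recipes, predictions are made on the training data only to keep the example short; the variable names are illustrative.

```r
# load the package
library(MASS)

# load data
data(iris)

# fit the LDA model: Species from all other columns
fit <- lda(Species ~ ., data = iris)

# print priors, group means and discriminant coefficients
print(fit)

# predict() returns a list; the class element holds the predicted labels
predictions <- predict(fit, iris[, 1:4])$class

# confusion matrix: predicted vs. actual species
table(predictions, iris$Species)
```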

Partial Least Squares Discriminant Analysis

Partial Least Squares Discriminant Analysis is the application of LDA on a dimension-reducing projection of the input data (partial least squares).

This recipe demonstrates the PLSDA method on the iris dataset.
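
A minimal sketch of the recipe is given below. It assumes the caret package and the pls package it relies on are both installed. The call uses the function's default settings (including its default method for turning model outputs into class probabilities), which may differ from the settings the original recipe used. The names x, y, fit and predictions are illustrative.

```r
# load the package (assumes caret and pls are installed)
library(caret)

# load data
data(iris)

# split into a predictor matrix and the class labels
x <- as.matrix(iris[, 1:4])
y <- iris$Species

# fit the PLSDA model with default settings
fit <- plsda(x, y)

# predicted class labels for the training data
predictions <- predict(fit, x)

# confusion matrix: predicted vs. actual species
table(predictions, y)
```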

Learn more about the plsda function in the caret package.

Summary

In this post, you discovered 3 recipes for linear classification that you can copy and paste into your own problem.



10 Responses to Linear Classification in R

  1. Anne-Marie August 8, 2014 at 3:51 pm #

    Hi Jason

    I’m relatively new to this field of study and you’ve motivated me to start producing blog posts about what I’m learning. Quick question, how do you display the chunks of code in the post?

    Thanks
    Anne-Marie

    • jasonb August 8, 2014 at 6:37 pm #

      That’s a great compliment, thanks Anne-Marie.

      I use the Crayon Syntax Highlighter WordPress plugin to provide code snippets on my blog.

      Good luck with your blog, I’d love to see what you come up with!

  2. Taylor Nelson October 10, 2014 at 7:22 am #

    Hi Jason,

    I am learning Naive Bayes classification in R right now, but I have used Logistic Regression before many times as well. Both methods seem to apply probabilities to classification–is there any sense as to whether one is better, or one method would be better in a certain context?

    Thanks!

    • jasonb October 10, 2014 at 7:27 am #

      Spot on Taylor. It’s very hard to tell a priori. The best advice is to experiment and spot check each algorithm to get a sense of which is better at picking out the structure in the problem.
      Even knowing the assumptions in the methods doesn’t help much in practice – Naive Bayes with its strong independence assumption seems to do well even in some situations with tightly coupled attributes.

  3. Ugi August 2, 2015 at 4:57 pm #

    Hi Jasson 😀
    Thanks for sharing . . . 😀

  4. Neg March 13, 2016 at 7:04 am #

    Hello
    forgive me if the question is too simple. you use “Species~.” in your code, I am not sure what it means. my understanding is that you are using species as a feature for the prediction. could you explain the code a bit more please.

    Thank you for the tutorials.

    • Mary Li March 29, 2016 at 6:35 am #

      Neg, “Species~.” simply means “Species” is the outcome, and all other variables in the dataset are input covariates.

      Thanks Jason for sharing!

  5. Kamchatang April 5, 2016 at 5:13 am #

    What’s the explanation behind this?

    ‘There were 20 warnings (use warnings() to see them)

    In checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon) :
    2 elements replaced by 1.819e-12’

    Can I ignore these warnings?

  6. Haoliang Han May 7, 2016 at 9:43 am #

    I’ve tried the logistic Regression, but why does the output have two different values, such as Length1, Length2, Width1, Width2? Which one is accurate? And what does it mean?