Linear Classification in R

By Jason Brownlee on August 22, 2019 in R Machine Learning

In this post you will discover recipes for 3 linear classification algorithms in R.

All recipes in this post use the iris flowers dataset provided with R in the datasets package. The dataset describes the measurements of iris flowers and requires classification of each observation into one of three flower species.
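
Before running the recipes, it can help to take a quick look at the data. The following is a minimal sketch using only base R; it simply confirms the shape of the dataset and the number of observations in each class.

```r
# load the dataset that ships with R
data(iris)
# four numeric measurements plus the Species factor
str(iris)
# number of observations per species
class_counts <- table(iris$Species)
print(class_counts)
```

The first four columns are the inputs used by every recipe below, and the fifth column (Species) is the class to predict.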

Kick-start your project with my new book Machine Learning Mastery With R, including step-by-step tutorials and the R source code files for all examples.

Let’s get started.

Red vs Blue
Photo by Robert Couse-Baker, some rights reserved

Logistic Regression

Logistic Regression is a classification method that models the probability of an observation belonging to a class. It is normally demonstrated on binary classification problems (2 classes), but it can also be used on problems with more than two classes (multinomial), as in this case.
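
To make the multinomial case a little more concrete, here is a small sketch in base R of how linear scores can be turned into class probabilities. The score values are made up for illustration, and the sketch assumes one class (here the last one) acts as the reference with its score fixed at zero; treat that as an assumption rather than a description of any particular package.

```r
# hypothetical linear scores for one observation (illustrative values only)
# one score per non-reference class; the reference class score is fixed at 0
scores <- c(setosa = 2.0, versicolor = 0.5)
eta <- c(scores, virginica = 0)
# softmax: exponentiate and normalise so the probabilities sum to 1
probabilities <- exp(eta) / sum(exp(eta))
print(round(probabilities, 3))
# the predicted class is the one with the highest probability
predicted <- names(which.max(probabilities))
print(predicted)
```

A fitted model estimates the coefficients that produce these scores from the input measurements; the final step of picking the most probable class is the same.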

This recipe demonstrates the multinomial logistic regression method on the iris dataset.

# load the package
library(VGAM)
# load data
data(iris)
# fit model
fit <- vglm(Species~., family=multinomial, data=iris)
# summarize the fit
summary(fit)
# make predictions
probabilities <- predict(fit, iris[,1:4], type="response")
# map the index of the most probable class to its species name
predictions <- levels(iris$Species)[apply(probabilities, 1, which.max)]
# summarize accuracy
table(predictions, iris$Species)
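
The table printed at the end of a recipe is a confusion matrix. If you would like a single accuracy number as well, one way is to divide the diagonal of that table by its total. The sketch below uses a small made-up set of labels, and the helper name accuracy_from_table is hypothetical (it is not part of any package).

```r
# helper (hypothetical name): share of observations on the diagonal
# of a square confusion table
accuracy_from_table <- function(confusion) {
  sum(diag(confusion)) / sum(confusion)
}
# small made-up example: 6 actual labels and 6 predicted labels
actual <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "a", "b", "c", "c", "c"), levels = c("a", "b", "c"))
# rows are predictions, columns are actual classes
confusion <- table(predicted, actual)
print(confusion)
acc <- accuracy_from_table(confusion)
print(acc)
```

With a recipe's output, accuracy_from_table(table(predictions, iris$Species)) should work the same way, provided every species appears among the predictions so that the table is square with matching row and column order.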

Learn more about the vglm function in the VGAM package.

Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a classification method that finds a linear combination of data attributes that best separates the data into classes.

This recipe demonstrates the LDA method on the iris dataset.

# load the package
library(MASS)
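# load data
data(iris)
# fit model
fit <- lda(Species~., data=iris)
# summarize the fit (print() shows priors, group means and coefficients;
# summary() on an lda object only lists its components)
print(fit)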
# make predictions
predictions <- predict(fit, iris[,1:4])$class
# summarize accuracy
table(predictions, iris$Species)
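
The recipes in this post make predictions on the same data used to fit the model, which tends to give an optimistic view of accuracy. As a rough sketch of an alternative, the example below holds back part of the data for testing. The 80/20 split and the seed value are arbitrary choices made for illustration.

```r
# load the package
library(MASS)
# load data
data(iris)
# fix the random seed so the split is repeatable (arbitrary value)
set.seed(7)
# hold back 20% of the rows for testing (arbitrary split)
train_index <- sample(nrow(iris), size = round(0.8 * nrow(iris)))
train <- iris[train_index, ]
test <- iris[-train_index, ]
# fit model on the training rows only
fit <- lda(Species ~ ., data = train)
# make predictions on the held-back rows
predictions <- predict(fit, test[, 1:4])$class
# summarize accuracy
confusion <- table(predictions, test$Species)
print(confusion)
holdout_accuracy <- mean(predictions == test$Species)
print(holdout_accuracy)
```

The same split could be reused with the other recipes so that all three methods are compared on identical held-back rows.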

Learn more about the lda function in the MASS package.

Partial Least Squares Discriminant Analysis

Partial Least Squares Discriminant Analysis (PLSDA) is the application of LDA on a dimension-reducing projection of the input data (partial least squares).

This recipe demonstrates the PLSDA method on the iris dataset.

# load the package
library(caret)
# load data
data(iris)
x <- iris[,1:4]
y <- iris[,5]
# fit model
fit <- plsda(x, y, probMethod="Bayes")
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, iris[,1:4])
# summarize accuracy
table(predictions, iris$Species)

Learn more about the plsda function in the caret package.

Summary

In this post, you discovered 3 recipes for linear classification that you can copy and paste into your own problem.

12 Responses to Linear Classification in R

Anne-Marie August 8, 2014 at 3:51 pm #

Hi Jason

I’m relatively new to this field of study and you’ve motivated me to start producing blog posts about what I’m learning. Quick question, how do you display the chunks of code in the post?

Thanks
Anne-Marie

- jasonb August 8, 2014 at 6:37 pm #
  
  That’s a great compliment, thanks Anne-Marie.
  
  I use the Crayon Syntax Highlighter WordPress plugin to provide code snippets on my blog.
  
  Good luck with your blog, I’d love to see what you come up with!
  
  - Andrew May 11, 2022 at 5:40 am #
    
    Is your site still using wordpress?
    
    - James Carmichael May 13, 2022 at 12:42 am #
      
      Hi Andrew…Yes it is.
      
Taylor Nelson October 10, 2014 at 7:22 am #

Hi Jason,

I am learning Naive Bayes classification in R right now, but I have used Logistic Regression before many times as well. Both methods seem to apply probabilities to classification–is there any sense as to whether one is better, or one method would be better in a certain context?

Thanks!

- jasonb October 10, 2014 at 7:27 am #
  
  Spot on Taylor. It’s very hard to tell a priori. The best advice is to experiment and spot check each algorithm to get a sense of which is better at picking out the structure in the problem.
  Even knowing the assumptions in the methods doesn’t help much in practice – NaiveBayes with its strong independence assumption seems to do well even in some situations with tightly coupled attributes.
  
Ugi August 2, 2015 at 4:57 pm #

Hi Jasson 😀
Thanks for sharing . . . 😀

Neg March 13, 2016 at 7:04 am #

Hello
Forgive me if the question is too simple. You use “Species~.” in your code and I am not sure what it means. My understanding is that you are using species as a feature for the prediction. Could you explain the code a bit more, please?

Thank you for the tutorials.

- Mary Li March 29, 2016 at 6:35 am #
  
  Neg, “Species~.” simply means “Species” is the outcome, and all other variables in the dataset are input covariates.
  
  Thanks Jason for sharing!
  
Kamchatang April 5, 2016 at 5:13 am #

What’s the explanation behind this?

‘There were 20 warnings (use warnings() to see them)

In checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon) :
2 elements replaced by 1.819e-12’

Can I ignore these warnings?

- Jason Brownlee April 8, 2016 at 1:37 pm #
  
  Yes, you can ignore these warnings.
  
Haoliang Han May 7, 2016 at 9:43 am #

I’ve tried the logistic Regression, but why does the output have two different values, such as Length1, Length2, Width1, Width2? Which one is accurate? And what does it mean?
