How To Estimate Model Accuracy in R Using The Caret Package

When you are building a predictive model, you need a way to evaluate the capability of the model on unseen data.

This is typically done by estimating accuracy using data that was not used to train the model, such as a test set, or by using cross validation. The caret package in R provides a number of methods to estimate the accuracy of a machine learning algorithm.

In this post you will discover 5 approaches for estimating model performance on unseen data. You will also get a recipe in R using the caret package for each method that you can copy and paste into your own project right now.

Caret package in R, from the caret homepage

Estimating Model Accuracy

We have considered model accuracy before in the configuration of test options in a test harness. You can read more in the post: How To Choose The Right Test Options When Evaluating Machine Learning Algorithms.

In this post you are going to discover 5 different methods that you can use to estimate model accuracy.

They are as follows and each will be described in turn:

  • Data Split
  • Bootstrap
  • k-fold Cross Validation
  • Repeated k-fold Cross Validation
  • Leave One Out Cross Validation

Generally, I would recommend Repeated k-fold Cross Validation, but each method has its features and benefits, especially when the amount of data or space and time complexity are considered. Consider which approach best suits your problem.

Data Split

Data splitting involves partitioning the data into an explicit training dataset used to prepare the model and an unseen test dataset used to evaluate the model's performance on unseen data.

It is useful when you have a very large dataset, so that the test dataset can provide a meaningful estimate of performance, or when you are using slow methods and need a quick approximation of performance.

The example below splits the iris dataset so that 80% is used for training a Naive Bayes model and 20% is used to evaluate the model's performance.
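A minimal sketch of this recipe, assuming the NaiveBayes() implementation from the klaR package together with caret's createDataPartition() and confusionMatrix() helpers:

# load the libraries
library(caret)
library(klaR)
# load the iris dataset
data(iris)
# define an 80%/20% train/test split of the dataset
split <- 0.80
trainIndex <- createDataPartition(iris$Species, p=split, list=FALSE)
data_train <- iris[ trainIndex,]
data_test <- iris[-trainIndex,]
# train a naive bayes model on the training split only
model <- NaiveBayes(Species~., data=data_train)
# make predictions on the unseen test split
x_test <- data_test[,1:4]
y_test <- data_test[,5]
predictions <- predict(model, x_test)
# summarize accuracy on the unseen data
confusionMatrix(predictions$class, y_test)

Only the held-out 20% of rows contribute to the reported accuracy; the model never sees them during training.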

Bootstrap

Bootstrap resampling involves taking random samples from the dataset (with replacement) against which to evaluate the model. In aggregate, the results provide an indication of the variance of the model's performance. Typically, a large number of resampling iterations is performed (thousands or tens of thousands).

The following example uses a bootstrap with 10 resamples to prepare a Naive Bayes model.
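A sketch of this recipe, assuming caret's trainControl() and the nb method (which wraps the klaR package):

# load the library
library(caret)
# load the iris dataset
data(iris)
# define training control: bootstrap with 10 resamples
train_control <- trainControl(method="boot", number=10)
# train a naive bayes model, evaluating it against each bootstrap resample
model <- train(Species~., data=iris, trControl=train_control, method="nb")
# summarize the results
print(model)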

k-fold Cross Validation

The k-fold cross validation method involves splitting the dataset into k subsets. Each subset is held out in turn while the model is trained on all other subsets. This process is repeated until accuracy is determined for each instance in the dataset, and an overall accuracy estimate is provided.

It is a robust method for estimating accuracy, and the size of k can tune the amount of bias in the estimate, with popular values set to 3, 5, 7 and 10.

The following example uses 10-fold cross validation to estimate Naive Bayes on the iris dataset.
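A sketch of this recipe; the tuneGrid used here pins the naive bayes tuning parameters to a single combination, and assumes a version of the nb method that expects the fL, usekernel and adjust columns:

# load the library
library(caret)
# load the iris dataset
data(iris)
# define training control: 10-fold cross validation
train_control <- trainControl(method="cv", number=10)
# fix the naive bayes tuning parameters to a single combination
grid <- expand.grid(fL=0, usekernel=FALSE, adjust=1)
# train the model, estimating accuracy across the 10 folds
model <- train(Species~., data=iris, trControl=train_control, method="nb", tuneGrid=grid)
# summarize the cross validation results
print(model)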

Repeated k-fold Cross Validation

The process of splitting the data into k folds can be repeated a number of times; this is called Repeated k-fold Cross Validation. The final model accuracy is taken as the mean across the repeats.

The following example uses 10-fold cross validation with 3 repeats to estimate Naive Bayes on the iris dataset.
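A sketch of this recipe; the only change from plain k-fold cross validation is the repeatedcv method and the repeats argument:

# load the library
library(caret)
# load the iris dataset
data(iris)
# define training control: 10-fold cross validation with 3 repeats
train_control <- trainControl(method="repeatedcv", number=10, repeats=3)
# train the model, averaging accuracy over all folds and repeats
model <- train(Species~., data=iris, trControl=train_control, method="nb")
# summarize the results
print(model)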

Leave One Out Cross Validation

In Leave One Out Cross Validation (LOOCV), a single data instance is left out and a model is constructed on all other data instances in the training set. This is repeated for all data instances.

The following example demonstrates LOOCV to estimate Naive Bayes on the iris dataset.
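A sketch of this recipe, assuming caret's LOOCV resampling method:

# load the library
library(caret)
# load the iris dataset
data(iris)
# define training control: leave one out cross validation
train_control <- trainControl(method="LOOCV")
# train the model, holding out one instance at a time
model <- train(Species~., data=iris, trControl=train_control, method="nb")
# summarize the results
print(model)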

Summary

In this post you discovered 5 different methods that you can use to estimate the accuracy of your model on unseen data.

Those methods were: Data Split, Bootstrap, k-fold Cross Validation, Repeated k-fold Cross Validation, and Leave One Out Cross Validation.

You can learn more about the caret package in R at the caret package homepage and the caret package CRAN page. If you would like to master the caret package, I would recommend the book written by the author of the package, titled: Applied Predictive Modeling, especially Chapter 4 on overfitting models.




77 Responses to How To Estimate Model Accuracy in R Using The Caret Package

  1. kumaran November 11, 2014 at 4:16 pm #

    Hi Sir,
    Could you provide the full code for the bayes classifier and bootstrap resampling
    i’ll use that as my model for my engineering project work….

    • hima July 17, 2016 at 9:32 pm #

      confusionMatrix(predictions$class,y_test)

      Error in predictions$class : $ operator is invalid for atomic vectors

      i am getting an error message while implementing R code for Confusion matrix…

      • somia August 1, 2017 at 4:53 am #

        did you solve this problem?????

  2. Romeo July 23, 2015 at 3:58 pm #

    You’re a great teacher! Others make complex things more complicated while you make the above look simple. Your emails though just pointed me to other books. Why don’t you start to have your own video course or book?

  3. pushpa September 4, 2015 at 10:53 am #

    very useful information , simplified

  4. take September 12, 2015 at 7:49 am #

    Hi Jason, in the case of k-fold Cross Validation, which should be used as accuracy of a model, the one in the “model” variable or the one shown in the confusionMatrix? I assume it’s the former but wanted to confirm. Thanks.

    • Jason Brownlee September 12, 2015 at 8:46 am #

      The model is used to create predictions.

We can only evaluate the accuracy of predictions, and the accuracy reflects the capability of the model.

      • take September 14, 2015 at 9:02 am #

        Thanks for your reply, Jason. When I run your code for Repeated k-fold Cross Validation and look at the content of the “model” variable, I get the following result with accuracy indicated as 0.9533333. In contrast, when I look at the result of the confusionMatrix() function, accuracy is 0.96 (see below). In my real data this difference is larger: 0.931 vs. 0.998 for model and confusionMatrix, respectively. So my question is which result should be used as the capability of the model?

        > model
        Naive Bayes

        150 samples
        4 predictor
        3 classes: ‘setosa’, ‘versicolor’, ‘virginica’

        No pre-processing
        Resampling: Cross-Validated (10 fold, repeated 3 times)
        Summary of sample sizes: 135, 135, 135, 135, 135, 135, …
        Resampling results across tuning parameters:

        usekernel Accuracy Kappa Accuracy SD Kappa SD
        FALSE 0.9533333 0.93 0.05295845 0.07943768
        TRUE 0.9533333 0.93 0.05577734 0.08366600

        Tuning parameter ‘fL’ was held constant at a value of 0
        Accuracy was used to select the optimal model using the largest value.
        The final values used for the model were fL = 0 and usekernel = FALSE.

        > confusionMatrix(predictions, iris$Species)
        Confusion Matrix and Statistics

        Reference
        Prediction setosa versicolor virginica
        setosa 50 0 0
        versicolor 0 47 3
        virginica 0 3 47

        Overall Statistics

        Accuracy : 0.96
        (e.d. truncated)

  5. Henok September 17, 2015 at 11:15 pm #

    Very helpful material really!!!!!

  6. Tobi October 4, 2015 at 4:24 pm #

    Hi,
    very useful thanks, I wonder what the license on your code is.
    Clarification would be nice, because you provide copy/paste capabilities.
    Maybe re-use with referencing your website, like CC-BY?
    Cheers
    Tobias

  7. Dev October 10, 2015 at 2:01 am #

    Hi jason
    It looks really helpful. I want to bootstrap for a quadratic model. When I replace the Species~. used when training the model with my own model, which is something like lm(y~poly(x,2)+poly(z,2)), I get errors because the regression method is not appropriate. Can you please help?

    model <- train(Species~., data=iris, trControl=train_control, method="nb")
    regards
    Dev

  8. Amir October 17, 2015 at 10:35 am #

    Dear Jason

    I have a data set including 3 classes (C1, C2 and C3) and 7 features (variables).
    I have classified my data by many methods on all variables.
    Now I want to plot and illustrate, for example, a 2-D plot for every method.
    But I have to use just 2 variables for a 2-D plot. Now I don’t know how to select two variables that show the best separation between classes in the plot. I can test classification accuracy for variable pairs (for example: V1-V2, V1-V3, ….) but this work is very time consuming. Can you help me? Is there a method for selecting the two best variables in classification models?

    Thank you so much
    Best regards,

    Amir

  9. Petros October 29, 2015 at 5:56 pm #

    Great blog. However I am unsure how the k-fold model is built. Does it run the 10 folds and then use the best model of the 10, or does it fit the model on all the data and you know how well it performs from the k folds?

  10. Jim Callahan December 7, 2015 at 8:28 am #

    Finally! A clear post on how to do cross validation for machine learning in R!
    (and it even uses the Caret package). Congrats Jason Brownlee!

    BEYOND THE CONFUSION MATRIX
    Here is a wikipedia article that shows the formulas for calculating the relevant measures
    from the confusion matrix:
    https://en.wikipedia.org/wiki/Sensitivity_and_specificity

    true positive (TP)
    eqv. with hit

    true negative (TN)
    eqv. with correct rejection

    false positive (FP)
    eqv. with false alarm, Type I error

    false negative (FN)
    eqv. with miss, Type II error

    sensitivity or true positive rate (TPR)
    eqv. with hit rate, recall
    TPR = TP / P = TP / (TP+FN)

    specificity (SPC) or true negative rate
    SPC = TN / N = TN / (TN+FP)

    precision or positive predictive value (PPV)
    PPV = TP / (TP + FP)

    negative predictive value (NPV)
    NPV = TN / (TN + FN)

    fall-out or false positive rate (FPR)
    FPR = FP / N = FP / (FP + TN) = 1-SPC

    false negative rate (FNR)
    FNR = FN / (TP + FN) = 1-TPR

    false discovery rate (FDR)
    FDR = FP / (TP + FP) = 1 – PPV

    accuracy (ACC)
    ACC = (TP + TN) / (TP + FP + FN + TN)

    https://en.wikipedia.org/wiki/Sensitivity_and_specificity

  11. DR Venugopala Rao Manneni April 6, 2016 at 6:20 pm #

    A Clear post and very useful

  12. Robert Feyerharm April 15, 2016 at 5:12 am #

    Jason –

    I’m working on a project with the caret package where I first partition my data into 5 CV folds, then train competing models on each of the 5 training folds with 10-fold CV and score the remaining test folds to evaluate performance.

    How would you obtain the best fit model predictions on each of the 5 test fold partitions?

    For example, using the following dataset:

    # Load data & factor admit variable.
    mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
    mydata$admit <- as.factor(mydata$admit)

    # Create levels yes/no to make sure the classProbs get a correct name.
    levels(mydata$admit) = c("yes", "no")

    # Partition data into 5 folds.
    set.seed(123)
    folds <- createFolds(mydata$admit, k=5)

    # Train elastic net logistic regression via 10-fold CV on each of 5 training folds using index argument.
    set.seed(123)
    train_control <- trainControl( method="cv",
    number=10,
    index=folds,
    classProbs = TRUE,
    savePredictions = TRUE)

    glmnetGrid <- expand.grid(alpha=c(0, .5, 1), lambda=c(.1, 1, 10))
    model<- train(admit ~ .,
    data=mydata,
    trControl=train_control,
    method="glmnet",
    family="binomial",
    tuneGrid=glmnetGrid,
    metric="Accuracy",
    preProcess=c("center","scale"))

    Can caret extract predictions on each of the 5 test fold partitions with the best fitting model w/ optimal alpha & lambda values obtained via 10-fold CV?

  13. Heba October 17, 2016 at 10:40 pm #

    Caret Train does not output the Accuracy SD

    I try to run the code below but the only metrics I get are the Accuracy and Kappa. I need to see the Accuracy SD and Kappa SD. Is that possible?

    The code I use is:
    library("caret", lib.loc="~/R/win-library/3.3")

    set.seed(42)
    mtry <- sqrt(ncol(Train_2.4.16[,-which(names(Train_2.4.16) == "Label")]))
    control <- trainControl(method="repeatedcv", number=10, repeats=5)
    tunegrid <- expand.grid(.mtry=mtry)

    rfFit <- train(Label ~., data = Train_2.4.16,
    method = "rf",
    preProc = c("center", "scale"),
    tuneLength = 10,
    metric = "Accuracy",
    trControl=control,
    allowParallel=TRUE)
    print(rfFit)

    The output is several lines for different mtry values and the accuracy and kappa measures but it does not show the Accuracy SD and Kappa SD which is quite important too. Are there any indicators that need to be set up for these two important measures to show on the output.

    • Jason Brownlee October 18, 2016 at 5:54 am #

      Hi Heba, perhaps the caret API has changed. I’m sorry about that.

  14. pmpawmil January 14, 2017 at 12:53 am #

    Is it normal ?

    model <- train(Species~., data=iris, trControl=train_control, method="nb", tuneGrid=grid)
    Error in command 'train.default(x, y, weights = w, …)':
    The tuning parameter grid should have columns fL, usekernel, adjust

    • Jason Brownlee January 15, 2017 at 5:22 am #

      Perhaps the package has been updated. The error suggests you need to include “fL, usekernel, adjust” in the grid of parameters being optimized.
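      For example, a grid along those lines (a sketch, assuming the nb method backed by the klaR package) might be:

      # one fixed combination of the three naive bayes tuning parameters
      grid <- expand.grid(fL=0, usekernel=FALSE, adjust=1)
      model <- train(Species~., data=iris, trControl=train_control, method="nb", tuneGrid=grid)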

      • Andressa May 6, 2018 at 8:18 am #

        I have the same problem

      • Andressa May 6, 2018 at 8:19 am #

        I have the same problem

  15. Peter Lou January 17, 2017 at 1:05 pm #

    After you evaluate the model accuracy, are you allowed to go back to revise the model? For example, you found your model was overfitting when comparing training and test results. I’m asking this question because in a machine learning course, the instructor said we should never use the test data to help with model construction. Test data is just there for reality check of the power of the model.

    • Jason Brownlee January 17, 2017 at 2:39 pm #

      Hi Peter,

      I agree with your instructor if the test or validation dataset is held back and you are using cross validation or similar with the training dataset.

  16. young jae Seo March 31, 2017 at 5:26 pm #

    Hi Jason. Thank you for your good post.
    But I have one question.
    To calculate the accuracy of the model, I use confusionMatrix with a decision tree.
    Before doing this process, I split the data with the createDataPartition() function.

    Now I want to calculate the accuracy of a gbm model.
    I think the gbm model uses the full data because it is a boosted model.
    So… I can’t calculate the accuracy of the gbm model…

    Can you explain how to calculate the accuracy of the gbm() function?
    Do I have to split the data set with createDataPartition()?

    Have a nice day!

    • Jason Brownlee April 1, 2017 at 5:53 am #

      Yes, you can use resampling methods to evaluate the performance of any algorithm.

      • Young jae Seo April 2, 2017 at 3:08 am #

        You mean to use the Data Split method?
        In gbm() modeling, is it a problem to model on the train data set?
        I think that I have to use the full data set in gbm() modeling.
        Do I have the wrong information?

  17. Seo young jae April 8, 2017 at 9:17 pm #

    Hi Jason. Thank you for good information!
    I have a question.

    You used tuneGrid in k-fold cross validation,
    but why didn’t you use that in the repeated k-fold cross validation method?
    What’s the difference?

    • Jason Brownlee April 9, 2017 at 2:58 pm #

      It is a good idea to use a repeat for CV with stochastic algorithms.

      I didn’t in this case for simplicity of the example.

      See this post on stochastic machine learning algorithms:
      http://machinelearningmastery.com/randomness-in-machine-learning/

      • Seo young jae April 9, 2017 at 3:24 pm #

        Thank you for reply.
        But I mean tunegrid parameter. not repeat for CV.

        I mean that you used tunegrid parameter in k-fold cross validation, but you didn’t use tunegrid parameter in repeated k-fold cross validation method.

        what’s the difference?

        • Jason Brownlee April 10, 2017 at 7:36 am #

          The tunegrid is for evaluating a grid of hyperparameters.

          This is not required for using CV with or without repeats.

  18. Hans May 5, 2017 at 8:23 pm #

    I’m a little bit confused.

    Is ‘Estimating Model Accuracy’ actually included in these tutorials without explicit
    coding?

    http://machinelearningmastery.com/compare-models-and-select-the-best-using-the-caret-r-package/

    http://machinelearningmastery.com/evaluate-machine-learning-algorithms-with-r/

    What is the difference between the accuracy values of this example and the others?

  19. Hans May 5, 2017 at 9:09 pm #

    What does this error mean?

    > model <- NaiveBayes(n2~., data=data_train)

    Error in NaiveBayes.default(X, Y, …) :
    grouping/classes object must be a factor

    • Hans May 5, 2017 at 10:47 pm #

      Do we need parameter colClasses for this example?

      • Hans May 6, 2017 at 2:35 am #

        Is this example only for classification problems?

    • Jason Brownlee May 6, 2017 at 7:42 am #

      It means your output variable is not a factor (categorical).

      You can make it a factor using as.factor() (from memory).

      • Hans May 6, 2017 at 9:25 pm #

        Can Y be a Class for itself?

        • Jason Brownlee May 7, 2017 at 5:40 am #

          I do not understand.

          Y is the output variable which may be a class (factor) or a real value depending on whether your problem is classification or regression.

  20. Noah June 1, 2017 at 4:52 am #

    Should repeated CV give us a valid estimate of the out of sample (training) error? I seem to get much different error rates when I compare caret’s repeatedcv metrics with a manual hold out sample. Am I missing something about repeatedcv? Thanks! I enjoy your content!

    • Jason Brownlee June 2, 2017 at 12:53 pm #

      Repeated CV should be a less biased estimate.

      The hold-out score will probably be optimistic.

  21. Nainy June 19, 2017 at 3:06 pm #

    hello
    when I run your code I am getting the following error
    Error in train(Species ~ ., data = iris, trControl = train_control, method = “nb”) :
    unused arguments (data = iris, trControl = train_control, method = “nb”)

    can you please explain how to solve this

    • Jason Brownlee June 20, 2017 at 6:34 am #

      Please confirm that you have copied all of the required code.

  22. Yones July 6, 2017 at 2:23 am #

    Hi,

    I am going to use leave one out cross validation for Cubist, but I don’t know how to use it. The code that you published here is not working with my code.
    Please would you help me about it?

    Best,

  23. Cesar Yona July 6, 2017 at 10:19 am #

    How can I apply those techniques to time series prediction?

  24. Sanskriti July 15, 2017 at 10:44 pm #

    hi ,
    thank you for a great tutorial.
    Is there any other package we can use instead of caret, because for version 3.2.4 caret is not available?

    Thank you

  25. Ipong September 14, 2017 at 8:50 pm #

    Hi Jason, does method="cv" also apply to the stratified k-fold method as well?

  26. emanuele barca October 2, 2017 at 5:56 pm #

    Dear Jason,

    first of all thanks a lot for your effort in explaining a difficult topic.

    I am presently engaged with a linear mixed model and I would like to subject my model to LOOCV.

    fm2 <- lme(X1 ~ X2 + X3, random= ~1|X4, method="REML", data = bb)

    fm4<-update(fm2, correlation = corSpher(c(25, 0.5), nugget=TRUE,form = ~ X5 + X6|X4))

    how can I insert my model in your script?

    thanks in advance

    emanuele

  27. Duncan Williamson November 28, 2017 at 9:34 pm #

    I get this error message in the k-fold Cross Validation method

    > model

    I don’t know what this means so I would appreciate your solution here.

    Best wishes

    Duncan

    • Jason Brownlee November 29, 2017 at 8:22 am #

      It looks like you might not have copied all of the relevant code into your example Duncan?

  28. Mansur Can December 24, 2017 at 10:51 am #

    Error: The tuning parameter grid should have columns fL, usekernel, adjust

    How can I include fL, usekernel, adjust parameters in the grid?

    Many thanks,

    Mansur

  29. Chris January 26, 2018 at 9:48 pm #

    Really nice blog this

  30. Keith February 19, 2018 at 12:40 pm #

    I am a bit embarrassed to have to ask this question…

    I have a small sample set (120 or so, with 20 or so “positive” cases). I am using logistic regression and cross validating (cv = 10). Do I also need a holdhout set to test against to really determine accuracy?

    • Jason Brownlee February 19, 2018 at 3:07 pm #

      CV should be a sufficient estimate of model skill.

      Does that help?

      • Keith February 20, 2018 at 1:46 am #

        Yes – thanks. I had managed to get myself confused about cv / holdout testing. Thanks.

  31. Marius April 2, 2018 at 11:36 pm #

    Hi Jason

    I run my train function, as follows:

    caret_model <- train(method "glm", trainControl (method = "cv", number = 5, …), data = train, …)

    I am collecting my ROC with caret_model$results and coefficients with caret_model$finalModel or with summary(caret_model).

    Then, I run a simple glm, as follows:
    glm_model <- glm(data = train, formula=…, family=binomial).

    When I check the coefficients of my glm_model they are identical to the coefficients of my caret_model.

    So, my question is: on what data does caret actually run the glm model with cross validation, since it produces absolutely the same coefficients as a simple glm model? I was under the impression that it actually runs 5 glm models, produces 5 ROCs and then displays the average of the 5 ROCs produced and selects the best glm model based on the best ROC. But I don’t think it does that, since the coefficients are absolutely the same as a simple glm run on the train data.

    Thank you very much.

  32. Levi Uzodike May 2, 2018 at 9:52 am #

    When I run my code:

    > train_control model <- train(emotion~., data=tweet_p1, trControl=train_control, method="nb")

    I get:

    1 package is needed for this model and is not installed. (klaR). Would you like to try to install it now?
    1: yes
    2: no

    Selection: yes
    Warning: dependency ‘later’ is not available

    then it installs a bunch of other dependencies. This is the end of the install messages:

    The downloaded binary packages are in
    /var/folders/k0/bl302_r97b171sw66wd_h8nw0000gn/T//RtmpmT5Kvt/downloaded_packages
    Error: package klaR is required

    How can package klaR be required to install itself? Anyway, I just went ahead and did library(klaR) and the end of these messages were:

    Error: package or namespace load failed for ‘klaR’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
    there is no package called ‘later’

    so then I did install.packages("later") and the error I got was:

    Warning in install.packages :
    package ‘later’ is not available (for R version 3.5.0)
    Then I did library(later)

    Any tips?

    • Levi Uzodike May 2, 2018 at 9:54 am #

      sorry the first line of code should read:
      > train_control model <- train(emotion~., data=tweet_p1, trControl=train_control, method="nb")

    • Jason Brownlee May 3, 2018 at 6:29 am #

      I’m sorry to hear about the difficulty.

      Perhaps try posting your error message to stackoverflow?

  33. Levi Uzodike May 2, 2018 at 9:55 am #

    I mean: (it keeps eating up my text)
    train_control <- trainControl(method="LOOCV")

    model <- train(emotion~., data=tweet_p1, trControl=train_control, method="nb")

  34. vineela June 18, 2018 at 11:20 pm #

    Sir, I am working on a stock market prediction project. Which model is best to use?

  35. Charlie August 17, 2018 at 6:59 am #

    Consider I’d like to compare a standard logistic regression object (i.e. modlog<-glm(class~., data=dat, family=binomial(link='logit'))), but not a caret train() logistic object, to a caret decision tree object.

    While random partitioning of data, using caret’s createDataPartition(), can initially be used on the original dataset, it appears that the trainControl()-created trControl variable is only compatible with a caret train() tree or glm derived object, meaning the k-fold cross-validation as implemented in trainControl cannot be applied to a standard logistic regression object. Is this correct?

    Best,
    Charlie
    P.S. Great post

    • Jason Brownlee August 17, 2018 at 7:39 am #

      Not sure I follow. You can use caret to evaluate the model or fit a standalone glm model on all data; you can also prepare data with caret (e.g. split) and fit a glm model manually.

  36. Sandy September 19, 2018 at 4:42 am #

    Very nice explanation of the different methods. Would you happen to know if the Caret package can handle multilevel models using a negative binomial distribution? Thanks~
