Compare Models And Select The Best Using The Caret R Package

The Caret R package allows you to easily construct many different model types and tune their parameters.

After creating and tuning many model types, you may want to know which model is best so that you can select it and use it to make predictions, perhaps in an operational environment.

In this post you will discover how to compare the results of multiple models using the caret R package.

Let’s get started.

Compare Machine Learning Models

While working on a problem, you will settle on one or a handful of well-performing models. After tuning the parameters of each, you will want to compare the models and discover which are the best and worst performing.

It is useful to get an idea of the spread of model performance: perhaps one model can be improved, or you can stop working on one that is clearly performing worse than the others.

In the example below we compare three sophisticated machine learning models on the Pima Indians diabetes dataset. This dataset is a summary of a collection of medical reports and indicates the onset of diabetes in the patient within five years.

The dataset is available in the mlbench R package as PimaIndiansDiabetes.

The three models constructed and tuned are Learning Vector Quantization (LVQ), Stochastic Gradient Boosting (also known as Gradient Boosted Machine or GBM), and Support Vector Machine (SVM). Each model is automatically tuned and is evaluated using 3 repeats of 10-fold cross validation.

The random number seed is set before each algorithm is trained to ensure that each algorithm gets the same data partitions and repeats. This allows us to compare apples to apples in the final results. Alternatively, we could ignore this concern and increase the number of repeats to 30 or 100, using randomness to control for variation in the data partitioning.
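The training setup described above can be sketched as follows. This is a reconstruction based on the calls quoted in the comments on this post (for example `train(diabetes~., data=PimaIndiansDiabetes, method="lvq", trControl=control)` and `set.seed(7)`), so treat it as an illustrative sketch rather than the exact original listing. It assumes the caret, mlbench, class, gbm, and kernlab packages are installed.

```r
# load the libraries
library(mlbench)
library(caret)
# load the dataset
data(PimaIndiansDiabetes)
# prepare the training scheme: 3 repeats of 10-fold cross validation
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# train the LVQ model
set.seed(7)
modelLvq <- train(diabetes~., data=PimaIndiansDiabetes, method="lvq", trControl=control)
# train the GBM model (verbose=FALSE silences gbm's progress output)
set.seed(7)
modelGbm <- train(diabetes~., data=PimaIndiansDiabetes, method="gbm", trControl=control, verbose=FALSE)
# train the SVM model with a radial basis kernel
set.seed(7)
modelSvm <- train(diabetes~., data=PimaIndiansDiabetes, method="svmRadial", trControl=control)
```

Resetting the seed immediately before each call to `train()` is what keeps the cross validation folds aligned across the three models.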

Once the models are trained and an optimal parameter configuration found for each, the accuracy results from each of the best models are collected. Each “winning” model has 30 results (3 repeats of 10-fold cross validation). The objective of comparing results is to compare the accuracy distributions (30 values) between the models.

This is done in three ways: the distributions are summarized as percentiles in a table, as box plots, and as dot plots.

Below is the table of results from summarizing the distributions for each model.
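A table like this can be produced with caret's `resamples()` and `summary()` functions. The sketch below is self-contained and retrains the three models; `fitModel` is a hypothetical helper introduced here only to keep the listing short, and the required packages (caret, mlbench, class, gbm, kernlab) are assumed to be installed. No output is shown because the exact numbers depend on your package versions.

```r
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# hypothetical helper: reset the seed, then train one model on the shared scheme
fitModel <- function(method, ...) {
  set.seed(7)
  train(diabetes~., data=PimaIndiansDiabetes, method=method, trControl=control, ...)
}
# collect the resampling results from the best configuration of each model
results <- resamples(list(LVQ=fitModel("lvq"),
                          GBM=fitModel("gbm", verbose=FALSE),
                          SVM=fitModel("svmRadial")))
# summarize the distributions (min, quartiles, median, mean, max per model)
resultsSummary <- summary(results)
print(resultsSummary)
```

The `results$values` data frame holds one row per resample (30 here) and one column per model and metric, such as `LVQ~Accuracy`.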


Box Plot Comparing Model Results using the Caret R Package

Dotplot Comparing Model Results using the Caret R Package
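The two plots can be drawn from the collected resamples with caret's `bwplot()` and `dotplot()` methods. This is a minimal, self-contained sketch under the same assumptions as the rest of the post's setup; `fitModel` is a hypothetical helper, not part of caret.

```r
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# hypothetical helper: reset the seed, then train one model on the shared scheme
fitModel <- function(method, ...) {
  set.seed(7)
  train(diabetes~., data=PimaIndiansDiabetes, method=method, trControl=control, ...)
}
results <- resamples(list(LVQ=fitModel("lvq"),
                          GBM=fitModel("gbm", verbose=FALSE),
                          SVM=fitModel("svmRadial")))
# box and whisker plots of the resampled metrics for each model
boxPlot <- bwplot(results)
print(boxPlot)
# dot plots of the mean of each metric with a confidence interval
dotPlot <- dotplot(results)
print(dotPlot)
```

Wrapping the plot objects in `print()` makes them render when the script is sourced rather than typed at the console.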

If you need to make strong claims about which algorithm is better, you can also use statistical hypothesis tests to show that the differences in the results are significant: for example, a Student's t-test if the results are normally distributed, or a rank sum test if the distribution is unknown.
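One way to run such tests is sketched below. caret's `diff()` method for resamples computes pairwise differences between models and tests them (by default with a t-test and a Bonferroni adjustment). The same idea is then shown by hand for one pair of models. Because all models were evaluated on the same folds, the sketch uses paired tests, which means the rank sum test is swapped for its paired counterpart, the Wilcoxon signed-rank test. The choice of the GBM and SVM pair, and the `fitModel` helper, are assumptions made for illustration.

```r
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# hypothetical helper: reset the seed, then train one model on the shared scheme
fitModel <- function(method, ...) {
  set.seed(7)
  train(diabetes~., data=PimaIndiansDiabetes, method=method, trControl=control, ...)
}
results <- resamples(list(LVQ=fitModel("lvq"),
                          GBM=fitModel("gbm", verbose=FALSE),
                          SVM=fitModel("svmRadial")))
# pairwise differences between models with adjusted p-values
differences <- diff(results)
print(summary(differences))
# the same idea by hand for one pair, using the 30 paired accuracy values
gbmAcc <- results$values[["GBM~Accuracy"]]
svmAcc <- results$values[["SVM~Accuracy"]]
# paired Student's t-test (assumes the paired differences are roughly normal)
tTest <- t.test(gbmAcc, svmAcc, paired=TRUE)
# Wilcoxon signed-rank test, the paired rank-based alternative
wTest <- wilcox.test(gbmAcc, svmAcc, paired=TRUE, exact=FALSE)
print(tTest$p.value)
print(wTest$p.value)
```

A small p-value suggests the difference in accuracy between the two models is unlikely to be due to the data partitioning alone.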


In this post you discovered how you can use the caret R package to compare the results from multiple different models, even after their parameters have been optimized. You saw three ways the results can be compared: as a table, as a box plot, and as a dot plot.

The examples in this post are standalone and you can easily copy-and-paste them into your own project and adapt them for your problem.

60 Responses to Compare Models And Select The Best Using The Caret R Package

  1. anonymous March 26, 2015 at 2:41 am

    Great work

  2. rain1024 April 17, 2015 at 2:58 pm

    great work Jason,

    Thank you so much

  3. Ragy August 5, 2015 at 1:50 pm

    Great reference. Thanks 🙂

  4. Ronnie March 23, 2016 at 11:41 pm

    Can you suggest an R-code if you have 50 models all with specific tuning grids to train, predict and compare?

    I have one but its too slow.

  5. Qasim April 21, 2016 at 11:11 am

    Is this acceptable for non-parametric models? If not , can you suggest any method to select the best model?

  6. Qiaojing November 9, 2016 at 6:35 pm

    Hi, may i know how do you interpret the results at the end where you compare the models (I.e. box plot & dot plot)? Is the best model the one with the highest accuracy in the box plot or the one with smaller box area (less deviance?)?

    Thank you very much!

    • Jason Brownlee November 10, 2016 at 7:41 am

      Great question Qiaojing, and really it is specific to your problem.

      Maybe you are more interested in average performance or more interested in less uncertainty in performance.

  7. Aldo November 18, 2016 at 11:14 pm

    Thanks, excellent post. Very useful!

    • Jason Brownlee November 19, 2016 at 8:46 am

      I’m glad you found it useful Aldo.

      • Jatin June 1, 2017 at 3:16 pm

        Seriously, a great post that helps everyone!!

  8. shuiM December 28, 2016 at 6:52 am

    Nice post! For the purpose of discussion, what would you suggest in a situation where you’ve exhaustively explored feature selection and the top two models have roughly the same performance plots and the t-test gives a large p-value. In that case nothing can be concluded and I can’t find an obvious way to improve them. Would you take a look at the statistically relevant variables of each one? Flip a coin? Would you look at the complexity? Thanks! I really like that you include code in your posts.

  9. pablo January 27, 2017 at 10:38 pm

    Great work! very useful 😀

  10. Calu April 20, 2017 at 3:34 am

    Great Post!

  11. Hans May 5, 2017 at 10:24 am

    I have learned to retrieve specific data from the summary object of this example like so:



    for(i in 1:myIndex){

    How do we generate or retrieve RMSE/MSE values from the summary object?

  12. Hans May 5, 2017 at 10:57 am

    I got the following error, trying to implement own source data in this example:

    > modelLvq <- train(n2~., data=series, method="lvq", trControl=control)
    Error: wrong model type for regression

    What's going wrong?

  13. Hans May 5, 2017 at 11:50 am

    I also get:
    Error in terms.formula(formula, data = data) :
    ‘.’ in formula and no ‘data’ argument

    Is there a hint? Is it because I no more have a #library(mlbench)?
    What else is in mlbench?

    • Hans May 5, 2017 at 10:06 pm

      I try to implement own source data in this example.

      It has a column with a date and a column with numbers from 1 to 30.

      I managed to replace data(PimaIndiansDiabetes) with a read_csv().

      In the train section I say for example:
      modelLvq <- train(n2~., data=series, method="lvq", trControl=control)

      …where n2 is my column header.

      Why do I get so much errors?

      Warning messages:
      1: In .local(x, …) : Variable(s) `' constant. Cannot scale data.
      2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
      There were missing values in resampled performance measures.

      What are the crucial data structure depended parts of this example?
      I can't see any more? Are there some in the closed source mlbench library?

    • Jason Brownlee May 6, 2017 at 7:30 am

      mlbench is an R package:

  14. Ujjayant Sinha May 5, 2017 at 9:57 pm

    r1 = resamples(list(gbm_bc=gbm_bc, ada_yj=ada_yj))

    ## here, gbm: gradient boosting, ada: adaboost


    This returns-

    summary.resamples(object = results_bestmodel)

    Models: logistic_yj, rf_nor, ada_yj, gbm_bc, svm_yj, knn_nor, neural_nor
    Number of resamples: 0

                Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
    logistic_yj   NA      NA     NA  NaN      NA   NA    0
    rf_nor        NA      NA     NA  NaN      NA   NA    0
    ada_yj        NA      NA     NA  NaN      NA   NA    0
    gbm_bc        NA      NA     NA  NaN      NA   NA    0
    svm_yj        NA      NA     NA  NaN      NA   NA    0
    knn_nor       NA      NA     NA  NaN      NA   NA    0
    neural_nor    NA      NA     NA  NaN      NA   NA    0

    What could be the reason behind this ? I had no problem comparing the gbm model to the other models and the adaBoost models to each other.

    • Jason Brownlee May 6, 2017 at 7:44 am

      Perhaps there were some NaN values in your input data?

      • Ujjayant Sinha May 8, 2017 at 4:02 pm

        No, I checked using is.nan(),even is.infinite(). Moreover, there wasn’t any problem comparing the 3 AdaBoost models to each other. Anyway, I will keep looking. Thank you.

  15. Hans May 6, 2017 at 2:48 am

    As far as I understand, the first parameter of train() means…

    use ‘diabetes’ as outcome Y and use all other variables to predict Y.

    A) What if we want to predict Y with Y?

    modelLvq <- train(diabetes~., data=PimaIndiansDiabetes, method="lvq", trControl=control)

    I tried: train(y=dataset$n2, x=dataset[, -which(colnames(dataset) == "n2")],

    …but this leads to the message:

    Something is wrong; all the RMSE metric values are missing.

    B) I also would like to pass an external R-variable to that parameter.
    Would this be possible? I tried paste() and sprintf(). But this doesn't work.
    Till now I only can say it hard coded:

    Any suggestions here?

    • Hans May 6, 2017 at 6:02 am

      Update: Got B) by further research:

      f <- n2~.

    • Jason Brownlee May 6, 2017 at 7:49 am

      I would suggest learning some basic R syntax before diving into caret.

      Perhaps start here:

      • Hans May 6, 2017 at 9:22 pm

        Thank you Jason, the article was wonderful and super easy to understand.

        Now I have a complete pipeline including…

        – an interactive command prompt to choose tests in Python or R with different parameters
        – a database filled with test results
        – an Excel dashboard to analyse and visualize my results.

        Ten different, running experiments with own data are currently available.

        Except from specific ‘algo-questions’, the only general thing I don’t understand is question A) from above.
        I googled a lot. And some say there will one day a unified syntax for the x and y parameter of Carets train() function.

        Do you have some additional insights regarding to this problem, Jason?

        Thank you so far.

  16. Seo young jae May 30, 2017 at 1:56 am

    I have done gradient boosting using your code. For other machine learning techniques, I can divide into train and test data, then use the predict function and the confusionMatrix function to obtain the accuracy. How do I know the accuracy of gradient boosting???? If I use predict and confusionMatrix, I get an error. Is the accuracy shown as AUC, not numeric?

    • Jason Brownlee June 2, 2017 at 12:28 pm

      I would recommend making predictions and calculating the accuracy. Caret can do this for you.

  17. Hans June 27, 2017 at 2:19 am

    What does the function resamples() do in this case?

    • Jason Brownlee June 27, 2017 at 8:33 am

      Try: ?resamples to learn more.

      Or here:

      • Hans June 27, 2017 at 10:58 am

        Hm…the documentation says:
        These functions provide methods for collection, analyzing and visualizing a set of resampling results from a common data set.

        Later on the site you mentioned:
        Given these models, can we make statistical statements about their performance differences? To do this, we first collect the resampling results using resamples.

        So I would say the function ‘resamples()’ is a kind of tool to compare model performances, trained within caret.

  18. Deepti October 10, 2017 at 9:34 pm


    Nice post but when I am running the code for the model train, it is giving me below error, not sure why I am getting the same. Can you please help.

    modelLvq # train the GBM model
    > set.seed(7)
    > modelGbm <- train(diabetes~., data=PimaIndiansDiabetes, method="gbm", trControl=control, verbose=FALSE)

    Error in train(diabetes ~ ., data = PimaIndiansDiabetes, method = "gbm", :
    unused arguments (data = PimaIndiansDiabetes, method = "gbm", trControl = control, verbose = FALSE)

  19. Célia October 24, 2017 at 8:47 pm

    Hi Jason,

    Your post is really helpful. A nice job 🙂
    Thank for all.

    But, do you know the best way to recovery the coefficient of a model fitted by caret ? Because I failed to do this. Like… model$FinalModel$coefficients return NULL for example.

    Thanks in advance,

    Have a good day


  20. Mohammed Barakat January 31, 2018 at 2:57 pm

    Jason, this is such a great piece of work! Thank you.

    I just want to check something. Does this spare us from splitting the data into train/test and validating the models by predicting the testing data? I.e. Do the accuracies ‘resampled’ reflect the in-sample-errors or they also cover the out-of-sample errors? Do we have to do any further validation on testing data?


    • Jason Brownlee February 1, 2018 at 7:15 am

      They are an estimate of the out of sample errors.

  21. Clinton Cooper February 15, 2018 at 4:45 am

    I am trying to run this on a problem that is not classification (ie regression). Is there an equivalent model to LVQ for regression?

    • Jason Brownlee February 15, 2018 at 8:50 am

      I believe there is. Sorry, I don’t have an example on hand.

  22. Jane Kathambi July 23, 2018 at 12:49 am

    Good article,
    Is there a cheat sheet that shows which models perform best for particular data sets, e.g. classification of inseparable versus separable data?
    What about linear regression?

  23. Chris August 11, 2018 at 7:57 am

    How does train select the best sigma and cost parameters if you do specify them, for svmRadial in particular?

    • Jason Brownlee August 12, 2018 at 6:29 am

      It does not, it uses default values. You must use a grid search to find the best hyperparameter values for your dataset.

  24. Joulien August 20, 2018 at 8:06 pm

    Dear Jason,
    I hope you are doing well,

    Please, do you have a comparison for regression models performance?

    I will appreciate your help,

  25. aadil October 14, 2018 at 8:37 am

    when i run it I am getting the following error:

    > results <- resamples(list(decision_tree=fitdt, logistic_regresion=lm.fitlog,random_forest=lm.fitrandom))
    Warning message:
    In resamples.default(list(decision_tree = fitdt, logistic_regresion = lm.fitlog, :
    Some performance measures were not computed for each model: Accuracy, Kappa, MAE, RMSE, Rsquared

  26. Marcelo Falchetti October 30, 2018 at 5:31 am

    This is literally the best site of ML! I can always find useful and simple information! Thanks from Brazil!

  27. Ruth April 28, 2019 at 6:24 am

    Thank you!

  28. Skylar May 15, 2020 at 9:31 am

    Hi Jason,

    Wonderful post! I wonder sometimes the dotplot and boxplot generate different model performance rank, I think it is because dotplot is based on accuracy mean and 95% CI, and boxplot is based on the median, right? If so, which one we should mainly rely on if we really want to choose the best one? Or do you think it is fine to report either plot? Thanks!

    • Jason Brownlee May 15, 2020 at 1:28 pm


      I don’t recall sorry, you might want to check the documentation.

  29. Christian January 27, 2021 at 8:20 am

    Hey Jason, thanks for material you’ve put here. Really valuable.

    I’ve spot-checked several algos and picked the top 2 from their avg accuracy. However, I run them all against my test data and the ones that scored highest actually didn’t perform well with unseen data and the ones that ranked 3,4 and 5 in my results actually performed better with unseen data. which results should be considered more relevant here, the ones you get from train-cv or the ones from looking at my test data?


    • Jason Brownlee January 27, 2021 at 10:38 am

      You’re welcome.

      Perhaps the hold out test set you’re using is small/not representative?

      Perhaps focus on results from repeated k-fold cross-validation for model selection, then fit a final model on all data and start making predictions on new data?
