How to Save Gradient Boosting Models with XGBoost in Python

XGBoost can be used to create some of the most performant models for tabular data using the gradient boosting algorithm.

Once trained, it is often good practice to save your model to file for later use in making predictions on new test and validation datasets and on entirely new data.

In this post you will discover how to save your XGBoost models to file using the standard Python pickle API.

After completing this tutorial, you will know:

  • How to save and later load your trained XGBoost model using pickle.
  • How to save and later load your trained XGBoost model using joblib.

Discover how to configure, fit, tune and evaluate gradient boosting models with XGBoost in my new book, with 15 step-by-step tutorial lessons, and full Python code.

Let’s get started.

  • Update Jan/2017: Updated to reflect changes in scikit-learn API version 0.18.1.
  • Update Mar/2018: Added alternate link to download the dataset as the original appears to have been taken down.
  • Update Oct/2019: Updated to use Joblib API directly.

Serialize Your XGBoost Model with Pickle

Pickle is the standard way of serializing objects in Python.

You can use the Python pickle API to serialize your machine learning algorithms and save the serialized format to a file, for example:

Later you can load this file to deserialize your model and use it to make new predictions, for example:

The example below demonstrates how you can train an XGBoost model on the Pima Indians onset of diabetes dataset, save the model to file and later load it to make predictions (update: download from here).

The full code listing is provided below for completeness.

Running this example saves your trained XGBoost model to the pima.pickle.dat pickle file in the current working directory.

After loading the model and making predictions on the training dataset, the accuracy of the model is printed.

Serialize Your XGBoost Model with Joblib

Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.

The Joblib API provides utilities for efficiently saving and loading Python objects that make use of NumPy data structures. It may be a faster approach for you to use with very large models.

The API looks a lot like the pickle API, for example, you may save your trained model as follows:

You can later load the model from file and use it to make predictions as follows:

The example below demonstrates how you can train an XGBoost model for classification on the Pima Indians onset of diabetes dataset, save the model to file using Joblib and load it at a later time in order to make predictions.

Running the example saves the model to file as pima.joblib.dat in the current working directory. Depending on your version of Joblib, it may also create one file for each NumPy array within the model (two additional files in this case); recent versions store everything in a single file.

After the model is loaded, it is evaluated on the training dataset and the accuracy of the predictions is printed.

Summary

In this post, you discovered how to serialize your trained XGBoost models and later load them in order to make predictions.

Specifically, you learned:

  • How to serialize and later load your trained XGBoost model using the pickle API.
  • How to serialize and later load your trained XGBoost model using the joblib API.

Do you have any questions about serializing your XGBoost models or about this post? Ask your questions in the comments and I will do my best to answer.

6 Responses to How to Save Gradient Boosting Models with XGBoost in Python

  1. koji June 23, 2018 at 1:18 am #

    Hi, Jason. Thank you for sharing your knowledge and I enjoy reading your posts.
By the way, is there any point to pickle an XGBoost model instead of using something like xgb.Booster(model_file='model.model')?

    Here is my experiment.

%timeit model = xgb.Booster(model_file='model.model')
118 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

pickle.dump(model, open("model.pickle", "wb"))
%timeit loaded_model = pickle.load(open("model.pickle", "rb"))
139 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    I am currently looking for a better way how I use the XGBoost model in the production. My concern is that reading a file might be slow if there are many requests from the client side.

    • Jason Brownlee June 23, 2018 at 6:20 am #

      I don’t think so, it really depends on your project/code.

  2. Ran Feldesh October 6, 2018 at 5:59 am #

    Maybe due to sklearn version, running the code as is results in an error, ‘cross_validation’ is not found. Deleting this (while keeping ‘train_test_split’) and revising the relevant import statement to ‘from sklearn.model_selection import train_test_split’, just like your other XGBoost tutorial, solves this.

    • Jason Brownlee October 6, 2018 at 11:40 am #

You must use scikit-learn version 0.18 or higher.

  3. John November 4, 2018 at 1:56 am #

    Hi Great Post,

It would be great if you can write a tutorial detailing how to convert an XGBoost model to PMML. Some explanation of PMMLPipeline and how to properly use it to generate PMML using sklearn2pmml would be really helpful.
