A Gentle Introduction to XGBoost Loss Functions

By Jason Brownlee on April 14, 2021 in XGBoost

XGBoost is a powerful and popular implementation of the gradient boosting ensemble algorithm.

An important aspect in configuring XGBoost models is the choice of loss function that is minimized during the training of the model.

The loss function must be matched to the predictive modeling problem type, in the same way we must choose appropriate loss functions based on problem types with deep learning neural networks.

In this tutorial, you will discover how to configure loss functions for XGBoost ensemble models.

After completing this tutorial, you will know:

Specifying loss functions used when training XGBoost ensembles is a critical step, much like neural networks.
How to configure XGBoost loss functions for binary and multi-class classification tasks.
How to configure XGBoost loss functions for regression predictive modeling tasks.

Let’s get started.

Tutorial Overview

This tutorial is divided into three parts; they are:

XGBoost and Loss Functions
XGBoost Loss for Classification
XGBoost Loss for Regression

XGBoost and Loss Functions

Extreme Gradient Boosting, or XGBoost for short, is an efficient open-source implementation of the gradient boosting algorithm. As such, XGBoost is an algorithm, an open-source project, and a Python library.

It was initially developed by Tianqi Chen and was described by Chen and Carlos Guestrin in their 2016 paper titled “XGBoost: A Scalable Tree Boosting System.”

It is designed to be both computationally efficient (e.g. fast to execute) and highly effective, perhaps more effective than other open-source implementations.

XGBoost supports a range of different predictive modeling problems, most notably classification and regression.

XGBoost is trained by minimizing the loss of an objective function on a training dataset. As such, the choice of loss function is a critical hyperparameter that is tied directly to the type of problem being solved, much like deep learning neural networks.

The implementation allows the objective function to be specified via the “objective” hyperparameter, and sensible defaults are used that work for most cases.

Nevertheless, there remains some confusion among beginners as to which loss function to use when training XGBoost models.

We will take a closer look at how to configure the loss function for XGBoost in this tutorial.

Before we get started, let’s get set up.

XGBoost can be installed as a standalone library and an XGBoost model can be developed using the scikit-learn API.

The first step is to install the XGBoost library if it is not already installed. This can be achieved using the pip Python package manager on most platforms; for example:

sudo pip install xgboost

You can then confirm that the XGBoost library was installed correctly and can be used by running the following script.

# check xgboost version
import xgboost
print(xgboost.__version__)

Running the script will print the version of the XGBoost library you have installed.

Your version should be the same or higher. If not, you must upgrade your version of the XGBoost library.

1.1.1
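If you would rather check this requirement in code than by eye, a minimal sketch is shown below. The helper names version_tuple and is_recent_enough and the MINIMUM constant are illustrative and are not part of XGBoost. The comparison only looks at the leading numeric parts of the version string, so any pre-release suffix is ignored.

```python
# sketch: compare an installed version string against a minimum version
import re

# minimum version assumed by this tutorial (taken from the output above)
MINIMUM = '1.1.1'

def version_tuple(version):
    # keep only the leading dotted numbers, e.g. '1.3.0rc1' -> (1, 3, 0)
    match = re.match(r'\d+(\.\d+)*', version)
    if match is None:
        raise ValueError('unrecognized version string: %s' % version)
    return tuple(int(part) for part in match.group(0).split('.'))

def is_recent_enough(installed, minimum=MINIMUM):
    # tuples compare element by element, so (1, 10, 0) > (1, 9, 0)
    return version_tuple(installed) >= version_tuple(minimum)

# in practice, pass xgboost.__version__ instead of a literal string
print(is_recent_enough('1.3.3'))
print(is_recent_enough('1.0.1'))
```

Comparing tuples of integers rather than raw strings avoids the trap where '1.10.0' sorts before '1.9.0' alphabetically.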

It is possible that you may have problems with the latest version of the library. It is not your fault.

Sometimes, the most recent version of the library imposes additional requirements or may be less stable.

If you do have errors when trying to run the above script, I recommend downgrading to version 1.0.1 (or lower). This can be achieved by specifying the version to install to the pip command, as follows:

sudo pip install xgboost==1.0.1

If you see a warning message, you can safely ignore it for now. For example, below is a warning message that you may see and can ignore:

FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.

If you require specific instructions for your development environment, see the tutorial:

XGBoost Installation Guide

The XGBoost library has its own custom API, although we will use it via the scikit-learn wrapper classes: XGBRegressor and XGBClassifier. This will allow us to use the full suite of tools from the scikit-learn machine learning library to prepare data and evaluate models.

Both models operate the same way and take the same arguments that influence how the decision trees are created and added to the ensemble.

For more on how to use the XGBoost API with scikit-learn, see the tutorial:

Extreme Gradient Boosting (XGBoost) Ensemble in Python

Next, let’s take a closer look at how to configure the loss function for XGBoost on classification problems.

XGBoost Loss for Classification

Classification tasks involve predicting a label or probability for each possible class, given an input sample.

There are two main types of classification tasks with mutually exclusive labels: binary classification that has two class labels, and multi-class classification that has more than two class labels.

Binary Classification: Classification task with two class labels.
Multi-Class Classification: Classification task with more than two class labels.

For more on the different types of classification tasks, see the tutorial:

4 Types of Classification Tasks in Machine Learning

XGBoost provides loss functions for each of these problem types.

It is typical in machine learning to train a classification model to predict the probability of class membership and, if the task requires crisp class labels, to post-process the predicted probabilities (e.g. use argmax).

This approach is used when training deep learning neural networks for classification, and is also recommended when using XGBoost for classification.
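As a small illustration of that post-processing step, the sketch below converts predicted probabilities into crisp class labels. The probability values are made up for illustration and do not come from a fitted model, and the 0.5 threshold used for the binary case is a common convention rather than a requirement.

```python
# sketch: convert predicted probabilities into crisp class labels

def argmax(values):
    # index of the largest value in a list
    return max(range(len(values)), key=lambda i: values[i])

# hypothetical multi-class probabilities, one row per sample, one column per class
multiclass_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
# the predicted label is the class with the highest probability
multiclass_labels = [argmax(row) for row in multiclass_probs]
print(multiclass_labels)

# hypothetical binary probabilities of the positive class (class 1)
binary_probs = [0.9, 0.4, 0.55]
# a threshold of 0.5 is a common default, assumed here
binary_labels = [1 if p >= 0.5 else 0 for p in binary_probs]
print(binary_labels)
```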

The loss function used for predicting probabilities for binary classification problems is “binary:logistic” and the loss function for predicting class probabilities for multi-class problems is “multi:softprob”.

“binary:logistic”: XGBoost loss function for binary classification.
“multi:softprob”: XGBoost loss function for multi-class classification.
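To give a feel for what these two objectives measure, here is a minimal pure-Python sketch of the underlying loss: the log loss (cross-entropy) between the true label and the predicted probability. The function names and sample values are illustrative assumptions, and this is not XGBoost's internal implementation, which works with derivatives of the loss computed on raw scores.

```python
# sketch: the log loss behind binary:logistic and multi:softprob
from math import exp, log

def sigmoid(score):
    # map a raw score (log-odds) to a probability in (0, 1)
    return 1.0 / (1.0 + exp(-score))

def softmax(scores):
    # map one raw score per class to probabilities that sum to 1
    largest = max(scores)
    exps = [exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def binary_log_loss(y_true, prob):
    # y_true is 0 or 1, prob is the predicted probability of class 1
    return -(y_true * log(prob) + (1 - y_true) * log(1.0 - prob))

def multiclass_log_loss(y_true, probs):
    # y_true is the integer class label, probs has one probability per class
    return -log(probs[y_true])

# a confident prediction of class 1 gives a small loss when it is correct
print(binary_log_loss(1, sigmoid(2.0)))
# the same confident prediction is penalized more heavily when it is wrong
print(binary_log_loss(0, sigmoid(2.0)))
# three-class example where the true class is 2
print(multiclass_log_loss(2, softmax([0.5, 1.0, 3.0])))
```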

These string values can be specified via the “objective” hyperparameter when configuring your XGBClassifier model.

For example, for binary classification:

...
# define the model for binary classification
model = XGBClassifier(objective='binary:logistic')

And, for multi-class classification:

...
# define the model for multi-class classification
model = XGBClassifier(objective='multi:softprob')

Importantly, if you do not specify the “objective” hyperparameter, the XGBClassifier will automatically choose one of these loss functions based on the data provided during training.
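The rule of thumb behind this automatic choice can be sketched in a few lines of plain Python. The helper choose_objective below is hypothetical and only mirrors the behavior described above (two classes versus more than two); it is not the library's actual code.

```python
# sketch: pick a classification objective from the number of distinct labels

def choose_objective(labels):
    # count the distinct class labels in the target
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError('need at least two classes for classification')
    # two classes -> binary objective, more than two -> multi-class objective
    return 'binary:logistic' if n_classes == 2 else 'multi:softprob'

print(choose_objective([0, 1, 1, 0]))
print(choose_objective([0, 1, 2, 1, 0]))
```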

We can make this concrete with a worked example.

The example below creates a synthetic binary classification dataset, fits an XGBClassifier on the dataset with default hyperparameters, then prints the model objective configuration.

# example of automatically choosing the loss function for binary classification
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define the model
model = XGBClassifier()
# fit the model
model.fit(X, y)
# summarize the model loss function
print(model.objective)

Running the example fits the model on the dataset and prints the loss function configuration.

We can see the model automatically chose a loss function for binary classification.

binary:logistic

Alternately, we can specify the objective and fit the model, confirming the loss function was used.

# example of manually specifying the loss function for binary classification
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define the model
model = XGBClassifier(objective='binary:logistic')
# fit the model
model.fit(X, y)
# summarize the model loss function
print(model.objective)

Running the example fits the model on the dataset and prints the loss function configuration.

We can see the model used the specified loss function for binary classification.

binary:logistic

Let’s repeat this example on a dataset with more than two classes. In this case, three classes.

The complete example is listed below.

# example of automatically choosing the loss function for multi-class classification
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1, n_classes=3)
# define the model
model = XGBClassifier()
# fit the model
model.fit(X, y)
# summarize the model loss function
print(model.objective)

Running the example fits the model on the dataset and prints the loss function configuration.

We can see the model automatically chose a loss function for multi-class classification.

multi:softprob

Alternately, we can manually specify the loss function and confirm it was used to train the model.

# example of manually specifying the loss function for multi-class classification
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1, n_classes=3)
# define the model
model = XGBClassifier(objective="multi:softprob")
# fit the model
model.fit(X, y)
# summarize the model loss function
print(model.objective)

Running the example fits the model on the dataset and prints the loss function configuration.

We can see the model used the specified loss function for multi-class classification.

multi:softprob

Finally, there are other loss functions you can use for classification, including: “binary:logitraw” and “binary:hinge” for binary classification and “multi:softmax” for multi-class classification.

You can see a full list here:

Learning Task Parameters: objective.
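To show how the binary alternatives differ, the sketch below works through the arithmetic in plain Python. According to the XGBoost documentation, “binary:logitraw” outputs the raw score before the logistic transformation, and “binary:hinge” uses the hinge loss and predicts 0 or 1 rather than a probability. The function names and the sample score are assumptions for illustration, and the hinge loss is written in its textbook form with labels mapped to -1 and +1.

```python
# sketch: raw scores, probabilities, and the hinge loss for binary classification
from math import exp

def sigmoid(score):
    # logistic transformation from a raw score to a probability
    return 1.0 / (1.0 + exp(-score))

def hinge_loss(y_true, score):
    # map the 0/1 label to -1/+1, then penalize scores that do not clear the margin
    y_signed = 2 * y_true - 1
    return max(0.0, 1.0 - y_signed * score)

# hypothetical raw score for one sample
raw_score = 0.8
# a logitraw-style output is the raw score; the sigmoid turns it into a probability
print(raw_score, sigmoid(raw_score))
# a hinge-style prediction is a crisp label taken from the sign of the score
print(1 if raw_score > 0 else 0)
# hinge loss is zero only once the score clears the margin of 1 on the correct side
print(hinge_loss(1, raw_score), hinge_loss(1, 1.5), hinge_loss(0, raw_score))
```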

Next, let’s take a look at XGBoost loss functions for regression.

XGBoost Loss for Regression

Regression refers to predictive modeling problems where a numerical value is predicted given an input sample.

Although predicting a probability sounds like a regression problem (i.e. a probability is a numerical value), it is generally not considered a regression type predictive modeling problem.

The XGBoost objective function used when predicting numerical values is the “reg:squarederror” loss function.

“reg:squarederror”: Loss function for regression predictive modeling problems.
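As a rough sketch of what this objective computes, the example below evaluates the squared error for a few predictions, along with its first and second derivatives with respect to the prediction, which are the kind of quantities a gradient boosting library works with when fitting each new tree. The values are made up, and the 1/2 scaling is a common convention assumed here rather than something stated in this tutorial.

```python
# sketch: squared error loss with its gradient and hessian

def squared_error(y_true, y_pred):
    # half the squared difference, so the derivative has no factor of 2
    return 0.5 * (y_pred - y_true) ** 2

def gradient(y_true, y_pred):
    # first derivative of the loss with respect to the prediction
    return y_pred - y_true

def hessian(y_true, y_pred):
    # second derivative of the loss is constant for squared error
    return 1.0

# hypothetical targets and predictions
targets = [3.0, -1.0, 2.5]
predictions = [2.0, -1.0, 4.5]
for y, yhat in zip(targets, predictions):
    print(y, yhat, squared_error(y, yhat), gradient(y, yhat), hessian(y, yhat))
```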

This string value can be specified via the “objective” hyperparameter when configuring your XGBRegressor model.

For example:

...
# define the model for regression
model = XGBRegressor(objective='reg:squarederror')

Importantly, if you do not specify the “objective” hyperparameter, the XGBRegressor will automatically choose this objective function for you.

We can make this concrete with a worked example.

The example below creates a synthetic regression dataset, fits an XGBRegressor on the dataset, then prints the model objective configuration.

# example of automatically choosing the loss function for regression
from sklearn.datasets import make_regression
from xgboost import XGBRegressor
# define dataset
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=7)
# define the model
model = XGBRegressor()
# fit the model
model.fit(X, y)
# summarize the model loss function
print(model.objective)

Running the example fits the model on the dataset and prints the loss function configuration.

We can see the model automatically chose a loss function for regression.

reg:squarederror

Alternately, we can specify the objective and fit the model, confirming the loss function was used.

# example of manually specifying the loss function for regression
from sklearn.datasets import make_regression
from xgboost import XGBRegressor
# define dataset
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1, random_state=7)
# define the model
model = XGBRegressor(objective='reg:squarederror')
# fit the model
model.fit(X, y)
# summarize the model loss function
print(model.objective)

Running the example fits the model on the dataset and prints the loss function configuration.

We can see the model used the specified loss function for regression.

reg:squarederror

Finally, there are other loss functions you can use for regression, including: “reg:squaredlogerror”, “reg:logistic”, “reg:pseudohubererror”, “reg:gamma”, and “reg:tweedie”.

You can see a full list here:

Learning Task Parameters: objective.
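To see how some of these alternatives behave, the sketch below compares the squared error with the squared log error and the Pseudo-Huber loss on one small and one large miss. The formulas are the standard textbook definitions, the delta value of 1.0 and the sample values are assumptions for illustration, and XGBoost's internal scaling may differ.

```python
# sketch: compare three regression losses on a small and a large residual
from math import log1p, sqrt

def squared_error(y_true, y_pred):
    # grows quadratically with the size of the miss
    return 0.5 * (y_pred - y_true) ** 2

def squared_log_error(y_true, y_pred):
    # compares values on a log scale; both values must be greater than -1
    return 0.5 * (log1p(y_pred) - log1p(y_true)) ** 2

def pseudo_huber(y_true, y_pred, delta=1.0):
    # roughly quadratic for small residuals and roughly linear for large ones
    residual = y_pred - y_true
    return delta ** 2 * (sqrt(1.0 + (residual / delta) ** 2) - 1.0)

# a small miss and a large miss (e.g. an outlier) on the same target
for y, yhat in [(10.0, 11.0), (10.0, 110.0)]:
    print(y, yhat, squared_error(y, yhat), squared_log_error(y, yhat), pseudo_huber(y, yhat))
```

The large miss inflates the squared error far more than it does the other two losses, which is one reason they may be preferred when the target contains outliers or spans several orders of magnitude.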

Summary

In this tutorial, you discovered how to configure loss functions for XGBoost ensemble models.

Specifically, you learned:

Specifying loss functions used when training XGBoost ensembles is a critical step, much like neural networks.
How to configure XGBoost loss functions for binary and multi-class classification tasks.
How to configure XGBoost loss functions for regression predictive modeling tasks.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

5 Responses to A Gentle Introduction to XGBoost Loss Functions

Anthony The Koala March 26, 2021 at 9:24 am #

Dear Dr Jason,
The latest xgboost version is 1.3.3

import xgboost
xgboost.__version__
'1.3.3'

Thank you
Anthony of Sydney

- Jason Brownlee March 29, 2021 at 5:48 am #
  
  Nice work.
  
Mahdi April 13, 2021 at 3:55 pm #

Hi Jason.

This is such a great and informative post. A few questions/comments:

1. In the snippet below, should the first line say “binary:logistic” instead of “multi:logistic”?

“multi:logistic“: XGBoost loss function for binary classification.
“multi:softprob“: XGBoost loss function for multi-class classification.

2. Also, in the snippet below, the second line should read “multi:softprob” instead of “binary:softprob”?

# define the model for multi-class classification
model = XGBClassifier(objective=’binary:softprob’)

Thank you!
Mahdi

- Jason Brownlee April 14, 2021 at 6:20 am #
  
  Thanks.
  
  Fixed!
  
Sergio July 2, 2022 at 8:47 pm #

Hi, do you use any particular function for binary/multi-class multilabel tasks? Thank you.
