How to Develop Multi-Output Regression Models with Python

By Jason Brownlee on April 27, 2021 in Ensemble Learning

Multioutput regression refers to regression problems that involve predicting two or more numerical values given an input example.

An example might be to predict a coordinate given an input, e.g. predicting x and y values. Another example would be multi-step time series forecasting that involves predicting multiple future time steps of a given variable.

Many machine learning algorithms are designed for predicting a single numeric value, referred to simply as regression. Some algorithms do support multioutput regression inherently, such as linear regression and decision trees. There are also special workaround models that can be used to wrap and use those algorithms that do not natively support predicting multiple outputs.

In this tutorial, you will discover how to develop machine learning models for multioutput regression.

After completing this tutorial, you will know:

The problem of multioutput regression in machine learning.
How to develop machine learning models that inherently support multiple-output regression.
How to develop wrapper models that allow algorithms that do not inherently support multiple outputs to be used for multiple-output regression.

Let’s get started.

Updated Aug/2020: Elaborated examples of wrapper models.

How to Develop Multioutput Regression Models in Python
Photo by a_terracini, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

Problem of Multioutput Regression
1. Check Scikit-Learn Version
2. Multioutput Regression Test Problem
Inherently Multioutput Regression Algorithms
1. Linear Regression for Multioutput Regression
2. k-Nearest Neighbors for Multioutput Regression
3. Decision Tree for Multioutput Regression
4. Evaluate Multioutput Regression With Cross-Validation
Wrapper Multioutput Regression Algorithms
Direct Multioutput Regression
Chained Multioutput Regression

Problem of Multioutput Regression

Regression refers to a predictive modeling problem that involves predicting a numerical value.

For example, predicting a size, weight, amount, number of sales, and number of clicks are regression problems. Typically, a single numeric value is predicted given input variables.

Some regression problems require the prediction of two or more numeric values. For example, predicting an x and y coordinate.

These problems are referred to as multiple-output regression, or multioutput regression.

Regression: Predict a single numeric output given an input.
Multioutput Regression: Predict two or more numeric outputs given an input.

In multioutput regression, the outputs typically depend upon the input and often upon each other. As a result, the problem may require a model that predicts all outputs together, or one that predicts each output contingent upon the other outputs.

Multi-step time series forecasting may be considered a type of multiple-output regression where a sequence of future values is predicted and each predicted value is dependent upon the prior values in the sequence.
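
To make the time series framing concrete, here is a minimal sketch that turns a univariate series into a multioutput regression dataset. The helper name series_to_supervised and the window sizes are illustrative assumptions and are not used elsewhere in this tutorial.

```python
# frame a univariate series as a multioutput regression problem (illustrative sketch)
from numpy import arange
from numpy import array

# hypothetical helper: n_in past values become inputs, n_out future values become outputs
def series_to_supervised(series, n_in, n_out):
    X, y = list(), list()
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return array(X), array(y)

# a toy series of ten values
series = arange(10)
# three input time steps, two output time steps
X, y = series_to_supervised(series, n_in=3, n_out=2)
# summarize dataset
print(X.shape, y.shape)
print(X[0], y[0])
```

With ten values, three input steps and two output steps, this produces six rows, each with three inputs and two numeric outputs. The first row maps the values 0, 1, 2 to the values 3, 4.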

There are a number of strategies for handling multioutput regression and we will explore some of them in this tutorial.

Check Scikit-Learn Version

First, confirm that you have a modern version of the scikit-learn library installed.

This is important because some of the models we will explore in this tutorial require a modern version of the library.

You can check the version of the library with the following code example:

# check scikit-learn version
import sklearn
print(sklearn.__version__)

Running the example will print the version of the library.

At the time of writing, this is about version 0.22. You need to be using this version of scikit-learn or higher.

0.22.1

Multioutput Regression Test Problem

We can define a test problem that we can use to demonstrate the different modeling strategies.

We will use the make_regression() function to create a test dataset for multiple-output regression. We will generate 1,000 examples with 10 input features, five of which will be redundant and five that will be informative. The problem will require the prediction of two numeric values.

Problem Input: 10 numeric variables.
Problem Output: 2 numeric variables.

The example below generates the dataset and summarizes the shape.

# example of multioutput regression test problem
from sklearn.datasets import make_regression
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# summarize dataset
print(X.shape, y.shape)

Running the example creates the dataset and summarizes the shape of the input and output elements of the dataset for modeling, confirming the chosen configuration.

(1000, 10) (1000, 2)
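
Because the choice of modeling strategy depends partly on whether the outputs are related to each other, it can be useful to check this on the data. The sketch below is one simple way to do so, using the Pearson correlation between the two output columns; the choice of correlation as the measure is an assumption made here for illustration.

```python
# check how strongly the two outputs are related (illustrative sketch)
from numpy import corrcoef
from sklearn.datasets import make_regression
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# Pearson correlation between the first and second output column
corr = corrcoef(y[:, 0], y[:, 1])[0, 1]
# summarize
print('Correlation between outputs: %.3f' % corr)
```

A value near zero would suggest the outputs are close to linearly unrelated, while a value near 1 or -1 would suggest a strong linear relationship between them.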

Next, let’s look at modeling this problem directly.

Inherently Multioutput Regression Algorithms

Some regression machine learning algorithms support multiple outputs directly.

This includes most of the popular machine learning algorithms implemented in the scikit-learn library, such as:

LinearRegression (and related)
KNeighborsRegressor
DecisionTreeRegressor
RandomForestRegressor (and related)

Let’s look at a few examples to make this concrete.

Linear Regression for Multioutput Regression

The example below fits a linear regression model on the multioutput regression dataset, then makes a single prediction with the fit model.

# linear regression for multioutput regression
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = LinearRegression()
# fit model
model.fit(X, y)
# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
# summarize prediction
print(yhat[0])

Running the example fits the model and then makes a prediction for one input, confirming that the model predicted two required values.

[50.06781717 64.564973  ]

k-Nearest Neighbors for Multioutput Regression

The example below fits a k-nearest neighbors model on the multioutput regression dataset, then makes a single prediction with the fit model.

# k-nearest neighbors for multioutput regression
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = KNeighborsRegressor()
# fit model
model.fit(X, y)
# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
# summarize prediction
print(yhat[0])

Running the example fits the model and then makes a prediction for one input, confirming that the model predicted two required values.

[-11.73511093  52.78406297]

Decision Tree for Multioutput Regression

The example below fits a decision tree model on the multioutput regression dataset, then makes a single prediction with the fit model.

# decision tree for multioutput regression
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = DecisionTreeRegressor()
# fit model
model.fit(X, y)
# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
# summarize prediction
print(yhat[0])

Running the example fits the model and then makes a prediction for one input, confirming that the model predicted two required values.

[49.93137149 64.08484989]
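
RandomForestRegressor, also in the list of inherently multioutput algorithms above, can be used in the same way. The sketch below follows the pattern of the previous examples; fixing random_state=1 on the model is an assumption added here so that repeated runs give the same result.

```python
# random forest for multioutput regression (illustrative sketch)
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = RandomForestRegressor(random_state=1)
# fit model
model.fit(X, y)
# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
# summarize prediction
print(yhat[0])
```

As with the other models, the prediction for one input row should contain two values, one for each output.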

Evaluate Multioutput Regression With Cross-Validation

We may want to evaluate a multioutput regression using k-fold cross-validation.

This can be achieved in the same way as evaluating any other machine learning model.

We will fit and evaluate a DecisionTreeRegressor model on the test problem using 10-fold cross-validation with three repeats. We will use the mean absolute error (MAE) performance metric as the score.

The complete example is listed below.

# evaluate multioutput regression model with k-fold cross-validation
from numpy import absolute
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = DecisionTreeRegressor()
# define the evaluation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the scores
n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force the scores to be positive
n_scores = absolute(n_scores)
# summarize performance
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Running the example evaluates the performance of the decision tree model for multioutput regression on the test problem. The mean and standard deviation of the MAE are reported, calculated across all folds and all repeats.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Importantly, error is reported across both output variables, rather than separate error scores for each output variable.

MAE: 51.817 (2.863)
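
If a separate error score for each output variable is wanted, one possible approach is to collect out-of-fold predictions and score each output column on its own. The sketch below assumes a single 10-fold split (rather than the repeated procedure above) and uses the multioutput='raw_values' argument of mean_absolute_error(); the fixed random_state on the model is also an assumption added for repeatability.

```python
# per-output MAE for a multioutput regression model (illustrative sketch)
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_absolute_error
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define model
model = DecisionTreeRegressor(random_state=1)
# collect an out-of-fold prediction for every row
cv = KFold(n_splits=10, shuffle=True, random_state=1)
yhat = cross_val_predict(model, X, y, cv=cv)
# one MAE per output column
per_output = mean_absolute_error(y, yhat, multioutput='raw_values')
# a single MAE averaged uniformly over the outputs (the default behavior)
overall = mean_absolute_error(y, yhat)
# summarize performance
for i, score in enumerate(per_output):
    print('Output %d MAE: %.3f' % (i, score))
print('Overall MAE: %.3f' % overall)
```

The overall score is the uniform average of the per-output scores, which is the kind of single number reported by the cross-validation example above.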

Wrapper Multioutput Regression Algorithms

Not all regression algorithms support multioutput regression.

One example is the support vector machine, although for regression, it is referred to as support vector regression, or SVR.

This algorithm does not support multiple outputs for a regression problem and will raise an error. We can demonstrate this with an example, listed below.

# failure of support vector regression for multioutput regression (causes an error)
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = LinearSVR()
# fit model
# (THIS WILL CAUSE AN ERROR!)
model.fit(X, y)

Running the example reports an error message indicating that the model does not support multioutput regression.

ValueError: bad input shape (1000, 2)
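
The exact wording of the message may differ between scikit-learn versions. A small sketch for confirming the behavior without halting a script is to catch the error, as below; the failed flag is a name introduced here only for illustration.

```python
# confirm that LinearSVR rejects a two-column target (illustrative sketch)
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = LinearSVR()
# attempt to fit the model and report the error instead of crashing
failed = False
try:
    model.fit(X, y)
except ValueError as e:
    failed = True
    print('ValueError: %s' % e)
```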

A workaround for applying single-output regression models to multioutput regression is to divide the multioutput regression problem into multiple sub-problems.

The most obvious way to do this is to split a multioutput regression problem into multiple single-output regression problems.

For example, if a multioutput regression problem required the prediction of three values y1, y2 and y3 given an input X, then this could be partitioned into three single-output regression problems:

Problem 1: Given X, predict y1.
Problem 2: Given X, predict y2.
Problem 3: Given X, predict y3.
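
The partitioning above can be sketched by hand before reaching for any wrapper class. The example below is a minimal illustration on the two-output test problem, fitting one LinearSVR per output column and stacking the predictions side by side; the fixed random_state on the model is an assumption added for repeatability.

```python
# manual split of a multioutput problem into single-output problems (illustrative sketch)
from numpy import column_stack
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# fit one independent model per output column
models = list()
for i in range(y.shape[1]):
    model = LinearSVR(random_state=1)
    model.fit(X, y[:, i])
    models.append(model)
# make a prediction with each model and join the results into one row
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = column_stack([model.predict([row]) for model in models])
# summarize prediction
print(yhat[0])
```

Each model sees the same inputs but only one output column, so no information about the other output is used when making a prediction.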

There are two main approaches to implementing this technique.

The first approach involves developing a separate regression model for each output value to be predicted. We can think of this as a direct approach, as each target value is modeled directly.

The second approach is an extension of the first method except the models are organized into a chain. The prediction from the first model is taken as part of the input to the second model, and the process of output-to-input dependency repeats along the chain of models.

Direct Multioutput: Develop an independent model for each numerical value to be predicted.
Chained Multioutput: Develop a sequence of dependent models to match the number of numerical values to be predicted.

Let’s take a closer look at each of these techniques in turn.

Direct Multioutput Regression

The direct approach to multioutput regression involves dividing the regression problem into a separate problem for each target variable to be predicted.

This assumes that the outputs are independent of each other, which might not be a correct assumption. Nevertheless, this approach can provide surprisingly effective predictions on a range of problems and may be worth trying, at least as a performance baseline.

For example, the outputs for your problem may, in fact, be mostly independent, if not completely independent, and this strategy can help you find out.

This approach is supported by the MultiOutputRegressor class that takes a regression model as an argument. It will then create one instance of the provided model for each output in the problem.

The example below demonstrates how we can first create a single-output regression model then use the MultiOutputRegressor class to wrap the regression model and add support for multioutput regression.

...
# define base model
model = LinearSVR()
# define the direct multioutput wrapper model
wrapper = MultiOutputRegressor(model)

We can demonstrate this strategy with a worked example on our synthetic multioutput regression problem.

The example below demonstrates evaluating the MultiOutputRegressor class with linear SVR using repeated k-fold cross-validation and reporting the average mean absolute error (MAE) across all folds and repeats.

The complete example is listed below.

# example of evaluating direct multioutput regression with an SVM model
from numpy import mean
from numpy import std
from numpy import absolute
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define base model
model = LinearSVR()
# define the direct multioutput wrapper model
wrapper = MultiOutputRegressor(model)
# define the evaluation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the scores
n_scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force the scores to be positive
n_scores = absolute(n_scores)
# summarize performance
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Running the example reports the mean and standard deviation of the MAE of the direct wrapper model.

In this case, we can see that the Linear SVR model wrapped by the direct multioutput regression strategy achieved an MAE of about 0.419.

MAE: 0.419 (0.024)

We can also use the direct multioutput regression wrapper as a final model and make predictions on new data.

First, the model is fit on all available data, then the predict() function can be called to make predictions on new data.

The example below demonstrates this on our synthetic multioutput regression dataset.

# example of making a prediction with the direct multioutput regression model
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define base model
model = LinearSVR()
# define the direct multioutput wrapper model
wrapper = MultiOutputRegressor(model)
# fit the model on the whole dataset
wrapper.fit(X, y)
# make a single prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = wrapper.predict([row])
# summarize the prediction
print('Predicted: %s' % yhat[0])

Running the example fits the direct wrapper model on the entire dataset and then uses it to make a prediction on a new row of data, as we might when using the model in an application.

Predicted: [50.01932887 64.49432991]
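
As noted earlier, the MultiOutputRegressor class creates one instance of the base model for each output. After fitting, these copies can be inspected via the estimators_ attribute, as in the sketch below; the fixed random_state on the base model is an assumption added for repeatability.

```python
# inspect the per-output models inside a fitted MultiOutputRegressor (illustrative sketch)
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define and fit the direct multioutput wrapper model
wrapper = MultiOutputRegressor(LinearSVR(random_state=1))
wrapper.fit(X, y)
# one fitted estimator per output
print('Number of fitted models: %d' % len(wrapper.estimators_))
# each fitted LinearSVR holds its own coefficients for the ten inputs
for i, estimator in enumerate(wrapper.estimators_):
    print('Output %d: %d coefficients' % (i, estimator.coef_.size))
```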

Now that we are familiar with using the direct multioutput regression wrapper, let’s look at the chained method.

Chained Multioutput Regression

Another approach to using single-output regression models for multioutput regression is to create a linear sequence of models.

The first model in the sequence uses the input and predicts one output; the second model uses the input and the output from the first model to make a prediction; the third model uses the input and output from the first two models to make a prediction, and so on.

For example, if a multioutput regression problem required the prediction of three values y1, y2 and y3 given an input X, then this could be partitioned into three dependent single-output regression problems as follows:

Problem 1: Given X, predict y1.
Problem 2: Given X and yhat1, predict y2.
Problem 3: Given X, yhat1, and yhat2, predict y3.
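
The chain above can be sketched by hand for the two-output test problem. In this minimal illustration, the second model is fit on the inputs plus the true first output, and at prediction time the first model's prediction takes that place. Fitting on the true values is a design choice made for this sketch, and the fixed random_state is an assumption added for repeatability.

```python
# manual two-model chain for multioutput regression (illustrative sketch)
from numpy import column_stack
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# model 1: given X, predict the first output
model1 = LinearSVR(random_state=1)
model1.fit(X, y[:, 0])
# model 2: given X and the first output, predict the second output
X_aug = column_stack((X, y[:, 0]))
model2 = LinearSVR(random_state=1)
model2.fit(X_aug, y[:, 1])
# make a prediction for a new row, feeding the first prediction into the second model
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat1 = model1.predict([row])
row_aug = column_stack(([row], yhat1))
yhat2 = model2.predict(row_aug)
yhat = column_stack((yhat1, yhat2))
# summarize prediction
print('Predicted: %s' % yhat[0])
```

You may see a ConvergenceWarning from the second model, since the appended output column is on a much larger scale than the other inputs.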

This can be achieved using the RegressorChain class in the scikit-learn library.

The order of the models may be based on the order of the outputs in the dataset (the default) or specified via the “order” argument. For example, order=[0,1] would first predict the 0th output, then the 1st output, whereas order=[1,0] would first predict the last output variable and then the first output variable in our test problem.

The example below demonstrates how we can first create a single-output regression model then use the RegressorChain class to wrap the regression model and add support for multioutput regression.

...
# define base model
model = LinearSVR()
# define the chained multioutput wrapper model
wrapper = RegressorChain(model, order=[0,1])

We can demonstrate this strategy with a worked example on our synthetic multioutput regression problem.

The example below demonstrates evaluating the RegressorChain class with linear SVR using repeated k-fold cross-validation and reporting the average mean absolute error (MAE) across all folds and repeats.

The complete example is listed below.

# example of evaluating chained multioutput regression with an SVM model
from numpy import mean
from numpy import std
from numpy import absolute
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define base model
model = LinearSVR()
# define the chained multioutput wrapper model
wrapper = RegressorChain(model)
# define the evaluation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the scores
n_scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force the scores to be positive
n_scores = absolute(n_scores)
# summarize performance
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Running the example reports the mean and standard deviation of the MAE of the chained wrapper model.
Note that you may see a ConvergenceWarning when running the example, which can be safely ignored.

In this case, we can see that the Linear SVR model wrapped by the chained multioutput regression strategy achieved an MAE of about 0.643.

MAE: 0.643 (0.313)
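
Because the chain order can affect the result, it may be worth evaluating more than one ordering. The sketch below compares order=[0,1] with order=[1,0] using the same evaluation procedure; the helper name evaluate_order and the fixed random_state on the base model are assumptions introduced here for illustration.

```python
# compare two chain orders for chained multioutput regression (illustrative sketch)
from numpy import mean
from numpy import std
from numpy import absolute
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

# hypothetical helper: evaluate a chain with the given output order
def evaluate_order(order):
    wrapper = RegressorChain(LinearSVR(random_state=1), order=order)
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv)
    # force the scores to be positive
    return absolute(scores)

# evaluate each order and summarize performance
results = dict()
for order in ([0, 1], [1, 0]):
    n_scores = evaluate_order(order)
    results[tuple(order)] = n_scores
    print('order=%s MAE: %.3f (%.3f)' % (order, mean(n_scores), std(n_scores)))
```

As before, a ConvergenceWarning may be shown, and results may vary between runs and library versions.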

We can also use the chained multioutput regression wrapper as a final model and make predictions on new data.

First, the model is fit on all available data, then the predict() function can be called to make predictions on new data.

The example below demonstrates this on our synthetic multioutput regression dataset.

# example of making a prediction with the chained multioutput regression model
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
# define base model
model = LinearSVR()
# define the chained multioutput wrapper model
wrapper = RegressorChain(model)
# fit the model on the whole dataset
wrapper.fit(X, y)
# make a single prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = wrapper.predict([row])
# summarize the prediction
print('Predicted: %s' % yhat[0])

Running the example fits the chained wrapper model on the entire dataset and then uses it to make a prediction on a new row of data, as we might when using the model in an application.

Predicted: [50.03206    64.73673318]

Summary

In this tutorial, you discovered how to develop machine learning models for multioutput regression.

Specifically, you learned:

The problem of multioutput regression in machine learning.
How to develop machine learning models that inherently support multiple-output regression.
How to develop wrapper models that allow algorithms that do not inherently support multiple outputs to be used for multiple-output regression.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

232 Responses to How to Develop Multi-Output Regression Models with Python

Patrick March 27, 2020 at 2:29 pm #

Thank you for this post. I was not aware that scikit-learn had those wrapper classes. That is very handy.

Thanks for showing how to use them in a very clear, straightforward way.

- Jason Brownlee March 28, 2020 at 6:11 am #
  
  You’re welcome.
  
Salvatore Parisi March 28, 2020 at 2:46 am #

Seriously interested.

- Jason Brownlee March 28, 2020 at 6:26 am #
  
  Thanks.
  
Asmaa March 30, 2020 at 10:07 pm #

It is just amazing. Thank you

- Jason Brownlee March 31, 2020 at 8:08 am #
  
  You’re very welcome!
  
Yuchuan March 31, 2020 at 2:21 pm #

Hi Dr. Jason,

Thanks for the meaningful tutorial article, it helps me a lot.

I just tested and found all of the code works well in sklearn 0.20, so we don’t necessarily need to update to 0.22 or higher.

Thanks again for the excellent article, looking forward to the next one.

- Jason Brownlee April 1, 2020 at 5:44 am #
  
  Thanks, great tip!
  
Gary March 31, 2020 at 3:21 pm #

Hi Dr. Brownlee,

Thanks for the excellent work!

I have a question, can the wrapped model (either using MultiOutputRegressor or RegressorChain ) work with cross_val_score function? I tested the following code and found the score is 0.0, is there anything wrong with that?

from numpy import absolute
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = LinearSVR()
wrapper = MultiOutputRegressor(model)
# evaluate model
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')
# summarize performance
n_scores = absolute(n_scores)
print('Result: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Thank you in advance!

- Jason Brownlee April 1, 2020 at 5:47 am #
  
  Yes.
  
  A score of 0 means perfect predictions.
  
Staffan Falk April 2, 2020 at 8:32 am #

Thanks for a really good and pedagogical tutorial. Fits my need exactly! 🙂

- Jason Brownlee April 2, 2020 at 8:36 am #
  
  Thanks, I’m very happy to hear that!
  
Francisco April 3, 2020 at 3:04 am #

Hi Jason!

Excellent tutorial. Is it possible to apply a Transformed Target Regressor with this multi-output regression models?

Thanks in advance!

Reply
- Jason Brownlee April 3, 2020 at 6:57 am #
  
  Probably. You might have to experiment to confirm it works as expected.
  
  Reply
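A minimal sketch of the idea, assuming a synthetic dataset from make_regression and a StandardScaler as the target transform (both are illustrative choices, not something tested in the tutorial). It wraps a natively multioutput model in TransformedTargetRegressor so the 2D target is scaled before fitting and inverted after prediction:

```python
# Sketch: scale a 2D target with TransformedTargetRegressor (illustrative setup)
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# synthetic dataset with two numeric targets
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, n_targets=2, random_state=1)

# the transformer is fit on y, and predictions are mapped back to the original scale
model = TransformedTargetRegressor(regressor=LinearRegression(), transformer=StandardScaler())
model.fit(X, y)

# one row of two predicted values per input sample
yhat = model.predict(X[:3])
print(yhat.shape)
```

It is still worth confirming on your own data that the inverse transform behaves as expected.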
Sudipta Chowdhury April 18, 2020 at 2:35 am #

Hi Jason,

Thanks a lot for this tutorial. It suits my need perfectly.
I have a question regarding the correlation among different Y variables. Do the KNN or Random Forest models automatically consider the correlation among different Y while predicting? If they do that, is there any documentation on how it is done? I tried to look for it online, but found nothing. Could you please guide me to some resources? Thanks

Reply
- Jason Brownlee April 18, 2020 at 6:06 am #
  
  You’re welcome.
  
  No, but some algorithms are not bothered by collinear inputs (ensembles of trees, I think), and some are (linear models).
  
  Reply
Bahar April 19, 2020 at 1:46 am #

Thanks for this very interesting tutorial. I have a question about cases where we have, for example, about 500 inputs and 200 outputs. Does this multi-output regression approach work for such problems? (In fact, I have about 20 features and 500 input points that are extracted from scan-1 and 200 output points from scan-2.)

Reply
- Jason Brownlee April 19, 2020 at 5:59 am #
  
  You’re welcome.
  
  Perhaps develop a quick prototype and see if the methods are appropriate and effective?
  
  Reply
  - Bahar April 20, 2020 at 8:41 pm #
    
    Thanks for your reply:), the problem is my current data is not completely ready and I have to wait, so I was also thinking about Deep-learning methods such as convolution, but I have to do some research on it. Do you have any suggestion for such types of problems? It is something like image processing because it the input data and the output data are points that are extracted from scan(images).
    
    Reply
    - Jason Brownlee April 21, 2020 at 5:54 am #
      
      If the input data are images, then CNNs would be a good method to explore.
      
      Reply
      - Bahar April 22, 2020 at 6:33 pm #
        
        Actually input data are digits that are extracted from images. But as the number of my sample data are limited (about 30 samples) I cannot use CNNs. And I decided to convert one image into about 500 points/digits. Can you please let me know your opinion?
      - Jason Brownlee April 23, 2020 at 6:00 am #
        
        Test a suite of data preparations and models and discover what works best for your dataset.
pratyush April 21, 2020 at 5:17 am #

This is great. But i have a question on the metrics of these. Based on your test, which one would you say provided the best prediction?

Reply
- Jason Brownlee April 21, 2020 at 6:06 am #
  
  It depends on your dataset.
  
  You must use controlled experiments to discover what works best on your project.
  
  Reply
Sanjan April 24, 2020 at 2:22 am #

Why not simply use a neural network with multiple neurons in output layer or build a multi-target decision tree and let the model figure out patterns instead of us (the modelers) having to decide whether or not the outputs are correlated/ordered? If there’s a strong rationale for ordered outputs, then RegressorChain might be a good option.

PS: I believe by default sklearn’s decision trees are amenable to this as per this post https://stackoverflow.com/questions/46062774/does-scikit-learns-decisiontreeregressor-do-true-multi-output-regression

Reply
- Jason Brownlee April 24, 2020 at 5:50 am #
  
  You can use a neural net, but we cannot know which model will be best for a given dataset, so we must test many different methods and select the simplest well-performing model.
  
  Reply
César Magno April 27, 2020 at 1:46 pm #

Jason your tutorials is simply amazing. I love them!
Congrats for this awesome material.

Reply
- Jason Brownlee April 28, 2020 at 6:40 am #
  
  Thanks!
  
  Reply
Dennis May 3, 2020 at 2:06 am #

It seems wrapper method doesn’t work on some ensemble, I tried this:

model = GradientBoostingRegressor(ExtraTreesRegressor())
wrapper = MultiOutputRegressor(model)
wrapper.fit(x,y)

it said “TypeError: unsupported format string passed to ExtraTreesRegressor.__format__”

But it works on AdaBoostRegressor.

Reply
- Dennis May 3, 2020 at 2:08 am #
  
  Anyway, this is a good tutorial, and it gave many ideas to my final year project, thank you Jason!
  
  Reply
  - Jason Brownlee May 3, 2020 at 6:15 am #
    
    You’re welcome.
    
    Reply
- Jason Brownlee May 3, 2020 at 6:14 am #
  
  Interesting.
  
  I believe ensembles of trees support multi-output regression directly, i.e. no wrapper required.
  
  Reply
  - zipeng zhang November 22, 2024 at 11:38 am #
    
    However, if the random forest directly performs multiple outputs, it is not possible to take into account the correlation between multiple Y outputs (such as y1 and y2). In order to consider the correlation between y1 and y2, can I bring the random forest into the chain regression?
    
    Reply
Taran Rishit May 9, 2020 at 6:43 am #

Thank you very much !! I was looking for something like this and was unable to find ways for multi output predictions other than the four you mentioned ,now i did .

Reply
- Jason Brownlee May 9, 2020 at 6:48 am #
  
  You’re welcome!
  
  Reply
Taran Rishit May 9, 2020 at 6:55 am #

Thank you

Also i wanted to ask a question:

I have coordinates in table (in time series)
like
a ,b,c->d
b,c,d->e
and so on
where each point is in the form of [lat, long]
So this is where I wanted to use a multiple regression output
And I want to calculate error as well , which error metric would you suggest be best?
I wanted to use MAPE but it has the form like diff/actual… where actual is of the form ([lat, long)]
so I cant divide it with a list, it needs a single value. Any suggestions?

Reply
- Jason Brownlee May 9, 2020 at 1:45 pm #
  
  You’re welcome.
  
  I recommend choosing a metric that best captures the goals of the project for you and project stakeholders. Also, perhaps check the literature to see what others have done on the same type of problem before you.
  
  If you are unsure MAE and RMSE are a great place to start.
  
  Reply
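As a rough illustration of those two metrics on [lat, long] style targets, here is a small sketch. The coordinate values are made up for the example. scikit-learn's metrics accept 2D targets and can report one score per output or an average across outputs:

```python
# Sketch: MAE and RMSE for two-column targets (made-up coordinates)
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([[40.0, -74.0], [41.0, -73.0]])
y_pred = np.array([[40.5, -74.0], [41.0, -72.0]])

# one MAE per output column (lat, long)
mae_per_output = mean_absolute_error(y_true, y_pred, multioutput="raw_values")
# single MAE averaged uniformly across the outputs
mae = mean_absolute_error(y_true, y_pred)
# RMSE computed from the uniformly averaged MSE
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(mae_per_output, mae, rmse)
```

Both metrics stay in the units of the target, which avoids the division by the actual value that makes MAPE awkward here.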
AD May 25, 2020 at 2:49 am #

Very helpful!

When using Chained models using wrapper class, can we use separate models and then wrap them all to fit?

Reply
- Jason Brownlee May 25, 2020 at 5:54 am #
  
  What do you mean exactly? Perhaps you can elaborate.
  
  Reply
Danny Dunne May 25, 2020 at 10:37 pm #

Thanks for this tutorial which I used in conjunction with the eBook.

Can I ask – i am working on a multi-input/multioutput problem, with 3 sets of input (each set with 7 different variables with >10000 values). I have one output set (8 variables) for the combined 3 input sets.

What’s the best way to import/organise this data (vectors/arrays?) at the beginning of the process to make it easy to use with scripts?

Reply
- Jason Brownlee May 26, 2020 at 6:21 am #
  
  Well, generally sklearn expects input as a 2D array of rows (samples) and columns (features). So working with data in that form might be easier for you.
  
  Reply
Jack HU May 26, 2020 at 7:36 am #

Really good post. It really helps me a lot. I have two questions:
For Linear Regression for Multioutput in scikit-learn, when call fit(X, y). It is actually fit Separate Model for Each Output. Am I right?

Second question: I am using kernel ridge regression (KernelRidge with an RBF kernel). I use cross-validation to choose two hyperparameters: alpha, the parameter for L2 regularization, and gamma, the parameter for the RBF kernel. When I call

kr = GridSearchCV(KernelRidge(kernel='rbf', gamma=0.1),
                  param_grid={"alpha": np.logspace(-2, 0, 10),
                              "gamma": np.logspace(-2, 0, 10)},
                  scoring='neg_mean_squared_error')

It fit Separate Model for each output ,but the Separate models share the same ‘alpha’ and ‘gamma’ in each CV parameter search?

Reply
- Jason Brownlee May 26, 2020 at 1:19 pm #
  
  Thanks!
  
  Yes, MultiOutputRegressor fits a separate model for each target as described in the tutorial.
  
  Correct, they will both use the same hyperparameters. If this is not desirable, you can fit separate models manually.
  
  Reply
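One way to let each target pick its own hyperparameters without hand-written loops is to put the grid search inside the wrapper, so that MultiOutputRegressor clones the search once per target. This is only a sketch under assumed settings (synthetic data, a small 3x3 grid, 3-fold CV), not the configuration from the question:

```python
# Sketch: a separate grid search per target via MultiOutputRegressor (illustrative grid)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=120, n_features=10, n_informative=5, n_targets=2, random_state=1)

# small grid to keep the example fast
param_grid = {"alpha": np.logspace(-2, 0, 3), "gamma": np.logspace(-2, 0, 3)}
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid=param_grid,
                      scoring="neg_mean_squared_error", cv=3)

# the wrapper fits one independent copy of the search for each target column
wrapper = MultiOutputRegressor(search)
wrapper.fit(X, y)

# each fitted search keeps its own best hyperparameters
best_params = [est.best_params_ for est in wrapper.estimators_]
for i, params in enumerate(best_params):
    print(i, params)
```

The cost is one full search per output, so the runtime grows with the number of targets.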
  - JACK HU May 26, 2020 at 8:52 pm #
    
    Thanks for your reply. Also, I found that the Gaussian process regression model in scikit-learn, GaussianProcessRegressor, supports multioutput. Does the model actually treat each output dimension as a separate Gaussian process regression problem?
    
    Reply
    - Jason Brownlee May 27, 2020 at 7:46 am #
      
      Perhaps check the documentation for the model.
      
      Reply
  - Arnold Rosielle July 4, 2021 at 9:25 am #
    
    Great tutorial. Thanks. I am very much a newbie and it helps. a lot. In relation to your answer to Jack Hue that it fits a separate model for each output variable, this means it is not taking into account any relationship of the output variables, correct. Perhaps it is not necessary.
    
    The other question, I have is how would you get the fitted values for a data frame of test predictors. All the examples you gave are with a single 1D list for row.
    
    Thanks!!
    
    Reply
    - Jason Brownlee July 5, 2021 at 5:05 am #
      
      The model will make one prediction (vector) for each input sample.
      
      You can provide one or more samples to the model to make predictions.
      
      Reply
Bahar June 16, 2020 at 8:29 pm #

Thanks for such useful tutorial. I am going to use GridSearchCV inorder to improve the results but I get error:

from sklearn.model_selection import GridSearchCV
modelchain4grid = LinearSVR(max_iter=10000)
wrapper4grid = RegressorChain(modelchain4grid)
tuned_parameters = [{'C': [1,3,5,7,9,11,13,15,17,19,21]}]
grid = GridSearchCV(wrapper4grid, tuned_parameters, scoring='neg_mean_squared_error')
grid_result = grid.fit(X_train, Y_train)
print(grid_result)

I get this error:
ValueError: Invalid parameter C for estimator RegressorChain(base_estimator=LinearSVR(

Could you please let me know where is my issue?

Thanks in advance

Reply
- Jason Brownlee June 17, 2020 at 6:23 am #
  
  I believe you may need to specify the C parameter as sub-parameter of the regressor chain model.
  
  I don’t know the syntax offhand, perhaps RegressorChain__C, but perhaps check the scikit-learn documentation for grid searching composite models.
  
  Reply
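Rather than guessing the nested parameter name, it can be read from get_params() on the wrapper. The sketch below assumes synthetic data and a short list of C values. It looks up whichever key ends in "__C", since the prefix depends on what the installed scikit-learn version calls the wrapped estimator argument:

```python
# Sketch: discover the nested C parameter name and grid search a RegressorChain
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=200, n_features=10, n_informative=5, n_targets=2, random_state=1)

chain = RegressorChain(LinearSVR(max_iter=10000, random_state=1))

# nested parameters are exposed as <wrapper argument>__<parameter>
c_key = next(name for name in chain.get_params() if name.endswith("__C"))

# use a regression metric; classification metrics such as accuracy do not apply here
grid = GridSearchCV(chain, param_grid={c_key: [1, 10, 100]},
                    scoring="neg_mean_squared_error", cv=3)
grid.fit(X, y)
print(c_key, grid.best_params_)
```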
  - Bahar June 23, 2020 at 7:33 pm #
    
    Thanks.
    That is “base_estimator__C” and I used then the folowing code:
    
    wrapper4grid = RegressorChain(modelchain4grid)
    print(wrapper4grid.get_params())
    svr = GridSearchCV(wrapper4grid, cv=5, param_grid={"base_estimator__C": [1e0, 1e1, 1e2, 1e3]}, scoring='accuracy')
    grid_result = svr.fit(X_train, Y_train)
    
    But it does not support multioutput and gave me the following error:
    
    continuous-multioutput is not supported
    
    Could you please let me know if there is a good way to tune the parameters in multioutput regression?
    
    Thanks in advace
    
    Reply
    - Bahar June 23, 2020 at 9:44 pm #
      
      Just a correction in scoring (scoring = ‘neg_mean_squared_error’), fixed it.
      
      The complete working code for multi-input-output is:
      
      modelchain4grid = LinearSVR(max_iter=10000)
      wrapper4grid = RegressorChain(modelchain4grid)
      #print(wrapper4grid.get_params())
      svr = GridSearchCV(wrapper4grid, cv=5, param_grid={"base_estimator__C": [1e0, 1e1, 1e2, 1e3]}, scoring='neg_mean_squared_error')
      grid_result = svr.fit(X_train, Y_train)
      
      a = grid_result.best_score_
      b = grid_result.best_params_
      c = grid_result.cv_results_['mean_test_score']
      d = grid_result.best_estimator_
      print(a)
      print(b)
      print(c)
      print(d)
      means = grid_result.cv_results_['mean_test_score']
      stds = grid_result.cv_results_['std_test_score']
      params = grid_result.cv_results_['params']
      
      for mean, stdev, param in zip(means, stds, params):
          print("%f" % mean)
          print("%f" % stdev)
          print("%r" % param)
      
      Reply
      - Jason Brownlee June 24, 2020 at 6:31 am #
        
        Well done!
    - Jason Brownlee June 24, 2020 at 6:30 am #
      
      You may have to grid search manually, e.g. with some for loops.
      
      Reply
      - Bahar July 10, 2020 at 9:35 am #
        
        Thanks Jason,
        
        I have further worked on RegressionChain and tried to setup a pipeline to fit and predict new data as follows:
        
        model = Pipeline([('sc', StandardScaler()),('pca', PCA(n_components=10)),('SVRchain', RegressorChain(LinearSVR(max_iter=1000)))])
        model.fit(X_train, Y_train)
        print(model.fit)
        predictions= model.predict(X_test)
        print(predictions)
        
        But the predictions are different from when I do not use this Pipeline and instead apply the StandardScaler and PCA first and then fit the model.
        
        Is there something that I have missed, or can we not set up such a Pipeline for RegressorChain(LinearSVR)?
        The Pipeline itself is a kind of estimator; might it conflict with the base estimator of RegressorChain, which is LinearSVR?
        
        Thanks in advance
      - Jason Brownlee July 10, 2020 at 1:47 pm #
        
        Good question.
        
        I would recommend using a pipeline for the estimator within the regressor chain.
      - Bahar July 10, 2020 at 9:16 pm #
        
        Thanks, but I did not follow. Could you show a code example, please?
      - Bahar July 10, 2020 at 9:30 pm #
        
        Do you mean something like this:
        
        # create pipeline
        estimators = []
        estimators.append(('standardize', StandardScaler()))
        estimators.append(('pca', PCA(n_components=10)))
        estimators.append(('LSVR', LinearSVR(max_iter=1000)))
        model = RegressorChain(Pipeline(estimators))
        model.fit(X_train, Y_train)
        test = model.predict(X_test)
        print(test)
        
        If you meant something like the above mentioned code then it also does not give the right answer.
        Could you please let me know your feedback?
      - Jason Brownlee July 11, 2020 at 6:11 am #
        
        Yes, that is what I was thinking, does it work as expected?
      - Bahar July 10, 2020 at 9:53 pm #
        
        I hope that you meant the following way (that I tried it and it gives the expected result):
        
        pipe4estimator = Pipeline([('sc', StandardScaler()),('pca', PCA(n_components=10)),('SCRChain', RegressorChain(LinearSVR(max_iter=1000)))])
        sc_y = StandardScaler()
        y_train_std = sc_y.fit_transform(Y_train)
        pipe4estimator.fit(X_train, y_train_std)
        y_train_pred = sc_y.inverse_transform(pipe4estimator.predict(X_train))
        y_test_pred = sc_y.inverse_transform(pipe4estimator.predict(X_test))
        print(y_test_pred)
        
        If this is what you meant, then we can conclude that:
        Setting up a pipeline for RegressorChain(LinearSVR()) needs an extra StandardScaler outside of the pipeline, applied around the training and prediction steps. Right?
      - Jason Brownlee July 11, 2020 at 6:12 am #
        
        Well done!
        
        No, the pipeline knows how to prepare data fed into it after it is fit on training data.
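The point about the pipeline preparing its own inputs can be checked with a small experiment. This sketch assumes synthetic data, 5 PCA components, and a fixed random_state on LinearSVR so both runs are deterministic. It compares a fitted Pipeline against the same steps applied by hand:

```python
# Sketch: a Pipeline and the equivalent manual steps give the same predictions
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=300, n_features=10, n_informative=5, n_targets=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# pipeline: input transforms are fit on the training data only
pipe = Pipeline([("sc", StandardScaler()),
                 ("pca", PCA(n_components=5)),
                 ("chain", RegressorChain(LinearSVR(max_iter=10000, random_state=1)))])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# manual: the same transforms, fit on train and then applied to test
scaler = StandardScaler()
pca = PCA(n_components=5)
X_train_prepared = pca.fit_transform(scaler.fit_transform(X_train))
chain = RegressorChain(LinearSVR(max_iter=10000, random_state=1))
chain.fit(X_train_prepared, y_train)
manual_pred = chain.predict(pca.transform(scaler.transform(X_test)))

print(np.allclose(pipe_pred, manual_pred))
```

Note that the Pipeline only transforms the inputs. Scaling the targets, as in the code above that uses sc_y, is a separate step that the Pipeline does not perform.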
Chandra June 24, 2020 at 3:54 am #

Really great post, Jason! I have read many of your articles and appreciate your to the point discussions. I have a question.

Say, we are trying to learn a model for multi output regression. And, I would like to add ‘relatively’ more importance to learning one of the dimensions of the output vector.

Do you have any suggestions for this situation? As far as I can think of, we can try to add more weight to that dimension while calculating the loss. But, I will have to add another hyperparameter that will control the weight for that dimension.

Is there any other way you can think of?

Reply
- Jason Brownlee June 24, 2020 at 6:39 am #
  
  Hmmm, you could design a model where the loss is calculated across the output vector and give more penalty to one output than the others. It might be easier with a neural net in that sense.
  
  Or train separate models and use different loss functions for each, with a strong penalty for the one that is more important.
  
  Reply
phillip June 25, 2020 at 1:01 am #

Thank you for this article.

I have a question about the attribute in Condensed Nearest Neighbor Method. The sample_indices are the indices, which are filtered out? Or they are the indices, which should stay.

Reply
- Jason Brownlee June 25, 2020 at 6:24 am #
  
  We don’t cover CNN in this tutorial, are you referring to the algorithm generally or a specific implementation?
  
  e.g. this may help:
  https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.under_sampling.CondensedNearestNeighbour.html
  
  Reply
Nico July 23, 2020 at 8:13 pm #

Hi Jason.
Is it possible to perform variable selection in RegressorChain and see the important variables for each step? For example, with Lasso?

Reply
- Jason Brownlee July 24, 2020 at 6:27 am #
  
  Perhaps, you may need to experiment.
  
  Reply
Paul Christian August 1, 2020 at 2:27 am #

Jason to the rescue. Again! Thank you.

Reply
- Jason Brownlee August 1, 2020 at 6:14 am #
  
  You’re welcome Paul.
  
  Reply
THOUMMALA, NALINH August 2, 2020 at 10:38 pm #

Hi…I would like to know if we can also use LSTM model to predict multioutput

Reply
- Jason Brownlee August 3, 2020 at 5:47 am #
  
  Yes, there are many examples on the blog, perhaps start here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Gee August 22, 2020 at 5:55 pm #

Thank you Jason for the very clear explanation on multi output multi input regression. I have a follow-up question. How do I find out what the coefficients are in a multi-input multi output model? I am asking because I have a few data sets that are similar but sources from different years. I’m trying to see how the coefficients change over years.

Reply
- Jason Brownlee August 23, 2020 at 6:24 am #
  
  You’re welcome!
  
  If you use a linear model, you can access the coefficients of the model directly via the “coef_” attribute, more details here:
  https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
  
  Reply
  - Gee August 23, 2020 at 3:24 pm #
    
    Once again, thank you Jason. I also read from earlier comments that a multi-output regression is actually just a series of independent linear regressions. Does that mean that, for 3 outputs and 3 inputs, the model is simply:
    Y1= w1.x1 + w2.x2 + w3.x3 + intercept1
    Y2= w4.x1 + w5.x2 + w6.x3 + intercept2
    Y3= w7.x1 + w8.x2 + w9.x3 + intercept3
    
    … where w1-w9 are the coefficients that we can print from coef_.
    
    And the r2 is the average of r2 of the 3 equations?
    
    Reply
    - Jason Brownlee August 24, 2020 at 6:17 am #
      
      Yes, something like that.
      
      Reply
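The layout of the coefficients can be inspected directly. The sketch below assumes a synthetic problem with 3 inputs and 3 outputs. For a multioutput LinearRegression, coef_ holds one row of weights per output and intercept_ holds one intercept per output. In recent scikit-learn versions score() reports the uniform average of the per-output R^2 values (older versions may weight the outputs differently):

```python
# Sketch: coefficients and R^2 of a 3-input, 3-output linear regression
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=3, n_informative=3, n_targets=3,
                       noise=10.0, random_state=1)
model = LinearRegression().fit(X, y)

# row i holds the weights for output i; column j corresponds to input j
print(model.coef_.shape)
# one intercept per output
print(model.intercept_.shape)

# R^2 for each output separately, and the single value returned by score()
per_target_r2 = r2_score(y, model.predict(X), multioutput="raw_values")
overall_r2 = model.score(X, y)
print(per_target_r2, overall_r2)
```

Fitting the same model on each year's dataset and comparing the coef_ arrays row by row is one way to track how the weights change over time.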
Mostafa August 24, 2020 at 1:50 am #

Thank you for the tutorial and your efforts.

I have an inquiry, Is the prediction depends on the number of samples? and does standardization is needed to avoid that?

Reply
- Jason Brownlee August 24, 2020 at 6:29 am #
  
  The prediction depends on all aspects of the dataset, its preparation and the model configuration.
  
  Standardization is required for some algorithms and datasets. Perhaps try it and see if it improves model performance.
  
  Reply
Nick August 28, 2020 at 3:09 pm #

Helpful examples, thank you. Do you have any examples of a custom loss function that could penalize the summation of resulting y’s in addition to individual rmse or r2. For example, i have sales of three stores and some inputs. However i know the sum of their sales and can feed this number as well as an input. How can I ensure the resulting 3 sales forecast will sum to the number I believe it to be? Its unusual because I know the resulting sum but I’m interested in forecasting the contributions to the sum. Hope that makes some sense and you have a suggestion. Thx!

Reply
- Jason Brownlee August 29, 2020 at 7:58 am #
  
  I don’t have an example of a loss function as you describe, but the example of a custom RMSE metric in this tutorial can be adapted for your needs:
  https://machinelearningmastery.com/custom-metrics-deep-learning-keras-python/
  
  Reply
  - Nick September 1, 2020 at 1:21 am #
    
    Thx for the pointer. Can you use keras backend with sklearn.multioutput? I don’t understand how that would work with this.
    
    Also is there a way to extract feature importance from an XGB model used in the wrapper? In other words, can I get the n-th model’s most important features? I haven’t figured out how to access this or even if this information is actually retained.
    
    Reply
    - Nick September 1, 2020 at 1:28 am #
      
      figured out how to access feature importance right after i posted this:
      
      wrapper.estimators_[n].feature_importances_
      
      in case useful for anyone else. n being the index of the model you are interested in.
      
      Still don’t understand how keras.backend would interact with this. Or is using the backend math not applicable to this example?
      
      Reply
      - Jason Brownlee September 1, 2020 at 6:36 am #
        
        Nice!
    - Jason Brownlee September 1, 2020 at 6:36 am #
      
      Maybe, no need though as the models do the same thing different ways.
      
      Yes, you can retrieve feature importances from a fit XGB model I believe. I recommend checking the API to see the name of the attribute on the object.
      
      Reply
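The same access pattern can be sketched with a scikit-learn model. GradientBoostingRegressor is used here instead of XGBoost purely to avoid an extra dependency, and the dataset is synthetic. After fitting, the wrapper keeps one fitted model per target in estimators_:

```python
# Sketch: per-target feature importances from a MultiOutputRegressor
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=200, n_features=10, n_informative=5, n_targets=2, random_state=1)

wrapper = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, random_state=1))
wrapper.fit(X, y)

# estimators_[n] is the model fit for target n
importances = [est.feature_importances_ for est in wrapper.estimators_]
for n, scores in enumerate(importances):
    # indices of the three highest-scoring features for this target
    print(n, scores.argsort()[::-1][:3])
```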
Vani August 30, 2020 at 5:54 pm #

Thanks for the lucid example.

Can we put different regression or classifier in sequence in wrapper function? For example, first dependent variable predicted with linear regression and the next one with ridge regression ?

wrapper = RegressorChain(Linearregresson(), Ridgerregression(), order=[1,0])

Reply
- Jason Brownlee August 31, 2020 at 6:12 am #
  
  No. See the API documentation here:
  https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.RegressorChain.html
  
  Reply
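RegressorChain takes a single base model, but a chain with a different model per step can be assembled by hand. This is a hypothetical sketch on synthetic data, with predict_chain as an illustrative helper name. It mirrors order=[1, 0] by fitting a linear regression for target 1 first and then a ridge regression for target 0 that also sees target 1 as an input:

```python
# Sketch: a hand-built two-step chain with a different model per target
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=5, n_targets=2, random_state=1)

# step 1: predict target 1 from the inputs alone
first = LinearRegression().fit(X, y[:, 1])

# step 2: predict target 0 from the inputs plus the true target 1 values
# (using true values during training is an assumption of this sketch)
second = Ridge().fit(np.column_stack([X, y[:, 1]]), y[:, 0])

def predict_chain(X_new):
    # hypothetical helper: feed the first model's prediction into the second model
    y1 = first.predict(X_new)
    y0 = second.predict(np.column_stack([X_new, y1]))
    # return columns in the original target order
    return np.column_stack([y0, y1])

yhat = predict_chain(X[:3])
print(yhat)
```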
muntahi September 4, 2020 at 2:26 am #

Thanks a lot for sharing. I was looking for a solution for my MIMO problem. I am glad I found this.

Reply
- Jason Brownlee September 4, 2020 at 6:32 am #
  
  You’re welcome!
  
  Reply

jyo September 15, 2020 at 12:27 pm #

Hi ,

In topics to be covered its mentioned “Random Forest for Multioutput Regression” but I don’t see it in the article.

Pardon me if I overlooked something…I am a beginner searching for that…

regards,
Jyo

Jason Brownlee September 15, 2020 at 2:53 pm #

Sorry, I must have deleted that example.

Here it is:

# random forest for multioutput regression
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = RandomForestRegressor()
# fit model
model.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = model.predict(data_in)
# summarize prediction
print(yhat[0])


jyo September 22, 2020 at 1:49 am #

hi Jason,

I have run all models on my data.

I have one input and many target variables. I have normalized all variables to have values between 0 and 1.

In that one of the target variables has only 4 values.I mean it can have any of the four values.

But when I run the model ,the target variable is having many values …and so when I do renormalization ,I don’t see whole number values for that variable.

Is this behaviour expected?

Please help.

Reply
- Jason Brownlee September 22, 2020 at 6:52 am #
  
  Perhaps. I don’t understand the problem you’re having, sorry.
  
  Reply
  - jyo September 22, 2020 at 10:38 am #
    
    Jason,
    
    Let me take your Boston housing data as example.
    
    In that the variable RAD has values 1 to 8 and 24.These are the only 9 values it has. But when I run multiregression, and suppose RAD is one of the target variables..can it happen that the outcome has a value other than these numbers(1:8 and 24?)
    
    When I normalize entire data set to have only to range from 0 to 1 and run the model ,and do scaleback on the predictions ,I see multiple values on a variable which is supposed to have only 4 values actually.
    
    So I am not understanding where I went wrong.Please help…
    
    Reply
    - Jason Brownlee September 22, 2020 at 1:36 pm #
      
      You can invert the transform on predicted values and round to the desired precision.
      
      Reply
jyo September 22, 2020 at 11:51 am #

Hi Jason,

To make it more clear,I have multi target variables and in that I have few that are categorical and few that are continuous.

So if I am trying any of the above regression techniques ,the target variables that have categorical structure are also getting float values…should i make the predicted values to whole numbers by rounding off?

Reply
- Jason Brownlee September 22, 2020 at 1:38 pm #
  
  Perhaps try one model per variable?
  Perhaps try post-processing the output?
  Perhaps try a multi-output neural net model with a separate loss per output?
  
  Reply
jyo September 22, 2020 at 2:21 pm #

jason,

i was asked to do on ML only.What do u mean by postprocessing the output….

Reply
- Jason Brownlee September 23, 2020 at 6:32 am #
  
  E.g. scale, round, etc. the output of the model, interpret it for your application.
  
  Reply
Shantanu September 23, 2020 at 2:56 am #

Should we normalize the entire dataset before using any regressor for multiple targets? For a single target, normalizing only X appears to be more common.

Reply
- Jason Brownlee September 23, 2020 at 6:42 am #
  
  No, fit the transform on the training dataset, then apply to the train and test sets.
  
  This is to avoid data leakage:
  https://machinelearningmastery.com/data-preparation-without-data-leakage/
  
  Reply
  - Shantanu September 23, 2020 at 9:18 am #
    
    Thank you for clarifying. I was going through your other post: https://machinelearningmastery.com/feature-selection-for-regression-data/
    
    The feature selection doesn’t work for multiple targets, any idea if there is a way to do the same for y with shape (n, 2)
    
    Thank you
    
    Reply
    - Jason Brownlee September 23, 2020 at 1:41 pm #
      
      Perhaps operate on each target separately.
      
      Reply
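Operating on each target separately might look like the sketch below. It assumes synthetic data, SelectKBest with f_regression as the scoring function, and k=5 as an arbitrary choice. The selector is fit once per target column:

```python
# Sketch: univariate feature selection run once per target column
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=200, n_features=10, n_informative=5, n_targets=2, random_state=1)

selected = {}
for i in range(y.shape[1]):
    # f_regression expects a 1D target, so pass one column at a time
    fs = SelectKBest(score_func=f_regression, k=5).fit(X, y[:, i])
    # column indices of the features kept for this target
    selected[i] = fs.get_support(indices=True)
    print(i, selected[i])
```

The per-target selections can then be compared, or combined (for example by taking their union) before fitting a multioutput model.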
jyo September 23, 2020 at 11:21 am #

Hi Jason,

I have the variable in label encoded format(not strings ,numerical itself),i can’t further one hot encode it because the columns bloat in my case.

I normalized ..so all the data is is 0-1 range .

Ran the models.

Now to interpret output I am having the float numbers on the categorical which I can’t accept and so trying to get integers.

Can I apply ceiling if it is below 0.5 and floor above 0.5 of a number?

Will this be correct approach?

Reply
- Jason Brownlee September 23, 2020 at 1:46 pm #
  
  You can invert the scaling of the target using the same object that you used to scale the variable in the first place. e.g. call inverse_transform()
  
  Reply
  - jyo September 23, 2020 at 5:01 pm #
    
    But that will not guarantee me integer values of predicted values ,right?
    
    Reply
    - Jason Brownlee September 24, 2020 at 6:09 am #
      
      It will be consistent, then you can round the result.
      
      Reply
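A small sketch of that post-processing, using made-up numbers: the first target column is assumed to be integer coded and the second continuous, and the scaled predictions are hypothetical model outputs. The scaler that was fit on the training targets inverts the predictions, and only the integer-coded column is rounded:

```python
# Sketch: invert target scaling, then round the integer-coded column
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# training targets: column 0 is integer coded, column 1 is continuous
y_train = np.array([[1.0, 10.5], [2.0, 11.0], [4.0, 12.5], [8.0, 13.0]])
scaler = MinMaxScaler().fit(y_train)

# hypothetical model predictions in the scaled 0-1 range
yhat_scaled = np.array([[0.13, 0.40], [0.46, 0.90]])

# map back to the original units with the same scaler object
yhat = scaler.inverse_transform(yhat_scaled)

# round only the integer-coded column to the nearest whole number
yhat[:, 0] = np.rint(yhat[:, 0])
print(yhat)
```

Rounding gives whole numbers but does not guarantee a valid category, so mapping to the nearest allowed value may be a better fit when only a few codes are possible.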
jyo September 23, 2020 at 11:26 am #

Also how can I decide whether to go with one model per variable or all variables in single model?

Reply
- Jason Brownlee September 23, 2020 at 1:46 pm #
  
  Choose the approach that results in the best performance for your chosen metric/test harness.
  
  Reply
Saurabh September 25, 2020 at 5:44 am #

You Jason are a genius!!!! Tried to code out a similar problem, but was stuck at it. And then came along this article!!. I would highly suggest you to link this article with Time Series Forecasting articles, as it has great resemblance with multi-step head forecasting (of course with suitable changes) problems.

Reply
- Jason Brownlee September 25, 2020 at 6:43 am #
  
  No, just a simple human.
  
  Thanks. Yes, you can use it for multi-step forecasting, e.g. the “direct” approach listed here:
  https://machinelearningmastery.com/multi-step-time-series-forecasting/
  
  Reply
John October 2, 2020 at 12:28 am #

Hi, Jason,

do all inherent multioutput regression algorithms you mentioned (LinearRegression, KNeighborsRegressor, …) take into account dependencies between the outputs?

Many thanks in advance.

Best regards
John

Reply
- Jason Brownlee October 2, 2020 at 5:59 am #
  
  No, I don’t believe so.
  
  Reply
  - Dominic February 1, 2021 at 2:10 am #
    
    Is there a possibility to do that?
    
    Reply
Shirley Kokane October 18, 2020 at 4:17 am #

Hi Jason,

I have already existing data and I dont want to use make regression. But with my dataset I am not able to use the rest of your function. How can I do that ?

Reply
- Jason Brownlee October 18, 2020 at 6:11 am #
  
  Sorry, I don’t understand your question – perhaps you can rephrase?
  
  Reply
- zipeng zhang November 22, 2024 at 12:25 pm #
  
  I think you should convert the data to array format
  
  Reply
Ben Bartling November 30, 2020 at 2:29 am #

Hi Jason,

Long question sorry, but sort of confused…

If I am attempting to use regressor chain to forecast out 24 future values (hourly data, or one day look ahead)

Does the order matter much?

For example, if I want 24 future values would I always need to use: 0 through 23 on order?

chain_regr = RegressorChain(model, order=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23])

OR should I just experiment?

I also get a little confused on how many leading and lagging variables to use on the series_to_supervised function but I think you always teach just to keep experimenting to check results…

Reply
- Jason Brownlee November 30, 2020 at 6:39 am #
  
  Sounds like a great project!
  
  Probably keep the order linear, but experiment to confirm.
  
  Same with number of inputs, experiment the input sequence length to discover what works best.
  
  Reply
  - Ben Bartling December 2, 2020 at 12:29 am #
    
    Hi Jason,
    
    One other item I still cant wrap my head around is the train test split with more than one X variable. So if I am forecasting out 24 samples (one day look ahead) that I was talking about previously. What would be my target variable (y) and explainer variables? For example my data coming back from the series_to_supervised function looks like this:
    
    train = series_to_supervised(data,11,14)
    
    var1(t-14) var1(t-13) var1(t-12) … var1(t+8) var1(t+9) var1(t+10)
    0 NaN NaN NaN … -0.524479 -0.618750 -0.707683
    1 NaN NaN NaN … -0.618750 -0.707683 -0.806900
    2 NaN NaN NaN … -0.707683 -0.806900 -0.873959
    3 NaN NaN NaN … -0.806900 -0.873959 -0.899870
    4 NaN NaN NaN … -0.873959 -0.899870 -1.032032
    .. … … … … … … …
    827 -1.483986 -1.532290 -1.456250 … NaN NaN NaN
    828 -1.532290 -1.456250 -1.226042 … NaN NaN NaN
    829 -1.456250 -1.226042 -1.200911 … NaN NaN NaN
    830 -1.226042 -1.200911 -1.441015 … NaN NaN NaN
    831 -1.200911 -1.441015 2.416797 … NaN NaN NaN
    .. … … … … … … …
    827 0.524870 0.208073 -0.200912 … NaN NaN NaN
    828 0.208073 -0.200912 0.626172 … NaN NaN NaN
    829 -0.200912 0.626172 0.591797 … NaN NaN NaN
    830 0.626172 0.591797 0.616145 … NaN NaN NaN
    831 0.591797 0.616145 0.108594 … NaN NaN NaN
    
    Would my target y always be var1(t) and explainer variables be all others? For example:
    
    trainX = np.array(train.drop(columns=['var1(t)']))
    trainy = np.array(train['var1(t)'])
    
    This way I can test out different variations of leading/lagging variables when calling the series_to_supervised
    
    Reply
    - Jason Brownlee December 2, 2020 at 7:47 am #
      
      Some train data will be needed for the first few predictions in the test, most likely.
      
      This may help:
      https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
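
      One common framing for a multi-step forecast, sketched below, treats the lagged columns (t-14 to t-1) as inputs and the t and t+k columns together as the multi-output target. The helper make_supervised is a hypothetical, simplified stand-in for series_to_supervised, and the sine wave is placeholder data.

```python
import numpy as np
import pandas as pd

def make_supervised(values, n_in, n_out):
    """Hypothetical helper: frame a univariate series as lagged inputs and future outputs."""
    series = pd.Series(values)
    cols = {}
    # lagged observations: var1(t-n_in) ... var1(t-1)
    for i in range(n_in, 0, -1):
        cols['var1(t-%d)' % i] = series.shift(i)
    # current and future observations: var1(t) ... var1(t+n_out-1)
    for i in range(n_out):
        name = 'var1(t)' if i == 0 else 'var1(t+%d)' % i
        cols[name] = series.shift(-i)
    # rows with NaN at either end cannot be used for training
    return pd.DataFrame(cols).dropna()

data = np.sin(np.arange(100) / 5.0)  # placeholder series
frame = make_supervised(data, n_in=14, n_out=11)

# inputs are the lagged columns, targets are everything from time t onward
input_cols = [c for c in frame.columns if '(t-' in c]
output_cols = [c for c in frame.columns if c not in input_cols]
X = frame[input_cols].to_numpy()
y = frame[output_cols].to_numpy()
print(X.shape, y.shape)
```

      With this layout, y has one column per forecast step, so it can be passed directly to a multi-output model or wrapper.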
      
      Reply
Akhila December 1, 2020 at 8:21 pm #

Hi Jason,
I have data of Drumspeed, mixing rotations, discharging rotations, weight values of a concrete mixer. The process is a combination of mixing(positive cycle) and discharging(negative cycle). I want to predict how much the weight of concrete is reduced after every discharging process. I am thinking to use multilinear regression.
But the problem is How Can I predict the reduced weight at each discharging process? Is it possible with the multilinear regression model? I am new to regression problems. Do I need to use some mathematical formulas before predicting?

Reply
- Jason Brownlee December 2, 2020 at 7:42 am #
  
  This framework may help you frame your prediction problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Then follow this process:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
  - Akhila December 3, 2020 at 12:50 am #
    
    Hi Jason,
    I have a dataset like this. It is a sample dataset.
    No Drumspeed Rot.discharging Rot.mixing Weight Predicted_weight
    1 12 10 6170 29000 28765
    2 8 20 6270 27000 27320
    3 4 25 0 25000 24569
    4 10 30 6370 30000 29890
    5 7 35 6378 28500 28120
    6 5 40 0 26000 26789
    7 6 28 6235 28500 28435
    8 7 36 6298 27564 27111
    9 10 43 6300 26560 26780
    10 12 47 0 24000 24361
    
    3,6,10 are the discharging process and remaining are mixing process. I want to predict the weight only at the discharging process(3,6,10).
    
    The problem is I just want to predict the weight values only at the discharging process. Is it possible with machine learning multiple regression? Because it just predicts the continuous values.
    
    Reply
    - Jason Brownlee December 3, 2020 at 8:20 am #
      
      Perhaps try prototyping a few different models and discover if you can achieve your desired outcome.
      
      Reply
Asilkan December 4, 2020 at 11:13 pm #

Hi Jason,
Do you know which other algorithms are supported for RegressorChain ? I use the LinearSVR following your sample but got an MAE: 991.290 (128.681) which are very high values. Should I change the scorer or the algorithm regarding to my data?

LSVR = LinearSVR()
reg_model = RegressorChain(LSVR, order=[0,1])
reg_model.fit(X_train, y_train)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
cv_r2_scores_lsvr = cross_val_score(estimator=reg_model, X=X_train, y=y_train, scoring='neg_mean_absolute_error', cv=cv)
abs_cv_r2_scores_lsvr = absolute(cv_r2_scores_lsvr)

print('MAE: %.3f (%.3f)' % (mean(abs_cv_r2_scores_lsvr), std(abs_cv_r2_scores_lsvr)))

Also,
Do you think GridSearchCV can be used for RegressorChain ? I used in my data as you can see in below but not sure it is logical for RegressorChain method.

#model tuning
svr_params = {'base_estimator__tol' : [000.1, 00.1, 0.1],
'base_estimator__max_iter': [1000,2000,3000]
}

SVR = LinearSVR()
svr_model = RegressorChain(SVR, order=[0,1])
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
svr_cv_model = GridSearchCV(svr_model,
svr_params,
cv = cv,
n_jobs = -1)
svr_cv_model.fit(X_train, y_train)
print("Best parameters are: " + str(svr_cv_model.best_params_))

Thank you for your hardwork btw, It helps me a lot. I would like to use as a reference if I can publish my work in Kaggle. Hope it is okay for you 🙂

Reply
- Jason Brownlee December 5, 2020 at 8:07 am #
  
  Perhaps try a number of algorithms and see what works best.
  
  Not sure about combining grid search with regression chain – try it and see.
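
  For reference, a small prototype along these lines suggests the combination can work. This is a sketch on synthetic data, not a tuned setup. The prefix for the wrapped model's parameters is looked up from get_params() because, as far as I know, the name of that argument has differed between scikit-learn releases; the grid values are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

chain = RegressorChain(LinearSVR(max_iter=10000, random_state=1), order=[0, 1])

# discover the parameter prefix for the wrapped model in the installed version
prefix = 'estimator' if 'estimator' in chain.get_params() else 'base_estimator'
grid = {prefix + '__C': [0.1, 1.0, 10.0],
        prefix + '__tol': [1e-4, 1e-3]}

# score with negative MAE so that larger is better, as GridSearchCV expects
search = GridSearchCV(chain, grid, scoring='neg_mean_absolute_error',
                      cv=KFold(n_splits=5, shuffle=True, random_state=1))
search.fit(X, y)
print('Best parameters: %s' % search.best_params_)
print('Best MAE: %.3f' % -search.best_score_)
```

  Note that the same parameter values are applied to every model in the chain.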
  
  Please clearly cite and link to the source if you reuse my code, details here:
  https://machinelearningmastery.com/faq/single-faq/can-i-use-your-code-in-my-own-project
  
  Reply
Vadim December 22, 2020 at 9:50 pm #

We have developed a python package for that:

https://github.com/DSARG/amorf

It combines several different approaches to help you get started with multi-output regression analysis.

Reply
- Jason Brownlee December 23, 2020 at 5:35 am #
  
  Thanks for sharing.
  
  Reply
fabou December 24, 2020 at 2:47 am #

hi Jason, thanks for this post.

I have a question concerning models MAE :

The MAE for the Inherently Multioutput Regression Algorithms approach is 51.817 (2.863) whereas for the Direct Multioutput Regression approach the MAE is 0.419 (0.024).

You mention the fact that “error is reported across both output variables, rather than separate error scores for each output variable”. My understanding of your statement is that it is the sum of MAE on y1 added to the MAE on y2. Is that correct?

My question is : what is the meaning of the MAE of 0.419 ? It is quite small compared to 51.817.

Thanks.

Reply
- Jason Brownlee December 24, 2020 at 5:35 am #
  
  No, I believe the MAE is averaged across variables and samples.
  
  The units of MAE will be the same as the target variable.
  
  You are comparing different things. “51.817” is a specific predicted value, “0.419” is a MAE.
  
  Reply
  - Mohammad Taheri August 26, 2021 at 11:19 pm #
    
    Hello Jason.
    
    Thank you for your helpful tutorial.
    
    Is it possible to obtain a separated error for each output variable using the cross_val_score function?
    
    Reply
    - Adrian Tam August 27, 2021 at 6:12 am #
      
      In that case, you might get a vector output from the error function. But the cross_val_score expects a single scalar (see scoring parameter description in https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html). You can write your own function to record the errors in vectors, however.
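
      A minimal sketch of such a function, assuming a manual k-fold loop in place of cross_val_score and linear regression as a placeholder model. Passing multioutput='raw_values' to mean_absolute_error returns one error per output column.

```python
from numpy import array
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

fold_errors = []
for train_ix, test_ix in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
    model = LinearRegression()
    model.fit(X[train_ix], y[train_ix])
    yhat = model.predict(X[test_ix])
    # one MAE per output column instead of a single averaged score
    fold_errors.append(mean_absolute_error(y[test_ix], yhat, multioutput='raw_values'))

fold_errors = array(fold_errors)  # shape: (n_folds, n_outputs)
for i, (m, s) in enumerate(zip(fold_errors.mean(axis=0), fold_errors.std(axis=0))):
    print('Output %d MAE: %.3f (%.3f)' % (i, m, s))
```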
      
      Reply
fabou December 26, 2020 at 4:08 am #

MAE: 51.817 (2.863) is what I read in your post, in the Evaluate Multioutput Regression With Cross-Validation paragraph.

Reply
- Jason Brownlee December 26, 2020 at 5:12 am #
  
  Right you are, yes the decision tree performs poorly on the problem.
  
  Reply
mlnewbie January 10, 2021 at 5:29 am #

Hello, Jason, a great post as always!

Consider the following task:

You have a data frame where each entry is some biomarker, its connection with a therapy and some other data (with results taken from research papers), and the task is to inference what factors affect the therapy composition of the database i.e. how many therapies of each kind are in it based on other data in the table. Would you consider this a multi-output regression problem (I was thinking of glm for poisson regression)? On the one hand, it does look like a multi-output regression problem to me, since we have multiple numerical response variables, but on the other hand it looks like all therapies should be dealt with separately…

Reply
- Jason Brownlee January 10, 2021 at 5:48 am #
  
  Thanks.
  
  Perhaps prototype some code for your data modelled as a multi-output regression and see if it makes sense.
  
  Reply
WWJD January 12, 2021 at 8:35 am #

Question about inputs and targets: Could the inputs and targets be the same? Would this cause any issues?

For example if dealing with multiple sites each with their own values, and we want to predict values for all sites.

Say we had 3 sites and wanted to predict 3 target values one for each site. Can we make the input and target data the same?

Reply
- Jason Brownlee January 12, 2021 at 10:33 am #
  
  Yes.
  
  Perhaps try it out with a small prototype.
  
  Reply
Zoe January 13, 2021 at 4:04 pm #

Hi Dr. Brownlee,

I have one question: if the two outputs are kind of related to each other, will this cause any issue?

Thanks!

Reply
- Jason Brownlee January 14, 2021 at 6:12 am #
  
  It depends. I don’t think so. Perhaps try it and see.
  
  Reply
Amaury Simondet January 26, 2021 at 10:30 pm #

Hi,
Thank you so much for this article and so much more, it really helps me out in my ML project.
Nevertheless I have this error for both method of use of SVM:

C:\Users\Amaury\AppData\Roaming\Python\Python38\site-packages\sklearn\svm\_base.py:985: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
warnings.warn(“Liblinear failed to converge, increase ”
C:\Users\Amaury\AppData\Roaming\Python\Python38\site-packages\sklearn\svm\_base.py:985: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
warnings.warn(“Liblinear failed to converge, increase ”

And the results are pretty bad and very different at every executions, do you have an answer ?

Reply
- Jason Brownlee January 27, 2021 at 6:06 am #
  
  You’re welcome.
  
  They look like warnings that you can ignore.
  
  Nevertheless, perhaps change SVM to something else? Perhaps scale your data first? Perhaps change the solver used by SVM?
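
  A rough sketch of the scaling suggestion, assuming the MultiOutputRegressor wrapper and synthetic data. The inputs are standardized inside a pipeline, max_iter is raised above the LinearSVR default of 1000, and random_state is fixed, which may also reduce the run-to-run variation.

```python
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# scale the inputs, then give the solver more iterations to converge
base = make_pipeline(StandardScaler(), LinearSVR(max_iter=10000, random_state=1))
model = MultiOutputRegressor(base)
model.fit(X, y)

yhat = model.predict(X[:3])
print(yhat)
```

  Whether the warning disappears depends on the data, so max_iter may need further adjustment.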
  
  Reply
shaheen February 14, 2021 at 5:34 pm #

What does it mean when the standard deviation and the mean are equal to zero in my multi-output regression using linear regression and linear SVM? Is my model fit, or must I change my algorithms? I have a large Excel dataset with 10 outputs and 80 inputs. Thank you very much for your efforts to make machine learning available and open source for everyone.

Reply
- Jason Brownlee February 15, 2021 at 5:43 am #
  
  If the mean and standard deviation are zero, then the field has a single value for all rows – in which case the variable has no data and can be removed from the dataset.
  
  Reply
Marcus Gonzaga February 14, 2021 at 11:24 pm #

Great article as always.

Did you write a similar article (Multioutput Regression) for R language?

Reply
- Jason Brownlee February 15, 2021 at 5:44 am #
  
  Thanks!
  
  Not at this stage, sorry.
  
  Reply
Namita February 18, 2021 at 1:30 pm #

Hi Jason,
This has been very helpful! Very thorough and well explained.

Is there a way to get prediction intervals also for the multiple values we are predicting? Can you direct to some help in that direction?

Reply
- Jason Brownlee February 19, 2021 at 5:51 am #
  
  Thanks!
  
  A simple approach is to use an ensemble of models, make predictions with each and use the distribution of predictions for each output to determine the interval (percentiles or mean+stdev)
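
  A minimal sketch of that idea, assuming a bootstrap ensemble of linear regression models and a 95% percentile interval (both choices are illustrative). The spread across ensemble members mostly reflects model uncertainty, so the resulting intervals should be treated as approximate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.utils import resample

X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)
X_train, y_train, X_new = X[:-5], y[:-5], X[-5:]

# fit each ensemble member on a bootstrap sample of the training data
preds = []
for seed in range(100):
    X_boot, y_boot = resample(X_train, y_train, random_state=seed)
    preds.append(LinearRegression().fit(X_boot, y_boot).predict(X_new))
preds = np.array(preds)  # shape: (n_models, n_rows, n_outputs)

# percentiles across ensemble members, computed separately for each output
lower = np.percentile(preds, 2.5, axis=0)
upper = np.percentile(preds, 97.5, axis=0)
center = preds.mean(axis=0)
for row in range(len(X_new)):
    print('row %d: mean=%s lower=%s upper=%s' % (
        row, center[row].round(2), lower[row].round(2), upper[row].round(2)))
```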
  
  Reply
  - Namita February 24, 2021 at 11:35 am #
    
    Thank you, that is helpful! 🙂
    
    Reply
    - Jason Brownlee February 24, 2021 at 1:25 pm #
      
      You’re welcome.
      
      Reply
Rei Balachandran March 4, 2021 at 1:58 pm #

Thanks a lot for the tutorial Jason, it is extremely useful!

Reply
- Jason Brownlee March 5, 2021 at 5:29 am #
  
  Thanks, I’m happy to hear that!
  
  Reply
Hya March 8, 2021 at 9:20 pm #

Thanks a lot for the tutorial, I have a question. Do Linear regression, KNN, Decision tree implemented in Scikit-Learn use Direct Multioutput or Chained Multioutput ?

Reply
- Jason Brownlee March 9, 2021 at 5:18 am #
  
  Direct vector output.
  
  Reply
Yasser Zeinali May 19, 2021 at 3:52 am #

Hello,

Thank you for the explanations. Do the linear regression, k-NN, Decision tree and random forest consider the dependencies between targets? Or they just build their models based on each target separately?

Reply
- Jason Brownlee May 19, 2021 at 6:38 am #
  
  Ideally, yes they do, but it may differ from algorithm to algorithm or implementation to implementation.
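
  One way to check a given implementation is to compare its native multi-output predictions with those from independent per-target fits. The sketch below does this for scikit-learn's LinearRegression and KNeighborsRegressor on synthetic data; if the predictions match, the native version is not exploiting dependencies between targets. Tree-based models may differ, since, as I understand it, their split criterion is averaged over all outputs.

```python
from numpy import allclose
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

same = {}
for name, make_model in [('linear', LinearRegression), ('knn', KNeighborsRegressor)]:
    # native multi-output fit on both targets at once
    native = make_model().fit(X, y).predict(X)
    # one independent model per target via the wrapper
    wrapped = MultiOutputRegressor(make_model()).fit(X, y).predict(X)
    same[name] = allclose(native, wrapped)
    print('%s: native and per-target predictions match = %s' % (name, same[name]))
```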
  
  Reply
Bundit May 23, 2021 at 10:06 am #

Hi Jason,

Thanks for your great tutorials to community.

I am wondering about the algorithm operation for inherent MTR method (kNN, Decision Tree, RF). The method of using MultiOutputRegressor wrapper is understandable that it uses the base estimator to create the model for each target independently, but what about the inherent algorithm methods, can you please give some explanation?

Best,

Reply
- Jason Brownlee May 24, 2021 at 5:41 am #
  
  You’re welcome.
  
  Sure, but each inherent algorithm operates differently.
  
  Reply
Md Ashiqur Rahman June 25, 2021 at 8:15 pm #

Hi Jason,

You use MAE here to evaluate the prediction, Is there any other parameter needed to evaluate multi-output regression?

I mean whether r-square can be a parameter for prediction or not. Or is it possible to get r-square for multioutput-regression.

Thank you. Appreciate your works.

Reply
- Jason Brownlee June 26, 2021 at 4:55 am #
  
  Yes, there are many. You must choose a metric that best captures the goals of your project.
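
  For example, R-squared can be computed for multi-output predictions with r2_score, either per output or averaged. A minimal sketch on synthetic data with a placeholder linear model:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

yhat = LinearRegression().fit(X_train, y_train).predict(X_test)

# one score per output column
r2_each = r2_score(y_test, yhat, multioutput='raw_values')
# unweighted mean of the per-output scores
r2_mean = r2_score(y_test, yhat, multioutput='uniform_average')
print('R^2 per output: %s' % r2_each)
print('R^2 averaged: %.3f' % r2_mean)
```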
  
  Reply
Elham July 5, 2021 at 3:26 am #

Hi,
Thanks for this great tutorial.

Reply
- Jason Brownlee July 5, 2021 at 5:09 am #
  
  Thanks!
  
  Reply
MD ASHIQUR RAHMAN July 9, 2021 at 3:50 am #

Hi Jason,

Is there any acceptable limit for MAE value?
I mean how it can be concluded for acceptable limit for MAE?

thanks,

Ashiq

Reply
- Jason Brownlee July 9, 2021 at 5:14 am #
  
  Yes, lower bound is zero for perfect, upper bound is the error on whatever a naive model predicts, more here:
  https://machinelearningmastery.com/faq/single-faq/how-to-know-if-a-model-has-good-performance
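
  A minimal sketch of establishing that relative upper bound, assuming DummyRegressor (which always predicts the training mean of each output) as the naive model and the same cross-validation splits for both models:

```python
from numpy import absolute, mean
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)
cv = KFold(n_splits=10, shuffle=True, random_state=1)

def cv_mae(model):
    """Mean cross-validated MAE, averaged across outputs and folds."""
    scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv)
    return mean(absolute(scores))

naive_mae = cv_mae(DummyRegressor(strategy='mean'))
model_mae = cv_mae(LinearRegression())
print('Naive MAE: %.3f' % naive_mae)
print('Model MAE: %.3f' % model_mae)
```

  A model is only useful if its MAE is meaningfully below the naive score.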
  
  Reply
  - MD ASHIQUR RAHMAN July 9, 2021 at 6:16 am #
    
    Hi Jason,,
    
    Thank you for sharing your ideas. I am wondering whether any ideal upper bound is followed universally or not. If yes, then please share your thoughts on this. I mean what value for MAE could be acceptable.
    
    Ashiq
    
    Reply
    - Jason Brownlee July 10, 2021 at 6:04 am #
      
      No, only relative upper bound.
      
      Reply
Farshad July 15, 2021 at 2:01 pm #

If I decide to use gbmregressor, how I can do parameter tuning, to pick the best parameters, Thank you.

Reply
- Jason Brownlee July 16, 2021 at 5:20 am #
  
  You can use a grid search or a random search to tune the hyperparameters.
  
  There are many examples of hyperparameter tuning on the blog, use the search box.
  
  Reply
Farshad July 16, 2021 at 6:10 am #

Thank you, but when I write the gridsearch, and use y as target (if y is more than one element), I get an error, since the gbm just work on one output, how do you solve that issue? do you have an example some where that for multioutput you did gridsearch parameter tuning for gbm? Thank you.

Reply
- Jason Brownlee July 17, 2021 at 5:15 am #
  
  You may have to grid search manually, e.g. with a for loop over configurations.
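
  An alternative that may avoid the manual loop is to wrap the model in MultiOutputRegressor and pass the wrapper to GridSearchCV, addressing the inner model's parameters with the estimator__ prefix. The sketch below assumes scikit-learn's GradientBoostingRegressor is the model in question and uses a deliberately tiny grid on synthetic data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# the wrapper fits one gradient boosting model per target column
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=1))

# parameters of the wrapped model are reached through the 'estimator__' prefix
grid = {'estimator__n_estimators': [50, 100],
        'estimator__max_depth': [2, 3]}

search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=3)
search.fit(X, y)
print('Best parameters: %s' % search.best_params_)
print('Best MAE: %.3f' % -search.best_score_)
```

  This picks one configuration shared by all targets, scored on the error averaged across outputs.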
  
  Reply
João Martins July 19, 2021 at 2:51 am #

Hi Jason,

do you know if the RegressorChain can be used with pipelines that include a ColumnTransformer?

I’ve posted a question here: https://stackoverflow.com/questions/68430993/sklearn-using-regressorchain-with-columntransformer-in-pipelines but suspect this may not be possible, would appreciate your insight.

Cheers and thanks for a great website/contents!

Reply
- Jason Brownlee July 19, 2021 at 5:19 am #
  
  I would guess that it can, but perhaps prototype an example to see if it is possible.
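
  One arrangement that should work is to apply the ColumnTransformer before the chain rather than inside it. The chain appends earlier predictions as extra columns to the input, which I would expect to conflict with selecting columns by name inside the chained model. A minimal sketch with made-up column names and data:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# hypothetical data: two numeric columns and one categorical column
rng = np.random.default_rng(1)
X = pd.DataFrame({
    'size': rng.normal(size=100),
    'weight': rng.normal(size=100),
    'color': rng.choice(['red', 'green', 'blue'], size=100),
})
y = np.column_stack([X['size'] + X['weight'], X['size'] - 2 * X['weight']])
y = y + rng.normal(scale=0.1, size=y.shape)

prep = ColumnTransformer([
    ('num', StandardScaler(), ['size', 'weight']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['color']),
], sparse_threshold=0)  # always return a dense array for the chain

# the transformer runs first, so the chain only ever sees a plain numeric array
model = Pipeline([
    ('prep', prep),
    ('chain', RegressorChain(LinearRegression(), order=[0, 1])),
])
model.fit(X, y)
yhat = model.predict(X.head(3))
print(yhat)
```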
  
  Reply
Kathleen July 24, 2021 at 1:00 pm #

Hi, Jason! This article changed my world, so thank you! I have read all your articles on Ensemble Learning, and those weren’t quite what I needed, because I did not want to divide my output into smaller bags to train different model types, but I actually have multiple different outputs, so this is great!

Since Direct Multi-output Regression does not consider the dependencies between the target variables, but the Chained Multi-output Regression only considers an ordered/sequential dependence between the target variables, is there a way to account for the dependence (that is not ordered/sequential) between target variables?

For example, can I create a single model using support vector regression on each target variable, then somehow perform a final model tweak or even train a final model on the individual models to account for any dependence between the targets?

Reply
- Jason Brownlee July 25, 2021 at 5:12 am #
  
  You’re welcome.
  
  Yes, a neural network that outputs a direct vector.
  
  Reply
  - Kathleen July 27, 2021 at 7:26 am #
    
    Thank you for your prompt reply!
    
    Just to be clear, a neural network that outputs a direct vector would take my input, perform support vector regression on each output (so my number of neurons in my first hidden layer is the same as my number of distinct outputs), then my second hidden layer performs another support vector regression with some (I’m not sure) number of neurons to account for the dependence between my outputs, which then feeds into my output layer as a direct vector.
    
    I would then optimize my parameters on the neural network as a whole.
    
    Is this correct in what you are suggesting?
    
    Reply
    - Jason Brownlee July 28, 2021 at 5:20 am #
      
      I don’t know about combining SVR with neural networks sorry.
      
      Reply
Andrea August 14, 2021 at 6:05 pm #

Hi,
I’m a novice in AI and related topics. I have a simple question. What is the difference between Multi-Output Regression and an Artificial Neural Network? I hope I haven’t asked a stupid question.

Reply
- Adrian Tam August 14, 2021 at 11:32 pm #
  
  Neural Network is just one way to do regression. And it can also do more than regression. So in other words, they are different things but with some overlap.
  
  Reply
Andrea August 15, 2021 at 12:39 am #

So if I want to predict a pair of values like a GPS position (latitude, longitude) from a large dataset of GPS positions, I think that a Multi-Output Regression model is better than an ANN with 1 or more hidden layers. What do you think of this?

Reply
- Adrian Tam August 17, 2021 at 7:15 am #
  
  Indeed, ANN can also be a multi-output regression model.
  
  Reply
Thai August 20, 2021 at 2:20 am #

Thank you very much for your excellent work. I have a question: can I use ‘Direct Multioutput’ and ‘Chained Multioutput’ for a multi-output classification problem? Do you have any suggestions for me about multi-output classification problems?

Reply
- Adrian Tam August 21, 2021 at 1:01 am #
  
  Do you think this tutorial works for you? https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/
  
  Reply
  - Thai August 23, 2021 at 8:19 pm #
    
    Thank you very much for your reply.
    But I mean, how can I handle multi-target classification? For example, I have a dataset with 5 features and two targets which need to be classified, and each target has 4 classes.
    Thank you for your help
    
    Reply
    - Adrian Tam August 24, 2021 at 8:34 am #
      
      It seems to me that the two targets are independent of each other. In that case, you better build two separate networks, one for each target.
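
        In scikit-learn, the counterpart of building one model per target is the MultiOutputClassifier wrapper. A minimal sketch, assuming logistic regression as the base classifier and synthetic data with 5 features and two 4-class targets (the second target is constructed arbitrarily for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# first 4-class target over 5 features
X, y1 = make_classification(n_samples=400, n_features=5, n_informative=4,
                            n_redundant=0, n_classes=4, n_clusters_per_class=1,
                            random_state=1)
# second 4-class target, loosely tied to the first (illustrative only)
rng = np.random.default_rng(1)
y2 = (y1 + rng.integers(0, 2, size=len(y1))) % 4
Y = np.column_stack([y1, y2])

# the wrapper fits one independent classifier per target column
model = MultiOutputClassifier(LogisticRegression(max_iter=1000))
model.fit(X, Y)

yhat = model.predict(X[:3])
print(yhat)  # one row per sample, one class label per target
```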
      
      Reply
      - Thai August 27, 2021 at 4:37 am #
        
        Thank you very much!
Bob August 31, 2021 at 9:50 pm #

Dear Jason, thank you for the great blogposts!!

Given y target outputs and z input features, what is the minimum number of observations necessary to start building a model? Is there a rule of thumb? y times z?

Thank you

Reply
- Adrian Tam September 1, 2021 at 8:53 am #
  
  30 is the magic number in statistics (https://www.researchgate.net/post/What_is_the_rationale_behind_the_magic_number_30_in_statistics) so a brainless estimate would be y times z times 30. But you may not need that much if you can do some preprocessing, e.g., reduce z by feature selection.
  
  Reply
  - Bob September 1, 2021 at 3:06 pm #
    
    Dear Adrian, Thank you very much. Warmest regards!
    
    Reply
Brett September 2, 2021 at 6:43 pm #

Thank you for this post! I have a question regarding individual accuracy on multi-outputs. I have built 2 multi-output randomforest models. 1) predicts two outputs: x and y coordinates, 2) predicts three outputs: x and y coordinates and another unrelated variable. However, when predicting 3 variables the x-y coordinate accuracy decreases compared to just predicting the x-y coordinates. Is this expected? May having more outputs require building more trees to maintain similar accuracy? Any insight would be very appreciated.

Reply
- Jason Brownlee September 3, 2021 at 5:30 am #
  
  You can choose how to evaluate your model, e.g. error score for each output or combined.
  
  Reply
Grey September 22, 2021 at 10:44 pm #

Hey,
Thanks for your post! It is super helpful.

I have two questions:
1. Is it possible that the number of the input is smaller than the number of the output, will it harm the effect of the prediction?
2. Did you do a similar tutorial for deep learning (regression with neural networks)?

Thanks!

Reply
- Adrian Tam September 23, 2021 at 3:43 am #
  
  1. Should not be. The reason is for any input, there must be a corresponding output or otherwise you cannot do regression. The dimension, however, is a different issue. You can have Y=f(X) for Y a vector of 5 while X is a vector of 3.
  
  2. Yes, please see this one, for example: https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
  
  Reply
MD ASHIQUR RAHMAN September 29, 2021 at 4:16 am #

Hi Jason,

Does multi-output regression fit a regression model for each output variable and then report the results together as a multi-output prediction?
If yes, how can the output variables capture the inherent relationship among each other?

Suppose, for a supervised learning method, If I need to assign the four output as a sum of 100(%), So, the prediction also have to have four output and the sum would be 100(%), right?
How multioutput regression can do that?

Thank you!

Reply
- Adrian Tam September 30, 2021 at 1:11 am #
  
  Not really. Depends on how you create your model. In a neural network with softmax output, however, it is true because it will provide 4 output values normalized to a sum of 1.0. In an ensemble of four OvR models, each will provide a different probability between 0 and 1 but no model guarantee their sum is exactly equal to 1.
  
  Reply
Paulo Gomes November 3, 2021 at 12:09 am #

Good Morning Jason, thanks a lot for this Blog!

I’m a novice in AI, Machine Learning, Deep learning, and so…

My simple question is, in all of the above models you create a dataset

# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

I already have a dataset with 10 named columns and 2 target ones (all numerical values).

Can you at least tell me how to change this line to use an existing dataset?
With that, I can build all the other models.

Thank you so much!!

Reply
- Adrian Tam November 7, 2021 at 7:15 am #
  
  If you already have the dataset, you don’t need to call this “make_regression()” function. Rather, you need to read the data. See here for an example on how to read data from CSV files: https://machinelearningmastery.com/how-to-load-and-explore-household-electricity-usage-data/
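
  A minimal sketch of replacing the make_regression() line with data loaded by pandas. The in-memory CSV stands in for a real file path, and the column names (f1 to f3, target1, target2) are hypothetical; only three feature columns are shown for brevity.

```python
from io import StringIO
import pandas as pd
from sklearn.linear_model import LinearRegression

# stand-in for pd.read_csv('your_file.csv'); column names are hypothetical
csv_text = StringIO(
    "f1,f2,f3,target1,target2\n"
    "1.0,2.0,0.5,3.1,0.9\n"
    "2.0,1.0,1.5,3.0,2.1\n"
    "3.0,4.0,2.5,7.2,1.8\n"
    "4.0,3.0,0.1,6.9,4.2\n"
    "5.0,6.0,1.1,11.1,3.9\n"
)
df = pd.read_csv(csv_text)

# split the named columns into inputs (X) and the two targets (y)
target_cols = ['target1', 'target2']
X = df.drop(columns=target_cols).values
y = df[target_cols].values
print(X.shape, y.shape)

# from here on, X and y are used exactly as in the tutorial examples
model = LinearRegression().fit(X, y)
print(model.predict(X[:1]))
```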
  
  Reply
Islam Hussain January 21, 2022 at 10:42 pm #

Hello can you help me in ploting multiple linear regression variable in a single plot

Reply
- James Carmichael January 22, 2022 at 10:06 am #
  
  Hello Islam…Please refer to the following:
  
  https://www.kite.com/python/answers/how-to-make-multiple-plots-on-the-same-figure-in-matplotlib-in-python#:~:text=Call%20matplotlib.,plots%20in%20the%20same%20graph.
  
  Reply
Ogawa February 15, 2022 at 4:51 pm #

There is a MultiOutputClassifier as well as a MultiOutputRegressor.

If I want to combine regression and classification, how can I do that?

Reply
Alex February 19, 2022 at 2:10 am #

Hello, I have been researching multi-output regression for two months, and I find that the chain approach always performs poorly, even when the outputs are related. So how can we tell when the chain will work?

Reply
- James Carmichael February 19, 2022 at 12:53 pm #
  
  Hi Alex…Please clarify your question to be more specific to a machine learning technique or concept so that I may better assist you.
  
  Reply
Giacomo March 12, 2022 at 1:17 am #

Good morning, I have developed a multi-output regression model in Keras (1D CNN) with 4 different outputs. MSE is the loss function defined for each output variable. For me it’s not so clear how to set the loss weight associated to each output variable. Does it depend on the order of magnitude of the output variables? Is there a rule of thumb to set the loss weights? In general, what are the intuitions behind the choice of the values? Thanks a lot.

Reply
bel March 22, 2022 at 1:11 am #

Thank you for the great post.

Regarding Linear Regression for Multioutput Regression: it could be that different output variables have very different errors (MAE or relative error).
What would you suggest doing in order to improve the predictions for those target variables with larger errors?

Thank you!

Reply
ali April 12, 2022 at 10:42 pm #

good tutorial to understand concepts
thanks a lot

Reply
- James Carmichael April 14, 2022 at 2:39 am #
  
  Thank you for the feedback and support Ali!
  
  Reply
Dave April 26, 2022 at 9:29 pm #

Hi Jason,

Do you have anything on using machine learning to optimise, say meet a end target given said conditions? Thank you!

Reply
- James Carmichael April 30, 2022 at 10:22 am #
  
  Hi Dave…you may find the following of interest:
  
  https://machinelearningmastery.com/optimization-for-machine-learning-crash-course/
  
  Reply
shihab May 24, 2022 at 11:48 pm #

Hi, how can I predict y if I have only a single value of y (a scalar)? I mean, given a tensor and a scalar (y), how do I obtain the predicted yi? Regards

Reply
- James Carmichael May 25, 2022 at 9:13 am #
  
  Hi Shihab…you may find the following of interest:
  
  https://machinelearningmastery.com/how-to-make-classification-and-regression-predictions-for-deep-learning-models-in-keras/
  
  https://machinelearningmastery.com/make-predictions-scikit-learn/
  
  Reply
Aayush Shah June 28, 2022 at 8:45 pm #

Hello, I have a couple of questions:

1. Does the problem of the assumption “output are independent” get solved by second approach? (This might be obvious – not askable question but I think it is not stated in the description in the second approach, so I asked)

2. I have tried RandomForest running for multi output problem. Where I have observed that: RandomForest if ran alone (without wrapper) and with wrapper results in different outputs. (checked with np.allclose) – I have kept the same random_state to avoid randomness.

Secondly, the same thing results in the same answer with other models such as LinearReg, DTree, KNN etc. Meaning, I ran LinReg alone to predict multiple outputs and then ran LinReg with wrapper (any of those ie. MultiOutputRegressor, or Chain) but np.allclose returns true in either case.

Also, tried running LinReg with Single target (say out of y1, y2, y3 – I trained LinReg with only y1) and compared the predictions with the other LinReg model which was trained on all targets. Then checked the first column (y1) in the prediction which was the exactly the same with the model with single target.

Same thing tried with RandomForest but that resulted in different values. Why is that happening? Is randomforest working differently? How is it preserving output relationships?

—

This was an outstanding article. Helped me much much of it.
Thanks for that. And sorry for the long explanation of my problem.

Reply
Aayush Shah June 29, 2022 at 8:26 pm #

Hello Jason, I am Aayush Again – I have been reading this paper: https://oa.upm.es/40804/1/INVE_MEM_2015_204213.pdf where multiple solutions to the Multi Output Regression have been suggested.

I want to ask a question based on the page number: 4, Topic 2.1.3 Regressor chains (Which is the same concept as you have mentioned as an 2nd option in this article)

The thing is that: Here you have stated that the “subsequent models are trained on the inputs X1..Xn along with the previous models' predicted y hats” Meaning, new models will have the input of y as y hats of previous models.

But in the paper, it suggests that instead of y hats, use the actual y to train the new models. So that means there is no role for y hats.

—

I have tried implementing both version in python (with y hats, with actual y) used some datasets to test and got to know that – overall “the version in which we train the new models on y hats work better”. This was strange… sometimes those with actual y worked better but that time the difference was negligible.

I want to ask you that – do you have any research on that which works better based on different tests? I mean what is the industry standard ie. to use the models on y hats or actual y?

Thanks and Regards,
Aayush Shah

Reply
Abdul Qadar June 29, 2022 at 10:37 pm #

Hello Jason,
Just imagine we have 10 features. If we give 4 features as input to the model, then the model will give us 6 features as output, and if we give 5 features as input, then the model will give us 5 features as output, and so on. How could this be done? Please reply.

Reply
- James Carmichael June 30, 2022 at 12:02 pm #
  
  Hi Abdul…Please clarify and/or rephrase your question to detail what you are trying to accomplish so that we may better assist you.
  
  Reply
Aayush Shah July 4, 2022 at 4:30 pm #

Hello, Jason… I think my question is missed by mistake. Have just asked before Abdul’s query. If that’s fine will you please solve a couple of my queries?

Thank you, and sorry for writing a separate message just for this, please pardon me.

Reply
Alec Bench February 18, 2023 at 1:01 am #

Hi. Is it possible to implement hyperparameter tuning for each individual model that are created by Regressor chain or Multioutput regressor? or would I have to do it manually? Thanks.

Reply
- James Carmichael February 18, 2023 at 9:24 am #
  
  Hi Alec…I would recommend Bayesian Optimization for this task:
  
  https://github.com/rmcantin/bayesopt
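
  A scikit-learn-only alternative for MultiOutputRegressor might be to wrap a GridSearchCV around the base model, so that each target gets its own search. This is a sketch with Ridge as a placeholder model and an arbitrary grid; it does not carry over directly to RegressorChain.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# the search itself is the base estimator, so it is cloned and run once per target
inner = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1.0, 10.0]},
                     scoring='neg_mean_absolute_error', cv=5)
model = MultiOutputRegressor(inner)
model.fit(X, y)

# each fitted search holds the best configuration for its own target
best = [est.best_params_ for est in model.estimators_]
for i, params in enumerate(best):
    print('Target %d best parameters: %s' % (i, params))
```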
  
  Reply
Erik D. March 9, 2023 at 10:42 pm #

Thanks for the article!

One thing that I have been wondering: Is it sometimes actually preferable to have 2 separate models with a single output instead of 1 model with 2 outputs? I mean since the error is calculated over both outputs, I could imagine that it can happen that one output ends up being underfit while the other ends up overfit.
Having 2 separate models then might give more control over such issues.

Is this reasoning correct?

Reply
- James Carmichael March 10, 2023 at 8:02 am #
  
  Hi Erik…You are very welcome! In some cases your suggestion may be the preferred method. It is also helpful to consider ensemble diversity:
  
  https://machinelearningmastery.com/ensemble-diversity-for-machine-learning/
  
  Reply
MBT September 8, 2023 at 2:11 am #

Hi

Can we use these two approaches for time series prediction as direct and hybrid methods?

Reply
- James Carmichael September 8, 2023 at 9:26 am #
  
  Hi MBT…You may wish to consider deep learning methods for time series prediction:
  
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Zeinab November 18, 2023 at 8:36 pm #

Hi
Does a multioutput classifier exist in deep learning?

Reply
Nikita February 20, 2024 at 8:41 pm #

Hi, could you explain what happens in the last line of code? Why do we take yhat[0]? And why does yhat return an ndarray containing multiple rows? Shouldn’t prediction return a 1-dimensional ndarray?

Reply
- James Carmichael February 21, 2024 at 9:53 am #
  
  Hi Nikita…This array contains the actual values that were predicted. If you have not already done so, execute the code to confirm the values stored in yhat[0].
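
  To illustrate, predict() takes a 2D array of rows and returns one row of predictions per input row, with one column per target. A minimal sketch, assuming linear regression and the tutorial's synthetic dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)
model = LinearRegression().fit(X, y)

row = X[0]
# a single row is wrapped in a list so the input is 2D: (1, n_features)
yhat = model.predict([row])
print(yhat.shape)  # (n_rows, n_targets)
print(yhat[0])     # the predicted target values for the first (and only) row
```

  So yhat[0] selects the predictions for the single input row from the 2D result.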
  
  Reply
Ahmad April 18, 2024 at 12:01 pm #

Hi. I am using the MultiOutputRegressor class in Python to build a multitask learning model like this:
# Create the Multi-task Model (using RandomForestRegressor as an example)
multi_task_model = MultiOutputRegressor(RandomForestRegressor())
Is this the right way, or should I do something different?

Reply
- James Carmichael April 21, 2024 at 10:15 am #
  
  Hi Ahmad…Using the MultiOutputRegressor from scikit-learn to handle multi-task learning where you need to predict multiple outputs with a single model is indeed a practical approach. The example you provided is correct for setting up a multi-task model using a RandomForestRegressor as the underlying regressor. Here’s how you might typically use it:
  
  ### Basic Setup
  
  The setup you described is used to create a model that can independently predict each output with a separate regressor per output. Here’s a step-by-step guide:
  
  1. **Import Required Libraries**
  First, ensure you have the necessary imports:
  
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.multioutput import MultiOutputRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_squared_error
  
  2. **Prepare Your Data**
  You need to have your features (X) and targets (Y) prepared. Y should be a matrix where each column represents a different target for regression.
  
      X = ...  # Your features
      Y = ...  # Your targets, with multiple columns
      X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
  
  3. **Create and Train the MultiOutput Model**
  As you’ve described, you wrap a RandomForestRegressor within a MultiOutputRegressor to handle each output feature with a separate model:
  
      multi_task_model = MultiOutputRegressor(RandomForestRegressor(random_state=42))
      multi_task_model.fit(X_train, Y_train)
  
  4. **Evaluate the Model**
  After training, you can evaluate the model using the test set:
  
      Y_pred = multi_task_model.predict(X_test)
      mse = mean_squared_error(Y_test, Y_pred, multioutput='raw_values')
      print("Mean Squared Error for each output:", mse)
  
  ### Considerations and Alternatives
  
  - **Model Independence**: Each target in MultiOutputRegressor is predicted independently using a separate instance of the specified base estimator. This means that there is no sharing of information between the tasks, which might not be optimal if your tasks are related.
  
  - **Correlated Outputs**: If your tasks are related (the output variables are correlated), you might want to consider models that can explicitly exploit this correlation. Techniques such as chained regression (RegressorChain) or building a custom model in a neural network framework that can learn shared representations might be more appropriate.
  
  - **Model Choice**: While RandomForestRegressor is a strong general-purpose model due to its robustness and ability to model nonlinear relationships, consider whether another model might be more suitable based on your specific data characteristics. For example, gradient boosting machines (like those from the XGBoost or LightGBM libraries) might provide better performance if configured correctly.
  
  - **Hyperparameter Tuning**: Regardless of the model you choose, tuning its parameters is crucial for achieving the best performance. Consider using techniques like grid search (GridSearchCV) or random search (RandomizedSearchCV) to find good model settings.
  
  Using MultiOutputRegressor is a straightforward way to extend any single-output model from scikit-learn to handle multiple outputs. If your tasks have complex interdependencies or if you need a model that leverages shared information between outputs, you may need to explore more specialized multi-task learning models or custom solutions.
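
As a minimal sketch of the tuning point, assuming synthetic data and a small hypothetical search space: parameters of the wrapped model are reached through the estimator__ prefix, so a single search can tune the random forest settings shared by all outputs.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor

# Synthetic two-output problem (assumed data, for illustration only)
X, Y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

model = MultiOutputRegressor(RandomForestRegressor(random_state=42))

# Hypothetical search space; "estimator__" routes each name to the
# RandomForestRegressor wrapped by MultiOutputRegressor
param_distributions = {
    "estimator__n_estimators": [20, 50],
    "estimator__max_depth": [3, 5, None],
}

search = RandomizedSearchCV(model, param_distributions, n_iter=4, cv=3,
                            scoring="neg_mean_squared_error", random_state=42)
search.fit(X, Y)

print(search.best_params_)
print(search.best_score_)
```

Note that this picks one configuration for every output; the score being optimized is averaged over the output columns.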
  
  Reply
  - zipeng zhang November 22, 2024 at 1:24 pm #
    
    Hi, I have some questions.
    If the random forest directly performs multiple outputs, it is not possible to take into account the correlation between multiple Y outputs (such as y1 and y2). In order to consider the correlation between y1 and y2, can I bring the random forest into the chain regression? That is, to use Random Forest as a meta-estimator for chained regression.
    Similarly, can LightGBM be introduced into chained regression in order to consider the correlation between y1 and y2?
    
    Reply
- James Carmichael November 23, 2024 at 5:22 am #
  
  Yes, both **Random Forest** and **LightGBM** can be incorporated into **chained regression** to account for the correlation between multiple outputs (e.g., \( y_1 \) and \( y_2 \)). Here’s an explanation and some practical insights:
  
  ---
  
  ### **Understanding Chained Regression**
  Chained regression works by converting a multi-output regression problem into a sequence of single-output regression problems. Each output (\( y_1, y_2, \ldots, y_n \)) is predicted using both the input features and the predictions of the preceding outputs in the chain.
  
  This allows the model to:
  1. Leverage correlations between the outputs (\( y_1 \) and \( y_2 \)).
  2. Sequentially refine predictions based on earlier outputs in the chain.
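
To make the mechanics concrete, here is a hand-rolled two-output chain compared with RegressorChain. It is a sketch under stated assumptions: synthetic data, LinearRegression as the base model, the default output order, and scikit-learn's default behavior (cv=None) of training later models on the true values of earlier targets while using predicted values at prediction time.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# Synthetic two-output problem (assumed data, for illustration only)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# Training: model 1 sees X; model 2 sees X plus the true y1 column
f1 = LinearRegression().fit(X, y[:, 0])
f2 = LinearRegression().fit(np.hstack([X, y[:, [0]]]), y[:, 1])

# Prediction: model 2 receives the *predicted* y1 as its extra feature
y1_hat = f1.predict(X)
y2_hat = f2.predict(np.hstack([X, y1_hat.reshape(-1, 1)]))
manual_pred = np.column_stack([y1_hat, y2_hat])

# The same chain built with scikit-learn
chain_pred = RegressorChain(LinearRegression()).fit(X, y).predict(X)

matches = np.allclose(manual_pred, chain_pred)
print("manual chain matches RegressorChain:", matches)
```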
  
  ---
  
  ### **Random Forest in Chained Regression**
  Random Forests (RFs) can indeed be used as the base estimator for chained regression (RegressorChain itself is the meta-estimator that wraps them). The key advantage of using RF in this context is its robustness to non-linear relationships and noise. When integrated into chained regression:
  1. **Each RF model predicts one output variable** while considering the previous outputs as additional features.
  2. This helps capture relationships between the input variables and the outputs, as well as dependencies between the outputs themselves.
  
  ---
  
  ### **LightGBM in Chained Regression**
  LightGBM is another excellent candidate for chained regression, especially when dealing with large datasets or high-dimensional features. It offers:
  1. **Efficiency and speed**: Faster training times compared to Random Forest.
  2. **Customizability**: Ability to tune hyperparameters for gradient boosting to achieve better performance.
  3. **Handling of correlation**: Like Random Forest, LightGBM can also use previous outputs in the chain as features to model output dependencies.
  
  ---
  
  ### **Implementation Outline**
  To implement chained regression with either Random Forest or LightGBM in Python:
  1. Use **sklearn.multioutput.RegressorChain** for chaining the regressors.
  2. Pass your chosen regressor (Random Forest or LightGBM) as the base estimator.
  3. Fit the model and evaluate it on your multi-output data.
  
  Here’s an example with **Random Forest**:
  
      from sklearn.multioutput import RegressorChain
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_squared_error
      import numpy as np

      # Example data: 10 features, 2 outputs
      X, y = np.random.rand(100, 10), np.random.rand(100, 2)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

      # Random Forest as base estimator
      base_model = RandomForestRegressor(n_estimators=100, random_state=42)

      # Chained regression
      chain_regressor = RegressorChain(base_model)
      chain_regressor.fit(X_train, y_train)

      # Predictions
      y_pred = chain_regressor.predict(X_test)

      # Evaluate performance
      mse = mean_squared_error(y_test, y_pred)
      print(f"Mean Squared Error: {mse}")
  
  For **LightGBM**, you can use the LGBMRegressor class from the lightgbm package in a similar way:
  
      from lightgbm import LGBMRegressor

      # LightGBM as base estimator
      base_model = LGBMRegressor()

      # Chained regression
      chain_regressor = RegressorChain(base_model)
      chain_regressor.fit(X_train, y_train)

      # Predictions and evaluation (same as above)
  
  ---
  
  ### **Things to Keep in Mind**
  1. **Correlation Handling**: Chained regression works well when the outputs are correlated. If there is little or no correlation between outputs, other methods (e.g., independent regressors) may perform equally well or better.
  2. **Order of Outputs**: The order in which outputs are chained can affect performance. Experiment with different orders to find the best sequence for your dataset.
  3. **Performance vs. Complexity**: LightGBM tends to scale better with larger datasets, while Random Forest might perform better on smaller datasets or when interpretability is a concern.
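
The effect of output order can be checked directly with the order argument of RegressorChain. The following is a small sketch, assuming synthetic data and a modest random forest; on real data the gap between orders may be larger or smaller than what this toy problem shows.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import RegressorChain

# Synthetic two-output problem (assumed data, for illustration only)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

scores = {}
for order in ([0, 1], [1, 0]):
    # order lists the target columns in the sequence they are chained
    chain = RegressorChain(
        RandomForestRegressor(n_estimators=50, random_state=42), order=order)
    cv_scores = cross_val_score(chain, X, y, cv=3,
                                scoring="neg_mean_absolute_error")
    scores[tuple(order)] = -cv_scores.mean()

for order, mae in scores.items():
    print(f"order {order}: MAE {mae:.3f}")
```

Predictions are returned in the original column order regardless of the chain order, so the scores are directly comparable.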
  
  ---
  
  ### **Alternative Approaches**
  If the outputs are strongly correlated, you might also consider:
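  1. **Multi-Output Gradient Boosting**: LightGBM’s LGBMRegressor fits a single target, so it still needs a wrapper such as MultiOutputRegressor or RegressorChain. Some other boosting libraries do offer native multi-target training, for example CatBoost with its MultiRMSE loss.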
  2. **Neural Networks**: Deep learning models can naturally handle multi-output regression with shared hidden layers.
  3. **Multivariate Regression (PLS, etc.)**: Dimensionality reduction techniques like Partial Least Squares can capture interdependencies in the outputs.
  
  Reply
  - Zipeng Zhang December 6, 2024 at 1:06 am #
    
    Thank you very much for your response, Jason. Your answer helped me a lot.
    
    Reply
  - Zipeng Zhang December 6, 2024 at 11:43 am #
    
    Thank you very much for your response, James.
    
    Reply
  - Zipeng Zhang December 6, 2024 at 12:15 pm #
    
    Hi, I have a new question. When I want to perform hyperparameter tuning for chained regression in random forests, is the Bayesian optimization process calculating the scoring criteria based on multiple dependent variables (e.g., y1 and y2) at the same time? For example, is the MSE calculated separately for y1 and y2 and then averaged to minimize the mean of the MSE?
    
    bayes_search = BayesSearchCV(
    estimator = RandomForestRegressor(random_state=42),
    search_spaces=param_grid,
    scoring='neg_mean_squared_error',
    cv=cv,
    n_iter=50,
    random_state=42,
    verbose=3,
    n_jobs=-1
    )
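
On the scoring question: scikit-learn's built-in 'neg_mean_squared_error' scorer relies on mean_squared_error with its default multioutput='uniform_average', so for a multi-column target it is the negated mean of the per-output MSEs. The sketch below checks this on assumed synthetic data. Note also that it scores a RegressorChain; in the snippet above the estimator passed to the search is a plain RandomForestRegressor, so the chain itself would need to be the estimator for the search to tune chained regression.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import get_scorer, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain

# Synthetic two-output problem (assumed data, for illustration only)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

chain = RegressorChain(RandomForestRegressor(n_estimators=50, random_state=42))
chain.fit(X_train, y_train)

# The score a search object would see for this model on this split
scorer = get_scorer("neg_mean_squared_error")
score = scorer(chain, X_test, y_test)

# The same quantity computed by hand from the per-output errors
per_output_mse = mean_squared_error(
    y_test, chain.predict(X_test), multioutput="raw_values")
agrees = np.isclose(score, -per_output_mse.mean())

print("per-output MSE:", per_output_mse)
print("scorer value:  ", score)
print("scorer equals negated mean of per-output MSEs:", agrees)
```

Because the average is unweighted, a target on a much larger scale will dominate the score unless the targets are rescaled first.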
    
    Reply
Anirudh Upadhyay March 27, 2025 at 3:54 pm #

Hi
I want to understand one thing: when the MultiOutputRegressor class is used, does it account for dependencies between output variables, or does it treat them independently? How can we achieve that? Do models like that exist?

Reply
- James Carmichael March 28, 2025 at 8:21 am #
  
  Great question!
  
  The short answer is:
  
  ### ❌ MultiOutputRegressor does **not** account for dependencies between output variables.
  It simply **fits one regressor per target**, treating each target as **independent** of the others.
  
  ---
  
  ### 🔍 How does MultiOutputRegressor work?
  
      from sklearn.multioutput import MultiOutputRegressor
      from sklearn.ensemble import RandomForestRegressor

      multi_target_model = MultiOutputRegressor(RandomForestRegressor())
      multi_target_model.fit(X_train, Y_train)
  
  Here, Y_train has multiple columns (e.g., predicting [height, weight, age]).
  The model trains a **separate RandomForestRegressor** for each column in Y_train, independently. So:
  
  - It **does not model** any correlation or dependency between the targets.
  - If height and weight are correlated, the model **won’t know that**.
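
The independence can be verified through the fitted wrapper's estimators_ attribute, which holds one model per target column. A minimal sketch, assuming synthetic three-output data and using LinearRegression (rather than a random forest) so that the comparison is exactly reproducible:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Synthetic three-output problem (assumed data, for illustration only)
X, Y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=3, random_state=1, noise=0.5)

wrapper = MultiOutputRegressor(LinearRegression()).fit(X, Y)
n_fitted = len(wrapper.estimators_)
print("fitted models:", n_fitted)

# A model trained alone on the second target gives the same predictions
# as the wrapper's second sub-model: the other targets play no role
standalone = LinearRegression().fit(X, Y[:, 1])
same = np.allclose(wrapper.estimators_[1].predict(X), standalone.predict(X))
print("second sub-model equals standalone model:", same)
```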
  
  ---
  
  ### ✅ What if you want to **model dependencies between output variables**?
  
  Here are some approaches:
  
  ---
  
  ### ✅ 1. Use models that **natively support multi-output with dependencies**
  These models internally **jointly model** all output variables:
  
  - **Neural Networks (MLPRegressor)**: You can build a neural network with **multiple outputs**.
    - One forward pass predicts all outputs at once.
    - The weights can be shared, allowing the model to **capture dependencies**.
  
      from sklearn.neural_network import MLPRegressor

      model = MLPRegressor()
      model.fit(X_train, Y_train)  # Y_train with multiple columns
  
  Or custom PyTorch/Keras models for more flexibility.
  
  ---
  
  ### ✅ 2. Use **Regressor Chains** (RegressorChain in sklearn)
  This method models dependencies **sequentially** by feeding previous outputs as inputs to the next model.
  
      from sklearn.multioutput import RegressorChain
      from sklearn.linear_model import LinearRegression

      chain = RegressorChain(base_estimator=LinearRegression())
      chain.fit(X_train, Y_train)
  
  - For target variables [y1, y2, y3], it models:
    - y1 = f1(X)
    - y2 = f2(X, y1)
    - y3 = f3(X, y1, y2)
  
  So the predictions of earlier outputs influence the later ones.
  
  ---
  
  ### ✅ 3. Deep learning (PyTorch / Keras)
  
  With neural nets, you can design **shared layers** followed by **multiple heads** or just a single output layer with multiple units.
  
  This setup naturally allows modeling interactions between outputs.
  
  ---
  
  ### ✅ 4. Multivariate regression techniques
  
  In statistics, **Multivariate Linear Regression** (not to be confused with multiple linear regression!) can jointly model multiple outputs using a **multivariate normal assumption**.
  
  Packages like statsmodels or frameworks like TensorFlow Probability allow you to define such models.
  
  ---
  
  ### ✅ Summary Table
  
  | Method | Accounts for Output Dependencies? | Notes |
  |---|---|---|
  | MultiOutputRegressor | ❌ No | Trains separate models |
  | RegressorChain | ✅ Partially | Sequential dependency |
  | MLP (Neural Networks) | ✅ Yes | Shared layers, custom |
  | Multivariate Regression | ✅ Yes | Statistical modeling |
  
  ---
  
  Reply
