How to Develop Multi-Output Regression Models with Python

Multioutput regression refers to regression problems that involve predicting two or more numerical values given an input example.

An example might be to predict a coordinate given an input, e.g. predicting x and y values. Another example would be multi-step time series forecasting, which involves predicting multiple future values of a given variable.

Many machine learning algorithms are designed for predicting a single numeric value, referred to simply as regression. Some algorithms do support multioutput regression inherently, such as linear regression and decision trees. There are also special wrapper models that can be used around algorithms that do not natively support predicting multiple outputs.

In this tutorial, you will discover how to develop machine learning models for multioutput regression.

After completing this tutorial, you will know:

  • The problem of multioutput regression in machine learning.
  • How to develop machine learning models that inherently support multiple-output regression.
  • How to develop wrapper models that allow algorithms that do not inherently support multiple outputs to be used for multiple-output regression.

Let’s get started.


Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Problem of Multioutput Regression
    1. Check Scikit-Learn Version
    2. Multioutput Regression Test Problem
  2. Inherently Multioutput Regression Algorithms
    1. Linear Regression for Multioutput Regression
    2. k-Nearest Neighbors for Multioutput Regression
    3. Random Forest for Multioutput Regression
    4. Evaluate Multioutput Regression With Cross-Validation
  3. Wrapper Multioutput Regression Algorithms
    1. Separate Model for Each Output (MultiOutputRegressor)
    2. Chained Models for Each Output (RegressorChain)

Problem of Multioutput Regression

Regression refers to a predictive modeling problem that involves predicting a numerical value.

For example, predicting a size, weight, amount, number of sales, or number of clicks is a regression problem. Typically, a single numeric value is predicted given the input variables.

Some regression problems require the prediction of two or more numeric values. For example, predicting an x and y coordinate.

These problems are referred to as multiple-output regression, or multioutput regression.

  • Regression: Predict a single numeric output given an input.
  • Multioutput Regression: Predict two or more numeric outputs given an input.

In multioutput regression, the outputs are typically dependent upon the input and upon each other. This means that the outputs are often not independent of each other, and the problem may require a model that predicts both outputs together, or each output contingent upon the other outputs.

Multi-step time series forecasting may be considered a type of multiple-output regression where a sequence of future values is predicted and each predicted value is dependent upon the prior values in the sequence.

There are a number of strategies for handling multioutput regression and we will explore some of them in this tutorial.

Check Scikit-Learn Version

First, confirm that you have a modern version of the scikit-learn library installed.

This is important because some of the models we will explore in this tutorial require a modern version of the library.

You can check the version of the library with the following code example:
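
A minimal sketch:

# check the installed scikit-learn version
import sklearn
print(sklearn.__version__)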

Running the example will print the version of the library.

At the time of writing, this is about version 0.22. You need to be using this version of scikit-learn or higher.

Multioutput Regression Test Problem

We can define a test problem that we can use to demonstrate the different modeling strategies.

We will use the make_regression() function to create a test dataset for multiple-output regression. We will generate 1,000 examples with 10 input features, five of which will be informative and five of which will be irrelevant to the outputs. The problem will require the prediction of two numeric values.

  • Problem Input: 10 numeric variables.
  • Problem Output: 2 numeric variables.

The example below generates the dataset and summarizes the shape.
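
A minimal sketch, using the configuration described above:

# define the multioutput regression test dataset
from sklearn.datasets import make_regression
# create a dataset with 10 inputs and 2 outputs
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# summarize the shape of the inputs and outputs
print(X.shape, y.shape)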

Running the example creates the dataset and summarizes the shape of the input and output elements of the dataset for modeling, confirming the chosen configuration.

Next, let’s look at modeling this problem directly.

Inherently Multioutput Regression Algorithms

Some regression machine learning algorithms support multiple outputs directly.

This includes most of the popular machine learning algorithms implemented in the scikit-learn library, such as:

  • LinearRegression (and related)
  • KNeighborsRegressor
  • DecisionTreeRegressor
  • RandomForestRegressor (and related)

Let’s look at a few examples to make this concrete.

Linear Regression for Multioutput Regression

The example below fits a linear regression model on the multioutput regression dataset, then makes a single prediction with the fit model.
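
A minimal sketch, using default model hyperparameters:

# linear regression for multioutput regression
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
# create the two-output dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define and fit the model on the whole dataset
model = LinearRegression()
model.fit(X, y)
# make a prediction for one row of data; two values are returned
yhat = model.predict(X[:1])
print(yhat)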

Running the example fits the model and then makes a prediction for one input, confirming that the model predicted two required values.

k-Nearest Neighbors for Multioutput Regression

The example below fits a k-nearest neighbors model on the multioutput regression dataset, then makes a single prediction with the fit model.
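
A minimal sketch, using default model hyperparameters:

# k-nearest neighbors for multioutput regression
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
# create the two-output dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define and fit the model on the whole dataset
model = KNeighborsRegressor()
model.fit(X, y)
# make a prediction for one row of data; two values are returned
yhat = model.predict(X[:1])
print(yhat)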

Running the example fits the model and then makes a prediction for one input, confirming that the model predicted two required values.

Random Forest for Multioutput Regression

The example below fits a random forest model on the multioutput regression dataset, then makes a single prediction with the fit model.
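
A minimal sketch, using default model hyperparameters:

# random forest for multioutput regression
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
# create the two-output dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define and fit the model on the whole dataset
model = RandomForestRegressor()
model.fit(X, y)
# make a prediction for one row of data; two values are returned
yhat = model.predict(X[:1])
print(yhat)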

Running the example fits the model and then makes a prediction for one input, confirming that the model predicted two required values.

Evaluate Multioutput Regression With Cross-Validation

We may want to evaluate a multioutput regression model using k-fold cross-validation.

This can be achieved in the same way as evaluating any other machine learning model.

We will fit and evaluate a DecisionTreeRegressor model on the test problem using 10-fold cross-validation with three repeats. We will use the mean absolute error (MAE) performance metric as the score.

The complete example is listed below.
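
A minimal sketch, using the neg_mean_absolute_error scoring string and making the scores positive for reporting:

# evaluate multioutput regression with repeated k-fold cross-validation
from numpy import absolute
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.tree import DecisionTreeRegressor
# create the two-output dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define the model
model = DecisionTreeRegressor()
# define the evaluation procedure: 10 folds, 3 repeats
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model (sklearn reports MAE as a negative score)
n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force the scores to be positive and summarize performance
n_scores = absolute(n_scores)
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))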

Running the example evaluates the performance of the decision tree model for multioutput regression on the test problem. The mean and standard deviation of the MAE are reported, calculated across all folds and all repeats.

Importantly, error is reported across both output variables together, rather than as separate error scores for each output variable.

Wrapper Multioutput Regression Algorithms

Not all regression algorithms support multioutput regression.

One example is the support vector machine; when adapted for regression, it is referred to as support vector regression, or SVR.

This algorithm does not support multiple outputs for a regression problem and will raise an error. We can demonstrate this with an example, listed below.
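
A minimal sketch that reproduces the failure with LinearSVR, which expects a one-dimensional target:

# demonstrate that a single-output regressor fails on multioutput data
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR
# create the two-output dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define the model
model = LinearSVR()
# this fit raises a ValueError because y has two columns
model.fit(X, y)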

Running the example reports an error message indicating that the model does not support multioutput regression.

There are two workarounds that we can adopt in order to use an algorithm like SVR for multioutput regression.

They are to create a separate model for each output, and to create a linear sequence of models, one for each output, where the output of each model is dependent upon the outputs of the previous models.

Thankfully, the scikit-learn library supports both of these cases. Let’s take a closer look at each.

Separate Model for Each Output (MultiOutputRegressor)

We can create a separate model for each output of the problem.

This assumes that the outputs are independent of each other, which might not be a correct assumption. Nevertheless, this approach can provide surprisingly effective predictions on a range of problems and may be worth trying, at least as a performance baseline.

You never know. The outputs for your problem may, in fact, be mostly independent, if not completely independent, and this strategy can help you find out.

This approach is supported by the MultiOutputRegressor class that takes a regression model as an argument. It will then create one instance of the provided model for each output in the problem.

The example below demonstrates using the MultiOutputRegressor class with linear SVR for the test problem.
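
A minimal sketch:

# fit one LinearSVR per output with the MultiOutputRegressor wrapper
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR
# create the two-output dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define the base model and wrap it
model = LinearSVR()
wrapper = MultiOutputRegressor(model)
# fit the wrapper on the whole dataset
wrapper.fit(X, y)
# make a prediction for one row of data; two values are returned
yhat = wrapper.predict(X[:1])
print(yhat)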

Running the example fits a separate LinearSVR for each of the outputs in the problem using the MultiOutputRegressor wrapper class.

This wrapper can then be used directly to make a prediction on new data, confirming that multiple outputs are supported.

Chained Models for Each Output (RegressorChain)

Another approach to using single-output regression models for multioutput regression is to create a linear sequence of models.

The first model in the sequence uses the input and predicts one output; the second model uses the input and the output from the first model to make a prediction; the third model uses the input and output from the first two models to make a prediction, and so on.

This can be achieved using the RegressorChain class in the scikit-learn library.

The order of the models may be based on the order of the outputs in the dataset (the default) or specified via the “order” argument. For example, order=[0,1] would first predict the 0th output and then the 1st output, whereas order=[1,0] would first predict the last output variable and then the first output variable in our test problem.

The example below uses the RegressorChain with the default output order to fit a linear SVR on the multioutput regression test problem.
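
A minimal sketch, leaving the “order” argument at its default:

# chain LinearSVR models with the RegressorChain wrapper
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR
# create the two-output dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define the base model and wrap it in a chain
model = LinearSVR()
wrapper = RegressorChain(model)
# fit the chain on the whole dataset
wrapper.fit(X, y)
# make a prediction for one row of data; two values are returned
yhat = wrapper.predict(X[:1])
print(yhat)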

Running the example first fits a linear SVR to predict the first output variable, then a second linear SVR to predict the second output variable using the input and the output of the first model. These models are fit on the entire dataset.

The fit chain of models is then used directly to make a prediction on a new test instance, predicting the required two output variables.

Summary

In this tutorial, you discovered how to develop machine learning models for multioutput regression.

Specifically, you learned:

  • The problem of multioutput regression in machine learning.
  • How to develop machine learning models that inherently support multiple-output regression.
  • How to develop wrapper models that allow algorithms that do not inherently support multiple outputs to be used for multiple-output regression.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

67 Responses to How to Develop Multi-Output Regression Models with Python

  1. Patrick March 27, 2020 at 2:29 pm #

    Thank you for this post. I was not aware that scikit-learn had those wrapper classes. That is very handy.

    Thanks for showing how to use them in a very clear, straightforward way.

  2. Salvatore Parisi March 28, 2020 at 2:46 am #

    Seriously interested.

  3. Asmaa March 30, 2020 at 10:07 pm #

    It is just amazing. Thank you

  4. Yuchuan March 31, 2020 at 2:21 pm #

    Hi Dr. Jason,

    Thanks for the meaningful tutorial article, it helps me a lot.

    I just tested and found all of the code works well in sklearn 0.20, so we don’t necessarily need to update to 0.22 or higher.

    Thanks again for the excellent article, looking forward to the next one.

  5. Gary March 31, 2020 at 3:21 pm #

    Hi Dr. Brownlee,

    Thanks for the excellent work!

    I have a question: can the wrapped model (either using MultiOutputRegressor or RegressorChain) work with the cross_val_score function? I tested the following code and found the score is 0.0; is there anything wrong with that?

    from numpy import absolute
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_regression
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import LinearSVR
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedKFold
    # create datasets
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
    # define model
    model = LinearSVR()
    wrapper = MultiOutputRegressor(model)
    # evaluate model
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    n_scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')
    # summarize performance
    n_scores = absolute(n_scores)
    print('Result: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

    Thank you in advance!

  6. Staffan Falk April 2, 2020 at 8:32 am #

    Thanks for a really good and pedagogical tutorial. Fits my need exactly! 🙂

  7. Francisco April 3, 2020 at 3:04 am #

    Hi Jason!

    Excellent tutorial. Is it possible to apply a Transformed Target Regressor with these multi-output regression models?

    Thanks in advance!

    • Jason Brownlee April 3, 2020 at 6:57 am #

      Probably. You might have to experiment to confirm it works as expected.

  8. Sudipta Chowdhury April 18, 2020 at 2:35 am #

    Hi Jason,

    Thanks a lot for this tutorial. It suits my need perfectly.
    I have a question regarding the correlation among different Y variables. Do the KNN or Random Forest models automatically consider the correlation among different Y while predicting? If they do that, is there any documentation on how it is done? I tried to look for it online, but found nothing. Could you please guide me to some resources? Thanks

    • Jason Brownlee April 18, 2020 at 6:06 am #

      You’re welcome.

      No, but some algorithms are not bothered by collinear inputs (ensembles of trees, I think), and some are (linear models).

  9. Bahar April 19, 2020 at 1:46 am #

    Thanks for this very interesting tutorial. I have a question for cases where we have, for example, about 500 inputs and 200 outputs. Does this multi-output regression approach work with such problems? (In fact, I have about 20 features and 500 input points that are extracted from scan-1 and 200 output points from scan-2.)

    • Jason Brownlee April 19, 2020 at 5:59 am #

      You’re welcome.

      Perhaps develop a quick prototype and see if the methods are appropriate and effective?

      • Bahar April 20, 2020 at 8:41 pm #

        Thanks for your reply :). The problem is my current data is not completely ready and I have to wait, so I was also thinking about deep-learning methods such as convolution, but I have to do some research on it. Do you have any suggestions for such types of problems? It is something like image processing because the input data and the output data are points that are extracted from scans (images).

        • Jason Brownlee April 21, 2020 at 5:54 am #

          If the input data are images, then CNNs would be a good method to explore.

          • Bahar April 22, 2020 at 6:33 pm #

            Actually, the input data are digits that are extracted from images. But as the number of my samples is limited (about 30 samples), I cannot use CNNs. And I decided to convert one image into about 500 points/digits. Can you please let me know your opinion?

          • Jason Brownlee April 23, 2020 at 6:00 am #

            Test a suite of data preparations and models and discover what works best for your dataset.

  10. pratyush April 21, 2020 at 5:17 am #

    This is great. But I have a question on the metrics of these. Based on your test, which one would you say provided the best prediction?

    • Jason Brownlee April 21, 2020 at 6:06 am #

      It depends on your dataset.

      You must use controlled experiments to discover what works best on your project.

  11. Sanjan April 24, 2020 at 2:22 am #

    Why not simply use a neural network with multiple neurons in output layer or build a multi-target decision tree and let the model figure out patterns instead of us (the modelers) having to decide whether or not the outputs are correlated/ordered? If there’s a strong rationale for ordered outputs, then RegressorChain might be a good option.

    PS: I believe by default sklearn’s decision trees are amenable to this as per this post https://stackoverflow.com/questions/46062774/does-scikit-learns-decisiontreeregressor-do-true-multi-output-regression

    • Jason Brownlee April 24, 2020 at 5:50 am #

      You can use a neural net, but we cannot know which model will be best for a given dataset, so we must test many different methods and select the simplest well-performing model.

  12. César Magno April 27, 2020 at 1:46 pm #

    Jason, your tutorials are simply amazing. I love them!
    Congrats on this awesome material.

  13. Dennis May 3, 2020 at 2:06 am #

    It seems the wrapper method doesn’t work on some ensembles; I tried this:

    model = GradientBoostingRegressor(ExtraTreesRegressor())
    wrapper = MultiOutputRegressor(model)
    wrapper.fit(x,y)

    it said “TypeError: unsupported format string passed to ExtraTreesRegressor.__format__”

    But it works on AdaBoostRegressor.

    • Dennis May 3, 2020 at 2:08 am #

      Anyway, this is a good tutorial, and it gave many ideas to my final year project, thank you Jason!

    • Jason Brownlee May 3, 2020 at 6:14 am #

      Interesting.

      I believe ensembles of trees support multi-output regression directly – e.g. no wrapper required.

  14. Taran Rishit May 9, 2020 at 6:43 am #

    Thank you very much!! I was looking for something like this and was unable to find ways for multi-output predictions other than the four you mentioned; now I did.

  15. Taran Rishit May 9, 2020 at 6:55 am #

    Thank you

    Also i wanted to ask a question:

    I have coordinates in a table (in time series)
    like
    a ,b,c->d
    b,c,d->e
    and so on
    where each point is in the form of [lat, long]
    So this is where I wanted to use multiple-output regression.
    And I want to calculate the error as well; which error metric would you suggest is best?
    I wanted to use MAPE, but it has the form diff/actual… where actual is of the form [lat, long],
    so I can't divide by a list; it needs a single value. Any suggestions?

    • Jason Brownlee May 9, 2020 at 1:45 pm #

      You’re welcome.

      I recommend choosing a metric that best captures the goals of the project for you and project stakeholders. Also, perhaps check the literature to see what others have done on the same type of problem before you.

      If you are unsure, MAE and RMSE are a great place to start.

  16. AD May 25, 2020 at 2:49 am #

    Very helpful!

    When using chained models with the wrapper class, can we use separate models and then wrap them all to fit?

    • Jason Brownlee May 25, 2020 at 5:54 am #

      What do you mean exactly? Perhaps you can elaborate.

  17. Danny Dunne May 25, 2020 at 10:37 pm #

    Thanks for this tutorial which I used in conjunction with the eBook.

    Can I ask – I am working on a multi-input/multioutput problem, with 3 sets of input (each set with 7 different variables with >10000 values). I have one output set (8 variables) for the combined 3 input sets.

    What’s the best way to import/organise this data (vectors/arrays?) at the beginning of the process to make it easy to use with scripts?

    • Jason Brownlee May 26, 2020 at 6:21 am #

      Well, generally sklearn expects input as a vector of rows and columns. So working with data in that form might be easier for you.

  18. Jack HU May 26, 2020 at 7:36 am #

    Really good post. It really helps me a lot. I have two questions:
    For linear regression for multioutput in scikit-learn, when calling fit(X, y), it actually fits a separate model for each output. Am I right?

    Second question: I am using kernel ridge regression (RBF kernel). I use cross-validation to choose 2 hyperparameters – alpha, the parameter for L2 regularization, and gamma, the parameter for the RBF kernel. When I call

    kr = GridSearchCV(KernelRidge(kernel='rbf', gamma=0.1),
                      param_grid={"alpha": np.logspace(-2, 0, 10),
                                  "gamma": np.logspace(-2, 0, 10)},
                      scoring='neg_mean_squared_error')

    It fits a separate model for each output, but do the separate models share the same 'alpha' and 'gamma' in each CV parameter search?

    • Jason Brownlee May 26, 2020 at 1:19 pm #

      Thanks!

      Yes, MultiOutputRegressor fits a separate model for each target as described in the tutorial.

      Correct, they will both use the same hyperparameters. If this is not desirable, you can fit separate models manually.

      • JACK HU May 26, 2020 at 8:52 pm #

        Thanks for your reply. Also, I found that the Gaussian process regression model in scikit-learn (GaussianProcessRegressor) supports multioutput. Does the model actually treat each output dimension as a single Gaussian process regression problem?

  19. Bahar June 16, 2020 at 8:29 pm #

    Thanks for such a useful tutorial. I am going to use GridSearchCV in order to improve the results, but I get an error:

    from sklearn.model_selection import GridSearchCV
    modelchain4grid = LinearSVR(max_iter=10000)
    wrapper4grid = RegressorChain(modelchain4grid)
    tuned_parameters = [{'C': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]}]
    grid = GridSearchCV(wrapper4grid, tuned_parameters, scoring='neg_mean_squared_error')
    grid_result = grid.fit(X_train, Y_train)
    print(grid_result)

    I get this error:
    ValueError: Invalid parameter C for estimator RegressorChain(base_estimator=LinearSVR(

    Could you please let me know where my issue is?

    Thanks in advance

    • Jason Brownlee June 17, 2020 at 6:23 am #

      I believe you may need to specify the C parameter as sub-parameter of the regressor chain model.

      I don’t know the syntax offhand, perhaps RegressorChain__C, but perhaps check the scikit-learn documentation for grid searching composite models.

      • Bahar June 23, 2020 at 7:33 pm #

        Thanks.
        That is "base_estimator__C", and I then used the following code:

        wrapper4grid = RegressorChain(modelchain4grid)
        print(wrapper4grid.get_params())
        svr = GridSearchCV(wrapper4grid, cv=5, param_grid={"base_estimator__C": [1e0, 1e1, 1e2, 1e3]}, scoring='accuracy')
        grid_result = svr.fit(X_train, Y_train)

        But it does not support multioutput and gave me the following error:

        continuous-multioutput is not supported

        Could you please let me know if there is a good way to tune the parameters in multioutput regression?

        Thanks in advace

        • Bahar June 23, 2020 at 9:44 pm #

          Just a correction in scoring (scoring='neg_mean_squared_error') fixed it.

          The complete working code for multi-input-output is:

          modelchain4grid = LinearSVR(max_iter=10000)
          wrapper4grid = RegressorChain(modelchain4grid)
          #print(wrapper4grid.get_params())
          svr = GridSearchCV(wrapper4grid, cv=5, param_grid={"base_estimator__C": [1e0, 1e1, 1e2, 1e3]}, scoring='neg_mean_squared_error')
          grid_result = svr.fit(X_train, Y_train)

          a = grid_result.best_score_
          b = grid_result.best_params_
          c = grid_result.cv_results_['mean_test_score']
          d = grid_result.best_estimator_
          print(a)
          print(b)
          print(c)
          print(d)
          means = grid_result.cv_results_['mean_test_score']
          stds = grid_result.cv_results_['std_test_score']
          params = grid_result.cv_results_['params']

          for mean, stdev, param in zip(means, stds, params):
              print("%f" % mean)
              print("%f" % stdev)
              print("%r" % param)

        • Jason Brownlee June 24, 2020 at 6:30 am #

          You may have to grid search manually, e.g. with some for loops.

          • Bahar July 10, 2020 at 9:35 am #

            Thanks Jason,

            I have further worked on RegressionChain and tried to setup a pipeline to fit and predict new data as follows:

            model = Pipeline([('sc', StandardScaler()), ('pca', PCA(n_components=10)), ('SVRchain', RegressorChain(LinearSVR(max_iter=1000)))])
            model.fit(X_train, Y_train)
            print(model.fit)
            predictions = model.predict(X_test)
            print(predictions)

            But the predictions are different from when I do not use this Pipeline and instead do the StandardScaler and PCA first and then fit the model (not using a Pipeline).

            Is there something that I have missed, or can we not set up such a Pipeline for RegressorChain(LinearSVR)?
            Is the Pipeline itself a kind of estimator that may conflict with the base estimator of RegressorChain, which is LinearSVR?

            Thanks in advance

          • Jason Brownlee July 10, 2020 at 1:47 pm #

            Good question.

            I would recommend using a pipeline for the estimator within the regressor chain.

          • Bahar July 10, 2020 at 9:16 pm #

            Thanks, but I did not get you. Can you show a code example, please?

          • Bahar July 10, 2020 at 9:30 pm #

            Do you mean something like this:

            # create pipeline
            estimators = []
            estimators.append(('standardize', StandardScaler()))
            estimators.append(('pca', PCA(n_components=10)))
            estimators.append(('LSVR', LinearSVR(max_iter=1000)))
            model = RegressorChain(Pipeline(estimators))
            model.fit(X_train, Y_train)
            test = model.predict(X_test)
            print(test)

            If you meant something like the above code, then it also does not give the right answer.
            Could you please let me know your feedback?

          • Jason Brownlee July 11, 2020 at 6:11 am #

            Yes, that is what I was thinking, does it work as expected?

          • Bahar July 10, 2020 at 9:53 pm #

            I hope that you meant the following way (I tried it and it gives the expected result):

            pipe4estimator = Pipeline([('sc', StandardScaler()), ('pca', PCA(n_components=10)), ('SCRChain', RegressorChain(LinearSVR(max_iter=1000)))])
            sc_y = StandardScaler()
            y_train_std = sc_y.fit_transform(Y_train)
            pipe4estimator.fit(X_train, y_train_std)
            y_train_pred = sc_y.inverse_transform(pipe4estimator.predict(X_train))
            y_test_pred = sc_y.inverse_transform(pipe4estimator.predict(X_test))
            print(y_test_pred)

            If this is what you meant, then we can conclude that setting up a pipeline for RegressorChain(LinearSVR()) needs an extra StandardScaler outside of the pipeline, applied to the targets for training and inverted after prediction. Right?

          • Jason Brownlee July 11, 2020 at 6:12 am #

            Well done!

            No, the pipeline knows how to prepare data fed into it after it is fit on training data.

  20. Chandra June 24, 2020 at 3:54 am #

    Really great post, Jason! I have read many of your articles and appreciate your to-the-point discussions. I have a question.

    Say, we are trying to learn a model for multi output regression. And, I would like to add ‘relatively’ more importance to learning one of the dimensions of the output vector.

    Do you have any suggestions for this situation? As far as I can think of, we can try to add more weight to that dimension while calculating the loss. But, I will have to add another hyperparameter that will control the weight for that dimension.

    Is there any other way you can think of?

    • Jason Brownlee June 24, 2020 at 6:39 am #

      Hmmm, you could design a model where the loss is calculated across the output vector and give more penalty to one output than the others. It might be easier with a neural net in that sense.

      Or train separate models and use different loss functions for each, with a strong penalty for the one that is more important.

  21. phillip June 25, 2020 at 1:01 am #

    Thank you for this article.

    I have a question about the attribute in the Condensed Nearest Neighbor method. Are the sample_indices the indices that are filtered out, or the indices that should stay?

  22. Nico July 23, 2020 at 8:13 pm #

    Hi Jason.
    Is it possible to perform variable selection in RegressorChain and see the important variables for each step? For example, with Lasso?

  23. Paul Christian August 1, 2020 at 2:27 am #

    Jason to the rescue. Again! Thank you.

  24. THOUMMALA, NALINH August 2, 2020 at 10:38 pm #

    Hi… I would like to know if we can also use an LSTM model to predict multiple outputs.
