Last Updated on August 28, 2020

Multi-output regression involves predicting two or more numerical variables.

Unlike normal regression where a single value is predicted for each sample, multi-output regression requires specialized machine learning algorithms that support outputting multiple variables for each prediction.

Deep learning neural networks are an example of an algorithm that natively supports multi-output regression problems. Neural network models for multi-output regression tasks can be easily defined and evaluated using the Keras deep learning library.

In this tutorial, you will discover how to develop deep learning models for multi-output regression.

After completing this tutorial, you will know:

- Multi-output regression is a predictive modeling task that involves two or more numerical output variables.
- Neural network models can be configured for multi-output regression tasks.
- How to evaluate a neural network for multi-output regression and make a prediction for new data.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- Multi-Output Regression
- Neural Networks for Multi-Outputs
- Neural Network for Multi-Output Regression

## Multi-Output Regression

Regression is a predictive modeling task that involves predicting a numerical output given some input.

It is different from classification tasks that involve predicting a class label.

Typically, a regression task involves predicting a single numeric value. However, some tasks require predicting more than one numeric value. These tasks are referred to as **multiple-output regression**, or multi-output regression for short.

In multi-output regression, two or more outputs are required for each input sample, and the outputs are required simultaneously. The assumption is that the outputs are a function of the inputs.

We can create a synthetic multi-output regression dataset using the make_regression() function in the scikit-learn library.

Our dataset will have 1,000 samples with 10 input features, five of which will be relevant to the output and five of which will be redundant. The dataset will have three numeric outputs for each sample.

The complete example of creating and summarizing the synthetic multi-output regression dataset is listed below.

```python
# example of a multi-output regression problem
from sklearn.datasets import make_regression
# create dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
# summarize shape
print(X.shape, y.shape)
```

Running the example creates the dataset and summarizes the shape of the input and output elements.

We can see that, as expected, there are 1,000 samples, each with 10 input features and three output features.

```
(1000, 10) (1000, 3)
```

Next, let’s look at how we can develop neural network models for multiple-output regression tasks.

## Neural Networks for Multi-Outputs

Many machine learning algorithms support multi-output regression natively.

Popular examples are decision trees and ensembles of decision trees. A limitation of decision trees for multi-output regression is that the relationships between inputs and outputs can be blocky or highly structured based on the training data.

Neural network models also support multi-output regression and have the benefit of learning a continuous function that can model a more graceful relationship between changes in input and output.

Neural networks support multi-output regression directly: the number of nodes in the output layer is set to the number of target variables in the problem. For example, a task that has three output variables requires an output layer with three nodes, each with the linear (default) activation function.
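As a rough illustration of that idea, independent of any deep learning library, the NumPy sketch below runs a forward pass through one ReLU hidden layer and a linear output layer. The weights are random placeholders rather than trained values, and the batch of five rows is an arbitrary assumption; the point is only that the number of output nodes determines how many values are predicted per sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 10, 20, 3

# Random weights stand in for trained parameters (illustrative assumption).
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(X):
    """One hidden ReLU layer followed by a linear output layer."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2                # linear activation: no squashing

X = rng.normal(size=(5, n_inputs))  # five made-up input rows
yhat = forward(X)
print(yhat.shape)  # one row per sample, one column per target
```

Each row of `yhat` holds three numbers, one per output node, which is the shape a three-target regression task needs.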

We can demonstrate this using the Keras deep learning library.

We will define a multilayer perceptron (MLP) model for the multi-output regression task defined in the previous section.

Each sample has 10 inputs and three outputs. Therefore, the network requires an input layer that expects 10 inputs, specified via the “*input_dim*” argument in the first hidden layer, and three nodes in the output layer.

We will use the popular ReLU activation function in the hidden layer. The hidden layer has 20 nodes, which were chosen after some trial and error. We will fit the model using mean absolute error (MAE) loss and the Adam version of stochastic gradient descent.

The definition of the network for the multi-output regression task is listed below.

```python
...
# define the model
model = Sequential()
model.add(Dense(20, input_dim=10, kernel_initializer='he_uniform', activation='relu'))
model.add(Dense(3))
model.compile(loss='mae', optimizer='adam')
```

You may want to adapt this model for your own multi-output regression task. To make that easier, we can create a function that defines and returns the model, with the number of input and output variables provided as arguments.

```python
# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs))
    model.compile(loss='mae', optimizer='adam')
    return model
```

Now that we are familiar with how to define an MLP for multi-output regression, let’s explore how this model can be evaluated.

## Neural Network for Multi-Output Regression

If the dataset is small, it is good practice to evaluate neural network models repeatedly on the same dataset and report the mean performance across the repeats.

This is because of the stochastic nature of the learning algorithm.

Additionally, it is good practice to use k-fold cross-validation instead of train/test splits of a dataset to get an unbiased estimate of model performance when making predictions on new data. Again, this applies only if there is not too much data and the process can be completed in a reasonable time.

Taking this into account, we will evaluate the MLP model on the multi-output regression task using repeated k-fold cross-validation with 10 folds and three repeats.

For each fold, the model is defined, fit, and evaluated. The scores are collected and can be summarized by reporting the mean and standard deviation.
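To get a feel for how much work this procedure implies, the short sketch below counts the train/test splits that scikit-learn's `RepeatedKFold` produces for 1,000 rows with 10 folds and three repeats. The array of zeros is only a placeholder for the real inputs, since the splitter looks at the number of rows and nothing else.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Placeholder inputs: only the row count matters to the splitter.
X = np.zeros((1000, 10))

cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
split_sizes = [(len(train_ix), len(test_ix)) for train_ix, test_ix in cv.split(X)]

print(len(split_sizes))  # number of models that will be fit and evaluated
print(split_sizes[0])    # (training rows, test rows) in one split
```

With these settings a model is fit 30 times, each time on 900 rows and evaluated on the remaining 100, which is why the approach is recommended only when fitting is not too expensive.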

The *evaluate_model()* function below takes the dataset, evaluates the model, and returns a list of evaluation scores, in this case, MAE scores.

```python
# evaluate a model using repeated k-fold cross-validation
def evaluate_model(X, y):
    results = list()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # define evaluation procedure
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    # enumerate folds
    for train_ix, test_ix in cv.split(X):
        # prepare data
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        # define model
        model = get_model(n_inputs, n_outputs)
        # fit model
        model.fit(X_train, y_train, verbose=0, epochs=100)
        # evaluate model on test set
        mae = model.evaluate(X_test, y_test, verbose=0)
        # store result
        print('>%.3f' % mae)
        results.append(mae)
    return results
```

We can then load our dataset and evaluate the model and report the mean performance.

Tying this together, the complete example is listed below.

```python
# mlp for multi-output regression
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import RepeatedKFold
from keras.models import Sequential
from keras.layers import Dense

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
    return X, y

# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs))
    model.compile(loss='mae', optimizer='adam')
    return model

# evaluate a model using repeated k-fold cross-validation
def evaluate_model(X, y):
    results = list()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # define evaluation procedure
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    # enumerate folds
    for train_ix, test_ix in cv.split(X):
        # prepare data
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        # define model
        model = get_model(n_inputs, n_outputs)
        # fit model
        model.fit(X_train, y_train, verbose=0, epochs=100)
        # evaluate model on test set
        mae = model.evaluate(X_test, y_test, verbose=0)
        # store result
        print('>%.3f' % mae)
        results.append(mae)
    return results

# load dataset
X, y = get_dataset()
# evaluate model
results = evaluate_model(X, y)
# summarize performance
print('MAE: %.3f (%.3f)' % (mean(results), std(results)))
```

Running the example reports the MAE for each fold and each repeat, to give an idea of the evaluation progress.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

At the end, the mean and standard deviation of the MAE are reported. In this case, the model is shown to achieve an MAE of about 8.184.

You can use this code as a template for evaluating MLP models on your own multi-output regression tasks. The number of nodes and layers in the model can easily be adapted and tailored to the complexity of your dataset.

```
...
>8.054
>7.562
>9.026
>8.541
>6.744
MAE: 8.184 (1.032)
```

Once a model configuration is chosen, we can use it to fit a final model on all available data and make a prediction for new data.

The example below demonstrates this by first fitting the MLP model on the entire multi-output regression dataset, then calling the *predict()* function on the fitted model in order to make a prediction for a new row of data.

```python
# use mlp for prediction on multi-output regression
from numpy import asarray
from sklearn.datasets import make_regression
from keras.models import Sequential
from keras.layers import Dense

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
    return X, y

# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, kernel_initializer='he_uniform'))
    model.compile(loss='mae', optimizer='adam')
    return model

# load dataset
X, y = get_dataset()
n_inputs, n_outputs = X.shape[1], y.shape[1]
# get model
model = get_model(n_inputs, n_outputs)
# fit the model on all data
model.fit(X, y, verbose=0, epochs=100)
# make a prediction for new data
row = [-0.99859353,2.19284309,-0.42632569,-0.21043258,-1.13655612,-0.55671602,-0.63169045,-0.87625098,-0.99445578,-0.3677487]
newX = asarray([row])
yhat = model.predict(newX)
print('Predicted: %s' % yhat[0])
```

Running the example fits the model and makes a prediction for a new row.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

As expected, the prediction contains three output variables required for the multi-output regression task.

```
Predicted: [-152.22713 -78.04891 -91.97194]
```

## Summary

In this tutorial, you discovered how to develop deep learning models for multi-output regression.

Specifically, you learned:

- Multi-output regression is a predictive modeling task that involves two or more numerical output variables.
- Neural network models can be configured for multi-output regression tasks.
- How to evaluate a neural network for multi-output regression and make a prediction for new data.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

Running the code produces the following error:

File “C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py”, line 703, in is_tensor

return isinstance(x, tf_ops._TensorLike) or tf_ops.is_dense_tensor_like(x)

AttributeError: module ‘tensorflow.python.framework.ops’ has no attribute ‘_TensorLike’

How can I fix this?

Sorry to hear that, perhaps confirm that your versions of Keras and TensorFlow are up to date:

https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/

The tips here may also help:

https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me

Hi,

Running this example I get the following error:

AttributeError: module ‘tensorflow.python.framework.ops’ has no attribute ‘_TensorLike’

A solution?

What version of Keras and TensorFlow are you using?

You must use the following versions or higher:

Hi Jason,

could you tell me why a model is compiled for each cross validation fold?

Each fold requires training a new model from scratch in order to establish an unbiased estimate of model performance when making predictions on out-of-sample instances.

If you are new to k-fold cross-validation, you can get started here:

https://machinelearningmastery.com/k-fold-cross-validation/

I thought a model had to be instantiated once and then was passed to the cross validation loop. I missed something I guess…

In the example below, taken from one of your excellent articles (Add Binary Flags for Missing Values), I understand that the model is instantiated once and that this instance of the model is evaluated on each cross-validation training fold:

[ step 1 ] model = RandomForest()

[ step 2 ] cv = RepeatedStratifiedKFold( n_splits = 10 , n_repeats = 3 )

[ step 3 ] scores = cross_val_score( model , X , y , scoring = 'accuracy' , cv = cv , n_jobs = -1 )

Here it seems to me that for each training fold a model is instantiated and then evaluated on the corresponding test fold.

The two approaches seem different… or I missed something.

I tested passing the model to the evaluation function :

```python
def evaluate_model(model, X, y):
    results = list()
    n_inputs = X.shape[1]
    n_outputs = y.shape[1]
    cv = RepeatedKFold(n_splits=10, n_repeats=1, random_state=999)
    for train_ix, test_ix in cv.split(X):
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        model.fit(X_train, y_train, verbose=0, epochs=100)
        mae = model.evaluate(X_test, y_test, verbose=0)
        results.append(mae)
        print(f'mae : {mae:.3f}')
    return results
```

mae keeps decreasing… I do not understand why…

It is invalid as you continue to train the same model each loop.

The model must be re-defined and re-fit each cross-validation loop otherwise the evaluation is optimistic.
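To see why reusing one fitted model across folds gives an optimistic score, it can help to track which rows the model has already trained on. The sketch below uses only the standard library and a made-up split of 20 rows into 5 folds (both numbers are illustrative assumptions). It relies on the fact that a Keras model keeps its weights between calls to `fit()`, so training accumulates when the same model object is passed into every fold.

```python
# Illustrative sizes only (assumptions): 20 rows split into 5 folds.
n_samples, n_folds = 20, 5
indices = list(range(n_samples))
folds = [indices[i::n_folds] for i in range(n_folds)]

seen = set()  # rows the reused model has already been trained on
leaked = []   # fraction of each test fold that was seen in an earlier fit
for k, test_ix in enumerate(folds):
    train_ix = [i for j, fold in enumerate(folds) if j != k for i in fold]
    leaked.append(len(seen & set(test_ix)) / len(test_ix))
    # The reused model keeps its weights, so everything it trained on stays "seen".
    seen.update(train_ix)

print(leaked)
```

From the second fold onward, every test row has already been used for training in an earlier iteration, so the reported error keeps shrinking without the model actually generalizing better.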

It does the same thing.

In that case, the model is created and fit anew for each cross-validation fold. You just don’t see it as it happens internally to the function.

OK. I thought the model was created once and then fitted on each cross validation fold.

I have been doing a little experiment.

cv = RepeatedStratifiedKFold( n_splits = 10 , n_repeats = 1 , random_state = 777 )

and then [ A ] :

```python
scores_A = []
for train_ix, test_ix in cv.split(X, y):
    X_train, X_test = X[train_ix, :], X[test_ix, :]
    y_train, y_test = y[train_ix], y[test_ix]
    model = AdaBoostClassifier()
    model.fit(X_train, y_train)
    y_test_pred = model.predict(X_test)
    score = roc_auc_score(y_test, y_test_pred)
    print(f'score : {score}')
    scores_A.append(score)
```

After that [ B ] :

```python
scores_B = []
model = AdaBoostClassifier()
for train_ix, test_ix in cv.split(X, y):
    X_train, X_test = X[train_ix, :], X[test_ix, :]
    y_train, y_test = y[train_ix], y[test_ix]
    #model = AdaBoostClassifier()
    model.fit(X_train, y_train)
    y_test_pred = model.predict(X_test)
    score = roc_auc_score(y_test, y_test_pred)
    print(f'score : {score}')
    scores_B.append(score)
```

The difference between [ A ] and [ B ] is that with [ B ] the model is instantiated outside the loop once whereas with [ A ] it is created anew for each fold.

scores_A and _B are the same ( even when I change the random state parameter for RepeatedStratifiedKFold and/or the model used : RandomForest…).

This experiment makes me think the model has to be created once and be fitted on each cross validation fold.

For me, the sole difference between model_A = AdaBoostClassifier() and model_B = AdaBoostClassifier() is their memory location, which is why I do not understand why the get_model function is called in the evaluate_model function…

PS :

I was not able to replicate the scores given by

`cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)`

The cross-validation procedure requires the model be re-fit each evaluation.

Internally, cross_val_score will clone the model and refit it from scratch each iteration. This is functionally equivalent to re-defining and re-fitting the model each iteration.
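A small sketch may make the difference from the Keras case concrete. It uses scikit-learn's `clone()` helper and a `LinearRegression` on random data (the estimator and the data are arbitrary choices for illustration). It shows two things: a clone starts out unfitted, and calling `fit()` again on a scikit-learn estimator replaces what was learned before rather than continuing from it. A Keras model, by contrast, continues training from its current weights when `fit()` is called again.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_a, y_a = rng.normal(size=(50, 3)), rng.normal(size=50)
X_b, y_b = rng.normal(size=(50, 3)), rng.normal(size=50)

model = LinearRegression().fit(X_a, y_a)

# clone() copies the hyperparameters only; the learned coefficients are not carried over.
fresh = clone(model)
clone_is_unfitted = not hasattr(fresh, "coef_")

# Refitting the same scikit-learn object discards the earlier fit.
model.fit(X_b, y_b)
reference = LinearRegression().fit(X_b, y_b)
same_as_fresh_fit = np.allclose(model.coef_, reference.coef_)

print(clone_is_unfitted, same_as_fresh_fit)
```

This is consistent with the experiment above producing identical scores in [ A ] and [ B ] for scikit-learn models, while the same shortcut is invalid for a Keras model.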

Hi Jason,

Just out of curiosity, why would you build a multi-output model instead of multiple models of single outputs? When would it be better/worse?

Try both and see what works best for your specific dataset.

Hi Jason,

First of all, congratulations on your website and for your detailed explanations and examples. I really appreciate it!

I have a question for you, as I’ve already spent a considerable amount of time searching online, without significant success. How can we handle a situation where we have partial ground truth for our targets? (e.g. output vectors with some NaN values)

So far, I understand that if you provide a y_train with NaNs, then the loss function won’t behave properly. If we provided a loss function for each output variable (I think keras allows that with loss=['mse', 'mse', …]), then in theory, we could dismiss a group of NaNs within a batch of an output variable, by filtering them out (practically making both y_true and y_pred = 0).

The problem is that if we create a custom loss function and try to replace NaNs with 0s in both vectors, then keras throws an error (I think it relates to the use of non-tensorflow functions to filter nans, thus making keras unable to compute the derivative of the loss function)

Could you think of an elegant solution to this problem? Or maybe a completely different approach? (I assume that this way is sensible from the optimization perspective)

Thanks in advance for your time!

Thanks!

It is common to replace missing inputs with an imputed value using statistics or a model.

For some models, you can mark missing values with a special value and allow the model to treat missing as just another value.

Also, you can mark missing values for some models and configure them to ignore them, e.g. a masking layer in neural nets.

I hope that gives you some ideas.

Hi Jason,

Thanks for your reply.

In this case, it’s not missing inputs, and I can’t really use any statistics to learn the missing targets.

Regarding the special values, I haven’t seen anything related in Keras. Are you aware of such symbol?

Also, I checked the masking layers, but that’s again only for the inputs. This wouldn’t affect the y_true in the loss function.

In general, it seems like a trivial problem which doesn’t seem to have a trivial solution…

Maybe I need to check how people handle missing labels on multi-label classification.

Hopefully I’ll figure it out soon!

You can use zero padding in the output (yhat) then manually ignore zero-padded values when using the output or evaluating predictions.

This is very common in seq2seq problems in NLP.
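One way to "manually ignore" entries when evaluating predictions is to compute the error only where a target is actually present. The NumPy sketch below marks missing targets with NaN, which is an assumption made for this illustration (the reply above suggests zero padding instead), and averages the absolute error over the observed entries only. The function name `masked_mae` and the toy arrays are hypothetical, and this is an evaluation helper, not a Keras training loss.

```python
import numpy as np

def masked_mae(y_true, y_pred):
    """Mean absolute error over the entries where a target is present (not NaN)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)  # True where the ground truth is known
    return float(np.abs(y_true[mask] - y_pred[mask]).mean())

# Toy data: two samples, three targets, two targets missing.
y_true = [[1.0, np.nan, 3.0],
          [np.nan, 2.0, 1.0]]
y_pred = [[2.0, 5.0, 3.0],
          [7.0, 4.0, 1.0]]

score = masked_mae(y_true, y_pred)
print(score)  # absolute errors 1, 0, 2, 0 over the four observed targets
```

The predictions for the missing targets (5.0 and 7.0 here) have no effect on the score, because those positions are excluded before averaging.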

This is a great NN model for regression. I tried using mean squared log error for the loss, so I can interpret the result a bit better.

Now, when you run the evaluation, is there a difference if you set the number of repeats as desired? Or should I just keep calling the evaluation function with 2 or 3 repeats?

Just wondering about this, as I have never tried RepeatedKFold before.

Thanks for sharing J.

Perhaps you can keep the loss as mse, and use sq log error as a metric?

Repeated k-fold cross-validation is a good practice for stochastic models, if they are not too expensive to fit. 3 repeats is conservative, 10 or 30 is better.

More here:

https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/
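For readers unfamiliar with the squared log error mentioned above, here is a minimal NumPy sketch of the quantity, written from its usual definition rather than copied from any library. It assumes all targets and predictions are non-negative; the synthetic dataset in this tutorial contains negative targets, so it would need to be shifted or rescaled before a log-based metric makes sense. The function name and the toy values are hypothetical.

```python
import numpy as np

def mean_squared_log_error(y_true, y_pred):
    """Mean of squared differences between log(1 + y_true) and log(1 + y_pred).

    Assumes non-negative inputs (an assumption of this sketch).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# Toy values chosen so that 1 + y is a power of ten.
y_true = [[0.0, 9.0, 99.0]]
y_pred = [[0.0, 99.0, 9.0]]
msle = mean_squared_log_error(y_true, y_pred)
print(msle)
```

Because the error is taken on a log scale, predicting 99 for a true value of 9 costs the same as predicting 9 for a true value of 99: the metric penalizes relative rather than absolute differences.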

Can you please explain how the mean absolute error in the Keras loss function is calculated on multi-output vectors? Is it the average of corresponding values in vectors averaged over dimensions of the output and samples?

I believe it is error averaged across variables and samples.
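The averaging described in that reply can be sketched in NumPy. This is an illustration of averaging absolute errors over both samples and output variables, using made-up numbers, not a reproduction of the Keras internals. It also shows that the overall value equals the mean of the per-output MAEs when every output has the same number of samples.

```python
import numpy as np

# Made-up targets and predictions: two samples, three outputs.
y_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
y_pred = np.array([[2.0, 2.0, 1.0],
                   [4.0, 8.0, 6.0]])

abs_err = np.abs(y_true - y_pred)

overall_mae = abs_err.mean()           # averaged over samples and outputs
per_output_mae = abs_err.mean(axis=0)  # one MAE per output variable

print(overall_mae)
print(per_output_mae)
```

Reporting `per_output_mae` alongside the single overall score can reveal when one target is predicted much worse than the others, which the combined number hides.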

Hello

I have a question

My data is not time dependent

Not image and video

Which Deep Learning model is more suitable for predicting my data?

Thanks for guiding me

An MLP.

Also, this will help:

https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/

Hi Jason,

Can you be my teacher

I’m happy to answer questions about machine learning.

Hi, thanks for the tutorial.

If the goal is simply prediction, what are the benefits of fitting a multi-output NN instead of multiple single-output NN? What are the gains in this case? Is there a paper or reference you can recommend?

The problem will require a single or multiple outputs. You must use the model that achieves the goals of the project.

Hi, thank you for this article!

I didn’t understand how, once you have gone through the cross-validation step, you choose one configuration over the others in order to make predictions on new data. I mean, once the whole loop is finished, what has to be done?

See this:

https://machinelearningmastery.com/make-predictions-scikit-learn/

Ok I think I’ve understood.

Based on what you have written in this article, I evaluate the model using k-fold cross-validation and, based on the score, I change the layers and their parameters in my model accordingly. Then I train my network on all the data already used during validation and finally I make the predictions.

To conclude, Validation is used just to measure the performance of the model I have built, is that right?

The train set is used to fit the model.

The test set is used to evaluate the model.

The validation set can be used to tune the model or stop training the model at the right time.

More here:

https://machinelearningmastery.com/difference-test-validation-datasets/

Wondering about this: With `model.compile(loss='mae', optimizer='adam')`, I basically instruct keras to minimize the combined loss of both output values together. This averages out the two errors stemming from the two separate target values, which could lead to a large positive error on target 1 which is compensated by a large negative error on target 2.

I imagine to avoid this, I would have to create 2 separate submodels that optimize each loss individually – probably using the functional API?

Yes.

Another approach is a model with two outputs and two losses (a so-called multi-output model):

https://machinelearningmastery.com/keras-functional-api-deep-learning/

Hi, I really appreciate your content!

I am starting with deep learning models and I have a project on mind.

What I want to do, giving certain input values (for example: 10 features), I want to predict a curve. I mean, I want to obtain a 2D curve (where Y axis will be force and X axis will be time). Can “Multi-output regression” be a solution? I will predict time steps (t0=5, t1=16, t2=26…).

What other solution can I use for this case?

thanks in advance

If you want a curve, then perhaps use curve fitting directly:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html

I tested this on my data set and found that it worked if iloc was used. ex:

X_train, X_test = X.iloc[train_ix], X.iloc[test_ix]

y_train, y_test = y.iloc[train_ix], y.iloc[test_ix]

Nice work!

Can I do this to forecast t+1, t+2 and so on?

Yes.

Hi Jason, how would it be for multi-output binary classification?

For example, forecast if t+1 is 0 or 1 and t+2 is 0 or 1 and t+3 is 0 or 1.

1. Output layer: `model.add(Dense(n_outputs, activation='sigmoid'))`?

2. Loss function: `'binary_crossentropy'`?

Thanks in advance.

Great tutorial!!!

Yes, I believe so. Try it and see.
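To illustrate what that configuration computes, the NumPy sketch below applies a sigmoid to each output independently and averages the binary cross-entropy over all outputs and samples. The pre-activation values and labels are made up, and the 0.5 threshold for turning probabilities into 0/1 predictions is a common convention assumed here rather than something stated in the thread.

```python
import numpy as np

def sigmoid(z):
    """Squash each output independently into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Binary cross-entropy averaged over all outputs and samples."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

# Made-up pre-activation outputs for two samples and three binary targets (t+1, t+2, t+3).
logits = np.array([[2.0, -1.0, 0.5],
                   [-3.0, 0.2, 1.5]])
y_true = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])

probs = sigmoid(logits)
labels = (probs >= 0.5).astype(int)  # assumed 0.5 decision threshold
loss = binary_crossentropy(y_true, probs)

print(labels)
print(loss)
```

Because each output has its own sigmoid, the three predictions are not forced to sum to one, which is what distinguishes this setup from a softmax over mutually exclusive classes.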

Hi Jason, first of all, thanks for your great content! I am also using MLP for doing some multi-output regression, but I found that when I tested the model, the output would always be the same regardless of the input (this also happened at training stage). I have normalized my input data and the dimension of my input and output are 450 and 120, respectively, also I used tanh activation function to bound my output within range [-1,1]. Do you have any suggestions on this? Thanks in advance 🙂

It may suggest that your model requires further tuning for your dataset.

The suggestions here will help:

https://machinelearningmastery.com/start-here/#better

Hey Jason, I really love your work here. I’m working on a dataset_1 where the output classes of dataset_1 have their own feature dataset, say dataset_2, that the model could also learn from. The challenge I’m facing is that dataset_1 and dataset_2 are totally different, so there’s no way I could merge them on some common features.

I’d like to know if there’s a way to train a model that would be able to learn on dataset_1 then subsequently learn on dataset_2?

I am considering using dimensionality reduction to reduce the features of the dataset_2 to a single value and then use this single value as an output to dataset_1 in a multi-output model. Do you think this is a good approach?

Thanks!

Try it and compare to other approaches.

Perhaps try to ensemble their predictions?