The Keras library provides a way to calculate and report on a suite of standard metrics when training deep learning models.

In addition to offering standard metrics for classification and regression problems, Keras also allows you to define and report on your own custom metrics when training deep learning models. This is particularly useful if you want to keep track of a performance measure that better captures the skill of your model during training.

In this tutorial, you will discover how to use the built-in metrics and how to define and use your own metrics when training deep learning models in Keras.

After completing this tutorial, you will know:

- How Keras metrics work and how you can use them when training your models.
- How to use regression and classification metrics in Keras with worked examples.
- How to define and use your own custom metric in Keras with a worked example.

Let’s get started.

## Tutorial Overview

This tutorial is divided into 4 parts; they are:

- Keras Metrics
- Keras Regression Metrics
- Keras Classification Metrics
- Custom Metrics in Keras

## Keras Metrics

Keras allows you to list the metrics to monitor during the training of your model.

You can do this by specifying the “*metrics*” argument and providing a list of function names (or function name aliases) to the *compile()* function on your model.

For example:

```python
model.compile(..., metrics=['mse'])
```

The specific metrics that you list can be the names of Keras functions (like *mean_squared_error*) or string aliases for those functions (like ‘*mse*’).

Metric values are recorded at the end of each epoch on the training dataset. If a validation dataset is also provided, then the metric recorded is also calculated for the validation dataset.

All metrics are reported in verbose output and in the history object returned from calling the *fit()* function. In both cases, the name of the metric function is used as the key for the metric values. In the case of metrics for the validation dataset, the “*val_*” prefix is added to the key.

Both loss functions and explicitly defined Keras metrics can be used as training metrics.
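As a rough illustration of the key naming described above, the sketch below uses a hand-written dictionary as a stand-in for `history.history` and separates the training series from the validation series by the “*val_*” prefix. The numbers are invented placeholders rather than real training output, and the helper name `split_history` is hypothetical.

```python
# A hand-written stand-in for history.history; the values are made-up placeholders.
example_history = {
    'loss': [0.30, 0.20, 0.10],
    'mean_squared_error': [0.30, 0.20, 0.10],
    'val_loss': [0.35, 0.25, 0.15],
    'val_mean_squared_error': [0.35, 0.25, 0.15],
}

def split_history(history_dict):
    """Separate training keys from validation keys using the 'val_' prefix."""
    train, val = {}, {}
    for key, values in history_dict.items():
        if key.startswith('val_'):
            # strip the prefix so both dictionaries share the same metric names
            val[key[len('val_'):]] = values
        else:
            train[key] = values
    return train, val

train_metrics, val_metrics = split_history(example_history)
for name in train_metrics:
    # report the value from the final epoch for each metric
    print(name, 'train:', train_metrics[name][-1], 'val:', val_metrics[name][-1])
```

With a real model, the same function could be applied to the dictionary returned in the history object, provided a validation dataset was passed to *fit()*.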

## Keras Regression Metrics

Below is a list of the metrics that you can use in Keras on regression problems.

- **Mean Squared Error**: mean_squared_error, MSE or mse
- **Mean Absolute Error**: mean_absolute_error, MAE, mae
- **Mean Absolute Percentage Error**: mean_absolute_percentage_error, MAPE, mape
- **Cosine Proximity**: cosine_proximity, cosine

The example below demonstrates these 4 built-in regression metrics on a simple contrived regression problem.

```python
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot
# prepare sequence
X = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
# create model
model = Sequential()
model.add(Dense(2, input_dim=1))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae', 'mape', 'cosine'])
# train model
history = model.fit(X, X, epochs=500, batch_size=len(X), verbose=2)
# plot metrics
pyplot.plot(history.history['mean_squared_error'])
pyplot.plot(history.history['mean_absolute_error'])
pyplot.plot(history.history['mean_absolute_percentage_error'])
pyplot.plot(history.history['cosine_proximity'])
pyplot.show()
```

Running the example prints the metric values at the end of each epoch.

```
...
Epoch 96/100
0s - loss: 1.0596e-04 - mean_squared_error: 1.0596e-04 - mean_absolute_error: 0.0088 - mean_absolute_percentage_error: 3.5611 - cosine_proximity: -1.0000e+00
Epoch 97/100
0s - loss: 1.0354e-04 - mean_squared_error: 1.0354e-04 - mean_absolute_error: 0.0087 - mean_absolute_percentage_error: 3.5178 - cosine_proximity: -1.0000e+00
Epoch 98/100
0s - loss: 1.0116e-04 - mean_squared_error: 1.0116e-04 - mean_absolute_error: 0.0086 - mean_absolute_percentage_error: 3.4738 - cosine_proximity: -1.0000e+00
Epoch 99/100
0s - loss: 9.8820e-05 - mean_squared_error: 9.8820e-05 - mean_absolute_error: 0.0085 - mean_absolute_percentage_error: 3.4294 - cosine_proximity: -1.0000e+00
Epoch 100/100
0s - loss: 9.6515e-05 - mean_squared_error: 9.6515e-05 - mean_absolute_error: 0.0084 - mean_absolute_percentage_error: 3.3847 - cosine_proximity: -1.0000e+00
```

A line plot of the 4 metrics over the training epochs is then created.

Note that the metrics were specified using string alias values [‘*mse*’, ‘*mae*’, ‘*mape*’, ‘*cosine*’] and were referenced as key values on the history object using their expanded function name.

We could also specify the metrics using their expanded name, as follows:

```python
model.compile(loss='mse', optimizer='adam', metrics=['mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'cosine_proximity'])
```

We can also specify the function names directly if they are imported into the script.

```python
from keras import metrics
model.compile(loss='mse', optimizer='adam', metrics=[metrics.mean_squared_error, metrics.mean_absolute_error, metrics.mean_absolute_percentage_error, metrics.cosine_proximity])
```

You can also use the loss functions as metrics.

For example, you could use the Mean Squared Logarithmic Error (*mean_squared_logarithmic_error*, *MSLE* or *msle*) loss function as a metric as follows:

```python
model.compile(loss='mse', optimizer='adam', metrics=['msle'])
```
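To make the four regression metrics listed in this section more concrete, here is a minimal sketch of what each one computes, written on plain Python lists rather than Keras tensors. The function bodies are illustrative re-implementations, not the Keras source. The negative sign on cosine proximity is an assumption that matches the -1.0 reported in the output above, where the prediction and the target point in the same direction. The `eps` guard in `mape` is likewise an assumption, added to avoid division by zero.

```python
import math

def mse(y_true, y_pred):
    # mean of squared differences
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # mean of absolute differences
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred, eps=1e-7):
    # mean absolute error relative to the true value, as a percentage;
    # eps is an assumed guard against division by zero
    ratios = [abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)

def cosine_proximity(y_true, y_pred):
    # negative cosine similarity, so that "more similar" means "smaller"
    dot = sum(t * p for t, p in zip(y_true, y_pred))
    norm_true = math.sqrt(sum(t * t for t in y_true))
    norm_pred = math.sqrt(sum(p * p for p in y_pred))
    return -dot / (norm_true * norm_pred)

y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 2.0, 5.0]
print('mse ', mse(y_true, y_pred))
print('mae ', mae(y_true, y_pred))
print('mape', mape(y_true, y_pred))
# vectors pointing in the same direction give a value close to -1
print('cos ', cosine_proximity([1.0, 2.0], [2.0, 4.0]))
```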

## Keras Classification Metrics

Below is a list of the metrics that you can use in Keras on classification problems.

- **Binary Accuracy**: binary_accuracy, acc
- **Categorical Accuracy**: categorical_accuracy, acc
- **Sparse Categorical Accuracy**: sparse_categorical_accuracy
- **Top k Categorical Accuracy**: top_k_categorical_accuracy (requires you specify a k parameter)
- **Sparse Top k Categorical Accuracy**: sparse_top_k_categorical_accuracy (requires you specify a k parameter)

Accuracy is special.

Regardless of whether your problem is a binary or multi-class classification problem, you can specify the ‘*acc*’ metric to report on accuracy.

Below is an example of a binary classification problem with the built-in accuracy metric demonstrated.

```python
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot
# prepare sequence
X = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
y = array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# create model
model = Sequential()
model.add(Dense(2, input_dim=1))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
# train model
history = model.fit(X, y, epochs=400, batch_size=len(X), verbose=2)
# plot metrics
pyplot.plot(history.history['acc'])
pyplot.show()
```

Running the example reports the accuracy at the end of each training epoch.

```
...
Epoch 396/400
0s - loss: 0.5934 - acc: 0.9000
Epoch 397/400
0s - loss: 0.5932 - acc: 0.9000
Epoch 398/400
0s - loss: 0.5930 - acc: 0.9000
Epoch 399/400
0s - loss: 0.5927 - acc: 0.9000
Epoch 400/400
0s - loss: 0.5925 - acc: 0.9000
```

A line plot of accuracy over epoch is created.
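For intuition about how the accuracy variants listed in this section differ, the sketch below re-implements three of them on plain Python lists. These are simplified illustrations rather than the Keras source: the 0.5 threshold for binary accuracy and the default `k=5` are assumptions, and the probabilities in the usage lines are made up.

```python
def _argmax(row):
    # index of the largest value in a list
    return max(range(len(row)), key=lambda i: row[i])

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # a prediction counts as class 1 when its probability exceeds the threshold
    correct = sum(1 for t, p in zip(y_true, y_prob) if int(p > threshold) == int(t))
    return correct / len(y_true)

def categorical_accuracy(y_true_onehot, y_prob):
    # correct when the most probable class matches the one-hot target
    correct = sum(1 for t, p in zip(y_true_onehot, y_prob) if _argmax(t) == _argmax(p))
    return correct / len(y_true_onehot)

def top_k_categorical_accuracy(y_true_onehot, y_prob, k=5):
    # correct when the target class is among the k most probable classes
    correct = 0
    for t, p in zip(y_true_onehot, y_prob):
        top_k = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        if _argmax(t) in top_k:
            correct += 1
    return correct / len(y_true_onehot)

# made-up targets and predicted probabilities
print(binary_accuracy([0, 0, 1, 1], [0.1, 0.6, 0.7, 0.4]))
targets = [[1, 0, 0], [0, 1, 0]]
probs = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
print(categorical_accuracy(targets, probs))
print(top_k_categorical_accuracy(targets, probs, k=2))
```

The second sample is counted as wrong by categorical accuracy but as right by top-2 accuracy, because the true class has the second-highest probability.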

## Custom Metrics in Keras

You can also define your own metrics and specify the function name in the list of functions for the “*metrics*” argument when calling the *compile()* function.

A metric I often like to keep track of is Root Mean Square Error, or RMSE.

You can get an idea of how to write a custom metric by examining the code for an existing metric.

For example, below is the code for the mean_squared_error loss function and metric in Keras.

```python
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
```

K is the backend used by Keras.

From this example and other examples of loss functions and metrics, the approach is to use standard math functions on the backend to calculate the metric of interest.

For example, we can write a custom metric to calculate RMSE as follows:

```python
from keras import backend

def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true), axis=-1))
```

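You can see the function is the same code as MSE with the addition of the *sqrt()* wrapping the result. Note that the mean is taken over the last axis only, so the square root is applied to each sample's value before the values are averaged across samples. The reported number is therefore not guaranteed to equal the square root of the reported MSE loss.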

We can test this in our regression example as follows. Note that we simply list the function name directly rather than providing it as a string or alias for Keras to resolve.

```python
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot
from keras import backend

def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true), axis=-1))

# prepare sequence
X = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
# create model
model = Sequential()
model.add(Dense(2, input_dim=1, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=[rmse])
# train model
history = model.fit(X, X, epochs=500, batch_size=len(X), verbose=2)
# plot metrics
pyplot.plot(history.history['rmse'])
pyplot.show()
```

Running the example reports the custom RMSE metric at the end of each training epoch.

```
...
Epoch 496/500
0s - loss: 1.2992e-06 - rmse: 9.7909e-04
Epoch 497/500
0s - loss: 1.2681e-06 - rmse: 9.6731e-04
Epoch 498/500
0s - loss: 1.2377e-06 - rmse: 9.5562e-04
Epoch 499/500
0s - loss: 1.2079e-06 - rmse: 9.4403e-04
Epoch 500/500
0s - loss: 1.1788e-06 - rmse: 9.3261e-04
```

At the end of the run, a line plot of the custom RMSE metric is created.

Your custom metric function must operate on Keras internal data structures that may be different depending on the backend used (e.g. *tensorflow.python.framework.ops.Tensor* when using tensorflow) rather than the raw yhat and y values directly.

For this reason, I would recommend using the backend math functions wherever possible for consistency and execution speed.
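One detail of the *rmse()* function above is worth working through by hand. The sketch below assumes a model with a single output value per sample (as in the regression example) and assumes that the per-sample metric values are then averaged. Under those assumptions, the square root is taken sample by sample, which gives the mean absolute error rather than the square root of the MSE. The error values are hypothetical.

```python
import math

# hypothetical prediction errors (y_pred - y_true), one output value per sample
errors = [0.1, -0.3, 0.2, -0.4]

# rmse() as written: the mean over the last axis covers a single value,
# so the square root is applied to each sample's squared error
per_sample = [math.sqrt(e ** 2) for e in errors]
metric_value = sum(per_sample) / len(per_sample)

# square root of the mean squared error over all samples
mse_value = sum(e ** 2 for e in errors) / len(errors)
sqrt_of_mse = math.sqrt(mse_value)

print('averaged per-sample value:', metric_value)
print('sqrt of mse:', sqrt_of_mse)
```

The first number is never larger than the second, which is consistent with the example output above, where the reported rmse is smaller than the square root of the reported loss. If the square root of the batch MSE is what you want, one option would be to take the mean over all elements before the square root, for example by omitting the axis argument.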

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- Keras Metrics API documentation
- Keras Metrics Source Code
- Keras Loss API documentation
- Keras Loss Source Code

## Summary

In this tutorial, you discovered how to use Keras metrics when training your deep learning models.

Specifically, you learned:

- How Keras metrics work and how you configure your models to report on metrics during training.
- How to use classification and regression metrics built into Keras.
- How to define and report on your own custom metrics efficiently while training your deep learning models.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

Off topic, but interesting nonetheless:

1) how to train an ensemble of models in the same time it takes to train 1

http://www.kdnuggets.com/2017/08/train-deep-learning-faster-snapshot-ensembling.html

2) when not to use deep learning

http://www.kdnuggets.com/2017/07/when-not-use-deep-learning.html

Thanks for sharing.

Hi Jason,

Thanks again for another great topic on Keras, but I’m an R user!

I can work with Keras in R, but how would I implement the custom ‘rmse’ metric in Keras for R?

I found something like this in the GitHub repository:

```r
metric_mean_squared_error <- function(y_true, y_pred) {
  keras$metrics$mean_squared_error(y_true, y_pred)
}
attr(metric_mean_squared_error, "py_function_name") <- "mean_squared_error"
```

and my poor

```r
rmse <- function(y_true, y_pred) {
  K$sqrt(K$mean(K$square(y_pred - y_true)))
}
```

is not working ("nan" is returned)

OK, I finally got it to return a value other than ‘nan’, but the result is not the same as the square root of ‘mse’ from Keras. Maybe this is due to the argument ‘axis = -1’?

Sorry, I have not used Keras in R, I don’t have good advice for you at this stage.

hi Jason,

Thanks for your very good topic on evaluation metrics in Keras. Can you please tell me how to compute the macro-F and micro-F scores?

thanks in advance

Sorry, I am not familiar with those scores John.

Perhaps find a definition and code them yourself?

Hi Jason,

I used your “def rmse” in my code, but it returns the same result as mse.

```python
# define data and target value
X = TFIDF_Array
Y = df['Shrinkage']

# custom metric to calculate RMSE
def RMSE(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true), axis=-1))

# define base model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(512, input_dim=X.shape[1], kernel_initializer='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, kernel_initializer='uniform'))
    # compile model
    model.compile(loss='mse', optimizer='sgd', metrics=[RMSE])
    return model

# evaluate model
estimator = KerasRegressor(build_fn=regression_model, nb_epoch=100, batch_size=32, verbose=0)
kfold = KFold(n_splits=3, random_state=1)
reg_results = cross_val_score(estimator, X, Y, cv=kfold)
```

Did the example in the post – copied exactly – work for you?

```
Epoch 496/500
0s - loss: 3.9225e-04 - rmse: 0.0170
Epoch 497/500
0s - loss: 3.8870e-04 - rmse: 0.0169
Epoch 498/500
0s - loss: 3.8518e-04 - rmse: 0.0169
Epoch 499/500
0s - loss: 3.8169e-04 - rmse: 0.0168
Epoch 500/500
0s - loss: 3.7821e-04 - rmse: 0.0167
```

It gave back different values from yours.

```
Epoch 497/500
0s - loss: 0.0198 - mean_squared_error: 0.0198
Epoch 498/500
0s - loss: 0.0197 - mean_squared_error: 0.0197
Epoch 499/500
0s - loss: 0.0197 - mean_squared_error: 0.0197
Epoch 500/500
0s - loss: 0.0196 - mean_squared_error: 0.0196
```

and these were the results when I used:

metrics=['mean_squared_error']

I didn’t see any difference between MSE and RMSE here.

Please advise. Thanks.

Yes, this is to be expected. Machine learning algorithms are stochastic, meaning that the same algorithm on the same data will give different results each time it is run. See this post for more details:

https://machinelearningmastery.com/randomness-in-machine-learning/

Dear Jason,

Thank you again for the awesome blog and clear explanations.

If I understood well, RMSE should be equal to sqrt(mse), but this is not the case for my data:

```
Epoch 130/1000
10/200 [>.............................] - ETA: 0s - loss: 0.0989 - rmse: 0.2656
200/200 [==============================] - 0s 64us/step - loss: 0.2856 - rmse: 0.4070
```

Please, sir, how can we calculate the coefficient of determination?

The mse may be calculated at the end of each batch, the rmse may be calculated at the end of the epoch because it is a metric.

For the determination coefficient I use this basic code:

```python
S1, S2 = 0, 0
for i in range(len(Y)):
    S1 = S1 + (Y_pred_array[i] - mean_y)**2
    S2 = S2 + (Y_array[i] - mean_y)**2
R2 = S1/S2
```

But this gives bad results.

How can you deal with Y_pred as an iterable when it is also a Tensor?

Thanks
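For reference, the coefficient of determination is commonly defined as one minus the ratio of the residual sum of squares to the total sum of squares, which is not the same as the ratio in the snippet above. A minimal sketch on plain Python lists follows. It does not operate on Keras tensors, and the function name `r_squared` is illustrative. Inside a Keras metric the same arithmetic would need to be expressed with backend operations.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # residual sum of squares: how far predictions are from the targets
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # total sum of squares: how far targets are from their own mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # always predicting the mean
```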

Thanks for the article. How does Keras compute a mean statistic in a per batch fashion? Does it internally (magically) aggregate the sum and count to that point in the epoch and print the measure or does it compute the measure per batch and then again re-compute the metric at the end of each epoch over the entire data?

I believe the sum is accumulated and printed at the end of each batch or end of each epoch. I don’t recall which.
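One way to picture the accumulation described in the reply above is a running average of per-batch values, weighted by batch size. Whether a given Keras version reports exactly this is an assumption here, and the squared errors below are made up. The sketch shows that such an average reproduces the epoch-level MSE exactly, while the same average of per-batch RMSE values does not equal the square root of the epoch-level MSE.

```python
import math

def weighted_running_mean(batch_values, batch_sizes):
    # accumulate value * batch size, then divide by the number of samples seen
    total, seen = 0.0, 0
    for value, size in zip(batch_values, batch_sizes):
        total += value * size
        seen += size
    return total / seen

# hypothetical squared errors for two batches of different sizes
batches = [[0.01, 0.09], [0.04, 0.16, 0.25]]
sizes = [len(b) for b in batches]

batch_mse = [sum(b) / len(b) for b in batches]
batch_rmse = [math.sqrt(m) for m in batch_mse]

epoch_mse = weighted_running_mean(batch_mse, sizes)
epoch_rmse_averaged = weighted_running_mean(batch_rmse, sizes)

print('epoch mse:', epoch_mse)
print('sqrt of epoch mse:', math.sqrt(epoch_mse))
print('average of batch rmse:', epoch_rmse_averaged)
```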

Great post and just in time as usual;

The issue is that I am trying to calculate the loss based on IoU (Intersection over Union) and I have no clue how to do it using my backend (TensorFlow).

My output is like this: (xmin, ymin, xmax, ymax)

Thanks

Sorry, I have not implemented (or heard of) that metric.
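For readers unfamiliar with the measure, intersection over union for two axis-aligned boxes in the (xmin, ymin, xmax, ymax) format mentioned above can be sketched in plain Python as follows. This is only the arithmetic on ordinary numbers; it is not a TensorFlow loss, which would need the same steps written with tensor operations. The function name `box_iou` is illustrative.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # width and height of the overlap, clamped at zero when the boxes do not overlap
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
print(box_iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
```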

```python
model.compile(loss='mse', optimizer='adam', metrics=[rmse])
```

```
Epoch 496/500
0s - loss: 1.2992e-06 - rmse: 9.7909e-04
```

The loss is mse. Shouldn’t mse = rmse^2? The value above, (9.7909e-04)^2, is about 9.6e-7, which does not match 1.2992e-06. Did I misunderstand something? Thanks.

The loss and metrics might not be calculated at the same time, e.g. end of batch vs end of epoch.

Thanks for reply.

```python
history = model.fit(X, X, epochs=500, batch_size=len(X), verbose=2)
```

I thought the duration of a batch is equal to one epoch, since batch_size=len(X). Is that correct?

Furthermore, it seems that the loss of the epoch is also updated each iteration.

```
Epoch 496/500
0s - loss: 1.2992e-06 - rmse: 9.7909e-04
```

No, one epoch is comprised of 1 or more batches. Often 32 samples per batch are used as a default.

Learn more here:

https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/

Thanks a lot for your time to explain and find the link.

I am sorry. I think I did not express my thoughts correctly.

In the above example, history = model.fit(X, X, epochs=500, batch_size=len(X), verbose=2)

batch_size=len(X)

batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.

Since batch_size has been set to the length of the dataset, may I assume that one epoch comprises 1 batch, so that the end of the batch is also the end of the epoch? In that case, the model’s ‘mse’ loss should be rmse^2.

Yes, correct.

Thanks for the great article, Jason. I have 2 questions;

1) I have a pipeline with a sequence like: Normalizer -> KerasRegressor

Can I simply use history = pipeline.fit(..) and then plot the metrics?

2) I have a KFold cross-validation like this:

```python
kfold = StratifiedKFold(n_splits=3)
results = cross_val_score(pipeline, X, Y, cv=kfold, scoring=mape)
```

How can I plot the metrics of those 3 CV fits?

Thanks.

No, I don’t believe you can easily access history when using the sklearn wrappers.

Hi Dr. Jason Brownlee,

Thanks for the good tutorial.

What is the difference between these two lines?

```python
score = model.evaluate(data2_Xscaled, data2_Yscaled, verbose=verbose)
y_hat = model.predict(data2_Xscaled)
```

The objective metric is the customized one, def rmse(y_true, y_pred).

The score value should also be equal to y_hat.

One evaluates the model the other makes a prediction.

Hello Jason,

Thanks for your work.

I’m using MAE as a metric in a multi-class classification problem with ordered classes.

This is because, in my problem, classifying a record of class 5 as class 4 is not the same as assigning it to class 1.

My model is:

```r
network %
layer_dense(units = 32, activation = "relu", input_shape = c(38)) %>%
layer_dense(units = 5, activation = "softmax")

network %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = c("mae")
)
```

But the model does not correctly calculate the MAE.

Is it possible to use MAE for this classification problem?

Thanks

MAE is not an appropriate measure of error for classification, it is intended for regression problems.

You can learn the difference between classification and regression here:

https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-classification-and-regression

Do you have a code written for the mean_iou metric?

What is mean_iou?

Hi Jason,

When defining a custom metrics function, the y_true and y_pred are of type Tensor. If I have my own function that takes numpy arrays as input how do I convert y_true and y_pred to numpy arrays?

You will need to work with Tensor types. That is the expectation of Keras.

Dear Jason,

How can we use precision and recall metrics for Deep Learning with Keras in Python?

Thanks in advance.

You can make predictions with your model, then use the precision and recall metrics from the sklearn library.

Hello Mr Jason,

I have a question that has confused me for so long.

For a multiple output regression problem, what does the MSE loss function compute exactly?

Thank you in advance.

Nice article(s) Jason.

At run time, I wanted to bucket the classes and evaluate. So I tried this function, but it returns `nan`.

```python
def my_metric(y_true, y_pred):
    actual = tf.floor( y_true / 10 )
    predicted = tf.floor( y_pred / 10 )
    return K.categorical_crossentropy(actual, predicted)
```

Sorry, I cannot debug your code. Perhaps post to Stack Overflow?

Sorry. Thanks a lot. Learned some good things 🙂

Hello mr Jason

For a multiple output regression problem, what does the MSE loss function compute exactly ?

Is it the sum of the MSE over all the output variables, the average or something else ?

Thank you in advance.

The average of the squared differences between model predictions and true values.

This is for just one output, what if I have multiple outputs ?

You can calculate the metric for each time step or output. I have an example here:

https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/
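To illustrate the multiple-output case using the *mean_squared_error* definition shown in the tutorial (a mean over the last axis), the sketch below works through a tiny example on plain Python lists. The numbers are made up, and the final averaging over samples is an assumption about how the per-sample values are combined into the single number that is reported.

```python
def mean_squared_error(y_true, y_pred):
    """Per-sample MSE: average the squared error over the output values (last axis)."""
    return [
        sum((p - t) ** 2 for t, p in zip(row_true, row_pred)) / len(row_true)
        for row_true, row_pred in zip(y_true, y_pred)
    ]

# two samples, each with two output values (made-up numbers)
y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 4.0], [2.0, 4.0]]

per_sample = mean_squared_error(y_true, y_pred)
overall = sum(per_sample) / len(per_sample)

print('per-sample mse:', per_sample)
print('overall mse:', overall)
```

Because every sample has the same number of outputs, the overall value equals the average squared error over all output values of all samples. In other words, it is an average across outputs rather than a sum.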

Thank you so much for your Tutorial. Nowadays I follow your twitter proposals everyday. It’s great !

I have two questions:

1) Regarding the sequential model in the last example:

If I remove the activation definition (‘relu’) from your last code example, I get surprisingly better RMSE values. This suggests to me that regression works better without an activation in the first hidden layer. Is this a chance result, or is there a deeper reason?

2) Using the same model architecture, which is better: a regression approach (leaving out the activation in the output layer) or a multinomial classification (using ‘softmax’ as the activation in the output layer)? Imagine, for example, that we analyze the same problem with either a continuous output label or a discrete multi-class label (for example, by rounding real numbers to their equivalent integers) for a series of real-valued samples. I mean, is there any intrinsic advantage in using regression analysis versus multinomial classification?

thanks

JG

It really depends on the problem as to the choice and benefit of activation functions.

In terms of activation in the output layer – what I think you’re asking about, the heuristics are:

- regression: use ‘linear’
- binary classification: use ‘sigmoid’
- multi-class classification: use ‘softmax’.

Does that help?

If I imagine a continuous curve as the output of a regression versus, for the same problem, segments of that output treated as classes in a large multi-class classification, using the same main model architecture (obviously with different units and activation at the output, and a different loss and metrics at compilation), which model will perform better: the regression or the multi-class one?

My intuition tells me that multi-class is finer, because it can focus on specific output segments (classes) of the regression curve, and it has more units at the output, so more analysis is involved. Do not worry too much; I will try to experiment with some specific examples in the future to answer my question.

Anyway, do you like women’s basketball? Congratulations: Australia beat España (Spain is my country) two hours ago…

I don’t follow, sorry.

A classification model is best for classification and will perform very poorly for regression.

Not a sports guy, sorry 🙂

Thanks for the tutorial.

I followed all of the steps and used my own metric function successfully. I was also able to plot it.

The problem that I encountered was when I tried to load the model and the saved weights in order to use model.evaluate_generator(). I keep getting the error: “Exception has occurred: ValueError too many values to unpack (expected 2) ”

I was wondering if you know how to solve this problem.

I have not seen this. Is your version of Keras up to date? v2.2.4 or better?

How can I extract and store the accuracy output from ‘loss’ and ‘metrics’ in the model.compile step, in order to pass those float values to mlflow’s log_metric() function?

```python
history = regr.compile(optimizer, loss='mean_squared_error', metrics=['mae'])
```

My ‘history’ variable keeps coming up as ‘None type’

That is odd, I have not seen that before.

Perhaps post to the keras user group:

https://machinelearningmastery.com/get-help-with-keras/

Hi Dr. Brownlee,

By definition, rmse should be square root of mse.

But if we fit keras with batches, rmse would not be calculated correctly.

Could you give me some advice on how to use a custom rmse metric properly when fitting with batches?

If you add RMSE as a metric, it will be calculated at the end of each epoch, i.e. correctly.

Why is the cosine proximity value negative in this case? Since the dot product is computed between identical vectors, shouldn’t it be +1.0?

Perhaps because the framework expects to minimize loss.

What would be the correct interpretation of the negative value in this case? In the example you mentioned, both vectors were the same and the value we received was -1.0. When I googled its meaning, certain blogs said that it means the vectors are similar but point in opposite directions. This does not seem to be a correct interpretation, as both vectors are the same.

Sorry, I don’t have material on this measure, perhaps this will help:

https://en.wikipedia.org/wiki/Cosine_similarity