Once you choose and fit a final deep learning model in Keras, you can use it to make predictions on new data instances.

There is some confusion amongst beginners about how exactly to do this. I often see questions such as:

How do I make predictions with my model in Keras?

In this tutorial, you will discover exactly how you can make classification and regression predictions with a finalized deep learning model using the Keras Python library.

After completing this tutorial, you will know:

- How to finalize a model in order to make it ready for making predictions.
- How to make class and probability predictions for classification problems in Keras.
- How to make regression predictions in Keras.

Let’s get started.

## Tutorial Overview

This tutorial is divided into 3 parts; they are:

- Finalize Model
- Classification Predictions
- Regression Predictions

## 1. Finalize Model

Before you can make predictions, you must train a final model.

You may have trained models using k-fold cross-validation or train/test splits of your data. This was done in order to give you an estimate of the skill of the model on out-of-sample data, e.g. new data.

These models have served their purpose and can now be discarded.

You now must train a final model on all of your available data. You can learn more about how to train a final model here:

## 2. Classification Predictions

Classification problems are those where the model learns a mapping between input features and an output feature that is a label, such as “*spam*” and “*not spam*“.

Below is an example of a finalized neural network model in Keras developed for a simple two-class (binary) classification problem.

If developing a neural network model in Keras is new to you, see the post:

```python
# example of training a final classification model
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=200, verbose=0)
```

After finalizing, you may want to save the model to file, e.g. via the Keras API. Once saved, you can load the model any time and use it to make predictions. For an example of this, see the post:

For simplicity, we will skip this step for the examples in this tutorial.
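For reference, the round trip is only a couple of lines. Below is a minimal sketch using the Keras *save()* and *load_model()* functions (the filename is illustrative):

```python
# save the finalized model to file (the filename is illustrative)
model.save('final_model.h5')
# ...later, load it and use it to make predictions
from keras.models import load_model
model = load_model('final_model.h5')
```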

There are two types of classification predictions we may wish to make with our finalized model; they are class predictions and probability predictions.

### Class Predictions

A class prediction is where, given the finalized model and one or more data instances, we predict the class for each data instance.

We do not know the outcome classes for the new data. That is why we need the model in the first place.

We can predict the class for new data instances using our finalized classification model in Keras using the *predict_classes()* function. Note that this function is only available on *Sequential* models, not those models developed using the functional API.

For example, we have one or more data instances in an array called *Xnew*. This can be passed to the *predict_classes()* function on our model in order to predict the class values for each instance in the array.

```python
Xnew = [[...], [...]]
ynew = model.predict_classes(Xnew)
```

Let’s make this concrete with an example:

```python
# example making new class predictions for a classification problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=500, verbose=0)
# new instances where we do not know the answer
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)
Xnew = scalar.transform(Xnew)
# make a prediction
ynew = model.predict_classes(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
```

Running the example predicts the class for the three new data instances, then prints the data and the predictions together.

```
X=[0.89337759 0.65864154], Predicted=[0]
X=[0.29097707 0.12978982], Predicted=[1]
X=[0.78082614 0.75391697], Predicted=[0]
```

If you had just one new data instance, you could provide this as an instance wrapped in an array to the *predict_classes()* function; for example:

```python
# example making new class prediction for a classification problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
from numpy import array
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=500, verbose=0)
# new instance where we do not know the answer
Xnew = array([[0.89337759, 0.65864154]])
# make a prediction
ynew = model.predict_classes(Xnew)
# show the inputs and predicted outputs
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
```

Running the example prints the single instance and the predicted class.

```
X=[0.89337759 0.65864154], Predicted=[0]
```

### A Note on Class Labels

Note that when you prepared your data, you will have mapped the class values from your domain (such as strings) to integer values. You may have used a *LabelEncoder*.

This *LabelEncoder* can be used to convert the integers back into string values via the *inverse_transform()* function.

For this reason, you may want to save (pickle) the *LabelEncoder* used to encode your *y* values when fitting your final model.
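A minimal sketch of this round trip (the string labels here are hypothetical):

```python
# sketch: encode string labels, then recover them after prediction
import pickle
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
y = encoder.fit_transform(['spam', 'not spam', 'spam'])  # -> [1, 0, 1]
# save the encoder with the model so predictions can be decoded later
with open('label_encoder.pkl', 'wb') as f:
    pickle.dump(encoder, f)
# convert predicted integers back into string labels
labels = encoder.inverse_transform([0, 1])  # -> ['not spam', 'spam']
```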

### Probability Predictions

Another type of prediction you may wish to make is the probability of the data instance belonging to each class.

This is called a probability prediction where, given a new instance, the model returns the probability for each outcome class as a value between 0 and 1.

You can make these types of predictions in Keras by calling the *predict_proba()* function; for example:

```python
Xnew = [[...], [...]]
ynew = model.predict_proba(Xnew)
```

In the case of a two-class (binary) classification problem, the sigmoid activation function is often used in the output layer. The predicted probability is taken as the likelihood of the observation belonging to class 1, or inverted (1 – probability) to give the probability for class 0.
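For example, a small sketch, assuming a sigmoid output layer and an input array *Xnew*:

```python
# a sketch, assuming a sigmoid output layer
p1 = model.predict_proba(Xnew)  # probability of class 1
p0 = 1 - p1                     # probability of class 0
```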

In the case of a multi-class classification problem, the softmax activation function is often used on the output layer and the likelihood of the observation for each class is returned as a vector.

The example below makes a probability prediction for each example in the *Xnew* array of data instances.

```python
# example making new probability predictions for a classification problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=500, verbose=0)
# new instances where we do not know the answer
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)
Xnew = scalar.transform(Xnew)
# make a prediction
ynew = model.predict_proba(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
```

Running the example makes the probability predictions, then prints the input data instances and the probability of each instance belonging to class 1.

```
X=[0.89337759 0.65864154], Predicted=[0.0087348]
X=[0.29097707 0.12978982], Predicted=[0.82020265]
X=[0.78082614 0.75391697], Predicted=[0.00693122]
```

This can be helpful in your application if you want to present the probabilities to the user for expert interpretation.

## 3. Regression Predictions

Regression is a supervised learning problem where, given input examples, the model learns a mapping to suitable output quantities, such as "0.1" and "0.2".

Below is an example of a finalized Keras model for regression.

```python
# example of training a final regression model
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100,1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100,1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
```

We can predict quantities with the finalized regression model by calling the *predict()* function on the finalized model.

The *predict()* function takes an array of one or more data instances.

The example below demonstrates how to make regression predictions on multiple data instances with an unknown expected outcome.

```python
# example of making predictions for a regression problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100,1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100,1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
# new instances where we do not know the answer
Xnew, a = make_regression(n_samples=3, n_features=2, noise=0.1, random_state=1)
Xnew = scalarX.transform(Xnew)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
```

Running the example makes multiple predictions, then prints the inputs and predictions side by side for review.

```
X=[0.29466096 0.30317302], Predicted=[0.17097184]
X=[0.39445118 0.79390858], Predicted=[0.7475489]
X=[0.02884127 0.6208843 ], Predicted=[0.43370453]
```

The same function can be used to make a prediction for a single data instance, as long as it is suitably wrapped in a surrounding list or array.

For example:

```python
# example of making predictions for a regression problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from numpy import array
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100,1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100,1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
# new instance where we do not know the answer
Xnew = array([[0.29466096, 0.30317302]])
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
```

Running the example makes a single prediction and prints the data instance and prediction for review.

```
X=[0.29466096 0.30317302], Predicted=[0.17333156]
```

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- How to Train a Final Machine Learning Model
- Save and Load Your Keras Deep Learning Models
- Develop Your First Neural Network in Python With Keras Step-By-Step
- The 5 Step Life-Cycle for Long Short-Term Memory Models in Keras
- How to Make Predictions with Long Short-Term Memory Models in Keras

## Summary

In this tutorial, you discovered how you can make classification and regression predictions with a finalized deep learning model using the Keras Python library.

Specifically, you learned:

- How to finalize a model in order to make it ready for making predictions.
- How to make class and probability predictions for classification problems in Keras.
- How to make regression predictions in Keras.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

Great article Jason. Do you recommend any articles for hyperparameter tuning to further improve accuracies? Also any articles for common problems and solutions during model tuning?

Yes, see this post:

http://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/

Thanks for the tutorial.

I have a problem with my data because the instances are not all of the same dimensions, so the software doesn't work. If I fill in the blanks with random numbers to get matching dimensions, it works, but I don't get good answers. Can you explain how I can solve this problem?

Thanks

Perhaps this post will help:

https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/

Thanks for the tutorial! If I want to build a CNN which has both classification and regression heads, I suppose I cannot use a sequential model. Do you know an example for such a multi-head CNN? Thank you

Here are some examples of multiple output models:

https://machinelearningmastery.com/keras-functional-api-deep-learning/

Thanks for the explanation.

Could you please put images of the network architectures beside the code?

I think it would help us to better understand the architecture.

Thanks for the suggestion.

Hello Jason,

Thanks for your articles. You really help us a lot.

May I ask one question?

Can Keras be used to build clustering models?

keras.wrappers.scikit_learn can be used to build a KerasClassifier model; can Keras be used to build clustering models? If it can be, are there any examples of that?

I want to use some features like age, city, education, company, job title and so on to cluster people into some groups and to get the key features of each group.

Perhaps, I have not seen this.

Hi Jason and thanks for this post. I have a quick question about regression with bounded target values.

If my target values are always restricted between [0,1] with most of the values close to 0.5 (i.e., values are rarely close to 0 or 1), is it useful to use sigmoid output activation instead of linear? Would it help in convergence or stability when training a complex model? It seems like a waste not to take any advantage of the fact that target values belong into bounded interval.

So in your code, one would simply make a replacement

model.add(Dense(1, activation='linear')) -> model.add(Dense(1, activation='sigmoid'))

Good question.

Yes, interesting idea. It might change the loss function used to fit the model, which may result in optimizing the wrong problem (e.g. logloss instead of mse). Nevertheless, try it and compare error results between the two approaches.

Yes, that is the code change.

What are the considerations in using scikit-learn over Keras for classification problems (or vice versa) in R or Python?

scikit-learn offers a suite of general ML algorithms, Keras is focused on neural network algorithms only.

Regarding Python vs R, I answer that here:

https://machinelearningmastery.com/faq/single-faq/what-programming-language-should-i-use-for-machine-learning

Does that help?

I am trying to predict a new image on a model that I trained with emnist letters. Here is the code snippet that tries to do so.

```python
import matplotlib
# Force matplotlib to not use any Xwindows backend.
matplotlib.use('Agg')
import keras
import matplotlib.pyplot as plt
from keras.models import load_model
from keras.preprocessing import image
from keras import backend as K
from scipy.misc import imread
from PIL import Image
import skimage.io as io
import skimage.transform as tr
import numpy as np
from keras.utils import plot_model
from keras.datasets import mnist

# Returns a compiled model identical to the previous one
model = load_model('matLabbed.h5')
print("Testing the model on our own input data")

imgA = imread('A.png')
imgA = tr.resize(imgA, (28, 28, 1)).astype('float32')
imgA = imgA[np.newaxis, ...]
y_classes = model.predict_classes(imgA)
```

When I try to print y_classes, it gives me numerical outputs, e.g. 4, 10. I am trying to figure out the labels that my data uses and compare them with the y_classes output. Any suggestion? Thanks.

The integer will relate to a class in your training data.

How to predict classes with a model developed using the functional API?

You can call model.predict() then use argmax() on the resulting vector to get the class index.
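For example, a minimal sketch, assuming a softmax output layer and an input array Xnew:

```python
# a minimal sketch: class indices from a functional-API model
# (assumes a softmax output layer and an input array Xnew)
from numpy import argmax
yhat = model.predict(Xnew)       # one probability vector per instance
classes = argmax(yhat, axis=-1)  # index of the most likely class
```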

Hi Jason, this code editor is really irritating, please consider substituting it with another or just with some simple text.

Here is more information on how to copy code from the snippets:

https://machinelearningmastery.com/faq/single-faq/how-do-i-copy-code-from-a-tutorial

Does that help?

The networks for classification and regression differ only a little (the activation function of the output neuron and the loss function), yet in the case of classification it is easy to estimate the probability of the prediction (via predict_proba), while in the case of regression the analog is the prediction interval, which is difficult to calculate for non-linear models like neural networks.

Why is there such a difference? Shouldn’t the probability estimation for regression be done as easily as for classification? A straightforward (but maybe naive) solution that I see is to bin the target variable, make a classification, then use predict_proba to get the probability of the predicted value to be in a certain interval, and then to construct prediction interval from that.

Can it be done this way? Or changing the loss function will make two problems (regression and classification for the binned target) so much different that the result of one problem cannot be transferred to another?

The binning approach would be my first/naive thought too. Try it and see.
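A rough sketch of the binning step might look like this (assumptions: equal-width bins over the target; the count of 10 bins is arbitrary):

```python
# naive sketch: discretize a continuous target into bins for classification
# (assumptions: equal-width bins; the count of 10 is arbitrary)
from numpy import digitize, linspace
bins = linspace(y.min(), y.max(), num=11)   # 11 edges -> 10 bins
y_binned = digitize(y.ravel(), bins[1:-1])  # integer bin index per target
# a softmax classifier trained on y_binned can then use predict_proba()
# to estimate the probability of each interval of the target range
```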

OK, I tried it for a toy 1-dim regression task.

A few days ago, I asked this question on stackexchange. Didn’t get many replies, so I now answered my own question by implementing in Keras the idea that I outlined in my comment above.

It looks like it works. I am still not sure if I do all this correctly, and if it will work for real multi-dimensional problems. If you like, you can see it here: https://datascience.stackexchange.com/questions/31773/how-to-estimate-the-variance-of-regressors-in-scikit-learn/32486#32486

The approach makes me nervous.

Hi Jason,

I built a model to classify a financial time series for stock trading having 10 inputs and one output (‘0’ for stock going down, ‘1’ going up). Now I have 1400 such inputs for training the model and managed to get an accuracy of 97%! …

but when presenting new inputs to the model I don't get a classification accuracy of more than 30%! In my opinion the problem could be:

1. Overfitting

2. Time series not predictable?

etc…

do you have any idea what the problem might be or how to improve the accuracy for new data presented to the model?

Thanks

Low skill might mean underfitting, the wrong model, an intractable problem, etc.

how to get the RMSE for the regression model?

You can calculate the MSE and then calculate the square root of the MSE.

sklearn has an implementation here:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html
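For example, a minimal sketch, where y_true and yhat stand in for your actual and predicted values:

```python
# RMSE is the square root of the MSE (y_true and yhat are placeholders)
from math import sqrt
from sklearn.metrics import mean_squared_error
rmse = sqrt(mean_squared_error(y_true, yhat))
print(rmse)
```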

Does that help?

very helpful. thanks

I’m glad it helped.

Hi, I am working on a multilabel problem. My X_train has this shape (for instance) [[0 1 0 0 1]…]. I want to predict the next, like [1 0 0 1 0]. How can I do that? Each label is independent from the others. How train that? Where can I find some information to learn about it?

Thanks!:)

Sorry, I don’t have examples of multilabel forecasting, only multiclass forecasting.

Very nice! Could you please also make a similar tutorial with PyTorch?

Thanks for the suggestion.

Thanks for your post, Jason.

When we are predicting, we are required to revert back to the original scaling, as both the input and output are scaled for the model to predict correctly.

Correct, we should return yhat to the original scale before evaluating error in the case of regression.
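Using the scalers from the regression examples above, a minimal sketch of reverting a prediction would be:

```python
# invert the output scaling so the prediction is in the original units
yhat = model.predict(Xnew)
yhat_original = scalarY.inverse_transform(yhat)
```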

Hi, thanks for a great post.

I have question related to regression problem.

I am trying to develop a model that predicts 3 outputs from 36 input variables. To be clear, I have 3 output parameters and 36 input parameters, which means that in the output layer I have 3 neurons. The model can successfully predict when there is only one output, but when it comes to 3 outputs it gives something useless. What do you suggest? By the way, I use a Keras MLP regressor.

Perhaps try tuning the model to see if you can find a configuration that best suits your specific problem, here are some ideas:

http://machinelearningmastery.com/improve-deep-learning-performance/

Hi Jason, Thanks for the explanation of classification and regression architecture.

I am trying to use the regression model on 300-dimensional vectors. The problem statement is to know which two vectors are similar or non-similar. For this, I have taken the difference between two similar vectors and between two non-similar vectors. We labeled similar diff vectors as 0 and non-similar diff vectors as 1, and fed them to the network, which looks like this:

```python
model = Sequential()
model.add(Dense(300, input_dim=301, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
```

```
Training_Loss: 0.0171
Training_Acc:  0.9988
Testing_Loss:  0.6456110666
Testing_Acc:   0.821831868464
```

I am not sure why the model is overfitting. Can you please give some insight into what could be wrong?

Perhaps try some different model configurations and see if they overfit? E.g. fewer nodes.

Hi,

Thanks for this explanation.

I am trying to use an end-to-end NVIDIA model for a self-driving car in Keras. It's a regression problem to predict the steering angle from images of a camera installed on the front of the car.

The problem I am facing is that when I predict the angle using model.predict(), I get a constant value for all inputs, even though my model reaches very low error during training, like:

```
Epoch 00001: val_loss improved from inf to 21.63187, saving model to drive/abc/weights.best6.hdf5
Epoch 2/30
283/283 [==============================] - 199s 702ms/step - loss: 18.0307 - val_loss: 16.9002
Epoch 00002: val_loss improved from 21.63187 to 16.90022, saving model to drive/abc/weights.best6.hdf5
Epoch 3/30
283/283 [==============================] - 199s 704ms/step - loss: 14.2439 - val_loss: 13.9543
Epoch 00003: val_loss improved from 16.90022 to 13.95434, saving model to drive/abc/weights.best6.hdf5
Epoch 4/30
283/283 [==============================] - 200s 708ms/step - loss: 11.9354 - val_loss: 12.1763
Epoch 00004: val_loss improved from 13.95434 to 12.17632, saving model to drive/abc/weights.best6.hdf5
Epoch 5/30
283/283 [==============================] - 201s 711ms/step - loss: 10.5595 - val_loss: 11.1414
Epoch 00005: val_loss improved from 12.17632 to 11.14141, saving model to drive/abc/weights.best6.hdf5
Epoch 6/30
283/283 [==============================] - 201s 711ms/step - loss: 8.3687 - val_loss: 0.7611
Epoch 00006: val_loss improved from 11.14141 to 0.76112, saving model to drive/abc/weights.best6.hdf5
Epoch 7/30
283/283 [==============================] - 199s 704ms/step - loss: 0.7403 - val_loss: 0.4893
Epoch 00007: val_loss improved from 0.76112 to 0.48934, saving model to drive/abc/weights.best6.hdf5
Epoch 8/30
283/283 [==============================] - 203s 716ms/step - loss: 0.5372 - val_loss: 0.3477
Epoch 00008: val_loss improved from 0.48934 to 0.34773, saving model to drive/abc/weights.best6.hdf5
Epoch 9/30
283/283 [==============================] - 216s 763ms/step - loss: 0.4332 - val_loss: 0.2760
Epoch 00009: val_loss improved from 0.34773 to 0.27596, saving model to drive/abc/weights.best6.hdf5
Epoch 10/30
283/283 [==============================] - 211s 744ms/step - loss: 0.3821 - val_loss: 0.2406
Epoch 00010: val_loss improved from 0.27596 to 0.24057, saving model to drive/abc/weights.best6.hdf5
Epoch 11/30
283/283 [==============================] - 207s 733ms/step - loss: 0.3565 - val_loss: 0.2229
Epoch 00011: val_loss improved from 0.24057 to 0.22293, saving model to drive/abc/weights.best6.hdf5
Epoch 12/30
283/283 [==============================] - 222s 784ms/step - loss: 0.3438 - val_loss: 0.2134
Epoch 00012: val_loss improved from 0.22293 to 0.21340, saving model to drive/abc/weights.best6.hdf5
Epoch 13/30
283/283 [==============================] - 205s 724ms/step - loss: 0.3331 - val_loss: 0.2076
Epoch 00013: val_loss improved from 0.21340 to 0.20755, saving model to drive/abc/weights.best6.hdf5
Epoch 14/30
283/283 [==============================] - 218s 771ms/step - loss: 0.3305 - val_loss: 0.2036
Epoch 00014: val_loss improved from 0.20755 to 0.20359, saving model to drive/abc/weights.best6.hdf5
Epoch 15/30
283/283 [==============================] - 211s 745ms/step - loss: 0.3295 - val_loss: 0.2006
Epoch 00015: val_loss improved from 0.20359 to 0.20061, saving model to drive/abc/weights.best6.hdf5
Epoch 16/30
283/283 [==============================] - 210s 743ms/step - loss: 0.3272 - val_loss: 0.1982
Epoch 00016: val_loss improved from 0.20061 to 0.19824, saving model to drive/abc/weights.best6.hdf5
Epoch 17/30
283/283 [==============================] - 220s 778ms/step - loss: 0.3236 - val_loss: 0.1963
Epoch 00017: val_loss improved from 0.19824 to 0.19628, saving model to drive/abc/weights.best6.hdf5
Epoch 18/30
283/283 [==============================] - 205s 726ms/step - loss: 0.3219 - val_loss: 0.1946
Epoch 00018: val_loss improved from 0.19628 to 0.19460, saving model to drive/abc/weights.best6.hdf5
Epoch 19/30
283/283 [==============================] - 215s 761ms/step - loss: 0.3180 - val_loss: 0.1932
Epoch 00019: val_loss improved from 0.19460 to 0.19320, saving model to drive/abc/weights.best6.hdf5
Epoch 20/30
283/283 [==============================] - 215s 761ms/step - loss: 0.3197 - val_loss: 0.1920
Epoch 00020: val_loss improved from 0.19320 to 0.19198, saving model to drive/abc/weights.best6.hdf5
Epoch 21/30
283/283 [==============================] - 207s 731ms/step - loss: 0.3188 - val_loss: 0.1910
Epoch 00021: val_loss improved from 0.19198 to 0.19101, saving model to drive/abc/weights.best6.hdf5
Epoch 22/30
283/283 [==============================] - 221s 781ms/step - loss: 0.3201 - val_loss: 0.1901
Epoch 00022: val_loss improved from 0.19101 to 0.19008, saving model to drive/abc/weights.best6.hdf5
Epoch 23/30
283/283 [==============================] - 207s 730ms/step - loss: 0.3198 - val_loss: 0.1893
Epoch 00023: val_loss improved from 0.19008 to 0.18925, saving model to drive/abc/weights.best6.hdf5
Epoch 24/30
283/283 [==============================] - 215s 758ms/step - loss: 0.3199 - val_loss: 0.1886
Epoch 00024: val_loss improved from 0.18925 to 0.18856, saving model to drive/abc/weights.best6.hdf5
Epoch 25/30
283/283 [==============================] - 218s 769ms/step - loss: 0.3197 - val_loss: 0.1879
Epoch 00025: val_loss improved from 0.18856 to 0.18787, saving model to drive/abc/weights.best6.hdf5
Epoch 26/30
283/283 [==============================] - 206s 727ms/step - loss: 0.3191 - val_loss: 0.1873
Epoch 00026: val_loss improved from 0.18787 to 0.18725, saving model to drive/abc/weights.best6.hdf5
Epoch 27/30
283/283 [==============================] - 221s 779ms/step - loss: 0.3175 - val_loss: 0.1868
Epoch 00027: val_loss improved from 0.18725 to 0.18680, saving model to drive/abc/weights.best6.hdf5
Epoch 28/30
283/283 [==============================] - 209s 740ms/step - loss: 0.3186 - val_loss: 0.1865
Epoch 00028: val_loss improved from 0.18680 to 0.18653, saving model to drive/abc/weights.best6.hdf5
Epoch 29/30
283/283 [==============================] - 213s 752ms/step - loss: 0.3165 - val_loss: 0.1864
Epoch 00029: val_loss improved from 0.18653 to 0.18639, saving model to drive/abc/weights.best6.hdf5
Epoch 30/30
283/283 [==============================] - 218s 771ms/step - loss: 0.3167 - val_loss: 0.1864
Epoch 00030: val_loss did not improve from 0.18639
```

So my question is: why does model.predict() always give a constant value?

Thanks,

Perhaps the model needs to be tuned to your problem?

I have a question on text prediction.

Since we are using different approaches to enumerate the text before any model fitting, I face a logical problem here.

Imagine in my train set I have 20 unique tokens, which I enumerate like this: ("i", 1), ("love", 2), ("this", 3), ...

After training the model and fitting the data I destroy everything but the model, so I can't keep track of this dictionary anymore.

When I want to predict a new sequence whose tokens are completely new things: ["we", "went", "to", "Spain"].

How could these new tokens be related to the old dictionary?

How should I enumerate them?

If I use a new dictionary to enumerate this, like ("we", 1), ("went", 2), ..., then how can the model tell that this "we" is different from the "i" in the trained model?

Thanks in advance!

You must also keep the method used to encode words to integers.
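For example, if a Keras Tokenizer was used to encode the training text (an assumption here), it can be pickled alongside the model and reloaded later; train_texts below is a placeholder:

```python
# a sketch, assuming a Keras Tokenizer was used to encode the training text
import pickle
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_texts)  # train_texts is a placeholder
# save the tokenizer next to the model
with open('tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)
# later: reload it and encode new text with the same word-to-integer mapping
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)
encoded = tokenizer.texts_to_sequences(["we went to Spain"])
```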

Hi Jason, I don't understand how the program predicts 0 or 1?

Which part exactly?

Great article. One thing is confusing though: for regression predictions, you used 3 instances in the first example and 1 instance in the next example, and the instance in the second example is actually the first instance in the first example. So why are their results different?

Sorry, I don’t follow. Perhaps you can provide more context?

Because the model doesn't train in exactly the same manner every time unless you save it; that's why.

How to evaluate the model? This code does not work for evaluation as in the previous example.

```python
scores = model.evaluate(X, y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
```

This tutorial is about making predictions.

I recommend evaluating model performance using a train/test split or using a resampling method such as k-fold cross-validation. I have many tutorials on each.

Perhaps start here:

https://machinelearningmastery.com/faq/single-faq/how-do-i-evaluate-a-machine-learning-algorithm

I don't mean the train/test split; what if I want to just evaluate my model on the same data I used for training?

I used the command

```python
scores = model.evaluate(X, y, verbose=2)
print(scores)
```

In the diabetes example, when I print scores, the output is: [0.5047814001639684, 0.7513020833333334]

scores[1] here means the accuracy of the model.

In the case of this regression example, when I print scores I get 0.021803879588842393. What does this value mean? How can I calculate accuracy from this?

If you evaluate a model for regression, the score will be an error score, e.g. the loss chosen to minimize during training. Such as mse.

Oh OK, thanks. I think I got it. Accuracy is used only in classification to check whether the model classifies into the correct class or not, but a regression prediction is a real value, so accuracy does not make any sense. Now if someone asks how accurate my model is, should I give them the answer as an error score?

We use cross-validation or train/test splits to answer the question: “how accurate is my model?”

We use the model to make predictions on new data, to answer the question: “what is the likely class for this observation?”

Does that help?

Thanks, it does help. I think I need to study this topic more deeply. By the way, your content on this site is great, covering almost all the problems about deep learning using Python.

Thanks!

Thanks for the interesting post! Is it really required to also scale the ground truth, and if so, why?

It can help during training to scale the output for regression. It depends on the model and the problem.

Hi Jason,

May I ask why you did not make a prediction from a saved h5 file?

Without it, we need to re-run the model every time.

Was it to make it simpler to explain?

Thanks! Franco

You can. I tried to keep this tutorial simple.

This tutorial will show you how to save/load your model:

https://machinelearningmastery.com/save-load-keras-deep-learning-models/

Are model.predict(...) and model.predict_proba(...) equivalent if the activation in the last layer is softmax or sigmoid?

Effectively, yes: on a Sequential model, predict_proba() simply calls predict(). It is predict_classes() that performs an argmax (or thresholds the sigmoid) on the result to give an integer class.

Is there any way to get classes with probabilities like:

class 1: 0.99, class 2: 0.8, etc., something like this. I have a project in which I have to show the confidence of every class for an input; how can I do that?

Yes, this is mentioned in the post above.

Sir, may I get any help on abstractive document summarization using Keras?

Sure, I have some tutorials here:

https://machinelearningmastery.com/start-here/#nlp

Is there any way that I can print classes with confidence? I am working on a project in which, for an input, I have to print the classes with confidence in the output.

Yes, for confidence intervals:

https://machinelearningmastery.com/confidence-intervals-for-machine-learning/

For prediction intervals:

https://machinelearningmastery.com/prediction-intervals-for-machine-learning/

Hi,

Thank you so much for explaining it so well. I have a question: can you please tell us what changes would be required if we want to do regression for multiple outputs? For example, if we have features X=[x1,x2,x3] and we want to produce output Y=[y1,y2], i.e. multiple outputs against each observation.

[x1,x2,x3]>>>[y1,y2].

Thanks

Sure, for a problem that requires predicting 2 values, you will require 2 nodes in the output layer of your network. That’s it.
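A minimal sketch of such a network, where the hidden layer size is an illustrative assumption:

```python
# sketch: 3 input features -> 2 output values (hidden size is an assumption)
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(8, input_dim=3, activation='relu'))
model.add(Dense(2, activation='linear'))  # one node per predicted value
model.compile(loss='mse', optimizer='adam')
# y must have shape (n_samples, 2) when calling fit()
```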

Hi Jason,

Is there any model for hyperspectral images, or any link which gives a training model for hyperspectral images?

Perhaps try a google search?

Hi, what is the meaning of X and y?

I assume X is an input and y some output. But what is the meaning of the upper/lower case writing?

X is a matrix and y is a vector; by convention, uppercase names are used for matrices and lowercase names for vectors.

Hi Jason

Is there a way to get variable/input importance with Keras?

Or, how do you find which variables are important while training a neural network?

Not that I’m aware.

Hi Jason,

why did you use scalar = MinMaxScaler()?

To normalize the inputs.
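MinMaxScaler rescales each input feature to the range [0, 1]; a minimal sketch:

```python
# rescale each feature to [0, 1]: (x - min) / (max - min)
from sklearn.preprocessing import MinMaxScaler
scalar = MinMaxScaler()       # default feature_range=(0, 1)
X = scalar.fit_transform(X)   # equivalent to fit(X) then transform(X)
```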

Hi,

I am running a CNN to classify images labeled as high or low. When I run predict_proba(img) after just one epoch and predict the results of a set of images all classified the same, I see a series of values for the images that are all very similar to:

```
[[ 0.49511209]]
[[ 0.49458334]]
[[ 0.49470016]]
```

After 50 epochs, the validation accuracy is about 95%, and the output of predict_proba(img) is similar to:

```
[[ 0.80663812]]
[[ 0.97021145]]
[[ 0.96050763]]
```

where none of the values are below 0.5

Could you please tell me essentially what I’m seeing? That is, how do I know what class the probability is referring to (“high” or “low”); why is the probability all below 0.5 for minimum training and closer to 1 at 50 epochs; and why does the probability rise almost uniformly with more training?

Thank you for your work and service in freely sharing your knowledge.

It suggests that for that image the model is unsure of what it is, and does not strongly assign it to either class.

Hi, thank you for this response. Those values are for three separate images, as examples after training for 1 epoch and 50 epochs, respectively. Knowing that, could you please respond to my original questions? Thank you!

I see, the “low” and “high” are mapped to integers 0 and 1, then the predictions will be between 0 and 1 for class 1.

The mapping of strings to ints will be controlled by you, e.g. labelencoder in sklearn, and you can inspect it to discover whether low is 0 or 1.

Does that help?

That does help, thank you! To be sure, when the probability is 0.97, for instance, the predicted class is 1, but when the probability is 0.494, the predicted class is 0?

Sincerely,

David

Yes.
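In other words, the class prediction is equivalent to thresholding the sigmoid output at 0.5; a one-line sketch:

```python
# threshold the sigmoid probabilities at 0.5 to get integer class labels
ynew_class = (model.predict_proba(Xnew) > 0.5).astype('int32')
```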

OK, thank you!

For text classification of binary sentiments, positive and negative.

Why do words not found in the model's vocabulary always give the same prediction value?

For example, when I input the single word 'extravagant', which is not found in the vocabulary of the model, it gives me a value of 0.165, which is classified as negative. The same probability results when I enter other new single terms that are not found in the vocabulary. What causes these words to be predicted with the probability of 0.165, i.e. the same prediction value?

Unknown words are mapped to a zero or "unknown" input.

Hi! Thank you so much for the response Jason.

I'd just like to ask another question. If unknown words are mapped to zero/unknown, why is it that the output is not zero as well?

Is it due to the model itself, i.e. that it generates the prediction 0.165 based on the values its layers always produce when receiving 0 as input?

I suspect because it is unusual to have a single unknown word as input to the model – e.g. it would not happen in practice.

Thank you so much again Jason! I really appreciate the help.