Regression Tutorial with the Keras Deep Learning Library in Python

Keras is a deep learning library that wraps the efficient numerical libraries Theano and TensorFlow.

In this post you will discover how to develop and evaluate neural network models using Keras for a regression problem.

After completing this step-by-step tutorial, you will know:

  • How to load a CSV dataset and make it available to Keras.
  • How to create a neural network model with Keras for a regression problem.
  • How to use scikit-learn with Keras to evaluate models using cross-validation.
  • How to perform data preparation in order to improve skill with Keras models.
  • How to tune the network topology of models with Keras.

Let’s get started.

  • Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
Regression Tutorial with Keras Deep Learning Library in Python
Photo by Salim Fadhley, some rights reserved.

1. Problem Description

The problem that we will look at in this tutorial is the Boston house price dataset.

Download this dataset and save it to your current working directory with the file name housing.csv.

The dataset describes 13 numerical properties of houses in Boston suburbs and is concerned with modeling the price of houses in those suburbs in thousands of dollars. As such, this is a regression predictive modeling problem. Input attributes include things like crime rate, proportion of nonretail business acres, chemical concentrations and more.

This is a well-studied problem in machine learning. It is convenient to work with because all of the input and output attributes are numerical and there are 506 instances to work with.

Reasonable performance for models evaluated using mean squared error (MSE) is around 20 in squared thousands of dollars (or about $4,500 if you take the square root). This is a nice target to aim for with our neural network model.


2. Develop a Baseline Neural Network Model

In this section we will create a baseline neural network model for the regression problem.

Let’s start off by including all of the functions and objects we will need for this tutorial.

We can now load our dataset from a file in the local directory.

The dataset in the UCI Machine Learning Repository is not actually in CSV format; the attributes are separated by whitespace instead. We can load it easily using the pandas library, then split the input (X) and output (Y) attributes so that they are easier to model with Keras and scikit-learn.
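As a rough sketch of this loading step, the snippet below parses two rows in the dataset's whitespace-separated layout (the first two rows of the Boston data) with numpy.loadtxt, whose default delimiter splits on any run of whitespace, and then slices columns 0-12 into X and column 13 into Y. Reading from an in-memory string instead of housing.csv only keeps the example self-contained; with pandas, pandas.read_csv("housing.csv", sep=r"\s+", header=None) should produce the equivalent table.

```python
import io

import numpy as np

# Two rows in the same whitespace-separated layout as housing.csv:
# 13 input attributes followed by the house price in thousands of dollars.
raw = io.StringIO(
    "0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30 396.90   4.98  24.00\n"
    "0.02731   0.00   7.070  0  0.4690  6.4210  78.90  4.9671   2  242.0  17.80 396.90   9.14  21.60\n"
)

# delimiter=None (the default) splits each line on runs of whitespace.
dataset = np.loadtxt(raw)

X = dataset[:, 0:13]  # the 13 input attributes
Y = dataset[:, 13]    # the output: price in thousands of dollars

print(X.shape, Y.shape)  # (2, 13) (2,)
```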

We can create Keras models and evaluate them with scikit-learn by using handy wrapper objects provided by the Keras library. This is desirable, because scikit-learn excels at evaluating models and will allow us to use powerful data preparation and model evaluation schemes with very few lines of code.

The Keras wrappers require a function as an argument. This function, which we must define, is responsible for creating the neural network model to be evaluated.

Below we define the function to create the baseline model to be evaluated. It is a simple model that has a single fully connected hidden layer with the same number of neurons as input attributes (13). The network uses good practices such as the rectifier activation function for the hidden layer. No activation function is used for the output layer because this is a regression problem and we are interested in predicting numerical values directly, without a transform.
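To make the baseline architecture concrete, here is a framework-free NumPy sketch of the computation it performs on a batch of rows: a 13-neuron ReLU hidden layer followed by a single output neuron with no activation. The random weights are placeholders standing in for learned ones, and the 5 input rows are made up; in Keras 2 the same shape is usually written as Dense(13, input_dim=13, activation='relu') followed by Dense(1).

```python
import numpy as np

rng = np.random.default_rng(7)
n_inputs, n_hidden = 13, 13

# Placeholder weights; a trained network would have learned these values.
W1 = rng.normal(scale=0.05, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.05, size=(n_hidden, 1))
b2 = np.zeros(1)


def relu(z):
    """Rectifier activation used in the hidden layer."""
    return np.maximum(z, 0.0)


def forward(X):
    """Hidden layer with ReLU, then a linear (identity) output for regression."""
    hidden = relu(X @ W1 + b1)
    return hidden @ W2 + b2  # no activation, so any real value can be predicted


X_batch = rng.normal(size=(5, n_inputs))  # 5 made-up rows with 13 features each
preds = forward(X_batch)
print(preds.shape)  # (5, 1)
```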

The efficient Adam optimization algorithm is used to minimize a mean squared error loss function. This is the same metric we will use to evaluate the performance of the model. It is a desirable metric because taking its square root gives an error value we can directly interpret in the context of the problem (thousands of dollars).
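A small worked example of the metric: MSE averages the squared errors, so its unit is squared thousands of dollars, while its square root (RMSE) is back in thousands of dollars. The prices and predictions below are made-up values; the last line converts the MSE of about 20 quoted in the problem description into dollars.

```python
import math

y_true = [24.0, 22.0, 35.0]  # illustrative prices, thousands of dollars
y_pred = [22.0, 24.0, 31.0]  # illustrative predictions

# Errors are 2, -2 and 4, so the squared errors are 4, 4 and 16.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)  # back in thousands of dollars
print(mse)  # 8.0

# An MSE of 20 corresponds to an RMSE of about 4.47, i.e. roughly $4,500.
print(round(math.sqrt(20) * 1000))  # 4472
```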

The Keras wrapper object for use in scikit-learn as a regression estimator is called KerasRegressor. We create an instance and pass it both the function that creates the neural network model (via the build_fn argument) and some parameters to pass along to the model's fit() function later, such as the number of epochs and batch size. Both of these are set to sensible defaults.
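The reason a model-building function, rather than an already-built model, is handed to the wrapper is that the wrapper can create a fresh, untrained network each time it is fitted, for example once per cross-validation fold. The toy classes below are a hypothetical illustration of that pattern only, not the KerasRegressor source: RegressorWrapper, build_model and the LeastSquaresModel stand-in (a linear fit in place of a network) are all made up, and epochs/batch_size are simply recorded to show how fit parameters get forwarded.

```python
import numpy as np


class LeastSquaresModel:
    """Hypothetical stand-in for a compiled Keras model."""

    def fit(self, X, y, epochs=1, batch_size=32):
        # A real network would train for `epochs` with mini-batches of `batch_size`;
        # this stand-in solves a linear least-squares problem once instead.
        Xb = np.c_[X, np.ones(len(X))]
        self.coef_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        self.fit_args = {"epochs": epochs, "batch_size": batch_size}
        return self

    def predict(self, X):
        return np.c_[X, np.ones(len(X))] @ self.coef_


class RegressorWrapper:
    """Hypothetical sketch of the build-function pattern used by sklearn-style wrappers."""

    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn
        self.fit_params = fit_params  # e.g. epochs=100, batch_size=5

    def fit(self, X, y):
        self.model = self.build_fn()  # a fresh, untrained model on every fit
        self.model.fit(X, y, **self.fit_params)
        return self

    def predict(self, X):
        return self.model.predict(X)


builds = []


def build_model():
    builds.append(1)  # count how many models get created
    return LeastSquaresModel()


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = X @ np.array([1.0, 2.0]) + 3.0

est = RegressorWrapper(build_fn=build_model, epochs=100, batch_size=5)
est.fit(X, y)
est.fit(X, y)  # e.g. a second cross-validation fold
print(len(builds))  # 2
```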

We also initialize the random number generator with a constant random seed, a process we will repeat for each model evaluated in this tutorial. This is an attempt to ensure we compare models consistently.
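A quick illustration of what the constant seed buys: re-seeding NumPy's global generator with the same value reproduces the same sequence of random numbers, such as initial weights. As the comments below point out, this alone may not make Keras results fully repeatable, because the backend (Theano or TensorFlow) can use its own random number generators.

```python
import numpy as np

seed = 7

np.random.seed(seed)
first = np.random.normal(size=3)   # e.g. initial weights for one run

np.random.seed(seed)
second = np.random.normal(size=3)  # a second run with the same seed

print(np.array_equal(first, second))  # True
```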

The final step is to evaluate this baseline model. We will use 10-fold cross validation to evaluate the model.
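To show what 10-fold cross validation does with the 506 rows, here is a NumPy sketch of the index bookkeeping: the rows are split into 10 contiguous folds (unshuffled, which is also scikit-learn's KFold default), each fold serves once as the test set, and the per-fold scores are summarized by their mean and standard deviation. The evaluate function is a hypothetical placeholder for training and scoring the network on one split; it returns a dummy value so the sketch runs.

```python
import numpy as np

n_samples, n_splits = 506, 10
indices = np.arange(n_samples)
folds = np.array_split(indices, n_splits)  # 6 folds of 51 rows, then 4 folds of 50


def evaluate(train_idx, test_idx):
    """Hypothetical placeholder: train on train_idx and return the MSE on test_idx."""
    return float(len(test_idx))  # dummy score, not a real MSE


scores = []
for k, test_idx in enumerate(folds):
    # Every other fold becomes training data for this split.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    assert len(train_idx) + len(test_idx) == n_samples
    scores.append(evaluate(train_idx, test_idx))

scores = np.array(scores)
print("%.2f (%.2f) MSE" % (scores.mean(), scores.std()))
```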

Running this code gives us an estimate of the model’s performance on the problem for unseen data. The result summarizes the mean squared error as the average and standard deviation across all 10 folds of the cross validation evaluation.

3. Modeling The Standardized Dataset

An important concern with the Boston house price dataset is that the input attributes all vary in their scales because they measure different quantities.

It is almost always good practice to prepare your data before modeling it using a neural network model.

Continuing on from the above baseline model, we can re-evaluate the same model using a standardized version of the input dataset.

We can use scikit-learn’s Pipeline framework to perform the standardization during the model evaluation process, within each fold of the cross validation. This ensures that no information from the test fold of each cross validation split leaks into the training data.
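The leakage point can be sketched directly: within a fold, the mean and standard deviation are estimated from the training rows only and then reused, unchanged, on the held-out rows. This is roughly what StandardScaler does inside a Pipeline on each fold; the random data and the 80/20 split here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 13))  # placeholder inputs on a large scale
train, test = X[:80], X[80:]

# Estimate the statistics on the training part of the fold only...
mean = train.mean(axis=0)
std = train.std(axis=0)

# ...then apply the same transform to both parts.
train_std = (train - mean) / std
test_std = (test - mean) / std  # the test rows never influence mean or std

print(np.allclose(train_std.mean(axis=0), 0.0), np.allclose(train_std.std(axis=0), 1.0))
```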

The code below creates a scikit-learn Pipeline that first standardizes the dataset, then creates and evaluates the baseline neural network model.

Running the example shows improved performance over the baseline model without standardized data, reducing the error.

A further extension of this section would be to similarly rescale the output variable, such as normalizing it to the range 0-1 and using a sigmoid or similar activation function on the output layer to constrain predictions to the same range.
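A sketch of that extension, assuming the target range is taken from the training data: prices are min-max scaled to [0, 1], a sigmoid output keeps predictions inside that range, and predictions are mapped back to thousands of dollars with the inverse transform before being interpreted. The prices and the raw network outputs below are illustrative values.

```python
import numpy as np

y_train = np.array([24.0, 21.6, 34.7, 33.4, 36.2])  # illustrative prices, thousands of dollars
y_min, y_max = y_train.min(), y_train.max()


def scale(y):
    """Map the training range of the target onto [0, 1]."""
    return (y - y_min) / (y_max - y_min)


def unscale(y_scaled):
    """Invert the scaling, back to thousands of dollars."""
    return y_scaled * (y_max - y_min) + y_min


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


y_scaled = scale(y_train)                # what the network would be trained on
raw_outputs = np.array([-3.0, 0.0, 3.0])  # hypothetical pre-activation outputs
preds = unscale(sigmoid(raw_outputs))     # always within [y_min, y_max]
print(y_scaled.min(), y_scaled.max(), preds)
```

Note that with this setup the model can never predict a price outside the range seen in the training data.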

4. Tune The Neural Network Topology

There are many aspects of a neural network model that can be tuned.

Perhaps the point of biggest leverage is the structure of the network itself, including the number of layers and the number of neurons in each layer.

In this section we will evaluate two additional network topologies in an effort to further improve the performance of the model. We will look at both a deeper and a wider network topology.

4.1. Evaluate a Deeper Network Topology

One way to improve the performance of a neural network is to add more layers. This might allow the model to extract and recombine higher order features embedded in the data.

In this section we will evaluate the effect of adding one more hidden layer to the model. This is as easy as defining a new function that creates this deeper model, copied from our baseline model above, and inserting a new hidden layer after the first one, in this case with about half the number of neurons.

Our network topology now has 13 inputs, two hidden layers (13 neurons followed by about half that number) and a single output neuron.

We can evaluate this network topology in the same way as above, whilst also using the standardization of the dataset that was shown above to improve performance.

Running this model shows a further improvement in performance, with the MSE dropping from about 28 to about 24 (in squared thousands of dollars).

4.2. Evaluate a Wider Network Topology

Another approach to increasing the representational capability of the model is to create a wider network.

In this section we evaluate the effect of keeping a shallow network architecture and increasing the number of neurons in the one hidden layer by roughly half.

Again, all we need to do is define a new function that creates our neural network model. Here, we have increased the number of neurons in the hidden layer compared to the baseline model from 13 to 20.

Our network topology now has 13 inputs, a single hidden layer of 20 neurons and one output neuron.
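One way to compare the three topologies is by the number of trainable weights and biases each has. The sketch below counts them for a fully connected network from its list of layer sizes; the deeper network is assumed to have 6 neurons in its second hidden layer (roughly half of 13), since only "about half" is stated for it.

```python
def count_params(layer_sizes):
    """Weights plus biases of a fully connected network, e.g. [13, 13, 1]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


topologies = {
    "baseline": [13, 13, 1],
    "deeper": [13, 13, 6, 1],  # assumes 6 neurons in the added hidden layer
    "wider": [13, 20, 1],
}
for name, sizes in topologies.items():
    print(name, count_params(sizes))  # baseline 196, deeper 273, wider 301
```

By this measure the wider network is also the largest of the three, although parameter count alone does not tell you which topology will generalize best.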

We can evaluate the wider network topology using the same scheme as above.

Running the model shows a further drop in error, to an MSE of about 21 (in squared thousands of dollars). This is not a bad result for this problem.

It would have been hard to guess that a wider network would outperform a deeper network on this problem. The results demonstrate the importance of empirical testing when it comes to developing neural network models.


Summary

In this post you discovered the Keras deep learning library for modeling regression problems.

Through this tutorial you learned how to develop and evaluate neural network models, including:

  • How to load data and develop a baseline model.
  • How to lift performance using data preparation techniques like standardization.
  • How to design and evaluate networks with different topologies on a problem.

Do you have any questions about the Keras deep learning library or about this post? Ask your questions in the comments and I will do my best to answer.


109 Responses to Regression Tutorial with the Keras Deep Learning Library in Python

  1. Gautam Karmakar June 25, 2016 at 4:19 pm #

    Hi did you handle string variables in cross_val_score module?

  2. Paul June 30, 2016 at 2:28 am #

    Hi Jason,

    Great tutorial(s) they have been very helpful as a crash course for me so far.

    Is there a way to have the model output the estimated Ys in this example? I would like to evaluate the model a little more directly while I’m still learning Keras.


    • Jason Brownlee June 30, 2016 at 6:48 am #

      Hi Paul, you can make predictions by calling model.predict()

    • Rahul November 22, 2016 at 7:23 pm #

      Hey Paul,
      How are you inserting the function model.predict() in the above code to run in on test data? Please let me know.

  3. Chris July 23, 2016 at 6:24 am #

    Hi, Great post thank you, Could you please give a sample on how to use Keras LSTM layer for considering time impact on this dataset ?

  4. Marc Huertas-Company July 28, 2016 at 4:50 am #


    Thanks for the tutorial. I have a regression problem with bounded outputs (0-1). Is there an optimal way to deal with this?

    • Jason Brownlee July 28, 2016 at 5:49 am #

      Hi Marc, I think a linear activation function on the output layer will be just fine.

  5. James August 5, 2016 at 6:50 am #

    This is a good example. However, it is not relevant to Neural networks when over-fitting is considered. The validation process should be included inside the fit() function to monitor over-fitting status. Moreover, early stopping can be used based on the internal validation step. This example is only applicable for large data compared to the number of all weights of input and hidden nodes.

    • Jason Brownlee August 5, 2016 at 8:04 am #

      Great feedback, thanks James I agree.

      It is intended as a good example to show how to develop a net for regression, but the dataset is indeed a bit small.

      • Amir October 24, 2016 at 11:08 am #

        Thanks Jason and James! A few questions (and also how to implement in python):
        1) How can we monitor the over-fitting status in deep learning
        2) how can we include the cross-validation process inside the fit() function to monitor the over-fitting status
        3) How can we use early stopping based on the internal validation step
        4) Why is this example only applicable for a large data set? What should we do if the data set is small?

        • Jason Brownlee October 25, 2016 at 8:21 am #

          Great questions Amir!

          1. Monitor the performance of the model on the training and a standalone validation dataset. (even plot these learning curves). When skill on the validation set goes down and skill on training goes up or keeps going up, you are overlearning.
          2. Cross validation is just a method for estimating the performance of a model on unseen data. It wraps everything you are doing to prepare data and your model, it does not go inside fit.
          3. Monitor skill on a validation dataset as in 1, when skill stops improving on the validation set, stop training.
          4. Generally, neural nets need a lot more data to train than other methods.

          Here’s a tutorial on checkpointing that you can use to save “early stopped” models:
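Points 1 and 3 of the reply above can be sketched in plain Python: track the validation loss each epoch and stop once it has not improved for a set number of epochs (the "patience"), keeping the epoch with the best validation loss. The loss values below are made up to mimic an overfitting curve. In Keras itself, the keras.callbacks.EarlyStopping callback (monitor='val_loss' with a patience value), combined with validation data passed to fit(), implements this idea.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch


# Validation loss falls, then rises as the model starts to overfit.
val_losses = [40.0, 32.0, 27.0, 25.0, 24.5, 25.1, 26.0, 27.3, 29.0, 31.0]
print(early_stopping_epoch(val_losses))  # (7, 4)
```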

  6. Salem August 5, 2016 at 2:44 pm #

    How can one predict a new data point with a model when the training data was standardised using sklearn while building the model?

    • Jason Brownlee August 6, 2016 at 2:09 pm #

      You can save the object you used to standardize the data and later reuse it to standardize new data before making a prediction. This might be the MinMaxScaler for example.

  7. Guy August 25, 2016 at 10:52 am #


    I am not using the automatic data normalization as you show, but simply compute the mean and stdev for each feature (data column) in my training data and manually perform zscore ((data – mean) / stdev). By normalization I mean bringing the data to 0-mean, 1-stdev. I know there are several names for this process but let’s call it “normalization” for the sake of this argument.

    So I’ve got 2 questions:

    1) Should I also normalize the output column? Or just leave it as it is in my train/test?

    2) I take the mean, stdev for my training data and use them to normalize the test data. But it seems that doesn’t center my data; no matter how I split the data, and no matter that each mini-batch is balanced (has the same distribution of output values). What am I missing / what can I do?

    • Jason Brownlee August 26, 2016 at 10:30 am #

      Hi Guy, yeah this is normally called standardization.

      Generally, you can get good results from applying the same transform to the output column. Try and see how it affects your results. If MSE or RMSE is the performance measure, you may need to be careful with the interpretation of the results as the scale of these scores will also change.

      Yep, this is a common problem. Ideally, you want a very large training dataset to effectively estimate these values. You could try using bootstrap on the training dataset (or within a fold of cross validation) to create a more robust estimate of these terms. Bootstrap is just the repeated subsampling of your dataset and estimation of the statistical quantities, then take the mean from all the estimates. It works quite well.

      I hope that helps.
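The bootstrap idea in the reply above can be sketched with NumPy: resample the training rows with replacement many times, compute the column means and standard deviations on each resample, and average the estimates before using them to standardize. The placeholder data, the 200 resamples and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Placeholder training data: two columns on very different scales.
train = rng.normal(loc=[10.0, 500.0], scale=[2.0, 50.0], size=(60, 2))

n_boot = 200
means, stds = [], []
for _ in range(n_boot):
    rows = rng.integers(0, len(train), size=len(train))  # draw rows with replacement
    sample = train[rows]
    means.append(sample.mean(axis=0))
    stds.append(sample.std(axis=0))

# Averaged estimates, to be used for standardizing the training and test data.
boot_mean = np.mean(means, axis=0)
boot_std = np.mean(stds, axis=0)
print(boot_mean, boot_std)
```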

  8. Pranith Kumar Pola September 2, 2016 at 3:52 am #

    Hello Jason,

    How should I load multiple fingerprint images into Keras?

    Can you please advise further.

    Best Regards,

  9. Luciano September 10, 2016 at 3:32 am #

    Hi Jason, great tutorial. The best out there for free.

    Can I use R² as my metric? If so, how?


  10. sumon October 1, 2016 at 2:38 am #

    shouldn’t results.mean() print accuracy instead of error?

    • Jason Brownlee October 1, 2016 at 8:03 am #

      We summarize error for regression problems instead of accuracy (x/y correct). I hope that helps.

  11. David October 19, 2016 at 7:34 pm #


    if I have a new dataset, X_new, and I want to make a prediction, model.predict(X_new) shows the error “NameError: name 'model' is not defined” and estimator.predict(X_test) shows the error message “'KerasRegressor' object has no attribute 'model'”.

    Do you have any suggestion? Thanks.

  12. Avhirup October 22, 2016 at 11:19 pm #

    I’m getting more error by standardizing the dataset using the same seed. What could be the reason behind it?

    • Avhirup October 22, 2016 at 11:25 pm #

      Also, the deeper network topology does not seem to help. It increases the MSE.

      • Avhirup October 22, 2016 at 11:32 pm #

        The deeper network without standardisation gives better results. Somehow standardisation is adding more noise.

  13. Michele Vascellari November 2, 2016 at 9:28 pm #

    Hey great tutorial. I tried to use both Theano and Tensorflow backend, but I obtained very different results for the larger_model. With Theano I obtained results very similar to you, but with Tensorflow I have MSE larger than 100.

    Do you have any clue?


    • Jason Brownlee November 3, 2016 at 7:59 am #

      Great question Michele,

      Off the cuff, I would think it is probably the reproducibility problems we are seeing with the Python deep learning stack. It seems nearly impossible to tie down the random number generators used to get repeatable results.

      I would not rule out a bug in one implementation or another, but I would find this very surprising for such a simple network.

  14. Kenny November 7, 2016 at 4:21 pm #

    hi, i have a question about sklearn interface.
    although we sent the NN model to sklearn and evaluated the regression performance, how can we get the exact predictions for the input data X, like we usually do in Keras by calling the model.predict(X) function? By the way, I mean the model is in sklearn, right?

    • Jason Brownlee November 8, 2016 at 9:49 am #

      Hi Kenny,

      You can use the sklearn model.predict() function in the same way to make predictions on new input data.

      • Silvan Mühlemann November 23, 2016 at 6:48 am #

        Hi Jason

        I bought the book “Deep Learning with Python”. Thanks for your great work!

        I see the question about “model.predict()” quite often. I have it as well. In the code above “model” is undefined. So what variable contains the trained model? I tried “estimator.predict()” but there I get the following error:

        > ‘KerasRegressor’ object has no attribute ‘model’

        I think it would help many readers

        • Jason Brownlee November 23, 2016 at 9:06 am #

          Thanks for your support Silvan.

          With a keras model, you can train the model, assign it to a variable and call model.predict(). See this post:

          In the above example, we use a pipeline, which is also a sklearn Estimator. We can call estimator.predict() directly (same function name, different API), more here:

          Does that help?

          • Dee November 24, 2016 at 9:18 am #

            Hey Jason,

            Is there anyway for you to provide a direct example of using the model.predict() for the example shown in this post? I’ve been following your posts for a couple months now and have gotten much more comfortable with Keras. However, I still cannot seem to be able to use .predict() on this example.


          • Jason Brownlee November 24, 2016 at 10:44 am #

            Hi Dee,

            There is info on the predict function here:

            There’s an example of calling predict in this post:

            Does that help?

          • Silvan Mühlemann November 25, 2016 at 7:20 am #

            Hi Dee

            Jason, correct me if I am wrong: If I understand correctly the sample above does *not* provide a trained model as output. So you won’t be able to use the .predict() function immediately.

            Instead you have to train the pipeline first by calling pipeline.fit(X, Y).


            Then only you can do predictions:

            pipeline.predict(numpy.array([[ 0.0273, 0. , 7.07 , 0. , 0.469 , 6.421 ,
            78.9 , 4.9671, 2. , 242. , 17.8 , 396.9 ,
            9.14 ]]))

            # will return array(22.125564575195312, dtype=float32)

          • Jason Brownlee November 25, 2016 at 9:34 am #

            Yes, thanks for the correction.

            Sorry for the confusion.

          • Dee November 28, 2016 at 12:07 pm #

            Hey Silvan,

            Thanks for the tip! I had a feeling that the crossval from SciKit did not output the fitted model but just the RMSE or MSE of the crossval cost function.

            I’ll give it a go with the .fit()!
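The fit-then-predict workflow described in this thread can be sketched without Keras: cross-validation only returns scores, so a final model has to be fitted on all of the data before predict() is available. MiniPipeline below is a hypothetical stand-in for the scikit-learn Pipeline (a standardizer followed by a linear least-squares model in place of the network), and the data are synthetic; only the order of the calls carries over.

```python
import numpy as np


class MiniPipeline:
    """Hypothetical stand-in: standardize the inputs, then fit a linear model."""

    def fit(self, X, y):
        self.mean_, self.std_ = X.mean(axis=0), X.std(axis=0)
        Z = np.c_[(X - self.mean_) / self.std_, np.ones(len(X))]
        self.coef_, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return self

    def predict(self, X):
        # New rows are standardized with the statistics learned in fit().
        Z = np.c_[(X - self.mean_) / self.std_, np.ones(len(X))]
        return Z @ self.coef_


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 10.0  # synthetic target

pipeline = MiniPipeline()
try:
    pipeline.predict(X[:1])  # not fitted yet, so there is nothing to predict with
except AttributeError:
    print("fit the pipeline first")

pipeline.fit(X, y)
print(pipeline.predict(X[:1]), y[:1])  # predictions are in the target's own units
```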


        • Sud March 17, 2017 at 1:03 am #

          Hi Jason & Silvan,

          Could you please tell me whether I have put “pipeline.fit(X,Y)” in the correct position?
          Please correct me if I am wrong.

          estimators = []
          estimators.append(('standardize', StandardScaler()))
          estimators.append(('mlp', KerasRegressor(build_fn=larger_model, nb_epoch=50, batch_size=5, verbose=0)))
          pipeline = Pipeline(estimators)
          kfold = KFold(n_splits=10, random_state=seed)
          results = cross_val_score(pipeline, X, Y, cv=kfold)
          print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

          Thank you!

          • Jason Brownlee March 17, 2017 at 8:29 am #

            pipeline.fit(X,Y) is not needed as you are evaluating the pipeline using k-fold cross validation.

  15. Rahul November 18, 2016 at 3:28 pm #

    Dear Jason,
    I have a few questions. I am running the wider neural network on a dataset that corresponds to modelling with better accuracy the number of people walking in and out of a store. I get Wider: 24.73 (7.64) MSE. <– Can you explain exactly what those values mean?

    Also can you suggest any other method of improving the neural network? Do I have to keep re-iterating and tuning according to different topological methods?

    Also what exact function do you use to predict the new data with no ground truth? Is it the sklearn model.predict(X) where X is the new dataset with one lesser dimension because there is no output? Could you please elaborate and explain in detail. I would be really grateful to you.

    Thank you

    • Jason Brownlee November 19, 2016 at 8:45 am #

      Hi Rahul,

      The model reports on Mean Squared Error (MSE). It reports both the mean and the standard deviation of performance across 10 cross validation folds. This gives an idea of the expected spread in the performance results on new data.

      I would suggest trying different network configurations until you find a setup that performs well on your problem. There are no good rules for net configuration.

      You can use model.predict() to make new predictions. You are correct.

  16. Kim December 31, 2016 at 5:45 pm #

    Hi Jason,

    Thank you for the great tutorial.
    I reran the code on an Ubuntu machine on a TITAN X GPU. While I get similar results for the experiment in section 4.1, my results in section 4.2 are different from yours:

    Larger: 103.31 (236.28) MSE

    no_epoch is 50 and batch_size is 5.

  17. A. Batuhan D. January 20, 2017 at 8:43 pm #

    Hi Jason,

    Thanks for sharing these useful tutorials. Two questions:

    1) If regression model calculates the error and returns as result (no doubt for this) then what is those ‘accuracy’ values printed for each epoch when ‘verbose=1’?

    2) With those predicted values (fit.predict() or cross_val_predict), is it meaningful to find the closest value(s) to predicted result and calculate an accuracy? (This way, more than one accuracy can be calculated: accuracy for closest 2, closest 3, …)

    • Jason Brownlee January 21, 2017 at 10:28 am #

      Hi A. Batuhan D.,

      1. You cannot print accuracy for a regression problem, it does not make sense. It would be loss or error.
      2. Again, accuracy does not make sense for regression. It sounds like you are describing an instance based regression model like kNN?

      • A. Batuhan D. January 23, 2017 at 7:36 pm #

        Hi jason,

        1. I know, it doesn’t make any sense to calculate accuracy for a regression problem but when using Keras library and set verbose=1, function prints accuracy values also alongside with loss values. I’d like to ask the reason of this situation. It is confusing. In your example, verbose parameter is set to 0.

        2. What i do is to calculate some vectors. As input, i’m using vectors (say embedded word vectors of a phrase) and trying to calculate a vector (next word prediction) as an output (may not belong to any known vector in dictionary and probably not). Afterwards, i’m searching the closest vector in dictionary to one calculated by network by cosine distance approach. Counting model predicted vectors who are most similar to the true words vector (say next words vector) than others in dictionary may lead to a reasonable accuracy in my opinion. That’s a brief summary of what i do. I think that it is not related to instance based regression models.


        • Jason Brownlee January 24, 2017 at 11:03 am #

          That is very odd that accuracy is printed for a regression problem. I have not seen it, perhaps it’s a new bug in Keras?

          Are you able to paste a short code + output example?

  18. Partha January 24, 2017 at 7:08 am #

    I tried this tutorial – but it crashes with the following:
    Traceback (most recent call last):
    File “”, line 132, in
    results = cross_val_score(estimator, X, Y, cv=kfold)
    File “C:\Python27\lib\site-packages\sklearn\model_selection\”, line 140, in cross_val_score
    for train, test in cv_iter)
    File “C:\Python27\lib\site-packages\sklearn\externals\joblib\”, line 758, in __call__
    while self.dispatch_one_batch(iterator):
    File “C:\Python27\lib\site-packages\sklearn\externals\joblib\”, line 603, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    File “C:\Python27\lib\site-packages\sklearn\externals\joblib\”, line 127, in __init__
    self.items = list(iterator_slice)
    File “C:\Python27\lib\site-packages\sklearn\model_selection\”, line 140, in
    for train, test in cv_iter)
    File “C:\Python27\lib\site-packages\sklearn\”, line 67, in clone
    new_object_params = estimator.get_params(deep=False)
    TypeError: get_params() got an unexpected keyword argument ‘deep’

    Some one else also got this same error and posted a question on StackOverflow.

    Any help is appreciated.

    • Jason Brownlee January 24, 2017 at 11:07 am #

      Sorry to hear that.

      What versions of sklearn, Keras and tensorflow or theano are you using?

      • David January 25, 2017 at 12:23 am #

        I have the same problem after an update to Keras 1.2.1. In my case: theano is 0.8.2 and sklearn is 0.18.1.

        I could be wrong, but this could be a problem with the latest version of Keras…

        • David January 25, 2017 at 3:01 am #

          Ok, I think I have managed to solve the issues. I think the problem is clashes between different versions of the packages. What solves everything is to create a new environment. I have posted a solution on Stack Overflow, @Partha, here:

          • Partha January 25, 2017 at 4:31 am #

            My versions are 0.8.2 for theano and 0.18.1 for sklearn and 1.2.1 for keras.
            I did a new anaconda installation on another machine and it worked there.

          • Jason Brownlee January 25, 2017 at 10:08 am #

            Thanks David, I’ll take a look at the post.

          • Jason Brownlee January 25, 2017 at 10:58 am #

            Hi David, I have reproduced the fault and understand the cause.

            The error is caused by a bug in Keras 1.2.1 and I have two candidate fixes for the issue.

            I have written up the problem and fixes here:

        • Jason Brownlee January 25, 2017 at 10:06 am #

          Thanks, I will investigate and attempt to reproduce.

          • David January 25, 2017 at 8:51 pm #

            yes, Jason’s solution is the correct one. My solution works because in the environment the Keras version installed is 1.1.1, not the one with the bug (1.2.1).

  19. Andy January 25, 2017 at 5:05 am #

    Great tutorial, many thanks!

    Just wondering: how do you train on a standardised dataset (as per section 3) but produce actual (i.e. NOT standardised) predictions with a scikit-learn Pipeline?

    • Jason Brownlee January 25, 2017 at 10:10 am #

      Great question Andy,

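      The standardization occurs within the pipeline and is applied only to the input variables (X); the output variable (Y) is never transformed, so the pipeline's predictions are already in the original units. This is one of the benefits of using the sklearn Pipeline.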

  20. AndyS January 25, 2017 at 7:32 am #

    Great tutorial, many thanks!

    How do I recover actual predictions (NOT standardized ones) having fit the pipeline in section 3 with pipeline.fit(X,Y)? I believe pipeline.predict(testX) yields a standardised predictedY?

    I see there is an inverse_transform method for Pipeline, however appears to be for only reverting a transformed X.

  21. James Bond January 26, 2017 at 1:39 am #

    Thanks for your post.

    I am currently having some problems with a regression problem like the one you present here.

    You seem to normalize both input and output, but what do you do if the output should be used by a different component? Unnormalize it? And if so, wouldn't the error scale up as well?

    I am currently working on mapping framed audio to MFCC features.
    I tried a lot of different network structures.. cnn, multiple layers..

    I just recently tried adding a linear layer at the end… and wauw.. what an effect.. it keeps declining.. how come?.. do you have any idea?

    • Jason Brownlee January 26, 2017 at 4:47 am #

      Hi James, yes the output must be denormalized (invert any data prep process) before use.

      If the data prep processes are separate, you can keep track of the Python object (or coefficients) and invert the process ad hoc on predictions.

  22. Sarick January 27, 2017 at 6:59 pm #

    Is there any way to use pipeline but still be able to graph MSE over epochs for kerasregressor?

    • Jason Brownlee January 28, 2017 at 7:35 am #

      Not that I have seen Sarick. If you figure a way, let me know.

  23. Aritra January 28, 2017 at 9:33 pm #

    Can you tell me how to do regression with convolutional neural network?

    • Jason Brownlee February 1, 2017 at 10:09 am #

      Great question Aritra.

      You can use the standard CNN structure and modify the example to use a linear output function and a suitable regression loss function.

  24. kono January 29, 2017 at 4:37 pm #

    Hi Jason,

    Could you tell me how to decide batch_size? Is there a rule of thumb for this?

  25. kono January 29, 2017 at 4:53 pm #

    Hi Jason,

    I see some people use fit_generator to train a MLP. Could you tell me when to use fit_generator() and when to use fit()?

  26. Pratik Patil February 2, 2017 at 12:39 am #

    Hi Jason,
    Thank you for the post. I used two of your post this and one on GridSearchCV to get a keras regression workflow with Pipeline.
    My question is how to get weight matrices and bias vectors of keras regressor in a fit, that is on the pipeline.
    (My posts keep getting rejected/disappear, am I breaking some protocol/rule of the site?)

    • Jason Brownlee February 2, 2017 at 1:59 pm #

      Comments are moderated, that is why you do not see them immediately.

      To access the weights, I would recommend training a standalone Keras model rather than using the KerasRegressor and sklearn Pipeline.

  27. Pedro February 18, 2017 at 7:57 am #


    Thank you for the excellent example! As a beginner, it was the best one to start with.
    But I have some questions:

    In the wider topology, what does it mean to have more neurons?

    e.g., in my input layer I “receive” 150 dimensions/features (input_dim) and output 250 dimensions (output_dim). What is in those 100 “extra” neurons (that are propagated to the next hidden layers) ?


    • Jason Brownlee February 18, 2017 at 8:47 am #

      Hi Pedro,

      A neuron is a single learning unit. A layer is comprised of neurons.

      The size of the input layer must match the number of input variables. The size of the output layer must match the number of output variables or output classes in the case of classification.

      The number of hidden layers can vary and the number of neurons per hidden layer can vary. This is the art of configuring a neural net for a given problem.

      Does that help?

      • Pedro Fialho February 20, 2017 at 6:27 am #


        In your wider example, the input layer does not match/output the number of input variables/features:

        model.add(Dense(20, input_dim=13, init='normal', activation='relu'))

        So my question is: apart from the 13 input features, what’s in the 7 extra neurons output by this (input) layer?

        • Jason Brownlee February 20, 2017 at 9:33 am #

          Hi Pedro, I’m not sure I understand, sorry.

          The example takes as input 13 features. The input layer (input_dim) expects 13 input values. The first hidden layer combines these weighted inputs 20 times or 20 different ways (20 neurons in the layer) and each neuron outputs one value. These are combined into one neuron (poor guy!) which outputs a prediction.

          • Pedro Fialho February 21, 2017 at 9:14 pm #


            Yes, now I understand (I was not sure whether the input layer was also a hidden layer). Thank you again.

          • Jason Brownlee February 22, 2017 at 10:00 am #

            The input layer is separate from the first hidden layer. The Keras API makes this confusing because both are specified on the same line.
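To make the description above concrete, here is a NumPy sketch of the forward pass of the wider network (13 inputs, a hidden Dense layer of 20 ReLU neurons, one linear output), using random placeholder weights rather than trained ones. Nothing in the 20 hidden neurons is "extra": each one computes its own weighted sum of the same 13 inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 13))  # 4 example rows, 13 input features

# Random placeholder weights for Dense(20, activation='relu') and Dense(1).
W1, b1 = rng.normal(size=(13, 20)), np.zeros(20)
W2, b2 = rng.normal(size=(20, 1)), np.zeros(1)

hidden = np.maximum(0.0, X @ W1 + b1)  # 20 neurons, each a different weighted sum of the 13 inputs
output = hidden @ W2 + b2              # one linear neuron combines the 20 hidden values

print(X.shape, hidden.shape, output.shape)  # (4, 13) (4, 20) (4, 1)
```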

  28. Bartosz February 19, 2017 at 11:42 am #

    Hi Jason,

    You’ve said that an activation function is not necessary because we want a numerical value as the output of our network. I’ve been looking at recurrent networks, and in particular at a guide that recommended using an identity activation function at the output. I was wondering whether there is any difference between your approach, using Dense(1) as the output layer, and adding an identity activation function at the output of the network: Activation('linear')? Are there any situations when I should use the identity activation layer? Could you elaborate on this?

    In the case of this tutorial, the network would look like this with the identity function:
    from keras.models import Sequential
    from keras.layers import Dense, Activation
    model = Sequential()
    model.add(Dense(13, input_dim=13, init='normal', activation='relu'))
    model.add(Dense(6, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))
    model.add(Activation('linear'))


    • Jason Brownlee February 20, 2017 at 9:25 am #

      Indeed, the example uses a linear activation function by default.
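A Keras Dense layer with no activation argument applies no nonlinearity (the "linear" activation a(x) = x), so adding Activation('linear') after Dense(1) computes the same function and adds no parameters. A NumPy sketch of the two output heads; the dense and linear helpers are hypothetical stand-ins for the Keras layers, not the Keras implementation:

```python
import numpy as np


def dense(x, kernel, bias, activation=None):
    """Dense layer: activation(x @ kernel + bias); None means no activation."""
    z = x @ kernel + bias
    return z if activation is None else activation(z)


def linear(z):
    """Identity ('linear') activation: returns its input unchanged."""
    return z


rng = np.random.default_rng(2)
h = rng.normal(size=(3, 6))  # output of the last hidden layer (6 units)
kernel, bias = rng.normal(size=(6, 1)), rng.normal(size=1)

plain = dense(h, kernel, bias)                  # like Dense(1)
with_identity = linear(dense(h, kernel, bias))  # like Dense(1) + Activation('linear')
print(np.array_equal(plain, with_identity))     # True
```

An explicit identity layer mainly documents intent; it matters only when an earlier layer would otherwise apply a nonlinearity at the output.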

  29. Dan March 18, 2017 at 7:23 am #

    Hi Jason,
    My current understanding is that we want to fit and transform the scaler only on our training set, and only transform (without fitting) the test set. When we use the Pipeline in cross-validation as you did, do we ensure that in each CV fold the scaler is fit only on the 9 training folds, and that the test fold is transformed without refitting?

    Thanks very much

    • Jason Brownlee March 18, 2017 at 7:55 am #

      Top question.

      The Pipeline does this for us. In each CV fold, the scaler is fit on the training folds and applied to them, then the fitted transform is applied to the test fold to evaluate the model. It’s a great automatic pattern built into sklearn.
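What the Pipeline does inside each fold can be written out by hand. A minimal NumPy sketch, assuming plain 10-fold splitting and StandardScaler-style scaling (mean 0 and standard deviation 1, using NumPy's default population std); kfold_indices is a hypothetical helper, not scikit-learn's KFold:

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Split shuffled row indices into k roughly equal folds."""
    order = np.random.default_rng(seed).permutation(n)
    return np.array_split(order, k)


rng = np.random.default_rng(3)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

for test_idx in kfold_indices(len(X), k=10):
    train_mask = np.ones(len(X), dtype=bool)
    train_mask[test_idx] = False
    X_train, X_test = X[train_mask], X[test_idx]

    # "fit": the statistics come from the training folds only.
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    # "transform": the same statistics are applied to both sets.
    X_train_s = (X_train - mean) / std
    X_test_s = (X_test - mean) / std

    # The training folds are exactly standardized; the held-out fold is
    # transformed with statistics it never influenced.
    assert np.allclose(X_train_s.mean(axis=0), 0.0)
    assert np.allclose(X_train_s.std(axis=0), 1.0)
```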

  30. Paula March 21, 2017 at 11:46 pm #

    Hi! I ran your code with your data and got a different MSE. Should I be concerned? Thanks for the help!

  31. Annanya March 29, 2017 at 4:23 am #

    Hi Jason

    While running the code above, I got this error:

    Y = dataset[:,25]
    IndexError: index 25 is out of bounds for axis 1 with size 1

    I had declared X and Y as:

    X = dataset[:,0:25]
    Y = dataset[:,25]

    Please help me solve this.
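The error says dataset has only one column, so each row was most likely parsed as a single field. One common cause (an assumption, since the data file isn't shown) is a delimiter mismatch: housing.csv is whitespace-separated, not comma-separated, so reading it as comma-separated leaves every line as one field. A NumPy-only sketch of the symptom and a fix, using two rows laid out like housing.csv:

```python
import numpy as np

# Two rows of whitespace-separated numbers, like the lines in housing.csv.
lines = [
    "0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30 396.90   4.98  24.00",
    "0.02731   0.00   7.070  0  0.4690  6.4210  78.90  4.9671   2  242.0  17.80 396.90   9.14  21.60",
]

# Splitting on commas finds no commas, so each line stays one single field.
print([len(line.split(",")) for line in lines])  # [1, 1]

# Splitting on whitespace recovers all 14 columns.
dataset = np.array([[float(v) for v in line.split()] for line in lines])
print(dataset.shape)  # (2, 14)

# Slice by the actual column count instead of a hard-coded index.
n_cols = dataset.shape[1]
X, Y = dataset[:, : n_cols - 1], dataset[:, n_cols - 1]
print(X.shape, Y.shape)  # (2, 13) (2,)
```

Checking dataset.shape right after loading is a quick way to catch this before slicing.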

  32. Sagar March 29, 2017 at 10:54 am #

    Hi Jason, thanks for your great article!

    I am working on the same kind of problem [number of samples: 460,000, number of features: 8], but my target column has very large values, between 20,000 and 90,000!

    I tried different NN architectures [larger to smaller] with different batch sizes and epochs but am still not getting good accuracy!

    Should I normalize my target column? Please help me solve this!

    Thanks for your time!

    • Jason Brownlee March 30, 2017 at 8:45 am #

      Yes, you must rescale your input and output data.
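Rescaling the target works like rescaling the inputs, with one extra step: predictions come out on the scaled range and must be mapped back before you interpret or report them. A minimal NumPy sketch, assuming min-max scaling to [0, 1] (standardization would work similarly); the "predictions" here are synthetic noise around the true values, not output from a trained model:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.uniform(20000, 90000, size=1000)  # targets on a large scale

# Fit the scaling on the (training) targets only, then keep the parameters.
y_min, y_max = y.min(), y.max()
y_scaled = (y - y_min) / (y_max - y_min)  # now in [0, 1]


def inverse_scale(y_s):
    """Map scaled predictions back to the original units."""
    return y_s * (y_max - y_min) + y_min


# A model trained on y_scaled predicts in the same [0, 1] range; invert before
# computing errors in the original units.
fake_predictions = y_scaled + rng.normal(0.0, 0.01, size=y.shape)  # stand-in for model output
mae = np.mean(np.abs(inverse_scale(fake_predictions) - y))
print(f"scaled range: [{y_scaled.min():.1f}, {y_scaled.max():.1f}], MAE in original units: {mae:.0f}")
```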

      • Sagar March 31, 2017 at 4:22 pm #

        Hi Jason, thanks for your reply!

        Yes, I tried different ways to rescale my data, but I still only got 20% accuracy!

        I tried different NN topologies with different batch sizes and epochs but am not getting good results!

        My code:

        import pandas
        from keras.models import Sequential
        from keras.layers import Dense
        from sklearn.preprocessing import StandardScaler

        inputFilePath = "path-to-input-file"

        dataframe = pandas.read_csv(inputFilePath, sep="\t", header=None)
        dataset = dataframe.values

        # split into input (X) and output (Y) variables
        X = dataset[:,0:8]
        Y = dataset[:,8]

        # fit the scaler once, then transform (no second fit needed)
        scaler = StandardScaler().fit(X)
        X = scaler.transform(X)

        maxnumber = max(Y)  # the max value I got is 79882.0

        Y = Y / maxnumber

        # create model
        model = Sequential()
        model.add(Dense(100, input_dim=8, init='normal', activation='relu'))
        model.add(Dense(100, init='normal', activation='relu'))
        model.add(Dense(80, init='normal', activation='relu'))
        model.add(Dense(40, init='normal', activation='relu'))
        model.add(Dense(20, init='normal', activation='relu'))
        model.add(Dense(8, init='normal', activation='relu'))
        model.add(Dense(6, init='normal', activation='relu'))
        model.add(Dense(6, init='normal', activation='relu'))
        model.add(Dense(1, init='normal', activation='relu'))

        model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
        # checkpoint
        model.fit(X, Y, nb_epoch=100, batch_size=400)
        # 4. evaluate the network
        loss, accuracy = model.evaluate(X, Y)
        print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

        I tried both MSE and MAE as the loss, with the adam and rmsprop optimizers, but I am still not getting good accuracy!

        Please help me! Thanks

        • Jason Brownlee April 1, 2017 at 5:51 am #

          100 epochs will not be enough for such a deep network. It might need millions.

          • sagar April 6, 2017 at 11:29 pm #

            Hello Jason, thanks for your reply!

            How can I be sure I will get a good result after millions of epochs? After 10,000 epochs the accuracy is still 0.2378!

            How can I decide the number of layers and the number of neurons per layer dynamically? Is there any way?

            I already use the checkpoint mechanism to track accuracy on a validation split!
            My code looks like this:

            model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])

            checkpoint = ModelCheckpoint(save_file_path, monitor='val_acc', verbose=1, save_best_only=True, mode='max')

            callbacks_list = [checkpoint]

            , Y_Output_Vector, validation_split=0.33, nb_epoch=1000000, batch_size=1300, callbacks=callbacks_list, verbose=0)

            Let me know if I am missing something!

          • Jason Brownlee April 9, 2017 at 2:43 pm #

            Looks good.

            There are neural network growing and pruning algorithms, but I do not have tutorials on them, sorry.

            See the book: Neural Smithing
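A side note on the code in this thread: accuracy is a classification metric, so a figure like 20% says little about a regression model. Error measures such as MSE or MAE (the losses already being tried) are more informative. The NumPy sketch below illustrates the general point with an exact-match comparison on synthetic predictions; it does not reproduce how Keras computes its 'accuracy' metric:

```python
import numpy as np

rng = np.random.default_rng(5)
y_true = rng.uniform(0.0, 1.0, size=500)           # continuous (scaled) targets
y_pred = y_true + rng.normal(0.0, 0.02, size=500)  # close, but never exactly equal

# Exact-match "accuracy" is about 0 for continuous targets,
# even though these predictions are very close.
exact_match = np.mean(y_pred == y_true)

# Regression error measures reflect how close the predictions are.
mse = np.mean((y_pred - y_true) ** 2)
mae = np.mean(np.abs(y_pred - y_true))
print(f"exact match: {exact_match:.3f}, MSE: {mse:.5f}, MAE: {mae:.4f}")
```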

  33. Charlotte March 30, 2017 at 8:58 am #

    Hi Jason,

    Thanks for this great tutorial.

    I believe there is a small mistake: when passing the number of epochs as a parameter, the documentation shows that it should be given as:
    estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)

    When giving:
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)
    the function doesn’t recognise the argument and just ignores it.

    Can you confirm?

    I’m using your ‘How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras’ tutorial and have trouble tuning the number of epochs. If I check one of the results of the GridSearchCV with a simple cross-validation using the same number of folds, I don’t obtain the same results at all. Might there be a similar mistake there?

    Thank you for your time!

    • Jason Brownlee March 30, 2017 at 9:01 am #

      You can pass through any parameters you wish:

      You will get different results on each run because neural network behavior is stochastic. This post will help:
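Much of the run-to-run variation comes from random choices such as the initial weights and the order of the mini-batches. A NumPy sketch of why fixing a seed makes such draws repeatable; init_weights is a hypothetical helper. With the Theano or TensorFlow backends, seeding NumPy alone may not make Keras runs fully repeatable, so a safer practice is to repeat an evaluation several times and compare the mean and standard deviation.

```python
import numpy as np


def init_weights(seed, shape=(13, 13)):
    """Draw a random initial weight matrix from a fixed seed."""
    return np.random.default_rng(seed).normal(0.0, 0.05, size=shape)


same_a, same_b = init_weights(7), init_weights(7)
different = init_weights(8)
print(np.array_equal(same_a, same_b))     # True: same seed, same starting point
print(np.array_equal(same_a, different))  # False: new seed, new starting point
```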

      • Charlotte March 30, 2017 at 9:14 am #

        The documentation specifies that the number of epochs should be given as epochs=n and not nb_epoch=n. When giving the latter, the function will ignore the argument. As an example:

        estimators = []
        estimators.append(('standardize', StandardScaler()))
        estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch='hi', batch_size=50, verbose=0)))
        pipeline = Pipeline(estimators)
        kfold = KFold(n_splits=10, random_state=seed)
        results = cross_val_score(pipeline, X1, Y, cv=kfold)
        print("Standardized: %.5f (%.2f) MSE" % (results.mean(), results.std()))

        will not raise any error.

        Am I missing something?

        The results I get are very different, and I don’t think this can be due to the stochastic behaviour of the network.

        • Jason Brownlee March 31, 2017 at 5:49 am #

          Thanks Charlotte, that looks like a recent change for Keras 2.0. I will update the examples soon.

        • Caleb Everett April 25, 2017 at 7:50 am #

          Thank you!

  34. Jens April 16, 2017 at 7:59 am #

    Hey Jason,

    I tried the first part and got a different result for the baseline.
    I figured out that the line
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)

    is not working as expected for me, as it uses the default of 10 epochs. When I change it to epochs=100 it works.

    I just read the comment above; it seems they changed that in the API.
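Why would nb_epoch vanish without an error? A plausible mechanism (an assumption about the wrapper's internals, simplified here) is that the scikit-learn wrapper forwards only the keyword arguments that the underlying fit() names in its signature. In Keras 2, fit() names epochs (default 10), while nb_epoch is only accepted through **kwargs for backward compatibility, so a signature-based filter drops it and training falls back to 10 epochs, matching what Jens observed. This pure-Python sketch reproduces that behaviour with a stand-in fit() and a hypothetical filter_params helper; it is not the Keras source:

```python
import inspect


def fit(x, y, batch_size=32, epochs=10, verbose=1, **kwargs):
    """Stand-in with a Keras-2-style signature: epochs is named, nb_epoch is not."""
    return {"batch_size": batch_size, "epochs": epochs}


def filter_params(fn, params):
    """Keep only the keyword arguments that fn names explicitly in its signature."""
    named = {
        name for name, p in inspect.signature(fn).parameters.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    return {k: v for k, v in params.items() if k in named}


old_style = filter_params(fit, {"nb_epoch": 100, "batch_size": 5})
new_style = filter_params(fit, {"epochs": 100, "batch_size": 5})
print(fit(None, None, **old_style))  # {'batch_size': 5, 'epochs': 10}  <- nb_epoch silently dropped
print(fit(None, None, **new_style))  # {'batch_size': 5, 'epochs': 100}
```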

  35. Martin April 19, 2017 at 11:35 pm #

    Hi Jason,

    How can I get the regression coefficients?

  36. Luca April 27, 2017 at 12:34 am #

    Dear Jason,

    Thanks for your tutorials!!
    I got it working in a particle physics example I’m working on, and I have 2 questions.
    1) Imagine my target is T=a/b (T=true_value/reco_value). If I give the regression both “a” and “b” as features, then it should be able to find exactly the correct solution every time, right? Or is there some procedure that tries to avoid overtraining and does not allow results that are 100% precise? I ask because I tried, and I got “good” performance, but not the optimal result I would expect (if it has “a” and “b”, it should be able to find the correct T in the test set 100% of the time). If I remove b from the regression and add other features, then y_hat/y_test peaks at 0.75, meaning the regression is biased. Could you help me understand these two facts?
    2) I want to save the regression in order to use it later. After the training I do: a) estimator.model.save_weights and b) open('models/'+model_name, 'w').write(estimator.model.to_json()).
    Estimator is “estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=50, verbose=1)”. How can I later use those 2 files to directly make predictions?

    Thanks a lot,

    • Jason Brownlee April 27, 2017 at 8:42 am #

      Sorry, I’m not sure I follow your first question, perhaps you can restate it briefly?

      See this post on saving and loading keras models:

      • Luca April 28, 2017 at 1:30 am #

        Hi Jason,
        my point is the following. The regression is trained on a set of features (a set of floats), and it provides a single output (a float), the target. During training, the regression learns how to guess the target as a function of the features.
        Of course, in practice the target should not be a simple function of the features, otherwise the problem is trivial, but I wanted to test this scenario as an initial check. As a test, I defined a target that is the ratio of 2 features, i.e. I’m giving the regression “a” and “b”, and the target to find is a/b. In that simple case, the regression should be smart enough to learn during training that my target is simply a/b, so in the test it should find the correct value with 100% precision by dividing the 2 features. What I found is that in the test the regression finds a value (y_hat) that is close to a/b, but not exactly a/b. So I was wondering why the regression behaves like that.


        • Jason Brownlee April 28, 2017 at 7:49 am #

          This is a great question.

          At best machine learning can approximate a function, some approximations are better than others.

          That is the best that I can answer it.
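The point can be seen with a small experiment. The sketch below is not a Keras model: to keep it short and deterministic, it swaps in a random-feature network (a fixed random hidden layer of ReLU units, with only the output weights fit by least squares) and asks it to learn target = a/b from a and b. A ReLU network with a linear output computes a piecewise-linear function, so it can only approximate a curved function like a/b; more hidden units shrink the training error but do not make it exactly zero.

```python
import numpy as np

rng = np.random.default_rng(6)
a = rng.uniform(1.0, 2.0, size=500)
b = rng.uniform(1.0, 2.0, size=500)
X = np.column_stack([a, b])
target = a / b  # an exact, known function of the two features

# Fixed random hidden layer of ReLU units; only the output layer is fit.
hidden_w = rng.normal(size=(2, 200))
hidden_b = rng.normal(size=200)
H = np.maximum(0.0, X @ hidden_w + hidden_b)


def fit_error(n_units):
    """Least-squares output layer on the first n_units hidden units; returns the RMSE."""
    features = np.column_stack([H[:, :n_units], np.ones(len(X))])  # plus a bias column
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return np.sqrt(np.mean((features @ coef - target) ** 2))


for n in (5, 20, 200):
    print(n, fit_error(n))
```

Because the smaller models use a subset of the larger model's hidden units, the training error can only stay the same or fall as units are added, yet it stays above zero.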

  37. Ignacio April 27, 2017 at 12:36 am #

    Hi Jason,

    thanks for your posts, I really enjoy them. I have a quick question: If I want to use sklearn’s GridSearchCV and :
    in my model, will the highest score correspond to the combination with the *highest* mse?
    If that’s the case, I assume there is a way to invert the scoring in GridSearchCV?

    • Jason Brownlee April 27, 2017 at 8:43 am #

      When using MSE you will want to find the config that results in the lowest error, i.e. the lowest mean squared error.
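To connect this to GridSearchCV: scikit-learn treats every score as "greater is better", so an error such as MSE is reported negated (the built-in scoring string is 'neg_mean_squared_error'). As far as I know, the Keras 2 wrapper's KerasRegressor.score also returns the negated loss. The highest score therefore corresponds to the lowest MSE, and no extra inversion is needed. A tiny NumPy illustration with made-up MSE values for three hypothetical configurations:

```python
import numpy as np

# Hypothetical mean squared errors for three candidate configurations.
mse_per_config = np.array([21.5, 18.2, 25.0])

# scikit-learn scorers follow "greater is better", so MSE is reported negated
# (e.g. scoring='neg_mean_squared_error'); the best score is the largest one.
scores = -mse_per_config
best = int(np.argmax(scores))

print(best, mse_per_config[best])  # 1 18.2
assert best == int(np.argmin(mse_per_config))  # max of -MSE is min of MSE
```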
