Regression Tutorial with the Keras Deep Learning Library in Python

Last Updated on August 27, 2020

Keras is a deep learning library that wraps the efficient numerical libraries Theano and TensorFlow.

In this post you will discover how to develop and evaluate neural network models using Keras for a regression problem.

After completing this step-by-step tutorial, you will know:

  • How to load a CSV dataset and make it available to Keras.
  • How to create a neural network model with Keras for a regression problem.
  • How to use scikit-learn with Keras to evaluate models using cross-validation.
  • How to perform data preparation in order to improve skill with Keras models.
  • How to tune the network topology of models with Keras.

Let’s get started.

  • Update Mar/2017: Updated for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
  • Update Mar/2018: Added alternate link to download the dataset as the original appears to have been taken down.
  • Update Apr/2018: Changed nb_epoch argument to epochs.
  • Update Sep/2019: Updated for Keras 2.2.5 API.
1. Problem Description

The problem that we will look at in this tutorial is the Boston house price dataset.

You can download this dataset and save it to your current working directory with the file name housing.csv (update: download data from here).

The dataset describes 13 numerical properties of houses in Boston suburbs and is concerned with modeling the price of houses in those suburbs in thousands of dollars. As such, this is a regression predictive modeling problem. Input attributes include things like crime rate, proportion of nonretail business acres, chemical concentrations and more.

This is a well-studied problem in machine learning. It is convenient to work with because all of the input and output attributes are numerical and there are 506 instances to work with.

Reasonable performance for models evaluated using Mean Squared Error (MSE) is around 20 in squared thousands of dollars (or about $4,500 if you take the square root). This is a nice target to aim for with our neural network model.
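
As a quick check of that arithmetic, the short sketch below takes the square root of an MSE of 20 and converts it to dollars. The variable names are purely illustrative.

```python
import math

# MSE quoted in the text, in squared thousands of dollars.
mse = 20.0

# Taking the square root brings the error back to thousands of dollars.
rmse_thousands = math.sqrt(mse)
rmse_dollars = rmse_thousands * 1000

print("RMSE: %.2f thousand dollars (about $%.0f)" % (rmse_thousands, rmse_dollars))
```

Rounded to the nearest hundred dollars, this matches the figure of roughly $4,500.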

2. Develop a Baseline Neural Network Model

In this section we will create a baseline neural network model for the regression problem.

Let’s start off by including all of the functions and objects we will need for this tutorial.

We can now load our dataset from a file in the local directory.

The dataset is in fact not in CSV format in the UCI Machine Learning Repository; the attributes are instead separated by whitespace. We can load this easily using the pandas library. We can then split the input (X) and output (Y) attributes so that they are easier to model with Keras and scikit-learn.
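
A minimal sketch of this loading step is shown below. It assumes the file has no header row and that the price is the last of 14 whitespace-separated columns; the helper name load_housing is hypothetical.

```python
import pandas as pd


def load_housing(path_or_buffer):
    """Load the whitespace-separated housing file and split inputs from output."""
    # No header row; sep=r"\s+" treats any run of whitespace as a delimiter.
    dataframe = pd.read_csv(path_or_buffer, sep=r"\s+", header=None)
    dataset = dataframe.values
    # Assumption: columns 0-12 are the 13 input attributes, column 13 is the price.
    X = dataset[:, 0:13]
    Y = dataset[:, 13]
    return X, Y


# Usage, assuming housing.csv is in the current working directory:
# X, Y = load_housing("housing.csv")
```

Wrapping the load in a function keeps the column split in one place, which helps if you later adapt the code to a dataset with a different number of inputs.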

We can create Keras models and evaluate them with scikit-learn by using handy wrapper objects provided by the Keras library. This is desirable, because scikit-learn excels at evaluating models and will allow us to use powerful data preparation and model evaluation schemes with very few lines of code.

The Keras wrappers require a function as an argument. This function that we must define is responsible for creating the neural network model to be evaluated.

Below we define the function to create the baseline model to be evaluated. It is a simple model that has a single fully connected hidden layer with the same number of neurons as input attributes (13). The network uses good practices such as the rectifier activation function for the hidden layer. No activation function is used for the output layer because it is a regression problem and we are interested in predicting numerical values directly, without a transform.

The efficient Adam optimization algorithm is used and a mean squared error loss function is optimized. This will be the same metric that we will use to evaluate the performance of the model. It is a desirable metric because taking the square root gives us an error value we can directly understand in the context of the problem (thousands of dollars).
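
A minimal sketch of such a function, following the description above, might look like this. The function name baseline_model and the 'normal' weight initializer are assumptions (the initializer mirrors code posted by readers in the comments); newer Keras releases may warn that input_dim is a legacy argument.

```python
from keras.models import Sequential
from keras.layers import Dense


def baseline_model():
    """Build and compile the baseline network: 13 inputs -> [13] -> 1 output."""
    model = Sequential()
    # One hidden layer with as many neurons as input attributes, rectifier activation.
    model.add(Dense(13, input_dim=13, kernel_initializer="normal", activation="relu"))
    # Output layer without an activation, so the value is predicted directly.
    model.add(Dense(1, kernel_initializer="normal"))
    # Mean squared error loss, optimized with Adam.
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model
```

The function returns a compiled but untrained model, which is what the scikit-learn wrapper expects to receive each time it needs a fresh network.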

If you are new to Keras or deep learning, see this Keras tutorial.

The Keras wrapper object for use in scikit-learn as a regression estimator is called KerasRegressor. We create an instance and pass it both the name of the function to create the neural network model as well as some parameters to pass along to the fit() function of the model later, such as the number of epochs and batch size. Both of these are set to sensible defaults.

The final step is to evaluate this baseline model. We will use 10-fold cross validation to evaluate the model.

Tying this all together, the complete example is listed below.

Running this code gives us an estimate of the model’s performance on the problem for unseen data.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Note: The mean squared error is negative because scikit-learn inverts the score so that the metric is maximized instead of minimized. You can ignore the sign of the result.

The result reports the mean squared error, including the average and the standard deviation (a measure of spread) across all 10 folds of the cross validation evaluation.
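
The evaluation step can be sketched as a small helper that works with any scikit-learn compatible regressor, including the KerasRegressor wrapper. This is an illustration under stated assumptions: the helper name evaluate_mse, the seed value and shuffle=True are my own choices, and the negated-MSE scoring is requested explicitly rather than relying on the wrapper's default score.

```python
from sklearn.model_selection import KFold, cross_val_score


def evaluate_mse(estimator, X, Y, n_splits=10, seed=7):
    """Return the mean and standard deviation of MSE across cross-validation folds."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn reports negated MSE so that larger scores are always better.
    scores = cross_val_score(estimator, X, Y, cv=kfold, scoring="neg_mean_squared_error")
    # Flip the sign back to obtain ordinary (positive) MSE values.
    mse = -scores
    return mse.mean(), mse.std()


# Usage with the Keras wrapper (argument values here are illustrative only):
# estimator = KerasRegressor(build_fn=your_model_function, epochs=100, batch_size=5, verbose=0)
# mean_mse, std_mse = evaluate_mse(estimator, X, Y)
# print("Baseline: %.2f (%.2f) MSE" % (mean_mse, std_mse))
```

Flipping the sign inside the helper means the printed numbers can be read directly as errors.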

3. Modeling The Standardized Dataset

An important concern with the Boston house price dataset is that the input attributes all vary in their scales because they measure different quantities.

It is almost always good practice to prepare your data before modeling it using a neural network model.

Continuing on from the above baseline model, we can re-evaluate the same model using a standardized version of the input dataset.

We can use scikit-learn’s Pipeline framework to perform the standardization during the model evaluation process, within each fold of the cross validation. This ensures that there is no data leakage from the test fold into the training data, because the scaling statistics are computed on the training folds only.

The code below creates a scikit-learn Pipeline that first standardizes the dataset and then creates and evaluates the baseline neural network model.
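
A minimal sketch of such a pipeline is given here. The helper name make_standardized_pipeline is hypothetical; the step names 'standardize' and 'mlp' follow the code readers quote in the comments, and the regressor argument is expected to be a scikit-learn compatible estimator such as the KerasRegressor wrapper.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_standardized_pipeline(regressor):
    """Chain standardization and a regressor so scaling is fit on training folds only."""
    estimators = []
    # Step 1: rescale each input attribute to zero mean and unit variance.
    estimators.append(("standardize", StandardScaler()))
    # Step 2: the model itself, trained on the standardized inputs.
    estimators.append(("mlp", regressor))
    return Pipeline(estimators)


# Usage: pass the pipeline, not the bare model, to cross_val_score, e.g.
# results = cross_val_score(make_standardized_pipeline(estimator), X, Y, cv=kfold)
```

Because the scaler lives inside the pipeline, cross_val_score refits it on each training split, so no statistics from the held-out fold reach the model.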

Tying this together, the complete example is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example provides an improved performance over the baseline model without standardized data, dropping the error.

A further extension of this section would be to similarly apply a rescaling to the output variable such as normalizing it to the range of 0-1 and use a Sigmoid or similar activation function on the output layer to narrow output predictions to the same range.
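
One possible way to sketch the output rescaling is with scikit-learn's TransformedTargetRegressor, which is not used elsewhere in this tutorial and is named here as a swapped-in convenience. It scales the target to the range 0-1 for training and maps predictions back to the original units. The helper name with_scaled_target is hypothetical, and the sigmoid output layer itself would still need to be added to the Keras model.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import MinMaxScaler


def with_scaled_target(regressor):
    """Wrap a regressor so it trains on targets rescaled to 0-1
    and returns predictions in the original units."""
    # MinMaxScaler maps the training targets onto the range 0-1.
    return TransformedTargetRegressor(regressor=regressor, transformer=MinMaxScaler())
```

Note that error scores computed on the rescaled target would be on a different scale, so compare models using predictions in the original units.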

4. Tune The Neural Network Topology

There are many concerns that can be optimized for a neural network model.

Perhaps the point of biggest leverage is the structure of the network itself, including the number of layers and the number of neurons in each layer.

In this section we will evaluate two additional network topologies in an effort to further improve the performance of the model. We will look at both a deeper and a wider network topology.

4.1. Evaluate a Deeper Network Topology

One way to improve the performance of a neural network is to add more layers. This might allow the model to extract and recombine higher order features embedded in the data.

In this section we will evaluate the effect of adding one more hidden layer to the model. This is as easy as defining a new function that will create this deeper model, copied from our baseline model above. We can then insert a new line after the first hidden layer, in this case adding a layer with about half the number of neurons.
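
A minimal sketch of the deeper model follows. The second hidden layer has 6 neurons (about half of 13); the function name larger_model matches the name readers use in the comments, and the 'normal' initializer is an assumption.

```python
from keras.models import Sequential
from keras.layers import Dense


def larger_model():
    """Build and compile the deeper network: 13 inputs -> [13 -> 6] -> 1 output."""
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer="normal", activation="relu"))
    # The new line: a second hidden layer with about half the neurons.
    model.add(Dense(6, kernel_initializer="normal", activation="relu"))
    model.add(Dense(1, kernel_initializer="normal"))
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model
```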

Our network topology now looks like: 13 inputs -> [13 -> 6] -> 1 output

We can evaluate this network topology in the same way as above, whilst also using the standardization of the dataset that was shown above to improve performance.

Tying this together, the complete example is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this model does show a further improvement in performance, from an MSE of 28 down to 24 squared thousands of dollars.

4.2. Evaluate a Wider Network Topology

Another approach to increasing the representational capability of the model is to create a wider network.

In this section we evaluate the effect of keeping a shallow network architecture and increasing the number of neurons in the one hidden layer by about half.

Again, all we need to do is define a new function that creates our neural network model. Here, we have increased the number of neurons in the hidden layer compared to the baseline model from 13 to 20.
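
A minimal sketch of the wider model is shown below. The function name wider_model and the 'normal' initializer are assumptions; only the hidden layer size differs from the baseline.

```python
from keras.models import Sequential
from keras.layers import Dense


def wider_model():
    """Build and compile the wider network: 13 inputs -> [20] -> 1 output."""
    model = Sequential()
    # Single hidden layer widened from 13 to 20 neurons.
    model.add(Dense(20, input_dim=13, kernel_initializer="normal", activation="relu"))
    model.add(Dense(1, kernel_initializer="normal"))
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model
```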

Our network topology now looks like: 13 inputs -> [20] -> 1 output

We can evaluate the wider network topology using the same scheme as above:

Tying this together, the complete example is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Building the model does see a further drop in error to an MSE of about 21 squared thousands of dollars. This is not a bad result for this problem.

It would have been hard to guess that a wider network would outperform a deeper network on this problem. The results demonstrate the importance of empirical testing when it comes to developing neural network models.


In this post you discovered the Keras deep learning library for modeling regression problems.

Through this tutorial you learned how to develop and evaluate neural network models, including:

  • How to load data and develop a baseline model.
  • How to lift performance using data preparation techniques like standardization.
  • How to design and evaluate networks with different topologies on a problem.

Do you have any questions about the Keras deep learning library or about this post? Ask your questions in the comments and I will do my best to answer.

675 Responses to Regression Tutorial with the Keras Deep Learning Library in Python

  1. Gautam Karmakar June 25, 2016 at 4:19 pm #

    Hi did you handle string variables in cross_val_score module?

    • Jason Brownlee June 26, 2016 at 6:00 am #

      The dataset is numeric, no string values.

      • Ramya December 9, 2017 at 2:34 am #

        How do we handle string values

      • Abhishek Rudra Pal April 18, 2019 at 7:25 am #

        I have
        2 input set (that means 2 columns) instead of 13 of this problem
        8 output( 8 columns)instead of 1 of this problem
        192 training set instead of 506 of this problem
        so multi-input multi-output prediction modeling
        will this code sufficient or do I have to change anything?
        is this deep learning because I heard for deep learning it requires thousand of the training set
        forgive me I don’t know anything about deep learning and with this code I am gonna start
        I am waiting for your reply

        • Jason Brownlee April 18, 2019 at 8:57 am #

          If you are predicting 8 real-valued variables (not 8 classes), you can change the number of nodes in the output layer to 8.

          • Abhishek Rudra Pal April 18, 2019 at 2:28 pm #

            Thank you for your quick response
            so, i have to change only the output layer no
            Now, i have few more question
            If i am able to get the results using this code i have to know some details

            1)I suppose it is the latest deep neural network.What is the name of this neural network? (e.g. recurrent, multilayer perceptron, Boltzmann etc)
            2)In deep learning parameters are needed to be tuned by varying them
            what are the parameters here which i have to vary?
            3)Can you send me the image which will show the complete architecture of neural network showing input layer hidden layer output layer transfer function etc.
            4)since i will be using this code. I have to refer it in the Journal which i am going to write
            should i simply refer this website or any paper of your you suggest me to cite?

          • Jason Brownlee April 19, 2019 at 6:03 am #

            For help in tuning your model, I recommend starting here:

            You can summarize the architecture of your model, learn more here:

            I show how to cite a post or book here:

      • Ganesh Selvaraj November 23, 2019 at 5:16 am #

        Mr. Brownlee,
        If I have a multi input and a multi output regression problem, e.g 4 input and 4 output then how do we deal with that.

        • Jason Brownlee November 23, 2019 at 6:55 am #

          The model can be defined to expect 4 inputs, and then you can have 4 nodes in the output layer.

          • Ganesh Selvaraj November 25, 2019 at 4:47 am #

            Thanks a lot for your kind and prompt reply Mr. Jason.

          • Jason Brownlee November 25, 2019 at 6:33 am #

            You’re welcome.

          • Ganesh Selvaraj November 26, 2019 at 12:45 am #

            Also, in the case of multiple outputs, do we do the prediction and accuracy the same way we do for the one-output case in Keras? I am new to deep learning, so I am sorry if my question is a bit naive.

          • Jason Brownlee November 26, 2019 at 6:08 am #

            You can calculate a score for all outputs and/or for each separate output.

            During training, Keras will use a single loss, but your project stakeholders may have more requirements when evaluating the final model.

      • Ganesh Selvaraj November 25, 2019 at 7:43 pm #

        Mr. Jason if I run your code in my system I am getting an error

        TypeError: (‘Keyword argument not understood:’, ‘acitivation’)

        could you please explain why.

      • P.Venkatesh December 25, 2019 at 2:52 am #

        # Regression Example With Boston Dataset: Standardized and Wider
        import pandas as pd
        from keras.models import Sequential
        from keras.layers import Dense
        from keras.wrappers.scikit_learn import KerasRegressor
        from sklearn.model_selection import cross_val_score
        from sklearn.model_selection import KFold
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import Pipeline
        # load dataset
        dataset = pd.read_csv('train1.csv')
        testthedata = pd.read_csv('test1.csv')
        # split into input (X) and output (Y) variables
        X = dataset.drop(columns = ["Id", "SalePrice", "Alley", "MasVnrType", "BsmtQual", "BsmtCond", "BsmtExposure",
        "BsmtFinType1", "BsmtFinType2", "Electrical", "FireplaceQu", "GarageType",
        "GarageFinish", "GarageQual", "GarageCond", "PoolQC", "Fence", "MiscFeature"])
        y = dataset['SalePrice'].values
        testthedata = testthedata.drop(columns = ["MSZoning", "Utilities", "Id", "Alley", "MasVnrType", "BsmtQual", "BsmtCond", "BsmtExposure",
        "Exterior1st", "Exterior2nd", "BsmtFinType1", "BsmtFinType2", "Electrical", "FireplaceQu", "GarageType",
        "KitchenQual", "SaleType", "Functional", "GarageFinish", "GarageQual", "GarageCond", "PoolQC", "Fence", "MiscFeature"])

        from sklearn.preprocessing import LabelEncoder, OneHotEncoder
        le = LabelEncoder()
        le1 = LabelEncoder()
        X['MSZoning'] = le.fit_transform(X[['MSZoning']])
        X['Street'] = le.fit_transform(X[['Street']])
        X['LotShape'] = le.fit_transform(X[['LotShape']])
        X['LandContour'] = le.fit_transform(X[['LandContour']])
        X['LotConfig'] = le.fit_transform(X[['LotConfig']])
        X['LandSlope'] = le.fit_transform(X[['LandSlope']])
        X['Utilities'] = le.fit_transform(X[['Utilities']])
        X['Neighborhood'] = le.fit_transform(X[['Neighborhood']])
        X['Condition1'] = le.fit_transform(X[['Condition1']])
        X['Condition2'] = le.fit_transform(X[['Condition2']])
        X['BldgType'] = le.fit_transform(X[['BldgType']])
        X['HouseStyle'] = le.fit_transform(X[['HouseStyle']])
        X['RoofStyle'] = le.fit_transform(X[['RoofStyle']])
        X['RoofMatl'] = le.fit_transform(X[['RoofMatl']])
        X['Exterior1st'] = le.fit_transform(X[['Exterior1st']])
        X['Exterior2nd'] = le.fit_transform(X[['Exterior2nd']])
        X['ExterQual'] = le.fit_transform(X[['ExterQual']])
        X['ExterCond'] = le.fit_transform(X[['ExterCond']])
        X['Foundation'] = le.fit_transform(X[['Foundation']])
        X['Heating'] = le.fit_transform(X[['Heating']])
        X['HeatingQC'] = le.fit_transform(X[['HeatingQC']])
        X['KitchenQual'] = le.fit_transform(X[['KitchenQual']])
        X['Functional'] = le.fit_transform(X[['Functional']])
        X['PavedDrive'] = le.fit_transform(X[['PavedDrive']])
        X['SaleType'] = le.fit_transform(X[['SaleType']])
        X['SaleCondition'] = le.fit_transform(X[['SaleCondition']])

        #testing['MSZoning'] = le1.fit_transform(testing[['MSZoning']])
        testthedata['Street'] = le1.fit_transform(testthedata[['Street']])
        testthedata['LotShape'] = le1.fit_transform(testthedata[['LotShape']])
        testthedata['LandContour'] = le1.fit_transform(testthedata[['LandContour']])
        testthedata['LotConfig'] = le1.fit_transform(testthedata[['LotConfig']])
        #testthedata['LandSlope'] = le1.testthedata(testthedata[['LandSlope']])
        #testing['Utilities'] = le1.fit_transform(testing[['Utilities']])
        testthedata['Neighborhood'] = le1.fit_transform(testthedata[['Neighborhood']])
        testthedata['Condition1'] = le1.fit_transform(testthedata[['Condition1']])
        #testthedata['Condition2'] = le1.fit_transform(testthedata[['Condition2']])
        testthedata['BldgType'] = le1.fit_transform(testthedata[['BldgType']])
        testthedata['HouseStyle'] = le1.fit_transform(testthedata[['HouseStyle']])
        testthedata['RoofStyle'] = le1.fit_transform(testthedata[['RoofStyle']])
        #testthedata['RoofMatl'] = le1.fit_transform(testthedata[['RoofMatl']])
        #testing['Exterior1st'] = le1.fit_transform(testing[['Exterior1st']])
        #testing['Exterior2nd'] = le1.fit_transform(testing[['Exterior2nd']])
        testthedata['ExterQual'] = le1.fit_transform(testthedata[['ExterQual']])
        #testthedata['ExterCond'] = le1.fit_transform(testthedata[['ExterCond']])
        testthedata['Foundation'] = le1.fit_transform(testthedata[['Foundation']])
        testthedata['Heating'] = le1.fit_transform(testthedata[['Heating']])
        #testthedata['HeatingQC'] = le1.fit_transform(testthedata[['HeatingQC']])
        #testing['KitchenQual'] = le1.fit_transform(testing[['KitchenQual']])
        #testing['Functional'] = le1.fit_transform(testing[['Functional']])
        testthedata['PavedDrive'] = le1.fit_transform(testthedata[['PavedDrive']])
        #testing['SaleType'] = le1.fit_transform(testing[['SaleType']])
        testthedata['SaleCondition'] = le1.fit_transform(testthedata[['SaleCondition']])

        X['MSZoning'] = pd.to_numeric(X['MSZoning'])
        ohe = OneHotEncoder(categorical_features = [1])
        X = ohe.fit_transform(X).toarray()

        for this code, the error was coming how to rectify it, sir,

        File “”, line 1, in
        X = ohe.fit_transform(X).toarray()

        File “/Users/p.venkatesh/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/”, line 629, in fit_transform
        self._categorical_features, copy=True)

        File “/Users/p.venkatesh/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/”, line 45, in _transform_selected
        X = check_array(X, accept_sparse=’csc’, copy=copy, dtype=FLOAT_DTYPES)

        File “/Users/p.venkatesh/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/”, line 496, in check_array
        array = np.asarray(array, dtype=dtype, order=order)

        File “/Users/p.venkatesh/opt/anaconda3/lib/python3.7/site-packages/numpy/core/”, line 85, in asarray
        return array(a, dtype, copy=False, order=order)

        ValueError: could not convert string to float: ‘Y’

  2. Paul June 30, 2016 at 2:28 am #

    Hi Jason,

    Great tutorial(s) they have been very helpful as a crash course for me so far.

    Is there a way to have the model output the estimated Ys in this example? I would like to evaluate the model a little more directly while I’m still learning Keras.


    • Jason Brownlee June 30, 2016 at 6:48 am #

      Hi Paul, you can make predictions by calling model.predict()

    • Rahul November 22, 2016 at 7:23 pm #

      Hey Paul,
      How are you inserting the function model.predict() in the above code to run in on test data? Please let me know.

      • DataScientistPM May 9, 2017 at 11:23 pm #


        Is this how you insert predict and then get predictions in the model?

        def mymodel():
            model = Sequential()
            model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
            model.add(Dense(6, kernel_initializer='normal', activation='relu'))
            model.add(Dense(1, kernel_initializer='normal'))
            model.compile(loss='mean_squared_error', optimizer='adam')
            model.fit(X, y, nb_epoch=50, batch_size=5)
            predictions = model.predict(X)
            return model

        I actually want to write the predictions in a file?

  3. Chris July 23, 2016 at 6:24 am #

    Hi, Great post thank you, Could you please give a sample on how to use Keras LSTM layer for considering time impact on this dataset ?

  4. Marc Huertas-Company July 28, 2016 at 4:50 am #


    Thanks for the tutorial. I have a regression problem with bounded outputs (0-1). Is there an optimal way to deal with this?

    • Jason Brownlee July 28, 2016 at 5:49 am #

      Hi Marc, I think a linear activation function on the output layer will be just fine.

  5. James August 5, 2016 at 6:50 am #

    This is a good example. However, it is not relevant to Neural networks when over-fitting is considered. The validation process should be included inside the fit() function to monitor over-fitting status. Moreover, early stopping can be used based on the internal validation step. This example is only applicable for large data compared to the number of all weights of input and hidden nodes.

    • Jason Brownlee August 5, 2016 at 8:04 am #

      Great feedback, thanks James I agree.

      It is intended as a good example to show how to develop a net for regression, but the dataset is indeed a bit small.

      • Amir October 24, 2016 at 11:08 am #

        Thanks Jason and James! A few questions (and also how to implement in python):
        1) How can we monitor the over-fitting status in deep learning
        2) how can we include the cross-validation process inside the fit() function to monitor the over-fitting status
        3) How can we use early stopping based on the internal validation step
        4) Why is this example only applicable for a large data set? What should we do if the data set is small?

        • Jason Brownlee October 25, 2016 at 8:21 am #

          Great questions Amir!

          1. Monitor the performance of the model on the training and a standalone validation dataset. (even plot these learning curves). When skill on the validation set goes down and skill on training goes up or keeps going up, you are overlearning.
          2. Cross validation is just a method for estimating the performance of a model on unseen data. It wraps everything you are doing to prepare data and your model, it does not go inside fit.
          3. Monitor skill on a validation dataset as in 1, when skill stops improving on the validation set, stop training.
          4. Generally, neural nets need a lot more data to train than other methods.

          Here’s a tutorial on checkpointing that you can use to save “early stopped” models:

  6. Salem August 5, 2016 at 2:44 pm #

    How can one predict a new data point with a model when the training data was standardised using sklearn while building the model?

  7. Guy August 25, 2016 at 10:52 am #


    I am not using the automatic data normalization as you show, but simply compute the mean and stdev for each feature (data column) in my training data and manually perform zscore ((data – mean) / stdev). By normalization I mean bringing the data to 0-mean, 1-stdev. I know there are several names for this process but let’s call it “normalization” for the sake of this argument.

    So I’ve got 2 questions:

    1) Should I also normalize the output column? Or just leave it as it is in my train/test?

    2) I take the mean, stdev for my training data and use them to normalize the test data. But it seems that doesn’t center my data; no matter how I split the data, and no matter that each mini-batch is balanced (has the same distribution of output values). What am I missing / what can I do?

    • Jason Brownlee August 26, 2016 at 10:30 am #

      Hi Guy, yeah this is normally called standardization.

      Generally, you can get good results from applying the same transform to the output column. Try and see how it affects your results. If MSE or RMSE is the performance measure, you may need to be careful with the interpretation of the results as the scale of these scores will also change.

      Yep, this is a common problem. Ideally, you want a very large training dataset to effectively estimate these values. You could try using bootstrap on the training dataset (or within a fold of cross validation) to create a more robust estimate of these terms. Bootstrap is just the repeated subsampling of your dataset and estimation of the statistical quantities, then take the mean from all the estimates. It works quite well.

      I hope that helps.

  8. Pranith Kumar Pola September 2, 2016 at 3:52 am #

    Hello Jason,

    How should i load multiple finger print images into keras.

    Can you please advise further.

    Best Regards,

  9. Luciano September 10, 2016 at 3:32 am #

    Hi Jason, great tutorial. The best out there for free.

    Can I use R² as my metric? If so, how?


  10. sumon October 1, 2016 at 2:38 am #

    shouldn’t results.mean() print accuracy instead of error?

    • Jason Brownlee October 1, 2016 at 8:03 am #

      We summarize error for regression problems instead of accuracy (x/y correct). I hope that helps.

  11. David October 19, 2016 at 7:34 pm #


    if I have a new dataset, X_new, and I want to make a prediction, the model.predict(X_new) shows the error ”NameError: name model is not defined’ and estimator.predict(X_test) shows the error message ‘KerasRegressor object has no attribute model’.

    Do you have any suggestion? Thanks.

  12. Avhirup October 22, 2016 at 11:19 pm #

    I’m getting more error by standardizing dataset using the same seed.What must be the reason behind it?

    • Avhirup October 22, 2016 at 11:25 pm #

      also deeper network topology seems not to help .It increases the MSE

      • Avhirup October 22, 2016 at 11:32 pm #

        deeper network without standardisation gives better results.Somehow standardisation is adding more noise

  13. Michele Vascellari November 2, 2016 at 9:28 pm #

    Hey great tutorial. I tried to use both Theano and Tensorflow backend, but I obtained very different results for the larger_model. With Theano I obtained results very similar to you, but with Tensorflow I have MSE larger than 100.

    Do you have any clue?


    • Jason Brownlee November 3, 2016 at 7:59 am #

      Great question Michele,

      Off the cuff, I would think it is probably the reproducibility problems we are seeing with Python deep learning stack. It seems near impossible to tie down the random number generators used to get repeatable results.

      I would not rule out a bug in one implementation or another, but I would find this very surprising for such a simple network.

  14. Kenny November 7, 2016 at 4:21 pm #

    hi, i have a question about sklearn interface.
    although we sent the NN model to sklearn and evaluate the regression performance, how can we get the exactly predictions of the input data X, like usually when we r using Keras we can call the model.predict(X) function in keras. btw, I mean the model is in sklearn right?

    • Jason Brownlee November 8, 2016 at 9:49 am #

      Hi Kenny,

      You can use the sklearn model.predict() function in the same way to make predictions on new input data.

      • Silvan Mühlemann November 23, 2016 at 6:48 am #

        Hi Jason

        I bought the book “Deep Learning with Python”. Thanks for your great work!

        I see the question about “model.predict()” quite often. I have it as well. In the code above “model” is undefined. So what variable contains the trained model? I tried “estimator.predict()” but there I get the following error:

        > ‘KerasRegressor’ object has no attribute ‘model’

        I think it would help many readers

        • Jason Brownlee November 23, 2016 at 9:06 am #

          Thanks for your support Silvan.

          With a keras model, you can train the model, assign it to a variable and call model.predict(). See this post:

          In the above example, we use a pipeline, which is also a sklearn Estimator. We can call estimator.predict() directly (same function name, different API), more here:

          Does that help?

          • Dee November 24, 2016 at 9:18 am #

            Hey Jason,

            Is there anyway for you to provide a direct example of using the model.predict() for the example shown in this post? I’ve been following your posts for a couple months now and have gotten much more comfortable with Keras. However, I still cannot seem to be able to use .predict() on this example.


          • Jason Brownlee November 24, 2016 at 10:44 am #

            Hi Dee,

            There is info on the predict function here:

            There’s an example of calling predict in this post:

            Does that help?

          • Silvan Mühlemann November 25, 2016 at 7:20 am #

            Hi Dee

            Jason, correct me if I am wrong: If I understand correctly the sample above does *not* provide a trained model as output. So you won’t be able to use the .predict() function immediately.

            Instead you have to train the pipeline:


            Then only you can do predictions:

            pipeline.predict(numpy.array([[ 0.0273, 0. , 7.07 , 0. , 0.469 , 6.421 ,
            78.9 , 4.9671, 2. , 242. , 17.8 , 396.9 ,
            9.14 ]]))

            # will return array(22.125564575195312, dtype=float32)

          • Jason Brownlee November 25, 2016 at 9:34 am #

            Yes, thanks for the correction.

            Sorry, for the confusion.

          • Dee November 28, 2016 at 12:07 pm #

            Hey Silvan,

            Thanks for the tip! I had a feeling that the crossval from SciKit did not output the fitted model but just the RMSE or MSE of the crossval cost function.

            I’ll give it a go with the .fit()!
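
The difference between scoring and fitting discussed above can be sketched as follows. This is a minimal sketch, assuming scikit-learn is installed; the random data and the LinearRegression step are stand-ins chosen here so the example runs without Keras, and are not part of the tutorial.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 13))                       # hypothetical inputs
Y = X @ rng.normal(size=13) + rng.normal(size=100)   # hypothetical target

pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("model", LinearRegression()),   # stand-in for the KerasRegressor step
])

# Cross validation fits clones of the pipeline and returns only the scores.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(pipeline, X, Y, cv=kfold, scoring="neg_mean_squared_error")

# To make predictions, fit the pipeline itself on the data first.
pipeline.fit(X, Y)
prediction = pipeline.predict(X[:1])
```

After cross_val_score returns, the pipeline object passed in is still unfitted, which is why the explicit fit call is needed before predict.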


        • Sud March 17, 2017 at 1:03 am #

          Hi Jason & Silvan,

          Could you please tell me whether I have put “pipeline.fit(X,Y)” in the correct position?
          Please correct me if I am wrong.

          estimators = []
          estimators.append(('standardize', StandardScaler()))
          estimators.append(('mlp', KerasRegressor(build_fn=larger_model, nb_epoch=50, batch_size=5, verbose=0)))
          pipeline = Pipeline(estimators)
          kfold = KFold(n_splits=10, random_state=seed)
          results = cross_val_score(pipeline, X, Y, cv=kfold)
          print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

          Thank you!

          • Jason Brownlee March 17, 2017 at 8:29 am #

            pipeline.fit(X,Y) is not needed as you are evaluating the pipeline using k-fold cross validation.

  15. Rahul November 18, 2016 at 3:28 pm #

    Dear Jason,
    I have a few questions. I am running the wider neural network on a dataset that corresponds to modelling with better accuracy the number of people walking in and out of a store. I get Wider: 24.73 (7.64) MSE. <– Can you explain exactly what those values mean?

    Also can you suggest any other method of improving the neural network? Do I have to keep re-iterating and tuning according to different topological methods?

    Also what exact function do you use to predict the new data with no ground truth? Is it the sklearn model.predict(X) where X is the new dataset with one lesser dimension because there is no output? Could you please elaborate and explain in detail. I would be really grateful to you.

    Thank you

    • Jason Brownlee November 19, 2016 at 8:45 am #

      Hi Rahul,

      The model reports on Mean Squared Error (MSE). It reports both the mean and the standard deviation of performance across 10 cross validation folds. This gives an idea of the expected spread in the performance results on new data.

      I would suggest trying different network configurations until you find a setup that performs well on your problem. There are no good rules for net configuration.

      You can use model.predict() to make new predictions. You are correct.
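
To make the two reported numbers concrete: the first is the mean of the per-fold MSE scores and the second, in parentheses, is their standard deviation. A minimal sketch, assuming NumPy is available; the fold scores below are made up for illustration and are not results from this tutorial.

```python
import numpy as np

# Hypothetical per-fold MSE values from a 10-fold cross validation run.
fold_mse = np.array([18.2, 21.5, 30.1, 16.4, 35.9, 24.0, 19.8, 28.7, 22.3, 30.4])

mean_mse = fold_mse.mean()   # average error across the folds
std_mse = fold_mse.std()     # spread of the error across the folds

print("Wider: %.2f (%.2f) MSE" % (mean_mse, std_mse))
```

A large standard deviation relative to the mean suggests the score depends heavily on which rows land in the held-out fold.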

      • Rishabh Agrawal September 16, 2017 at 1:24 pm #

        Hey! Jason.

        Great work on machine learning. I have learned everything from here.

        One question.
        When we say that we have to train the model first and then predict, are we trying to determine what no. of layers and what no. of neurons, along with other Keras attributes, to get the best fit…and then use the same attributes on prediction dataset?

        Bottom line: are we trying to determine what keras attributes fits our model while we are training the model?

        • Jason Brownlee September 17, 2017 at 5:23 am #

          Generally, we want a model that makes good predictions on new data where we don’t know the answer.

          We evaluate different models and model configurations on test data to get an idea of how the models will perform when making predictions on new data, so that we can pick one or a few that we think will work well.

          Does that help?

  16. Kim December 31, 2016 at 5:45 pm #

    Hi Jason,

    Thank you for the great tutorial.
    I redid the code on an Ubuntu machine and ran it on a TITAN X GPU. While I get similar results for the experiment in section 4.1, my results in section 4.2 are different from yours:

    Larger: 103.31 (236.28) MSE

    no_epoch is 50 and batch_size is 5.

  17. A. Batuhan D. January 20, 2017 at 8:43 pm #

    Hi Jason,

    Thanks for sharing these useful tutorials. Two questions:

    1) If the regression model calculates the error and returns it as the result (no doubt about this), then what are those ‘accuracy’ values printed for each epoch when ‘verbose=1’?

    2) With those predicted values (fit.predict() or cross_val_predict), is it meaningful to find the closest value(s) to predicted result and calculate an accuracy? (This way, more than one accuracy can be calculated: accuracy for closest 2, closest 3, …)

    • Jason Brownlee January 21, 2017 at 10:28 am #

      Hi A. Batuhan D.,

      1. You cannot print accuracy for a regression problem, it does not make sense. It would be loss or error.
      2. Again, accuracy does not make sense for regression. It sounds like you are describing an instance based regression model like kNN?
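
A small illustration of why accuracy is not informative for regression, using made-up values (not taken from the dataset): an exact-match accuracy is zero even when every prediction is close, while an error metric shows how far off the predictions are.

```python
# Hypothetical true and predicted house prices (thousands of dollars).
y_true = [24.0, 21.6, 34.7, 33.4]
y_pred = [23.1, 22.0, 33.9, 35.0]

# Exact-match "accuracy": continuous predictions almost never equal the target.
accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

# Error metrics measure how far off the predictions are.
mae = sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)
mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

print("accuracy=%.2f, MAE=%.3f, MSE=%.4f" % (accuracy, mae, mse))
```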

      • A. Batuhan D. January 23, 2017 at 7:36 pm #

        Hi jason,

        1. I know, it doesn’t make any sense to calculate accuracy for a regression problem but when using Keras library and set verbose=1, function prints accuracy values also alongside with loss values. I’d like to ask the reason of this situation. It is confusing. In your example, verbose parameter is set to 0.

        2. What i do is to calculate some vectors. As input, i’m using vectors (say embedded word vectors of a phrase) and trying to calculate a vector (next word prediction) as an output (may not belong to any known vector in dictionary and probably not). Afterwards, i’m searching the closest vector in dictionary to one calculated by network by cosine distance approach. Counting model predicted vectors who are most similar to the true words vector (say next words vector) than others in dictionary may lead to a reasonable accuracy in my opinion. That’s a brief summary of what i do. I think that it is not related to instance based regression models.


        • Jason Brownlee January 24, 2017 at 11:03 am #

          That is very odd that accuracy is printed for a regression problem. I have not seen it, perhaps it’s a new bug in Keras?

          Are you able to paste a short code + output example?

  18. Partha January 24, 2017 at 7:08 am #

    I tried this tutorial – but it crashes with the following:
    Traceback (most recent call last):
    File “”, line 132, in
    results = cross_val_score(estimator, X, Y, cv=kfold)
    File “C:\Python27\lib\site-packages\sklearn\model_selection\”, line 140, in cross_val_score
    for train, test in cv_iter)
    File “C:\Python27\lib\site-packages\sklearn\externals\joblib\”, line 758, in __call__
    while self.dispatch_one_batch(iterator):
    File “C:\Python27\lib\site-packages\sklearn\externals\joblib\”, line 603, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    File “C:\Python27\lib\site-packages\sklearn\externals\joblib\”, line 127, in __init__
    self.items = list(iterator_slice)
    File “C:\Python27\lib\site-packages\sklearn\model_selection\”, line 140, in
    for train, test in cv_iter)
    File “C:\Python27\lib\site-packages\sklearn\”, line 67, in clone
    new_object_params = estimator.get_params(deep=False)
    TypeError: get_params() got an unexpected keyword argument ‘deep’

    Some one else also got this same error and posted a question on StackOverflow.

    Any help is appreciated.

    • Jason Brownlee January 24, 2017 at 11:07 am #

      Sorry to hear that.

      What versions of sklearn, Keras and tensorflow or theano are you using?

      • David January 25, 2017 at 12:23 am #

        I have the same problem after an update to Keras 1.2.1. In my case: theano is 0.8.2 and sklearn is 0.18.1.

        I could be wrong, but this could be a problem with the latest version of Keras…

        • David January 25, 2017 at 3:01 am #

          Ok, I think I have managed to solve the issue. I think the problem is clashes between different versions of the packages. What solves everything is creating an environment. I have posted a solution on Stack Overflow, @Partha, here:

          • Partha January 25, 2017 at 4:31 am #

            My versions are 0.8.2 for theano and 0.18.1 for sklearn and 1.2.1 for keras.
            I did a new anaconda installation on another machine and it worked there.

          • Jason Brownlee January 25, 2017 at 10:08 am #

            Thanks David, I’ll take a look at the post.

          • Jason Brownlee January 25, 2017 at 10:58 am #

            Hi David, I have reproduced the fault and understand the cause.

            The error is caused by a bug in Keras 1.2.1 and I have two candidate fixes for the issue.

            I have written up the problem and fixes here:

        • Jason Brownlee January 25, 2017 at 10:06 am #

          Thanks, I will investigate and attempt to reproduce.

          • David January 25, 2017 at 8:51 pm #

            yes, Jason’s solution is the correct one. My solution works because in the environment the Keras version installed is 1.1.1, not the one with the bug (1.2.1).

  19. Andy January 25, 2017 at 5:05 am #

    Great tutorial, many thanks!

    Just wondering how do you train on a standardised dataset (as per section 3), but produce actual (i.e. NOT standardised) predictions with scikit-learn Pipeline?

    • Jason Brownlee January 25, 2017 at 10:10 am #

      Great question Andy,

      The standardization occurs within the pipeline which can invert the transforms as needed. This is one of the benefits of using the sklearn Pipeline.

  20. AndyS January 25, 2017 at 7:32 am #

    Great tutorial, many thanks!

    How do I recover actual predictions (NOT standardized ones) having fit the pipeline in section 3 with pipeline.fit(X,Y)? I believe pipeline.predict(testX) yields a standardised predictedY?

    I see there is an inverse_transform method for Pipeline, however it appears to be only for reverting a transformed X.

  21. James Bond January 26, 2017 at 1:39 am #

    Thanks for your post.

    I am currently having some problems with a regression problem like the one you present here.

    You seem to normalize both input and output, but what do you do if the output should be used by a different component? Unnormalize it? And if so, wouldn’t the error scale up as well?

    I am currently working on mapping framed audio to MFCC features.
    I tried a lot of different network structures: CNN, multiple layers...

    I just recently tried adding a linear layer at the end, and wow, what an effect: it keeps declining. How come? Do you have any idea?

    • Jason Brownlee January 26, 2017 at 4:47 am #

      Hi James, yes the output must be denormalized (invert any data prep process) before use.

      If the data prep processes are separate, you can keep track of the Python object (or coefficients) and invert the process ad hoc on predictions.
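
Keeping track of the coefficients and inverting the transform by hand could look like the sketch below. It assumes the target was standardized (mean and standard deviation); the values and the "predictions" are made up for illustration.

```python
import numpy as np

y_train = np.array([24.0, 21.6, 34.7, 33.4, 36.2])  # hypothetical targets

# Fit the scaling on the training targets and keep the coefficients.
y_mean, y_std = y_train.mean(), y_train.std()
y_scaled = (y_train - y_mean) / y_std

# Pretend these came from a model trained on the scaled targets.
pred_scaled = np.array([-0.5, 0.0, 1.2])

# Invert the transform to get predictions back in the original units.
pred = pred_scaled * y_std + y_mean
print(pred)
```

Because the transform is linear, an error of e in scaled units corresponds to an error of e * y_std in the original units, so the error is rescaled along with the predictions.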

  22. Sarick January 27, 2017 at 6:59 pm #

    Is there any way to use pipeline but still be able to graph MSE over epochs for kerasregressor?

    • Jason Brownlee January 28, 2017 at 7:35 am #

      Not that I have seen Sarick. If you figure a way, let me know.

  23. Aritra January 28, 2017 at 9:33 pm #

    Can you tell me how to do regression with convolutional neural network?

  24. kono January 29, 2017 at 4:37 pm #

    Hi Jason,

    Could you tell me how to decide batch_size? Is there a rule of thumb for this?

  25. kono January 29, 2017 at 4:53 pm #

    Hi Jason,

    I see some people use fit_generator to train a MLP. Could you tell me when to use fit_generator() and when to use fit()?

  26. Pratik Patil February 2, 2017 at 12:39 am #

    Hi Jason,
    Thank you for the post. I used two of your post this and one on GridSearchCV to get a keras regression workflow with Pipeline.
    My question is how to get weight matrices and bias vectors of keras regressor in a fit, that is on the pipeline.
    (My posts keep getting rejected/disappear, am I breaking some protocol/rule of the site?)

    • Jason Brownlee February 2, 2017 at 1:59 pm #

      Comments are moderated, that is why you do not see them immediately.

      To access the weights, I would recommend training a standalone Keras model rather than using the KerasRegressor and sklearn Pipeline.

  27. Pedro February 18, 2017 at 7:57 am #


    Thank you for the excellent example! As a beginner, it was the best to start with.
    But I have some questions:

    In the wider topology, what does it mean to have more neurons?

    e.g., in my input layer I “receive” 150 dimensions/features (input_dim) and output 250 dimensions (output_dim). What is in those 100 “extra” neurons (that are propagated to the next hidden layers) ?


    • Jason Brownlee February 18, 2017 at 8:47 am #

      Hi Pedro,

      A neuron is a single learning unit. A layer is comprised of neurons.

      The size of the input layer must match the number of input variables. The size of the output layer must match the number of output variables or output classes in the case of classification.

      The number of hidden layers can vary and the number of neurons per hidden layer can vary. This is the art of configuring a neural net for a given problem.

      Does that help?

      • Pedro Fialho February 20, 2017 at 6:27 am #


        In your wider example, the input layer does not match/output the number of input variables/features:

        model.add(Dense(20, input_dim=13, init='normal', activation='relu'))

        so my question is: apart from the 13 input features, what’s in the 7 neurons, output by this (input) layer?

        • Jason Brownlee February 20, 2017 at 9:33 am #

          Hi Pedro, I’m not sure I understand, sorry.

          The example takes as input 13 features. The input layer (input_dim) expects 13 input values. The first hidden layer combines these weighted inputs 20 times or 20 different ways (20 neurons in the layer) and each neuron outputs one value. These are combined into one neuron (poor guy!) which outputs a prediction.

          • Pedro Fialho February 21, 2017 at 9:14 pm #


            Yes, now I understand (I was not confident that the input layer was also a hidden layer). Thank you again

          • Jason Brownlee February 22, 2017 at 10:00 am #

            The input layer is separate from the first hidden layer. The Keras API makes this confusing because both are specified on the same line.
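
The shapes involved in the wider network can be traced with plain NumPy. This is only an illustration of the arithmetic (13 inputs, 20 hidden neurons, 1 output) with random weights; it is not how Keras is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 13))          # one sample with 13 input features

W1 = rng.normal(size=(13, 20))        # weights: 13 inputs -> 20 hidden neurons
b1 = np.zeros(20)
hidden = np.maximum(0, x @ W1 + b1)   # relu activation, shape (1, 20)

W2 = rng.normal(size=(20, 1))         # weights: 20 hidden neurons -> 1 output
b2 = np.zeros(1)
output = hidden @ W2 + b2             # no activation on the output, shape (1, 1)

print(x.shape, hidden.shape, output.shape)
```

The 13 inputs are not neurons with weights of their own; each of the 20 hidden neurons computes its own weighted combination of all 13 inputs.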

  28. Bartosz February 19, 2017 at 11:42 am #

    Hi Jason,

    You’ve said that an activation function is not necessary as we want a numerical value as the output of our network. I’ve been looking at recurrent networks and in particular this guide: . It recommended using an identity activation function at the output. I was wondering whether there is any difference between your approach, using Dense(1) as the output layer, and adding an identity activation function at the output of the network: Activation(‘linear’)? Are there any situations when I should use the identity activation layer? Could you elaborate on this?

    In case of this tutorial the network would look like this with the identity function:
    model = Sequential()
    model.add(Dense(13, input_dim=13, init='normal', activation='relu'))
    model.add(Dense(6, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))


    • Jason Brownlee February 20, 2017 at 9:25 am #

      Indeed, the example uses a linear activation function by default.

  29. Dan March 18, 2017 at 7:23 am #

    Hi Jason,
    my current understanding is that we want to fit + transform the scaling only on our training set and transform without fit on the test set. If we use the pipeline in the CV like you did, is it ensured that for each CV fold the scaling is fit only on the 9 training folds and that the test fold is only transformed, without a fit?

    Thanks very much

    • Jason Brownlee March 18, 2017 at 7:55 am #

      Top question.

      The Pipeline does this for us. It is fit then applied to the training set each CV fold, then the fit transforms are applied to the test set to evaluate the model on the fold. It’s a great automatic pattern built into sklearn.
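
What happens inside each fold can be written out by hand. The sketch below uses plain NumPy and random data as an illustration of the pattern (fit the scaling on the training rows, apply it to both splits); it is not the scikit-learn implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(loc=50.0, scale=10.0, size=(20, 3))  # hypothetical data

n_folds = 5
folds = np.array_split(np.arange(len(X)), n_folds)

for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])

    # Fit the scaler on the training rows of this fold only.
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)

    # Apply the same fitted coefficients to both splits.
    X_train = (X[train_idx] - mean) / std
    X_test = (X[test_idx] - mean) / std
```

The test rows never influence the mean and standard deviation, so no information leaks from the held-out fold into the data preparation.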

  30. Paula March 21, 2017 at 11:46 pm #

    Hi! I ran your code with your data and we got a different MSE. Should I be concerned? Thanks for help!

  31. Annanya March 29, 2017 at 4:23 am #

    Hi Jason

    while running this above code i found the error as

    Y = dataset[:,25]
    IndexError: index 25 is out of bounds for axis 1 with size 1

    i had declared X and Y as

    X = dataset[:,0:25]
    Y = dataset[:,25]

    help me for solving this

  32. Sagar March 29, 2017 at 10:54 am #

    Hi Jason, Thanks for your great article !

    I am working with same problem [No of samples: 460000 , No of Features:8 ] but my target column output has too big values like in between 20000 to 90000 !

    I tried different NN architecture [ larger to small ] with different batch size and epoch but still not getting good accuracy !

    should i have to normalize my target column ? Please help me for solving this !

    Thanks for your time !

    • Jason Brownlee March 30, 2017 at 8:45 am #

      Yes, you must rescale your input and output data.

      • Sagar March 31, 2017 at 4:22 pm #

        Hi Jason, Thanks for your reply !

        Yes i tried different ways to rescale my data using

        url but i still i only got 20% accuracy !

        I tried different NN topology with different batch size and epoch but not getting good results !

        My code :

        inputFilePath = "path-to-input-file"

        dataframe = pandas.read_csv(inputFilePath, sep="\t", header=None)
        dataset = dataframe._values

        # split into input (X) and output (Y) variables
        X = dataset[:,0:8]
        Y = dataset[:,8]

        scaler = StandardScaler().fit(X)
        X = scaler.fit_transform(X)

        maxnumber = max(Y) #Max number i got is : 79882.0

        Y=Y / maxnumber

        # create model
        model = Sequential()
        model.add(Dense(100, input_dim=8, init='normal', activation='relu'))
        model.add(Dense(100, init='normal', activation='relu'))
        model.add(Dense(80, init='normal', activation='relu'))
        model.add(Dense(40, init='normal', activation='relu'))
        model.add(Dense(20, init='normal', activation='relu'))
        model.add(Dense(8, init='normal', activation='relu'))
        model.add(Dense(6, init='normal', activation='relu'))
        model.add(Dense(6, init='normal', activation='relu'))
        model.add(Dense(1, init='normal',activation='relu'))

        model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
        # checkpoint
        model.fit(X, Y, nb_epoch=100, batch_size=400)
        # 4. evaluate the network
        loss, accuracy = model.evaluate(X, Y)
        print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

        I tried MSE and MAE in loss with adam and rmsprop optimizer but still not getting accuracy !

        Please help me ! Thanks

        • Jason Brownlee April 1, 2017 at 5:51 am #

          100 epochs will not be enough for such a deep network. It might need millions.

          • sagar April 6, 2017 at 11:29 pm #

            Hello Jason, Thanks for your reply !

            How can i ensure that i will get output after millions of epoch because after 10000 epoch accuracy is still 0.2378 !

            How can i dynamically decide the number of layers and Neurons size in my neural network ? Is there any way ?

            I already used the neural network checkpoint mechanism to ensure its accuracy on the validation split!
            My code looks like

            model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])

            checkpoint = ModelCheckpoint(save_file_path, monitor='val_acc', verbose=1, save_best_only=True, mode='max')

            callbacks_list = [checkpoint]

  , Y_Output_Vector,validation_split=0.33, nb_epoch=1000000, batch_size=1300, callbacks=callbacks_list, verbose=0)

            Let me know if i miss something !

          • Jason Brownlee April 9, 2017 at 2:43 pm #

            Looks good.

            There are neural net growing and pruning algorithms but I do not have tutorials sorry.

            See the book: Neural Smithing

  33. Charlotte March 30, 2017 at 8:58 am #

    Hi Jason,

    Thanks for this great tutorial.

    I do believe that there is a small mistake. When giving the number of epochs as a parameter, the documentation shows that it should be given as:
    estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0).

    When giving:
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)
    the function doesn’t recognise the argument and just ignores it.

    Can you confirm?

    I’m using your ‘How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras’ tutorial and have trouble tuning the number of epochs. If I checked one of the results of the GridSearchCv with a simple cross validation with the same number of folds I don’t obtain the same results at all. There might be a similar mistake there?

    Thank you for your time!

    • Jason Brownlee March 30, 2017 at 9:01 am #

      You can pass through any parameters you wish:

      You will get different results on each run because neural network behavior is stochastic. this post will help:

      • Charlotte March 30, 2017 at 9:14 am #

        The documentation specifies that the number of epochs should be given as epochs=n and not nb_epoch=n. When giving the latter, the function will ignore the argument. As an example:

        estimators = []
        estimators.append(('standardize', StandardScaler()))
        estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch='hi', batch_size=50, verbose=0)))
        pipeline = Pipeline(estimators)
        kfold = KFold(n_splits=10, random_state=seed)
        results = cross_val_score(pipeline, X1, Y, cv=kfold)
        print("Standardized: %.5f (%.2f) MSE" % (results.mean(), results.std()))

        will not raise any error.

        Am I missing something?

        The results I get are strongly different and I don’t think that this can be due to the stochasticity of the NN behaviour.

        • Jason Brownlee March 31, 2017 at 5:49 am #

          Thanks Charlotte, that looks like a recent change for Keras 2.0. I will update the examples soon.

        • Caleb Everett April 25, 2017 at 7:50 am #

          Thank you!
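
The behaviour described in this thread (an unrecognised keyword being dropped without an error) can be illustrated with a toy function. This is a hypothetical stand-in written for this note, not the Keras wrapper code, and the default values in it are assumptions.

```python
def fit(epochs=1, batch_size=32, **kwargs):
    # Toy stand-in, not the Keras implementation: unknown keyword
    # arguments land in kwargs and are never looked at.
    return {"epochs": epochs, "batch_size": batch_size}

old_style = fit(nb_epoch=100, batch_size=5)   # name not in the signature
new_style = fit(epochs=100, batch_size=5)     # name matches the signature

print(old_style)
print(new_style)
```

In the toy version, the old-style call silently trains with the default number of epochs, which would explain results that differ far more than run-to-run randomness.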

  34. Jens April 16, 2017 at 7:59 am #

    Hey Jason,

    I tried the first part and got a different result for the baseline.
    I figured that the
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)

    is not working as expected for me as it takes the default epoch of 10. When I change it to epochs=100 it works.

    I just read the above comment, it seems like they changed that in the API

  35. Martin April 19, 2017 at 11:35 pm #

    Hi Jason,

    how can i get regression coefficients?

  36. Luca April 27, 2017 at 12:34 am #

    Dear Jason,

    Thanks for your tutorials!!
    I made it work in a particle physics example I’m working on, and I have 2 questions.
    1) Imagine my target is T=a/b (T=true_value/reco_value). If I give the regression both “a” and “b” as features, then it should be able to find exactly the correct solution every time, right? Or is there some procedure that tries to avoid overtraining and does not allow a result that is 100% precise? I ask because I tried, and I got “good” performance, but not optimal as I would expect (if it has “a” and “b” it should be able to find the correct T in the test too, at 100%). If I remove b from the regression and add other features, then y_hat/y_test peaks at 0.75, meaning that the regression is biased. Could you help me understand these two facts?
    2) I want to save the regression in order to use it later. After the training I do: a) estimator.model.save_weights and b) open(‘models/’+model_name, ‘w’).write(estimator.model.to_json()).
    Estimator is “estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=50, verbose=1)”. How can I later use those 2 files to directly make predictions?

    Thanks a lot,

    • Jason Brownlee April 27, 2017 at 8:42 am #

      Sorry, I’m not sure I follow your first question, perhaps you can restate it briefly?

      See this post on saving and loading keras models:

      • Luca April 28, 2017 at 1:30 am #

        Hi Jason,
        my point is the following. The regression is trained on a set of features (a set of floats), and it provides a single output (a float), the target. During the training the regression learn how to guess the target as a function of the features.
        Of course the target should not be function of the features, otherwise the problem is trivial, but I tried to test this scenario as an initial check. What I did (as a test) is to define a target that is division of 2 features, i.e. I’m giving to the regression “a” and “b”, and I’m saying that the target to find is a/b. In that simple case, the regression should be smart enough to understand during the training that my target is simply a/b. So in the test it should be able to find the correct value with 100% precision, i.e. dividing the 2 features. What I found is that in the test the regression find a value (y_hat) that is close to a/b, but not exactly a/b. So I was wondering why the regression is behaving like that.


        • Jason Brownlee April 28, 2017 at 7:49 am #

          This is a great question.

          At best machine learning can approximate a function, some approximations are better than others.

          That is the best that I can answer it.

  37. Ignacio April 27, 2017 at 12:36 am #

    Hi Jason,

    thanks for your posts, I really enjoy them. I have a quick question: If I want to use sklearn’s GridSearchCV and :
    in my model, will the highest score correspond to the combination with the *highest* mse?
    If that’s the case I assume there is a way to invert the scoring in GridSearchCV?

    • Jason Brownlee April 27, 2017 at 8:43 am #

      When using MSE you will want to find the config that results in the lowest error, e.g. lowest mean squared error.
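
On inverting the score: scikit-learn's model selection tools treat a higher score as better, which is why its built-in scoring string for this metric is "neg_mean_squared_error". The idea can be sketched without any library; the configurations and MSE values below are made up.

```python
# Hypothetical mean MSE per configuration from a grid search.
mse_by_config = {
    "epochs=50, batch_size=5": 28.4,
    "epochs=100, batch_size=5": 22.9,
    "epochs=100, batch_size=20": 25.1,
}

# Negating the error turns "lowest error" into "highest score".
score_by_config = {cfg: -mse for cfg, mse in mse_by_config.items()}

best = max(score_by_config, key=score_by_config.get)
print(best, mse_by_config[best])
```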

  38. Navdeep May 2, 2017 at 10:37 pm #

    Dear Jason

    I have datafile with 7 variables, 6 inputs and 1 output

    #from sklearn.cross_validation import train_test_split
    #rain, test = train_test_split(data2, train_size = 0.8)

    #train_y= train['Average RT']
    #train_x= train[train.columns.difference(['Average RT'])]

    ##test_y= test['Average RT']
    #est_x= test[test.columns.difference(['Average RT'])]

    x=data2[data2.columns.difference(['Average RT'])]
    y=data2['Average RT']

    print x.shape
    print y.shape

    (1035, 6)
    # define base model
    def baseline_model():
        # create model
        model = Sequential()
        model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        # Compile model
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    # fix random seed for reproducibility
    #seed = 7
    # evaluate model with standardized dataset
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)

    kfold = KFold(n_splits=5, random_state=seed)
    results = cross_val_score(estimator, x,y, cv=kfold)
    print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

    but getting error below
    ValueError: Error when checking input: expected dense_15_input to have shape (None, 7) but got array with shape (828, 6)

    Also i tried changing

    model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, input_dim=6, kernel_initializer='normal', activation='relu'))

    because total i have 7 variables out of which 6 are input, 7th Average RT is output

    Could you help, please?

    There is also a non-linear relationship between the output and the inputs, and I am trying to use a Keras neural network to model that non-linear relationship.

    • Jason Brownlee May 3, 2017 at 7:39 am #

      If you have 6 inputs and 1 output, you will have 7 columns.

      You can separate your data as:

      Then, you can configure the input layer of your neural net to expect 6 inputs by setting the “input_dim” to 6.

      Does that help?
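
Separating 6 input columns from 1 output column could look like the sketch below. It assumes the data is a NumPy array with the output in the last column; the array itself is a placeholder with the same number of rows as reported above.

```python
import numpy as np

# Hypothetical array: 1035 rows, 6 input columns followed by 1 output column.
data = np.zeros((1035, 7))

X = data[:, 0:6]   # all rows, columns 0..5 (the inputs)
y = data[:, 6]     # all rows, column 6 (the output)

print(X.shape, y.shape)
```

With X shaped this way, the first layer would be configured with input_dim=6 to match the number of input columns.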

      • Aghiles June 12, 2017 at 1:40 am #

        Dear Jason

        and if I have 2 output, can I write

        y = data[:, 0:6]
        y = data[:, 6:7]


        • Jason Brownlee June 12, 2017 at 7:11 am #

          Not quite.

          You can retrieve the 2 columns from your matrix and assign them to y so that y is now 2 columns and n rows.

          Perhaps get more comfortable with numpy array slicing first?
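
For two output columns, the slicing could be sketched as follows, assuming 6 inputs followed by 2 outputs in a NumPy array (the array contents are placeholders).

```python
import numpy as np

data = np.arange(24, dtype=float).reshape(3, 8)  # 3 rows, 8 columns

X = data[:, 0:6]   # first 6 columns as inputs
y = data[:, 6:8]   # last 2 columns as outputs, shape (n rows, 2)

print(X.shape, y.shape)
```

The output layer of the network would then need 2 neurons to match the 2 output columns.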

  39. amit kumar May 10, 2017 at 4:23 am #

    Sir, please give me code to calculate cost estimation using the backpropagation technique with a sigmoid activation function.

  40. Francis May 11, 2017 at 5:22 pm #

    Hi Jason,

    I’m new in deep learning and thanks for this impressive tutorial. However, I have an important question about deep learning methods:

    How can we interpret these features just like lasso or other feature selection methods?

    In my project, I have about 20000 features and I want to selected or ranking these features using deep learning methods. How can we do this?

    Thank you!

  41. Alogomining May 15, 2017 at 1:02 pm #

    Thanks a lot for this post.
    Is there a way to implement a Tweedie regression in this framework?

    • Jason Brownlee May 16, 2017 at 8:33 am #

      Sorry, I have not heard of “tweedie regression”.

  42. Inge May 23, 2017 at 10:41 pm #


    Thank you for the sharing.
    I met a problem, and do not know how to deal with it.
    When it goes to “results = cross_val_score(estimator, X, Y, cv=kfold)”, I got warnings shown as below:
    C:\Program Files\Anaconda3\lib\site-packages\ipykernel\ UserWarning: Update your Dense call to the Keras 2 API: Dense(13, input_dim=13, kernel_initializer="normal", activation="relu")
    C:\Program Files\Anaconda3\lib\site-packages\ipykernel\ UserWarning: Update your Dense call to the Keras 2 API: Dense(1, kernel_initializer="normal")
    C:\Program Files\Anaconda3\lib\site-packages\keras\backend\ UserWarning: Expected no kwargs, you passed 1
    kwargs passed to function are ignored with Tensorflow backend

    I’ve tried to update Anaconda and its all packages,but cannot fix it.

  43. Huyen May 27, 2017 at 2:41 am #

    Hi Jason,

    I have a classic question about neural networks for regression but I haven’t found any crystal-clear answer. I have seen the very good performance of neural networks on classification for images and so on, but still have doubts about their performance on regression. In fact, I have tested with 2 cases of data, linear and non-linear, with 2 inputs and 1 output with random bias, but the performance was not good in comparison with other classic machine learning methods such as SVM or Gradient Boosting… So for regression, on which kind of data should we apply a neural network? If the data is more complex, will its performance be better?

    Thank you for your answer in advance. Hope you have a good day 🙂

    • Jason Brownlee June 2, 2017 at 11:56 am #

      Deep learning will work well for regression but requires larger/harder problems with lots more data.

      Small problems will be better suited to classical linear or even non-linear methods.

      • Huyen June 7, 2017 at 4:20 pm #

        Thank you Jason,

        Returning to your examples, I have one question about the appropriate number of neurons in each hidden layer and the number of hidden layers in a network. I have read some recommendations, such as that the number of hidden layer neurons should be 2/3 of the size of the input layer, and that the number of neurons should (a) be between the input and output layer size, (b) be set to something near (inputs+outputs) * 2/3, or (c) never be larger than twice the size of the input layer, to prevent overfitting. I doubt these constraints because I haven’t found any mathematical proofs for them.

        With your example, I increase the number of layers to 7 and with each layer, I use a large number of neurons (approximately 300-200) and it gave MSQ to 0.1394 through 5000 epochs. So do you have any conditions about these number when you build a network?

        • Jason Brownlee June 8, 2017 at 7:38 am #

          No, generally neural network configuration is trial and error with a robust test harness.

  44. Ali June 2, 2017 at 4:12 pm #

    Hi Jason. Can I apply regression for autoencoders?

  45. KK June 3, 2017 at 4:42 am #

    Hi Jason

    Thank you for the great tutorial code! I have some questions regarding regularization and the kernel initializer.

    I’d like to add L1/L2 regularization when updating the weights. Where should I put the commands?

    I also have a question about assigning kernel_initializer='normal'. Is it necessary to initialize the kernel as normal?



  46. Kid June 8, 2017 at 6:31 pm #

    Dear Dr.,

    I need your help on how to use a pre-trained Keras-based sequential model for NER with input text.

    For example, if “word1 word2 word3.” is a sentence with three words, how can I convert it to the numpy array that Keras expects, so that I can predict each word’s NE tag from the loaded pre-trained Keras model?

    With regards,

  47. Sayak Paul June 15, 2017 at 4:56 am #

    I am getting a rate of more than 58 every time.

    Here’s the exact code being used:

    from numpy.random import seed

    import pandas
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline

    # load dataset
    dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
    dataset = dataframe.values
    # split into input (X) and output (Y) variables
    X = dataset[:,0:13]
    Y = dataset[:,13]

    # Basic NN model using Keras

    def baseline_model():
        # create model
        model = Sequential()
        model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        # Compile model
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    #seed = 1
    # evaluate model with standardized dataset
    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)

    #kfold = KFold(n_splits=10, random_state=seed)
    results = cross_val_score(estimator, X, Y, cv=10)
    print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

  48. Hossein June 16, 2017 at 3:53 am #

    Thank you very much,
    Can’t we use a CNN instead of Dense layers? If we want to use a CNN, should we use conv2d or simply conv?
    In regression problems using deep architectures, can we use AlexNet, VGGNet, and the likes just like how we use them with images?
    I would appreciate if you could have an example in this regard as well
    Best Regards

    • Jason Brownlee June 16, 2017 at 8:05 am #

      I would not recommend a CNN for regression. I would recommend an MLP.

      The shape of your input data (1d, 2d, …) will define the type of CNN to use.

      • Jack April 24, 2020 at 11:21 pm #

        Why do you recommend an MLP instead of a CNN?

  49. Jose June 18, 2017 at 1:34 pm #

    Great tutorial! I would like to save the weights that I adjusted in training. How can I do it?

  50. Jacob June 20, 2017 at 11:14 am #

    Thank you very much.
    I have a question.
    Is this tutorial suitable for wind speed prediction?

  51. Roy July 2, 2017 at 3:04 pm #

    Hi, Thank you for the tutorial. Few questions here.

    1. What is the difference between using
    KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)

    and model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, y_test))? AFAIK, when using KerasRegressor we can do CV, while we can’t with model.fit(). Am I right? Will both result in the same MSE etc.?

    2. How do I create a neural network that predicts two continuous outputs using Keras? Here we only predict one output; how about two or more outputs? How do we implement that? (A multi-output regression problem?)

    • Jason Brownlee July 3, 2017 at 5:30 am #

      Correct, using the sklearn wrapper lets us use tools like CV on small models.

      You can have two outputs by changing the number of nodes in the output layer to 2.

      • Roy July 4, 2017 at 2:35 pm #

        Thanks for the reply.

        Does that mean that with sklearn wrapper model and with sklearn) model are able to get the same mse if both are given same train, valid, and test dataset (assume sklearn wrapper only run 1st fold)? Or there are some differences behind the model?

        I read about the Keras Model class (functional API). Is the implementation of the Model class,

        model = Model(inputs=a1, outputs=[output1, output2])

        the same as adding one more node to the output layer? If not, what are the differences?

  52. Nandini July 4, 2017 at 5:17 pm #

    from keras.layers.core import Dense,Activation,Dropout
    from json import load,dump
    from sklearn.metrics import mean_squared_error
    from sklearn.metrics import mean_absolute_error
    from keras.models import Sequential
    from keras2pmml import keras2pmml
    from pyspark import SparkContext,SparkConf
    from pyspark.mllib.linalg import Matrix, Vector
    from elephas.utils.rdd_utils import to_simple_rdd,to_labeled_point
    from elephas import optimizers as elephas_optimizers
    from elephas.spark_model import SparkModel
    from CommonFunctions import DataRead,PMMLGenaration,ModelSave,LoadModel,UpdateDictionary,ModelInfo
    from keras import regularizers
    from sklearn.metrics import r2_score
    from keras.optimizers import SGD
    #from keras.models import model_from_config
    #from keras.utils.generic_utils import get_from_module

    class SNNReg:

    def train(self,sc,xml,data,hdfs_path):
    ## Variable initialization ##
    hActivation = xml.HiddenActivation
    #print hActivation
    nodeList = map(int,(xml.NodeList.split(“,”)))
    #print nodeList
    Accuracy = xml.Accuracy
    #print Accuracy
    lossFn = xml.LossFunction
    #print nodeList
    optimi = xml.Optimizer
    #print optimi
    hCount = int(xml.HiddenNodeCount)
    #print hCount
    inputDim = int(xml.InputDimension)
    opNodes = int(xml.OutputNodes)
    print (‘opNodes’,opNodes)
    nbEpoch = int(xml.NumEpoch)
    batchSize = int(xml.BatchSize)
    #settings default paramerters if not the provided the values for it
    if hActivation==””:
    if lossFn==””:
    if optimi==””:
    if Accuracy==””:
    print “now going to read ”
    X,Y = DataRead(self,xml.NeuralNetCategory,xml.NeuralNetType,data,xml.ColumnNames,xml.TargetColumnName,xml.InputDimension)

    # Creating a sequential model for simple neural network

    model.add(Dense(nodeList[0],input_dim = inputDim,init=’normal’,activation =hActivation ))

    # Creating hidden model nodes based on the hidden layers count
    if hCount > 1:
    for x in range(1,hCount):
    model.add(Dense(nodeList[x],init=’normal’,activation = hActivation))
    # Compile model
    print “model complilation stage”
    model.compile(loss = lossFn, optimizer=optimi)
    rdd =to_simple_rdd(sc,X,Y)
    print rdd
    #adam= elephas_optimizers.Adam()
    adam = elephas_optimizers.Adam()
    #adagrad = elephas_optimizers.Adagrad()
    #adadelta = elephas_optimizers.Adadelta()
    #print (“type of rdd”,type(rdd))
    print “now going to create spark model using elphass”
    # Creating Spark elephas model from the spark model
    print(“no of workers”,int(sc._conf.get(‘’)))

    sparkModel = SparkModel(sc,

    # Train Spark model

    print “now it is going to run train fucntion”
    sparkModel.train(rdd,nb_epoch=nbEpoch, batch_size=batchSize)

    I am trying to implement regression with neural networks using elephas and Keras in Python in a distributed way, but while training I am getting very high loss values. What should I do? Please give me any suggestions for going further.

    • Jason Brownlee July 6, 2017 at 10:12 am #

      Sorry I cannot help with distributing a Keras model.

  53. Foad July 6, 2017 at 9:35 am #

    two small points:
    1. please mention in the text that it is required to have TensorFlow installed
    2. CSV means comma-separated values, but the data in the file is not separated by commas. Not a big deal though.

  54. Timothy Yan July 6, 2017 at 1:28 pm #

    Thank you for the nice tutorial! In the post, you used “relu”, but I was wondering how to customize the activation function?

  55. Don July 8, 2017 at 10:09 am #

    Hi Jason,

    Thanks for the great tutorial!

    What are the advantages of the deep learning library Keras (with, let’s say, TensorFlow as the backend) over the sklearn neural network function MLPRegressor? In both cases, the procedure (input) is very similar, where you have to decide which architecture, activation functions, and solver you want to use.


    • Jason Brownlee July 9, 2017 at 10:50 am #

      Speed of development and size of community.

      • Don July 10, 2017 at 7:43 am #

        Thanks for the quick reply!

        Can you please elaborate a little bit more?

        When you are writing speed of development, can you please give a few practical examples for when it matters or what exactly you mean? When you are writing size of community, do you mean that the Keras/TensorFlow community is larger than the sklearn one? If not, what do you mean?

        In addition, can you please add a few words on the epochs and batch_size parameters? Why is epochs used and not some tolerance, which makes more sense to me? Does it make sense that sometimes when I increase the epochs value, the score decreases?

        Thanks a lot!

        • Jason Brownlee July 11, 2017 at 10:24 am #

          Yes, I believe it is easier/faster to develop models with Keras than other tools currently available.

          I believe the Keras community is active and this is important to having the library stay current and useful. Keras is complementary to sklearn, tensorflow and theano.

          One epoch is one pass through all training samples. One epoch is comprised of one or more batches. One batch involves a pass through one or more samples before updating the network weights.
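To make the epoch and batch arithmetic above concrete, here is a minimal sketch in plain Python. It assumes a fit on all 506 rows of the housing data with batch_size=5 and 100 epochs, one weight update per batch, and a smaller final batch when the sizes do not divide evenly. The helper name updates_per_epoch is hypothetical, not part of Keras.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # One weight update per batch; the last batch may hold fewer samples.
    return math.ceil(n_samples / batch_size)

n_samples = 506   # assumed: every row of housing.csv is used for training
batch_size = 5
epochs = 100

per_epoch = updates_per_epoch(n_samples, batch_size)
total_updates = per_epoch * epochs
print(per_epoch, total_updates)
```

Under cross-validation each fold trains on fewer rows, so the per-epoch count would be somewhat lower than in this sketch.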

  56. Adam July 9, 2017 at 9:35 am #

    I’m a little confused. If you define x as:
    X = dataset[:,0:13]

    then the last column in X is the same as Y. Shouldn’t X be:
    X = dataset[:,0:12]
    and then
    Y = dataset[:,13]

    If you define X to include the outputs, why wouldn’t it just set all the weights for dataset[0:12] to zero then perfectly fit the data since it already knows the answer?

    • Justtestityourselfnexttime July 12, 2017 at 12:55 am #

      > X = [0,1,2,3,4]
      > print(X[0:3])
      [0, 1, 2]

      End index is exclusive.

      • Jason Brownlee July 12, 2017 at 9:48 am #

        Yes. The more questions I get like this, the more I feel I need a post on basic numpy syntax.
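The same point can be checked on a single 14-column row, mirroring the X = dataset[:,0:13] and Y = dataset[:,13] split. This is a small sketch using a plain Python list as a stand-in for one row of the dataset; the variable names are illustrative only.

```python
# One row of a 14-column dataset: columns 0..12 are inputs, column 13 is the target.
row = list(range(14))

inputs = row[0:13]   # the end index 13 is excluded, so this covers columns 0..12
target = row[13]     # the 14th column, which is not part of the slice above

print(len(inputs), inputs[-1], target)
```

NumPy column slicing follows the same half-open convention, so the input slice and the target column do not overlap.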

  57. Nandu July 11, 2017 at 7:54 pm #

    What are the methods to validate a regression model in Keras? Can you please help with that?

  58. Ambika July 11, 2017 at 7:58 pm #

    How can we tell a Keras regression model from a classification model by looking at the code?

    • Jason Brownlee July 12, 2017 at 9:43 am #

      By the choice of activation function on the output layer and the number of nodes.

      Regression will use a linear activation, have one output and likely use an MSE loss function.
      Classification will use a softmax, tanh or sigmoid activation function, have one node per class (or one node for binary classification) and use a log loss function.
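As a rough illustration of the difference described above, the sketch below implements the three output activations in plain Python. The function names and sample inputs are assumptions for the example; in Keras the activation is selected through the activation argument of the output layer.

```python
import math

def linear(z):
    # Regression output: the raw weighted sum is the prediction.
    return z

def sigmoid(z):
    # Binary classification output: a single value squashed into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    # Multi-class output: one score per class, turned into probabilities.
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

regression_output = linear(22.5)
binary_output = sigmoid(0.0)
class_probs = softmax([2.0, 1.0, 0.1])
print(regression_output, binary_output, class_probs)
```

The linear output is unbounded, which suits a house price, while the softmax values sum to 1 and are read as class probabilities.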

  59. FrankLu July 13, 2017 at 11:24 am #

    Thanks for your tutorials; I find them helpful. However, I have a question.
    When I use checkpoint callbacks, the best trained weights are saved as an hd5 file.
    But I can’t load these pre-trained weights, because the estimator does not have the load_weights method that Keras models have. What can I do? Thank you!

  60. Paul July 13, 2017 at 3:20 pm #

    Hello, Jason
    Thanks for the amazing tutorial! I learned a lot from your blogs.
    I have a question about np.random.seed.
    What does the ‘np.random.seed’ actually do?
    You explained that it is for reproducibility above but I didn’t understand what it means..

    Thank you and hope you have a great one!


  61. nandu July 14, 2017 at 7:37 pm #

    Could you suggest the hidden activation functions for regression Neural networks other than relu.

    • Jason Brownlee July 15, 2017 at 9:41 am #

      Yes, sigmoid and tanh were used for decades before relu came along.

      • Nandini July 18, 2017 at 3:04 pm #

        I used tanh in a regression model using Keras and I am not getting good results. You said tanh is also supported for regression; please give me any suggestions.

        • Jason Brownlee July 18, 2017 at 5:02 pm #

          No, use a “linear” activation on the output layer for regression problems.

          • NANDINI July 19, 2017 at 3:14 pm #

            Yes, of course. For the output layer I used only the linear activation function, but I am talking about the hidden activation function. I had used relu; when I use tanh instead, I am not getting good results.

          • Jason Brownlee July 19, 2017 at 4:10 pm #

            I’m sorry to hear that. I have a list of ideas to try in this post:

  62. Wayne Tobin July 15, 2017 at 7:17 pm #

    Hi Jason,

    Now that Keras and TensorFlow are available in R (RStudio), do you have any plans for doing the above tutorial in R? I’ve got your book where you process the Boston Housing dataset using cubist and would love to see/run a direct comparison to get a sense of what improvement is possible.

    • Jason Brownlee July 16, 2017 at 7:58 am #

      Perhaps in the future, thanks for the suggestion Wayne.

  63. Ronald Levine July 19, 2017 at 5:58 am #

    My problem is that everything is hidden in the Pipeline object. How do I pull out the components, such as the model predict method, then to pull out the predicted values to plot against the input values.

    • Jason Brownlee July 19, 2017 at 8:30 am #

      Don’t use the Pipeline and pass data between the objects manually.

  64. nandu July 19, 2017 at 4:32 pm #

    I have trained the Keras model and I need the logic behind model.predict(), i.e. how the values are predicted on test data. I have the logic for predict_classes, but not for predict. Can you please tell me the logic for model.predict?

    def predict_proba(self, X):
        a = X
        for i in range(self._num_layers):
            g = self._activations[i]
            W = self._weights[i]
            b = self._biases[i]
            a = g(np.dot(a, W.T) + b)
        return a

    def predict_classes(self, X):
        probs = self.predict_proba(X)
        return np.argmax(probs, 1)

    predict_classes is for classification.
    I need the predict logic for regression.

    • Jason Brownlee July 20, 2017 at 6:17 am #

      You can use the predict() function:

  65. Mustafa July 26, 2017 at 7:05 am #

    Hi Jason,

    Thanks for the blog. I am trying to use the example for my case where I try to build a model and evaluate it for audio data. I use only spectrum data. Original data are in .wav format.

    However, I am getting an error

    “TypeError: can’t pickle NotImplementedType objects”

    in line results = cross_val_score(pipeline, X, Y, cv=kfold)
    My data is very small, only 5 samples.

    Do you have any idea for this error?


    • Jason Brownlee July 26, 2017 at 8:03 am #

      I would recommend talking to the people from which you got the pickled data.

    • Heringsalat August 9, 2017 at 12:15 am #

      Hello Mustafa,

      How is your pipeline initialized/defined? I had exactly the same error message at a line where I used cross_val_score with the KerasRegressor estimator. When you use something like

      estimator = KerasRegressor(build_fn=myModel, nb_epoch=100, batch_size=5, verbose=0)

      where “myModel” is a Keras model and NOT a function called “myModel” that returns the model after compiling it (as in the tutorial at the beginning), you should get the same pickle error. You can reproduce it with the tutorial code via myModel=baseline_model().

      I hope this is helpful…

      Best regards,

  66. ambika July 26, 2017 at 9:20 pm #

    Why are we calculating error rather than accuracy in a regression problem? Why does accuracy not make sense for regression? Can you please explain it?

    • Jason Brownlee July 27, 2017 at 8:04 am #

      We are not using accuracy. We are calculating error, specifically mean squared error (MSE).

      • ambika July 27, 2017 at 2:39 pm #

        Why are we calculating MSE rather than accuracy, sir?

        • Jason Brownlee July 28, 2017 at 8:27 am #

          Because it is a regression problem and accuracy is only for classification problems.

  67. ambika July 28, 2017 at 3:38 pm #

    When I calculate loss and MSE, I get the same values for regression. Are loss and MSE the same in regression, or different? If they are different, how do they differ? Can you please explain it?

    • Jason Brownlee July 29, 2017 at 8:05 am #

      Loss is the objective minimized by the network. If you use mse as the loss, then you will not need to track mse as a metric as well. They will be the same thing.
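For reference, this is the quantity being discussed, written out as a minimal sketch in plain Python. The sample values are made up for illustration, and the function name mean_squared_error here is a local helper rather than the Keras or scikit-learn function of the same name.

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)

# Made-up house prices in thousands of dollars.
y_true = [24.0, 21.6, 34.7]
y_pred = [25.0, 20.6, 32.7]

mse = mean_squared_error(y_true, y_pred)
print(round(mse, 4))
```

When this same formula is both the loss and the tracked metric, the two reported numbers come from one calculation, which is why they match.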

  68. Masuk July 28, 2017 at 9:28 pm #

    I am trying to train a ppg signal to estimate the heart rate i.e BPM.
    Do you think it is appropriate to follow this structure?
    If not please kindly help me by suggesting better methods.

    Thank You!

    • Jason Brownlee July 29, 2017 at 8:12 am #

      Perhaps. Also consider a time series formulation. Evaluate every framing you can think of.

  69. Paul August 7, 2017 at 1:37 pm #

    Hi Jason! 🙂
    Thank you for great post! 🙂 I have a question about StandardScaler and Normalization.
    What is difference between them? Also, can I use Min Max scaler instead of StandardScaler?

    Thanks in advance.


    • Jason Brownlee August 8, 2017 at 7:42 am #

      Normalization via the MinMaxScaler scales data to the range 0-1. Standardization via the StandardScaler subtracts the mean and divides by the standard deviation to give the distribution a mean of 0 and a standard deviation of 1.

      Standardization is good for Gaussian distributions, normalization is good otherwise.
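The two transforms can be sketched in a few lines of plain Python. This is an illustration of the formulas only, assuming a single feature column and the population standard deviation; the helper names min_max_scale and standardize are hypothetical and are not the scikit-learn classes.

```python
import statistics

def min_max_scale(values):
    # Normalization: map the smallest value to 0 and the largest to 1.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    # Standardization: subtract the mean, divide by the standard deviation.
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population form, assumed here
    return [(v - mean) / std for v in values]

feature = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_scale(feature)
standardized = standardize(feature)
print(normalized)
print(standardized)
```

Normalized values are bounded by the observed minimum and maximum, whereas standardized values are unbounded and centred on zero.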

      • Paul August 9, 2017 at 1:57 pm #

        Ah ha! Thanks for replying me back! 🙂
        I’ll try MinMaxScaler()


        • Jason Brownlee August 10, 2017 at 6:45 am #

          Good luck Paul.

          • Dalila March 9, 2021 at 10:44 pm #

            Hi Jason,
            Thanks for the tutorial it’s really interesting.
            Could you explain a bit further why you used Standardization and not Normalization please ?
            Do the features have a Gaussian distribution ? How do you know if the features have a Gaussian distribution ?
            I am currently working on house price prediction on the Ames Housing dataset: do you recommend that I use Standardization or Normalization?

          • Jason Brownlee March 10, 2021 at 4:40 am #

            You’re welcome.

            Yes, see this:

            Features do not have to have a Gaussian distribution; it is a good idea to only use standardization if the data is Gaussian, though.

  70. Vipul September 3, 2017 at 7:41 pm #

    Hi Jason,

    Again, it’s a very informative blog. I am implementing Keras in R but I couldn’t find a Keras regressor to fit the model. Do you have a workaround for this, or could you please suggest what can be used as an alternative?

  71. James September 6, 2017 at 2:49 am #

    Hey Jason – Thanks for the post.

    I’d love to hear about some other regression models Keras offers and your thoughts on their use-cases.


    • Jason Brownlee September 7, 2017 at 12:46 pm #

      What do you mean James? Do you have an example?

  72. Mohit Jain September 16, 2017 at 5:34 pm #

    Hi David,

    Thanks for the tutorials. These have been very helpful, both on the implementation side and for getting an insight into the possibilities of machine learning in various fields.

    I was trying to run the code in section 2 and came across the following error:

    Traceback (most recent call last):
    File “”, line 48, in
    results = cross_val_score(estimator, X, Y, cv=kfold)
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/”, line 321, in cross_val_score
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/”, line 195, in cross_validate
    for train, test in cv.split(X, y, groups))
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/”, line 779, in __call__
    while self.dispatch_one_batch(iterator):
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/”, line 625, in dispatch_one_batch
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/”, line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/”, line 111, in apply_async
    result = ImmediateResult(func)
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/”, line 332, in __init__
    self.results = batch()
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/”, line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/”, line 437, in _fit_and_score, y_train, **fit_params)
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/keras/wrappers/”, line 137, in fit
    self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
    File “”, line 35, in baseline_model
    model.add(Dense(13, input_dim=13, kernel_initializer=’normal’, activation=’relu’))
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/keras/layers/”, line 686, in __init__
    super(Dense, self).__init__(**kwargs)
    File “/home/mjennet/anaconda2/lib/python2.7/site-packages/keras/engine/”, line 307, in __init__
    assert kwarg in allowed_kwargs, ‘Keyword argument not understood: ‘ + kwarg
    AssertionError: Keyword argument not understood: kernel_initializer

    I tried to get more insight about the problem and came across the problem described by Mr. Partha which seems to be similar as mine and hence checked the version of Keras. The version of keras that I am using is 1.1.1 with tensorflow(1.2.1) as backend. Can you help me with this?

    • Jason Brownlee September 17, 2017 at 5:26 am #

      My name is Jason.

      It looks like you need to update to Keras 2.

      • Mohit Jain November 26, 2017 at 8:30 pm #

        Apologies Mr. Jason

        I tried to upgrade Keras as well as other dependencies but again the same error pops up. I am currently working on Keras 2.1.1 with Numpy 1.13.3 and scipy 1.0.0

        • Jason Brownlee November 27, 2017 at 5:49 am #

          I am surprised as your error suggests an older version of Keras.

          I have a good list of places to get help with Keras here that you could try:

          • Bharath December 11, 2017 at 5:24 pm #

            Hi Jason,

            Thanks for the example. I get the same error too. Keras/Theano/sklearn: 2.1.2/0.90/0.19.1. Mohit, were you able to debug it?

          • Jason Brownlee December 12, 2017 at 5:23 am #

            Sorry to hear that, I normally think it would be a version issue, but you look up to date.

            I don’t have any good ideas, let me know if you learn more?

  73. asyraf September 26, 2017 at 4:58 pm #

    Hello Jason,

    I used the r2 metric on the above code and found that the wider model has a better score than the deeper model. Does this mean the wider model is better than the deeper one? Is the r2 score a good metric to rate a regression model in this case?

  74. tim September 28, 2017 at 12:00 pm #

    Hello Jason, I am using your code from section

    import numpy
    import pandas

    to this section

    kfold = KFold(n_splits=10, random_state=seed)
    results = cross_val_score(estimator, X, Y, cv=kfold)
    print(“Results: %.2f (%.2f) MSE” % (results.mean(), results.std()))

    but mean and std values are always higher than your result

    Results: 60.40 (41.96) MSE

    where is the problem??

  75. tim September 29, 2017 at 1:32 pm #

    The result of cross_val_score

    [ 10.79553125, 7.68724794, 11.24587975, 27.62757629,
    10.6425943 , 8.12384602, 4.93369368, 91.03362441,
    13.37441713, 21.56249909]

    are “Mean square error” ?? or something else??

    If they are MSE,
    can I say this prediction model is very bad??

    • Jason Brownlee September 30, 2017 at 7:35 am #

      The scores are MSE. You could take the sqrt to convert them to RMSE.
      How good a score is depends on the skill of a baseline model on the problem (i.e. scores are relative) and on domain knowledge (i.e. their interpretation).

  76. Gabby October 5, 2017 at 7:17 pm #

    I am having issues with cross_val_score. Whenever I run the code, I get the error:

    #TypeError: The added layer must be an instance of class Layer. Found:


    Thank you!

    The full output:
    Traceback (most recent call last):
    File “Y:\Tutorials\Keras_Regression_Tutorial\Keras_Regression_Tutorial\”, line 39, in
    results = cross_val_score(estimator, X, Y, cv=kfold)
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\model_selection\”, line 321, in cross_val_score
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\model_selection\”, line 195, in cross_validate
    for train, test in cv.split(X, y, groups))
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\externals\joblib\”, line 779, in __call__
    while self.dispatch_one_batch(iterator):
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\externals\joblib\”, line 625, in dispatch_one_batch
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\externals\joblib\”, line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\externals\joblib\”, line 111, in apply_async
    result = ImmediateResult(func)
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\externals\joblib\”, line 332, in __init__
    self.results = batch()
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\externals\joblib\”, line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\externals\joblib\”, line 131, in
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
    File “C:\Users\Gabby\y35\lib\site-packages\sklearn\model_selection\”, line 437, in _fit_and_score, y_train, **fit_params)
    File “C:\Users\Gabby\y35\lib\site-packages\tensorflow\contrib\keras\python\keras\wrappers\”, line 157, in fit
    self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
    File “Y:\Tutorials\Keras_Regression_Tutorial\Keras_Regression_Tutorial\”, line 25, in baseline_model
    model.add(Dense(13, input_dim=13, kernel_initializer=’normal’, activation=’relu’))
    File “C:\Users\Gabby\y35\lib\site-packages\tensorflow\contrib\keras\python\keras\”, line 460, in add
    ‘Found: ‘ + str(layer))
    TypeError: The added layer must be an instance of class Layer. Found:
    Press any key to continue . . .

    • Jason Brownlee October 6, 2017 at 5:36 am #

      Sorry, I have not seen this error before.

      Confirm that your Python libraries including Keras and sklearn are up to date.

      Confirm that you copied all of the code from the example.

  77. Piotr October 11, 2017 at 1:42 am #

    Hello! Thanks for a really great tutorial. It helps a lot!

    But I have a question: do you know how I can use StandardScaler in a pipeline when I deal with a CNN and 2D images? My X data has shape e.g. (39, 256, 256, 1).

    It works perfectly without StandardScaler, but with StandardScaler I’ve got following error:
    ValueError: Found array with dim 4. StandardScaler expected <= 2.

    Do you know how and where I can convert my input data so that it works with a CNN, 2D images and StandardScaler?

    • Jason Brownlee October 11, 2017 at 7:57 am #

      I would recommend using the data scaling features for images built into Keras:

      • Piotr October 11, 2017 at 7:24 pm #

        Thanks a lot, for quick response! It’s good to know that Keras has already ImageDataGenerator for augmenting images.

        I have one more question: do you know how I can rescale the outputs of the NN back to the original scale? I mean, does ImageDataGenerator have something similar to StandardScaler.inverse_transform() from scikit-learn?

        • Jason Brownlee October 12, 2017 at 5:27 am #

          I’m not sure, I don’t think so.

          If the image is an input, why would you need to reverse the operation?

          • Piotr October 12, 2017 at 9:50 pm #

            In my case output of my network is based on actual values of pixels. I think, that in my case I will simply omit standardization. But thank you for mentioning ImageDataGenerator, it will help me much in other cases 🙂

  78. Tony October 24, 2017 at 3:35 am #

    Thank you very much for your post
    I use the data you uploaded.
    However, when I print the MSE, I get this error: Found input variables with inconsistent numbers of samples [506, 1]. It is the final sample in the data.
    Please help me.

    Thank you very much

    • Jason Brownlee October 24, 2017 at 5:37 am #

      Sorry, I don’t follow, can you restate the issue please?

  79. Tony October 24, 2017 at 11:57 am #

    At the end of step 2, evaluate the baseline model, I couldn’t print because of this error:
    ” Found input variables with inconsistent numbers of sample [506, 1]. It is the final sample in the data.”

    • Jason Brownlee October 24, 2017 at 3:59 pm #

      Perhaps double check that you copied all of the code exactly?

  80. Tony October 24, 2017 at 5:27 pm #

    My mistake. I had already split the data into columns in Excel with the “Text to Columns” function.

    Thank you so much 🙂

  81. Harry October 25, 2017 at 2:22 am #

    Thank you so much for these articles. Two questions:

    1) You state “a mean squared error loss function is [used]….This will be the same metric that we will use to evaluate the performance of the model.” I see where ‘mean_squared_error’ is passed as the ‘loss’, but there is no ‘metrics=[…]’ arg passed. Does Keras simply use the ‘loss’ function as the metric, if no metric is specified?

    2) I recreated this experiment and added the arg “shuffle=True” to the KFold function. This appears to improve performance down to 13.52 (6.99) MSE (wider_model). Any thoughts on this potential optimization? It seemed almost “too good to be true”.


  82. Tony October 26, 2017 at 3:34 am #

    The NN model you created contains 1 output.

    I have a problem with more than 1 output.
    I want to apply this code by modifying it. Is it ok ?
    Can you suggest some solutions or notice to solve the problem?

    Thank you very much

    • Jason Brownlee October 26, 2017 at 5:31 am #

      Yes, you can change the number of outputs.

      • Soni April 1, 2018 at 7:37 am #

        Hello Jason,
        For multiple outputs, do I still compile the model using the “model.compile(loss=’mean_squared_error’, optimizer=’adam’)”? How does the code compute a mean squared error in case of multiple outputs?

        • Jason Brownlee April 2, 2018 at 5:17 am #


          You can choose to calculate error for each output time step or for all time steps together. I cover this a little here:

          • Soni April 2, 2018 at 11:31 pm #

            What if the outputs at each time step have different units (or, in the case of a simple dense feedforward network, there are multiple outputs at the end, each with a different unit of measurement)? In that case, with different units and possibly different orders of magnitude for the output ranges, it might not be sensible to take a simple RMSE etc. What would you suggest then to combine such different outputs into a single loss function?

          • Jason Brownlee April 3, 2018 at 6:35 am #

            I would recommend rescaling outputs to something sensible (e.g. 0-1) before fitting the model.
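
            A minimal sketch of that rescaling, assuming a two-column target array; the values and the names Y, y_min and y_range are made up for illustration. Each output column is mapped to 0-1 independently, so outputs with different units contribute comparably to a single MSE loss, and predictions can be mapped back afterwards.

```python
import numpy as np

# Hypothetical targets: column 0 in thousands of dollars, column 1 in years.
Y = np.array([[24.0, 5.0],
              [21.6, 7.0],
              [34.7, 10.0],
              [50.0, 40.0]])

# Min-max statistics are computed per output column.
y_min = Y.min(axis=0)
y_range = Y.max(axis=0) - y_min

# Scale every column to the range 0-1 before fitting the model.
Y_scaled = (Y - y_min) / y_range

# Invert the transform to report predictions in the original units.
Y_restored = Y_scaled * y_range + y_min
```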

  83. Brence October 26, 2017 at 6:26 pm #

    Hey Jason,

    I’m getting an error when running this code. I have Keras 2 and scikit-learn 0.17 installed. I keep getting this error:

    Connected to pydev debugger (build 172.3968.37)
    Using TensorFlow backend.
    Traceback (most recent call last):
    File “/home/b/pycharm-community-2017.2.3/helpers/pydev/”, line 1599, in
    globals =[‘file’], None, None, is_module)
    File “/home/b/pycharm-community-2017.2.3/helpers/pydev/”, line 1026, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
    File “/home/b/PycharmProjects/ANN1a/ANN2-Keras1a”, line 6, in
    from sklearn.model_selection import cross_val_score
    ImportError: No module named model_selection
    Backend TkAgg is interactive backend. Turning interactive mode on.

    Is it saying I have no module for sklearn because I only have 0.17 instead of the current version, which I think is 0.19? I’m having a lot of trouble updating my scikit-learn package.

    • Jason Brownlee October 27, 2017 at 5:18 am #

      You will need to update your sklearn to 0.18 or higher.

  84. Brence October 27, 2017 at 11:06 pm #

    Hey Jason, I need some help with this error message. I’m not sure what’s going on with it.

    ‘ValueError: could not convert string to float: Close’

    I think it may be talking about one of my columns in my dataset.csv file which is named ‘Close’.

    Here is the code:

    import numpy
    import pandas
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline

    # load dataset
    dataframe = pandas.read_csv("PTNprice.csv", delim_whitespace=True, header=None, usecols=[1,2,3,4])
    dataset = dataframe.values
    # split into input (X) and output (Y) variables
    X = dataset[:,0:4]
    Y = dataset[:,1]

    # define the model
    def larger_model():
        # create model
        model = Sequential()
        model.add(Dense(100, input_dim=4, kernel_initializer='normal', activation='relu'))
        model.add(Dense(50, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        # Compile model
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    # fix random seed for reproducibility
    seed = 7

    # evaluate model with standardized dataset
    estimators = []
    estimators.append(('standardize', StandardScaler()))
    estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5, verbose=0)))
    pipeline = Pipeline(estimators)
    kfold = KFold(n_splits=10, random_state=seed)
    results = cross_val_score(pipeline, X, Y, cv=kfold)
    print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

    • Jason Brownlee October 28, 2017 at 5:14 am #

      Are you using the code and the data from the tutorial?

      Did you copy it all exactly, including indenting?

      • Brence October 28, 2017 at 6:27 am #


        No, my code is modified to try and handle a new dataset. I felt the data was very similar to the original dataset. I actually got it to work with no errors. I just changed header=None to header=1.


        # load dataset
        dataframe = pandas.read_csv("PTNprice.csv", delim_whitespace=True, header=1, usecols=[1,2,3,4])
        dataset = dataframe.values
        # split into input (X) and output (Y) variables
        X = dataset[:,0:4]
        Y = dataset[:,1]

        It took longer than expected for the script to finish. I’m trying to get it to make a prediction as well, but my output is less than satisfactory. Here is what the output gave me. What do you think this means?


        Larger: 0.00 (0.00) MSE
        [ 0.78021598 0.79241288 0.81000006 …, 3.64232779 3.59621549

        My data is just stock prices from a 10 year period example: 0.75674 0.9655 3.753 1.0293
        columns set up like this.

  85. Duccio November 4, 2017 at 4:41 am #

    Hi Jason,

    thank you so much, these courses are great, and very helpful !

    I have written the code, following yours, but with the only difference that I have not used Pipeline, and take care of the scaling separately

    seed = 7

    X = (X - X.mean(axis=0))/X.std(axis=0)

    estimator = KerasRegressor(build_fn=baseline_model, nb_epoch = 100, batch_size=5,verbose=0)

    kfold = KFold(n_splits=10,random_state=seed)

    results = cross_val_score(estimator, X, y, cv=kfold)

    For some reason, I don’t understand, your method constantly produces better results. Any idea why it performs better ?

    Thanks a lot

  86. Brence November 7, 2017 at 7:13 am #


    Just wanted to stop by and say thanks again!

    I’ve been tweaking my learning models for the past three days and this is what I got

    Larger: 0.12 (0.36) MSE

    That’s pretty good, right? I’m using a different dataset from you, but it is very similar in structure. I was having an underfitting problem with my model for a while and I was getting like 500% (500%) error. I realized that I needed to make it a bit more complex. So I tripled my features, made my layers deeper and wider at the same time, and increased the number of epochs to 50000 and the batch size to 10000. I also changed the number of splits from 10 to 25.

    Question: Will I be able to get a smaller error% or is “Larger: 0.12 (0.36) MSE” about the lowest I can expect?

    thanks again Jason,


    • Jason Brownlee November 7, 2017 at 9:56 am #

      Nice work.

      I’m not sure of the limits of this problem, push as much as you have time/interest. In practice “good” is relative to what you have achieved previously. This is a good lesson in applied ML!

  87. Tony November 10, 2017 at 7:22 pm #

    Dear Sir.

    I applied your code and used it to predict successfully.

    This is the same as my last question. I want to add 1 more output: the age of the house, for instance whether it was built 5 years, 7 years or 10 years ago. The price and age are independent.
    As far as I know, this is not a classification problem.
    So, would you suggest a code or what should I do next to solve the problem, please ?

    Thank you very much

  88. Brence November 14, 2017 at 11:20 am #

    Is it possible to do a recursive multi step forecast prediction with this regression model?

    I’m not sure how this code would fit into this.

    prediction(t+1) = model(obs(t-1), obs(t-2), …, obs(t-n))
    prediction(t+2) = model(prediction(t+1), obs(t-1), …, obs(t-n))
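
    One way to read the recursive scheme above, sketched under assumptions: one_step_model is a hypothetical stand-in for a fitted model's one-step prediction (here it simply averages its inputs), and the window holds the last n observations, oldest first. Each prediction is appended to the window and the oldest value is dropped before the next step.

```python
def one_step_model(window):
    # Hypothetical stand-in for a fitted model: predicts the mean of the window.
    return sum(window) / len(window)


def recursive_forecast(history, n_lags, n_steps, predict=one_step_model):
    # Start from the last n_lags real observations.
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(n_steps):
        yhat = predict(window)
        forecasts.append(yhat)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [yhat]
    return forecasts


forecasts = recursive_forecast([1.0, 2.0, 3.0, 4.0], n_lags=2, n_steps=3)
```

    With a real model, predict would wrap a call such as model.predict on the reshaped window; errors tend to accumulate because later steps are fed earlier predictions.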

  89. Toby November 18, 2017 at 3:45 am #

    Hi Jason,
    Thank you for your tutorial. I just tried running your sample code for step 2 but unfortunately obtained a negative MSE which obviously does not make sense.

    Results: -57.82 (42.31) MSE

    Any ideas?

    The code is exactly the same with minor exception that I had to changed


    • Jason Brownlee November 18, 2017 at 10:24 am #

      Yes, sklearn negates error scores such as MSE so that larger values are always better during optimization. Just take the absolute value.
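
      A short illustration, assuming the scores come back in negated form; the array results below is made up and stands in for the values returned by cross_val_score.

```python
import numpy as np

# Made-up fold scores in the negated form that scikit-learn reports for errors.
results = np.array([-20.0, -35.5, -118.0])

# Taking the absolute value recovers the MSE of each fold.
mse_per_fold = np.abs(results)

print("Results: %.2f (%.2f) MSE" % (mse_per_fold.mean(), mse_per_fold.std()))
```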

      • Toby November 22, 2017 at 7:04 am #

        Great thanks. Yes I found this out after I posed the question.

  90. Mohit Jain November 27, 2017 at 2:15 am #

    Hi Jason,
    Thank you for this amazing tutorial. I just wanted to know how we can predict the output of the neural network for some specific values of X and compare the performance by plotting the predicted and actual values.



    • Jason Brownlee November 27, 2017 at 5:51 am #

      It really depends on the application as to what and how to plot.

  91. Moritz December 1, 2017 at 8:12 pm #

    Hi Jason,
    first: thanks for this and all your other amazing tutorials. Really helps a lot as a beginner to get actual useful advice.
    However, I reached a point where I’m looking for further advice – hope you can help me out!
    I understand the concept of regression and MSE – in my case, I try to predict two values based on various other parameters. It’s really not complicated and the correlation between the values is pretty clear, so I think this shouldn’t be a problem.
    Now when having a value predicted, I don’t want to know the MSE but I’d rather know, if the prediction is within a certain range from the original value.
    ‘accepted’ range: y +/- 0,1
    y = 1
    y^ = 1,08
    y – y^ = | 0,08| –> OK, because it’s within y +/- 0,1.

    Is there a way to do this in Python or KERAS? I just started working with it, so any advice would be helpful. Thanks!
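
    A minimal sketch of such a check in NumPy, assuming the tolerance of 0.1 from the example above; the arrays y_true and y_pred are made up. It reports the fraction of predictions whose absolute error falls within the accepted range.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])      # made-up expected values
y_pred = np.array([1.08, 2.25, 2.95, 4.5])   # made-up predictions
tolerance = 0.1

# True where |y - y^| is within the accepted range.
within = np.abs(y_true - y_pred) <= tolerance

# Share of predictions that count as OK.
hit_rate = within.mean()
```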

  92. chenys December 12, 2017 at 7:29 pm #

    Hi Jason,
    Thank you for your tutorial.
    I want to know

    What if a regression dataset has 10000 features? The input_dim is so big, but all the features are meaningful (it’s procedure data) and can’t be deleted.

    How should I change this example to handle my problem, what should I take care of, and is there any trick?

    • Jason Brownlee December 13, 2017 at 5:30 am #

      Perhaps you can use a projection such as PCA? SVD? or others?

  93. Steve December 13, 2017 at 5:32 am #

    Hi Jason – Thank you for all these tutorials. These are awesome!!

    Since the NN architecture is a black box, is there a way to access hidden layer data for debugging? When I run the regression code (from above) I get slightly different numbers. Thx again!

    • Jason Brownlee December 13, 2017 at 5:47 am #

      You will get different numbers every time you run the same algorithm on the same data Steve. This is a feature, not a bug:

      You can access the layers of the model as a list via model.layers.

      • Igor May 17, 2019 at 3:29 am #

        Hi Jason,

        when calling model.predict(), the predicted value makes no sense in terms of house prices.

        for instance line 15 of House pricing dataset

        0.63796 0.00 8.140 0 0.5380 6.0960 84.50 4.4619 4 307.0 21.00 380.02 10.26 18.20
        The last value (18.20) is the house price in $1000s.

        Xnew= array([[ 0.63796, 0.00, 8.140, 0, 0.5380, 6.0960, 84.50, 4.4619, 4, 307.0, 21.00, 380.02, 10.26]])

        Out[114]: array([[-0.09053693]], dtype=float32)

        what does -0.09053693 mean?

        Could you please amend your code with full code of predict function.

        • Jason Brownlee May 17, 2019 at 5:57 am #

          Perhaps the example you ran scaled the data prior to modeling, if so, you can invert the scaling transform on the prediction to return to original units.
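
          A minimal sketch of inverting a standardization, assuming the target was scaled with its own mean and standard deviation before fitting; the training targets are made up and the scaled prediction is the value reported above.

```python
import numpy as np

# Made-up training targets (house prices in thousands of dollars).
y_train = np.array([24.0, 21.6, 34.7, 33.4, 18.2])

# Statistics used to standardize the target before fitting.
y_mean = y_train.mean()
y_std = y_train.std()

# A prediction in standardized units, as returned by the model.
yhat_scaled = -0.09053693

# Invert the transform: multiply by the std and add the mean back.
yhat = yhat_scaled * y_std + y_mean
```

          If a scikit-learn scaler object was used instead, its inverse_transform method performs the same step.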

  94. Brence December 28, 2017 at 11:36 am #

    Hey Jason,

    Is it possible to get a prediction output for each column used in the dataset? Like for example the dataset was made up of
    12 1 22 45
    2 34 55 8 like this. Could I get it to give me four output numbers, one for each column in the dataset?

    • Jason Brownlee December 28, 2017 at 2:11 pm #

      Yes, you can call model.predict()

      • Brence December 28, 2017 at 10:17 pm #

        But how do I predict for more than one column in the dataset? My dataset has 6 of them, and my output always has just 5 columns. Which leads me to believe that it’s just predicting for one column instead of all 6. Could I accomplish this by setting the output layer to have more than one neuron?

        • Jason Brownlee December 29, 2017 at 5:22 am #

          You could configure the model to predict a vector via the number of neurons in the output layer.

          You could configure the model to output one column at a time via an encoder-decoder model.

          I have examples of each on the blog.

          • Brence December 29, 2017 at 9:00 am #

            That is exactly what I was looking for. I found your examples on the blog. Thank you so much Jason.

          • Jason Brownlee December 29, 2017 at 2:35 pm #

            Glad to hear it.

  95. Jack January 2, 2018 at 6:48 pm #

    Hi Jason,
    Thank you for your tutorial! I’m not a programmer or anything, in fact, I’ve never written a line of code in my entire life. But I find your tutorial very helpful.
    Recently I came across a regression problem and I tried to solve it using deep learning. So I followed this article and step by step I got Keras up and running and got a result. The problem is I don’t know how to tune the neural network and optimize it. The result I got is far from satisfactory. Do I have to adjust the parameters of the model one by one and see how it goes, or is there a quicker way to optimize the neural network?
    Also I see there’s a mini-course here, and I tried to sign up for it but I didn’t get the email. Maybe because I’m from China or something, I don’t know. Is there any crash course I can get? Because I know nothing about Python yet…

  96. fatma January 3, 2018 at 7:26 pm #

    Hello, I need to ask for this line X = dataset[:,0:13], as I can see from the data set it contains 14 columns (0 to 13) and the last column is the labels column then this line should be X = dataset [:,0:12]. is it correct or I’m wrong?
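
    For reference, a small NumPy check of how the slice behaves, using a made-up array with 14 columns: the end index of a slice is excluded, so 0:13 selects columns 0 to 12 (the 13 inputs) and index 13 selects the last column.

```python
import numpy as np

# Made-up stand-in for the dataset: 3 rows, 14 columns numbered 0 to 13.
dataset = np.arange(3 * 14).reshape(3, 14)

X = dataset[:, 0:13]   # columns 0..12, the end index 13 is excluded
Y = dataset[:, 13]     # column 13 only

print(X.shape, Y.shape)
```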

  97. Tanya January 4, 2018 at 7:43 am #

    Hi Jason,

    I am trying to apply the code in this tutorial to forecast my time series data. But since the beginning when I am trying to split the data into X and Y, I am getting an error “TypeError: unhashable type: ‘slice'”. Unfortunately I cannot find the source of it.
    Can you help me?
    Thanks in advance!

    runfile(‘D:/LOCAL_DROPBOX/MasterArbeit_Sammlung_V01/Python/MasterArbeit/ARIMA/Test/’, wdir=’D:/LOCAL_DROPBOX/MasterArbeit_Sammlung_V01/Python/MasterArbeit/ARIMA/Test’)
    [[‘3,6’ ‘20,3’ ‘0’ …, 173 1136 0]
    [‘11,4’ ‘18,8’ ‘15,2’ …, 105 1676 0]
    [‘8,9’ ‘15,3’ ‘1,4’ …, 372 733 0]
    [‘-2,3’ ‘4,5’ ‘0’ …, 0 0 0]
    [‘0,2’ ‘7,9’ ‘0’ …, 0 0 0]
    [‘-3,5’ ‘4,4’ ‘0’ …, 0 0 0]]
    Traceback (most recent call last):

    File “”, line 1, in
    runfile(‘D:/LOCAL_DROPBOX/MasterArbeit_Sammlung_V01/Python/MasterArbeit/ARIMA/Test/’, wdir=’D:/LOCAL_DROPBOX/MasterArbeit_Sammlung_V01/Python/MasterArbeit/ARIMA/Test’)

    File “C:\Users\Tanya\Anaconda3\lib\site-packages\spyder\utils\site\”, line 710, in runfile
    execfile(filename, namespace)

    File “C:\Users\Tanya\Anaconda3\lib\site-packages\spyder\utils\site\”, line 101, in execfile
    exec(compile(, filename, ‘exec’), namespace)

    File “D:/LOCAL_DROPBOX/MasterArbeit_Sammlung_V01/Python/MasterArbeit/ARIMA/Test/”, line 25, in
    X = dataset[:,0:8]

    File “C:\Users\Tanya\Anaconda3\lib\site-packages\pandas\core\”, line 2139, in __getitem__
    return self._getitem_column(key)

    File “C:\Users\Tanya\Anaconda3\lib\site-packages\pandas\core\”, line 2146, in _getitem_column
    return self._get_item_cache(key)

    File “C:\Users\Tanya\Anaconda3\lib\site-packages\pandas\core\”, line 1840, in _get_item_cache
    res = cache.get(item)

    TypeError: unhashable type: ‘slice’

  98. Oscar Reyes January 9, 2018 at 3:56 am #


    Regarding “A further extension of this section would be to similarly apply a rescaling to the output variable such as normalizing it to the range of 0-1”.

    I do not know how to make the StandardScaler object also apply the transformation to the output variable Y, instead of applying it only to X. I did the following:

    results = cross_val_score(pipeline, preprocessing.scale(X), preprocessing.scale(Y), cv=kfold)

    However, in this way the preprocessing step is made prior to the kfold cross validation, and not in each fold execution as in your previous example.

    • Jason Brownlee January 9, 2018 at 5:35 am #

      Yes, the data preparation would have to happen prior to cross validation.

  99. fatma January 11, 2018 at 7:59 pm #

    How can we plot the relation between the expected values and the predicted ones?

    • Jason Brownlee January 12, 2018 at 5:53 am #

      I would recommend using matplotlib.

      • fatma November 16, 2018 at 8:02 pm #

        One more question, How can I use k-fold cross validation with CNN model?

  100. Reed Guo January 18, 2018 at 12:38 am #

    Hi, Jason
    When I search for tutorials on Google, if your posts appear, I always check your blog first.

    Thanks very much.

  101. Reed Guo January 18, 2018 at 2:35 am #

    Hi, Jason

    What is the activation function of the output layer? You didn’t write it.


  102. WonderingStranger January 25, 2018 at 3:25 am #

    Hello Jason,

    I used your code, but get different results:

    Results: -114.64 (82.76) MSE

    Standardized: -29.57 (27.85) MSE

    Larger: -23.46 (27.29) MSE

    Wider: -22.91 (29.25) MSE

    Why are they negative?

    • Jason Brownlee January 25, 2018 at 5:57 am #

      Nice work. The negative results are caused by sklearn inverting the loss function. This is a relatively new thing.

      • WonderingStranger January 26, 2018 at 6:47 am #

        This is confusing. Results are so different!
        How to interpret this error in percentage? Is there a way?

        I have studied the Ng’s courses on deeplearning_dot_ai, but he only introduced classification problems.

        How can I judge how good the error is in the case of regression? Will there be any difference from your example for a vector-regression (output is a vector) problem?

        Thank you.

        • Jason Brownlee January 27, 2018 at 5:47 am #

          Yes, compare the model skill to a baseline model like a Zero Rule algorithm.

          Improvements are relative, not absolute.
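
          A minimal sketch of such a baseline, assuming a simple train/test split; all numbers are made up. The Zero Rule for regression predicts the mean of the training targets for every test row, and a model shows skill only if its error is lower than this.

```python
import numpy as np

y_train = np.array([24.0, 21.6, 34.7, 33.4, 36.2])  # made-up training targets
y_test = np.array([28.7, 22.9, 27.1])                # made-up test targets

# Zero Rule: always predict the mean of the training targets.
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_mse = np.mean((y_test - baseline_pred) ** 2)

# Hypothetical model predictions for the same test rows.
model_pred = np.array([27.5, 24.0, 26.0])
model_mse = np.mean((y_test - model_pred) ** 2)

print("Baseline MSE: %.2f, model MSE: %.2f" % (baseline_mse, model_mse))
```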

  103. Eddy February 5, 2018 at 1:58 pm #

    Hi Jason,

    How do you get predicted y values for plotting when using a pipeline and k-fold cv? Also, suppose you had a separate X_test, how would you predict y_hat from it? So, I am envisioning a scenario where you have a training set and a separate test set (as in Kaggle competitions). You build your pipeline and k-fold cv on the training set and predict on the test set. But, your training set is scaled as a part of the pipeline. How could you apply the same scaling on X_test?
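
    A minimal sketch of the scaling part of this question, assuming standardization is done by hand in NumPy; the arrays are made up. The mean and standard deviation are computed on the training set only and then reused unchanged for the test set.

```python
import numpy as np

X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # made-up
X_test = np.array([[2.0, 250.0]])                                 # made-up

# Fit the scaling on the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train_scaled = (X_train - mean) / std
# Reuse the training statistics for the test set.
X_test_scaled = (X_test - mean) / std
```

    A fitted scikit-learn Pipeline does the same thing when predict is called on new data: the scaler step applies the statistics learned during fit before the model sees the inputs.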

  104. Mehmet Ali February 5, 2018 at 8:47 pm #

    Hi Jason;

    How do you design a Keras model that returns multiple outputs (lets say 4) instead of single output in regression problems?

    • Jason Brownlee February 6, 2018 at 9:14 am #

      You can output a vector with multiple units in the output layer.

      • Mehmet Ali February 7, 2018 at 12:48 am #

        Do you mean I should change the model design by editing the last line before compiling from:

        model.add(Dense(1, kernel_initializer='normal'))

        to:

        model.add(Dense(4, kernel_initializer='normal'))


  105. au_ceng February 5, 2018 at 9:31 pm #

    I obtained similar results like WonderingStranger. I’m new to deep learning. So I did not understand what I need to do with your response. I would appreciate if you explain in more detail.

  106. Wert February 9, 2018 at 10:32 am #

    Hello, how do i save the weights. I checked your link for saving, but you are not using the pipeline method on that one.

    I tried kfold.save_weigths, but got an error

    • Jason Brownlee February 10, 2018 at 8:48 am #

      You might need to keep a reference to your model (somehow?) and use the Keras API to save the weights.

  107. joseph February 9, 2018 at 12:26 pm #

    Hi Jason,

    is there any way to input the standardized data into the lstm model (create_model). The reason is that due the input shape of lstm which only allow 3D..however, to do standardizing, it can only accept 2d shape. hope to get some comment from you.. thank you

  108. joseph February 10, 2018 at 12:06 pm #

    thanks jason for the response..I appreciate it

  109. Eric February 16, 2018 at 4:50 pm #

    Hi Jason,

    I am still new in this
    thank you for your explanation step by step

    I want to ask about the detail in housing.csv and how to predict the value
    for example we want to predict the last attribute of the dataset
    by using estimator.predict

    Thank you

    • Jason Brownlee February 17, 2018 at 8:40 am #

      You can use:

      yhat = model.predict(X)

      Does that help?

  110. Deniz Kılınç (Assoc.Prof.Dr) February 16, 2018 at 7:41 pm #

    Hi Jason,

    Thanks for the great tutorial. your site makes me younger 🙂

    Is there any way to print/export actual and predicted house prices.

    In addition, would you please suggest a way to visualize R2?

    • Jason Brownlee February 17, 2018 at 8:43 am #


      You can make predictions as follows:

      yhat = model.predict(X)

  111. Swapnil Shankar February 23, 2018 at 4:14 pm #

    X = dataset[:,0:11]
    Y = dataset[:,11]
    Traceback (most recent call last):

    File “”, line 5, in
    Y = dataset[:,11]

    IndexError: index 11 is out of bounds for axis 1 with size 1

    Please help to resolve this issue.

    Thanks jason for this wonderful post.

    • Jason Brownlee February 24, 2018 at 9:09 am #

      Ensure you copy all of the code from the example.

  112. Hugh February 27, 2018 at 3:03 am #

    Hi Jason!
    Thanks for the great tutorial.

    I did all the examples above and then I tried to fit baseline_model by using
    “,Y, nb_epoch=50, batch_size=5)” this command, I got “AttributeError: ‘function’ object has no attribute ‘fit'” this error message. what’s the problem?
    I googled exact same message above but I didn’t get anything about error.

    • Jason Brownlee February 27, 2018 at 6:38 am #

      You called a function on a function. The variable for the model is called “model”. Call functions on that.

  113. Ben February 28, 2018 at 12:18 pm #

    Hi Jason, I’m having a problem, but I’m not sure why. This is a dataset with 7 columns (6 inputs and 1 output).



    Thanks and any help would be appreciated!

    • Jason Brownlee March 1, 2018 at 6:05 am #

      It looks like there is a problem with the shape of your data not matching the expectations of the model.

      Change the model or change the data.

  114. Kane March 6, 2018 at 4:02 am #

    Thanks, Jason, a good tutorial.

    But I have a question that we only specify one loss function ‘mse’ in the compile function, that means we could only see MSE in the result. Is there any way to see the multiple accuracies at the same time in the result? Thanks

  115. ggggb March 7, 2018 at 6:25 pm #

    Hi Jason,

    Thanks for the tutorial! I have one question: if you use StandardScaler for the dataset, isn’t this affecting the units ($) of the cross validation score (MSE)? Thanks.

    • Jason Brownlee March 8, 2018 at 6:21 am #

      Yes, we must invert the transform on the predictions prior to estimating model skill to ensure units are in the same scale as the original data.

      • ggggb March 9, 2018 at 8:04 am #

        But it looks like you’re not doing it but still mentioning square thousand dollars as units, am I missing something?

        • Jason Brownlee March 10, 2018 at 6:12 am #

          Correct, I do not convert back to the original units (dollars), so instead I mention “squared dollars”, e.g. $^2.

  116. Fatma March 7, 2018 at 8:47 pm #

    Hey Jason, I have the following two questions:

    How can we use the MAE instead of the MSE? and
    How can we compute the Spearman’s rank correlation coefficients?

    • Jason Brownlee March 8, 2018 at 6:23 am #

      You can specify the loss or the metric as ‘mae’.

      You can save the predictions and use SciPy to calculate the Spearman’s correlation between your predictions and the expected outcomes.

  117. Fati March 9, 2018 at 11:13 pm #


    Thanks for your practical, useful and understandable blog posts.

    I used this post to evaluate my MLP model, but Can we use this method to evaluate LSTM as well?


  118. Sarah March 14, 2018 at 9:45 am #

    Thank you Jason for great posts,
    I have difficulty in understanding the MSE and MAE meaning. I cannot understand how to interpret this number? For this specific example what is the range of ‘mse’ or ‘mae’?

    Because I am working on a large dataset and I am getting mae like 400 to 800 and I cannot figure out what does it mean. Could you please help me?


    • Jason Brownlee March 14, 2018 at 3:08 pm #

      Good question.

      You can take the square root of the MSE to return the units back to the same units of the variable used to make the prediction.

      The MAE is in the same units as the output variable.

      The error values can be interpreted in the context of the distribution of the output variable.

      You can determine if the skill of the model is good by comparing it to the error scores from a baseline method, such as predicting the average outcome from the training set for each prediction on the test set.

      Does that help?

      • Sarah March 15, 2018 at 11:03 am #

        So, ‘mse’ and ‘mae’ are not percentages and can be any number (even very big, depending on the output variable), right?
        It means if we are predicting price of house and our output is like $1000, then mae equals to 100 means we have about a hundred dollar error in predicting the price. Did I get it correctly?

        Thank you again

        • Jason Brownlee March 15, 2018 at 2:49 pm #


          • Sarah March 15, 2018 at 2:57 pm #

            I really appreciate it.

          • Alfred March 29, 2018 at 3:00 am #

            Hi Jason and thank you a lot for the post.

            I have a question in addition to what Sarah asked: should I apply the square root also to “results.std()” to get a closer idea of the relationship between the error and the data?

            In the article you achieved a MSE=20 with std~22, but if we calculate the square root for MSE I somehow understand that we should also do that with the standard deviation, right?

            However, if that is the case, if we need to calculate the square root, wouldn’t the original value be the variance? In other words, is the “results.std()” in the next line actually the std or is it the variance?


          • Jason Brownlee March 29, 2018 at 6:38 am #

            No. Take the square root on the raw MSE values, then calculate summary stats like mean and standard deviations.
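
            A short sketch of that order of operations, using made-up per-fold MSE values in place of the real cross-validation results: the square root is taken fold by fold, and the mean and standard deviation are then computed on the resulting RMSE values.

```python
import numpy as np

# Made-up MSE for each of five folds.
mse_per_fold = np.array([9.0, 16.0, 25.0, 4.0, 36.0])

# Square root of each raw MSE value first...
rmse_per_fold = np.sqrt(mse_per_fold)

# ...then the summary statistics, now in the units of the output variable.
print("RMSE: %.2f (%.2f)" % (rmse_per_fold.mean(), rmse_per_fold.std()))
```

            Note that this differs from taking the square root of the mean MSE, which would give a different number for the same folds.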

  119. Keith Freeman April 1, 2018 at 7:21 am #

    Note that nb_epoch has been deprecated in KerasRegressor, should use epochs now in all cases.

  120. Ayan Biswas April 3, 2018 at 8:40 am #

    Hi Jason,
    Great blog posts. Helped me a lot in my work. I have created a similar multi-layer model for learning and predicting from a shock physics dataset (11 input parameters and a time-series output).
    I was wondering how I could be able to get the uncertainty information as well as the predicted output from the estimators? Do you have a blog or piece of keras code that can get me started?

    thanks a lot.

    • Ayan Biswas April 3, 2018 at 8:59 am #

      Basically, my model looks like this:
      # define baseline model
      def baseline_model_full():
          # create model
          model = Sequential()
          model.add(Dense(numOfParams, input_dim=numOfParams, kernel_initializer='normal', activation='relu'))
          model.add(Dense(900, kernel_initializer='normal'))
          # Compile model
          model.compile(loss='mean_squared_error', optimizer='adam')
          return model

      I fit this with training input and output data and then I provide it a new input for its prediction. I was wondering if I can also get the uncertainty of the model for this prediction along with the predicted output.


    • Jason Brownlee April 3, 2018 at 12:14 pm #

      Good question. You could use predict_proba() to get a probabilistic output.

      Does that help?

      • Ayan Biswas April 4, 2018 at 2:47 am #

        Hi Jason,

        Thank you for the quick response! Yes, I was able to use predict_proba() to get probability values. I am noticing that, the probability values are rather small although the prediction quality is quite good.

        I had 200 test inputs of shape (200,11) and the predicted output was of shape (200,900). The output probability shape was also (200,900) and the maximum value of this prediction probability was only 0.024. So, any suggestions on how to interpret these probability values?

        Thanks again.

        • Jason Brownlee April 4, 2018 at 6:17 am #

          Perhaps the model is not confident.

          • Ayan Biswas April 4, 2018 at 7:14 am #

            After a closer look, I see that the predict() and predict_proba() are actually giving the same array as output; and it is the predictions, not the probabilities. Have you seen this?


          • Jason Brownlee April 5, 2018 at 5:42 am #

            I have not, sorry. Perhaps contact Keras support:

          • Ayan Biswas April 4, 2018 at 7:41 am #

            I looked at this and seems like both the functions are just the same

            I think this might be the reason why I am getting the same output. But, unlike some other comments over the internet that suggest that we should get the probability as the output for both the functions, I think I am getting the predictions in both the cases.

            Do you have any suggestions on this?

  121. Nicolas April 8, 2018 at 8:15 pm #

    Why are you using 50 epochs in some cases and 100 on others?
    That seems like the best explanation of why you find ‘wider’ (with 200 epochs) is better than ‘larger’ (with 50 epochs).
    And sure enough, I found ‘larger’ with 100 epochs beats ‘wider’ with 100 epochs:
    Larger(100 epochs): 22.28 (26.54) MSE

    • Jason Brownlee April 9, 2018 at 6:09 am #

      Yes, I was demonstrating how to be systematic with model config, not the best model for this problem.

  122. Paul April 12, 2018 at 11:18 am #


    I am trying to use CNN for signal processing.
    Wonder if it is possible? Could you point me to any references?
    Specific example:
    I have an audio signal of some length, let us say 100 samples.
    I would like to find a filter that produces a delta spike out of my signal.
    In other words, training with my signal should output [1, 0, 0, …… 0, 0 ] – delta spike.

    Thanks a lot,


  123. Adarsh April 19, 2018 at 9:13 pm #

    Does the number of epochs depend on the amount of data I have?

    For example, I have around 400,000+ data points; what should the number of epochs be?

    • Jason Brownlee April 20, 2018 at 5:48 am #

      More data may require more learning/epochs.

      • Adarsh April 23, 2018 at 1:39 pm #

        Thank you jason ur blog is wonderful place to learn Machine Learning for beginners

        • Jason Brownlee April 23, 2018 at 2:54 pm #

          Thanks, I’m glad to hear that.

          • Adarsh May 2, 2018 at 3:05 pm #

            Jason, while trying to learn about neural networks I came across dead neurons. How do I identify dead neurons while training with Keras,
            and how do I eliminate them? I am eager to know.
            Thanking you in advance.

          • Jason Brownlee May 3, 2018 at 6:31 am #

            Thanks, that is a great topic. Sorry, I don’t have material on it. Perhaps I can cover it in the future.

          • Adarsh May 16, 2018 at 3:19 pm #

            Jason, I really want to know the maths behind neural networks. Can you share a place where I can learn it? I want to know how it makes the prediction in linear regression.

          • Jason Brownlee May 17, 2018 at 6:24 am #

            Neural network and linear regression are two different methods.

            Learn about the math for neural networks in this book:

            Learn about the math for linear regression in this book:

          • Adarsh May 17, 2018 at 8:25 pm #

            thank you jason that was really good resource

  124. onur April 20, 2018 at 7:36 am #

    Hi Jason-

    Thanks for the great input.

    I am using the following code to predict Boston home prices:

    # Artificial Neural Network

    # Regression Example With Boston Dataset: Baseline
    # Importing the libraries
    import numpy
    from pandas import read_csv
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline
    import matplotlib.pyplot as plt
    # Importing the dataset
    dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
    dataset = dataframe.values
    # Split into input (X) and output (Y) variables
    X = dataset[:,0:13]
    y = dataset[:,13]
    # Create model

    model = Sequential()
    model.add(Dense(13,input_dim=13, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    # Fit the model
    history = model.fit(X, y, validation_split=0.20, epochs=150, batch_size=5, verbose=0)
    # Make predictions
    predictions = model.predict(X)
    # list all data in history
    # summarize history for accuracy
    plt.title('model accuracy')
    plt.legend(['train', 'test'], loc='upper left')

    However, as you can see from the graph, my accuracy is very low. Why is that?