
How to Build Multi-Layer Perceptron Neural Network Models with Keras

The Keras Python library for deep learning focuses on creating models as a sequence of layers.

In this post, you will discover the simple components you can use to create neural networks and simple deep learning models using Keras from TensorFlow.

Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • May 2016: First version
  • Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
  • Update Jun/2022: Updated code to TensorFlow 2.x and updated external links.
Photo by George Rex, some rights reserved.

Neural Network Models in Keras

The focus of the Keras library is a model.

The simplest model is defined in the Sequential class, which is a linear stack of layers.

You can create a Sequential model and define all the layers in the constructor; for example:
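
A minimal sketch, assuming TensorFlow 2.x and an illustrative network with 8 inputs and a single binary output (the layer sizes are arbitrary):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # All layers are passed as a list to the Sequential constructor
    model = Sequential([
        Dense(16, input_shape=(8,), activation='relu'),
        Dense(1, activation='sigmoid'),
    ])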

A more useful idiom is to create a Sequential model and add your layers in the order of the computation you wish to perform; for example:
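
The same illustrative network, built up layer by layer with add():

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential()
    model.add(Dense(16, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))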

Model Inputs

The first layer in your model must specify the shape of the input.

This is the number of input attributes defined by the input_shape argument. This argument expects a tuple.

For example, you can define input in terms of 8 inputs for a Dense type layer as follows:
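
A short sketch; the 16 output units are arbitrary, and only the 8 inputs matter here:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential()
    # input_shape is a tuple: each sample has 8 input attributes
    model.add(Dense(16, input_shape=(8,)))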

Model Layers

Layers of different types have a few properties in common, specifically their method of weight initialization and activation functions.

Weight Initialization

The type of initialization used for a layer is specified in the kernel_initializer argument.

Some common types of layer initialization include:

  • random_uniform: Weights are initialized to small uniformly random values between -0.05 and 0.05.
  • random_normal: Weights are initialized to small Gaussian random values (zero mean and standard deviation of 0.05).
  • zeros: All weights are set to zero values.
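
For example, a sketch that sets the initializer by name on the first Dense layer of the running example (the layer sizes are illustrative):

    # Initialize the layer weights from a small uniform distribution
    model.add(Dense(16, input_shape=(8,), kernel_initializer='random_uniform'))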

You can see a full list of the initialization techniques supported on the Usage of initializations page.

Activation Function

Keras supports a range of standard neuron activation functions, such as softmax, rectified linear (relu), tanh, and sigmoid.

You typically specify the type of activation function used by a layer in the activation argument, which takes a string value.

You can see a full list of activation functions supported by Keras on the Usage of activations page.

Interestingly, you can also create an Activation object and add it directly to your model after your layer to apply that activation to the output of the layer.
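
For example, these two sketches apply the same relu activation, continuing the running Sequential example:

    from tensorflow.keras.layers import Activation, Dense

    # Activation specified inline on the layer
    model.add(Dense(16, activation='relu'))

    # Or applied as a separate Activation layer after the Dense layer
    model.add(Dense(16))
    model.add(Activation('relu'))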

Layer Types

There are a large number of core layer types for standard neural networks.

Some common and useful layer types you can choose from are:

  • Dense: Fully connected layer and the most common type of layer used on multi-layer perceptron models
  • Dropout: Apply dropout to the model, setting a fraction of inputs to zero in an effort to reduce overfitting
  • Concatenate: Combine the outputs from multiple layers as input to a single layer
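
A sketch combining Dense and Dropout layers (the dropout rate of 0.5 and the layer sizes are illustrative):

    from tensorflow.keras.layers import Dense, Dropout

    model.add(Dense(16, activation='relu'))
    model.add(Dropout(0.5))  # zero out half of the inputs, during training only
    model.add(Dense(1, activation='sigmoid'))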

You can learn about the full list of core Keras layers on the Core Layers page.

Model Compilation

Once you have defined your model, it needs to be compiled.

Compiling creates the efficient data structures TensorFlow needs to execute your model. Specifically, TensorFlow converts your model into a graph so that training can be carried out efficiently.

You compile your model using the compile() function, and it accepts three important arguments:

  1. Model optimizer
  2. Loss function
  3. Metrics

1. Model Optimizers

The optimizer is the search technique used to update weights in your model.

You can create an optimizer object and pass it to the compile function via the optimizer argument. This allows you to configure the optimization procedure with its own arguments, such as learning rate. For example:
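
A sketch assuming TensorFlow 2.x, where the argument is named learning_rate (the hyperparameter values are illustrative):

    from tensorflow.keras.optimizers import SGD

    # Configure stochastic gradient descent with a specific learning rate and momentum
    sgd = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])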

You can also use the default parameters of the optimizer by specifying the name of the optimizer to the optimizer argument. For example:
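
A sketch; the loss and metrics are shown for completeness:

    # 'sgd' selects stochastic gradient descent with its default configuration
    model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])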

Some popular gradient descent optimizers you might want to choose from include:

  • SGD: stochastic gradient descent, with support for momentum
  • RMSprop: adaptive learning rate optimization method proposed by Geoff Hinton
  • Adam: Adaptive Moment Estimation (Adam) that also uses adaptive learning rates

You can learn about all of the optimizers supported by Keras on the Usage of optimizers page.

You can learn more about different gradient descent methods in the Gradient descent optimization algorithms section of Sebastian Ruder’s post, An overview of gradient descent optimization algorithms.

2. Model Loss Functions

The loss function, also called the objective function, is the measure the optimizer uses to evaluate the model and navigate the weight space.

You can specify the name of the loss function to use in the compile function by the loss argument. Some common examples include:

  • 'mse': for mean squared error
  • 'binary_crossentropy': for binary logarithmic loss (logloss)
  • 'categorical_crossentropy': for multi-class logarithmic loss (logloss)
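
As a sketch, the loss is passed by name to compile() (the choice of optimizer is illustrative):

    # Regression
    model.compile(optimizer='adam', loss='mse')

    # Binary classification
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # Multi-class classification
    model.compile(optimizer='adam', loss='categorical_crossentropy')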

You can learn more about the loss functions supported by Keras on the Losses page.

3. Model Metrics

Metrics are evaluated by the model during training.

The most commonly collected metric is accuracy, specified as a list of names in the metrics argument; Keras also supports a range of other built-in metrics as well as custom metric functions.
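
For example, collecting accuracy during training:

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])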

Model Training

The model is trained on NumPy arrays using the fit() function; for example:
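
A sketch where X and y are hypothetical NumPy arrays of inputs and targets (the epoch and batch values are illustrative):

    # X has shape (n_samples, 8); y has shape (n_samples,)
    model.fit(X, y, epochs=100, batch_size=32)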

The call to fit() specifies both the number of epochs to train for and the batch size.

  • Epochs (epochs) refer to the number of times the model is exposed to the training dataset.
  • Batch Size (batch_size) is the number of training instances shown to the model before a weight update is performed.

The fit function also allows for some basic evaluation of the model during training. You can set the validation_split value to hold back a fraction of the training dataset for validation to be evaluated in each epoch or provide a validation_data tuple of (X, y) data to evaluate.

Fitting the model returns a history object with details and metrics calculated for the model in each epoch. This can be used for graphing model performance.
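
A sketch of both ideas, holding back 20% of the hypothetical training data for validation and reading per-epoch metrics back from the returned history object:

    history = model.fit(X, y, epochs=100, batch_size=32, validation_split=0.2)

    # history.history is a dict of per-epoch values
    print(history.history['loss'])      # training loss per epoch
    print(history.history['val_loss'])  # validation loss per epoch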

Model Prediction

Once you have trained your model, you can use it to make predictions on test data or new data.

There are a number of different output types you can calculate from your trained model, each calculated using a different function call on your model object. For example:

  • model.evaluate(): To calculate the loss and any compiled metrics for the input data
  • model.predict(): To generate network output for the input data

For example, if you provided a batch of data X and the expected output y, you can use evaluate() to calculate the loss metric (the one you defined with compile() before). But for a batch of new data X, you can obtain the network output with predict(). It may not be the output you want, but it will be the output of your network. For example, a classification problem will probably output a softmax vector for each sample. You will need to use numpy.argmax() to convert the softmax vector into class labels.
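
A sketch pulling these together, where X, y, and X_new are hypothetical NumPy arrays and the model ends in a multi-class softmax layer:

    import numpy as np

    # Loss and the compiled accuracy metric on data with known targets
    loss, acc = model.evaluate(X, y, verbose=0)

    # Raw network output for new data: one softmax vector per sample
    probs = model.predict(X_new)

    # Convert each softmax vector to an integer class label
    labels = np.argmax(probs, axis=1)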


Summarize the Model

Once you are happy with your model, you can finalize it.

You may wish to output a summary of your model. For example, you can display a summary of a model by calling the summary function:
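
    # Print a text summary of layers, output shapes, and parameter counts
    model.summary()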

You can also retrieve a summary of the model configuration using the get_config() function:
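
    # Returns a Python dict describing the model's configuration
    config = model.get_config()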

Finally, you can create an image of your model structure directly:
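
A sketch using the plot_model utility (note that it requires the pydot and graphviz packages to be installed):

    from tensorflow.keras.utils import plot_model

    # Save a diagram of the model graph to file
    plot_model(model, to_file='model.png', show_shapes=True)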


Summary

In this post, you discovered the Keras API that you can use to create artificial neural networks and deep learning models.

Specifically, you learned about the life cycle of a Keras model, including:

  • Constructing a model
  • Creating and adding layers, including weight initialization and activation
  • Compiling models, including optimization method, loss function, and metrics
  • Fitting models, including epochs and batch size
  • Model predictions
  • Summarizing the model

If you have any questions about Keras for Deep Learning or this post, ask in the comments, and I will do my best to answer them.

27 Responses to How to Build Multi-Layer Perceptron Neural Network Models with Keras

  1. Chong Wang October 8, 2016 at 2:35 am #

    Hi, Jason.

    For the merge layer, do you know how to use mode='dot'? I want to merge the outputs of two embedding layers by dot multiplication.

    The specific question is here: http://stackoverflow.com/questions/39919549/how-to-get-dot-product-of-the-outputs-of-two-embedding-layers-in-keras

    Thanks.

    • Chong Wang October 8, 2016 at 3:29 am #

      Oh, I just figured out that I should use 'mul' instead of 'dot'.

      Do you think it makes sense that ‘mul’ can also capture the interaction of the two embeddings as ‘dot’ does in SVD? (The output of ‘mul’ will be input into further LSTM layers.)

      • Jason Brownlee October 8, 2016 at 10:43 am #

        Great question Chong, sorry I’ve not experimented with merge layers yet.

        • john woo January 13, 2017 at 4:03 pm #

          Does anyone know if the order of inputs into a merge layer has any significance? Assuming the merged layer goes straight into one dense layer?

          • Jason Brownlee January 15, 2017 at 5:17 am #

            Not offhand. You could email the Google group, use trial and error with controlled experiments, or review the source code to discern.

  2. Keshav September 9, 2017 at 4:16 am #

    Hi Jason,
    Thanks for this tutorial and all your other articles in the blog – they are super helpful and awesome!
    I have a question: What is the recommended number of hidden units in a single hidden (Dense) layer? For an MLP, shouldn't it be less than the number of features in the input layer (the input_dim parameter)? So the number of units in the above example should be less than 8, right? So how does 16 work? I'm referring to this code: 'Dense(16, input_dim=8)'

    • Jason Brownlee September 9, 2017 at 11:59 am #

      We cannot configure a neural network analytically, you must use trial and error to discover what works on your specific dataset.

  3. Shabnam December 16, 2017 at 6:29 pm #

    Thanks a lot for this great post.
    I tried to run the example that you provided. I noticed that I need to use “from keras.utils.vis_utils import plot_model” instead of “from keras.utils.visualize_util import plot”. I thought it may be useful for other people if they have similar issue, so I decided to share it here.

    • Jason Brownlee December 17, 2017 at 8:51 am #

      Thanks, fixed.

      • Shabnam December 18, 2017 at 7:30 am #

        No worries, I guess the line after that needs to be plot_model as well.

  4. Shabnam December 18, 2017 at 7:53 am #

    If we have multiple hidden layers, how can we explore the best combination of activation functions? For example, for a binary classification, I would think of sigmoid function for output. However, I do not know if it is better to use sigmoid for the hidden layers, or I need to try different functions to see which one is the best for my data.

    • Jason Brownlee December 18, 2017 at 3:23 pm #

      You must design experiments to test different configurations to see what works best on your specific data.

  5. Michael Lange May 24, 2018 at 3:52 am #

    Thanks for the article, Jason! Well written. Huge fan of Keras.

    “We cannot configure a neural network analytically, you must use trial and error to discover what works on your specific dataset.”

    Just an FYI to all who read this: trial and error is a pain in the arse, but a great way to learn the zoo, I must say.

    However, here is some code that you may find useful. A simple GA that optimizes your learning pipeline for scikit-learn. https://github.com/mikewlange/KETTLE – Originally called TPot and developed by Computational Genetics Lab http://epistasis.org/ – it’s limited to scikit-learn Classification and Regression pipelines. But I’m building in Keras support – one day! lol.

    Cheers,
    mike

  6. Arya October 26, 2018 at 6:54 pm #

    It is believed that initializing an MLP with all zero weights is not a good idea. Why?

  7. MAK January 1, 2019 at 8:32 am #

    Hello DR.Jason
    Fantastic blog…
    I want to predict all of my system's features one step ahead; the data is temporal.
    I have 100 features used as input and 100 features as output (predicting all the features one step ahead).
    I built an LSTM model with 10 hidden layers to predict the next step.
    Each hidden layer contains 200 neurons, and in each layer my activation function is relu.
    The output layer's activation function is linear, and the optimizer loss function is mae.
    During the training phase I noticed some strange issues:
    1. I received a low loss value and also low accuracy. Can you explain how that's possible?
    2. During training I inserted a different dataset with the same features and suddenly received very high loss values (changing from 0.01 to 2909090). Do you have any idea why?
    3. Is the number of neurons in each layer OK? Do you have any rule of thumb for choosing the number of neurons relative to the input layer?
    Thanks

  8. Walid February 6, 2019 at 1:05 am #

    I do not think anyone can write a better introduction than this.

  9. olufemi george September 18, 2019 at 1:58 am #

    Awesome tutorial! How can I create multiple outputs in an MLP model? All your examples show just one output.

    Thanks

    • Jason Brownlee September 18, 2019 at 6:15 am #

      Change the number of nodes in the output layer to the number of outputs required. That’s it.

  10. vasanth kumar April 8, 2020 at 5:28 am #

    Thanks for the tutorial. I am unable to use Sequential() in my Jupyter notebook. I know this is not the apt platform, but please help me out. The error message is "Using TensorFlow backend... No module named 'tensorflow'". Then I installed TensorFlow, but even then the issue persists.

    I am fine if anyone suggests any other Python tool where the deep learning packages will be easily installed.
