Last Updated on August 3, 2022
The Keras Python library for deep learning focuses on creating models as a sequence of layers.
In this post, you will discover the components you can use to create neural networks and simple deep learning models using Keras from TensorFlow.
Let’s get started.
- May 2016: First version
- Update Mar/2017: Updated example for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
- Update Jun/2022: Updated code to TensorFlow 2.x. Update external links.
Neural Network Models in Keras
The focus of the Keras library is a model.
The simplest model is defined in the Sequential class, which is a linear stack of layers.
You can create a Sequential model and define all the layers in the constructor; for example:
from tensorflow.keras.models import Sequential
model = Sequential(...)
A more useful idiom is to create a Sequential model and add your layers in the order of the computation you wish to perform; for example:
from tensorflow.keras.models import Sequential
model = Sequential()
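To make the two idioms concrete, here is a minimal sketch assuming TensorFlow 2.x. It builds the same small network both ways. The layer sizes (8 inputs, then 12 and 1 units) are arbitrary illustrative values. The input is declared with an Input object, which current Keras versions recommend as the first entry of a Sequential model.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Idiom 1: pass all layers to the constructor as a list
model_a = Sequential([
    Input(shape=(8,)),               # 8 input attributes (illustrative)
    Dense(12, activation="relu"),
    Dense(1, activation="sigmoid"),
])

# Idiom 2: start with an empty model, then add layers in computation order
model_b = Sequential()
model_b.add(Input(shape=(8,)))
model_b.add(Dense(12, activation="relu"))
model_b.add(Dense(1, activation="sigmoid"))

# Both models have the same layers and the same number of weights
print(len(model_a.layers), model_a.count_params())  # 2 121
print(len(model_b.layers), model_b.count_params())  # 2 121
```

Either form yields the same architecture; the add() style is convenient when layers are created conditionally or in a loop.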
The first layer in your model must specify the shape of the input. This is the number of input attributes, defined by the input_shape argument, which expects a tuple. For example, a Dense layer that receives 8 inputs is declared with input_shape=(8,).
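A minimal sketch of this input declaration, assuming TensorFlow 2.x; the 16 hidden units and the single output unit are illustrative choices. Recent Keras releases still accept input_shape on the first layer but may print a warning suggesting an Input object instead.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# The first layer declares the input: 8 attributes per sample.
# input_shape is a tuple; (8,) is a one-element tuple, not the integer 8.
model.add(Dense(16, input_shape=(8,)))
# Later layers infer their input size from the previous layer's output.
model.add(Dense(1))

# The batch dimension is left unspecified, shown as None.
print(model.input_shape)   # (None, 8)
print(model.output_shape)  # (None, 1)
```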
Layers of different types have a few properties in common, specifically their method of weight initialization and activation functions.
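The type of initialization used for a layer's weights is specified by an initializer argument; for a Dense layer, the weight initializer is set with kernel_initializer.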
Some common types of layer initialization include:
- random_uniform: Weights are initialized to small uniformly random values between -0.05 and 0.05.
- random_normal: Weights are initialized to small Gaussian random values (zero mean and standard deviation of 0.05).
- zeros: All weights are set to zero values.
You can see a full list of the initialization techniques supported on the Usage of initializations page.
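The sketch below, assuming TensorFlow 2.x, passes each of the three initializers by name to kernel_initializer and then inspects the resulting weight matrices. The layer sizes are illustrative. If you omit the argument, Dense defaults to the glorot_uniform initializer.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Input(shape=(8,)),
    # Weights drawn uniformly from [-0.05, 0.05] (the default range)
    Dense(12, kernel_initializer="random_uniform"),
    # Weights drawn from a Gaussian with mean 0 and standard deviation 0.05
    Dense(8, kernel_initializer="random_normal"),
    # All weights start at zero
    Dense(1, kernel_initializer="zeros"),
])

# get_weights() returns [kernel, bias]; index 0 is the kernel matrix
uniform_w = model.layers[0].get_weights()[0]  # shape (8, 12)
normal_w = model.layers[1].get_weights()[0]   # shape (12, 8)
zeros_w = model.layers[2].get_weights()[0]    # shape (8, 1)

print(uniform_w.min(), uniform_w.max())
print(normal_w.mean(), normal_w.std())
print(np.count_nonzero(zeros_w))
```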
Keras supports a range of standard neuron activation functions, such as softmax, rectified linear (relu), tanh, and sigmoid.
You typically specify the type of activation function used by a layer in the activation argument, which takes a string value.
You can see a full list of activation functions supported by Keras on the Usage of activations page.
Interestingly, you can also create an Activation object and add it directly to your model after your layer to apply that activation to the output of the layer.
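The two ways of applying an activation can be compared directly. This sketch, assuming TensorFlow 2.x and illustrative layer sizes, builds one model with activation="relu" on the Dense layer and another with a linear Dense layer followed by an Activation("relu") layer, copies the weights across, and checks that the outputs match.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# Option 1: activation passed as a string argument to the layer
model_a = Sequential([Input(shape=(4,)), Dense(3, activation="relu")])

# Option 2: a linear Dense layer (no activation) followed by an Activation layer
model_b = Sequential([Input(shape=(4,)), Dense(3), Activation("relu")])

# Copy kernel and bias so both models compute the same linear transform
model_b.layers[0].set_weights(model_a.layers[0].get_weights())

x = np.random.rand(5, 4).astype("float32")
out_a = model_a.predict(x, verbose=0)
out_b = model_b.predict(x, verbose=0)
print(np.allclose(out_a, out_b))  # True
```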
There are a large number of core layer types for standard neural networks.
Some common and useful layer types you can choose from are:
- Dense: Fully connected layer and the most common type of layer used in multi-layer perceptron models
- Dropout: Apply dropout to the model, setting a fraction of inputs to zero in an effort to reduce overfitting
- Concatenate: Combine the outputs from multiple layers as input to a single layer
You can learn about the full list of core Keras layers on the Core Layers page.
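The following sketch, assuming TensorFlow 2.x, wires these three layer types together. Because Concatenate takes a list of inputs, it is used here with the functional Model API rather than Sequential; the two-input layout and all sizes are hypothetical choices for illustration.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Concatenate

# Two separate inputs (hypothetical, e.g. two groups of features)
input_a = Input(shape=(8,))
input_b = Input(shape=(4,))

# Branch A: fully connected layer, then dropout of 20% of its outputs during training
branch_a = Dropout(0.2)(Dense(16, activation="relu")(input_a))
# Branch B: a single fully connected layer
branch_b = Dense(8, activation="relu")(input_b)

# Concatenate joins the branch outputs along the last axis: 16 + 8 = 24 features
merged = Concatenate()([branch_a, branch_b])
output = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[input_a, input_b], outputs=output)
preds = model.predict([np.zeros((2, 8)), np.zeros((2, 4))], verbose=0)
print(merged.shape, preds.shape)
```

Dropout is only active during training; at prediction time it passes its inputs through unchanged.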
Once you have defined your model, it needs to be compiled.
Compiling configures the model for training with the optimizer, loss function, and metrics you choose. It also lets TensorFlow execute the model efficiently during training; specifically, TensorFlow can run the training step as a computation graph rather than executing each operation eagerly.
You compile your model using the compile() function, which accepts three important arguments:
- Model optimizer
- Loss function
- Metrics
model.compile(optimizer=..., loss=..., metrics=...)
1. Model Optimizers
The optimizer is the search technique used to update weights in your model.
You can create an optimizer object and pass it to the compile function via the optimizer argument. This allows you to configure the optimization procedure with its own arguments, such as learning rate. For example:
from tensorflow.keras.optimizers import SGD
sgd = SGD(...)
You can also use the default parameters of an optimizer by passing its name as a string to the optimizer argument, for example optimizer='sgd'.
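As a sketch of both approaches, assuming TensorFlow 2.x: the first model receives a configured SGD object, and the second passes the string "sgd" to accept the default parameters. The learning rate and momentum values are illustrative, and the small build_model() helper is hypothetical.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

def build_model():
    # Hypothetical helper: a tiny binary classifier with 8 inputs
    return Sequential([Input(shape=(8,)), Dense(1, activation="sigmoid")])

# Option 1: configure the optimizer object yourself
model_a = build_model()
sgd = SGD(learning_rate=0.01, momentum=0.9)
model_a.compile(optimizer=sgd, loss="binary_crossentropy")

# Option 2: pass the optimizer name and accept its default parameters
model_b = build_model()
model_b.compile(optimizer="sgd", loss="binary_crossentropy")

print(type(model_a.optimizer).__name__, type(model_b.optimizer).__name__)
```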
Some popular gradient descent optimizers you might want to choose from include:
- SGD: stochastic gradient descent, with support for momentum
- RMSprop: adaptive learning rate optimization method proposed by Geoff Hinton
- Adam: Adaptive Moment Estimation (Adam) that also uses adaptive learning rates
You can learn about all of the optimizers supported by Keras on the Usage of optimizers page.
You can learn more about different gradient descent methods in the Gradient descent optimization algorithms section of Sebastian Ruder’s post, An overview of gradient descent optimization algorithms.
2. Model Loss Functions
The loss function, also called the objective function, measures the model's error; the optimizer minimizes it as it navigates the weight space.
You can specify the name of the loss function to use in the compile function by the loss argument. Some common examples include:
- 'mse': for mean squared error
- 'binary_crossentropy': for binary logarithmic loss (logloss)
- 'categorical_crossentropy': for multi-class logarithmic loss (logloss)
You can learn more about the loss functions supported by Keras on the Losses page.
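To see what the first two losses compute, this sketch, assuming TensorFlow 2.x and made-up labels and predictions, evaluates them with the tf.keras.losses classes and with the textbook formulas in NumPy. The results should agree up to a tiny clipping constant that Keras applies inside the logarithm.

```python
import numpy as np
import tensorflow as tf

# Two samples: true labels and predicted probabilities (made-up values)
y_true = np.array([[1.0], [0.0]], dtype="float32")
y_pred = np.array([[0.9], [0.2]], dtype="float32")

mse = float(tf.keras.losses.MeanSquaredError()(y_true, y_pred))
bce = float(tf.keras.losses.BinaryCrossentropy()(y_true, y_pred))

# The same quantities computed by hand
mse_manual = np.mean((y_true - y_pred) ** 2)  # (0.1^2 + 0.2^2) / 2
bce_manual = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(mse, mse_manual)
print(bce, bce_manual)
```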
3. Model Metrics
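Metrics are evaluated by the model during training and reported alongside the loss, but unlike the loss, they are not used to update the weights.
Keras supports many metrics; the most common choice for classification problems is accuracy, requested with metrics=['accuracy'].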
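A metric can also be computed on its own, which makes its definition easy to check. This sketch, assuming TensorFlow 2.x and made-up values, uses tf.keras.metrics.BinaryAccuracy, which treats a predicted probability above the threshold (0.5 here) as class 1. When you pass metrics=['accuracy'] to compile(), Keras infers which accuracy variant to use from the shapes of the targets and outputs; for a single sigmoid output, this is binary accuracy.

```python
import numpy as np
import tensorflow as tf

y_true = np.array([[1.0], [0.0], [1.0], [0.0]], dtype="float32")
y_prob = np.array([[0.9], [0.2], [0.4], [0.6]], dtype="float32")

# Predictions above 0.5 count as class 1, the rest as class 0
metric = tf.keras.metrics.BinaryAccuracy(threshold=0.5)
metric.update_state(y_true, y_prob)

# The first two samples are classified correctly, the last two are not
print(float(metric.result()))  # 0.5
```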
The model is trained on NumPy arrays using the fit() function; for example:
model.fit(X, y, epochs=..., batch_size=...)
Training requires you to specify both the number of epochs and the batch size.
- Epochs (epochs) is the number of times the model is exposed to the entire training dataset.
- Batch Size (batch_size) is the number of training instances shown to the model before a weight update is performed.
The fit function also allows for some basic evaluation of the model during training. You can set the validation_split argument to hold back a fraction of the training dataset as validation data, which is evaluated at the end of each epoch, or you can provide a validation_data tuple of (X, y) to evaluate instead.
Fitting the model returns a History object; its history attribute is a dictionary of the loss and metric values recorded at each epoch, which can be used for graphing model performance.
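The sketch below, assuming TensorFlow 2.x, trains a small model on hypothetical synthetic data (100 random samples whose label is 1 when their 8 features sum to more than 4) and then reads the per-epoch values from the returned History object. With validation_split=0.2, Keras holds back the last 20 samples, so the remaining 80 samples at batch_size=16 give 80 / 16 = 5 weight updates per epoch.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical synthetic data: 100 samples, 8 features, binary labels
rng = np.random.default_rng(seed=1)
X = rng.random((100, 8)).astype("float32")
y = (X.sum(axis=1) > 4.0).astype("float32").reshape(-1, 1)

model = Sequential([
    Input(shape=(8,)),
    Dense(12, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 10 passes over the training data, 16 samples per weight update,
# with the last 20% of samples held back for validation
history = model.fit(X, y, epochs=10, batch_size=16, validation_split=0.2, verbose=0)

# One value per epoch for each tracked quantity (loss, accuracy, and their val_ versions)
print(sorted(history.history.keys()))
print(len(history.history["loss"]))  # 10
```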
Once you have trained your model, you can use it to make predictions on test data or new data.
There are a number of different output types you can calculate from your trained model, each calculated using a different function call on your model object. For example:
- model.evaluate(): To calculate the loss values for the input data
- model.predict(): To generate network output for the input data
For example, if you have a batch of data X and the expected output y, you can use evaluate() to calculate the loss and any metrics you defined with compile(). For a batch of new data X, you can obtain the network output with predict(). This may not be the final output you want, but it is the raw output of your network. For example, a classification problem will typically output a softmax vector for each sample, and you will need to use numpy.argmax() to convert each softmax vector into a class label.
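A minimal sketch of evaluate(), predict(), and numpy.argmax() on a three-class problem, assuming TensorFlow 2.x. The random data and labels are hypothetical, so the loss and accuracy values are meaningless, and the model is evaluated on its own training data only for brevity; the point is the form of each output.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# Hypothetical 3-class data: 60 samples with 4 features each
rng = np.random.default_rng(seed=7)
X = rng.random((60, 4)).astype("float32")
labels = rng.integers(0, 3, size=60)
y = to_categorical(labels, num_classes=3)  # one-hot targets

model = Sequential([
    Input(shape=(4,)),
    Dense(8, activation="relu"),
    Dense(3, activation="softmax"),  # one probability per class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=10, verbose=0)

# evaluate(): loss plus compiled metrics, given inputs and expected outputs
loss, acc = model.evaluate(X, y, verbose=0)

# predict(): raw network output, here one softmax vector per sample
probs = model.predict(X[:5], verbose=0)  # shape (5, 3)
classes = np.argmax(probs, axis=1)       # index of the most probable class
print(loss, acc)
print(classes)
```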
Summarize the Model
Once you are happy with your model, you can finalize it.
You may wish to output a summary of your model. Calling model.summary() prints each layer with its output shape and number of parameters.
You can also retrieve the model configuration as a Python dictionary using the get_config() function.
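Finally, you can create an image of your model structure with plot_model(); this requires the pydot and graphviz packages to be installed:
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model.png')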
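Putting the summary and configuration calls together, here is a minimal sketch assuming TensorFlow 2.x; the layer names "hidden" and "output" are illustrative. Drawing the model is left out here because it depends on optional packages.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Input(shape=(8,)),
    Dense(12, activation="relu", name="hidden"),
    Dense(1, activation="sigmoid", name="output"),
])

# Print a table of layers, output shapes, and parameter counts
model.summary()

# Retrieve the configuration as a plain dictionary; "layers" holds one entry per layer
config = model.get_config()
layer_names = [layer["config"]["name"] for layer in config["layers"]]
print(layer_names)
```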
In this post, you discovered the Keras API that you can use to create artificial neural networks and deep learning models.
Specifically, you learned about the life cycle of a Keras model, including:
- Constructing a model
- Creating and adding layers, including weight initialization and activation
- Compiling models, including optimization method, loss function, and metrics
- Fitting models, including epochs and batch size
- Model predictions
- Summarizing the model
If you have any questions about Keras for Deep Learning or this post, ask in the comments, and I will do my best to answer them.