The post PyTorch Tutorial: How to Develop Deep Learning Models with Python appeared first on Machine Learning Mastery.

]]>PyTorch is the premier open-source deep learning framework developed and maintained by Facebook.

At its core, PyTorch is a mathematical library that allows you to perform efficient computation and automatic differentiation on graph-based models. Achieving this directly is challenging, although thankfully, the modern PyTorch API provides classes and idioms that allow you to easily develop a suite of deep learning models.

In this tutorial, you will discover a step-by-step guide to developing deep learning models in PyTorch.

After completing this tutorial, you will know:

- The difference between Torch and PyTorch and how to install and confirm PyTorch is working.
- The five-step life-cycle of PyTorch models and how to define, fit, and evaluate models.
- How to develop PyTorch deep learning models for regression, classification, and predictive modeling tasks.

Let’s get started.

The focus of this tutorial is on using the PyTorch API for common deep learning model development tasks; we will not be diving into the math and theory of deep learning. For that, I recommend starting with this excellent book.

The best way to learn deep learning in python is by doing. Dive in. You can circle back for more theory later.

I have designed each code example to use best practices and to be standalone so that you can copy and paste it directly into your project and adapt it to your specific needs. This will give you a massive head start over trying to figure out the API from official documentation alone.

It is a large tutorial, and as such, it is divided into three parts; they are:

- How to Install PyTorch
- What Are Torch and PyTorch?
- How to Install PyTorch
- How to Confirm PyTorch Is Installed

- PyTorch Deep Learning Model Life-Cycle
- Step 1: Prepare the Data
- Step 2: Define the Model
- Step 3: Train the Model
- Step 4: Evaluate the Model
- Step 5: Make Predictions

- How to Develop PyTorch Deep Learning Models
- How to Develop an MLP for Binary Classification
- How to Develop an MLP for Multiclass Classification
- How to Develop an MLP for Regression
- How to Develop a CNN for Image Classification

Work through this tutorial. It will take you 60 minutes, max!

**You do not need to understand everything (at least not right now)**. Your goal is to run through the tutorial end-to-end and get a result. You do not need to understand everything on the first pass. List down your questions as you go. Make heavy use of the API documentation to learn about all of the functions that you’re using.

**You do not need to know the math first**. Math is a compact way of describing how algorithms work, specifically tools from linear algebra, probability, and calculus. These are not the only tools that you can use to learn how algorithms work. You can also use code and explore algorithm behavior with different inputs and outputs. Knowing the math will not tell you what algorithm to choose or how to best configure it. You can only discover that through carefully controlled experiments.

**You do not need to know how the algorithms work**. It is important to know about the limitations and how to configure deep learning algorithms. But learning about algorithms can come later. You need to build up this algorithm knowledge slowly over a long period of time. Today, start by getting comfortable with the platform.

**You do not need to be a Python programmer**. The syntax of the Python language can be intuitive if you are new to it. Just like other languages, focus on function calls (e.g. function()) and assignments (e.g. a = “b”). This will get you most of the way. You are a developer; you know how to pick up the basics of a language really fast. Just get started and dive into the details later.

**You do not need to be a deep learning expert**. You can learn about the benefits and limitations of various algorithms later, and there are plenty of tutorials that you can read to brush up on the steps of a deep learning project.

In this section, you will discover what PyTorch is, how to install it, and how to confirm that it is installed correctly.

PyTorch is an open-source Python library for deep learning developed and maintained by Facebook.

The project started in 2016 and quickly became a popular framework among developers and researchers.

Torch (*Torch7*) is an open-source project for deep learning written in C and generally used via the Lua interface. It was a precursor project to PyTorch and is no longer actively developed. PyTorch includes “*Torch*” in the name, acknowledging the prior torch library with the “*Py*” prefix indicating the Python focus of the new project.

The PyTorch API is simple and flexible, making it a favorite for academics and researchers in the development of new deep learning models and applications. The extensive use has led to many extensions for specific applications (such as text, computer vision, and audio data), and may pre-trained models that can be used directly. As such, it may be the most popular library used by academics.

The flexibility of PyTorch comes at the cost of ease of use, especially for beginners, as compared to simpler interfaces like Keras. The choice to use PyTorch instead of Keras gives up some ease of use, a slightly steeper learning curve, and more code for more flexibility, and perhaps a more vibrant academic community.

Before installing PyTorch, ensure that you have Python installed, such as Python 3.6 or higher.

If you don’t have Python installed, you can install it using Anaconda. This tutorial will show you how:

There are many ways to install the PyTorch open-source deep learning library.

The most common, and perhaps simplest, way to install PyTorch on your workstation is by using pip.

For example, on the command line, you can type:

sudo pip install torch

Perhaps the most popular application of deep learning is for computer vision, and the PyTorch computer vision package is called “torchvision.”

Installing torchvision is also highly recommended and it can be installed as follows:

sudo pip install torchvision

If you prefer to use an installation method more specific to your platform or package manager, you can see a complete list of installation instructions here:

There is no need to set up the GPU now.

All examples in this tutorial will work just fine on a modern CPU. If you want to configure PyTorch for your GPU, you can do that after completing this tutorial. Don’t get distracted!

Once PyTorch is installed, it is important to confirm that the library was installed successfully and that you can start using it.

Don’t skip this step.

If PyTorch is not installed correctly or raises an error on this step, you won’t be able to run the examples later.

Create a new file called *versions.py* and copy and paste the following code into the file.

# check pytorch version import torch print(torch.__version__)

Save the file, then open your command line and change directory to where you saved the file.

Then type:

python versions.py

You should then see output like the following:

1.3.1

This confirms that PyTorch is installed correctly and that we are all using the same version.

This also shows you how to run a Python script from the command line. I recommend running all code from the command line in this manner, and not from a notebook or an IDE.

In this section, you will discover the life-cycle for a deep learning model and the PyTorch API that you can use to define models.

A model has a life-cycle, and this very simple knowledge provides the backbone for both modeling a dataset and understanding the PyTorch API.

The five steps in the life-cycle are as follows:

- 1. Prepare the Data.
- 2. Define the Model.
- 3. Train the Model.
- 4. Evaluate the Model.
- 5. Make Predictions.

Let’s take a closer look at each step in turn.

**Note**: There are many ways to achieve each of these steps using the PyTorch API, although I have aimed to show you the simplest, or most common, or most idiomatic.

If you discover a better approach, let me know in the comments below.

The first step is to load and prepare your data.

Neural network models require numerical input data and numerical output data.

You can use standard Python libraries to load and prepare tabular data, like CSV files. For example, Pandas can be used to load your CSV file, and tools from scikit-learn can be used to encode categorical data, such as class labels.

PyTorch provides the Dataset class that you can extend and customize to load your dataset.

For example, the constructor of your dataset object can load your data file (e.g. a CSV file). You can then override the *__len__()* function that can be used to get the length of the dataset (number of rows or samples), and the *__getitem__()* function that is used to get a specific sample by index.

When loading your dataset, you can also perform any required transforms, such as scaling or encoding.

A skeleton of a custom *Dataset* class is provided below.

# dataset definition class CSVDataset(Dataset): # load the dataset def __init__(self, path): # store the inputs and outputs self.X = ... self.y = ... # number of rows in the dataset def __len__(self): return len(self.X) # get a row at an index def __getitem__(self, idx): return [self.X[idx], self.y[idx]]

Once loaded, PyTorch provides the DataLoader class to navigate a *Dataset* instance during the training and evaluation of your model.

A *DataLoader* instance can be created for the training dataset, test dataset, and even a validation dataset.

The random_split() function can be used to split a dataset into train and test sets. Once split, a selection of rows from the *Dataset* can be provided to a DataLoader, along with the batch size and whether the data should be shuffled every epoch.

For example, we can define a *DataLoader* by passing in a selected sample of rows in the dataset.

... # create the dataset dataset = CSVDataset(...) # select rows from the dataset train, test = random_split(dataset, [[...], [...]]) # create a data loader for train and test sets train_dl = DataLoader(train, batch_size=32, shuffle=True) test_dl = DataLoader(test, batch_size=1024, shuffle=False)

Once defined, a *DataLoader* can be enumerated, yielding one batch worth of samples each iteration.

... # train the model for i, (inputs, targets) in enumerate(train_dl): ...

The next step is to define a model.

The idiom for defining a model in PyTorch involves defining a class that extends the Module class.

The constructor of your class defines the layers of the model and the forward() function is the override that defines how to forward propagate input through the defined layers of the model.

Many layers are available, such as Linear for fully connected layers, Conv2d for convolutional layers, and MaxPool2d for pooling layers.

Activation functions can also be defined as layers, such as ReLU, Softmax, and Sigmoid.

Below is an example of a simple MLP model with one layer.

# model definition class MLP(Module): # define model elements def __init__(self, n_inputs): super(MLP, self).__init__() self.layer = Linear(n_inputs, 1) self.activation = Sigmoid() # forward propagate input def forward(self, X): X = self.layer(X) X = self.activation(X) return X

The weights of a given layer can also be initialized after the layer is defined in the constructor.

Common examples include the Xavier and He weight initialization schemes. For example:

... xavier_uniform_(self.layer.weight)

The training process requires that you define a loss function and an optimization algorithm.

Common loss functions include the following:

- BCELoss: Binary cross-entropy loss for binary classification.
- CrossEntropyLoss: Categorical cross-entropy loss for multi-class classification.
- MSELoss: Mean squared loss for regression.

For more on loss functions generally, see the tutorial:

Stochastic gradient descent is used for optimization, and the standard algorithm is provided by the SGD class, although other versions of the algorithm are available, such as Adam.

# define the optimization criterion = MSELoss() optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

Training the model involves enumerating the *DataLoader* for the training dataset.

First, a loop is required for the number of training epochs. Then an inner loop is required for the mini-batches for stochastic gradient descent.

... # enumerate epochs for epoch in range(100): # enumerate mini batches for i, (inputs, targets) in enumerate(train_dl): ...

Each update to the model involves the same general pattern comprised of:

- Clearing the last error gradient.
- A forward pass of the input through the model.
- Calculating the loss for the model output.
- Backpropagating the error through the model.
- Update the model in an effort to reduce loss.

For example:

... # clear the gradients optimizer.zero_grad() # compute the model output yhat = model(inputs) # calculate loss loss = criterion(yhat, targets) # credit assignment loss.backward() # update model weights optimizer.step()

Once the model is fit, it can be evaluated on the test dataset.

This can be achieved by using the *DataLoader* for the test dataset and collecting the predictions for the test set, then comparing the predictions to the expected values of the test set and calculating a performance metric.

... for i, (inputs, targets) in enumerate(test_dl): # evaluate the model on the test set yhat = model(inputs) ...

A fit model can be used to make a prediction on new data.

For example, you might have a single image or a single row of data and want to make a prediction.

This requires that you wrap the data in a PyTorch Tensor data structure.

A Tensor is just the PyTorch version of a NumPy array for holding data. It also allows you to perform the automatic differentiation tasks in the model graph, like calling *backward()* when training the model.

The prediction too will be a Tensor, although you can retrieve the NumPy array by detaching the Tensor from the automatic differentiation graph and calling the NumPy function.

... # convert row to data row = Variable(Tensor([row]).float()) # make prediction yhat = model(row) # retrieve numpy array yhat = yhat.detach().numpy()

Now that we are familiar with the PyTorch API at a high-level and the model life-cycle, let’s look at how we can develop some standard deep learning models from scratch.

In this section, you will discover how to develop, evaluate, and make predictions with standard deep learning models, including Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN).

A Multilayer Perceptron model, or MLP for short, is a standard fully connected neural network model.

It is comprised of layers of nodes where each node is connected to all outputs from the previous layer and the output of each node is connected to all inputs for nodes in the next layer.

An MLP is a model with one or more fully connected layers. This model is appropriate for tabular data, that is data as it looks in a table or spreadsheet with one column for each variable and one row for each variable. There are three predictive modeling problems you may want to explore with an MLP; they are binary classification, multiclass classification, and regression.

Let’s fit a model on a real dataset for each of these cases.

**Note**: The models in this section are effective, but not optimized. See if you can improve their performance. Post your findings in the comments below.

We will use the Ionosphere binary (two class) classification dataset to demonstrate an MLP for binary classification.

This dataset involves predicting whether there is a structure in the atmosphere or not given radar returns.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

We will use a LabelEncoder to encode the string labels to integer values 0 and 1. The model will be fit on 67 percent of the data, and the remaining 33 percent will be used for evaluation, split using the train_test_split() function.

It is a good practice to use ‘*relu*‘ activation with a ‘*He Uniform*‘ weight initialization. This combination goes a long way to overcome the problem of vanishing gradients when training deep neural network models. For more on ReLU, see the tutorial:

The model predicts the probability of class 1 and uses the sigmoid activation function. The model is optimized using stochastic gradient descent and seeks to minimize the binary cross-entropy loss.

The complete example is listed below.

# pytorch mlp for binary classification from numpy import vstack from pandas import read_csv from sklearn.preprocessing import LabelEncoder from sklearn.metrics import accuracy_score from torch.utils.data import Dataset from torch.utils.data import DataLoader from torch.utils.data import random_split from torch import Tensor from torch.nn import Linear from torch.nn import ReLU from torch.nn import Sigmoid from torch.nn import Module from torch.optim import SGD from torch.nn import BCELoss from torch.nn.init import kaiming_uniform_ from torch.nn.init import xavier_uniform_ # dataset definition class CSVDataset(Dataset): # load the dataset def __init__(self, path): # load the csv file as a dataframe df = read_csv(path, header=None) # store the inputs and outputs self.X = df.values[:, :-1] self.y = df.values[:, -1] # ensure input data is floats self.X = self.X.astype('float32') # label encode target and ensure the values are floats self.y = LabelEncoder().fit_transform(self.y) self.y = self.y.astype('float32') self.y = self.y.reshape((len(self.y), 1)) # number of rows in the dataset def __len__(self): return len(self.X) # get a row at an index def __getitem__(self, idx): return [self.X[idx], self.y[idx]] # get indexes for train and test rows def get_splits(self, n_test=0.33): # determine sizes test_size = round(n_test * len(self.X)) train_size = len(self.X) - test_size # calculate the split return random_split(self, [train_size, test_size]) # model definition class MLP(Module): # define model elements def __init__(self, n_inputs): super(MLP, self).__init__() # input to first hidden layer self.hidden1 = Linear(n_inputs, 10) kaiming_uniform_(self.hidden1.weight, nonlinearity='relu') self.act1 = ReLU() # second hidden layer self.hidden2 = Linear(10, 8) kaiming_uniform_(self.hidden2.weight, nonlinearity='relu') self.act2 = ReLU() # third hidden layer and output self.hidden3 = Linear(8, 1) xavier_uniform_(self.hidden3.weight) self.act3 = Sigmoid() # forward propagate input def forward(self, X): # input to first hidden layer X = self.hidden1(X) X = self.act1(X) # second hidden layer X = self.hidden2(X) X = self.act2(X) # third hidden layer and output X = self.hidden3(X) X = self.act3(X) return X # prepare the dataset def prepare_data(path): # load the dataset dataset = CSVDataset(path) # calculate split train, test = dataset.get_splits() # prepare data loaders train_dl = DataLoader(train, batch_size=32, shuffle=True) test_dl = DataLoader(test, batch_size=1024, shuffle=False) return train_dl, test_dl # train the model def train_model(train_dl, model): # define the optimization criterion = BCELoss() optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9) # enumerate epochs for epoch in range(100): # enumerate mini batches for i, (inputs, targets) in enumerate(train_dl): # clear the gradients optimizer.zero_grad() # compute the model output yhat = model(inputs) # calculate loss loss = criterion(yhat, targets) # credit assignment loss.backward() # update model weights optimizer.step() # evaluate the model def evaluate_model(test_dl, model): predictions, actuals = list(), list() for i, (inputs, targets) in enumerate(test_dl): # evaluate the model on the test set yhat = model(inputs) # retrieve numpy array yhat = yhat.detach().numpy() actual = targets.numpy() actual = actual.reshape((len(actual), 1)) # round to class values yhat = yhat.round() # store predictions.append(yhat) actuals.append(actual) predictions, actuals = vstack(predictions), vstack(actuals) # calculate accuracy acc = accuracy_score(actuals, predictions) return acc # make a class prediction for one row of data def predict(row, model): # convert row to data row = Tensor([row]) # make prediction yhat = model(row) # retrieve numpy array yhat = yhat.detach().numpy() return yhat # prepare the data path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv' train_dl, test_dl = prepare_data(path) print(len(train_dl.dataset), len(test_dl.dataset)) # define the network model = MLP(34) # train the model train_model(train_dl, model) # evaluate the model acc = evaluate_model(test_dl, model) print('Accuracy: %.3f' % acc) # make a single prediction (expect class=1) row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300] yhat = predict(row, model) print('Predicted: %.3f (class=%d)' % (yhat, yhat.round()))

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.

**What result did you get?**

**Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 94 percent and then predicted a probability of 0.99 that the one row of data belong to class 1.

235 116 Accuracy: 0.948 Predicted: 0.998 (class=1)

We will use the Iris flowers multiclass classification dataset to demonstrate an MLP for multiclass classification.

This problem involves predicting the species of iris flower given measures of the flower.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

Given that it is a multiclass classification, the model must have one node for each class in the output layer and use the softmax activation function. The loss function is the cross entropy, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.).

The complete example of fitting and evaluating an MLP on the iris flowers dataset is listed below.

# pytorch mlp for multiclass classification from numpy import vstack from numpy import argmax from pandas import read_csv from sklearn.preprocessing import LabelEncoder from sklearn.metrics import accuracy_score from torch import Tensor from torch.utils.data import Dataset from torch.utils.data import DataLoader from torch.utils.data import random_split from torch.nn import Linear from torch.nn import ReLU from torch.nn import Softmax from torch.nn import Module from torch.optim import SGD from torch.nn import CrossEntropyLoss from torch.nn.init import kaiming_uniform_ from torch.nn.init import xavier_uniform_ # dataset definition class CSVDataset(Dataset): # load the dataset def __init__(self, path): # load the csv file as a dataframe df = read_csv(path, header=None) # store the inputs and outputs self.X = df.values[:, :-1] self.y = df.values[:, -1] # ensure input data is floats self.X = self.X.astype('float32') # label encode target and ensure the values are floats self.y = LabelEncoder().fit_transform(self.y) # number of rows in the dataset def __len__(self): return len(self.X) # get a row at an index def __getitem__(self, idx): return [self.X[idx], self.y[idx]] # get indexes for train and test rows def get_splits(self, n_test=0.33): # determine sizes test_size = round(n_test * len(self.X)) train_size = len(self.X) - test_size # calculate the split return random_split(self, [train_size, test_size]) # model definition class MLP(Module): # define model elements def __init__(self, n_inputs): super(MLP, self).__init__() # input to first hidden layer self.hidden1 = Linear(n_inputs, 10) kaiming_uniform_(self.hidden1.weight, nonlinearity='relu') self.act1 = ReLU() # second hidden layer self.hidden2 = Linear(10, 8) kaiming_uniform_(self.hidden2.weight, nonlinearity='relu') self.act2 = ReLU() # third hidden layer and output self.hidden3 = Linear(8, 3) xavier_uniform_(self.hidden3.weight) self.act3 = Softmax(dim=1) # forward propagate input def forward(self, X): # input to first hidden layer X = self.hidden1(X) X = self.act1(X) # second hidden layer X = self.hidden2(X) X = self.act2(X) # output layer X = self.hidden3(X) X = self.act3(X) return X # prepare the dataset def prepare_data(path): # load the dataset dataset = CSVDataset(path) # calculate split train, test = dataset.get_splits() # prepare data loaders train_dl = DataLoader(train, batch_size=32, shuffle=True) test_dl = DataLoader(test, batch_size=1024, shuffle=False) return train_dl, test_dl # train the model def train_model(train_dl, model): # define the optimization criterion = CrossEntropyLoss() optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9) # enumerate epochs for epoch in range(500): # enumerate mini batches for i, (inputs, targets) in enumerate(train_dl): # clear the gradients optimizer.zero_grad() # compute the model output yhat = model(inputs) # calculate loss loss = criterion(yhat, targets) # credit assignment loss.backward() # update model weights optimizer.step() # evaluate the model def evaluate_model(test_dl, model): predictions, actuals = list(), list() for i, (inputs, targets) in enumerate(test_dl): # evaluate the model on the test set yhat = model(inputs) # retrieve numpy array yhat = yhat.detach().numpy() actual = targets.numpy() # convert to class labels yhat = argmax(yhat, axis=1) # reshape for stacking actual = actual.reshape((len(actual), 1)) yhat = yhat.reshape((len(yhat), 1)) # store predictions.append(yhat) actuals.append(actual) predictions, actuals = vstack(predictions), vstack(actuals) # calculate accuracy acc = accuracy_score(actuals, predictions) return acc # make a class prediction for one row of data def predict(row, model): # convert row to data row = Tensor([row]) # make prediction yhat = model(row) # retrieve numpy array yhat = yhat.detach().numpy() return yhat # prepare the data path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv' train_dl, test_dl = prepare_data(path) print(len(train_dl.dataset), len(test_dl.dataset)) # define the network model = MLP(4) # train the model train_model(train_dl, model) # evaluate the model acc = evaluate_model(test_dl, model) print('Accuracy: %.3f' % acc) # make a single prediction row = [5.1,3.5,1.4,0.2] yhat = predict(row, model) print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.

**What result did you get?
Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 98 percent and then predicted a probability of a row of data belonging to each class, although class 0 has the highest probability.

100 50 Accuracy: 0.980 Predicted: [[9.5524162e-01 4.4516966e-02 2.4138369e-04]] (class=0)

We will use the Boston housing regression dataset to demonstrate an MLP for regression predictive modeling.

This problem involves predicting house value based on properties of the house and neighborhood.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

This is a regression problem that involves predicting a single numeric value. As such, the output layer has a single node and uses the default or linear activation function (no activation function). The mean squared error (mse) loss is minimized when fitting the model.

Recall that this is regression, not classification; therefore, we cannot calculate classification accuracy. For more on this, see the tutorial:

The complete example of fitting and evaluating an MLP on the Boston housing dataset is listed below.

# pytorch mlp for regression from numpy import vstack from numpy import sqrt from pandas import read_csv from sklearn.metrics import mean_squared_error from torch.utils.data import Dataset from torch.utils.data import DataLoader from torch.utils.data import random_split from torch import Tensor from torch.nn import Linear from torch.nn import Sigmoid from torch.nn import Module from torch.optim import SGD from torch.nn import MSELoss from torch.nn.init import xavier_uniform_ # dataset definition class CSVDataset(Dataset): # load the dataset def __init__(self, path): # load the csv file as a dataframe df = read_csv(path, header=None) # store the inputs and outputs self.X = df.values[:, :-1].astype('float32') self.y = df.values[:, -1].astype('float32') # ensure target has the right shape self.y = self.y.reshape((len(self.y), 1)) # number of rows in the dataset def __len__(self): return len(self.X) # get a row at an index def __getitem__(self, idx): return [self.X[idx], self.y[idx]] # get indexes for train and test rows def get_splits(self, n_test=0.33): # determine sizes test_size = round(n_test * len(self.X)) train_size = len(self.X) - test_size # calculate the split return random_split(self, [train_size, test_size]) # model definition class MLP(Module): # define model elements def __init__(self, n_inputs): super(MLP, self).__init__() # input to first hidden layer self.hidden1 = Linear(n_inputs, 10) xavier_uniform_(self.hidden1.weight) self.act1 = Sigmoid() # second hidden layer self.hidden2 = Linear(10, 8) xavier_uniform_(self.hidden2.weight) self.act2 = Sigmoid() # third hidden layer and output self.hidden3 = Linear(8, 1) xavier_uniform_(self.hidden3.weight) # forward propagate input def forward(self, X): # input to first hidden layer X = self.hidden1(X) X = self.act1(X) # second hidden layer X = self.hidden2(X) X = self.act2(X) # third hidden layer and output X = self.hidden3(X) return X # prepare the dataset def prepare_data(path): # load the dataset dataset = CSVDataset(path) # calculate split train, test = dataset.get_splits() # prepare data loaders train_dl = DataLoader(train, batch_size=32, shuffle=True) test_dl = DataLoader(test, batch_size=1024, shuffle=False) return train_dl, test_dl # train the model def train_model(train_dl, model): # define the optimization criterion = MSELoss() optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9) # enumerate epochs for epoch in range(100): # enumerate mini batches for i, (inputs, targets) in enumerate(train_dl): # clear the gradients optimizer.zero_grad() # compute the model output yhat = model(inputs) # calculate loss loss = criterion(yhat, targets) # credit assignment loss.backward() # update model weights optimizer.step() # evaluate the model def evaluate_model(test_dl, model): predictions, actuals = list(), list() for i, (inputs, targets) in enumerate(test_dl): # evaluate the model on the test set yhat = model(inputs) # retrieve numpy array yhat = yhat.detach().numpy() actual = targets.numpy() actual = actual.reshape((len(actual), 1)) # store predictions.append(yhat) actuals.append(actual) predictions, actuals = vstack(predictions), vstack(actuals) # calculate mse mse = mean_squared_error(actuals, predictions) return mse # make a class prediction for one row of data def predict(row, model): # convert row to data row = Tensor([row]) # make prediction yhat = model(row) # retrieve numpy array yhat = yhat.detach().numpy() return yhat # prepare the data path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv' train_dl, test_dl = prepare_data(path) print(len(train_dl.dataset), len(test_dl.dataset)) # define the network model = MLP(13) # train the model train_model(train_dl, model) # evaluate the model mse = evaluate_model(test_dl, model) print('MSE: %.3f, RMSE: %.3f' % (mse, sqrt(mse))) # make a single prediction (expect class=1) row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98] yhat = predict(row, model) print('Predicted: %.3f' % yhat)

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.

**What result did you get?
Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved a MSE of about 82, which is an RMSE of about nine (units are thousands of dollars). A value of 21 is then predicted for the single example.

339 167 MSE: 82.576, RMSE: 9.087 Predicted: 21.909

Convolutional Neural Networks, or CNNs for short, are a type of network designed for image input.

They are comprised of models with convolutional layers that extract features (called feature maps) and pooling layers that distill features down to the most salient elements.

CNNs are best suited to image classification tasks, although they can be used on a wide array of tasks that take images as input.

A popular image classification task is the MNIST handwritten digit classification. It involves tens of thousands of handwritten digits that must be classified as a number between 0 and 9.

The torchvision API provides a convenience function to download and load this dataset directly.

The example below loads the dataset and plots the first few images.

# load mnist dataset in pytorch from torch.utils.data import DataLoader from torchvision.datasets import MNIST from torchvision.transforms import Compose from torchvision.transforms import ToTensor from matplotlib import pyplot # define location to save or load the dataset path = '~/.torch/datasets/mnist' # define the transforms to apply to the data trans = Compose([ToTensor()]) # download and define the datasets train = MNIST(path, train=True, download=True, transform=trans) test = MNIST(path, train=False, download=True, transform=trans) # define how to enumerate the datasets train_dl = DataLoader(train, batch_size=32, shuffle=True) test_dl = DataLoader(test, batch_size=32, shuffle=True) # get one batch of images i, (inputs, targets) = next(enumerate(train_dl)) # plot some images for i in range(25): # define subplot pyplot.subplot(5, 5, i+1) # plot raw pixel data pyplot.imshow(inputs[i][0], cmap='gray') # show the figure pyplot.show()

Running the example loads the MNIST dataset, then summarizes the default train and test datasets.

Train: X=(60000, 28, 28), y=(60000,) Test: X=(10000, 28, 28), y=(10000,)

A plot is then created showing a grid of examples of handwritten images in the training dataset.

We can train a CNN model to classify the images in the MNIST dataset.

Note that the images are arrays of grayscale pixel data, therefore, we must add a channel dimension to the data before we can use the images as input to the model.

It is a good idea to scale the pixel values from the default range of 0-255 to have a zero mean and a standard deviation of 1. For more on scaling pixel values, see the tutorial:

The complete example of fitting and evaluating a CNN model on the MNIST dataset is listed below.

# pytorch cnn for multiclass classification from numpy import vstack from numpy import argmax from pandas import read_csv from sklearn.metrics import accuracy_score from torchvision.datasets import MNIST from torchvision.transforms import Compose from torchvision.transforms import ToTensor from torchvision.transforms import Normalize from torch.utils.data import DataLoader from torch.nn import Conv2d from torch.nn import MaxPool2d from torch.nn import Linear from torch.nn import ReLU from torch.nn import Softmax from torch.nn import Module from torch.optim import SGD from torch.nn import CrossEntropyLoss from torch.nn.init import kaiming_uniform_ from torch.nn.init import xavier_uniform_ # model definition class CNN(Module): # define model elements def __init__(self, n_channels): super(CNN, self).__init__() # input to first hidden layer self.hidden1 = Conv2d(n_channels, 32, (3,3)) kaiming_uniform_(self.hidden1.weight, nonlinearity='relu') self.act1 = ReLU() # first pooling layer self.pool1 = MaxPool2d((2,2), stride=(2,2)) # second hidden layer self.hidden2 = Conv2d(32, 32, (3,3)) kaiming_uniform_(self.hidden2.weight, nonlinearity='relu') self.act2 = ReLU() # second pooling layer self.pool2 = MaxPool2d((2,2), stride=(2,2)) # fully connected layer self.hidden3 = Linear(5*5*32, 100) kaiming_uniform_(self.hidden3.weight, nonlinearity='relu') self.act3 = ReLU() # output layer self.hidden4 = Linear(100, 10) xavier_uniform_(self.hidden4.weight) self.act4 = Softmax(dim=1) # forward propagate input def forward(self, X): # input to first hidden layer X = self.hidden1(X) X = self.act1(X) X = self.pool1(X) # second hidden layer X = self.hidden2(X) X = self.act2(X) X = self.pool2(X) # flatten X = X.view(-1, 4*4*50) # third hidden layer X = self.hidden3(X) X = self.act3(X) # output layer X = self.hidden4(X) X = self.act4(X) return X # prepare the dataset def prepare_data(path): # define standardization trans = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))]) # load dataset train = MNIST(path, train=True, download=True, transform=trans) test = MNIST(path, train=False, download=True, transform=trans) # prepare data loaders train_dl = DataLoader(train, batch_size=64, shuffle=True) test_dl = DataLoader(test, batch_size=1024, shuffle=False) return train_dl, test_dl # train the model def train_model(train_dl, model): # define the optimization criterion = CrossEntropyLoss() optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9) # enumerate epochs for epoch in range(10): # enumerate mini batches for i, (inputs, targets) in enumerate(train_dl): # clear the gradients optimizer.zero_grad() # compute the model output yhat = model(inputs) # calculate loss loss = criterion(yhat, targets) # credit assignment loss.backward() # update model weights optimizer.step() # evaluate the model def evaluate_model(test_dl, model): predictions, actuals = list(), list() for i, (inputs, targets) in enumerate(test_dl): # evaluate the model on the test set yhat = model(inputs) # retrieve numpy array yhat = yhat.detach().numpy() actual = targets.numpy() # convert to class labels yhat = argmax(yhat, axis=1) # reshape for stacking actual = actual.reshape((len(actual), 1)) yhat = yhat.reshape((len(yhat), 1)) # store predictions.append(yhat) actuals.append(actual) predictions, actuals = vstack(predictions), vstack(actuals) # calculate accuracy acc = accuracy_score(actuals, predictions) return acc # prepare the data path = '~/.torch/datasets/mnist' train_dl, test_dl = prepare_data(path) print(len(train_dl.dataset), len(test_dl.dataset)) # define the network model = CNN(1) # # train the model train_model(train_dl, model) # evaluate the model acc = evaluate_model(test_dl, model) print('Accuracy: %.3f' % acc)

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset.

**What result did you get?
Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 98 percent on the test dataset. We can then see that the model predicted class 5 for the first image in the training set.

60000 10000 Accuracy: 0.985

This section provides more resources on the topic if you are looking to go deeper.

- Deep Learning, 2016.
- Programming PyTorch for Deep Learning: Creating and Deploying Deep Learning Applications, 2018.
- Deep Learning with PyTorch, 2020.
- Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD, 2020.

- PyTorch Homepage.
- PyTorch Documentation
- PyTorch Installation Guide
- PyTorch, Wikipedia.
- PyTorch on GitHub.

In this tutorial, you discovered a step-by-step guide to developing deep learning models in PyTorch.

Specifically, you learned:

- The difference between Torch and PyTorch and how to install and confirm PyTorch is working.
- The five-step life-cycle of PyTorch models and how to define, fit, and evaluate models.
- How to develop PyTorch deep learning models for regression, classification, and predictive modeling tasks.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post PyTorch Tutorial: How to Develop Deep Learning Models with Python appeared first on Machine Learning Mastery.

]]>The post Neural Networks are Function Approximation Algorithms appeared first on Machine Learning Mastery.

]]>Given a dataset comprised of inputs and outputs, we assume that there is an unknown underlying function that is consistent in mapping inputs to outputs in the target domain and resulted in the dataset. We then use supervised learning algorithms to approximate this function.

Neural networks are an example of a supervised machine learning algorithm that is perhaps best understood in the context of function approximation. This can be demonstrated with examples of neural networks approximating simple one-dimensional functions that aid in developing the intuition for what is being learned by the model.

In this tutorial, you will discover the intuition behind neural networks as function approximation algorithms.

After completing this tutorial, you will know:

- Training a neural network on data approximates the unknown underlying mapping function from inputs to outputs.
- One dimensional input and output datasets provide a useful basis for developing the intuitions for function approximation.
- How to develop and evaluate a small neural network for function approximation.

Let’s get started.

This tutorial is divided into three parts; they are:

- What Is Function Approximation
- Definition of a Simple Function
- Approximating a Simple Function

Function approximation is a technique for estimating an unknown underlying function using historical or available observations from the domain.

Artificial neural networks learn to approximate a function.

In supervised learning, a dataset is comprised of inputs and outputs, and the supervised learning algorithm learns how to best map examples of inputs to examples of outputs.

We can think of this mapping as being governed by a mathematical function, called the **mapping function**, and it is this function that a supervised learning algorithm seeks to best approximate.

Neural networks are an example of a supervised learning algorithm and seek to approximate the function represented by your data. This is achieved by calculating the error between the predicted outputs and the expected outputs and minimizing this error during the training process.

It is best to think of feedforward networks as function approximation machines that are designed to achieve statistical generalization, occasionally drawing some insights from what we know about the brain, rather than as models of brain function.

— Page 169, Deep Learning, 2016.

We say “*approximate*” because although we suspect such a mapping function exists, we don’t know anything about it.

The true function that maps inputs to outputs is unknown and is often referred to as the **target function**. It is the target of the learning process, the function we are trying to approximate using only the data that is available. If we knew the target function, we would not need to approximate it, i.e. we would not need a supervised machine learning algorithm. Therefore, function approximation is only a useful tool when the underlying target mapping function is unknown.

All we have are observations from the domain that contain examples of inputs and outputs. This implies things about the size and quality of the data; for example:

- The more examples we have, the more we might be able to figure out about the mapping function.
- The less noise we have in observations, the more crisp approximation we can make of the mapping function.

So why do we like using neural networks for function approximation?

The reason is that they are a **universal approximator**. In theory, they can be used to approximate any function.

… the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer with any “squashing” activation function (such as the logistic sigmoid activation function) can approximate any […] function from one finite-dimensional space to another with any desired non-zero amount of error, provided that the network is given enough hidden units

— Page 198, Deep Learning, 2016.

Regression predictive modeling involves predicting a numerical quantity given inputs. Classification predictive modeling involves predicting a class label given inputs.

Both of these predictive modeling problems can be seen as examples of function approximation.

To make this concrete, we can review a worked example.

In the next section, let’s define a simple function that we can later approximate.

We can define a simple function with one numerical input variable and one numerical output variable and use this as the basis for understanding neural networks for function approximation.

We can define a domain of numbers as our input, such as floating-point values from -50 to 50.

We can then select a mathematical operation to apply to the inputs to get the output values. The selected mathematical operation will be the mapping function, and because we are choosing it, we will know what it is. In practice, this is not the case and is the reason why we would use a supervised learning algorithm like a neural network to learn or discover the mapping function.

In this case, we will use the square of the input as the mapping function, defined as:

- y = x^2

Where *y* is the output variable and *x* is the input variable.

We can develop an intuition for this mapping function by enumerating the values in the range of our input variable and calculating the output value for each input and plotting the result.

The example below implements this in Python.

# example of creating a univariate dataset with a given mapping function from matplotlib import pyplot # define the input data x = [i for i in range(-50,51)] # define the output data y = [i**2.0 for i in x] # plot the input versus the output pyplot.scatter(x,y) pyplot.title('Input (x) versus Output (y)') pyplot.xlabel('Input Variable (x)') pyplot.ylabel('Output Variable (y)') pyplot.show()

Running the example first creates a list of integer values across the entire input domain.

The output values are then calculated using the mapping function, then a plot is created with the input values on the x-axis and the output values on the y-axis.

The input and output variables represent our dataset.

Next, we can then pretend to forget that we know what the mapping function is and use a neural network to re-learn or re-discover the mapping function.

We can fit a neural network model on examples of inputs and outputs and see if the model can learn the mapping function.

This is a very simple mapping function, so we would expect a small neural network could learn it quickly.

We will define the network using the Keras deep learning library and use some data preparation tools from the scikit-learn library.

First, let’s define the dataset.

... # define the dataset x = asarray([i for i in range(-50,51)]) y = asarray([i**2.0 for i in x]) print(x.min(), x.max(), y.min(), y.max())

Next, we can reshape the data so that the input and output variables are columns with one observation per row, as is expected when using supervised learning models.

... # reshape arrays into into rows and cols x = x.reshape((len(x), 1)) y = y.reshape((len(y), 1))

Next, we will need to scale the inputs and the outputs.

The inputs will have a range between -50 and 50, whereas the outputs will have a range between -50^2 (2500) and 0^2 (0). Large input and output values can make training neural networks unstable, therefore, it is a good idea to scale data first.

We can use the MinMaxScaler to separately normalize the input values and the output values to values in the range between 0 and 1.

... # separately scale the input and output variables scale_x = MinMaxScaler() x = scale_x.fit_transform(x) scale_y = MinMaxScaler() y = scale_y.fit_transform(y) print(x.min(), x.max(), y.min(), y.max())

We can now define a neural network model.

With some trial and error, I chose a model with two hidden layers and 10 nodes in each layer. Perhaps experiment with other configurations to see if you can do better.

... # design the neural network model model = Sequential() model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(10, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(1))

We will fit the model using a mean squared loss and use the efficient adam version of stochastic gradient descent to optimize the model.

This means the model will seek to minimize the mean squared error between the predictions made and the expected output values (*y*) while it tries to approximate the mapping function.

... # define the loss function and optimization algorithm model.compile(loss='mse', optimizer='adam')

We don’t have a lot of data (e.g. about 100 rows), so we will fit the model for 500 epochs and use a small batch size of 10.

Again, these values were found after a little trial and error; try different values and see if you can do better.

... # ft the model on the training dataset model.fit(x, y, epochs=500, batch_size=10, verbose=0)

Once fit, we can evaluate the model.

We will make a prediction for each example in the dataset and calculate the error. A perfect approximation would be 0.0. This is not possible in general because of noise in the observations, incomplete data, and complexity of the unknown underlying mapping function.

In this case, it is possible because we have all observations, there is no noise in the data, and the underlying function is not complex.

First, we can make the prediction.

... # make predictions for the input data yhat = model.predict(x)

We then must invert the scaling that we performed.

This is so the error is reported in the original units of the target variable.

... # inverse transforms x_plot = scale_x.inverse_transform(x) y_plot = scale_y.inverse_transform(y) yhat_plot = scale_y.inverse_transform(yhat)

We can then calculate and report the prediction error in the original units of the target variable.

... # report model error print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))

Finally, we can create a scatter plot of the real mapping of inputs to outputs and compare it to the mapping of inputs to the predicted outputs and see what the approximation of the mapping function looks like spatially.

This is helpful for developing the intuition behind what neural networks are learning.

... # plot x vs yhat pyplot.scatter(x_plot,yhat_plot, label='Predicted') pyplot.title('Input (x) versus Output (y)') pyplot.xlabel('Input Variable (x)') pyplot.ylabel('Output Variable (y)') pyplot.legend() pyplot.show()

Tying this together, the complete example is listed below.

# example of fitting a neural net on x vs x^2 from sklearn.preprocessing import MinMaxScaler from sklearn.metrics import mean_squared_error from keras.models import Sequential from keras.layers import Dense from numpy import asarray from matplotlib import pyplot # define the dataset x = asarray([i for i in range(-50,51)]) y = asarray([i**2.0 for i in x]) print(x.min(), x.max(), y.min(), y.max()) # reshape arrays into into rows and cols x = x.reshape((len(x), 1)) y = y.reshape((len(y), 1)) # separately scale the input and output variables scale_x = MinMaxScaler() x = scale_x.fit_transform(x) scale_y = MinMaxScaler() y = scale_y.fit_transform(y) print(x.min(), x.max(), y.min(), y.max()) # design the neural network model model = Sequential() model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(10, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(1)) # define the loss function and optimization algorithm model.compile(loss='mse', optimizer='adam') # ft the model on the training dataset model.fit(x, y, epochs=500, batch_size=10, verbose=0) # make predictions for the input data yhat = model.predict(x) # inverse transforms x_plot = scale_x.inverse_transform(x) y_plot = scale_y.inverse_transform(y) yhat_plot = scale_y.inverse_transform(yhat) # report model error print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot)) # plot x vs y pyplot.scatter(x_plot,y_plot, label='Actual') # plot x vs yhat pyplot.scatter(x_plot,yhat_plot, label='Predicted') pyplot.title('Input (x) versus Output (y)') pyplot.xlabel('Input Variable (x)') pyplot.ylabel('Output Variable (y)') pyplot.legend() pyplot.show()

Running the example first reports the range of values for the input and output variables, then the range of the same variables after scaling. This confirms that the scaling operation was performed as we expected.

The model is then fit and evaluated on the dataset.

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.

In this case, we can see that the mean squared error is about 1,300, in squared units. If we calculate the square root, this gives us the root mean squared error (RMSE) in the original units. We can see that the average error is about 36 units, which is fine, but not great.

**What results did you get?** Can you do better?

Let me know in the comments below.

-50 50 0.0 2500.0 0.0 1.0 0.0 1.0 MSE: 1300.776

A scatter plot is then created comparing the inputs versus the real outputs, and the inputs versus the predicted outputs.

The difference between these two data series is the error in the approximation of the mapping function. We can see that the approximation is reasonable; it captures the general shape. We can see that there are errors, especially around the 0 input values.

This suggests that there is plenty of room for improvement, such as using a different activation function or different network architecture to better approximate the mapping function.

This section provides more resources on the topic if you are looking to go deeper.

- Deep Learning, 2016.

In this tutorial, you discovered the intuition behind neural networks as function approximation algorithms.

Specifically, you learned:

- Training a neural network on data approximates the unknown underlying mapping function from inputs to outputs.
- One dimensional input and output datasets provide a useful basis for developing the intuitions for function approximation.
- How to develop and evaluate a small neural network for function approximation.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Neural Networks are Function Approximation Algorithms appeared first on Machine Learning Mastery.

]]>The post TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras appeared first on Machine Learning Mastery.

]]>TensorFlow is the premier open-source deep learning framework developed and maintained by Google. Although using TensorFlow directly can be challenging, the modern tf.keras API beings the simplicity and ease of use of Keras to the TensorFlow project.

Using tf.keras allows you to design, fit, evaluate, and use deep learning models to make predictions in just a few lines of code. It makes common deep learning tasks, such as classification and regression predictive modeling, accessible to average developers looking to get things done.

In this tutorial, you will discover a step-by-step guide to developing deep learning models in TensorFlow using the tf.keras API.

After completing this tutorial, you will know:

- The difference between Keras and tf.keras and how to install and confirm TensorFlow is working.
- The 5-step life-cycle of tf.keras models and how to use the sequential and functional APIs.
- How to develop MLP, CNN, and RNN models with tf.keras for regression, classification, and time series forecasting.
- How to use the advanced features of the tf.keras API to inspect and diagnose your model.
- How to improve the performance of your tf.keras model by reducing overfitting and accelerating training.

This is a large tutorial, and a lot of fun. You might want to bookmark it.

The examples are small and focused; you can finish this tutorial in about 60 minutes.

Let’s get started.

This tutorial is designed to be your complete introduction to tf.keras for your deep learning project.

The focus is on using the API for common deep learning model development tasks; we will not be diving into the math and theory of deep learning. For that, I recommend starting with this excellent book.

The best way to learn deep learning in python is by doing. Dive in. You can circle back for more theory later.

I have designed each code example to use best practices and to be standalone so that you can copy and paste it directly into your project and adapt it to your specific needs. This will give you a massive head start over trying to figure out the API from official documentation alone.

It is a large tutorial and as such, it is divided into five parts; they are:

- Install TensorFlow and tf.keras
- What Are Keras and tf.keras?
- How to Install TensorFlow
- How to Confirm TensorFlow Is Installed

- Deep Learning Model Life-Cycle
- The 5-Step Model Life-Cycle
- Sequential Model API (Simple)
- Functional Model API (Advanced)

- How to Develop Deep Learning Models
- Develop Multilayer Perceptron Models
- Develop Convolutional Neural Network Models
- Develop Recurrent Neural Network Models

- How to Use Advanced Model Features
- How to Visualize a Deep Learning Model
- How to Plot Model Learning Curves
- How to Save and Load Your Model

- How to Get Better Model Performance
- How to Reduce Overfitting With Dropout
- How to Accelerate Training With Batch Normalization
- How to Halt Training at the Right Time With Early Stopping

Work through the tutorial at your own pace.

**You do not need to understand everything (at least not right now)**. Your goal is to run through the tutorial end-to-end and get results. You do not need to understand everything on the first pass. List down your questions as you go. Make heavy use of the API documentation to learn about all of the functions that you’re using.

**You do not need to know the math first**. Math is a compact way of describing how algorithms work, specifically tools from linear algebra, probability, and statistics. These are not the only tools that you can use to learn how algorithms work. You can also use code and explore algorithm behavior with different inputs and outputs. Knowing the math will not tell you what algorithm to choose or how to best configure it. You can only discover that through careful, controlled experiments.

**You do not need to know how the algorithms work**. It is important to know about the limitations and how to configure deep learning algorithms. But learning about algorithms can come later. You need to build up this algorithm knowledge slowly over a long period of time. Today, start by getting comfortable with the platform.

**You do not need to be a Python programmer**. The syntax of the Python language can be intuitive if you are new to it. Just like other languages, focus on function calls (e.g. function()) and assignments (e.g. a = “b”). This will get you most of the way. You are a developer, so you know how to pick up the basics of a language really fast. Just get started and dive into the details later.

**You do not need to be a deep learning expert**. You can learn about the benefits and limitations of various algorithms later, and there are plenty of posts that you can read later to brush up on the steps of a deep learning project and the importance of evaluating model skill using cross-validation.

In this section, you will discover what tf.keras is, how to install it, and how to confirm that it is installed correctly.

Keras is an open-source deep learning library written in Python.

The project was started in 2015 by Francois Chollet. It quickly became a popular framework for developers, becoming one of, if not the most, popular deep learning libraries.

During the period of 2015-2019, developing deep learning models using mathematical libraries like TensorFlow, Theano, and PyTorch was cumbersome, requiring tens or even hundreds of lines of code to achieve the simplest tasks. The focus of these libraries was on research, flexibility, and speed, not ease of use.

Keras was popular because the API was clean and simple, allowing standard deep learning models to be defined, fit, and evaluated in just a few lines of code.

A secondary reason Keras took-off was because it allowed you to use any one among the range of popular deep learning mathematical libraries as the backend (e.g. used to perform the computation), such as TensorFlow, Theano, and later, CNTK. This allowed the power of these libraries to be harnessed (e.g. GPUs) with a very clean and simple interface.

In 2019, Google released a new version of their TensorFlow deep learning library (TensorFlow 2) that integrated the Keras API directly and promoted this interface as the default or standard interface for deep learning development on the platform.

This integration is commonly referred to as the *tf.keras* interface or API (“*tf*” is short for “*TensorFlow*“). This is to distinguish it from the so-called standalone Keras open source project.

**Standalone Keras**. The standalone open source project that supports TensorFlow, Theano and CNTK backends.**tf.keras**. The Keras API integrated into TensorFlow 2.

The Keras API implementation in Keras is referred to as “*tf.keras*” because this is the Python idiom used when referencing the API. First, the TensorFlow module is imported and named “*tf*“; then, Keras API elements are accessed via calls to *tf.keras*; for example:

# example of tf.keras python idiom import tensorflow as tf # use keras API model = tf.keras.Sequential() ...

I generally don’t use this idiom myself; I don’t think it reads cleanly.

Given that TensorFlow was the de facto standard backend for the Keras open source project, the integration means that a single library can now be used instead of two separate libraries. Further, the standalone Keras project now recommends all future Keras development use the *tf.keras* API.

At this time, we recommend that Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features (eager execution, distribution support and other).

— Keras Project Homepage, Accessed December 2019.

Before installing TensorFlow, ensure that you have Python installed, such as Python 3.6 or higher.

If you don’t have Python installed, you can install it using Anaconda. This tutorial will show you how:

There are many ways to install the TensorFlow open-source deep learning library.

The most common, and perhaps the simplest, way to install TensorFlow on your workstation is by using *pip*.

For example, on the command line, you can type:

sudo pip install tensorflow

If you prefer to use an installation method more specific to your platform or package manager, you can see a complete list of installation instructions here:

There is no need to set up the GPU now.

All examples in this tutorial will work just fine on a modern CPU. If you want to configure TensorFlow for your GPU, you can do that after completing this tutorial. Don’t get distracted!

Once TensorFlow is installed, it is important to confirm that the library was installed successfully and that you can start using it.

*Don’t skip this step*.

If TensorFlow is not installed correctly or raises an error on this step, you won’t be able to run the examples later.

Create a new file called *versions.py* and copy and paste the following code into the file.

# check version import tensorflow print(tensorflow.__version__)

Save the file, then open your command line and change directory to where you saved the file.

Then type:

python versions.py

You should then see output like the following:

2.0.0

This confirms that TensorFlow is installed correctly and that we are all using the same version.

**What version did you get? **

Post your output in the comments below.

This also shows you how to run a Python script from the command line. I recommend running all code from the command line in this manner, and not from a notebook or an IDE.

Sometimes when you use the *tf.keras* API, you may see warnings printed.

This might include messages that your hardware supports features that your TensorFlow installation was not configured to use.

Some examples on my workstation include:

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA XLA service 0x7fde3f2e6180 executing computations on platform Host. Devices: StreamExecutor device (0): Host, Default Version

They are not your fault. **You did nothing wrong**.

These are information messages and they will not prevent the execution of your code. You can safely ignore messages of this type for now.

It’s an intentional design decision made by the TensorFlow team to show these warning messages. A downside of this decision is that it confuses beginners and it trains developers to ignore all messages, including those that potentially may impact the execution.

Now that you know what tf.keras is, how to install TensorFlow, and how to confirm your development environment is working, let’s look at the life-cycle of deep learning models in TensorFlow.

In this section, you will discover the life-cycle for a deep learning model and the two tf.keras APIs that you can use to define models.

A model has a life-cycle, and this very simple knowledge provides the backbone for both modeling a dataset and understanding the tf.keras API.

The five steps in the life-cycle are as follows:

- Define the model.
- Compile the model.
- Fit the model.
- Evaluate the model.
- Make predictions.

Let’s take a closer look at each step in turn.

Defining the model requires that you first select the type of model that you need and then choose the architecture or network topology.

From an API perspective, this involves defining the layers of the model, configuring each layer with a number of nodes and activation function, and connecting the layers together into a cohesive model.

Models can be defined either with the Sequential API or the Functional API, and we will take a look at this in the next section.

... # define the model model = ...

Compiling the model requires that you first select a loss function that you want to optimize, such as mean squared error or cross-entropy.

It also requires that you select an algorithm to perform the optimization procedure, typically stochastic gradient descent, or a modern variation, such as Adam. It may also require that you select any performance metrics to keep track of during the model training process.

From an API perspective, this involves calling a function to compile the model with the chosen configuration, which will prepare the appropriate data structures required for the efficient use of the model you have defined.

The optimizer can be specified as a string for a known optimizer class, e.g. ‘*sgd*‘ for stochastic gradient descent, or you can configure an instance of an optimizer class and use that.

For a list of supported optimizers, see this:

... # compile the model opt = SGD(learning_rate=0.01, momentum=0.9) model.compile(optimizer=opt, loss='binary_crossentropy')

The three most common loss functions are:

- ‘
*binary_crossentropy*‘ for binary classification. - ‘
*sparse_categorical_crossentropy*‘ for multi-class classification. - ‘
*mse*‘ (mean squared error) for regression.

... # compile the model model.compile(optimizer='sgd', loss='mse')

For a list of supported loss functions, see:

Metrics are defined as a list of strings for known metric functions or a list of functions to call to evaluate predictions.

For a list of supported metrics, see:

... # compile the model model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

Fitting the model requires that you first select the training configuration, such as the number of epochs (loops through the training dataset) and the batch size (number of samples in an epoch used to estimate model error).

Training applies the chosen optimization algorithm to minimize the chosen loss function and updates the model using the backpropagation of error algorithm.

Fitting the model is the slow part of the whole process and can take seconds to hours to days, depending on the complexity of the model, the hardware you’re using, and the size of the training dataset.

From an API perspective, this involves calling a function to perform the training process. This function will block (not return) until the training process has finished.

... # fit the model model.fit(X, y, epochs=100, batch_size=32)

For help on how to choose the batch size, see this tutorial:

While fitting the model, a progress bar will summarize the status of each epoch and the overall training process. This can be simplified to a simple report of model performance each epoch by setting the “*verbose*” argument to 2. All output can be turned off during training by setting “*verbose*” to 0.

... # fit the model model.fit(X, y, epochs=100, batch_size=32, verbose=0)

Evaluating the model requires that you first choose a holdout dataset used to evaluate the model. This should be data not used in the training process so that we can get an unbiased estimate of the performance of the model when making predictions on new data.

The speed of model evaluation is proportional to the amount of data you want to use for the evaluation, although it is much faster than training as the model is not changed.

From an API perspective, this involves calling a function with the holdout dataset and getting a loss and perhaps other metrics that can be reported.

... # evaluate the model loss = model.evaluate(X, y, verbose=0)

Making a prediction is the final step in the life-cycle. It is why we wanted the model in the first place.

It requires you have new data for which a prediction is required, e.g. where you do not have the target values.

From an API perspective, you simply call a function to make a prediction of a class label, probability, or numerical value: whatever you designed your model to predict.

You may want to save the model and later load it to make predictions. You may also choose to fit a model on all of the available data before you start using it.

Now that we are familiar with the model life-cycle, let’s take a look at the two main ways to use the tf.keras API to build models: sequential and functional.

... # make a prediction yhat = model.predict(X)

The sequential model API is the simplest and is the API that I recommend, especially when getting started.

It is referred to as “*sequential*” because it involves defining a Sequential class and adding layers to the model one by one in a linear manner, from input to output.

The example below defines a Sequential MLP model that accepts eight inputs, has one hidden layer with 10 nodes and then an output layer with one node to predict a numerical value.

# example of a model defined with the sequential api from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense # define the model model = Sequential() model.add(Dense(10, input_shape=(8,))) model.add(Dense(1))

Note that the visible layer of the network is defined by the “*input_shape*” argument on the first hidden layer. That means in the above example, the model expects the input for one sample to be a vector of eight numbers.

The sequential API is easy to use because you keep calling *model.add()* until you have added all of your layers.

For example, here is a deep MLP with five hidden layers.

# example of a model defined with the sequential api from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense # define the model model = Sequential() model.add(Dense(100, input_shape=(8,))) model.add(Dense(80)) model.add(Dense(30)) model.add(Dense(10)) model.add(Dense(5)) model.add(Dense(1))

The functional API is more complex but is also more flexible.

It involves explicitly connecting the output of one layer to the input of another layer. Each connection is specified.

First, an input layer must be defined via the *Input* class, and the shape of an input sample is specified. We must retain a reference to the input layer when defining the model.

... # define the layers x_in = Input(shape=(8,))

Next, a fully connected layer can be connected to the input by calling the layer and passing the input layer. This will return a reference to the output connection in this new layer.

... x = Dense(10)(x_in)

We can then connect this to an output layer in the same manner.

... x_out = Dense(1)(x)

Once connected, we define a Model object and specify the input and output layers. The complete example is listed below.

# example of a model defined with the functional api from tensorflow.keras import Model from tensorflow.keras import Input from tensorflow.keras.layers import Dense # define the layers x_in = Input(shape=(8,)) x = Dense(10)(x_in) x_out = Dense(1)(x) # define the model model = Model(inputs=x_in, outputs=x_out)

As such, it allows for more complicated model designs, such as models that may have multiple input paths (separate vectors) and models that have multiple output paths (e.g. a word and a number).

The functional API can be a lot of fun when you get used to it.

For more on the functional API, see:

Now that we are familiar with the model life-cycle and the two APIs that can be used to define models, let’s look at developing some standard models.

In this section, you will discover how to develop, evaluate, and make predictions with standard deep learning models, including Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).

A Multilayer Perceptron model, or MLP for short, is a standard fully connected neural network model.

It is comprised of layers of nodes where each node is connected to all outputs from the previous layer and the output of each node is connected to all inputs for nodes in the next layer.

An MLP is created by with one or more *Dense* layers. This model is appropriate for tabular data, that is data as it looks in a table or spreadsheet with one column for each variable and one row for each variable. There are three predictive modeling problems you may want to explore with an MLP; they are binary classification, multiclass classification, and regression.

Let’s fit a model on a real dataset for each of these cases.

Note, the models in this section are effective, but not optimized. See if you can improve their performance. Post your findings in the comments below.

We will use the Ionosphere binary (two-class) classification dataset to demonstrate an MLP for binary classification.

This dataset involves predicting whether a structure is in the atmosphere or not given radar returns.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

We will use a LabelEncoder to encode the string labels to integer values 0 and 1. The model will be fit on 67 percent of the data, and the remaining 33 percent will be used for evaluation, split using the train_test_split() function.

It is a good practice to use ‘*relu*‘ activation with a ‘*he_normal*‘ weight initialization. This combination goes a long way to overcome the problem of vanishing gradients when training deep neural network models. For more on ReLU, see the tutorial:

The model predicts the probability of class 1 and uses the sigmoid activation function. The model is optimized using the adam version of stochastic gradient descent and seeks to minimize the cross-entropy loss.

The complete example is listed below.

# mlp for binary classification from pandas import read_csv from sklearn.model_selection import train_test_split from sklearn.preprocessing import LabelEncoder from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense # load the dataset path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv' df = read_csv(path, header=None) # split into input and output columns X, y = df.values[:, :-1], df.values[:, -1] # ensure all data are floating point values X = X.astype('float32') # encode strings to integer y = LabelEncoder().fit_transform(y) # split into train and test datasets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33) print(X_train.shape, X_test.shape, y_train.shape, y_test.shape) # determine the number of input features n_features = X_train.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(Dense(8, activation='relu', kernel_initializer='he_normal')) model.add(Dense(1, activation='sigmoid')) # compile the model model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # fit the model model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0) # evaluate the model loss, acc = model.evaluate(X_test, y_test, verbose=0) print('Test Accuracy: %.3f' % acc) # make a prediction row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300] yhat = model.predict([row]) print('Predicted: %.3f' % yhat)

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 94 percent and then predicted a probability of 0.9 that the one row of data belongs to class 1.

(235, 34) (116, 34) (235,) (116,) Test Accuracy: 0.940 Predicted: 0.991

We will use the Iris flowers multiclass classification dataset to demonstrate an MLP for multiclass classification.

This problem involves predicting the species of iris flower given measures of the flower.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

Given that it is a multiclass classification, the model must have one node for each class in the output layer and use the softmax activation function. The loss function is the ‘*sparse_categorical_crossentropy*‘, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.)

The complete example of fitting and evaluating an MLP on the iris flowers dataset is listed below.

# mlp for multiclass classification from numpy import argmax from pandas import read_csv from sklearn.model_selection import train_test_split from sklearn.preprocessing import LabelEncoder from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense # load the dataset path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv' df = read_csv(path, header=None) # split into input and output columns X, y = df.values[:, :-1], df.values[:, -1] # ensure all data are floating point values X = X.astype('float32') # encode strings to integer y = LabelEncoder().fit_transform(y) # split into train and test datasets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33) print(X_train.shape, X_test.shape, y_train.shape, y_test.shape) # determine the number of input features n_features = X_train.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(Dense(8, activation='relu', kernel_initializer='he_normal')) model.add(Dense(3, activation='softmax')) # compile the model model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # fit the model model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0) # evaluate the model loss, acc = model.evaluate(X_test, y_test, verbose=0) print('Test Accuracy: %.3f' % acc) # make a prediction row = [5.1,3.5,1.4,0.2] yhat = model.predict([row]) print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 98 percent and then predicted a probability of a row of data belonging to each class, although class 0 has the highest probability.

(100, 4) (50, 4) (100,) (50,) Test Accuracy: 0.980 Predicted: [[0.8680804 0.12356871 0.00835086]] (class=0)

We will use the Boston housing regression dataset to demonstrate an MLP for regression predictive modeling.

This problem involves predicting house value based on properties of the house and neighborhood.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

This is a regression problem that involves predicting a single numerical value. As such, the output layer has a single node and uses the default or linear activation function (no activation function). The mean squared error (mse) loss is minimized when fitting the model.

Recall that this is a regression, not classification; therefore, we cannot calculate classification accuracy. For more on this, see the tutorial:

The complete example of fitting and evaluating an MLP on the Boston housing dataset is listed below.

# mlp for regression from numpy import sqrt from pandas import read_csv from sklearn.model_selection import train_test_split from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense # load the dataset path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv' df = read_csv(path, header=None) # split into input and output columns X, y = df.values[:, :-1], df.values[:, -1] # split into train and test datasets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33) print(X_train.shape, X_test.shape, y_train.shape, y_test.shape) # determine the number of input features n_features = X_train.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(Dense(8, activation='relu', kernel_initializer='he_normal')) model.add(Dense(1)) # compile the model model.compile(optimizer='adam', loss='mse') # fit the model model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0) # evaluate the model error = model.evaluate(X_test, y_test, verbose=0) print('MSE: %.3f, RMSE: %.3f' % (error, sqrt(error))) # make a prediction row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98] yhat = model.predict([row]) print('Predicted: %.3f' % yhat)

Running the example first reports the shape of the dataset then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

In this case, we can see that the model achieved an MSE of about 60 which is an RMSE of about 7 (units are thousands of dollars). A value of about 26 is then predicted for the single example.

(339, 13) (167, 13) (339,) (167,) MSE: 60.751, RMSE: 7.794 Predicted: 26.983

Convolutional Neural Networks, or CNNs for short, are a type of network designed for image input.

They are comprised of models with convolutional layers that extract features (called feature maps) and pooling layers that distill features down to the most salient elements.

CNNs are most well-suited to image classification tasks, although they can be used on a wide array of tasks that take images as input.

A popular image classification task is the MNIST handwritten digit classification. It involves tens of thousands of handwritten digits that must be classified as a number between 0 and 9.

The tf.keras API provides a convenience function to download and load this dataset directly.

The example below loads the dataset and plots the first few images.

# example of loading and plotting the mnist dataset from tensorflow.keras.datasets.mnist import load_data from matplotlib import pyplot # load dataset (trainX, trainy), (testX, testy) = load_data() # summarize loaded dataset print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape)) print('Test: X=%s, y=%s' % (testX.shape, testy.shape)) # plot first few images for i in range(25): # define subplot pyplot.subplot(5, 5, i+1) # plot raw pixel data pyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray')) # show the figure pyplot.show()

Running the example loads the MNIST dataset, then summarizes the default train and test datasets.

Train: X=(60000, 28, 28), y=(60000,) Test: X=(10000, 28, 28), y=(10000,)

A plot is then created showing a grid of examples of handwritten images in the training dataset.

We can train a CNN model to classify the images in the MNIST dataset.

Note that the images are arrays of grayscale pixel data; therefore, we must add a channel dimension to the data before we can use the images as input to the model. The reason is that CNN models expect images in a channels-last format, that is each example to the network has the dimensions [rows, columns, channels], where channels represent the color channels of the image data.

It is also a good idea to scale the pixel values from the default range of 0-255 to 0-1 when training a CNN. For more on scaling pixel values, see the tutorial:

The complete example of fitting and evaluating a CNN model on the MNIST dataset is listed below.

# example of a cnn for image classification from numpy import unique from numpy import argmax from tensorflow.keras.datasets.mnist import load_data from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.layers import Conv2D from tensorflow.keras.layers import MaxPool2D from tensorflow.keras.layers import Flatten from tensorflow.keras.layers import Dropout # load dataset (x_train, y_train), (x_test, y_test) = load_data() # reshape data to have a single channel x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)) x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)) # determine the shape of the input images in_shape = x_train.shape[1:] # determine the number of classes n_classes = len(unique(y_train)) print(in_shape, n_classes) # normalize pixel values x_train = x_train.astype('float32') / 255.0 x_test = x_test.astype('float32') / 255.0 # define model model = Sequential() model.add(Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', input_shape=in_shape)) model.add(MaxPool2D((2, 2))) model.add(Flatten()) model.add(Dense(100, activation='relu', kernel_initializer='he_uniform')) model.add(Dropout(0.5)) model.add(Dense(n_classes, activation='softmax')) # define loss and optimizer model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # fit the model model.fit(x_train, y_train, epochs=10, batch_size=128, verbose=0) # evaluate the model loss, acc = model.evaluate(x_test, y_test, verbose=0) print('Accuracy: %.3f' % acc) # make a prediction image = x_train[0] yhat = model.predict([[image]]) print('Predicted: class=%d' % argmax(yhat))

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single image.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

First, the shape of each image is reported along with the number of classes; we can see that each image is 28×28 pixels and there are 10 classes as we expected.

In this case, we can see that the model achieved a classification accuracy of about 98 percent on the test dataset. We can then see that the model predicted class 5 for the first image in the training set.

(28, 28, 1) 10 Accuracy: 0.987 Predicted: class=5

Recurrent Neural Networks, or RNNs for short, are designed to operate upon sequences of data.

They have proven to be very effective for natural language processing problems where sequences of text are provided as input to the model. RNNs have also seen some modest success for time series forecasting and speech recognition.

The most popular type of RNN is the Long Short-Term Memory network, or LSTM for short. LSTMs can be used in a model to accept a sequence of input data and make a prediction, such as assign a class label or predict a numerical value like the next value or values in the sequence.

We will use the car sales dataset to demonstrate an LSTM RNN for univariate time series forecasting.

This problem involves predicting the number of car sales per month.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

We will frame the problem to take a window of the last five months of data to predict the current month’s data.

To achieve this, we will define a new function named *split_sequence()* that will split the input sequence into windows of data appropriate for fitting a supervised learning model, like an LSTM.

For example, if the sequence was:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Then the samples for training the model will look like:

Input Output 1, 2, 3, 4, 5 6 2, 3, 4, 5, 6 7 3, 4, 5, 6, 7 8 ...

We will use the last 12 months of data as the test dataset.

LSTMs expect each sample in the dataset to have two dimensions; the first is the number of time steps (in this case it is 5), and the second is the number of observations per time step (in this case it is 1).

Because it is a regression type problem, we will use a linear activation function (no activation

function) in the output layer and optimize the mean squared error loss function. We will also evaluate the model using the mean absolute error (MAE) metric.

The complete example of fitting and evaluating an LSTM for a univariate time series forecasting problem is listed below.

# lstm for time series forecasting from numpy import sqrt from numpy import asarray from pandas import read_csv from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.layers import LSTM # split a univariate sequence into samples def split_sequence(sequence, n_steps): X, y = list(), list() for i in range(len(sequence)): # find the end of this pattern end_ix = i + n_steps # check if we are beyond the sequence if end_ix > len(sequence)-1: break # gather input and output parts of the pattern seq_x, seq_y = sequence[i:end_ix], sequence[end_ix] X.append(seq_x) y.append(seq_y) return asarray(X), asarray(y) # load the dataset path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv' df = read_csv(path, header=0, index_col=0, squeeze=True) # retrieve the values values = df.values.astype('float32') # specify the window size n_steps = 5 # split into samples X, y = split_sequence(values, n_steps) # reshape into [samples, timesteps, features] X = X.reshape((X.shape[0], X.shape[1], 1)) # split into train/test n_test = 12 X_train, X_test, y_train, y_test = X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:] print(X_train.shape, X_test.shape, y_train.shape, y_test.shape) # define model model = Sequential() model.add(LSTM(100, activation='relu', kernel_initializer='he_normal', input_shape=(n_steps,1))) model.add(Dense(50, activation='relu', kernel_initializer='he_normal')) model.add(Dense(50, activation='relu', kernel_initializer='he_normal')) model.add(Dense(1)) # compile the model model.compile(optimizer='adam', loss='mse', metrics=['mae']) # fit the model model.fit(X_train, y_train, epochs=350, batch_size=32, verbose=2, validation_data=(X_test, y_test)) # evaluate the model mse, mae = model.evaluate(X_test, y_test, verbose=0) print('MSE: %.3f, RMSE: %.3f, MAE: %.3f' % (mse, sqrt(mse), mae)) # make a prediction row = asarray([18024.0, 16722.0, 14385.0, 21342.0, 17180.0]).reshape((1, n_steps, 1)) yhat = model.predict(row) print('Predicted: %.3f' % (yhat))

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single example.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

First, the shape of the train and test datasets is displayed, confirming that the last 12 examples are used for model evaluation.

In this case, the model achieved an MAE of about 2,800 and predicted the next value in the sequence from the test set as 13,199, where the expected value is 14,577 (pretty close).

(91, 5, 1) (12, 5, 1) (91,) (12,) MSE: 12755421.000, RMSE: 3571.473, MAE: 2856.084 Predicted: 13199.325

**Note**: it is good practice to scale and make the series stationary the data prior to fitting the model. I recommend this as an extension in order to achieve better performance. For more on preparing time series data for modeling, see the tutorial:

In this section, you will discover how to use some of the slightly more advanced model features, such as reviewing learning curves and saving models for later use.

The architecture of deep learning models can quickly become large and complex.

As such, it is important to have a clear idea of the connections and data flow in your model. This is especially important if you are using the functional API to ensure you have indeed connected the layers of the model in the way you intended.

There are two tools you can use to visualize your model: a text description and a plot.

A text description of your model can be displayed by calling the summary() function on your model.

The example below defines a small model with three layers and then summarizes the structure.

# example of summarizing a model from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(8,))) model.add(Dense(8, activation='relu', kernel_initializer='he_normal')) model.add(Dense(1, activation='sigmoid')) # summarize the model model.summary()

Running the example prints a summary of each layer, as well as a total summary.

This is an invaluable diagnostic for checking the output shapes and number of parameters (weights) in your model.

Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 10) 90 _________________________________________________________________ dense_1 (Dense) (None, 8) 88 _________________________________________________________________ dense_2 (Dense) (None, 1) 9 ================================================================= Total params: 187 Trainable params: 187 Non-trainable params: 0 _________________________________________________________________

You can create a plot of your model by calling the plot_model() function.

This will create an image file that contains a box and line diagram of the layers in your model.

The example below creates a small three-layer model and saves a plot of the model architecture to ‘*model.png*‘ that includes input and output shapes.

# example of plotting a model from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.utils import plot_model # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(8,))) model.add(Dense(8, activation='relu', kernel_initializer='he_normal')) model.add(Dense(1, activation='sigmoid')) # summarize the model plot_model(model, 'model.png', show_shapes=True)

Running the example creates a plot of the model showing a box for each layer with shape information, and arrows that connect the layers, showing the flow of data through the network.

Learning curves are a plot of neural network model performance over time, such as calculated at the end of each training epoch.

Plots of learning curves provide insight into the learning dynamics of the model, such as whether the model is learning well, whether it is underfitting the training dataset, or whether it is overfitting the training dataset.

For a gentle introduction to learning curves and how to use them to diagnose learning dynamics of models, see the tutorial:

You can easily create learning curves for your deep learning models.

First, you must update your call to the fit function to include reference to a validation dataset. This is a portion of the training set not used to fit the model, and is instead used to evaluate the performance of the model during training.

You can split the data manually and specify the *validation_data* argument, or you can use the *validation_split* argument and specify a percentage split of the training dataset and let the API perform the split for you. The latter is simpler for now.

The fit function will return a *history* object that contains a trace of performance metrics recorded at the end of each training epoch. This includes the chosen loss function and each configured metric, such as accuracy, and each loss and metric is calculated for the training and validation datasets.

A learning curve is a plot of the loss on the training dataset and the validation dataset. We can create this plot from the *history* object using the Matplotlib library.

The example below fits a small neural network on a synthetic binary classification problem. A validation split of 30 percent is used to evaluate the model during training and the cross-entropy loss on the train and validation datasets are then graphed using a line plot.

# example of plotting learning curves from sklearn.datasets import make_classification from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.optimizers import SGD from matplotlib import pyplot # create the dataset X, y = make_classification(n_samples=1000, n_classes=2, random_state=1) # determine the number of input features n_features = X.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(Dense(1, activation='sigmoid')) # compile the model sgd = SGD(learning_rate=0.001, momentum=0.8) model.compile(optimizer=sgd, loss='binary_crossentropy') # fit the model history = model.fit(X, y, epochs=100, batch_size=32, verbose=0, validation_split=0.3) # plot learning curves pyplot.title('Learning Curves') pyplot.xlabel('Epoch') pyplot.ylabel('Cross Entropy') pyplot.plot(history.history['loss'], label='train') pyplot.plot(history.history['val_loss'], label='val') pyplot.legend() pyplot.show()

Running the example fits the model on the dataset. At the end of the run, the *history* object is returned and used as the basis for creating the line plot.

The cross-entropy loss for the training dataset is accessed via the ‘*loss*‘ key and the loss on the validation dataset is accessed via the ‘*val_loss*‘ key on the history attribute of the history object.

Training and evaluating models is great, but we may want to use a model later without retraining it each time.

This can be achieved by saving the model to file and later loading it and using it to make predictions.

This can be achieved using the *save()* function on the model to save the model. It can be loaded later using the load_model() function.

The model is saved in H5 format, an efficient array storage format. As such, you must ensure that the h5py library is installed on your workstation. This can be achieved using *pip*; for example:

pip install h5py

The example below fits a simple model on a synthetic binary classification problem and then saves the model file.

# example of saving a fit model from sklearn.datasets import make_classification from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.optimizers import SGD # create the dataset X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1) # determine the number of input features n_features = X.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(Dense(1, activation='sigmoid')) # compile the model sgd = SGD(learning_rate=0.001, momentum=0.8) model.compile(optimizer=sgd, loss='binary_crossentropy') # fit the model model.fit(X, y, epochs=100, batch_size=32, verbose=0, validation_split=0.3) # save model to file model.save('model.h5')

Running the example fits the model and saves it to file with the name ‘*model.h5*‘.

We can then load the model and use it to make a prediction, or continue training it, or do whatever we wish with it.

The example below loads the model and uses it to make a prediction.

# example of loading a saved model from sklearn.datasets import make_classification from tensorflow.keras.models import load_model # create the dataset X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1) # load the model from file model = load_model('model.h5') # make a prediction row = [1.91518414, 1.14995454, -1.52847073, 0.79430654] yhat = model.predict([row]) print('Predicted: %.3f' % yhat[0])

Running the example loads the image from file, then uses it to make a prediction on a new row of data and prints the result.

Predicted: 0.831

In this section, you will discover some of the techniques that you can use to improve the performance of your deep learning models.

A big part of improving deep learning performance involves avoiding overfitting by slowing down the learning process or stopping the learning process at the right time.

Dropout is a clever regularization method that reduces overfitting of the training dataset and makes the model more robust.

This is achieved during training, where some number of layer outputs are randomly ignored or “*dropped out*.” This has the effect of making the layer look like – and be treated like – a layer with a different number of nodes and connectivity to the prior layer.

Dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.

For more on how dropout works, see this tutorial:

You can add dropout to your models as a new layer prior to the layer that you want to have input connections dropped-out.

This involves adding a layer called Dropout() that takes an argument that specifies the probability that each output from the previous to drop. E.g. 0.4 means 40 percent of inputs will be dropped each update to the model.

You can add Dropout layers in MLP, CNN, and RNN models, although there are also specialized versions of dropout for use with CNN and RNN models that you might also want to explore.

The example below fits a small neural network model on a synthetic binary classification problem.

A dropout layer with 50 percent dropout is inserted between the first hidden layer and the output layer.

# example of using dropout from sklearn.datasets import make_classification from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.layers import Dropout from matplotlib import pyplot # create the dataset X, y = make_classification(n_samples=1000, n_classes=2, random_state=1) # determine the number of input features n_features = X.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(Dropout(0.5)) model.add(Dense(1, activation='sigmoid')) # compile the model model.compile(optimizer='adam', loss='binary_crossentropy') # fit the model model.fit(X, y, epochs=100, batch_size=32, verbose=0)

The scale and distribution of inputs to a layer can greatly impact how easy or quickly that layer can be trained.

This is generally why it is a good idea to scale input data prior to modeling it with a neural network model.

Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.

For more on how batch normalization works, see this tutorial:

You can use batch normalization in your network by adding a batch normalization layer prior to the layer that you wish to have standardized inputs. You can use batch normalization with MLP, CNN, and RNN models.

This can be achieved by adding the BatchNormalization layer directly.

The example below defines a small MLP network for a binary classification prediction problem with a batch normalization layer between the first hidden layer and the output layer.

# example of using batch normalization from sklearn.datasets import make_classification from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.layers import BatchNormalization from matplotlib import pyplot # create the dataset X, y = make_classification(n_samples=1000, n_classes=2, random_state=1) # determine the number of input features n_features = X.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(BatchNormalization()) model.add(Dense(1, activation='sigmoid')) # compile the model model.compile(optimizer='adam', loss='binary_crossentropy') # fit the model model.fit(X, y, epochs=100, batch_size=32, verbose=0)

Also, tf.keras has a range of other normalization layers you might like to explore; see:

Neural networks are challenging to train.

Too little training and the model is underfit; too much training and the model overfits the training dataset. Both cases result in a model that is less effective than it could be.

One approach to solving this problem is to use early stopping. This involves monitoring the loss on the training dataset and a validation dataset (a subset of the training set not used to fit the model). As soon as loss for the validation set starts to show signs of overfitting, the training process can be stopped.

For more on early stopping, see the tutorial:

Early stopping can be used with your model by first ensuring that you have a validation dataset. You can define the validation dataset manually via the *validation_data* argument to the *fit()* function, or you can use the *validation_split* and specify the amount of the training dataset to hold back for validation.

You can then define an EarlyStopping and instruct it on which performance measure to monitor, such as ‘*val_loss*‘ for loss on the validation dataset, and the number of epochs to observed overfitting before taking action, e.g. 5.

This configured EarlyStopping callback can then be provided to the *fit()* function via the “*callbacks*” argument that takes a list of callbacks.

This allows you to set the number of epochs to a large number and be confident that training will end as soon as the model starts overfitting. You might also like to create a learning curve to discover more insights into the learning dynamics of the run and when training was halted.

The example below demonstrates a small neural network on a synthetic binary classification problem that uses early stopping to halt training as soon as the model starts overfitting (after about 50 epochs).

# example of using early stopping from sklearn.datasets import make_classification from tensorflow.keras import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.callbacks import EarlyStopping # create the dataset X, y = make_classification(n_samples=1000, n_classes=2, random_state=1) # determine the number of input features n_features = X.shape[1] # define model model = Sequential() model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,))) model.add(Dense(1, activation='sigmoid')) # compile the model model.compile(optimizer='adam', loss='binary_crossentropy') # configure early stopping es = EarlyStopping(monitor='val_loss', patience=5) # fit the model history = model.fit(X, y, epochs=200, batch_size=32, verbose=0, validation_split=0.3, callbacks=[es])

The tf.keras API provides a number of callbacks that you might like to explore; you can learn more here:

This section provides more resources on the topic if you are looking to go deeper.

- How to Control the Stability of Training Neural Networks With the Batch Size
- A Gentle Introduction to the Rectified Linear Unit (ReLU)
- Difference Between Classification and Regression in Machine Learning
- How to Manually Scale Image Pixel Data for Deep Learning
- 4 Common Machine Learning Data Transforms for Time Series Forecasting
- How to use Learning Curves to Diagnose Machine Learning Model Performance
- A Gentle Introduction to Dropout for Regularizing Deep Neural Networks
- A Gentle Introduction to Batch Normalization for Deep Neural Networks
- A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks

- Deep Learning, 2016.

- Install TensorFlow 2 Guide.
- TensorFlow Core: Keras
- Tensorflow Core: Keras Overview Guide
- The Keras functional API in TensorFlow
- Save and load models
- Normalization Layers Guide.

In this tutorial, you discovered a step-by-step guide to developing deep learning models in TensorFlow using the tf.keras API.

Specifically, you learned:

- The difference between Keras and tf.keras and how to install and confirm TensorFlow is working.
- The 5-step life-cycle of tf.keras models and how to use the sequential and functional APIs.
- How to develop MLP, CNN, and RNN models with tf.keras for regression, classification, and time series forecasting.
- How to use the advanced features of the tf.keras API to inspect and diagnose your model.
- How to improve the performance of your tf.keras model by reducing overfitting and accelerating training.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

The post TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras appeared first on Machine Learning Mastery.

]]>The post What is Deep Learning? appeared first on Machine Learning Mastery.

]]>If you are just starting out in the field of deep learning or you had some experience with neural networks some time ago, you may be confused. I know I was confused initially and so were many of my colleagues and friends who learned and used neural networks in the 1990s and early 2000s.

The leaders and experts in the field have ideas of what deep learning is and these specific and nuanced perspectives shed a lot of light on what deep learning is all about.

In this post, you will discover exactly what deep learning is by hearing from a range of experts and leaders in the field.

Discover how to develop deep learning models for a range of predictive modeling problems with just a few lines of code in my new book, with 18 step-by-step tutorials and 9 projects.

Let’s dive in.

Andrew Ng from Coursera and Chief Scientist at Baidu Research formally founded Google Brain that eventually resulted in the productization of deep learning technologies across a large number of Google services.

He has spoken and written a lot about what deep learning is and is a good place to start.

In early talks on deep learning, Andrew described deep learning in the context of traditional artificial neural networks. In the 2013 talk titled “Deep Learning, Self-Taught Learning and Unsupervised Feature Learning” he described the idea of deep learning as:

Using brain simulations, hope to:

– Make learning algorithms much better and easier to use.

– Make revolutionary advances in machine learning and AI.

I believe this is our best shot at progress towards real AI

Later his comments became more nuanced.

The core of deep learning according to Andrew is that we now have fast enough computers and enough data to actually train large neural networks. When discussing why now is the time that deep learning is taking off at ExtractConf 2015 in a talk titled “What data scientists should know about deep learning“, he commented:

very large neural networks we can now have and … huge amounts of data that we have access to

He also commented on the important point that it is all about scale. That as we construct larger neural networks and train them with more and more data, their performance continues to increase. This is generally different to other machine learning techniques that reach a plateau in performance.

for most flavors of the old generations of learning algorithms … performance will plateau. … deep learning … is the first class of algorithms … that is scalable. … performance just keeps getting better as you feed them more data

He provides a nice cartoon of this in his slides:

Finally, he is clear to point out that the benefits from deep learning that we are seeing in practice come from supervised learning. From the 2015 ExtractConf talk, he commented:

almost all the value today of deep learning is through supervised learning or learning from labeled data

Earlier at a talk to Stanford University titled “Deep Learning” in 2014 he made a similar comment:

one reason that deep learning has taken off like crazy is because it is fantastic at supervised learning

Andrew often mentions that we should and will see more benefits coming from the unsupervised side of the tracks as the field matures to deal with the **abundance of unlabeled data available**.

Jeff Dean is a Wizard and Google Senior Fellow in the Systems and Infrastructure Group at Google and has been involved and perhaps partially responsible for the scaling and adoption of deep learning within Google. Jeff was involved in the Google Brain project and the development of large-scale deep learning software DistBelief and later TensorFlow.

In a 2016 talk titled “Deep Learning for Building Intelligent Computer Systems” he made a comment in the similar vein, that deep learning is really all about large neural networks.

When you hear the term deep learning, just think of a large deep neural net. Deep refers to the number of layers typically and so this kind of the popular term that’s been adopted in the press. I think of them as deep neural networks generally.

He has given this talk a few times, and in a modified set of slides for the same talk, he highlights the **scalability of neural networks** indicating that results get better with more data and larger models, that in turn require more computation to train.

In addition to scalability, another often cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Yoshua Bengio is another leader in deep learning although began with a strong interest in the automatic feature learning that large neural networks are capable of achieving.

He describes deep learning in terms of the algorithms ability to discover and learn good representations using feature learning. In his 2012 paper titled “Deep Learning of Representations for Unsupervised and Transfer Learning” he commented:

Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features

An elaborated perspective of deep learning along these lines is provided in his 2009 technical report titled “Learning deep architectures for AI” where he emphasizes the importance the hierarchy in feature learning.

Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features. Automatically learning features at multiple levels of abstraction allow a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features.

In the soon to be published book titled “Deep Learning” co-authored with Ian Goodfellow and Aaron Courville, they define deep learning in terms of the depth of the architecture of the models.

The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.

This is an important book and will likely become the definitive resource for the field for some time. The book goes on to describe multilayer perceptrons as an algorithm used in the field of deep learning, giving the idea that deep learning has subsumed artificial neural networks.

The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP).

Peter Norvig is the Director of Research at Google and famous for his textbook on AI titled “Artificial Intelligence: A Modern Approach“.

In a 2016 talk he gave titled “Deep Learning and Understandability versus Software Engineering and Verification” he defined deep learning in a very similar way to Yoshua, focusing on the power of abstraction permitted by using a deeper network structure.

a kind of learning where the representation you form have several levels of abstraction, rather than a direct input to output

Why Not Just “

Geoffrey Hinton is a pioneer in the field of artificial neural networks and co-published the first paper on the backpropagation algorithm for training multilayer perceptron networks.

He may have started the introduction of the phrasing “*deep*” to describe the development of large artificial neural networks.

He co-authored a paper in 2006 titled “A Fast Learning Algorithm for Deep Belief Nets” in which they describe an approach to training “deep” (as in a many layered network) of restricted Boltzmann machines.

Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

This paper and the related paper Geoff co-authored titled “Deep Boltzmann Machines” on an undirected deep network were well received by the community (now cited many hundreds of times) because they were successful examples of greedy layer-wise training of networks, allowing many more layers in feedforward networks.

In a co-authored article in Science titled “Reducing the Dimensionality of Data with Neural Networks” they stuck with the same description of “deep” to describe their approach to developing networks with many more layers than was previously typical.

We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

In the same article, they make an interesting comment that meshes with Andrew Ng’s comment about the recent increase in compute power and access to large datasets that has unleashed the untapped capability of neural networks when used at larger scale.

It has been obvious since the 1980s that backpropagation through deep autoencoders would be very effective for nonlinear dimensionality reduction, provided that computers were fast enough, data sets were big enough, and the initial weights were close enough to a good solution. All three conditions are now satisfied.

In a talk to the Royal Society in 2016 titled “Deep Learning“, Geoff commented that Deep Belief Networks were the start of deep learning in 2006 and that the first successful application of this new wave of deep learning was to speech recognition in 2009 titled “Acoustic Modeling using Deep Belief Networks“, achieving state of the art results.

It was the results that made the speech recognition and the neural network communities take notice, the use “deep” as a differentiator on previous neural network techniques that probably resulted in the name change.

The descriptions of deep learning in the Royal Society talk are very backpropagation centric as you would expect. Interesting, he gives 4 reasons why backpropagation (read “deep learning”) did not take off last time around in the 1990s. The first two points match comments by Andrew Ng above about datasets being too small and computers being too slow.

Deep learning excels on problem domains where the inputs (and even output) are analog. Meaning, they are not a few quantities in a tabular format but instead are **images of pixel data, documents of text data or files of audio data**.

Yann LeCun is the director of Facebook Research and is the father of the network architecture that excels at object recognition in image data called the Convolutional Neural Network (CNN). This technique is seeing great success because like multilayer perceptron feedforward neural networks, the technique scales with data and model size and can be trained with backpropagation.

This biases his definition of deep learning as the development of very large CNNs, which have had great success on object recognition in photographs.

In a 2016 talk at Lawrence Livermore National Laboratory titled “Accelerating Understanding: Deep Learning, Intelligent Applications, and GPUs” he described deep learning generally as learning hierarchical representations and defines it as a scalable approach to building object recognition systems:

deep learning [is] … a pipeline of modules all of which are trainable. … deep because [has] multiple stages in the process of recognizing an object and all of those stages are part of the training”

Jurgen Schmidhuber is the father of another popular algorithm that like MLPs and CNNs also scales with model size and dataset size and can be trained with backpropagation, but is instead tailored to learning sequence data, called the Long Short-Term Memory Network (LSTM), a type of recurrent neural network.

We do see some confusion in the phrasing of the field as “deep learning”. In his 2014 paper titled “Deep Learning in Neural Networks: An Overview” he does comment on the problematic naming of the field and the differentiation of deep from shallow learning. He also interestingly describes depth in terms of the complexity of the problem rather than the model used to solve the problem.

At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts have not yet yielded a conclusive response to this question. […], let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning.

Demis Hassabis is the founder of DeepMind, later acquired by Google. DeepMind made the breakthrough of combining deep learning techniques with reinforcement learning to handle complex learning problems like game playing, famously demonstrated in playing Atari games and the game Go with Alpha Go.

In keeping with the naming, they called their new technique a Deep Q-Network, combining Deep Learning with Q-Learning. They also name the broader field of study “Deep Reinforcement Learning”.

In their 2015 nature paper titled “Human-level control through deep reinforcement learning” they comment on the important role of deep neural networks in their breakthrough and highlight the need for hierarchical abstraction.

To achieve this,we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks. Notably, recent advances in deep neural networks, in which several layers of nodes are used to build up progressively more abstract representations of the data, have made it possible for artificial neural networks to learn concepts such as object categories directly from raw sensory data.

Finally, in what may be considered a defining paper in the field, Yann LeCun, Yoshua Bengio and Geoffrey Hinton published a paper in Nature titled simply “Deep Learning“. In it, they open with a clean definition of deep learning highlighting the multi-layered approach.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.

Later the multi-layered approach is described in terms of **representation learning and abstraction**.

Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. […] The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.

This is a nice and generic a description, and could easily describe most artificial neural network algorithms. It is also a good note to end on.

In this post you discovered that deep learning is just very big neural networks on a lot more data, requiring bigger computers.

Although early approaches published by Hinton and collaborators focus on greedy layerwise training and unsupervised methods like autoencoders, modern state-of-the-art deep learning is focused on training deep (many layered) neural network models using the backpropagation algorithm. The most popular techniques are:

- Multilayer Perceptron Networks.
- Convolutional Neural Networks.
- Long Short-Term Memory Recurrent Neural Networks.

I hope this has cleared up what deep learning is and how leading definitions fit together under the one umbrella.

**If you have any questions about deep learning** or about this post, ask your questions in the comments below and I will do my best to answer them.

The post What is Deep Learning? appeared first on Machine Learning Mastery.

]]>The post Your First Deep Learning Project in Python with Keras Step-By-Step appeared first on Machine Learning Mastery.

]]>It wraps the efficient numerical computation libraries Theano and TensorFlow and allows you to define and train neural network models in just a few lines of code.

In this tutorial, you will discover how to create your first deep learning neural network model in Python using Keras.

Discover how to develop deep learning models for a range of predictive modeling problems with just a few lines of code in my new book, with 18 step-by-step tutorials and 9 projects.

*Let’s get started.*

**Update Feb/2017**: Updated prediction example so rounding works in Python 2 and 3.**Update Mar/2017**: Updated example for the latest versions of Keras and TensorFlow.**Update Mar/2018**: Added alternate link to download the dataset.**Update Jul/2019**: Expanded and added more useful resources.**Update Sep/2019**: Updated for Keras 2.2.5 API.**Update Oct/2019**: Updated for Keras 2.3.0 API and TensorFlow 2.0.0.

There is not a lot of code required, but we are going to step over it slowly so that you will know how to create your own models in the future.

*The steps you are going to cover in this tutorial are as follows:*

- Load Data.
- Define Keras Model.
- Compile Keras Model.
- Fit Keras Model.
- Evaluate Keras Model.
- Tie It All Together.
- Make Predictions

**This Keras tutorial has a few requirements:**

- You have Python 2 or 3 installed and configured.
- You have SciPy (including NumPy) installed and configured.
- You have Keras and a backend (Theano or TensorFlow) installed and configured.

If you need help with your environment, see the tutorial:

Create a new file called **keras_first_network.py** and type or copy-and-paste the code into the file as you go.

Take my free 2-week email course and discover MLPs, CNNs and LSTMs (with code).

Click to sign-up now and also get a free PDF Ebook version of the course.

The first step is to define the functions and classes we intend to use in this tutorial.

We will use the NumPy library to load our dataset and we will use two classes from the Keras library to define our model.

The imports required are listed below.

# first neural network with keras tutorial from numpy import loadtxt from keras.models import Sequential from keras.layers import Dense ...

We can now load our dataset.

In this Keras tutorial, we are going to use the Pima Indians onset of diabetes dataset. This is a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.

As such, it is a binary classification problem (onset of diabetes as 1 or not as 0). All of the input variables that describe each patient are numerical. This makes it easy to use directly with neural networks that expect numerical input and output values, and ideal for our first neural network in Keras.

The dataset us available from here:

Download the dataset and place it in your local working directory, the same location as your python file.

Save it with the filename:

pima-indians-diabetes.csv

Take a look inside the file, you should see rows of data like the following:

6,148,72,35,0,33.6,0.627,50,1 1,85,66,29,0,26.6,0.351,31,0 8,183,64,0,0,23.3,0.672,32,1 1,89,66,23,94,28.1,0.167,21,0 0,137,40,35,168,43.1,2.288,33,1 ...

We can now load the file as a matrix of numbers using the NumPy function loadtxt().

There are eight input variables and one output variable (the last column). We will be learning a model to map rows of input variables (X) to an output variable (y), which we often summarize as *y = f(X)*.

The variables can be summarized as follows:

Input Variables (X):

- Number of times pregnant
- Plasma glucose concentration a 2 hours in an oral glucose tolerance test
- Diastolic blood pressure (mm Hg)
- Triceps skin fold thickness (mm)
- 2-Hour serum insulin (mu U/ml)
- Body mass index (weight in kg/(height in m)^2)
- Diabetes pedigree function
- Age (years)

Output Variables (y):

- Class variable (0 or 1)

Once the CSV file is loaded into memory, we can split the columns of data into input and output variables.

The data will be stored in a 2D array where the first dimension is rows and the second dimension is columns, e.g. [rows, columns].

We can split the array into two arrays by selecting subsets of columns using the standard NumPy slice operator or “:” We can select the first 8 columns from index 0 to index 7 via the slice 0:8. We can then select the output column (the 9th variable) via index 8.

... # load the dataset dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',') # split into input (X) and output (y) variables X = dataset[:,0:8] y = dataset[:,8] ...

We are now ready to define our neural network model.

**Note**, the dataset has 9 columns and the range 0:8 will select columns from 0 to 7, stopping before index 8. If this is new to you, then you can learn more about array slicing and ranges in this post:

Models in Keras are defined as a sequence of layers.

We create a *Sequential model* and add layers one at a time until we are happy with our network architecture.

The first thing to get right is to ensure the input layer has the right number of input features. This can be specified when creating the first layer with the **input_dim** argument and setting it to 8 for the 8 input variables.

How do we know the number of layers and their types?

This is a very hard question. There are heuristics that we can use and often the best network structure is found through a process of trial and error experimentation (I explain more about this here). Generally, you need a network large enough to capture the structure of the problem.

In this example, we will use a fully-connected network structure with three layers.

Fully connected layers are defined using the Dense class. We can specify the number of neurons or nodes in the layer as the first argument, and specify the activation function using the **activation** argument.

We will use the rectified linear unit activation function referred to as ReLU on the first two layers and the Sigmoid function in the output layer.

It used to be the case that Sigmoid and Tanh activation functions were preferred for all layers. These days, better performance is achieved using the ReLU activation function. We use a sigmoid on the output layer to ensure our network output is between 0 and 1 and easy to map to either a probability of class 1 or snap to a hard classification of either class with a default threshold of 0.5.

We can piece it all together by adding each layer:

- The model expects rows of data with 8 variables (the
*input_dim=8*argument) - The first hidden layer has 12 nodes and uses the relu activation function.
- The second hidden layer has 8 nodes and uses the relu activation function.
- The output layer has one node and uses the sigmoid activation function.

... # define the keras model model = Sequential() model.add(Dense(12, input_dim=8, activation='relu')) model.add(Dense(8, activation='relu')) model.add(Dense(1, activation='sigmoid')) ...

**Note**, the most confusing thing here is that the shape of the input to the model is defined as an argument on the first hidden layer. This means that the line of code that adds the first Dense layer is doing 2 things, defining the input or visible layer and the first hidden layer.

Now that the model is defined, *we can compile it*.

Compiling the model uses the efficient numerical libraries under the covers (the so-called backend) such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware, such as CPU or GPU or even distributed.

When compiling, we must specify some additional properties required when training the network. Remember training a network means finding the best set of weights to map inputs to outputs in our dataset.

We must specify the loss function to use to evaluate a set of weights, the optimizer is used to search through different weights for the network and any optional metrics we would like to collect and report during training.

In this case, we will use cross entropy as the **loss** argument. This loss is for a binary classification problems and is defined in Keras as “**binary_crossentropy**“. You can learn more about choosing loss functions based on your problem here:

We will define the **optimizer** as the efficient stochastic gradient descent algorithm “**adam**“. This is a popular version of gradient descent because it automatically tunes itself and gives good results in a wide range of problems. To learn more about the Adam version of stochastic gradient descent see the post:

Finally, because it is a classification problem, we will collect and report the classification accuracy, defined via the **metrics** argument.

... # compile the keras model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) ...

We have defined our model and compiled it ready for efficient computation.

Now it is time to execute the model on some data.

We can train or fit our model on our loaded data by calling the **fit()** function on the model.

Training occurs over epochs and each epoch is split into batches.

**Epoch**: One pass through all of the rows in the training dataset.**Batch**: One or more samples considered by the model within an epoch before weights are updated.

One epoch is comprised of one or more batches, based on the chosen batch size and the model is fit for many epochs. For more on the difference between epochs and batches, see the post:

The training process will run for a fixed number of iterations through the dataset called epochs, that we must specify using the **epochs** argument. We must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size and set using the **batch_size** argument.

For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10.

These configurations can be chosen experimentally by trial and error. We want to train the model enough so that it learns a good (or good enough) mapping of rows of input data to the output classification. The model will always have some error, but the amount of error will level out after some point for a given model configuration. This is called model convergence.

... # fit the keras model on the dataset model.fit(X, y, epochs=150, batch_size=10) ...

This is where the work happens on your CPU or GPU.

No GPU is required for this example, but if you’re interested in how to run large models on GPU hardware cheaply in the cloud, see this post:

We have trained our neural network on the entire dataset and we can evaluate the performance of the network on the same dataset.

This will only give us an idea of how well we have modeled the dataset (e.g. train accuracy), but no idea of how well the algorithm might perform on new data. We have done this for simplicity, but ideally, you could separate your data into train and test datasets for training and evaluation of your model.

You can evaluate your model on your training dataset using the **evaluate()** function on your model and pass it the same input and output used to train the model.

This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.

The **evaluate()** function will return a list with two values. The first will be the loss of the model on the dataset and the second will be the accuracy of the model on the dataset. We are only interested in reporting the accuracy, so we will ignore the loss value.

... # evaluate the keras model _, accuracy = model.evaluate(X, y) print('Accuracy: %.2f' % (accuracy*100))

You have just seen how you can easily create your first neural network model in Keras.

Let’s tie it all together into a complete code example.

# first neural network with keras tutorial from numpy import loadtxt from keras.models import Sequential from keras.layers import Dense # load the dataset dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',') # split into input (X) and output (y) variables X = dataset[:,0:8] y = dataset[:,8] # define the keras model model = Sequential() model.add(Dense(12, input_dim=8, activation='relu')) model.add(Dense(8, activation='relu')) model.add(Dense(1, activation='sigmoid')) # compile the keras model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # fit the keras model on the dataset model.fit(X, y, epochs=150, batch_size=10) # evaluate the keras model _, accuracy = model.evaluate(X, y) print('Accuracy: %.2f' % (accuracy*100))

You can copy all of the code into your Python file and save it as “**keras_first_network.py**” in the same directory as your data file “**pima-indians-diabetes.csv**“. You can then run the Python file as a script from your command line (command prompt) as follows:

python keras_first_network.py

Running this example, you should see a message for each of the 150 epochs printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.

It takes about 10 seconds to execute on my workstation running on the CPU.

Ideally, we would like the loss to go to zero and accuracy to go to 1.0 (e.g. 100%). This is not possible for any but the most trivial machine learning problems. Instead, we will always have some error in our model. The goal is to choose a model configuration and training configuration that achieve the lowest loss and highest accuracy possible for a given dataset.

... 768/768 [==============================] - 0s 63us/step - loss: 0.4817 - acc: 0.7708 Epoch 147/150 768/768 [==============================] - 0s 63us/step - loss: 0.4764 - acc: 0.7747 Epoch 148/150 768/768 [==============================] - 0s 63us/step - loss: 0.4737 - acc: 0.7682 Epoch 149/150 768/768 [==============================] - 0s 64us/step - loss: 0.4730 - acc: 0.7747 Epoch 150/150 768/768 [==============================] - 0s 63us/step - loss: 0.4754 - acc: 0.7799 768/768 [==============================] - 0s 38us/step Accuracy: 76.56

**Note,** if you try running this example in an IPython or Jupyter notebook you may get an error.

The reason is the output progress bars during training. You can easily turn these off by setting **verbose=0** in the call to the **fit()** and **evaluate()** functions, for example:

... # fit the keras model on the dataset without progress bars model.fit(X, y, epochs=150, batch_size=10, verbose=0) # evaluate the keras model _, accuracy = model.evaluate(X, y, verbose=0) ...

**Note**, the accuracy of your model will vary.

**What score did you get?**

Post your results in the comments below.

Neural networks are a stochastic algorithm, meaning that the same algorithm on the same data can train a different model with different skill each time the code is run. This is a feature, not a bug. You can learn more about this in the post:

The variance in the performance of the model means that to get a reasonable approximation of how well your model is performing, you may need to fit it many times and calculate the average of the accuracy scores. For more on this approach to evaluating neural networks, see the post:

For example, below are the accuracy scores from re-running the example 5 times:

Accuracy: 75.00 Accuracy: 77.73 Accuracy: 77.60 Accuracy: 78.12 Accuracy: 76.17

We can see that all accuracy scores are around 77% and the average is 76.924%.

The number one question I get asked is:

After I train my model, how can I use it to make predictions on new data?

Great question.

We can adapt the above example and use it to generate predictions on the training dataset, pretending it is a new dataset we have not seen before.

Making predictions is as easy as calling the **predict()** function on the model. We are using a sigmoid activation function on the output layer, so the predictions will be a probability in the range between 0 and 1. We can easily convert them into a crisp binary prediction for this classification task by rounding them.

For example:

... # make probability predictions with the model predictions = model.predict(X) # round predictions rounded = [round(x[0]) for x in predictions]

Alternately, we can call the **predict_classes()** function on the model to predict crisp classes directly, for example:

... # make class predictions with the model predictions = model.predict_classes(X)

The complete example below makes predictions for each example in the dataset, then prints the input data, predicted class and expected class for the first 5 examples in the dataset.

# first neural network with keras make predictions from numpy import loadtxt from keras.models import Sequential from keras.layers import Dense # load the dataset dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',') # split into input (X) and output (y) variables X = dataset[:,0:8] y = dataset[:,8] # define the keras model model = Sequential() model.add(Dense(12, input_dim=8, activation='relu')) model.add(Dense(8, activation='relu')) model.add(Dense(1, activation='sigmoid')) # compile the keras model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # fit the keras model on the dataset model.fit(X, y, epochs=150, batch_size=10, verbose=0) # make class predictions with the model predictions = model.predict_classes(X) # summarize the first 5 cases for i in range(5): print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))

Running the example does not show the progress bar as before as we have set the verbose argument to 0.

After the model is fit, predictions are made for all examples in the dataset, and the input rows and predicted class value for the first 5 examples is printed and compared to the expected class value.

We can see that most rows are correctly predicted. In fact, we would expect about 76.9% of the rows to be correctly predicted based on our estimated performance of the model in the previous section.

[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 0 (expected 1) [1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0) [8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1) [1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0) [0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)

If you would like to know more about how to make predictions with Keras models, see the post:

In this post, you discovered how to create your first neural network model using the powerful Keras Python library for deep learning.

Specifically, you learned the six key steps in using Keras to create a neural network or deep learning model, step-by-step including:

- How to load data.
- How to define a neural network in Keras.
- How to compile a Keras model using the efficient numerical backend.
- How to train a model on data.
- How to evaluate a model on data.
- How to make predictions with the model.

Do you have any questions about Keras or about this tutorial?

Ask your question in the comments and I will do my best to answer.

Well done, you have successfully developed your first neural network using the Keras deep learning library in Python.

This section provides some extensions to this tutorial that you might want to explore.

**Tune the Model.**Change the configuration of the model or training process and see if you can improve the performance of the model, e.g. achieve better than 76% accuracy.**Save the Model**. Update the tutorial to save the model to file, then load it later and use it to make predictions (see this tutorial).**Summarize the Model**. Update the tutorial to summarize the model and create a plot of model layers (see this tutorial).**Separate Train and Test Datasets**. Split the loaded dataset into a train and test set (split based on rows) and use one set to train the model and the other set to estimate the performance of the model on new data.**Plot Learning Curves**. The fit() function returns a history object that summarizes the loss and accuracy at the end of each epoch. Create line plots of this data, called learning curves (see this tutorial).**Learn a New Dataset**. Update the tutorial to use a different tabular dataset, perhaps from the UCI Machine Learning Repository.**Use Functional API**. Update the tutorial to use the Keras Functional API for defining the model (see this tutorial).

- 5 Step Life-Cycle for Neural Network Models in Keras
- Multi-Class Classification Tutorial with the Keras Deep Learning Library
- Regression Tutorial with the Keras Deep Learning Library in Python
- How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras

The post Your First Deep Learning Project in Python with Keras Step-By-Step appeared first on Machine Learning Mastery.

]]>The post How to Save and Load Your Keras Deep Learning Model appeared first on Machine Learning Mastery.

]]>Given that deep learning models can take hours, days and even weeks to train, it is important to know how to save and load them from disk.

In this post, you will discover how you can save your Keras models to file and load them up again to make predictions.

After reading this tutorial you will know:

- How to save model weights and model architecture in separate files.
- How to save model architecture in both YAML and JSON format.
- How to save model weights and architecture into a single file for later use.

Discover how to develop deep learning models for a range of predictive modeling problems with just a few lines of code in my new book, with 18 step-by-step tutorials and 9 projects.

Let’s get started.

**Update Mar 2017**: Added instructions to install h5py first.**Update Mar/2017**: Updated examples for changes to the Keras API.**Update Mar/2018**: Added alternate link to download the dataset.**Update May/2019**: Added section on saving and loading the model to a single file.**Update Sep/2019**: Added note about using PyYAML version 5.

If you are new to Keras or deep learning, see this step-by-step Keras tutorial.

Keras separates the concerns of saving your model architecture and saving your model weights.

Model weights are saved to HDF5 format. This is a grid format that is ideal for storing multi-dimensional arrays of numbers.

The model structure can be described and saved using two different formats: JSON and YAML.

In this post we are going to look at two examples of saving and loading your model to file:

- Save Model to JSON.
- Save Model to YAML.

Each example will also demonstrate saving and loading your model weights to HDF5 formatted files.

The examples will use the same simple network trained on the Pima Indians onset of diabetes binary classification dataset. This is a small dataset that contains all numerical data and is easy to work with. You can download this dataset and place it in your working directory with the filename “*pima-indians-diabetes.csv*” (update: download from here).

Confirm that you have the latest version of Keras installed (e.g. v2.2.4 as of May 2019).

**Note**: Saving models requires that you have the h5py library installed. You can install it easily as follows:

sudo pip install h5py

Take my free 2-week email course and discover MLPs, CNNs and LSTMs (with code).

Click to sign-up now and also get a free PDF Ebook version of the course.

JSON is a simple file format for describing data hierarchically.

Keras provides the ability to describe any model using JSON format with a *to_json()* function. This can be saved to file and later loaded via the *model_from_json()* function that will create a new model from the JSON specification.

The weights are saved directly from the model using the *save_weights()* function and later loaded using the symmetrical *load_weights()* function.

The example below trains and evaluates a simple model on the Pima Indians dataset. The model is then converted to JSON format and written to model.json in the local directory. The network weights are written to *model.h5* in the local directory.

The model and weight data is loaded from the saved files and a new model is created. It is important to compile the loaded model before it is used. This is so that predictions made using the model can use the appropriate efficient computation from the Keras backend.

The model is evaluated in the same way printing the same evaluation score.

# MLP for Pima Indians Dataset Serialize to JSON and HDF5 from keras.models import Sequential from keras.layers import Dense from keras.models import model_from_json import numpy import os # fix random seed for reproducibility numpy.random.seed(7) # load pima indians dataset dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") # split into input (X) and output (Y) variables X = dataset[:,0:8] Y = dataset[:,8] # create model model = Sequential() model.add(Dense(12, input_dim=8, activation='relu')) model.add(Dense(8, activation='relu')) model.add(Dense(1, activation='sigmoid')) # Compile model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # Fit the model model.fit(X, Y, epochs=150, batch_size=10, verbose=0) # evaluate the model scores = model.evaluate(X, Y, verbose=0) print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100)) # serialize model to JSON model_json = model.to_json() with open("model.json", "w") as json_file: json_file.write(model_json) # serialize weights to HDF5 model.save_weights("model.h5") print("Saved model to disk") # later... # load json and create model json_file = open('model.json', 'r') loaded_model_json = json_file.read() json_file.close() loaded_model = model_from_json(loaded_model_json) # load weights into new model loaded_model.load_weights("model.h5") print("Loaded model from disk") # evaluate loaded model on test data loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy']) score = loaded_model.evaluate(X, Y, verbose=0) print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))

Running this example provides the output below.

acc: 78.78% Saved model to disk Loaded model from disk acc: 78.78%

The JSON format of the model looks like the following:

{ "class_name":"Sequential", "config":{ "name":"sequential_1", "layers":[ { "class_name":"Dense", "config":{ "name":"dense_1", "trainable":true, "batch_input_shape":[ null, 8 ], "dtype":"float32", "units":12, "activation":"relu", "use_bias":true, "kernel_initializer":{ "class_name":"VarianceScaling", "config":{ "scale":1.0, "mode":"fan_avg", "distribution":"uniform", "seed":null } }, "bias_initializer":{ "class_name":"Zeros", "config":{ } }, "kernel_regularizer":null, "bias_regularizer":null, "activity_regularizer":null, "kernel_constraint":null, "bias_constraint":null } }, { "class_name":"Dense", "config":{ "name":"dense_2", "trainable":true, "dtype":"float32", "units":8, "activation":"relu", "use_bias":true, "kernel_initializer":{ "class_name":"VarianceScaling", "config":{ "scale":1.0, "mode":"fan_avg", "distribution":"uniform", "seed":null } }, "bias_initializer":{ "class_name":"Zeros", "config":{ } }, "kernel_regularizer":null, "bias_regularizer":null, "activity_regularizer":null, "kernel_constraint":null, "bias_constraint":null } }, { "class_name":"Dense", "config":{ "name":"dense_3", "trainable":true, "dtype":"float32", "units":1, "activation":"sigmoid", "use_bias":true, "kernel_initializer":{ "class_name":"VarianceScaling", "config":{ "scale":1.0, "mode":"fan_avg", "distribution":"uniform", "seed":null } }, "bias_initializer":{ "class_name":"Zeros", "config":{ } }, "kernel_regularizer":null, "bias_regularizer":null, "activity_regularizer":null, "kernel_constraint":null, "bias_constraint":null } } ] }, "keras_version":"2.2.5", "backend":"tensorflow" }

This example is much the same as the above JSON example, except the YAML format is used for the model specification.

Note, this example assumes that you have PyYAML 5 installed, for example:

sudo pip install PyYAML

In this example, the model is described using YAML, saved to file *model.yaml* and later loaded into a new model via the *model_from_yaml()* function.

Weights are handled in the same way as above in HDF5 format as *model.h5*.

# MLP for Pima Indians Dataset serialize to YAML and HDF5 from keras.models import Sequential from keras.layers import Dense from keras.models import model_from_yaml import numpy import os # fix random seed for reproducibility seed = 7 numpy.random.seed(seed) # load pima indians dataset dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") # split into input (X) and output (Y) variables X = dataset[:,0:8] Y = dataset[:,8] # create model model = Sequential() model.add(Dense(12, input_dim=8, activation='relu')) model.add(Dense(8, activation='relu')) model.add(Dense(1, activation='sigmoid')) # Compile model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # Fit the model model.fit(X, Y, epochs=150, batch_size=10, verbose=0) # evaluate the model scores = model.evaluate(X, Y, verbose=0) print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100)) # serialize model to YAML model_yaml = model.to_yaml() with open("model.yaml", "w") as yaml_file: yaml_file.write(model_yaml) # serialize weights to HDF5 model.save_weights("model.h5") print("Saved model to disk") # later... # load YAML and create model yaml_file = open('model.yaml', 'r') loaded_model_yaml = yaml_file.read() yaml_file.close() loaded_model = model_from_yaml(loaded_model_yaml) # load weights into new model loaded_model.load_weights("model.h5") print("Loaded model from disk") # evaluate loaded model on test data loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy']) score = loaded_model.evaluate(X, Y, verbose=0) print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))

Running the example displays the following output:

acc: 78.78% Saved model to disk Loaded model from disk acc: 78.78%

The model described in YAML format looks like the following:

backend: tensorflow class_name: Sequential config: layers: - class_name: Dense config: activation: relu activity_regularizer: null batch_input_shape: !!python/tuple - null - 8 bias_constraint: null bias_initializer: class_name: Zeros config: {} bias_regularizer: null dtype: float32 kernel_constraint: null kernel_initializer: class_name: VarianceScaling config: distribution: uniform mode: fan_avg scale: 1.0 seed: null kernel_regularizer: null name: dense_1 trainable: true units: 12 use_bias: true - class_name: Dense config: activation: relu activity_regularizer: null bias_constraint: null bias_initializer: class_name: Zeros config: {} bias_regularizer: null dtype: float32 kernel_constraint: null kernel_initializer: class_name: VarianceScaling config: distribution: uniform mode: fan_avg scale: 1.0 seed: null kernel_regularizer: null name: dense_2 trainable: true units: 8 use_bias: true - class_name: Dense config: activation: sigmoid activity_regularizer: null bias_constraint: null bias_initializer: class_name: Zeros config: {} bias_regularizer: null dtype: float32 kernel_constraint: null kernel_initializer: class_name: VarianceScaling config: distribution: uniform mode: fan_avg scale: 1.0 seed: null kernel_regularizer: null name: dense_3 trainable: true units: 1 use_bias: true name: sequential_1 keras_version: 2.2.5

Keras also supports a simpler interface to save both the model weights and model architecture together into a single H5 file.

Saving the model in this way includes everything we need to know about the model, including:

- Model weights.
- Model architecture.
- Model compilation details (loss and metrics).
- Model optimizer state.

This means that we can load and use the model directly, without having to re-compile it as we did in the examples above.

**Note**: this is the preferred way for saving and loading your Keras model.

You can save your model by calling the *save()* function on the model and specifying the filename.

The example below demonstrates this by first fitting a model, evaluating it and saving it to the file *model.h5*.

# MLP for Pima Indians Dataset saved to single file from numpy import loadtxt from keras.models import Sequential from keras.layers import Dense # load pima indians dataset dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",") # split into input (X) and output (Y) variables X = dataset[:,0:8] Y = dataset[:,8] # define model model = Sequential() model.add(Dense(12, input_dim=8, activation='relu')) model.add(Dense(8, activation='relu')) model.add(Dense(1, activation='sigmoid')) # compile model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # Fit the model model.fit(X, Y, epochs=150, batch_size=10, verbose=0) # evaluate the model scores = model.evaluate(X, Y, verbose=0) print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100)) # save model and architecture to single file model.save("model.h5") print("Saved model to disk")

Running the example fits the model, summarizes the models performance on the training dataset and saves the model to file.

acc: 77.73% Saved model to disk

We can later load this model from file and use it.

Your saved model can then be loaded later by calling the *load_model()* function and passing the filename. The function returns the model with the same architecture and weights.

In this case, we load the model, summarize the architecture and evaluate it on the same dataset to confirm the weights and architecture are the same.

# load and evaluate a saved model from numpy import loadtxt from keras.models import load_model # load model model = load_model('model.h5') # summarize model. model.summary() # load dataset dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",") # split into input (X) and output (Y) variables X = dataset[:,0:8] Y = dataset[:,8] # evaluate the model score = model.evaluate(X, Y, verbose=0) print("%s: %.2f%%" % (model.metrics_names[1], score[1]*100))

Running the example first loads the model, prints a summary of the model architecture then evaluates the loaded model on the same dataset.

The model achieves the same accuracy score which in this case is 77%.

_________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_1 (Dense) (None, 12) 108 _________________________________________________________________ dense_2 (Dense) (None, 8) 104 _________________________________________________________________ dense_3 (Dense) (None, 1) 9 ================================================================= Total params: 221 Trainable params: 221 Non-trainable params: 0 _________________________________________________________________ acc: 77.73%

- How can I save a Keras model? in the Keras documentation.
- About Keras models in the Keras documentation.

In this post, you discovered how to serialize your Keras deep learning models.

You learned how you can save your trained models to files and later load them up and use them to make predictions.

You also learned that model weights are easily stored using HDF5 format and that the network structure can be saved in either JSON or YAML format.

Do you have any questions about saving your deep learning models or about this post?

Ask your questions in the comments and I will do my best to answer them.

The post How to Save and Load Your Keras Deep Learning Model appeared first on Machine Learning Mastery.

]]>The post How to Calculate Precision, Recall, F1, and More for Deep Learning Models appeared first on Machine Learning Mastery.

]]>This is critical, as the reported performance allows you to both choose between candidate models and to communicate to stakeholders about how good the model is at solving the problem.

The Keras deep learning API model is very limited in terms of the metrics that you can use to report the model performance.

I am frequently asked questions, such as:

How can I calculate the precision and recall for my model?

And:

How can I calculate the F1-score or confusion matrix for my model?

In this tutorial, you will discover how to calculate metrics to evaluate your deep learning neural network model with a step-by-step example.

After completing this tutorial, you will know:

- How to use the scikit-learn metrics API to evaluate a deep learning model.
- How to make both class and probability predictions with a final model required by the scikit-learn API.
- How to calculate precision, recall, F1-score, ROC AUC, and more with the scikit-learn API for a model.

Let’s get started.

**Update Jan/2020**: Updated API for Keras 2.3 and TensorFlow 2.0.

This tutorial is divided into three parts; they are:

- Binary Classification Problem
- Multilayer Perceptron Model
- How to Calculate Model Metrics

We will use a standard binary classification problem as the basis for this tutorial, called the “*two circles*” problem.

It is called the two circles problem because the problem is comprised of points that when plotted, show two concentric circles, one for each class. As such, this is an example of a binary classification problem. The problem has two inputs that can be interpreted as x and y coordinates on a graph. Each point belongs to either the inner or outer circle.

The make_circles() function in the scikit-learn library allows you to generate samples from the two circles problem. The “*n_samples*” argument allows you to specify the number of samples to generate, divided evenly between the two classes. The “*noise*” argument allows you to specify how much random statistical noise is added to the inputs or coordinates of each point, making the classification task more challenging. The “*random_state*” argument specifies the seed for the pseudorandom number generator, ensuring that the same samples are generated each time the code is run.

The example below generates 1,000 samples, with 0.1 statistical noise and a seed of 1.

# generate 2d classification dataset X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)

Once generated, we can create a plot of the dataset to get an idea of how challenging the classification task is.

The example below generates samples and plots them, coloring each point according to the class, where points belonging to class 0 (outer circle) are colored blue and points that belong to class 1 (inner circle) are colored orange.

# Example of generating samples from the two circle problem from sklearn.datasets import make_circles from matplotlib import pyplot from numpy import where # generate 2d classification dataset X, y = make_circles(n_samples=1000, noise=0.1, random_state=1) # scatter plot, dots colored by class value for i in range(2): samples_ix = where(y == i) pyplot.scatter(X[samples_ix, 0], X[samples_ix, 1]) pyplot.show()

Running the example generates the dataset and plots the points on a graph, clearly showing two concentric circles for points belonging to class 0 and class 1.

We will develop a Multilayer Perceptron, or MLP, model to address the binary classification problem.

This model is not optimized for the problem, but it is skillful (better than random).

After the samples for the dataset are generated, we will split them into two equal parts: one for training the model and one for evaluating the trained model.

# split into train and test n_test = 500 trainX, testX = X[:n_test, :], X[n_test:, :] trainy, testy = y[:n_test], y[n_test:]

Next, we can define our MLP model. The model is simple, expecting 2 input variables from the dataset, a single hidden layer with 100 nodes, and a ReLU activation function, then an output layer with a single node and a sigmoid activation function.

The model will predict a value between 0 and 1 that will be interpreted as to whether the input example belongs to class 0 or class 1.

# define model model = Sequential() model.add(Dense(100, input_dim=2, activation='relu')) model.add(Dense(1, activation='sigmoid'))

The model will be fit using the binary cross entropy loss function and we will use the efficient Adam version of stochastic gradient descent. The model will also monitor the classification accuracy metric.

# compile model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

We will fit the model for 300 training epochs with the default batch size of 32 samples and evaluate the performance of the model at the end of each training epoch on the test dataset.

# fit model history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=300, verbose=0)

At the end of training, we will evaluate the final model once more on the train and test datasets and report the classification accuracy.

# evaluate the model _, train_acc = model.evaluate(trainX, trainy, verbose=0) _, test_acc = model.evaluate(testX, testy, verbose=0)

Finally, the performance of the model on the train and test sets recorded during training will be graphed using a line plot, one for each of the loss and the classification accuracy.

# plot loss during training pyplot.subplot(211) pyplot.title('Loss') pyplot.plot(history.history['loss'], label='train') pyplot.plot(history.history['val_loss'], label='test') pyplot.legend() # plot accuracy during training pyplot.subplot(212) pyplot.title('Accuracy') pyplot.plot(history.history['accuracy'], label='train') pyplot.plot(history.history['val_accuracy'], label='test') pyplot.legend() pyplot.show()

Tying all of these elements together, the complete code listing of training and evaluating an MLP on the two circles problem is listed below.

# multilayer perceptron model for the two circles problem from sklearn.datasets import make_circles from keras.models import Sequential from keras.layers import Dense from matplotlib import pyplot # generate dataset X, y = make_circles(n_samples=1000, noise=0.1, random_state=1) # split into train and test n_test = 500 trainX, testX = X[:n_test, :], X[n_test:, :] trainy, testy = y[:n_test], y[n_test:] # define model model = Sequential() model.add(Dense(100, input_dim=2, activation='relu')) model.add(Dense(1, activation='sigmoid')) # compile model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # fit model history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=300, verbose=0) # evaluate the model _, train_acc = model.evaluate(trainX, trainy, verbose=0) _, test_acc = model.evaluate(testX, testy, verbose=0) print('Train: %.3f, Test: %.3f' % (train_acc, test_acc)) # plot loss during training pyplot.subplot(211) pyplot.title('Loss') pyplot.plot(history.history['loss'], label='train') pyplot.plot(history.history['val_loss'], label='test') pyplot.legend() # plot accuracy during training pyplot.subplot(212) pyplot.title('Accuracy') pyplot.plot(history.history['accuracy'], label='train') pyplot.plot(history.history['val_accuracy'], label='test') pyplot.legend() pyplot.show()

Running the example fits the model very quickly on the CPU (no GPU is required).

The model is evaluated, reporting the classification accuracy on the train and test sets of about 83% and 85% respectively.

Note, your specific results may vary given the stochastic nature of the training algorithm.

Train: 0.838, Test: 0.850

A figure is created showing two line plots: one for the learning curves of the loss on the train and test sets and one for the classification on the train and test sets.

The plots suggest that the model has a good fit on the problem.

Perhaps you need to evaluate your deep learning neural network model using additional metrics that are not supported by the Keras metrics API.

The Keras metrics API is limited and you may want to calculate metrics such as precision, recall, F1, and more.

One approach to calculating new metrics is to implement them yourself in the Keras API and have Keras calculate them for you during model training and during model evaluation.

For help with this approach, see the tutorial:

This can be technically challenging.

A much simpler alternative is to use your final model to make a prediction for the test dataset, then calculate any metric you wish using the scikit-learn metrics API.

Three metrics, in addition to classification accuracy, that are commonly required for a neural network model on a binary classification problem are:

- Precision
- Recall
- F1 Score

In this section, we will calculate these three metrics, as well as classification accuracy using the scikit-learn metrics API, and we will also calculate three additional metrics that are less common but may be useful. They are:

- Cohen’s Kappa
- ROC AUC
- Confusion Matrix.

This is not a complete list of metrics for classification models supported by scikit-learn; nevertheless, calculating these metrics will show you how to calculate any metrics you may require using the scikit-learn API.

For a full list of supported metrics, see:

The example in this section will calculate metrics for an MLP model, but the same code for calculating metrics can be used for other models, such as RNNs and CNNs.

We can use the same code from the previous sections for preparing the dataset, as well as defining and fitting the model. To make the example simpler, we will put the code for these steps into simple function.

First, we can define a function called *get_data()* that will generate the dataset and split it into train and test sets.

# generate and prepare the dataset def get_data(): # generate dataset X, y = make_circles(n_samples=1000, noise=0.1, random_state=1) # split into train and test n_test = 500 trainX, testX = X[:n_test, :], X[n_test:, :] trainy, testy = y[:n_test], y[n_test:] return trainX, trainy, testX, testy

Next, we will define a function called *get_model()* that will define the MLP model and fit it on the training dataset.

# define and fit the model def get_model(trainX, trainy): # define model model = Sequential() model.add(Dense(100, input_dim=2, activation='relu')) model.add(Dense(1, activation='sigmoid')) # compile model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # fit model model.fit(trainX, trainy, epochs=300, verbose=0) return model

We can then call the *get_data()* function to prepare the dataset and the *get_model()* function to fit and return the model.

# generate data trainX, trainy, testX, testy = get_data() # fit model model = get_model(trainX, trainy)

Now that we have a model fit on the training dataset, we can evaluate it using metrics from the scikit-learn metrics API.

First, we must use the model to make predictions. Most of the metric functions require a comparison between the true class values (e.g. *testy*) and the predicted class values (*yhat_classes*). We can predict the class values directly with our model using the *predict_classes()* function on the model.

Some metrics, like the ROC AUC, require a prediction of class probabilities (*yhat_probs*). These can be retrieved by calling the *predict()* function on the model.

For more help with making predictions using a Keras model, see the post:

We can make the class and probability predictions with the model.

# predict probabilities for test set yhat_probs = model.predict(testX, verbose=0) # predict crisp classes for test set yhat_classes = model.predict_classes(testX, verbose=0)

The predictions are returned in a two-dimensional array, with one row for each example in the test dataset and one column for the prediction.

The scikit-learn metrics API expects a 1D array of actual and predicted values for comparison, therefore, we must reduce the 2D prediction arrays to 1D arrays.

# reduce to 1d array yhat_probs = yhat_probs[:, 0] yhat_classes = yhat_classes[:, 0]

We are now ready to calculate metrics for our deep learning neural network model. We can start by calculating the classification accuracy, precision, recall, and F1 scores.

# accuracy: (tp + tn) / (p + n) accuracy = accuracy_score(testy, yhat_classes) print('Accuracy: %f' % accuracy) # precision tp / (tp + fp) precision = precision_score(testy, yhat_classes) print('Precision: %f' % precision) # recall: tp / (tp + fn) recall = recall_score(testy, yhat_classes) print('Recall: %f' % recall) # f1: 2 tp / (2 tp + fp + fn) f1 = f1_score(testy, yhat_classes) print('F1 score: %f' % f1)

Notice that calculating a metric is as simple as choosing the metric that interests us and calling the function passing in the true class values (*testy*) and the predicted class values (*yhat_classes*).

We can also calculate some additional metrics, such as the Cohen’s kappa, ROC AUC, and confusion matrix.

Notice that the ROC AUC requires the predicted class probabilities (*yhat_probs*) as an argument instead of the predicted classes (*yhat_classes*).

# kappa kappa = cohen_kappa_score(testy, yhat_classes) print('Cohens kappa: %f' % kappa) # ROC AUC auc = roc_auc_score(testy, yhat_probs) print('ROC AUC: %f' % auc) # confusion matrix matrix = confusion_matrix(testy, yhat_classes) print(matrix)

Now that we know how to calculate metrics for a deep learning neural network using the scikit-learn API, we can tie all of these elements together into a complete example, listed below.

# demonstration of calculating metrics for a neural network model using sklearn from sklearn.datasets import make_circles from sklearn.metrics import accuracy_score from sklearn.metrics import precision_score from sklearn.metrics import recall_score from sklearn.metrics import f1_score from sklearn.metrics import cohen_kappa_score from sklearn.metrics import roc_auc_score from sklearn.metrics import confusion_matrix from keras.models import Sequential from keras.layers import Dense # generate and prepare the dataset def get_data(): # generate dataset X, y = make_circles(n_samples=1000, noise=0.1, random_state=1) # split into train and test n_test = 500 trainX, testX = X[:n_test, :], X[n_test:, :] trainy, testy = y[:n_test], y[n_test:] return trainX, trainy, testX, testy # define and fit the model def get_model(trainX, trainy): # define model model = Sequential() model.add(Dense(100, input_dim=2, activation='relu')) model.add(Dense(1, activation='sigmoid')) # compile model model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # fit model model.fit(trainX, trainy, epochs=300, verbose=0) return model # generate data trainX, trainy, testX, testy = get_data() # fit model model = get_model(trainX, trainy) # predict probabilities for test set yhat_probs = model.predict(testX, verbose=0) # predict crisp classes for test set yhat_classes = model.predict_classes(testX, verbose=0) # reduce to 1d array yhat_probs = yhat_probs[:, 0] yhat_classes = yhat_classes[:, 0] # accuracy: (tp + tn) / (p + n) accuracy = accuracy_score(testy, yhat_classes) print('Accuracy: %f' % accuracy) # precision tp / (tp + fp) precision = precision_score(testy, yhat_classes) print('Precision: %f' % precision) # recall: tp / (tp + fn) recall = recall_score(testy, yhat_classes) print('Recall: %f' % recall) # f1: 2 tp / (2 tp + fp + fn) f1 = f1_score(testy, yhat_classes) print('F1 score: %f' % f1) # kappa kappa = cohen_kappa_score(testy, yhat_classes) print('Cohens kappa: %f' % kappa) # ROC AUC auc = roc_auc_score(testy, yhat_probs) print('ROC AUC: %f' % auc) # confusion matrix matrix = confusion_matrix(testy, yhat_classes) print(matrix)

Running the example prepares the dataset, fits the model, then calculates and reports the metrics for the model evaluated on the test dataset.

Your specific results may vary given the stochastic nature of the training algorithm.

Accuracy: 0.842000 Precision: 0.836576 Recall: 0.853175 F1 score: 0.844794 Cohens kappa: 0.683929 ROC AUC: 0.923739 [[206 42] [ 37 215]]

If you need help interpreting a given metric, perhaps start with the “Classification Metrics Guide” in the scikit-learn API documentation: Classification Metrics Guide

Also, checkout the Wikipedia page for your metric; for example: Precision and recall, Wikipedia.

This section provides more resources on the topic if you are looking to go deeper.

- How to Use Metrics for Deep Learning With Keras in Python
- How to Generate Test Datasets in Python With scikit-learn
- How to Make Predictions With Keras

- sklearn.metrics: Metrics API
- Classification Metrics Guide
- Keras Metrics API
- sklearn.datasets.make_circles API

- Evaluation of binary classifiers, Wikipedia.
- Confusion Matrix, Wikipedia.
- Precision and recall, Wikipedia.

In this tutorial, you discovered how to calculate metrics to evaluate your deep learning neural network model with a step-by-step example.

Specifically, you learned:

- How to use the scikit-learn metrics API to evaluate a deep learning model.
- How to make both class and probability predictions with a final model required by the scikit-learn API.
- How to calculate precision, recall, F1-score, ROC, AUC, and more with the scikit-learn API for a model.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Calculate Precision, Recall, F1, and More for Deep Learning Models appeared first on Machine Learning Mastery.

]]>The post 3 Levels of Deep Learning Competence appeared first on Machine Learning Mastery.

]]>This means that there is a ton of demand by businesses for effective deep learning practitioners.

The problem is, how can the average business differentiate between good and bad practitioners?

As a deep learning practitioner, how can you best demonstrate that you can deliver skillful deep learning models?

In this post, you will discover the three levels of deep learning competence, and as a practitioner, what you must demonstrate at each level.

After reading this post, you will know:

- The problem of evaluating deep learning competence can best be addressed through project portfolios.
- A hierarchy of three competency levels can be used to sort practitioners and provide a framework for identifying the expected skills.
- The most common mistake that beginners make is starting at level 3, meaning they are trying to learn all levels at once, leading to confusion and frustration.

Let’s get started.

This post is divided into three parts; they are:

- The Problem of Assessing Competence
- Develop a Deep Learning Portfolio
- Levels of Deep Learning Competence

How do you know if a practitioner is competent with deep learning?

It is a hard problem.

- An academic may be able to describe techniques well mathematically and provide a list of papers.
- A developer may also be able to describe techniques well using intuitive explanations and a list of APIs.

Both are good signs of understanding.

But, on real business projects, we don’t need explanations.

We need working models that make skillful predictions. We need results.

Results trump almost everything.

Certainly results trump classical signals of competence such as education pedigree, work history, and experience level.

Most developers and developer hiring managers know this already.

Some don’t.

The best way to answer the question of whether a practitioner is competent or not is to show, not tell.

The practitioner must provide evidence to demonstrate that they understand how to apply deep learning techniques and use them to develop skillful models.

This means developing a public portfolio of projects using open source libraries and publicly available datasets.

This has benefits for a number of reasons:

- Skillful models can be presented.
- Code can be reviewed.
- Design decisions can be defended.

An honest discussion of real projects will quickly shed light on whether the practitioner knows what they are doing, or not.

- Employers must require and then focus on a portfolio of completed work in order to assess competence.
- Deep learning practitioners must develop and maintain a portfolio of completed projects in order to demonstrate competence.

As a practitioner, the question becomes: what are the levels of competence and what are the expectations at each level?

Projects can be carefully chosen in order to both develop and, in turn, demonstrate specific skills.

In this section, we’ll outline the levels of deep learning competence and the types of projects that you, as a practitioner, can develop and implement in order to learn, acquire and demonstrate each level of competence.

There are three levels of deep learning competence; they are:

**Level 1**: Modeling**Level 2**: Tuning**Level 3**: Applications

This may not be complete, but provides a good starting point for practitioners in a business setting.

Some notes about this hierarchy:

- The levels assume you are already a machine learning practitioner, probably not starting from scratch.
- Not all businesses require or could make best use of a Level 3 practitioner.
- Many practitioners dive in at Level 3, and try to figure out levels 1 and 2 on the fly.
- Level 2 is often neglected, but I think is critical in order to demonstrate a deeper understanding.

Other levels might capture topics not discussed such as coding algorithms from scratch, handling big data or streaming data, GPU programming, developing novel methods, etc.

If you have more ideas of levels of competence or projects, please let me know in the comments below.

Now, let’s take a closer look at each level in turn.

This level of competence with deep learning assumes that you are already a machine learning practitioner.

It is the lowest level and shows that you can use the tools and methods effectively on a classical machine learning type project.

It does not mean that you are expected to have advanced degrees or that you are a master practitioner. Instead, it means that you are familiar with the basics of applied machine learning and the process of working through a predictive modeling project end-to-end.

This is not a strict prerequisite as these elements can quickly be learned at this level if needed.

Competence at this level demonstrates the following:

**Library Competence**: That you know how to use an open source deep learning library to develop a model.**Modeling Competence**: That you know how to apply the applied machine learning process with neural network models.

Library competence means that you know how to set up the development environment and use the most common aspects of the API in order to define, fit, and use neural network models to make predictions.

It also means that you know the basic differences between each type of neural network model and when it might be appropriate to use it.

It does not mean that you know every function call and every parameter. It also does not mean that you know the mathematical description for specific techniques.

Modeling competence means that you know how to work through a machine learning project end-to-end using neural network models.

Specifically, it means that you are able to complete tasks such as:

- Defining the supervised learning problem and gathering the relevant data.
- Preparing data, including feature selection, imputing missing values, scaling, and other transforms.
- Evaluating a suite of models and model configurations using an objective test harness.
- Selecting and preparing a final model and using it to make predictions on new data.

It means that you can effectively harness neural networks on new projects in developing and using a skillful model.

It does not mean that you’re an expert at using all or even some neural network techniques or that you can achieve the best possible result. It also does not mean that you are familiar with all data types.

Projects to demonstrate this level of competence should use an open source deep learning library (such as Keras) and show each step of the applied machine learning process on public tabular machine learning datasets.

This does not mean achieving the best possible result for a dataset or even that using a neural network is the best possible model for the dataset. Instead, the goal is to demonstrate the capability of using neural networks, most likely simpler model types such as Multilayer Perceptrons.

A good source of datasets are the small in-memory datasets used widely in the 1990s and 2000s to demonstrate machine learning and even neural network performance, such as those listed on the UCI Machine Learning Repository.

The fact that the datasets are small and easily fit in memory means that the scope of the projects is also small, permits the use of robust model evaluation schemes such as k-fold cross-validation, and may require careful model design in order to avoid overfitting.

I would expect a range of projects handling the normal concerns of standard predictive modeling projects, such as:

Varied input data in order to demonstrate data preparation suitable for neural networks:

- Input variables with the same scale.
- Input variables with differing scales.
- Mixture of numerical and categorical variables.
- Variables with missing values.
- Data with redundant input features.

Varied target variables in order to demonstrate a suitable model configuration:

- Binary classification tasks.
- Multiclass classification tasks.
- Regression tasks.

This level of competence assumes Level 1 competence and shows that you can use classical as well as modern techniques in order to get the most from deep learning neural network models.

It demonstrates the following:

**Learning Competence**. That you can improve the training process of neural network models.**Generalization Competence**. That you can reduce overfitting of the training data and reduce the generalization error on out of sample data.**Prediction Competence**. That you can reduce the variance in the prediction of final models and lift model skill.

Learning competence means that you know how to configure and tune the hyperparameters of the learning algorithm in order to get good or better performance.

This means skill in tuning hyperparameters of stochastic gradient descent, such as:

- Batch size.
- Learning rate.
- Learning rate schedules
- Adaptive learning rates.

This means skill in tuning aspects that influence model capability, such as:

- Model choice.
- Choice of activation functions.
- Number of nodes.
- Number of layers.

This means skill in defusing problems with learning, such as:

- Vanishing gradients.
- Exploding gradients.

It also means skill with techniques to accelerate learning, such as:

- Batch normalization.
- Layer-wise training.
- Transfer learning.

Generalization competence means that you know how to configure and tune a model in order to reduce overfitting and improve model performance on out of sample data.

This includes classical techniques such as:

- Weight regularization.
- Adding noise.
- Early Stopping.

This also includes modern techniques such as:

- Weight constraints.
- Activity regularization.
- Dropout.

Predictive competence means that you know how to use techniques to reduce the variance of a chosen model when making predictions and combine models in order to lift performance.

This means use of ensemble techniques, such as:

- Model averaging.
- Stacking ensembles.
- Weight averaging.

Projects that demonstrate this level of competence may be less focused on all of the steps in the process of applied machine learning and instead may focus on a specific problem and technique or range of techniques designed to alleviate it.

For the three areas of competence, these could be:

- A problem of poor or slow learning by a model.
- A problem of overfitting a training dataset.
- A problem of high variance predictions.

Again, this does not mean achieving the best possible performance on a specific problem, only demonstrating the correct use of a technique and its ability to address the identified issue.

Choice of datasets and even problem type may matter less than the clear manifestation of the issue that is being investigated.

Some datasets naturally introduce problems; for example, small training datasets and imbalanced datasets can result in overfitting.

Standard machine learning datasets could be used. Alternately, problems can be contrived to demonstrate the issue or dataset generators could be used.

This level of competence assumes Level 1 and 2 competence and shows that you can use deep learning neural network techniques in specialized problem domains.

This is the demonstration of deep learning techniques beyond simple tabular datasets.

This is also the demonstration of deep learning on the types of problem domains and specific problem instances where the techniques may perform well or even be state-of-the-art.

It demonstrates the following:

**Data Handling Competence**. That you can load and prepare domain-specific data ready for modeling with neural networks.**Technique Competence**. That you can compare and select appropriate domain-specific neural network models.

Data handling competence means that you can acquire, load, use, and prepare the data for modeling.

This will very likely demonstrate competence with standard libraries for handling the data and standard techniques for preparing the data.

Some examples of domains and data handling might include:

**Time Series Forecasting**. Code for framing time series problems as supervised learning problems.**Computer Vision**. APIs for loading images and transforms to resize, perhaps standardize, pixels.**Natural Language Processing**. APIs for loading text data and transforms for encoding characters or words.

Technique competence means that you can correctly identify the techniques, models, and model architectures that are appropriate for a given domain-specific modeling problem.

This will very likely require a familiarity with common methods used in the academic literature and/or industry for general classes of problems in the domain.

Some examples of domains and domain-specific methods might include:

**Time Series Forecasting**. Use of sequence prediction models such as convolutional and recurrent neural network models.**Computer Vision**. Use of deep convolutional neural network models and use of specific architectures.**Natural Language Processing**. Use of deep recurrent neural network models and use of specific architectures.

Projects that demonstrate this level of competence must cover the applied machine learning process, include careful model tuning (aspects of competence Levels 1 and 2), and must focus on domain-specific datasets.

Datasets may be sourced from:

- Standard datasets used in academia to demonstrate methods.
- Datasets used on competitive machine learning websites.
- Original datasets defined and collected by you.

There may be a large range of problems that belong to a given problem domain, although there will be a subset that is perhaps more common or prominent and these can be the focus of demonstration projects.

Some example of domains and prominent sub-problems might include:

**Time Series Forecasting**. Univariate, multivariate, multi-step, and classification.**Computer Vision**. Object classification, object localization, and object description.**Natural Language Processing**. Text classification, text translation, and text summarization.

It may be desirable to demonstrate competence at a high level across multiple domains, and the data handling, modeling techniques, and skills will translate well.

It may also be desirable to specialize in a domain and to narrow down to demonstration projects on subtle sub-problems after the most prominent problems and techniques have been addressed.

Because projects of this type may demonstrate the broader appeal of deep learning (e.g. the ability to outperform classical methods), there is a danger in jumping straight to this level.

It may be possible for experienced practitioners, with ether deeper knowledge and experience with other machine learning methods or with deeper knowledge and experience with the domain.

Nevertheless, it is significantly harder as you may have to learn and you must harness and demonstrate all three levels of competence at once.

This is by far the biggest mistake of beginners who dive into domain-specific projects and crash into roadblock after roadblock given they are not yet competent with the library, the process of working through a project, and the process of improving model performance, let alone the specific data handling and modeling techniques used in the domain.

Again, it is possible to start at this level; it just triples the workload and may result in frustration.

Did this competence framework resonate? Do you think there are any holes?

Let me know in the comments.

This section provides more resources on the topic if you are looking to go deeper.

- Build a Machine Learning Portfolio
- What is Deep Learning?
- 8 Inspirational Applications of Deep Learning
- 7 Applications of Deep Learning for Natural Language Processing

- UC Irvine Machine Learning Repository
- List of datasets for machine learning research, Wikipedia.
- Deep Learning Datasets
- 25 Open Datasets for Deep Learning Every Data Scientist Must Work With, Analytics Vidhya.

In this post, you discovered the three levels of deep learning competence and as a practitioner, what you must demonstrate at each level.

Specifically, you learned:

- The problem of evaluating deep learning competence can best be addressed through project portfolios.
- A hierarchy of three competency levels can be used to sort practitioners and provide a framework for identifying the expected skills
- The most common mistake that beginners make is starting at level 3, meaning they are trying to learn all levels at once, leading to confusion and frustration.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post 3 Levels of Deep Learning Competence appeared first on Machine Learning Mastery.

]]>The post Practical Deep Learning for Coders (Review) appeared first on Machine Learning Mastery.

]]>It is often taught in a bottom-up manner, requiring that you first get familiar with linear algebra, calculus, and mathematical optimization before eventually learning the neural network techniques. This can take years, and most of the background theory will not help you to get good results, fast.

Instead, a top-down approach can be used where first you learn how to get results with deep learning models on real-world problems and later learn more about how the methods work.

This is the exact approach used in the popular cause taught at fast.ai titled “*Practical Deep Learning for Coders*.”

In this post, you will discover the fast.ai course for developers looking to get started and get good at deep learning, including an overview of the course itself, the best practices introduced in the course, and a discussion and review of the whole course.

Let’s get started.

**Updated Nov/2019**: Fixed broken links after the course was updated for the 2019 edition.

This tutorial is divided into four parts; they are:

- Course Overview
- Course Breakdown
- Best Practices
- Discussion and Review

fast.ai is a small organization that provides free training on practical machine learning and deep learning.

Their mission is to make deep learning accessible to all, really to developers.

At the time of writing, fast.ai offers four courses; they are:

- Practical Deep Learning for Coders, Part 1
- Cutting-Edge Deep Learning for Coders, Part 2
- Introduction to Machine Learning for Coders
- Computational Linear Algebra for Coders

The organization was founded by Jeremy Howard (Enlitic, Kaggle, and more) and Rachel Thomas (University of San Francisco, Uber, and more).

Jeremy is a world-class practitioner, who first achieved top performance on Kaggle and later joined the company. Anything he has to say on the practice of machine learning or deep learning should be considered. Rachel has the academic (Ph.D.) and math chops required in the partnership and has gone on to provide a sister course on some relevant mathematical underpinnings of deep learning.

Most courses are first delivered at the University of San Francisco by either Jeremy or Rachel, then the videos and course material are made available for free.

Of note is their first and most important course:

**Practical Deep Learning for Coders (part 1)**.

This course was first delivered and made available at the end of 2016. It was recently updated or recreated (end of 2017), which is the current version of the course that is available at the time of writing. This may change with future updates.

Importantly, the main change from v1 to v2 of the course was the shift away from the Keras deep learning framework (wrapper for Google’s TensorFlow) to their own open source fast.ai library that provides a wrapper for Facebook’s PyTorch deep learning framework.

This move away from Keras towards PyTorch was made in the interest of flexibility. Their own wrapper captures many state-of-the-art methods and best practices but also hides a lot of the detail. It may be best suited to practitioners and less so to academics than the more general Keras.

**Update**: I reviewed the 2018 version of the course, although the 2019 version is now available.

The full list of lectures for the course is listed below (links to each embedded video).

- 1. Recognizing cats and dogs.
- 2. Improving your image classifier.
- 3. Understanding convolutions.
- 4. Structured, time series & and language models.
- 5. Collaborative filtering. Inside the training loop.
- 6. Interpreting embeddings. RNNs from scratch.
- 7. Resnets from scratch.

I prefer to watch the videos at double speed and take notes. All videos are available as a YouTube playlist.

The course teaches via a top-down, rather than a bottom-up, approach. Specifically, this means first showing how to do something, then later repeating the process but showing all of the detail. This does not mean math and theory in the follow-up necessarily; instead, it refers to the practical concerns of how to achieve a result.

It is an excellent way of approaching the material. A slide in lecture 3 (which shows up many times throughout the course) provides an overview of this approach to deep learning; specifically, the first few tutorials demonstrate how to achieve results with computer vision, structured data (tabular data), natural language processing, and collaborative filtering, then these topics are covered in reverse order again but models are developed from scratch to show how they work (i.e. not why they work).

A focus of the course is on teaching best practices.

These are the recommended ways of approaching and working through specific predictive modeling problems using deep learning methods.

Best practices are presented both in terms of process (e.g. a consistent way of working through a new predictive modeling problem) and techniques. They are also baked into the PyTorch wrapper called fast.ai used in all lectures.

Many best practices are covered, and some are subtle in that it is the way the subject is introduced rather than the practice being pointed out as an alternative to a conventional approach.

There has been some attempt to catalog the best practices; for example:

- 30+ Best Practices, Forum Post.
- Deep Learning Best Practices, GitHub.
- fastai v1 for PyTorch: Fast and accurate neural nets using modern best practices.
- Ten Techniques Learned From fast.ai

Looking at my notes, some best practices that I took away were the following:

- Always use transfer learning (e.g. ImageNet model) as a starting point in computer vision, but carefully choose the point in the model to add new layers.
- Try different learning rates for different layers in transfer learning for computer vision (e.g. differential learning).
- Use test time augmentation to give a model multiple chances of making a good prediction (wow!).
- First train a model with very small images, then later re-train with larger images (e.g. progress resizing of images).
- Use cyclical learning rates to quickly find a good learning rate for SGD (e.g. the learning rate finder).
- Use cosine annealing learning rate schedule with restarts during training.
- Use transfer learning for language models.
- Use of embedding layers more broadly, such as for all categorical input variables, not just words.
- Use of embedding layers for movies and users in collaborative filtering.

I was across each of these methods, I just had not considered that they should be the starting point (e.g. best practice). Instead, I considered them tools to bring to a project to lift performance when needed.

The course is excellent.

- Jeremy is a master practitioner and an excellent communicator.
- The level of detail is right: high-level first, then lower-level, but all how-to, not why.
- Application focused rather than technique focused.

If you’re a deep learning practitioner or you want to be, then the course is required viewing.

The videos are too long for me. I used the YouTube playlist and watched videos on double time while I took notes in a text editor.

I am not interested in using the fast.ai library or pytorch at this stage, so I skimmed or skipped over the code specific parts. In general, I prefer not to learn code from video, so I would skip these sections anyway.

The value of these lectures is in seeing the steps and thought processes behind the specific way that Jeremy works through predictive modeling problems using deep learning methods. Because he is focused on good and fast results, you get exactly what you need to know, without the detail and background that everyone else forces you to wade through before getting to the point.

A little like Andrew Ng, he explains everything so simply that you feel confident enough to pick up the tools and start using them.

His competence with Kaggle competitions makes you want to dive into past competition datasets to test the methods immediately.

Finally, the sense of community he cultivates on the forum and in calling out blog posts that summarize his teachings makes you want to join and contribute.

Again, if you at all care about being a deep learning practitioner, it is required viewing.

This section provides more resources on the topic if you are looking to go deeper.

- Practical Deep Learning For Coders, Part 1, Homepage.
- Deep Learning Certificate Part I, Homepage.
- Practical Deep Learning For Coders 1, YouTube Playlist (2018).
- Practical Deep Learning For Coders, Python Notebooks.
- fastai PyTorch Wrapper.
- The Fastai v1 Deep Learning Framework with Jeremy Howard, Podcast Interview.

In this post, you discovered the fast.ai course for developers looking to get started and get good at deep learning

Have you taken this course or worked through any of the material?

Let me know your thoughts on it in the comments below.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Practical Deep Learning for Coders (Review) appeared first on Machine Learning Mastery.

]]>The post Why Initialize a Neural Network with Random Weights? appeared first on Machine Learning Mastery.

]]>This is because this is an expectation of the stochastic optimization algorithm used to train the model, called stochastic gradient descent.

To understand this approach to problem solving, you must first understand the role of nondeterministic and randomized algorithms as well as the need for stochastic optimization algorithms to harness randomness in their search process.

In this post, you will discover the full background as to why neural network weights must be randomly initialized.

After reading this post, you will know:

- About the need for nondeterministic and randomized algorithms for challenging problems.
- The use of randomness during initialization and search in stochastic optimization algorithms.
- That stochastic gradient descent is a stochastic optimization algorithm and requires the random initialization of network weights.

Let’s get started.

This post is divided into 4 parts; they are:

- Deterministic and Non-Deterministic Algorithms
- Stochastic Search Algorithms
- Random Initialization in Neural Networks
- Initialization Methods

Classical algorithms are deterministic.

An example is an algorithm to sort a list.

Given an unsorted list, the sorting algorithm, say bubble sort or quick sort, will systematically sort the list until you have an ordered result. Deterministic means that each time the algorithm is given the same list, it will execute in exactly the same way. It will make the same moves at each step of the procedure.

Deterministic algorithms are great as they can make guarantees about best, worst, and average running time. The problem is, they are not suitable for all problems.

Some problems are hard for computers. Perhaps because of the number of combinations; perhaps because of the size of data. They are so hard because a deterministic algorithm cannot be used to solve them efficiently. The algorithm may run, but will continue running until the heat death of the universe.

An alternate solution is to use nondeterministic algorithms. These are algorithms that use elements of randomness when making decisions during the execution of the algorithm. This means that a different order of steps will be followed when the same algorithm is rerun on the same data.

They can rapidly speed up the process of getting a solution, but the solution will be approximate, or “*good*,” but often not the “*best*.” Nondeterministic algorithms often cannot make strong guarantees about running time or the quality of the solution found.

This is often fine as the problems are so hard that any good solution will often be satisfactory.

Search problems are often very challenging and require the use of nondeterministic algorithms that make heavy use of randomness.

The algorithms are not random per se; instead they make careful use of randomness. They are random within a bound and are referred to as stochastic algorithms.

The incremental, or step-wise, nature of the search often means the process and the algorithms are referred to as an optimization from an initial state or position to a final state or position. For example, stochastic optimization problem or a stochastic optimization algorithm.

Some examples include the genetic algorithm, simulated annealing, and stochastic gradient descent.

The search process is incremental from a starting point in the space of possible solutions toward some good enough solution.

They share common features in their use of randomness, such as:

- Use of randomness during initialization.
- Use of randomness during the progression of the search.

We know nothing about the structure of the search space. Therefore, to remove bias from the search process, we start from a randomly chosen position.

As the search process unfolds, there is a risk that we are stuck in an unfavorable area of the search space. Using randomness during the search process gives some likelihood of getting unstuck and finding a better final candidate solution.

The idea of getting stuck and returning a less-good solution is referred to as getting stuck in a local optima.

These two elements of random initialization and randomness during the search work together.

They work together better if we consider any solution found by the search as provisional, or a candidate, and that the search process can be performed multiple times.

This gives the stochastic search process multiple opportunities to start and traverse the space of candidate solutions in search of a better candidate solution–a so-called global optima.

The navigation of the space of candidate solutions is often described using the analogy of a one- or two-landscape of mountains and valleys (e.g. like a fitness landscape). If we are maximizing a score during the search, we can think of small hills in the landscape as a local optima and the largest hills as the global optima.

This is a fascinating area of research, an area where I have some background. For example, see my book:

Artificial neural networks are trained using a stochastic optimization algorithm called stochastic gradient descent.

The algorithm uses randomness in order to find a good enough set of weights for the specific mapping function from inputs to outputs in your data that is being learned. It means that your specific network on your specific training data will fit a different network with a different model skill each time the training algorithm is run.

This is a feature, not a bug.

I write about this issue more in the post:

As described in the previous section, stochastic optimization algorithms such as stochastic gradient descent use randomness in selecting a starting point for the search and in the progression of the search.

Specifically, stochastic gradient descent requires that the weights of the network are initialized to small random values (random, but close to zero, such as in [0.0, 0.1]). Randomness is also used during the search process in the shuffling of the training dataset prior to each epoch, which in turn results in differences in the gradient estimate for each batch.

You can learn more about stochastic gradient descent in this post:

The progression of the search or learning of a neural network is referred to as convergence. The discovering of a sub-optimal solution or local optima is referred to as premature convergence.

Training algorithms for deep learning models are usually iterative in nature and thus require the user to specify some initial point from which to begin the iterations. Moreover, training deep models is a sufficiently difficult task that most algorithms are strongly affected by the choice of initialization.

— Page 301, Deep Learning, 2016.

The most effective way to evaluate the skill of a neural network configuration is to repeat the search process multiple times and report the average performance of the model over those repeats. This gives the configuration the best chance to search the space from multiple different sets of initial conditions. Sometimes this is called a multiple restart or multiple-restart search.

You can learn more about the effective evaluation of neural networks in this post:

We can use the same set of weights each time we train the network; for example, you could use the values of 0.0 for all weights.

In this case, the equations of the learning algorithm would fail to make any changes to the network weights, and the model will be stuck. It is important to note that the bias weight in each neuron is set to zero by default, not a small random value.

Specifically, nodes that are side-by-side in a hidden layer connected to the same inputs must have different weights for the learning algorithm to update the weights.

This is often referred to as the need to break symmetry during training.

Perhaps the only property known with complete certainty is that the initial parameters need to “break symmetry” between different units. If two hidden units with the same activation function are connected to the same inputs, then these units must have different initial parameters. If they have the same initial parameters, then a deterministic learning algorithm applied to a deterministic cost and model will constantly update both of these units in the same way.

— Page 301, Deep Learning, 2016.

We could use the same set of random numbers each time the network is trained.

This would not be helpful when evaluating network configurations.

It may be helpful in order to train the same final set of network weights given a training dataset in the case where a model is being used in a production environment.

You can learn more about fixing the random seed for neural networks developed with Keras in this post:

Traditionally, the weights of a neural network were set to small random numbers.

The initialization of the weights of neural networks is a whole field of study as the careful initialization of the network can speed up the learning process.

Modern deep learning libraries, such as Keras, offer a host of network initialization methods, all are variations of initializing the weights with small random numbers.

For example, the current methods are available in Keras at the time of writing for all network types:

**Zeros**: Initializer that generates tensors initialized to 0.**Ones**: Initializer that generates tensors initialized to 1.**Constant**: Initializer that generates tensors initialized to a constant value.**RandomNormal**: Initializer that generates tensors with a normal distribution.**RandomUniform**: Initializer that generates tensors with a uniform distribution.**TruncatedNormal**: Initializer that generates a truncated normal distribution.**VarianceScaling**: Initializer capable of adapting its scale to the shape of weights.**Orthogonal**: Initializer that generates a random orthogonal matrix.**Identity**: Initializer that generates the identity matrix.**lecun_uniform**: LeCun uniform initializer.**glorot_normal**: Glorot normal initializer, also called Xavier normal initializer.**glorot_uniform**: Glorot uniform initializer, also called Xavier uniform initializer.**he_normal**: He normal initializer.**lecun_normal**: LeCun normal initializer.**he_uniform**: He uniform variance scaling initializer.

See the documentation for more details.

Out of interest, the default initializers chosen by Keras developers for different layer types are as follows:

**Dense**(e.g. MLP):*glorot_uniform***LSTM**:*glorot_uniform***CNN**:*glorot_uniform*

You can learn more about “*glorot_uniform*“, also called “*Xavier uniform*“, named for the developer of the method Xavier Glorot, in the paper:

There is no single best way to initialize the weights of a neural network.

Modern initialization strategies are simple and heuristic. Designing improved initialization strategies is a difficult task because neural network optimization is not yet well understood. […] Our understanding of how the initial point affects generalization is especially primitive, offering little to no guidance for how to select the initial point.

— Page 301, Deep Learning, 2016.

It is one more hyperparameter for you to explore and test and experiment with on your specific predictive modeling problem.

Do you have a favorite method for weight initialization?

Let me know in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- Deep Learning, 2016.

- Nondeterministic algorithm on Wikipedia
- Randomized algorithm on Wikipedia
- Stochastic optimization on Wikipedia
- Stochastic gradient descent on Wikipedia
- Fitness landscape on Wikipedia
- Neural Network FAQ
- Keras Weight Initialization
- Understanding the difficulty of training deep feedforward neural networks, 2010.

- What are good initial weights in a neural network?
- Why should weights of Neural Networks be initialized to random numbers?
- What are good initial weights in a neural network?

In this post, you discovered why neural network weights must be randomly initialized.

Specifically, you learned:

- About the need for nondeterministic and randomized algorithms for challenging problems.
- The use of randomness during initialization and search in stochastic optimization algorithms.
- That stochastic gradient descent is a stochastic optimization algorithm and requires the random initialization of network weights.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Why Initialize a Neural Network with Random Weights? appeared first on Machine Learning Mastery.

]]>