The post Softmax Activation Function with Python appeared first on Machine Learning Mastery.

Softmax is most commonly used in applied machine learning as an activation function in a neural network model. Specifically, the network is configured to output N values, one for each class in the classification task, and the softmax function normalizes the outputs, converting them from weighted sum values into probabilities that sum to one. Each value in the output of the softmax function is interpreted as the probability of membership in the corresponding class.

In this tutorial, you will discover the softmax activation function used in neural network models.

After completing this tutorial, you will know:

- Linear and Sigmoid activation functions are inappropriate for multi-class classification tasks.
- Softmax can be thought of as a softened version of the argmax function that returns the index of the largest value in a list.
- How to implement the softmax function from scratch in Python and how to convert the output into a class label.

Let’s get started.

This tutorial is divided into three parts; they are:

- Predicting Probabilities With Neural Networks
- Max, Argmax, and Softmax
- Softmax Activation Function

Neural network models can be used to model classification predictive modeling problems.

Classification problems are those that involve predicting a class label for a given input. A standard approach to modeling classification problems is to use a model to predict the probability of class membership. That is, given an example, what is the probability of it belonging to each of the known class labels?

- For a binary classification problem, a Binomial probability distribution is used. This is achieved using a network with a single node in the output layer that predicts the probability of an example belonging to class 1.
- For a multi-class classification problem, a Multinomial probability distribution is used. This is achieved using a network with one node for each class in the output layer, where the predicted probabilities sum to one.

A neural network model requires an activation function in the output layer of the model to make the prediction.

There are different activation functions to choose from; let’s look at a few.

One approach to predicting class membership probabilities is to use a linear activation.

A linear activation function is simply the sum of the weighted input to the node, required as input for any activation function. As such, it is often referred to as “*no activation function*” as no additional transformation is performed.

Recall that a probability is a numeric value between 0 and 1.

Given that no transformation is performed on the weighted sum of the input, it is possible for the linear activation function to output any numeric value. This makes the linear activation function inappropriate for predicting probabilities for either the binomial or multinomial case.
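To make this concrete, the sketch below computes the weighted sum for a single node and passes it through a linear activation. The weights, inputs, and bias are made-up values, and the function names `weighted_sum` and `linear` are illustrative rather than taken from any library.

```python
# sketch of a linear (identity) activation on a single node
# the inputs, weights and bias are made-up values for illustration

def weighted_sum(inputs, weights, bias):
    # sum of each input multiplied by its weight, plus the bias
    return sum(i * w for i, w in zip(inputs, weights)) + bias

def linear(x):
    # a linear activation returns its input unchanged
    return x

inputs = [0.5, 2.0, -1.0]
weights = [1.5, 2.0, 0.5]
bias = 0.25
output = linear(weighted_sum(inputs, weights, bias))
print(output)
```

The printed value is 4.5, which is greater than 1 and so cannot be read as a probability.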

Another approach to predicting class membership probabilities is to use a sigmoid activation function.

This function is also called the logistic function. Regardless of the input, the function always outputs a value between 0 and 1. The form of the function is an S-shape between 0 and 1, with the midpoint of the “*S*” at 0.5, which is the output for an input of 0.

This allows very large positive values given as the weighted sum of the input to be mapped to values close to 1.0 and very large negative values to be mapped to values close to 0.0.
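As a minimal sketch, the sigmoid function can be written in a few lines of Python using only the standard library. The function name `sigmoid` and the sample inputs are chosen here for illustration.

```python
# sketch of the sigmoid (logistic) function
from math import exp

def sigmoid(x):
    # squash any real value into the range (0, 1)
    return 1.0 / (1.0 + exp(-x))

# evaluate the function for a few made-up weighted sums
for value in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(value, sigmoid(value))
```

An input of 0 maps to exactly 0.5, while inputs of -10 and 10 map to values very close to 0 and 1 respectively.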

The sigmoid activation is an ideal activation function for a binary classification problem where the output is interpreted as a Binomial probability distribution.

The sigmoid activation function can also be used for multi-class classification problems where the classes are not mutually exclusive. These are often referred to as multi-label classification rather than multi-class classification.

The sigmoid activation function is not appropriate for multi-class classification problems with mutually exclusive classes where a multinomial probability distribution is required.
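One way to see the problem is to apply the sigmoid function independently to the weighted sums of three output nodes. The sketch below uses three made-up scores; each output lies between 0 and 1, but nothing forces them to sum to one, so together they do not form a multinomial distribution.

```python
# sketch: independent sigmoid outputs do not form a probability distribution
from math import exp

def sigmoid(x):
    # squash any real value into the range (0, 1)
    return 1.0 / (1.0 + exp(-x))

# made-up weighted sums for three output nodes
scores = [1, 3, 2]
probs = [sigmoid(s) for s in scores]
print(probs)
# the sum is well above 1.0
print(sum(probs))
```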

Instead, an alternate activation is required called the **softmax function**.

The maximum, or “*max*,” mathematical function returns the largest numeric value for a list of numeric values.

We can implement this using the *max()* Python function; for example:

    # example of the max of a list of numbers
    # define data
    data = [1, 3, 2]
    # calculate the max of the list
    result = max(data)
    print(result)

Running the example returns the largest value “3” from the list of numbers.

3

The argmax, or “*arg max*,” mathematical function returns the index in the list that contains the largest value.

Think of it as the meta version of max: one level of indirection above max, pointing to the position in the list that has the max value rather than the value itself.

We can implement this using the argmax() NumPy function; for example:

    # example of the argmax of a list of numbers
    from numpy import argmax
    # define data
    data = [1, 3, 2]
    # calculate the argmax of the list
    result = argmax(data)
    print(result)

Running the example returns the index “1”, which is the position in the list that holds the largest value, “3”.

1

The softmax, or “*soft max*,” mathematical function can be thought to be a probabilistic or “*softer*” version of the argmax function.

The term softmax is used because this activation function represents a smooth version of the winner-takes-all activation model in which the unit with the largest input has output +1 while all other units have output 0.

— Page 238, Neural Networks for Pattern Recognition, 1995.

From a probabilistic perspective, if the *argmax()* function returns 1 in the previous section, it returns 0 for the other two array indexes, giving full weight to index 1 and no weight to index 0 and index 2 for the largest value in the list [1, 3, 2].

[0, 1, 0]
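This all-or-nothing view of argmax can be sketched in plain Python. The helper name `hard_argmax` is hypothetical and is used only to illustrate the idea.

```python
# sketch: express argmax as a vector with full weight on the largest value

def hard_argmax(values):
    # find the position of the largest value
    largest = values.index(max(values))
    # give weight 1 to that position and 0 to all others
    return [1 if i == largest else 0 for i in range(len(values))]

print(hard_argmax([1, 3, 2]))
```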

What if we were less sure and wanted to express the argmax probabilistically, with likelihoods?

This can be achieved by scaling the values in the list and converting them into probabilities such that all values in the returned list sum to 1.0.

This can be achieved by calculating the exponential of each value in the list and dividing it by the sum of the exponentials of all values.

- probability = exp(value) / sum(exp(v) for v in list)

For example, we can turn the first value “1” in the list [1, 3, 2] into a probability as follows:

- probability = exp(1) / (exp(1) + exp(3) + exp(2))
- probability = 2.718281828459045 / (2.718281828459045 + 20.085536923187668 + 7.38905609893065)
- probability = 2.718281828459045 / 30.19287485057736
- probability = 0.09003057317038046

We can demonstrate this for each value in the list [1, 3, 2] in Python as follows:

    # transform values into probabilities
    from math import exp
    # calculate each probability
    p1 = exp(1) / (exp(1) + exp(3) + exp(2))
    p2 = exp(3) / (exp(1) + exp(3) + exp(2))
    p3 = exp(2) / (exp(1) + exp(3) + exp(2))
    # report probabilities
    print(p1, p2, p3)
    # report sum of probabilities
    print(p1 + p2 + p3)

Running the example converts each value in the list into a probability and reports the values, then confirms that all probabilities sum to the value 1.0.

We can see that most weight is put on index 1 (67 percent) with less weight on index 2 (24 percent) and even less on index 0 (9 percent).

    0.09003057317038046 0.6652409557748219 0.24472847105479767
    1.0

This is the softmax function.

We can implement it as a function that takes a list of numbers and returns the softmax or multinomial probability distribution for the list.

The example below implements the function and demonstrates it on our small list of numbers.

    # example of a function for calculating softmax for a list of numbers
    from numpy import exp

    # calculate the softmax of a vector
    def softmax(vector):
        e = exp(vector)
        return e / e.sum()

    # define data
    data = [1, 3, 2]
    # convert list of numbers to a list of probabilities
    result = softmax(data)
    # report the probabilities
    print(result)
    # report the sum of the probabilities
    print(sum(result))

Running the example reports roughly the same numbers with minor differences in precision.

    [0.09003057 0.66524096 0.24472847]
    1.0
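One practical caveat that the tutorial does not cover: the exponential grows very quickly, and `math.exp()` raises an `OverflowError` for inputs above roughly 709. A common remedy is to subtract the largest value from every element before exponentiating, which leaves the result mathematically unchanged. The sketch below is one way to do this with the standard library; the name `stable_softmax` is illustrative.

```python
# sketch of a numerically stable softmax using only the standard library
from math import exp

def stable_softmax(values):
    # shift by the largest value so the biggest exponent is exp(0) = 1
    largest = max(values)
    exps = [exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# same list as the worked example
print(stable_softmax([1, 3, 2]))
# shifted inputs that would overflow without the max subtraction
print(stable_softmax([1001, 1003, 1002]))
```

Both calls should report the same probabilities, because adding a constant to every input does not change the softmax output.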

Finally, we can use the softmax() function from SciPy (in the scipy.special module) to calculate the softmax for an array or list of numbers, as follows:

    # example of calculating the softmax for a list of numbers
    from scipy.special import softmax
    # define data
    data = [1, 3, 2]
    # calculate softmax
    result = softmax(data)
    # report the probabilities
    print(result)
    # report the sum of the probabilities
    print(sum(result))

Running the example, again, we get very similar results with very minor differences in precision.

    [0.09003057 0.66524096 0.24472847]
    0.9999999999999997

Now that we are familiar with the softmax function, let’s look at how it is used in a neural network model.

The softmax function is used as the activation function in the output layer of neural network models that predict a multinomial probability distribution.

That is, softmax is used as the activation function for multi-class classification problems where class membership is required on more than two class labels.

Any time we wish to represent a probability distribution over a discrete variable with n possible values, we may use the softmax function. This can be seen as a generalization of the sigmoid function which was used to represent a probability distribution over a binary variable.

— Page 184, Deep Learning, 2016.

The function can be used as an activation function for a hidden layer in a neural network, although this is less common. It may be used when the model internally needs to choose or weight multiple different inputs at a bottleneck or concatenation layer.

Softmax units naturally represent a probability distribution over a discrete variable with k possible values, so they may be used as a kind of switch.

— Page 196, Deep Learning, 2016.

In the Keras deep learning library with a three-class classification task, use of softmax in the output layer may look as follows:

    ...
    model.add(Dense(3, activation='softmax'))

By definition, the softmax activation will output one value for each node in the output layer. The output values will represent (or can be interpreted as) probabilities and the values sum to 1.0.

When modeling a multi-class classification problem, the data must be prepared. The target variable containing the class labels is first label encoded, meaning that an integer from 0 to N-1 is assigned to each class label, where N is the number of class labels.

The label encoded (or integer encoded) target variables are then one-hot encoded. This is a probabilistic representation of the class label, much like the softmax output. A vector is created with one position for each class label. All values are set to 0 (impossible), except for a 1 (certain) that marks the position of the class label.

For example, three class labels will be integer encoded as 0, 1, and 2, then one-hot encoded to vectors as follows:

- Class 0: [1, 0, 0]
- Class 1: [0, 1, 0]
- Class 2: [0, 0, 1]

This is called a one-hot encoding.
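The two preparation steps can be sketched in plain Python. The class names below are made up, the helper names `label_encode` and `one_hot` are hypothetical, and assigning integers in sorted order of the labels is an assumption made for this sketch.

```python
# sketch: integer encode class labels, then one-hot encode the integers

def label_encode(labels):
    # assign an integer from 0 to N-1 to each distinct label, in sorted order
    classes = sorted(set(labels))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[label] for label in labels], classes

def one_hot(index, n_classes):
    # all positions are 0 except a 1 at the position of the class
    return [1 if i == index else 0 for i in range(n_classes)]

# made-up class labels for illustration
labels = ['cat', 'dog', 'bird', 'dog']
encoded, classes = label_encode(labels)
vectors = [one_hot(i, len(classes)) for i in encoded]
print(classes)
print(encoded)
print(vectors)
```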

It represents the expected multinomial probability distribution for each class used to correct the model under supervised learning.

The softmax function will output a probability of class membership for each class label and attempt to best approximate the expected target for a given input.

For example, if the integer encoded class 1 was expected for one example, the target vector would be:

- [0, 1, 0]

The softmax output might look as follows, which puts the most weight on class 1 and less weight on the other classes.

- [0.09003057 0.66524096 0.24472847]

The error between the expected and predicted multinomial probability distribution is often calculated using cross-entropy, and this error is then used to update the model. This is called the cross-entropy loss function.
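As a rough illustration, the cross-entropy between the one-hot target and the softmax output from the example above can be calculated by hand. This is a minimal sketch using the natural logarithm, not the implementation used by any particular library; the function name `cross_entropy` is illustrative.

```python
# sketch: cross-entropy between a one-hot target and predicted probabilities
from math import log

def cross_entropy(target, predicted):
    # negative sum of target * log(predicted), skipping zero-weight targets
    return -sum(t * log(p) for t, p in zip(target, predicted) if t > 0)

target = [0, 1, 0]
predicted = [0.09003057, 0.66524096, 0.24472847]
loss = cross_entropy(target, predicted)
print('%.4f' % loss)
```

With a one-hot target, only the probability assigned to the expected class contributes, so the loss is the negative log of 0.66524096, about 0.4076. A prediction that put all weight on class 1 would give a loss of zero.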

For more on cross-entropy for calculating the difference between probability distributions, see the tutorial:

We may want to convert the probabilities back into an integer encoded class label.

This can be achieved using the *argmax()* function that returns the index of the list with the largest value. Given that the class labels are integer encoded from 0 to N-1, the argmax of the probabilities will always be the integer encoded class label.

- class integer = argmax([0.09003057 0.66524096 0.24472847])
- class integer = 1
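A small sketch of this conversion in plain Python is shown below. The `class_names` list is a hypothetical lookup from the integer encoding back to a readable label, and the `argmax` helper is written here for illustration.

```python
# sketch: convert predicted probabilities back into a class label

def argmax(values):
    # index of the largest value in the list
    return max(range(len(values)), key=lambda i: values[i])

probabilities = [0.09003057, 0.66524096, 0.24472847]
# hypothetical names for the integer encoded classes 0, 1 and 2
class_names = ['class 0', 'class 1', 'class 2']
class_integer = argmax(probabilities)
print(class_integer, class_names[class_integer])
```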

This section provides more resources on the topic if you are looking to go deeper.

- Neural Networks for Pattern Recognition, 1995.
- Neural Networks: Tricks of the Trade, 2nd Edition, 2012.
- Deep Learning, 2016.

In this tutorial, you discovered the softmax activation function used in neural network models.

Specifically, you learned:

- Linear and Sigmoid activation functions are inappropriate for multi-class classification tasks.
- Softmax can be thought of as a softened version of the argmax function that returns the index of the largest value in a list.
- How to implement the softmax function from scratch in Python and how to convert the output into a class label.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

The post How to Use AutoKeras for Classification and Regression appeared first on Machine Learning Mastery.

When applied to neural networks, automated machine learning (AutoML) involves discovering both the model architecture and the hyperparameters used to train the model, generally referred to as **neural architecture search**.

AutoKeras is an open-source library for performing AutoML for deep learning models. The search is performed using so-called Keras models via the TensorFlow tf.keras API.

It provides a simple and effective approach for automatically finding top-performing models for a wide range of predictive modeling tasks, including tabular or so-called structured classification and regression datasets.

In this tutorial, you will discover how to use AutoKeras to find good neural network models for classification and regression tasks.

After completing this tutorial, you will know:

- AutoKeras is an implementation of AutoML for deep learning that uses neural architecture search.
- How to use AutoKeras to find a top-performing model for a binary classification dataset.
- How to use AutoKeras to find a top-performing model for a regression dataset.

Let’s get started.

**Update Sep/2020**: Updated AutoKeras version and installation instructions.

This tutorial is divided into three parts; they are:

- AutoKeras for Deep Learning
- AutoKeras for Classification
- AutoKeras for Regression

Automated Machine Learning, or AutoML for short, refers to automatically finding the best combination of data preparation, model, and model hyperparameters for a predictive modeling problem.

The benefit of AutoML is allowing machine learning practitioners to quickly and effectively address predictive modeling tasks with very little input, e.g. fire and forget.

Automated Machine Learning (AutoML) has become a very important research topic with wide applications of machine learning techniques. The goal of AutoML is to enable people with limited machine learning background knowledge to use machine learning models easily.

— Auto-keras: An efficient neural architecture search system, 2019.

AutoKeras is an implementation of AutoML for deep learning models using the Keras API, specifically the tf.keras API provided by TensorFlow 2.

It uses a process of searching through neural network architectures to best address a modeling task, referred to more generally as Neural Architecture Search, or NAS for short.

… we have developed a widely adopted open-source AutoML system based on our proposed method, namely Auto-Keras. It is an open-source AutoML system, which can be downloaded and installed locally.

— Auto-keras: An efficient neural architecture search system, 2019.

In the spirit of Keras, AutoKeras provides an easy-to-use interface for different tasks, such as image classification, structured data classification or regression, and more. The user is only required to specify the location of the data and the number of models to try and is returned a model that achieves the best performance (under the configured constraints) on that dataset.

**Note**: AutoKeras provides a TensorFlow 2 Keras model (e.g. tf.keras) and not a Standalone Keras model. As such, the library assumes that you have Python 3 and TensorFlow 2.3.0 or higher installed.

At the time of writing, you require a prerequisite library called keras-tuner to be installed manually. You can install this library as follows:

sudo pip install git+https://github.com/keras-team/keras-tuner.git@1.0.2rc1

If things change again, as they often do with fast-moving open source projects, see the official installation instructions here:

Now we can install AutoKeras.

To install AutoKeras, you can use Pip, as follows:

sudo pip install autokeras

You can confirm the installation was successful and check the version number as follows:

sudo pip show autokeras

You should see output like the following:

    Name: autokeras
    Version: 1.0.8
    Summary: AutoML for deep learning
    Home-page: http://autokeras.com
    Author: Data Analytics at Texas A&M (DATA) Lab, Keras Team
    Author-email: jhfjhfj1@gmail.com
    License: MIT
    Location: ...
    Requires: tensorflow, packaging, pandas, scikit-learn
    Required-by:

Once installed, you can then apply AutoKeras to find a good or great neural network model for your predictive modeling task.

We will take a look at two common examples where you may want to use AutoKeras, classification and regression on tabular data, so-called structured data.

AutoKeras can be used to discover a good or great model for classification tasks on tabular data.

Recall tabular data are those datasets composed of rows and columns, such as a table or data as you would see in a spreadsheet.

In this section, we will develop a model for the Sonar classification dataset for classifying sonar returns as rocks or mines. This dataset consists of 208 rows of data with 60 input features and a target class label of 0 (rock) or 1 (mine).

A naive model can achieve a classification accuracy of about 53.4 percent via repeated 10-fold cross-validation, which provides a lower-bound. A good model can achieve an accuracy of about 88.2 percent, providing an upper-bound.
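To give a feel for what a naive model means here, the sketch below scores a model that always predicts the most frequent class. This is a simplification: it uses a small made-up list of labels rather than the Sonar data, it does not use cross-validation, and the function name `majority_class_accuracy` is hypothetical.

```python
# sketch: accuracy of a naive model that always predicts the majority class
from collections import Counter

def majority_class_accuracy(labels):
    # count how often each class label occurs
    counts = Counter(labels)
    # the naive model is correct whenever the true label is the majority class
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)

# made-up binary labels for illustration (not the sonar data)
labels = [1, 1, 1, 0, 0, 1, 0, 1]
accuracy = majority_class_accuracy(labels)
print('Naive accuracy: %.3f' % accuracy)
```

Any useful model should achieve an accuracy above this kind of baseline.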

You can learn more about the dataset here:

No need to download the dataset; we will download it automatically as part of the example.

First, we can download the dataset and split it into a randomly selected train and test set, holding 33 percent for test and using 67 percent for training.

The complete example is listed below.

    # load the sonar dataset
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    # load dataset
    url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
    dataframe = read_csv(url, header=None)
    print(dataframe.shape)
    # split into input and output elements
    data = dataframe.values
    X, y = data[:, :-1], data[:, -1]
    print(X.shape, y.shape)
    # basic data preparation
    X = X.astype('float32')
    y = LabelEncoder().fit_transform(y)
    # separate into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Running the example first downloads the dataset and summarizes the shape, showing the expected number of rows and columns.

The dataset is then split into input and output elements, then these elements are further split into train and test datasets.

    (208, 61)
    (208, 60) (208,)
    (139, 60) (69, 60) (139,) (69,)
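The train and test row counts follow directly from the split fraction. The sketch below reproduces them by hand, under the assumption that the number of test rows is rounded up, which is consistent with the shapes printed above. The helper name `split_sizes` is hypothetical.

```python
# sketch: expected train/test row counts for a fractional test size
from math import ceil

def split_sizes(n_rows, test_size):
    # assumption: the test set size is rounded up to a whole row
    n_test = ceil(n_rows * test_size)
    # the remaining rows are used for training
    return n_rows - n_test, n_test

# 208 rows in the sonar dataset with 33 percent held back for testing
train_rows, test_rows = split_sizes(208, 0.33)
print(train_rows, test_rows)
```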

We can use AutoKeras to automatically discover an effective neural network model for this dataset.

This can be achieved by using the StructuredDataClassifier class and specifying the number of models to search. This defines the search to perform.

    ...
    # define the search
    search = StructuredDataClassifier(max_trials=15)

We can then execute the search using our loaded dataset.

    ...
    # perform the search
    search.fit(x=X_train, y=y_train, verbose=0)

This may take a few minutes and will report the progress of the search.

Next, we can evaluate the model on the test dataset to see how it performs on new data.

    ...
    # evaluate the model
    loss, acc = search.evaluate(X_test, y_test, verbose=0)
    print('Accuracy: %.3f' % acc)

We then use the model to make a prediction for a new row of data.

    ...
    # use the model to make a prediction
    row = [0.0200,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.0660,0.2273,0.3100,0.2999,0.5078,0.4797,0.5783,0.5071,0.4328,0.5550,0.6711,0.6415,0.7104,0.8080,0.6791,0.3857,0.1307,0.2604,0.5121,0.7547,0.8537,0.8507,0.6692,0.6097,0.4943,0.2744,0.0510,0.2834,0.2825,0.4256,0.2641,0.1386,0.1051,0.1343,0.0383,0.0324,0.0232,0.0027,0.0065,0.0159,0.0072,0.0167,0.0180,0.0084,0.0090,0.0032]
    X_new = asarray([row]).astype('float32')
    yhat = search.predict(X_new)
    print('Predicted: %.3f' % yhat[0])

We can retrieve the final model, which is an instance of a TensorFlow Keras model.

    ...
    # get the best performing model
    model = search.export_model()

We can then summarize the structure of the model to see what was selected.

    ...
    # summarize the loaded model
    model.summary()

Finally, we can save the model to file for later use, which can be loaded using the TensorFlow load_model() function.

    ...
    # save the best performing model to file
    model.save('model_sonar.h5')

Tying this together, the complete example of applying AutoKeras to find an effective neural network model for the Sonar dataset is listed below.

    # use autokeras to find a model for the sonar dataset
    from numpy import asarray
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from autokeras import StructuredDataClassifier
    # load dataset
    url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
    dataframe = read_csv(url, header=None)
    print(dataframe.shape)
    # split into input and output elements
    data = dataframe.values
    X, y = data[:, :-1], data[:, -1]
    print(X.shape, y.shape)
    # basic data preparation
    X = X.astype('float32')
    y = LabelEncoder().fit_transform(y)
    # separate into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    # define the search
    search = StructuredDataClassifier(max_trials=15)
    # perform the search
    search.fit(x=X_train, y=y_train, verbose=0)
    # evaluate the model
    loss, acc = search.evaluate(X_test, y_test, verbose=0)
    print('Accuracy: %.3f' % acc)
    # use the model to make a prediction
    row = [0.0200,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.0660,0.2273,0.3100,0.2999,0.5078,0.4797,0.5783,0.5071,0.4328,0.5550,0.6711,0.6415,0.7104,0.8080,0.6791,0.3857,0.1307,0.2604,0.5121,0.7547,0.8537,0.8507,0.6692,0.6097,0.4943,0.2744,0.0510,0.2834,0.2825,0.4256,0.2641,0.1386,0.1051,0.1343,0.0383,0.0324,0.0232,0.0027,0.0065,0.0159,0.0072,0.0167,0.0180,0.0084,0.0090,0.0032]
    X_new = asarray([row]).astype('float32')
    yhat = search.predict(X_new)
    print('Predicted: %.3f' % yhat[0])
    # get the best performing model
    model = search.export_model()
    # summarize the loaded model
    model.summary()
    # save the best performing model to file
    model.save('model_sonar.h5')

Running the example will report a lot of debug information about the progress of the search.

The models and results are all saved in a folder called “*structured_data_classifier*” in your current working directory.

    ...
    [Trial complete]
    [Trial summary]
     |-Trial ID: e8265ad768619fc3b69a85b026f70db6
     |-Score: 0.9259259104728699
     |-Best step: 0
     > Hyperparameters:
     |-classification_head_1/dropout_rate: 0
     |-optimizer: adam
     |-structured_data_block_1/dense_block_1/dropout_rate: 0.0
     |-structured_data_block_1/dense_block_1/num_layers: 2
     |-structured_data_block_1/dense_block_1/units_0: 32
     |-structured_data_block_1/dense_block_1/units_1: 16
     |-structured_data_block_1/dense_block_1/units_2: 512
     |-structured_data_block_1/dense_block_1/use_batchnorm: False
     |-structured_data_block_1/dense_block_2/dropout_rate: 0.25
     |-structured_data_block_1/dense_block_2/num_layers: 3
     |-structured_data_block_1/dense_block_2/units_0: 32
     |-structured_data_block_1/dense_block_2/units_1: 16
     |-structured_data_block_1/dense_block_2/units_2: 16
     |-structured_data_block_1/dense_block_2/use_batchnorm: False

The best-performing model is then evaluated on the hold-out test dataset.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
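One simple way to compare the average outcome is to collect the accuracy from each run and report the mean and standard deviation. The sketch below uses made-up accuracy values purely for illustration; they are not results from running AutoKeras, and the helper name `summarize_runs` is hypothetical.

```python
# sketch: summarize the accuracy from several repeated runs
from statistics import mean, stdev

def summarize_runs(scores):
    # average score and the spread of the scores across runs
    return mean(scores), stdev(scores)

# made-up accuracies from three hypothetical runs of the example
scores = [0.826, 0.797, 0.841]
average, spread = summarize_runs(scores)
print('Mean: %.3f, Std: %.3f' % (average, spread))
```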

In this case, we can see that the model achieved a classification accuracy of about 82.6 percent.

Accuracy: 0.826

Next, the architecture of the best-performing model is reported.

We can see a model with two hidden layers with dropout and ReLU activation.

    Model: "model"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         [(None, 60)]              0
    _________________________________________________________________
    categorical_encoding (Catego (None, 60)                0
    _________________________________________________________________
    dense (Dense)                (None, 256)               15616
    _________________________________________________________________
    re_lu (ReLU)                 (None, 256)               0
    _________________________________________________________________
    dropout (Dropout)            (None, 256)               0
    _________________________________________________________________
    dense_1 (Dense)              (None, 512)               131584
    _________________________________________________________________
    re_lu_1 (ReLU)               (None, 512)               0
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 512)               0
    _________________________________________________________________
    dense_2 (Dense)              (None, 1)                 513
    _________________________________________________________________
    classification_head_1 (Sigmo (None, 1)                 0
    =================================================================
    Total params: 147,713
    Trainable params: 147,713
    Non-trainable params: 0
    _________________________________________________________________
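The parameter counts in the summary can be checked by hand. A fully connected layer has one weight per input per unit, plus one bias per unit. The sketch below applies that rule to the layer sizes reported above; the helper name `dense_params` is illustrative.

```python
# sketch: check the parameter counts reported in the model summary

def dense_params(n_inputs, n_units):
    # one weight per input per unit, plus one bias per unit
    return n_inputs * n_units + n_units

# input width followed by the width of each dense layer in the summary
layer_sizes = [60, 256, 512, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)
print(sum(counts))
```

The three counts are 15616, 131584, and 513, which sum to the 147,713 total parameters in the summary. The ReLU, dropout, and sigmoid layers contribute no parameters.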

AutoKeras can also be used for regression tasks, that is, predictive modeling problems where a numeric value is predicted.

We will use the auto insurance dataset that involves predicting the total payment from claims given the total number of claims. The dataset has 63 rows and one input and one output variable.

A naive model can achieve a mean absolute error (MAE) of about 66 using repeated 10-fold cross-validation, providing a lower-bound on expected performance. A good model can achieve a MAE of about 28, providing a performance upper-bound.
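For reference, the sketch below shows how MAE is calculated and how a naive baseline might be scored. It assumes the naive model always predicts the median of the training targets, which is one common choice and not necessarily the one used to obtain the figure above. The target values are made up and are not taken from the auto insurance dataset.

```python
# sketch: mean absolute error of a naive model that predicts a constant
from statistics import median

def mean_absolute_error(expected, predicted):
    # average absolute difference between expected and predicted values
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)

# made-up target values for illustration
y_train = [10.0, 20.0, 30.0, 100.0]
y_test = [15.0, 25.0, 60.0]
# assumption: the naive model predicts the median of the training targets
baseline = median(y_train)
naive_predictions = [baseline] * len(y_test)
mae = mean_absolute_error(y_test, naive_predictions)
print('Naive MAE: %.3f' % mae)
```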

You can learn more about this dataset here:

We can load the dataset and split it into input and output elements and then train and test datasets.

The complete example is listed below.

    # load the auto insurance dataset
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    # load dataset
    url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
    dataframe = read_csv(url, header=None)
    print(dataframe.shape)
    # split into input and output elements
    data = dataframe.values
    data = data.astype('float32')
    X, y = data[:, :-1], data[:, -1]
    print(X.shape, y.shape)
    # separate into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Running the example loads the dataset, confirming the number of rows and columns, then splits the dataset into train and test sets.

    (63, 2)
    (63, 1) (63,)
    (42, 1) (21, 1) (42,) (21,)

AutoKeras can be applied to a regression task using the StructuredDataRegressor class and configured for the number of models to trial.

    ...
    # define the search
    search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')

The search can then be run and the best model saved, much like in the classification case.

    ...
    # define the search
    search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
    # perform the search
    search.fit(x=X_train, y=y_train, verbose=0)

We can then use the best-performing model and evaluate it on the hold out dataset, make a prediction on new data, and summarize its structure.

    ...
    # evaluate the model
    mae, _ = search.evaluate(X_test, y_test, verbose=0)
    print('MAE: %.3f' % mae)
    # use the model to make a prediction
    X_new = asarray([[108]]).astype('float32')
    yhat = search.predict(X_new)
    print('Predicted: %.3f' % yhat[0])
    # get the best performing model
    model = search.export_model()
    # summarize the loaded model
    model.summary()
    # save the best performing model to file
    model.save('model_insurance.h5')

Tying this together, the complete example of using AutoKeras to discover an effective neural network model for the auto insurance dataset is listed below.

    # use autokeras to find a model for the insurance dataset
    from numpy import asarray
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from autokeras import StructuredDataRegressor
    # load dataset
    url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
    dataframe = read_csv(url, header=None)
    print(dataframe.shape)
    # split into input and output elements
    data = dataframe.values
    data = data.astype('float32')
    X, y = data[:, :-1], data[:, -1]
    print(X.shape, y.shape)
    # separate into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    # define the search
    search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
    # perform the search
    search.fit(x=X_train, y=y_train, verbose=0)
    # evaluate the model
    mae, _ = search.evaluate(X_test, y_test, verbose=0)
    print('MAE: %.3f' % mae)
    # use the model to make a prediction
    X_new = asarray([[108]]).astype('float32')
    yhat = search.predict(X_new)
    print('Predicted: %.3f' % yhat[0])
    # get the best performing model
    model = search.export_model()
    # summarize the loaded model
    model.summary()
    # save the best performing model to file
    model.save('model_insurance.h5')

Running the example will report a lot of debug information about the progress of the search.

The models and results are all saved in a folder called “*structured_data_regressor*” in your current working directory.

    ...
    [Trial summary]
     |-Trial ID: ea28b767d13e958c3ace7e54e7cb5a14
     |-Score: 108.62509155273438
     |-Best step: 0
     > Hyperparameters:
     |-optimizer: adam
     |-regression_head_1/dropout_rate: 0
     |-structured_data_block_1/dense_block_1/dropout_rate: 0.0
     |-structured_data_block_1/dense_block_1/num_layers: 2
     |-structured_data_block_1/dense_block_1/units_0: 16
     |-structured_data_block_1/dense_block_1/units_1: 1024
     |-structured_data_block_1/dense_block_1/units_2: 128
     |-structured_data_block_1/dense_block_1/use_batchnorm: True
     |-structured_data_block_1/dense_block_2/dropout_rate: 0.5
     |-structured_data_block_1/dense_block_2/num_layers: 2
     |-structured_data_block_1/dense_block_2/units_0: 256
     |-structured_data_block_1/dense_block_2/units_1: 64
     |-structured_data_block_1/dense_block_2/units_2: 1024
     |-structured_data_block_1/dense_block_2/use_batchnorm: True

The best-performing model is then evaluated on the hold-out test dataset.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the model achieved a MAE of about 25.

MAE: 24.916

Next, the architecture of the best-performing model is reported.

We can see a model with three hidden layers (64, 512, and 128 nodes), each followed by a ReLU activation.

    Model: "model"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         [(None, 1)]               0
    _________________________________________________________________
    categorical_encoding (Catego (None, 1)                 0
    _________________________________________________________________
    dense (Dense)                (None, 64)                128
    _________________________________________________________________
    re_lu (ReLU)                 (None, 64)                0
    _________________________________________________________________
    dense_1 (Dense)              (None, 512)               33280
    _________________________________________________________________
    re_lu_1 (ReLU)               (None, 512)               0
    _________________________________________________________________
    dense_2 (Dense)              (None, 128)               65664
    _________________________________________________________________
    re_lu_2 (ReLU)               (None, 128)               0
    _________________________________________________________________
    regression_head_1 (Dense)    (None, 1)                 129
    =================================================================
    Total params: 99,201
    Trainable params: 99,201
    Non-trainable params: 0
    _________________________________________________________________
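The parameter counts in the summary can be checked by hand. A fully connected layer with a given number of inputs and nodes has one weight per input-node pair plus one bias per node. The sketch below recomputes the counts from the layer sizes shown in the summary; the helper name `dense_params` is hypothetical and used only for illustration.

```python
# number of trainable parameters in a fully connected (dense) layer:
# one weight per input-output pair plus one bias per output node
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# layer sizes read from the summary above: 1 input -> 64 -> 512 -> 128 -> 1
sizes = [1, 64, 512, 128, 1]
counts = [dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
print(counts)       # [128, 33280, 65664, 129]
print(sum(counts))  # 99201
```

The total agrees with the 99,201 parameters reported in the summary; the ReLU layers contribute no parameters of their own.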

This section provides more resources on the topic if you are looking to go deeper.

- Automated machine learning, Wikipedia.
- Neural architecture search, Wikipedia.
- AutoKeras Homepage.
- AutoKeras GitHub Project.
- Auto-keras: An efficient neural architecture search system, 2019.
- Results for Standard Classification and Regression Machine Learning Datasets

In this tutorial, you discovered how to use AutoKeras to find good neural network models for classification and regression tasks.

Specifically, you learned:

- AutoKeras is an implementation of AutoML for deep learning that uses neural architecture search.
- How to use AutoKeras to find a top-performing model for a binary classification dataset.
- How to use AutoKeras to find a top-performing model for a regression dataset.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

The post How to Use AutoKeras for Classification and Regression appeared first on Machine Learning Mastery.

The post Multi-Label Classification with Deep Learning appeared first on Machine Learning Mastery.

Unlike normal classification tasks where class labels are mutually exclusive, multi-label classification requires specialized machine learning algorithms that support predicting multiple mutually non-exclusive classes or “labels.”

Deep learning neural networks are an example of an algorithm that natively supports multi-label classification problems. Neural network models for multi-label classification tasks can be easily defined and evaluated using the Keras deep learning library.

In this tutorial, you will discover how to develop deep learning models for multi-label classification.

After completing this tutorial, you will know:

- Multi-label classification is a predictive modeling task that involves predicting zero or more mutually non-exclusive class labels.
- Neural network models can be configured for multi-label classification tasks.
- How to evaluate a neural network for multi-label classification and make a prediction for new data.

Let’s get started.

This tutorial is divided into three parts; they are:

- Multi-Label Classification
- Neural Networks for Multiple Labels
- Neural Network for Multi-Label Classification

Classification is a predictive modeling problem that involves outputting a class label given some input.

It is different from regression tasks that involve predicting a numeric value.

Typically, a classification task involves predicting a single label. Alternately, it might involve predicting the likelihood across two or more class labels. In these cases, the classes are mutually exclusive, meaning the classification task assumes that the input belongs to one class only.

Some classification tasks require predicting more than one class label. This means that class labels or class membership are not mutually exclusive. These tasks are referred to as **multiple label classification**, or multi-label classification for short.

In multi-label classification, zero or more labels are required as output for each input sample, and the outputs are required simultaneously. The assumption is that the output labels are a function of the inputs.
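One common way to represent such targets is a binary indicator vector per sample, with one position for each possible label. The following is a minimal sketch of that encoding; the label names and the helper `to_indicator` are made up for illustration.

```python
# hypothetical label vocabulary and per-sample label sets (illustrative only)
labels = ['sports', 'politics', 'tech']
samples = [{'sports', 'tech'}, set(), {'politics'}]

# encode each label set as a binary indicator vector, one position per label
def to_indicator(label_set, vocabulary):
    return [1 if name in label_set else 0 for name in vocabulary]

y = [to_indicator(s, labels) for s in samples]
for row in y:
    print(row)
```

Note that the second sample has no labels at all, which is a valid target in a multi-label problem.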

We can create a synthetic multi-label classification dataset using the make_multilabel_classification() function in the scikit-learn library.

Our dataset will have 1,000 samples with 10 input features. The dataset will have three class label outputs for each sample, and each class will have one of two values (0 or 1, e.g. not present or present).

The complete example of creating and summarizing the synthetic multi-label classification dataset is listed below.

    # example of a multi-label classification task
    from sklearn.datasets import make_multilabel_classification
    # define dataset
    X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3, n_labels=2, random_state=1)
    # summarize dataset shape
    print(X.shape, y.shape)
    # summarize first few examples
    for i in range(10):
        print(X[i], y[i])

Running the example creates the dataset and summarizes the shape of the input and output elements.

We can see that, as expected, there are 1,000 samples, each with 10 input features and three output features.

The first 10 rows of inputs and outputs are summarized and we can see that all inputs for this dataset are numeric and that output class labels have 0 or 1 values for each of the three class labels.

    (1000, 10) (1000, 3)
    [ 3. 3. 6. 7. 8. 2. 11. 11. 1. 3.] [1 1 0]
    [7. 6. 4. 4. 6. 8. 3. 4. 6. 4.] [0 0 0]
    [ 5. 5. 13. 7. 6. 3. 6. 11. 4. 2.] [1 1 0]
    [1. 1. 5. 5. 7. 3. 4. 6. 4. 4.] [1 1 1]
    [ 4. 2. 3. 13. 7. 2. 4. 12. 1. 7.] [0 1 0]
    [ 4. 3. 3. 2. 5. 2. 3. 7. 2. 10.] [0 0 0]
    [ 3. 3. 3. 11. 6. 3. 4. 14. 1. 3.] [0 1 0]
    [ 2. 1. 7. 8. 4. 5. 10. 4. 6. 6.] [1 1 1]
    [ 5. 1. 9. 5. 3. 4. 11. 8. 1. 8.] [1 1 1]
    [ 2. 11. 7. 6. 2. 2. 9. 11. 9. 3.] [1 1 1]

Next, let’s look at how we can develop neural network models for multi-label classification tasks.

Some machine learning algorithms support multi-label classification natively.

Neural network models can be configured to support multi-label classification and can perform well, depending on the specifics of the classification task.

Multi-label classification can be supported directly by neural networks simply by setting the number of nodes in the output layer to the number of target labels in the problem. For example, a task that has three output labels (classes) will require a neural network with three nodes in the output layer.

Each node in the output layer must use the sigmoid activation. This will predict a probability of class membership for the label, a value between 0 and 1. Finally, the model must be fit with the binary cross-entropy loss function.

In summary, to configure a neural network model for multi-label classification, the specifics are:

- Number of nodes in the output layer matches the number of labels.
- Sigmoid activation for each node in the output layer.
- Binary cross-entropy loss function.
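To make the last two points concrete, here is a small sketch of the sigmoid activation and the binary cross-entropy loss written in plain Python. The raw output values are hypothetical, and the clipping constant `eps` is an assumption added to avoid taking the log of zero; it is not meant to reproduce any library's exact implementation.

```python
from math import exp, log

# logistic sigmoid: squashes a raw output (logit) into the range (0, 1)
def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# binary cross-entropy averaged over the labels of one sample
def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * log(p) + (1 - t) * log(1.0 - p))
    return total / len(y_true)

# hypothetical raw outputs of a three-node output layer for one sample
logits = [2.0, -1.0, 0.5]
probs = [sigmoid(z) for z in logits]
print(probs)
print(binary_cross_entropy([1, 0, 1], probs))
```

Because each node is squashed independently, the three probabilities do not need to sum to one, which is what allows zero, one, or several labels to be predicted for the same sample.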

We can demonstrate this using the Keras deep learning library.

We will define a Multilayer Perceptron (MLP) model for the multi-label classification task defined in the previous section.

Each sample has 10 inputs and three outputs; therefore, the network requires an input layer that expects 10 inputs specified via the “*input_dim*” argument in the first hidden layer and three nodes in the output layer.

We will use the popular ReLU activation function in the hidden layer. The hidden layer has 20 nodes that were chosen after some trial and error. We will fit the model using binary cross-entropy loss and the Adam version of stochastic gradient descent.

The definition of the network for the multi-label classification task is listed below.

    # define the model
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')

You may want to adapt this model for your own multi-label classification task; therefore, we can create a function to define and return the model where the number of input and output variables is provided as arguments.

    # get the model
    def get_model(n_inputs, n_outputs):
        model = Sequential()
        model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        model.add(Dense(n_outputs, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam')
        return model

Now that we are familiar with how to define an MLP for multi-label classification, let’s explore how this model can be evaluated.

If the dataset is small, it is good practice to evaluate neural network models repeatedly on the same dataset and report the mean performance across the repeats.

This is because of the stochastic nature of the learning algorithm.

Additionally, it is good practice to use k-fold cross-validation instead of train/test splits of a dataset to get an unbiased estimate of model performance when making predictions on new data. Again, this applies only if there is not too much data and the process can be completed in a reasonable time.

Taking this into account, we will evaluate the MLP model on the multi-label classification task using repeated k-fold cross-validation with 10 folds and three repeats.
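For intuition, the splitting procedure can be sketched in plain Python. This is an illustration of the idea only, not scikit-learn's `RepeatedKFold` implementation; the function name `repeated_kfold` and its `seed` parameter are hypothetical.

```python
from random import Random

# yield (train_indices, test_indices) pairs for repeated k-fold cross-validation
def repeated_kfold(n_samples, n_splits, n_repeats, seed=1):
    rng = Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        # fold sizes differ by at most one sample
        fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                      for i in range(n_splits)]
        start = 0
        for size in fold_sizes:
            test_ix = indices[start:start + size]
            train_ix = indices[:start] + indices[start + size:]
            yield train_ix, test_ix
            start += size

splits = list(repeated_kfold(n_samples=1000, n_splits=10, n_repeats=3))
print(len(splits))                           # 30
print(len(splits[0][0]), len(splits[0][1]))  # 900 100
```

With 10 folds and three repeats, the model is fit and evaluated 30 times, each time on 900 training samples and 100 held-out samples.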

The MLP model will predict the probability for each class label by default. This means it will predict three probabilities for each sample. These can be converted to crisp class labels by rounding the values to either 0 or 1. We can then calculate the classification accuracy for the crisp class labels.

    ...
    # make a prediction on the test set
    yhat = model.predict(X_test)
    # round probabilities to class labels
    yhat = yhat.round()
    # calculate accuracy
    acc = accuracy_score(y_test, yhat)
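It is worth knowing what accuracy means here. For multi-label targets, scikit-learn's *accuracy_score()* is documented as computing subset accuracy: a sample counts as correct only if every one of its labels is predicted correctly. The sketch below contrasts that with a more lenient per-label accuracy; both helper functions and the example data are hypothetical and written in plain Python for illustration.

```python
# fraction of samples whose predicted label vector matches exactly
def subset_accuracy(y_true, y_pred):
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)

# fraction of individual label decisions that are correct
def per_label_accuracy(y_true, y_pred):
    correct = sum(ti == pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return correct / total

# hypothetical targets and rounded predictions for four samples
y_true = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
y_pred = [[1, 1, 0], [0, 1, 0], [1, 1, 1], [0, 1, 1]]
print(subset_accuracy(y_true, y_pred))     # 0.5 (2 of 4 samples match exactly)
print(per_label_accuracy(y_true, y_pred))  # 10 of 12 decisions, about 0.833
```

Subset accuracy is the stricter of the two, so a reported score can look low even when most individual labels are predicted correctly.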

The scores are collected and can be summarized by reporting the mean and standard deviation across all repeats and cross-validation folds.

The *evaluate_model()* function below takes the dataset, evaluates the model, and returns a list of evaluation scores, in this case, accuracy scores.

    # evaluate a model using repeated k-fold cross-validation
    def evaluate_model(X, y):
        results = list()
        n_inputs, n_outputs = X.shape[1], y.shape[1]
        # define evaluation procedure
        cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
        # enumerate folds
        for train_ix, test_ix in cv.split(X):
            # prepare data
            X_train, X_test = X[train_ix], X[test_ix]
            y_train, y_test = y[train_ix], y[test_ix]
            # define model
            model = get_model(n_inputs, n_outputs)
            # fit model
            model.fit(X_train, y_train, verbose=0, epochs=100)
            # make a prediction on the test set
            yhat = model.predict(X_test)
            # round probabilities to class labels
            yhat = yhat.round()
            # calculate accuracy
            acc = accuracy_score(y_test, yhat)
            # store result
            print('>%.3f' % acc)
            results.append(acc)
        return results

We can then load our dataset and evaluate the model and report the mean performance.

Tying this together, the complete example is listed below.

    # mlp for multi-label classification
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_multilabel_classification
    from sklearn.model_selection import RepeatedKFold
    from keras.models import Sequential
    from keras.layers import Dense
    from sklearn.metrics import accuracy_score

    # get the dataset
    def get_dataset():
        X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3, n_labels=2, random_state=1)
        return X, y

    # get the model
    def get_model(n_inputs, n_outputs):
        model = Sequential()
        model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        model.add(Dense(n_outputs, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam')
        return model

    # evaluate a model using repeated k-fold cross-validation
    def evaluate_model(X, y):
        results = list()
        n_inputs, n_outputs = X.shape[1], y.shape[1]
        # define evaluation procedure
        cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
        # enumerate folds
        for train_ix, test_ix in cv.split(X):
            # prepare data
            X_train, X_test = X[train_ix], X[test_ix]
            y_train, y_test = y[train_ix], y[test_ix]
            # define model
            model = get_model(n_inputs, n_outputs)
            # fit model
            model.fit(X_train, y_train, verbose=0, epochs=100)
            # make a prediction on the test set
            yhat = model.predict(X_test)
            # round probabilities to class labels
            yhat = yhat.round()
            # calculate accuracy
            acc = accuracy_score(y_test, yhat)
            # store result
            print('>%.3f' % acc)
            results.append(acc)
        return results

    # load dataset
    X, y = get_dataset()
    # evaluate model
    results = evaluate_model(X, y)
    # summarize performance
    print('Accuracy: %.3f (%.3f)' % (mean(results), std(results)))

Running the example reports the classification accuracy for each fold and each repeat, to give an idea of the evaluation progress.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

At the end, the mean and standard deviation of the accuracy are reported. In this case, the model is shown to achieve an accuracy of about 81.2 percent.

You can use this code as a template for evaluating MLP models on your own multi-label classification tasks. The number of nodes and layers in the model can easily be adapted and tailored to the complexity of your dataset.

    ...
    >0.780
    >0.820
    >0.790
    >0.810
    >0.840
    Accuracy: 0.812 (0.032)

Once a model configuration is chosen, we can use it to fit a final model on all available data and make a prediction for new data.

The example below demonstrates this by first fitting the MLP model on the entire multi-label classification dataset, then calling the *predict()* function on the fitted model in order to make a prediction for a new row of data.

    # use mlp for prediction on multi-label classification
    from numpy import asarray
    from sklearn.datasets import make_multilabel_classification
    from keras.models import Sequential
    from keras.layers import Dense

    # get the dataset
    def get_dataset():
        X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3, n_labels=2, random_state=1)
        return X, y

    # get the model
    def get_model(n_inputs, n_outputs):
        model = Sequential()
        model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        model.add(Dense(n_outputs, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam')
        return model

    # load dataset
    X, y = get_dataset()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # get model
    model = get_model(n_inputs, n_outputs)
    # fit the model on all data
    model.fit(X, y, verbose=0, epochs=100)
    # make a prediction for new data
    row = [3, 3, 6, 7, 8, 2, 11, 11, 1, 3]
    newX = asarray([row])
    yhat = model.predict(newX)
    print('Predicted: %s' % yhat[0])

Running the example fits the model and makes a prediction for a new row. As expected, the prediction contains three output variables required for the multi-label classification task: the probabilities of each class label.

Predicted: [0.9998627 0.9849341 0.00208042]
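If crisp labels are needed rather than probabilities, the values can be thresholded. The sketch below applies a cutoff of 0.5 to the probabilities printed above; the helper name `to_labels` and the choice to treat a value of exactly 0.5 as positive are assumptions made for illustration.

```python
# predicted probabilities copied from the output above
probabilities = [0.9998627, 0.9849341, 0.00208042]

# assign a label when its probability reaches the threshold (0.5 is an assumed default)
def to_labels(probs, threshold=0.5):
    return [1 if p >= threshold else 0 for p in probs]

print(to_labels(probabilities))  # [1, 1, 0]
```

The result agrees with the labels of the first row of the dataset shown earlier, which has the same input values as the new row used here.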

This section provides more resources on the topic if you are looking to go deeper.

- Multi-label classification, Wikipedia.
- sklearn.datasets.make_multilabel_classification API.
- Keras homepage.
- sklearn.model_selection.RepeatedStratifiedKFold API.

In this tutorial, you discovered how to develop deep learning models for multi-label classification.

Specifically, you learned:

- Multi-label classification is a predictive modeling task that involves predicting zero or more mutually non-exclusive class labels.
- Neural network models can be configured for multi-label classification tasks.
- How to evaluate a neural network for multi-label classification and make a prediction for new data.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.


The post Deep Learning Models for Multi-Output Regression appeared first on Machine Learning Mastery.

Unlike normal regression where a single value is predicted for each sample, multi-output regression requires specialized machine learning algorithms that support outputting multiple variables for each prediction.

Deep learning neural networks are an example of an algorithm that natively supports multi-output regression problems. Neural network models for multi-output regression tasks can be easily defined and evaluated using the Keras deep learning library.

In this tutorial, you will discover how to develop deep learning models for multi-output regression.

After completing this tutorial, you will know:

- Multi-output regression is a predictive modeling task that involves two or more numerical output variables.
- Neural network models can be configured for multi-output regression tasks.
- How to evaluate a neural network for multi-output regression and make a prediction for new data.

Let’s get started.

This tutorial is divided into three parts; they are:

- Multi-Output Regression
- Neural Networks for Multi-Outputs
- Neural Network for Multi-Output Regression

Regression is a predictive modeling task that involves predicting a numerical output given some input.

It is different from classification tasks that involve predicting a class label.

Typically, a regression task involves predicting a single numeric value. However, some tasks require predicting more than one numeric value. These tasks are referred to as **multiple-output regression**, or multi-output regression for short.

In multi-output regression, two or more outputs are required for each input sample, and the outputs are required simultaneously. The assumption is that the outputs are a function of the inputs.

We can create a synthetic multi-output regression dataset using the make_regression() function in the scikit-learn library.

Our dataset will have 1,000 samples with 10 input features, five of which will be relevant to the output and five of which will be uninformative. The dataset will have three numeric outputs for each sample.

The complete example of creating and summarizing the synthetic multi-output regression dataset is listed below.

    # example of a multi-output regression problem
    from sklearn.datasets import make_regression
    # create dataset
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
    # summarize shape
    print(X.shape, y.shape)

Running the example creates the dataset and summarizes the shape of the input and output elements.

We can see that, as expected, there are 1,000 samples, each with 10 input features and three output features.

(1000, 10) (1000, 3)

Next, let’s look at how we can develop neural network models for multiple-output regression tasks.

Many machine learning algorithms support multi-output regression natively.

Popular examples are decision trees and ensembles of decision trees. A limitation of decision trees for multi-output regression is that the relationships between inputs and outputs can be blocky or highly structured based on the training data.

Neural network models also support multi-output regression and have the benefit of learning a continuous function that can model a more graceful relationship between changes in input and output.

Multi-output regression can be supported directly by neural networks simply by setting the number of nodes in the output layer to the number of target variables in the problem. For example, a task that has three output variables will require a neural network with three nodes in the output layer, each with the linear (default) activation function.
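A linear output layer is simple enough to write out by hand: each output node is a weighted sum of the hidden layer activations plus a bias, with no squashing applied, so it can produce any real value. The sketch below uses made-up activations, weights, and biases; the function name `linear_layer` is hypothetical.

```python
# linear output layer: each output node is a weighted sum of the hidden
# activations plus a bias, with no activation function applied
def linear_layer(hidden, weights, biases):
    return [sum(h * w for h, w in zip(hidden, node_weights)) + b
            for node_weights, b in zip(weights, biases)]

# hypothetical hidden activations (two nodes) and weights for three output nodes
hidden = [0.5, 2.0]
weights = [[1.0, -1.0], [0.0, 3.0], [-2.0, 0.5]]
biases = [0.0, 1.0, -1.0]
print(linear_layer(hidden, weights, biases))  # [-1.5, 7.0, -1.0]
```

All three outputs are computed from the same hidden activations, which is how a single network predicts several target variables at once.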

We can demonstrate this using the Keras deep learning library.

We will define a multilayer perceptron (MLP) model for the multi-output regression task defined in the previous section.

Each sample has 10 inputs and three outputs; therefore, the network requires an input layer that expects 10 inputs specified via the “*input_dim*” argument in the first hidden layer and three nodes in the output layer.

We will use the popular ReLU activation function in the hidden layer. The hidden layer has 20 nodes, which were chosen after some trial and error. We will fit the model using mean absolute error (MAE) loss and the Adam version of stochastic gradient descent.
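With several outputs per sample, the MAE is a single number that summarizes the error across all of them. A minimal sketch follows, assuming the absolute errors are averaged equally over every output of every sample; the example values are hypothetical.

```python
# mean absolute error averaged over every output of every sample
def mean_absolute_error(y_true, y_pred):
    errors = [abs(t - p) for row_t, row_p in zip(y_true, y_pred)
              for t, p in zip(row_t, row_p)]
    return sum(errors) / len(errors)

# hypothetical targets and predictions: two samples, three outputs each
y_true = [[10.0, -5.0, 2.0], [0.0, 4.0, -8.0]]
y_pred = [[12.0, -5.0, 1.0], [-3.0, 4.0, -2.0]]
print(mean_absolute_error(y_true, y_pred))  # 2.0
```

Because the errors are pooled, an output variable with a larger scale will dominate the score, which is worth keeping in mind if your targets have very different ranges.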

The definition of the network for the multi-output regression task is listed below.

    ...
    # define the model
    model = Sequential()
    model.add(Dense(20, input_dim=10, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(3))
    model.compile(loss='mae', optimizer='adam')

You may want to adapt this model for your own multi-output regression task; therefore, we can create a function to define and return the model where the number of input and output variables is provided as arguments.

    # get the model
    def get_model(n_inputs, n_outputs):
        model = Sequential()
        model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        model.add(Dense(n_outputs))
        model.compile(loss='mae', optimizer='adam')
        return model

Now that we are familiar with how to define an MLP for multi-output regression, let’s explore how this model can be evaluated.

If the dataset is small, it is good practice to evaluate neural network models repeatedly on the same dataset and report the mean performance across the repeats.

This is because of the stochastic nature of the learning algorithm.

Additionally, it is good practice to use k-fold cross-validation instead of train/test splits of a dataset to get an unbiased estimate of model performance when making predictions on new data. Again, only if there is not too much data and the process can be completed in a reasonable time.

Taking this into account, we will evaluate the MLP model on the multi-output regression task using repeated k-fold cross-validation with 10 folds and three repeats.

For each fold, the model is defined, fit, and evaluated. The scores are collected and can be summarized by reporting the mean and standard deviation.
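Summarizing the collected scores needs nothing more than a mean and a standard deviation. The sketch below uses the standard library on made-up scores; `pstdev` computes the population standard deviation, which corresponds to the default behavior of NumPy's `std()` function.

```python
from statistics import mean, pstdev

# hypothetical per-fold MAE scores (illustrative values, not real results)
scores = [8.0, 7.5, 9.0, 8.5, 7.0]

# pstdev is the population standard deviation, which matches the default
# behaviour of numpy.std (ddof=0)
print('MAE: %.3f (%.3f)' % (mean(scores), pstdev(scores)))  # MAE: 8.000 (0.707)
```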

The *evaluate_model()* function below takes the dataset, evaluates the model, and returns a list of evaluation scores, in this case, MAE scores.

    # evaluate a model using repeated k-fold cross-validation
    def evaluate_model(X, y):
        results = list()
        n_inputs, n_outputs = X.shape[1], y.shape[1]
        # define evaluation procedure
        cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
        # enumerate folds
        for train_ix, test_ix in cv.split(X):
            # prepare data
            X_train, X_test = X[train_ix], X[test_ix]
            y_train, y_test = y[train_ix], y[test_ix]
            # define model
            model = get_model(n_inputs, n_outputs)
            # fit model
            model.fit(X_train, y_train, verbose=0, epochs=100)
            # evaluate model on test set
            mae = model.evaluate(X_test, y_test, verbose=0)
            # store result
            print('>%.3f' % mae)
            results.append(mae)
        return results

We can then load our dataset and evaluate the model and report the mean performance.

Tying this together, the complete example is listed below.

    # mlp for multi-output regression
    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_regression
    from sklearn.model_selection import RepeatedKFold
    from keras.models import Sequential
    from keras.layers import Dense

    # get the dataset
    def get_dataset():
        X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
        return X, y

    # get the model
    def get_model(n_inputs, n_outputs):
        model = Sequential()
        model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        model.add(Dense(n_outputs))
        model.compile(loss='mae', optimizer='adam')
        return model

    # evaluate a model using repeated k-fold cross-validation
    def evaluate_model(X, y):
        results = list()
        n_inputs, n_outputs = X.shape[1], y.shape[1]
        # define evaluation procedure
        cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
        # enumerate folds
        for train_ix, test_ix in cv.split(X):
            # prepare data
            X_train, X_test = X[train_ix], X[test_ix]
            y_train, y_test = y[train_ix], y[test_ix]
            # define model
            model = get_model(n_inputs, n_outputs)
            # fit model
            model.fit(X_train, y_train, verbose=0, epochs=100)
            # evaluate model on test set
            mae = model.evaluate(X_test, y_test, verbose=0)
            # store result
            print('>%.3f' % mae)
            results.append(mae)
        return results

    # load dataset
    X, y = get_dataset()
    # evaluate model
    results = evaluate_model(X, y)
    # summarize performance
    print('MAE: %.3f (%.3f)' % (mean(results), std(results)))

Running the example reports the MAE for each fold and each repeat, to give an idea of the evaluation progress.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

At the end, the mean and standard deviation of the MAE are reported. In this case, the model is shown to achieve an MAE of about 8.184.

You can use this code as a template for evaluating MLP models on your own multi-output regression tasks. The number of nodes and layers in the model can easily be adapted and tailored to the complexity of your dataset.

    ...
    >8.054
    >7.562
    >9.026
    >8.541
    >6.744
    MAE: 8.184 (1.032)

Once a model configuration is chosen, we can use it to fit a final model on all available data and make a prediction for new data.

The example below demonstrates this by first fitting the MLP model on the entire multi-output regression dataset, then calling the *predict()* function on the fitted model in order to make a prediction for a new row of data.

    # use mlp for prediction on multi-output regression
    from numpy import asarray
    from sklearn.datasets import make_regression
    from keras.models import Sequential
    from keras.layers import Dense

    # get the dataset
    def get_dataset():
        X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
        return X, y

    # get the model
    def get_model(n_inputs, n_outputs):
        model = Sequential()
        model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        model.add(Dense(n_outputs, kernel_initializer='he_uniform'))
        model.compile(loss='mae', optimizer='adam')
        return model

    # load dataset
    X, y = get_dataset()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # get model
    model = get_model(n_inputs, n_outputs)
    # fit the model on all data
    model.fit(X, y, verbose=0, epochs=100)
    # make a prediction for new data
    row = [-0.99859353,2.19284309,-0.42632569,-0.21043258,-1.13655612,-0.55671602,-0.63169045,-0.87625098,-0.99445578,-0.3677487]
    newX = asarray([row])
    yhat = model.predict(newX)
    print('Predicted: %s' % yhat[0])

Running the example fits the model and makes a prediction for a new row.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

As expected, the prediction contains three output variables required for the multi-output regression task.

Predicted: [-152.22713 -78.04891 -91.97194]


In this tutorial, you discovered how to develop deep learning models for multi-output regression.

Specifically, you learned:

- Multi-output regression is a predictive modeling task that involves two or more numerical output variables.
- Neural network models can be configured for multi-output regression tasks.
- How to evaluate a neural network for multi-output regression and make a prediction for new data.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.


The post PyTorch Tutorial: How to Develop Deep Learning Models with Python appeared first on Machine Learning Mastery.

PyTorch is the premier open-source deep learning framework developed and maintained by Facebook.

At its core, PyTorch is a mathematical library that allows you to perform efficient computation and automatic differentiation on graph-based models. Achieving this directly is challenging, although thankfully, the modern PyTorch API provides classes and idioms that allow you to easily develop a suite of deep learning models.

In this tutorial, you will discover a step-by-step guide to developing deep learning models in PyTorch.

After completing this tutorial, you will know:

- The difference between Torch and PyTorch and how to install and confirm PyTorch is working.
- The five-step life-cycle of PyTorch models and how to define, fit, and evaluate models.
- How to develop PyTorch deep learning models for regression, classification, and predictive modeling tasks.

Let’s get started.

The focus of this tutorial is on using the PyTorch API for common deep learning model development tasks; we will not be diving into the math and theory of deep learning. For that, I recommend starting with this excellent book.

The best way to learn deep learning in Python is by doing. Dive in. You can circle back for more theory later.

I have designed each code example to use best practices and to be standalone so that you can copy and paste it directly into your project and adapt it to your specific needs. This will give you a massive head start over trying to figure out the API from official documentation alone.

It is a large tutorial, and as such, it is divided into three parts; they are:

- How to Install PyTorch
    - What Are Torch and PyTorch?
    - How to Install PyTorch
    - How to Confirm PyTorch Is Installed
- PyTorch Deep Learning Model Life-Cycle
    - Step 1: Prepare the Data
    - Step 2: Define the Model
    - Step 3: Train the Model
    - Step 4: Evaluate the Model
    - Step 5: Make Predictions
- How to Develop PyTorch Deep Learning Models
    - How to Develop an MLP for Binary Classification
    - How to Develop an MLP for Multiclass Classification
    - How to Develop an MLP for Regression
    - How to Develop a CNN for Image Classification

Work through this tutorial. It will take you 60 minutes, max!

**You do not need to understand everything (at least not right now)**. Your goal is to run through the tutorial end-to-end and get a result. Write down your questions as you go, and make heavy use of the API documentation to learn about the functions that you’re using.

**You do not need to know the math first**. Math is a compact way of describing how algorithms work, specifically tools from linear algebra, probability, and calculus. These are not the only tools that you can use to learn how algorithms work. You can also use code and explore algorithm behavior with different inputs and outputs. Knowing the math will not tell you what algorithm to choose or how to best configure it. You can only discover that through carefully controlled experiments.

**You do not need to know how the algorithms work**. It is important to know about the limitations and how to configure deep learning algorithms. But learning about algorithms can come later. You need to build up this algorithm knowledge slowly over a long period of time. Today, start by getting comfortable with the platform.

**You do not need to be a Python programmer**. The syntax of the Python language can be intuitive if you are new to it. Just like other languages, focus on function calls (e.g. function()) and assignments (e.g. a = "b"). This will get you most of the way. You are a developer; you know how to pick up the basics of a language really fast. Just get started and dive into the details later.

**You do not need to be a deep learning expert**. You can learn about the benefits and limitations of various algorithms later, and there are plenty of tutorials that you can read to brush up on the steps of a deep learning project.

In this section, you will discover what PyTorch is, how to install it, and how to confirm that it is installed correctly.

PyTorch is an open-source Python library for deep learning developed and maintained by Facebook.

The project started in 2016 and quickly became a popular framework among developers and researchers.

Torch (*Torch7*) is an open-source project for deep learning written in C and generally used via the Lua interface. It was a precursor project to PyTorch and is no longer actively developed. PyTorch includes “*Torch*” in the name, acknowledging the prior torch library with the “*Py*” prefix indicating the Python focus of the new project.

The PyTorch API is simple and flexible, making it a favorite for academics and researchers in the development of new deep learning models and applications. The extensive use has led to many extensions for specific applications (such as text, computer vision, and audio data), and many pre-trained models that can be used directly. As such, it may be the most popular library used by academics.

The flexibility of PyTorch comes at the cost of ease of use, especially for beginners, as compared to simpler interfaces like Keras. Choosing PyTorch over Keras means accepting a slightly steeper learning curve and more code in exchange for more flexibility, and perhaps a more vibrant academic community.

Before installing PyTorch, ensure that you have Python installed, such as Python 3.6 or higher.

If you don’t have Python installed, you can install it using Anaconda. This tutorial will show you how:

There are many ways to install the PyTorch open-source deep learning library.

The most common, and perhaps simplest, way to install PyTorch on your workstation is by using pip.

For example, on the command line, you can type:

sudo pip install torch

Perhaps the most popular application of deep learning is for computer vision, and the PyTorch computer vision package is called “torchvision.”

Installing torchvision is also highly recommended and it can be installed as follows:

sudo pip install torchvision

If you prefer to use an installation method more specific to your platform or package manager, you can see a complete list of installation instructions here:

There is no need to set up the GPU now.

All examples in this tutorial will work just fine on a modern CPU. If you want to configure PyTorch for your GPU, you can do that after completing this tutorial. Don’t get distracted!

Once PyTorch is installed, it is important to confirm that the library was installed successfully and that you can start using it.

Don’t skip this step.

If PyTorch is not installed correctly or raises an error on this step, you won’t be able to run the examples later.

Create a new file called *versions.py* and copy and paste the following code into the file.

    # check pytorch version
    import torch
    print(torch.__version__)

Save the file, then open your command line and change directory to where you saved the file.

Then type:

python versions.py

You should then see output like the following:

1.3.1

This confirms that PyTorch is installed correctly and shows which version you are using. Your version number may differ from the one shown here.

This also shows you how to run a Python script from the command line. I recommend running all code from the command line in this manner, and not from a notebook or an IDE.

In this section, you will discover the life-cycle for a deep learning model and the PyTorch API that you can use to define models.

A model has a life-cycle, and this very simple knowledge provides the backbone for both modeling a dataset and understanding the PyTorch API.

The five steps in the life-cycle are as follows:

1. Prepare the Data.
2. Define the Model.
3. Train the Model.
4. Evaluate the Model.
5. Make Predictions.

Let’s take a closer look at each step in turn.

**Note**: There are many ways to achieve each of these steps using the PyTorch API, although I have aimed to show you the simplest, most common, or most idiomatic approach.

If you discover a better approach, let me know in the comments below.

The first step is to load and prepare your data.

Neural network models require numerical input data and numerical output data.

You can use standard Python libraries to load and prepare tabular data, like CSV files. For example, Pandas can be used to load your CSV file, and tools from scikit-learn can be used to encode categorical data, such as class labels.
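To make the idea of encoding class labels concrete, here is a minimal plain-Python sketch. It is not scikit-learn's implementation; it only mimics the behavior of assigning integer codes to the distinct labels in sorted order, which is also how *LabelEncoder* orders its classes. The function names and the example labels are made up for illustration.

```python
# Plain-Python sketch of integer label encoding (not scikit-learn's code).
def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def encode(labels, mapping):
    """Replace each label with its integer code."""
    return [mapping[label] for label in labels]


# hypothetical class labels, as they might appear in the last CSV column
labels = ['g', 'b', 'g', 'g', 'b']
mapping = fit_label_encoder(labels)
encoded = encode(labels, mapping)
print(mapping)  # {'b': 0, 'g': 1}
print(encoded)  # [1, 0, 1, 1, 0]
```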

PyTorch provides the Dataset class that you can extend and customize to load your dataset.

For example, the constructor of your dataset object can load your data file (e.g. a CSV file). You can then override the *__len__()* function that can be used to get the length of the dataset (number of rows or samples), and the *__getitem__()* function that is used to get a specific sample by index.

When loading your dataset, you can also perform any required transforms, such as scaling or encoding.

A skeleton of a custom *Dataset* class is provided below.

    # dataset definition
    class CSVDataset(Dataset):
        # load the dataset
        def __init__(self, path):
            # store the inputs and outputs
            self.X = ...
            self.y = ...
        # number of rows in the dataset
        def __len__(self):
            return len(self.X)
        # get a row at an index
        def __getitem__(self, idx):
            return [self.X[idx], self.y[idx]]
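If you want to see the two methods in action without installing anything, the sketch below follows the same pattern in plain Python. It deliberately does not extend PyTorch's *Dataset* class; the class name *TinyCSVDataset* and the CSV content are hypothetical and exist only to show what *__len__()* and *__getitem__()* return.

```python
import csv
import io

# hypothetical CSV content: two numeric inputs and a string class label per row
CSV_TEXT = "5.1,3.5,a\n4.9,3.0,a\n6.2,2.9,b\n"


class TinyCSVDataset:
    """Plain-Python stand-in for a Dataset subclass (no PyTorch needed)."""

    def __init__(self, text):
        rows = list(csv.reader(io.StringIO(text)))
        # inputs are all columns but the last, converted to floats
        self.X = [[float(value) for value in row[:-1]] for row in rows]
        # the target is the last column
        self.y = [row[-1] for row in rows]

    def __len__(self):
        # number of rows in the dataset
        return len(self.X)

    def __getitem__(self, idx):
        # one (inputs, target) pair
        return self.X[idx], self.y[idx]


dataset = TinyCSVDataset(CSV_TEXT)
print(len(dataset))  # 3
print(dataset[2])    # ([6.2, 2.9], 'b')
```

Because the object supports *len()* and indexing, any code that walks the rows by index can use it, which is the same contract a *DataLoader* relies on.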

Once loaded, PyTorch provides the DataLoader class to navigate a *Dataset* instance during the training and evaluation of your model.

A *DataLoader* instance can be created for the training dataset, test dataset, and even a validation dataset.

The random_split() function can be used to split a dataset into train and test sets. Once split, a selection of rows from the *Dataset* can be provided to a DataLoader, along with the batch size and whether the data should be shuffled every epoch.

For example, we can define a *DataLoader* by passing in a selected sample of rows in the dataset.

    ...
    # create the dataset
    dataset = CSVDataset(...)
    # split into train and test sets; the list holds the number of rows in each split
    train, test = random_split(dataset, [..., ...])
    # create a data loader for train and test sets
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)

Once defined, a *DataLoader* can be enumerated, yielding one batch worth of samples each iteration.

    ...
    # train the model
    for i, (inputs, targets) in enumerate(train_dl):
        ...
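To get a feel for what "one batch worth of samples each iteration" means, here is a rough plain-Python sketch of batching. It is not how *DataLoader* is implemented; the function *iterate_batches()* is a hypothetical helper that only yields row indexes, and the row count of 235 is borrowed from the training set size reported later in this tutorial.

```python
import math
import random


def iterate_batches(n_rows, batch_size, shuffle, seed=None):
    """Yield lists of row indexes, one list per mini-batch."""
    indexes = list(range(n_rows))
    if shuffle:
        # a fresh random order, as shuffle=True would give each epoch
        random.Random(seed).shuffle(indexes)
    for start in range(0, n_rows, batch_size):
        yield indexes[start:start + batch_size]


batches = list(iterate_batches(n_rows=235, batch_size=32, shuffle=True, seed=1))
print(len(batches))      # 8, the same as math.ceil(235 / 32)
print(len(batches[-1]))  # 11 rows are left over for the final batch
```

Note that the last batch is smaller than the rest whenever the batch size does not divide the number of rows evenly.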

The next step is to define a model.

The idiom for defining a model in PyTorch involves defining a class that extends the Module class.

The constructor of your class defines the layers of the model and the forward() function is the override that defines how to forward propagate input through the defined layers of the model.

Many layers are available, such as Linear for fully connected layers, Conv2d for convolutional layers, and MaxPool2d for pooling layers.

Activation functions can also be defined as layers, such as ReLU, Softmax, and Sigmoid.

Below is an example of a simple MLP model with one layer.

    # model definition
    class MLP(Module):
        # define model elements
        def __init__(self, n_inputs):
            super(MLP, self).__init__()
            self.layer = Linear(n_inputs, 1)
            self.activation = Sigmoid()
        # forward propagate input
        def forward(self, X):
            X = self.layer(X)
            X = self.activation(X)
            return X
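It can help to see the arithmetic that this one-layer model performs on a single row. The sketch below is plain Python, not PyTorch: a weighted sum plus a bias (the job of the *Linear* layer with one output node), followed by the sigmoid function. The weights, bias, and input row are made-up values chosen only for illustration.

```python
import math


def linear(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: one node of a Linear layer."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# hypothetical weights, bias, and input row, chosen for illustration only
weights = [0.5, -0.25, 0.1]
bias = 0.05
row = [1.0, 2.0, 3.0]

z = linear(row, weights, bias)  # 0.5 - 0.5 + 0.3 + 0.05 = 0.35
yhat = sigmoid(z)               # a value between 0 and 1
print(round(z, 4), round(yhat, 4))
```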

The weights of a given layer can also be initialized after the layer is defined in the constructor.

Common examples include the Xavier and He weight initialization schemes. For example:

    ...
    xavier_uniform_(self.layer.weight)
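Both schemes draw weights from a uniform range whose width depends on the layer size. The plain-Python sketch below computes those ranges using the standard formulas: Xavier uniform with a gain of 1, and He uniform for ReLU in fan-in mode. It is an illustration of the math rather than PyTorch's code, the helper names are hypothetical, and the layer sizes (34 inputs, 10 nodes) are just an example.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out):
    """Bound a of the Xavier (Glorot) uniform range [-a, a], with gain 1."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def he_uniform_bound(fan_in):
    """Bound a of the He (Kaiming) uniform range [-a, a] for ReLU, fan-in mode."""
    return math.sqrt(6.0 / fan_in)


def sample_weights(fan_in, fan_out, bound, seed=None):
    """Draw a fan_out x fan_in weight matrix uniformly from [-bound, bound]."""
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


# layer sizes here are illustrative: 34 inputs feeding 10 nodes
a_xavier = xavier_uniform_bound(34, 10)
a_he = he_uniform_bound(34)
weights = sample_weights(34, 10, a_he, seed=1)
print(round(a_xavier, 4), round(a_he, 4))
```

The He range is wider for the same layer because it only divides by the number of inputs, which compensates for ReLU zeroing out roughly half of the activations.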

The training process requires that you define a loss function and an optimization algorithm.

Common loss functions include the following:

- BCELoss: Binary cross-entropy loss for binary classification.
- CrossEntropyLoss: Categorical cross-entropy loss for multi-class classification.
- MSELoss: Mean squared error loss for regression.
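As a rough guide to what these three losses measure, here are plain-Python versions of the formulas, averaged over a batch. They are sketches, not PyTorch's implementations, and the predictions and targets are made up. One difference worth knowing: for clarity, the categorical version below takes probabilities, whereas PyTorch's *CrossEntropyLoss* takes unnormalized scores and applies a log-softmax internally.

```python
import math


def bce_loss(probs, targets):
    """Mean binary cross-entropy; probs are predicted P(class=1)."""
    total = 0.0
    for p, t in zip(probs, targets):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def cross_entropy_loss(prob_rows, targets):
    """Mean categorical cross-entropy; targets are integer class indexes."""
    total = 0.0
    for probs, t in zip(prob_rows, targets):
        total += -math.log(probs[t])
    return total / len(prob_rows)


def mse_loss(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# made-up predictions and targets, for illustration only
bce = bce_loss([0.9, 0.2], [1, 0])
ce = cross_entropy_loss([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
mse = mse_loss([2.5, 0.0], [3.0, -1.0])
print(round(bce, 4), round(ce, 4), round(mse, 4))
```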

For more on loss functions generally, see the tutorial:

Stochastic gradient descent is used for optimization, and the standard algorithm is provided by the SGD class, although other versions of the algorithm are available, such as Adam.

    # define the optimization
    criterion = MSELoss()
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
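The *lr* and *momentum* arguments control a simple update rule: keep a running "velocity" of past gradients and step against it. The sketch below applies that rule in plain Python to a toy one-weight problem. The function *sgd_momentum_step()* and the objective are hypothetical, chosen so the answer is easy to check; this is the basic momentum form without dampening, weight decay, or Nesterov momentum.

```python
def sgd_momentum_step(weight, velocity, grad, lr=0.01, momentum=0.9):
    """One update: accumulate a velocity, then step against it."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


# toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3
weight, velocity = 0.0, 0.0
for step in range(500):
    grad = 2.0 * (weight - 3.0)
    weight, velocity = sgd_momentum_step(weight, velocity, grad)
print(round(weight, 4))  # 3.0
```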

Training the model involves enumerating the *DataLoader* for the training dataset.

First, a loop is required for the number of training epochs. Then an inner loop is required for the mini-batches for stochastic gradient descent.

    ...
    # enumerate epochs
    for epoch in range(100):
        # enumerate mini batches
        for i, (inputs, targets) in enumerate(train_dl):
            ...

Each update to the model involves the same general pattern comprised of:

- Clearing the last error gradient.
- A forward pass of the input through the model.
- Calculating the loss for the model output.
- Backpropagating the error through the model.
- Updating the model in an effort to reduce loss.

For example:

    ...
    # clear the gradients
    optimizer.zero_grad()
    # compute the model output
    yhat = model(inputs)
    # calculate loss
    loss = criterion(yhat, targets)
    # credit assignment
    loss.backward()
    # update model weights
    optimizer.step()
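If the five calls feel like magic, it may help to see the same pattern written out by hand. The sketch below trains a one-weight linear model in plain Python, with the gradient derived manually instead of computed by PyTorch's automatic differentiation, and with plain gradient descent rather than momentum. The data, learning rate, and number of epochs are made up for illustration.

```python
# toy data from the line y = 2x + 1 (illustrative, noise-free)
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
batches = [data[:2], data[2:]]  # two mini-batches of two rows each


def mean_loss(w, b, rows):
    """Mean squared error of the model w * x + b on the given rows."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


w, b = 0.0, 0.0  # model parameters
lr = 0.05        # learning rate
first_loss = mean_loss(w, b, data)
for epoch in range(200):
    for batch in batches:
        # 1. clear the last error gradient
        grad_w, grad_b = 0.0, 0.0
        # 2. forward pass of the inputs through the model
        outputs = [w * x + b for x, _ in batch]
        # 3. calculate the loss (mean squared error) for the outputs
        loss = sum((yhat - y) ** 2 for yhat, (_, y) in zip(outputs, batch)) / len(batch)
        # 4. backpropagate: gradient of the loss with respect to each parameter
        for yhat, (x, y) in zip(outputs, batch):
            grad_w += 2.0 * (yhat - y) * x / len(batch)
            grad_b += 2.0 * (yhat - y) / len(batch)
        # 5. update the parameters to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
last_loss = mean_loss(w, b, data)
print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```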

Once the model is fit, it can be evaluated on the test dataset.

This can be achieved by using the *DataLoader* for the test dataset and collecting the predictions for the test set, then comparing the predictions to the expected values of the test set and calculating a performance metric.

    ...
    for i, (inputs, targets) in enumerate(test_dl):
        # evaluate the model on the test set
        yhat = model(inputs)
        ...

A fit model can be used to make a prediction on new data.

For example, you might have a single image or a single row of data and want to make a prediction.

This requires that you wrap the data in a PyTorch Tensor data structure.

A Tensor is just the PyTorch version of a NumPy array for holding data. It also allows you to perform the automatic differentiation tasks in the model graph, like calling *backward()* when training the model.

The prediction too will be a Tensor, although you can retrieve the NumPy array by detaching the Tensor from the automatic differentiation graph and calling the NumPy function.

    ...
    # convert row to a Tensor
    row = Tensor([row])
    # make prediction
    yhat = model(row)
    # retrieve numpy array
    yhat = yhat.detach().numpy()

Now that we are familiar with the PyTorch API at a high-level and the model life-cycle, let’s look at how we can develop some standard deep learning models from scratch.

In this section, you will discover how to develop, evaluate, and make predictions with standard deep learning models, including Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN).

A Multilayer Perceptron model, or MLP for short, is a standard fully connected neural network model.

It is comprised of layers of nodes where each node is connected to all outputs from the previous layer and the output of each node is connected to all inputs for nodes in the next layer.

An MLP is a model with one or more fully connected layers. This model is appropriate for tabular data, that is, data as it looks in a table or spreadsheet with one column for each variable and one row for each example. There are three predictive modeling problems you may want to explore with an MLP; they are binary classification, multiclass classification, and regression.

Let’s fit a model on a real dataset for each of these cases.

**Note**: The models in this section are effective, but not optimized. See if you can improve their performance. Post your findings in the comments below.

We will use the Ionosphere binary (two class) classification dataset to demonstrate an MLP for binary classification.

This dataset involves predicting whether there is a structure in the atmosphere or not given radar returns.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

We will use a LabelEncoder to encode the string labels to integer values 0 and 1. The model will be fit on 67 percent of the data, and the remaining 33 percent will be used for evaluation, split using the random_split() function.
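The number of rows that ends up in each split comes from rounding 33 percent of the row count, as in the *get_splits()* method of the complete example. The small sketch below reproduces only that arithmetic in plain Python; the standalone function name *split_sizes()* is made up for illustration.

```python
def split_sizes(n_rows, n_test=0.33):
    """Return (train_size, test_size) for a given fraction of test rows."""
    test_size = round(n_test * n_rows)
    train_size = n_rows - test_size
    return train_size, test_size


# the ionosphere dataset has 351 rows
print(split_sizes(351))  # (235, 116)
```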

It is a good practice to use ‘*relu*‘ activation with a ‘*He Uniform*‘ weight initialization. This combination goes a long way to overcome the problem of vanishing gradients when training deep neural network models. For more on ReLU, see the tutorial:

The model predicts the probability of class 1 and uses the sigmoid activation function. The model is optimized using stochastic gradient descent and seeks to minimize the binary cross-entropy loss.

The complete example is listed below.

    # pytorch mlp for binary classification
    from numpy import vstack
    from pandas import read_csv
    from sklearn.preprocessing import LabelEncoder
    from sklearn.metrics import accuracy_score
    from torch.utils.data import Dataset
    from torch.utils.data import DataLoader
    from torch.utils.data import random_split
    from torch import Tensor
    from torch.nn import Linear
    from torch.nn import ReLU
    from torch.nn import Sigmoid
    from torch.nn import Module
    from torch.optim import SGD
    from torch.nn import BCELoss
    from torch.nn.init import kaiming_uniform_
    from torch.nn.init import xavier_uniform_
    # dataset definition
    class CSVDataset(Dataset):
        # load the dataset
        def __init__(self, path):
            # load the csv file as a dataframe
            df = read_csv(path, header=None)
            # store the inputs and outputs
            self.X = df.values[:, :-1]
            self.y = df.values[:, -1]
            # ensure input data is floats
            self.X = self.X.astype('float32')
            # label encode target and ensure the values are floats
            self.y = LabelEncoder().fit_transform(self.y)
            self.y = self.y.astype('float32')
            self.y = self.y.reshape((len(self.y), 1))
        # number of rows in the dataset
        def __len__(self):
            return len(self.X)
        # get a row at an index
        def __getitem__(self, idx):
            return [self.X[idx], self.y[idx]]
        # get indexes for train and test rows
        def get_splits(self, n_test=0.33):
            # determine sizes
            test_size = round(n_test * len(self.X))
            train_size = len(self.X) - test_size
            # calculate the split
            return random_split(self, [train_size, test_size])
    # model definition
    class MLP(Module):
        # define model elements
        def __init__(self, n_inputs):
            super(MLP, self).__init__()
            # input to first hidden layer
            self.hidden1 = Linear(n_inputs, 10)
            kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
            self.act1 = ReLU()
            # second hidden layer
            self.hidden2 = Linear(10, 8)
            kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
            self.act2 = ReLU()
            # third hidden layer and output
            self.hidden3 = Linear(8, 1)
            xavier_uniform_(self.hidden3.weight)
            self.act3 = Sigmoid()
        # forward propagate input
        def forward(self, X):
            # input to first hidden layer
            X = self.hidden1(X)
            X = self.act1(X)
            # second hidden layer
            X = self.hidden2(X)
            X = self.act2(X)
            # third hidden layer and output
            X = self.hidden3(X)
            X = self.act3(X)
            return X
    # prepare the dataset
    def prepare_data(path):
        # load the dataset
        dataset = CSVDataset(path)
        # calculate split
        train, test = dataset.get_splits()
        # prepare data loaders
        train_dl = DataLoader(train, batch_size=32, shuffle=True)
        test_dl = DataLoader(test, batch_size=1024, shuffle=False)
        return train_dl, test_dl
    # train the model
    def train_model(train_dl, model):
        # define the optimization
        criterion = BCELoss()
        optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
        # enumerate epochs
        for epoch in range(100):
            # enumerate mini batches
            for i, (inputs, targets) in enumerate(train_dl):
                # clear the gradients
                optimizer.zero_grad()
                # compute the model output
                yhat = model(inputs)
                # calculate loss
                loss = criterion(yhat, targets)
                # credit assignment
                loss.backward()
                # update model weights
                optimizer.step()
    # evaluate the model
    def evaluate_model(test_dl, model):
        predictions, actuals = list(), list()
        for i, (inputs, targets) in enumerate(test_dl):
            # evaluate the model on the test set
            yhat = model(inputs)
            # retrieve numpy array
            yhat = yhat.detach().numpy()
            actual = targets.numpy()
            actual = actual.reshape((len(actual), 1))
            # round to class values
            yhat = yhat.round()
            # store
            predictions.append(yhat)
            actuals.append(actual)
        predictions, actuals = vstack(predictions), vstack(actuals)
        # calculate accuracy
        acc = accuracy_score(actuals, predictions)
        return acc
    # make a class prediction for one row of data
    def predict(row, model):
        # convert row to data
        row = Tensor([row])
        # make prediction
        yhat = model(row)
        # retrieve numpy array
        yhat = yhat.detach().numpy()
        return yhat
    # prepare the data
    path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
    train_dl, test_dl = prepare_data(path)
    print(len(train_dl.dataset), len(test_dl.dataset))
    # define the network
    model = MLP(34)
    # train the model
    train_model(train_dl, model)
    # evaluate the model
    acc = evaluate_model(test_dl, model)
    print('Accuracy: %.3f' % acc)
    # make a single prediction (expect class=1)
    row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]
    yhat = predict(row, model)
    print('Predicted: %.3f (class=%d)' % (yhat, yhat.round()))

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What result did you get?**

**Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 95 percent and then predicted a probability of about 0.99 that the one row of data belongs to class 1.

    235 116
    Accuracy: 0.948
    Predicted: 0.998 (class=1)

We will use the Iris flowers multiclass classification dataset to demonstrate an MLP for multiclass classification.

This problem involves predicting the species of iris flower given measures of the flower.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

Given that it is a multiclass classification, the model must have one node for each class in the output layer and use the softmax activation function. The loss function is the cross entropy, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.).
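To see how softmax, class labels, and cross-entropy fit together, here is a plain-Python sketch. It is an illustration of the math, not PyTorch code, and the raw scores are made up. It also shows a detail worth keeping in mind: PyTorch's *CrossEntropyLoss* applies a log-softmax internally, so it is designed to receive raw scores. Passing it values that have already been through softmax, as the example in this section does, still trains, but the loss is then computed on a flatter distribution.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_scores(scores, target):
    """Negative log of the softmax probability of the target class."""
    return -math.log(softmax(scores)[target])


# hypothetical raw outputs of a three-node output layer
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
label = probs.index(max(probs))  # argmax: index of the largest probability
loss = cross_entropy_from_scores(scores, target=0)
print([round(p, 3) for p in probs], label, round(loss, 3))

# feeding probabilities back in as if they were scores flattens the distribution
flattened = softmax(probs)
print([round(p, 3) for p in flattened])
```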

The complete example of fitting and evaluating an MLP on the iris flowers dataset is listed below.

    # pytorch mlp for multiclass classification
    from numpy import vstack
    from numpy import argmax
    from pandas import read_csv
    from sklearn.preprocessing import LabelEncoder
    from sklearn.metrics import accuracy_score
    from torch import Tensor
    from torch.utils.data import Dataset
    from torch.utils.data import DataLoader
    from torch.utils.data import random_split
    from torch.nn import Linear
    from torch.nn import ReLU
    from torch.nn import Softmax
    from torch.nn import Module
    from torch.optim import SGD
    from torch.nn import CrossEntropyLoss
    from torch.nn.init import kaiming_uniform_
    from torch.nn.init import xavier_uniform_
    # dataset definition
    class CSVDataset(Dataset):
        # load the dataset
        def __init__(self, path):
            # load the csv file as a dataframe
            df = read_csv(path, header=None)
            # store the inputs and outputs
            self.X = df.values[:, :-1]
            self.y = df.values[:, -1]
            # ensure input data is floats
            self.X = self.X.astype('float32')
            # label encode target as integer class values
            self.y = LabelEncoder().fit_transform(self.y)
        # number of rows in the dataset
        def __len__(self):
            return len(self.X)
        # get a row at an index
        def __getitem__(self, idx):
            return [self.X[idx], self.y[idx]]
        # get indexes for train and test rows
        def get_splits(self, n_test=0.33):
            # determine sizes
            test_size = round(n_test * len(self.X))
            train_size = len(self.X) - test_size
            # calculate the split
            return random_split(self, [train_size, test_size])
    # model definition
    class MLP(Module):
        # define model elements
        def __init__(self, n_inputs):
            super(MLP, self).__init__()
            # input to first hidden layer
            self.hidden1 = Linear(n_inputs, 10)
            kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
            self.act1 = ReLU()
            # second hidden layer
            self.hidden2 = Linear(10, 8)
            kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
            self.act2 = ReLU()
            # third hidden layer and output
            self.hidden3 = Linear(8, 3)
            xavier_uniform_(self.hidden3.weight)
            self.act3 = Softmax(dim=1)
        # forward propagate input
        def forward(self, X):
            # input to first hidden layer
            X = self.hidden1(X)
            X = self.act1(X)
            # second hidden layer
            X = self.hidden2(X)
            X = self.act2(X)
            # output layer
            X = self.hidden3(X)
            X = self.act3(X)
            return X
    # prepare the dataset
    def prepare_data(path):
        # load the dataset
        dataset = CSVDataset(path)
        # calculate split
        train, test = dataset.get_splits()
        # prepare data loaders
        train_dl = DataLoader(train, batch_size=32, shuffle=True)
        test_dl = DataLoader(test, batch_size=1024, shuffle=False)
        return train_dl, test_dl
    # train the model
    def train_model(train_dl, model):
        # define the optimization
        criterion = CrossEntropyLoss()
        optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
        # enumerate epochs
        for epoch in range(500):
            # enumerate mini batches
            for i, (inputs, targets) in enumerate(train_dl):
                # clear the gradients
                optimizer.zero_grad()
                # compute the model output
                yhat = model(inputs)
                # calculate loss
                loss = criterion(yhat, targets)
                # credit assignment
                loss.backward()
                # update model weights
                optimizer.step()
    # evaluate the model
    def evaluate_model(test_dl, model):
        predictions, actuals = list(), list()
        for i, (inputs, targets) in enumerate(test_dl):
            # evaluate the model on the test set
            yhat = model(inputs)
            # retrieve numpy array
            yhat = yhat.detach().numpy()
            actual = targets.numpy()
            # convert to class labels
            yhat = argmax(yhat, axis=1)
            # reshape for stacking
            actual = actual.reshape((len(actual), 1))
            yhat = yhat.reshape((len(yhat), 1))
            # store
            predictions.append(yhat)
            actuals.append(actual)
        predictions, actuals = vstack(predictions), vstack(actuals)
        # calculate accuracy
        acc = accuracy_score(actuals, predictions)
        return acc
    # make a class prediction for one row of data
    def predict(row, model):
        # convert row to data
        row = Tensor([row])
        # make prediction
        yhat = model(row)
        # retrieve numpy array
        yhat = yhat.detach().numpy()
        return yhat
    # prepare the data
    path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
    train_dl, test_dl = prepare_data(path)
    print(len(train_dl.dataset), len(test_dl.dataset))
    # define the network
    model = MLP(4)
    # train the model
    train_model(train_dl, model)
    # evaluate the model
    acc = evaluate_model(test_dl, model)
    print('Accuracy: %.3f' % acc)
    # make a single prediction
    row = [5.1,3.5,1.4,0.2]
    yhat = predict(row, model)
    print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What result did you get?**

**Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 98 percent and then predicted a probability of a row of data belonging to each class, although class 0 has the highest probability.

    100 50
    Accuracy: 0.980
    Predicted: [[9.5524162e-01 4.4516966e-02 2.4138369e-04]] (class=0)

We will use the Boston housing regression dataset to demonstrate an MLP for regression predictive modeling.

This problem involves predicting house value based on properties of the house and neighborhood.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

This is a regression problem that involves predicting a single numeric value. As such, the output layer has a single node and uses the default or linear activation function (no activation function). The mean squared error (mse) loss is minimized when fitting the model.

Recall that this is regression, not classification; therefore, we cannot calculate classification accuracy. For more on this, see the tutorial:

The complete example of fitting and evaluating an MLP on the Boston housing dataset is listed below.

    # pytorch mlp for regression
    from numpy import vstack
    from numpy import sqrt
    from pandas import read_csv
    from sklearn.metrics import mean_squared_error
    from torch.utils.data import Dataset
    from torch.utils.data import DataLoader
    from torch.utils.data import random_split
    from torch import Tensor
    from torch.nn import Linear
    from torch.nn import Sigmoid
    from torch.nn import Module
    from torch.optim import SGD
    from torch.nn import MSELoss
    from torch.nn.init import xavier_uniform_
    # dataset definition
    class CSVDataset(Dataset):
        # load the dataset
        def __init__(self, path):
            # load the csv file as a dataframe
            df = read_csv(path, header=None)
            # store the inputs and outputs
            self.X = df.values[:, :-1].astype('float32')
            self.y = df.values[:, -1].astype('float32')
            # ensure target has the right shape
            self.y = self.y.reshape((len(self.y), 1))
        # number of rows in the dataset
        def __len__(self):
            return len(self.X)
        # get a row at an index
        def __getitem__(self, idx):
            return [self.X[idx], self.y[idx]]
        # get indexes for train and test rows
        def get_splits(self, n_test=0.33):
            # determine sizes
            test_size = round(n_test * len(self.X))
            train_size = len(self.X) - test_size
            # calculate the split
            return random_split(self, [train_size, test_size])
    # model definition
    class MLP(Module):
        # define model elements
        def __init__(self, n_inputs):
            super(MLP, self).__init__()
            # input to first hidden layer
            self.hidden1 = Linear(n_inputs, 10)
            xavier_uniform_(self.hidden1.weight)
            self.act1 = Sigmoid()
            # second hidden layer
            self.hidden2 = Linear(10, 8)
            xavier_uniform_(self.hidden2.weight)
            self.act2 = Sigmoid()
            # third hidden layer and output
            self.hidden3 = Linear(8, 1)
            xavier_uniform_(self.hidden3.weight)
        # forward propagate input
        def forward(self, X):
            # input to first hidden layer
            X = self.hidden1(X)
            X = self.act1(X)
            # second hidden layer
            X = self.hidden2(X)
            X = self.act2(X)
            # third hidden layer and output
            X = self.hidden3(X)
            return X
    # prepare the dataset
    def prepare_data(path):
        # load the dataset
        dataset = CSVDataset(path)
        # calculate split
        train, test = dataset.get_splits()
        # prepare data loaders
        train_dl = DataLoader(train, batch_size=32, shuffle=True)
        test_dl = DataLoader(test, batch_size=1024, shuffle=False)
        return train_dl, test_dl
    # train the model
    def train_model(train_dl, model):
        # define the optimization
        criterion = MSELoss()
        optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
        # enumerate epochs
        for epoch in range(100):
            # enumerate mini batches
            for i, (inputs, targets) in enumerate(train_dl):
                # clear the gradients
                optimizer.zero_grad()
                # compute the model output
                yhat = model(inputs)
                # calculate loss
                loss = criterion(yhat, targets)
                # credit assignment
                loss.backward()
                # update model weights
                optimizer.step()
    # evaluate the model
    def evaluate_model(test_dl, model):
        predictions, actuals = list(), list()
        for i, (inputs, targets) in enumerate(test_dl):
            # evaluate the model on the test set
            yhat = model(inputs)
            # retrieve numpy array
            yhat = yhat.detach().numpy()
            actual = targets.numpy()
            actual = actual.reshape((len(actual), 1))
            # store
            predictions.append(yhat)
            actuals.append(actual)
        predictions, actuals = vstack(predictions), vstack(actuals)
        # calculate mse
        mse = mean_squared_error(actuals, predictions)
        return mse
    # make a prediction for one row of data
    def predict(row, model):
        # convert row to data
        row = Tensor([row])
        # make prediction
        yhat = model(row)
        # retrieve numpy array
        yhat = yhat.detach().numpy()
        return yhat
    # prepare the data
    path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
    train_dl, test_dl = prepare_data(path)
    print(len(train_dl.dataset), len(test_dl.dataset))
    # define the network
    model = MLP(13)
    # train the model
    train_model(train_dl, model)
    # evaluate the model
    mse = evaluate_model(test_dl, model)
    print('MSE: %.3f, RMSE: %.3f' % (mse, sqrt(mse)))
    # make a single prediction
    row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
    yhat = predict(row, model)
    print('Predicted: %.3f' % yhat)

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What result did you get?**

**Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved an MSE of about 82, which is an RMSE of about nine (units are thousands of dollars). A value of about 22 is then predicted for the single example.

    339 167
    MSE: 82.576, RMSE: 9.087
    Predicted: 21.909
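The relationship between the two error scores is just a square root, which puts the error back into the units of the target. The plain-Python sketch below shows the calculation on a handful of made-up house values; it stands in for scikit-learn's *mean_squared_error()* function and is for illustration only.

```python
import math


def mean_squared_error(actuals, predictions):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actuals, predictions)) / len(actuals)


# made-up house values in thousands of dollars, for illustration only
actuals = [24.0, 21.6, 34.7]
predictions = [22.0, 23.6, 30.7]
mse = mean_squared_error(actuals, predictions)
rmse = math.sqrt(mse)
print('MSE: %.3f, RMSE: %.3f' % (mse, rmse))  # MSE: 8.000, RMSE: 2.828

# the RMSE reported above follows from its MSE in the same way
print('%.3f' % math.sqrt(82.576))  # 9.087
```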

Convolutional Neural Networks, or CNNs for short, are a type of network designed for image input.

They are comprised of convolutional layers that extract features (called feature maps) and pooling layers that distill features down to the most salient elements.

CNNs are best suited to image classification tasks, although they can be used on a wide array of tasks that take images as input.

A popular image classification task is the MNIST handwritten digit classification. It involves tens of thousands of handwritten digits that must be classified as a number between 0 and 9.

The torchvision API provides a convenience function to download and load this dataset directly.

The example below loads the dataset and plots the first few images.

    # load mnist dataset in pytorch
    from torch.utils.data import DataLoader
    from torchvision.datasets import MNIST
    from torchvision.transforms import Compose
    from torchvision.transforms import ToTensor
    from matplotlib import pyplot
    # define location to save or load the dataset
    path = '~/.torch/datasets/mnist'
    # define the transforms to apply to the data
    trans = Compose([ToTensor()])
    # download and define the datasets
    train = MNIST(path, train=True, download=True, transform=trans)
    test = MNIST(path, train=False, download=True, transform=trans)
    # summarize the shape of the train and test datasets
    print('Train: X=%s, y=%s' % (tuple(train.data.shape), tuple(train.targets.shape)))
    print('Test: X=%s, y=%s' % (tuple(test.data.shape), tuple(test.targets.shape)))
    # define how to enumerate the datasets
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=32, shuffle=True)
    # get one batch of images
    i, (inputs, targets) = next(enumerate(train_dl))
    # plot some images
    for i in range(25):
        # define subplot
        pyplot.subplot(5, 5, i+1)
        # plot raw pixel data
        pyplot.imshow(inputs[i][0], cmap='gray')
    # show the figure
    pyplot.show()

Running the example loads the MNIST dataset, then summarizes the default train and test datasets.

    Train: X=(60000, 28, 28), y=(60000,)
    Test: X=(10000, 28, 28), y=(10000,)

A plot is then created showing a grid of examples of handwritten images in the training dataset.

We can train a CNN model to classify the images in the MNIST dataset.
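Before defining the model, it can help to trace how each layer changes the size of a 28 by 28 image, because the first fully connected layer must be told how many values it will receive. The sketch below is plain Python and assumes 3x3 convolutions with no padding and 2x2 max pooling with a stride of 2, the settings used by the model in the listing further down; the helper name *conv_output_size()* is made up for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Width (or height) after a convolution or pooling window is applied."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                          # MNIST images are 28x28 pixels
size = conv_output_size(size, kernel=3)            # 3x3 convolution -> 26
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling -> 13
size = conv_output_size(size, kernel=3)            # 3x3 convolution -> 11
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling -> 5
n_filters = 32                                     # feature maps from the last convolution
flat = size * size * n_filters
print(size, flat)  # 5 800
```

This is where the 5*5*32 input size of the fully connected layer comes from.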

Note that the images are arrays of grayscale pixel data; therefore, we must add a channel dimension to the data before we can use the images as input to the model.

It is a good idea to scale the pixel values from the default range of 0-255 to have a zero mean and a standard deviation of 1. For more on scaling pixel values, see the tutorial:

The complete example of fitting and evaluating a CNN model on the MNIST dataset is listed below.

# pytorch cnn for multiclass classification
from numpy import vstack
from numpy import argmax
from sklearn.metrics import accuracy_score
from torchvision.datasets import MNIST
from torchvision.transforms import Compose
from torchvision.transforms import ToTensor
from torchvision.transforms import Normalize
from torch.utils.data import DataLoader
from torch.nn import Conv2d
from torch.nn import MaxPool2d
from torch.nn import Linear
from torch.nn import ReLU
from torch.nn import Softmax
from torch.nn import Module
from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from torch.nn.init import kaiming_uniform_
from torch.nn.init import xavier_uniform_

# model definition
class CNN(Module):
    # define model elements
    def __init__(self, n_channels):
        super(CNN, self).__init__()
        # input to first hidden layer
        self.hidden1 = Conv2d(n_channels, 32, (3,3))
        kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
        self.act1 = ReLU()
        # first pooling layer
        self.pool1 = MaxPool2d((2,2), stride=(2,2))
        # second hidden layer
        self.hidden2 = Conv2d(32, 32, (3,3))
        kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
        self.act2 = ReLU()
        # second pooling layer
        self.pool2 = MaxPool2d((2,2), stride=(2,2))
        # fully connected layer
        self.hidden3 = Linear(5*5*32, 100)
        kaiming_uniform_(self.hidden3.weight, nonlinearity='relu')
        self.act3 = ReLU()
        # output layer
        self.hidden4 = Linear(100, 10)
        xavier_uniform_(self.hidden4.weight)
        self.act4 = Softmax(dim=1)

    # forward propagate input
    def forward(self, X):
        # input to first hidden layer
        X = self.hidden1(X)
        X = self.act1(X)
        X = self.pool1(X)
        # second hidden layer
        X = self.hidden2(X)
        X = self.act2(X)
        X = self.pool2(X)
        # flatten (32 feature maps of 5x5 pixels, matching the input size of hidden3)
        X = X.view(-1, 5*5*32)
        # third hidden layer
        X = self.hidden3(X)
        X = self.act3(X)
        # output layer
        X = self.hidden4(X)
        X = self.act4(X)
        return X

# prepare the dataset
def prepare_data(path):
    # define standardization
    trans = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))])
    # load dataset
    train = MNIST(path, train=True, download=True, transform=trans)
    test = MNIST(path, train=False, download=True, transform=trans)
    # prepare data loaders
    train_dl = DataLoader(train, batch_size=64, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)
    return train_dl, test_dl

# train the model
def train_model(train_dl, model):
    # define the optimization
    # note: CrossEntropyLoss applies log-softmax internally and is usually given raw scores
    criterion = CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
    # enumerate epochs
    for epoch in range(10):
        # enumerate mini batches
        for i, (inputs, targets) in enumerate(train_dl):
            # clear the gradients
            optimizer.zero_grad()
            # compute the model output
            yhat = model(inputs)
            # calculate loss
            loss = criterion(yhat, targets)
            # credit assignment
            loss.backward()
            # update model weights
            optimizer.step()

# evaluate the model
def evaluate_model(test_dl, model):
    predictions, actuals = list(), list()
    for i, (inputs, targets) in enumerate(test_dl):
        # evaluate the model on the test set
        yhat = model(inputs)
        # retrieve numpy array
        yhat = yhat.detach().numpy()
        actual = targets.numpy()
        # convert to class labels
        yhat = argmax(yhat, axis=1)
        # reshape for stacking
        actual = actual.reshape((len(actual), 1))
        yhat = yhat.reshape((len(yhat), 1))
        # store
        predictions.append(yhat)
        actuals.append(actual)
    predictions, actuals = vstack(predictions), vstack(actuals)
    # calculate accuracy
    acc = accuracy_score(actuals, predictions)
    return acc

# prepare the data
path = '~/.torch/datasets/mnist'
train_dl, test_dl = prepare_data(path)
print(len(train_dl.dataset), len(test_dl.dataset))
# define the network
model = CNN(1)
# train the model
train_model(train_dl, model)
# evaluate the model
acc = evaluate_model(test_dl, model)
print('Accuracy: %.3f' % acc)

Running the example first reports the shape of the train and test datasets, then fits the model and evaluates it on the test dataset.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What result did you get?
Can you change the model to do better?**

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 98 percent on the test dataset.

60000 10000
Accuracy: 0.985
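The number of inputs to the fully connected layer in the model follows from the shapes of the layers before it. As a minimal sketch in plain Python, assuming no padding, a stride of 1 for the 3x3 convolutions, and 2x2 pooling with a stride of 2 as configured in the model, we can trace the size of the feature maps layer by layer. The helper names `conv_out` and `pool_out` are illustrative and are not part of PyTorch.

```python
# trace the feature map size through the CNN defined in the example
# (helper names are hypothetical; assumes no padding and no dilation)

def conv_out(size, kernel, stride=1):
    # output width/height of a convolution without padding
    return (size - kernel) // stride + 1

def pool_out(size, kernel, stride):
    # output width/height of a max pooling layer without padding
    return (size - kernel) // stride + 1

size = 28                      # MNIST images are 28x28 pixels
size = conv_out(size, 3)       # first 3x3 convolution
size = pool_out(size, 2, 2)    # first 2x2 pooling
size = conv_out(size, 3)       # second 3x3 convolution
size = pool_out(size, 2, 2)    # second 2x2 pooling
n_filters = 32                 # feature maps produced by the second convolution
flat = size * size * n_filters
print(size, flat)
```

Under these assumptions, the final feature maps are 5x5 pixels, giving 5 x 5 x 32 = 800 values per image after flattening.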

This section provides more resources on the topic if you are looking to go deeper.

- Deep Learning, 2016.
- Programming PyTorch for Deep Learning: Creating and Deploying Deep Learning Applications, 2018.
- Deep Learning with PyTorch, 2020.
- Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD, 2020.

- PyTorch Homepage.
- PyTorch Documentation.
- PyTorch Installation Guide.
- PyTorch, Wikipedia.
- PyTorch on GitHub.

In this tutorial, you discovered a step-by-step guide to developing deep learning models in PyTorch.

Specifically, you learned:

- The difference between Torch and PyTorch and how to install and confirm PyTorch is working.
- The five-step life-cycle of PyTorch models and how to define, fit, and evaluate models.
- How to develop PyTorch deep learning models for regression, classification, and predictive modeling tasks.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post PyTorch Tutorial: How to Develop Deep Learning Models with Python appeared first on Machine Learning Mastery.

The post Neural Networks are Function Approximation Algorithms appeared first on Machine Learning Mastery.

Given a dataset comprised of inputs and outputs, we assume that there is an unknown underlying function that is consistent in mapping inputs to outputs in the target domain and resulted in the dataset. We then use supervised learning algorithms to approximate this function.

Neural networks are an example of a supervised machine learning algorithm that is perhaps best understood in the context of function approximation. This can be demonstrated with examples of neural networks approximating simple one-dimensional functions that aid in developing the intuition for what is being learned by the model.

In this tutorial, you will discover the intuition behind neural networks as function approximation algorithms.

After completing this tutorial, you will know:

- Training a neural network on data approximates the unknown underlying mapping function from inputs to outputs.
- One dimensional input and output datasets provide a useful basis for developing the intuitions for function approximation.
- How to develop and evaluate a small neural network for function approximation.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

This tutorial is divided into three parts; they are:

- What Is Function Approximation
- Definition of a Simple Function
- Approximating a Simple Function

Function approximation is a technique for estimating an unknown underlying function using historical or available observations from the domain.

Artificial neural networks learn to approximate a function.

In supervised learning, a dataset is comprised of inputs and outputs, and the supervised learning algorithm learns how to best map examples of inputs to examples of outputs.

We can think of this mapping as being governed by a mathematical function, called the **mapping function**, and it is this function that a supervised learning algorithm seeks to best approximate.

Neural networks are an example of a supervised learning algorithm and seek to approximate the function represented by your data. This is achieved by calculating the error between the predicted outputs and the expected outputs and minimizing this error during the training process.

It is best to think of feedforward networks as function approximation machines that are designed to achieve statistical generalization, occasionally drawing some insights from what we know about the brain, rather than as models of brain function.

— Page 169, Deep Learning, 2016.

We say “*approximate*” because although we suspect such a mapping function exists, we don’t know anything about it.

The true function that maps inputs to outputs is unknown and is often referred to as the **target function**. It is the target of the learning process, the function we are trying to approximate using only the data that is available. If we knew the target function, we would not need to approximate it, i.e. we would not need a supervised machine learning algorithm. Therefore, function approximation is only a useful tool when the underlying target mapping function is unknown.

All we have are observations from the domain that contain examples of inputs and outputs. This implies things about the size and quality of the data; for example:

- The more examples we have, the more we might be able to figure out about the mapping function.
- The less noise we have in the observations, the crisper the approximation we can make of the mapping function.
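These two points can be illustrated with a small sketch. It assumes a hypothetical target function y = 2x and estimates the slope by least squares from noisy samples. The function name, the noise levels, and the sample sizes are arbitrary choices made for illustration only.

```python
# sketch: how sample size and noise affect an estimate of a mapping function
# (the target function y = 2x and all settings below are assumptions)
from random import Random

def estimate_slope(n_samples, noise_std, seed=1):
    rng = Random(seed)
    # sample inputs from the domain and observe noisy outputs
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    ys = [2.0 * x + rng.gauss(0.0, noise_std) for x in xs]
    # least squares estimate of the slope for a line through the origin
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

for n_samples, noise_std in [(10, 0.5), (10000, 0.5), (10, 0.0)]:
    slope = estimate_slope(n_samples, noise_std)
    print('n=%d, noise=%.1f, estimated slope=%.3f' % (n_samples, noise_std, slope))
```

With more samples or less noise, the estimated slope should generally sit closer to the true value of 2.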

So why do we like using neural networks for function approximation?

The reason is that they are **universal approximators**. In theory, they can be used to approximate any function.

… the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer with any “squashing” activation function (such as the logistic sigmoid activation function) can approximate any […] function from one finite-dimensional space to another with any desired non-zero amount of error, provided that the network is given enough hidden units

— Page 198, Deep Learning, 2016.
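To build some intuition for why adding hidden units helps, here is a minimal sketch of a network with one hidden layer whose weights are set by hand rather than learned. It is not the construction used in the theorem: it uses ReLU units instead of a squashing function, and it simply joins straight line segments between evenly spaced points on x^2. All names are hypothetical.

```python
# sketch: a one-hidden-layer ReLU network built by hand to approximate x^2
# (weights are set directly rather than learned; all names are hypothetical)

def relu(z):
    return max(0.0, z)

def build_network(f, knots):
    # slope of the straight line between each pair of neighboring knots
    slopes = [(f(b) - f(a)) / (b - a) for a, b in zip(knots[:-1], knots[1:])]
    # the output bias is the function value at the left edge of the domain
    bias = f(knots[0])
    # each hidden unit contributes the change in slope at its knot
    weights = [slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes[:-1], slopes[1:])]
    def network(x):
        return bias + sum(w * relu(x - t) for w, t in zip(weights, knots[:-1]))
    return network

def max_error(f, g, lo, hi, steps=1000):
    # largest absolute difference on an evenly spaced grid
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(f(x) - g(x)) for x in xs)

def target(x):
    return x ** 2.0

results = {}
for n_units in [10, 100]:
    step = 100.0 / n_units
    knots = [-50.0 + step * i for i in range(n_units + 1)]
    results[n_units] = max_error(target, build_network(target, knots), -50.0, 50.0)
    print('%d hidden units: max error %.3f' % (n_units, results[n_units]))
```

The largest error on the grid shrinks as more hidden units are used, which is the practical meaning of "provided that the network is given enough hidden units".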

Regression predictive modeling involves predicting a numerical quantity given inputs. Classification predictive modeling involves predicting a class label given inputs.

Both of these predictive modeling problems can be seen as examples of function approximation.

To make this concrete, we can review a worked example.

In the next section, let’s define a simple function that we can later approximate.

We can define a simple function with one numerical input variable and one numerical output variable and use this as the basis for understanding neural networks for function approximation.

We can define a domain of numbers as our input, such as integer values from -50 to 50.

We can then select a mathematical operation to apply to the inputs to get the output values. The selected mathematical operation will be the mapping function, and because we are choosing it, we will know what it is. In practice, this is not the case and is the reason why we would use a supervised learning algorithm like a neural network to learn or discover the mapping function.

In this case, we will use the square of the input as the mapping function, defined as:

- y = x^2

Where *y* is the output variable and *x* is the input variable.

We can develop an intuition for this mapping function by enumerating the values in the range of our input variable and calculating the output value for each input and plotting the result.

The example below implements this in Python.

# example of creating a univariate dataset with a given mapping function
from matplotlib import pyplot
# define the input data
x = [i for i in range(-50,51)]
# define the output data
y = [i**2.0 for i in x]
# plot the input versus the output
pyplot.scatter(x,y)
pyplot.title('Input (x) versus Output (y)')
pyplot.xlabel('Input Variable (x)')
pyplot.ylabel('Output Variable (y)')
pyplot.show()

Running the example first creates a list of integer values across the entire input domain.

The output values are then calculated using the mapping function, then a plot is created with the input values on the x-axis and the output values on the y-axis.

The input and output variables represent our dataset.

Next, we can then pretend to forget that we know what the mapping function is and use a neural network to re-learn or re-discover the mapping function.

We can fit a neural network model on examples of inputs and outputs and see if the model can learn the mapping function.

This is a very simple mapping function, so we would expect a small neural network could learn it quickly.

We will define the network using the Keras deep learning library and use some data preparation tools from the scikit-learn library.

First, let’s define the dataset.

...
# define the dataset
x = asarray([i for i in range(-50,51)])
y = asarray([i**2.0 for i in x])
print(x.min(), x.max(), y.min(), y.max())

Next, we can reshape the data so that the input and output variables are columns with one observation per row, as is expected when using supervised learning models.

...
# reshape arrays into rows and cols
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))

Next, we will need to scale the inputs and the outputs.

The inputs will have a range between -50 and 50, whereas the outputs will have a range between 0^2 (0) and 50^2 (2500). Large input and output values can make training neural networks unstable; therefore, it is a good idea to scale the data first.

We can use the MinMaxScaler to separately normalize the input values and the output values to values in the range between 0 and 1.

...
# separately scale the input and output variables
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
print(x.min(), x.max(), y.min(), y.max())
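For intuition, scaling to the range between 0 and 1 can be written by hand as (value - min) / (max - min), and it can be undone by reversing the calculation. The sketch below uses plain Python and hypothetical function names; it is not the scikit-learn implementation.

```python
# sketch: min-max scaling to the range [0, 1] and its inverse, written by hand
# (function names are hypothetical, not part of scikit-learn)

def minmax_fit(values):
    # learn the minimum and maximum from the data
    return min(values), max(values)

def minmax_transform(values, lo, hi):
    # map each value into the range [0, 1]
    return [(v - lo) / (hi - lo) for v in values]

def minmax_inverse(scaled, lo, hi):
    # map scaled values back to the original units
    return [s * (hi - lo) + lo for s in scaled]

y = [float(i) ** 2.0 for i in range(-50, 51)]
lo, hi = minmax_fit(y)
y_scaled = minmax_transform(y, lo, hi)
y_restored = minmax_inverse(y_scaled, lo, hi)
print(min(y_scaled), max(y_scaled))
print(min(y_restored), max(y_restored))
```

Keeping the learned minimum and maximum around is what later allows predictions to be converted back to the original units.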

We can now define a neural network model.

With some trial and error, I chose a model with two hidden layers and 10 nodes in each layer. Perhaps experiment with other configurations to see if you can do better.

...
# design the neural network model
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
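To get a feel for how small this model is, we can count its weights and biases. This is a minimal sketch in plain Python, assuming fully connected layers with one bias per node; the function name is hypothetical.

```python
# sketch: count the weights and biases in a fully connected network
# (assumes dense layers with one bias per node; the function name is hypothetical)

def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # one weight per connection plus one bias per node in the layer
        total += n_in * n_out + n_out
    return total

# one input, two hidden layers of 10 nodes, one output
layer_sizes = [1, 10, 10, 1]
n_params = count_parameters(layer_sizes)
print(n_params)
```

That is 20 parameters in the first hidden layer, 110 in the second, and 11 in the output layer.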

We will fit the model using a mean squared error loss and use the efficient Adam version of stochastic gradient descent to optimize the model.

This means the model will seek to minimize the mean squared error between the predictions made and the expected output values (*y*) while it tries to approximate the mapping function.

...
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')

We don’t have a lot of data (e.g. about 100 rows), so we will fit the model for 500 epochs and use a small batch size of 10.

Again, these values were found after a little trial and error; try different values and see if you can do better.

...
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)

Once fit, we can evaluate the model.

We will make a prediction for each example in the dataset and calculate the error. A perfect approximation would have an error of 0.0. This is not possible in general because of noise in the observations, incomplete data, and the complexity of the unknown underlying mapping function.

In this case, it is possible because we have all observations, there is no noise in the data, and the underlying function is not complex.

First, we can make the prediction.

...
# make predictions for the input data
yhat = model.predict(x)

We then must invert the scaling that we performed.

This is so the error is reported in the original units of the target variable.

...
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)

We can then calculate and report the prediction error in the original units of the target variable.

...
# report model error
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))

Finally, we can create a scatter plot of the real mapping of inputs to outputs and compare it to the mapping of inputs to the predicted outputs and see what the approximation of the mapping function looks like spatially.

This is helpful for developing the intuition behind what neural networks are learning.

...
# plot x vs y
pyplot.scatter(x_plot,y_plot, label='Actual')
# plot x vs yhat
pyplot.scatter(x_plot,yhat_plot, label='Predicted')
pyplot.title('Input (x) versus Output (y)')
pyplot.xlabel('Input Variable (x)')
pyplot.ylabel('Output Variable (y)')
pyplot.legend()
pyplot.show()

Tying this together, the complete example is listed below.

# example of fitting a neural net on x vs x^2
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from numpy import asarray
from matplotlib import pyplot
# define the dataset
x = asarray([i for i in range(-50,51)])
y = asarray([i**2.0 for i in x])
print(x.min(), x.max(), y.min(), y.max())
# reshape arrays into rows and cols
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))
# separately scale the input and output variables
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
print(x.min(), x.max(), y.min(), y.max())
# design the neural network model
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)
# make predictions for the input data
yhat = model.predict(x)
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)
# report model error
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))
# plot x vs y
pyplot.scatter(x_plot,y_plot, label='Actual')
# plot x vs yhat
pyplot.scatter(x_plot,yhat_plot, label='Predicted')
pyplot.title('Input (x) versus Output (y)')
pyplot.xlabel('Input Variable (x)')
pyplot.ylabel('Output Variable (y)')
pyplot.legend()
pyplot.show()

Running the example first reports the range of values for the input and output variables, then the range of the same variables after scaling. This confirms that the scaling operation was performed as we expected.

The model is then fit and evaluated on the dataset.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the mean squared error is about 1,300, in squared units. If we calculate the square root, this gives us the root mean squared error (RMSE) in the original units. We can see that the average error is about 36 units, which is fine, but not great.
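The relationship between the two error measures can be sketched in a few lines of plain Python. The function names and the small example lists are hypothetical; only the MSE value of 1300.776 comes from the run reported here.

```python
# sketch: mean squared error and root mean squared error by hand
# (function names and the toy lists are hypothetical)
from math import sqrt

def mse(expected, predicted):
    # average of the squared differences, in squared units
    return sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)

def rmse(expected, predicted):
    # square root of the MSE, back in the original units
    return sqrt(mse(expected, predicted))

# toy example: both predictions are off by 10 units
toy_mse = mse([0.0, 100.0], [10.0, 90.0])
toy_rmse = rmse([0.0, 100.0], [10.0, 90.0])
print(toy_mse, toy_rmse)

# convert the MSE reported for this run into an RMSE
reported_rmse = sqrt(1300.776)
print('RMSE: %.1f' % reported_rmse)
```

Because the RMSE is in the same units as the target variable, it is often easier to interpret than the MSE.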

**What results did you get?** Can you do better?

Let me know in the comments below.

-50 50 0.0 2500.0
0.0 1.0 0.0 1.0
MSE: 1300.776

A scatter plot is then created comparing the inputs versus the real outputs, and the inputs versus the predicted outputs.

The difference between these two data series is the error in the approximation of the mapping function. We can see that the approximation is reasonable; it captures the general shape. We can see that there are errors, especially around the 0 input values.

This suggests that there is plenty of room for improvement, such as using a different activation function or different network architecture to better approximate the mapping function.

This section provides more resources on the topic if you are looking to go deeper.

- Deep Learning, 2016.

In this tutorial, you discovered the intuition behind neural networks as function approximation algorithms.

Specifically, you learned:

- Training a neural network on data approximates the unknown underlying mapping function from inputs to outputs.
- One dimensional input and output datasets provide a useful basis for developing the intuitions for function approximation.
- How to develop and evaluate a small neural network for function approximation.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.


The post TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras appeared first on Machine Learning Mastery.

TensorFlow is the premier open-source deep learning framework developed and maintained by Google. Although using TensorFlow directly can be challenging, the modern tf.keras API brings the simplicity and ease of use of Keras to the TensorFlow project.

Using tf.keras allows you to design, fit, evaluate, and use deep learning models to make predictions in just a few lines of code. It makes common deep learning tasks, such as classification and regression predictive modeling, accessible to average developers looking to get things done.

In this tutorial, you will discover a step-by-step guide to developing deep learning models in TensorFlow using the tf.keras API.

After completing this tutorial, you will know:

- The difference between Keras and tf.keras and how to install and confirm TensorFlow is working.
- The 5-step life-cycle of tf.keras models and how to use the sequential and functional APIs.
- How to develop MLP, CNN, and RNN models with tf.keras for regression, classification, and time series forecasting.
- How to use the advanced features of the tf.keras API to inspect and diagnose your model.
- How to improve the performance of your tf.keras model by reducing overfitting and accelerating training.

This is a large tutorial, and a lot of fun. You might want to bookmark it.

The examples are small and focused; you can finish this tutorial in about 60 minutes.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

**Update Jun/2020**: Updated for changes to the API in TensorFlow 2.2.0.

This tutorial is designed to be your complete introduction to tf.keras for your deep learning project.

The focus is on using the API for common deep learning model development tasks; we will not be diving into the math and theory of deep learning. For that, I recommend starting with this excellent book.

The best way to learn deep learning in Python is by doing. Dive in. You can circle back for more theory later.

I have designed each code example to use best practices and to be standalone so that you can copy and paste it directly into your project and adapt it to your specific needs. This will give you a massive head start over trying to figure out the API from official documentation alone.

It is a large tutorial and as such, it is divided into five parts; they are:

- Install TensorFlow and tf.keras
    - What Are Keras and tf.keras?
    - How to Install TensorFlow
    - How to Confirm TensorFlow Is Installed
- Deep Learning Model Life-Cycle
    - The 5-Step Model Life-Cycle
    - Sequential Model API (Simple)
    - Functional Model API (Advanced)
- How to Develop Deep Learning Models
    - Develop Multilayer Perceptron Models
    - Develop Convolutional Neural Network Models
    - Develop Recurrent Neural Network Models
- How to Use Advanced Model Features
    - How to Visualize a Deep Learning Model
    - How to Plot Model Learning Curves
    - How to Save and Load Your Model
- How to Get Better Model Performance
    - How to Reduce Overfitting With Dropout
    - How to Accelerate Training With Batch Normalization
    - How to Halt Training at the Right Time With Early Stopping

Work through the tutorial at your own pace.

**You do not need to understand everything (at least not right now)**. Your goal is to run through the tutorial end-to-end and get results; you do not need to understand everything on the first pass. Write down your questions as you go, and make heavy use of the API documentation to learn about the functions that you are using.

**You do not need to know the math first**. Math is a compact way of describing how algorithms work, specifically tools from linear algebra, probability, and statistics. These are not the only tools that you can use to learn how algorithms work. You can also use code and explore algorithm behavior with different inputs and outputs. Knowing the math will not tell you what algorithm to choose or how to best configure it. You can only discover that through careful, controlled experiments.

**You do not need to know how the algorithms work**. It is important to know about the limitations and how to configure deep learning algorithms. But learning about algorithms can come later. You need to build up this algorithm knowledge slowly over a long period of time. Today, start by getting comfortable with the platform.

**You do not need to be a Python programmer**. The syntax of the Python language can be intuitive even if you are new to it. Just like other languages, focus on function calls (e.g. function()) and assignments (e.g. a = "b"). This will get you most of the way. You are a developer, so you know how to pick up the basics of a language really fast. Just get started and dive into the details later.

**You do not need to be a deep learning expert**. You can learn about the benefits and limitations of various algorithms later, and there are plenty of posts that you can read later to brush up on the steps of a deep learning project and the importance of evaluating model skill using cross-validation.

In this section, you will discover what tf.keras is, how to install it, and how to confirm that it is installed correctly.

Keras is an open-source deep learning library written in Python.

The project was started in 2015 by Francois Chollet. It quickly became a popular framework for developers, becoming one of, if not the most, popular deep learning libraries.

During the period of 2015-2019, developing deep learning models using mathematical libraries like TensorFlow, Theano, and PyTorch was cumbersome, requiring tens or even hundreds of lines of code to achieve the simplest tasks. The focus of these libraries was on research, flexibility, and speed, not ease of use.

Keras was popular because the API was clean and simple, allowing standard deep learning models to be defined, fit, and evaluated in just a few lines of code.

A secondary reason Keras took off was that it allowed you to use any one of a range of popular deep learning mathematical libraries as the backend (i.e. the library used to perform the computation), such as TensorFlow, Theano, and later, CNTK. This allowed the power of these libraries (e.g. GPUs) to be harnessed through a very clean and simple interface.

In 2019, Google released a new version of their TensorFlow deep learning library (TensorFlow 2) that integrated the Keras API directly and promoted this interface as the default or standard interface for deep learning development on the platform.

This integration is commonly referred to as the *tf.keras* interface or API (“*tf*” is short for “*TensorFlow*”). This is to distinguish it from the so-called standalone Keras open source project.

- **Standalone Keras**. The standalone open source project that supports TensorFlow, Theano and CNTK backends.
- **tf.keras**. The Keras API integrated into TensorFlow 2.

The Keras API implementation in TensorFlow is referred to as “*tf.keras*” because this is the Python idiom used when referencing the API. First, the TensorFlow module is imported and named “*tf*”; then, Keras API elements are accessed via calls to *tf.keras*; for example:

# example of tf.keras python idiom
import tensorflow as tf
# use keras API
model = tf.keras.Sequential()
...

I generally don’t use this idiom myself; I don’t think it reads cleanly.

Given that TensorFlow was the de facto standard backend for the Keras open source project, the integration means that a single library can now be used instead of two separate libraries. Further, the standalone Keras project now recommends all future Keras development use the *tf.keras* API.

At this time, we recommend that Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features (eager execution, distribution support and other).

— Keras Project Homepage, Accessed December 2019.

Before installing TensorFlow, ensure that you have Python installed, such as Python 3.6 or higher.

If you don’t have Python installed, you can install it using Anaconda. This tutorial will show you how:

There are many ways to install the TensorFlow open-source deep learning library.

The most common, and perhaps the simplest, way to install TensorFlow on your workstation is by using *pip*.

For example, on the command line, you can type:

sudo pip install tensorflow

If you prefer to use an installation method more specific to your platform or package manager, you can see a complete list of installation instructions here:

There is no need to set up the GPU now.

All examples in this tutorial will work just fine on a modern CPU. If you want to configure TensorFlow for your GPU, you can do that after completing this tutorial. Don’t get distracted!

Once TensorFlow is installed, it is important to confirm that the library was installed successfully and that you can start using it.

*Don’t skip this step*.

If TensorFlow is not installed correctly or raises an error on this step, you won’t be able to run the examples later.

Create a new file called *versions.py* and copy and paste the following code into the file.

# check version
import tensorflow
print(tensorflow.__version__)

Save the file, then open your command line and change directory to where you saved the file.

Then type:

python versions.py

You should then see output like the following:

2.2.0

This confirms that TensorFlow is installed correctly and that we are all using the same version.

**What version did you get?**

Post your output in the comments below.

This also shows you how to run a Python script from the command line. I recommend running all code from the command line in this manner, and not from a notebook or an IDE.
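If you would like the check to be a little more forgiving, the sketch below first tests whether the library can be found and then confirms that the major version is 2 or higher. The helper names `parse_version` and `is_tensorflow_2` are hypothetical, and the parsing is a simple assumption that the version string starts with numbers separated by dots.

```python
# sketch: a more defensive version check
# (helper names are hypothetical; assumes versions look like '2.2.0')
import re
from importlib.util import find_spec

def parse_version(text):
    # keep the leading numeric parts, e.g. '2.2.0' becomes (2, 2, 0)
    return tuple(int(part) for part in re.findall(r'\d+', text)[:3])

def is_tensorflow_2(text):
    # tf.keras as described in this tutorial requires TensorFlow 2 or higher
    return parse_version(text)[0] >= 2

if find_spec('tensorflow') is None:
    print('TensorFlow does not appear to be installed')
else:
    import tensorflow
    print(tensorflow.__version__, is_tensorflow_2(tensorflow.__version__))
```

This can be saved and run from the command line in the same way as *versions.py*.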

Sometimes when you use the *tf.keras* API, you may see warnings printed.

This might include messages that your hardware supports features that your TensorFlow installation was not configured to use.

Some examples on my workstation include:

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
XLA service 0x7fde3f2e6180 executing computations on platform Host. Devices:
StreamExecutor device (0): Host, Default Version

They are not your fault. **You did nothing wrong**.

These are information messages and they will not prevent the execution of your code. You can safely ignore messages of this type for now.

It’s an intentional design decision made by the TensorFlow team to show these warning messages. A downside of this decision is that it confuses beginners and it trains developers to ignore all messages, including those that potentially may impact the execution.
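If the information messages become distracting, one option is to raise the minimum log level with the `TF_CPP_MIN_LOG_LEVEL` environment variable before TensorFlow is imported. The sketch below is a minimal example under that assumption; the function name is hypothetical, and a level of 1 is intended to hide information messages while still showing warnings and errors.

```python
# sketch: reduce TensorFlow log output via an environment variable
# (the function name is hypothetical; the variable must be set before importing tensorflow)
import os

def set_tensorflow_log_level(level):
    # 0 shows all messages, 1 hides info, 2 also hides warnings, 3 also hides errors
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = str(level)

set_tensorflow_log_level(1)
# import tensorflow only after the variable has been set, e.g.:
# import tensorflow
print(os.environ['TF_CPP_MIN_LOG_LEVEL'])
```

Hiding warnings and errors as well (levels 2 and 3) is probably best avoided, given the risk of missing messages that do matter.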

Now that you know what tf.keras is, how to install TensorFlow, and how to confirm your development environment is working, let’s look at the life-cycle of deep learning models in TensorFlow.

In this section, you will discover the life-cycle for a deep learning model and the two tf.keras APIs that you can use to define models.

A model has a life-cycle, and this very simple knowledge provides the backbone for both modeling a dataset and understanding the tf.keras API.

The five steps in the life-cycle are as follows:

- Define the model.
- Compile the model.
- Fit the model.
- Evaluate the model.
- Make predictions.

Let’s take a closer look at each step in turn.

Defining the model requires that you first select the type of model that you need and then choose the architecture or network topology.

From an API perspective, this involves defining the layers of the model, configuring each layer with a number of nodes and activation function, and connecting the layers together into a cohesive model.

Models can be defined either with the Sequential API or the Functional API, and we will take a look at this in the next section.

    ...
    # define the model
    model = ...

Compiling the model requires that you first select a loss function that you want to optimize, such as mean squared error or cross-entropy.

It also requires that you select an algorithm to perform the optimization procedure, typically stochastic gradient descent, or a modern variation, such as Adam. It may also require that you select any performance metrics to keep track of during the model training process.

From an API perspective, this involves calling a function to compile the model with the chosen configuration, which will prepare the appropriate data structures required for the efficient use of the model you have defined.

The optimizer can be specified as a string for a known optimizer class, e.g. ‘*sgd*‘ for stochastic gradient descent, or you can configure an instance of an optimizer class and use that.

For a list of supported optimizers, see this:

    ...
    # compile the model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='binary_crossentropy')
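To make the *learning_rate* and *momentum* settings more concrete, here is a minimal pure-Python sketch of gradient descent with momentum on a toy one-parameter problem. The function name `sgd_momentum` and the toy loss are my own assumptions for illustration; this is the classical velocity-based update rule, not the tf.keras implementation.

```python
# minimal sketch of gradient descent with momentum (not the tf.keras implementation)
def sgd_momentum(grad_fn, w, learning_rate=0.01, momentum=0.9, n_steps=500):
    velocity = 0.0
    for _ in range(n_steps):
        # velocity accumulates a decaying sum of past gradient steps
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w

# toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w=0.0)
print('w after training: %.4f' % w_final)
```

With momentum set to 0.0 the same loop reduces to plain gradient descent, which takes many more steps to reach the minimum at this learning rate.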

The three most common loss functions are:

- ‘*binary_crossentropy*‘ for binary classification.
- ‘*sparse_categorical_crossentropy*‘ for multi-class classification.
- ‘*mse*‘ (mean squared error) for regression.

    ...
    # compile the model
    model.compile(optimizer='sgd', loss='mse')
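If it helps to see what these three loss functions calculate, the sketch below implements each one in plain Python for a handful of made-up predictions. The function names mirror the loss names for readability, but the bodies and the clipping constant `eps` are my own assumptions rather than the tf.keras code.

```python
from math import log

# mean squared error for regression targets
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# binary cross-entropy for labels in {0, 1} and predicted probabilities of class 1
def binary_crossentropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * log(p) + (1 - t) * log(1.0 - p))
    return total / len(y_true)

# sparse categorical cross-entropy for integer labels and per-class probabilities
def sparse_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    losses = [-log(max(probs[t], eps)) for t, probs in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

print(mse([1.0, 2.0], [1.5, 2.5]))
print(binary_crossentropy([1, 0], [0.9, 0.2]))
print(sparse_categorical_crossentropy([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]))
```

In all three cases a lower value means the predictions are closer to the targets, which is what the optimizer works toward during training.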

For a list of supported loss functions, see:

Metrics are defined as a list of strings for known metric functions or a list of functions to call to evaluate predictions.

For a list of supported metrics, see:

    ...
    # compile the model
    model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

Fitting the model requires that you first select the training configuration, such as the number of epochs (loops through the training dataset) and the batch size (number of samples used to estimate model error before the model weights are updated).

Training applies the chosen optimization algorithm to minimize the chosen loss function and updates the model using the backpropagation of error algorithm.

Fitting the model is the slow part of the whole process and can take seconds to hours to days, depending on the complexity of the model, the hardware you’re using, and the size of the training dataset.

From an API perspective, this involves calling a function to perform the training process. This function will block (not return) until the training process has finished.

    ...
    # fit the model
    model.fit(X, y, epochs=100, batch_size=32)
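The relationship between epochs, batch size, and the number of weight updates is simple arithmetic. The sketch below works it out for a hypothetical dataset of 1,000 samples; the helper name `count_updates` and the dataset size are assumptions for illustration.

```python
from math import ceil

# number of weight updates for a given training configuration
def count_updates(n_samples, epochs, batch_size):
    # the last batch in an epoch may hold fewer samples than batch_size
    batches_per_epoch = ceil(n_samples / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

per_epoch, total = count_updates(n_samples=1000, epochs=100, batch_size=32)
print('updates per epoch: %d, total updates: %d' % (per_epoch, total))
```

A smaller batch size means more (and noisier) updates per epoch, while a larger batch size means fewer, smoother updates.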

For help on how to choose the batch size, see this tutorial:

While fitting the model, a progress bar will summarize the status of each epoch and the overall training process. This can be simplified to a simple report of model performance each epoch by setting the “*verbose*” argument to 2. All output can be turned off during training by setting “*verbose*” to 0.

    ...
    # fit the model
    model.fit(X, y, epochs=100, batch_size=32, verbose=0)

Evaluating the model requires that you first choose a holdout dataset used to evaluate the model. This should be data not used in the training process so that we can get an unbiased estimate of the performance of the model when making predictions on new data.

The speed of model evaluation is proportional to the amount of data you want to use for the evaluation, although it is much faster than training as the model is not changed.

From an API perspective, this involves calling a function with the holdout dataset and getting a loss and perhaps other metrics that can be reported.

    ...
    # evaluate the model
    loss = model.evaluate(X, y, verbose=0)
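As a rough illustration of what a holdout dataset is, the sketch below shuffles a list of rows and sets a fraction aside for evaluation. The helper `holdout_split` is a hypothetical name, and this is not how any particular library implements its split; it only shows that the train and test rows never overlap.

```python
from random import Random

# shuffle the rows and hold back a fraction for evaluation
def holdout_split(rows, test_size=0.33, seed=1):
    rows = list(rows)
    Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_size)
    # the first n_test shuffled rows become the holdout set
    return rows[n_test:], rows[:n_test]

train, test = holdout_split(range(100))
print(len(train), len(test))
```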

Making a prediction is the final step in the life-cycle. It is why we wanted the model in the first place.

It requires you have new data for which a prediction is required, e.g. where you do not have the target values.

From an API perspective, you simply call a function to make a prediction of a class label, probability, or numerical value: whatever you designed your model to predict.

    ...
    # make a prediction
    yhat = model.predict(X)

You may want to save the model and later load it to make predictions. You may also choose to fit a model on all of the available data before you start using it.

Now that we are familiar with the model life-cycle, let’s take a look at the two main ways to use the tf.keras API to build models: sequential and functional.

The sequential model API is the simplest and is the API that I recommend, especially when getting started.

It is referred to as “*sequential*” because it involves defining a Sequential class and adding layers to the model one by one in a linear manner, from input to output.

The example below defines a Sequential MLP model that accepts eight inputs, has one hidden layer with 10 nodes and then an output layer with one node to predict a numerical value.

    # example of a model defined with the sequential api
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    # define the model
    model = Sequential()
    model.add(Dense(10, input_shape=(8,)))
    model.add(Dense(1))

Note that the visible layer of the network is defined by the “*input_shape*” argument on the first hidden layer. That means in the above example, the model expects the input for one sample to be a vector of eight numbers.

The sequential API is easy to use because you keep calling *model.add()* until you have added all of your layers.

For example, here is a deep MLP with five hidden layers.

    # example of a model defined with the sequential api
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    # define the model
    model = Sequential()
    model.add(Dense(100, input_shape=(8,)))
    model.add(Dense(80))
    model.add(Dense(30))
    model.add(Dense(10))
    model.add(Dense(5))
    model.add(Dense(1))

The functional API is more complex but is also more flexible.

It involves explicitly connecting the output of one layer to the input of another layer. Each connection is specified.

First, an input layer must be defined via the *Input* class, and the shape of an input sample is specified. We must retain a reference to the input layer when defining the model.

    ...
    # define the layers
    x_in = Input(shape=(8,))

Next, a fully connected layer can be connected to the input by calling the layer and passing the input layer. This will return a reference to the output connection in this new layer.

    ...
    x = Dense(10)(x_in)

We can then connect this to an output layer in the same manner.

    ...
    x_out = Dense(1)(x)

Once connected, we define a Model object and specify the input and output layers. The complete example is listed below.

    # example of a model defined with the functional api
    from tensorflow.keras import Model
    from tensorflow.keras import Input
    from tensorflow.keras.layers import Dense
    # define the layers
    x_in = Input(shape=(8,))
    x = Dense(10)(x_in)
    x_out = Dense(1)(x)
    # define the model
    model = Model(inputs=x_in, outputs=x_out)

Because every connection is specified explicitly, the functional API allows for more complicated model designs, such as models that may have multiple input paths (separate vectors) and models that have multiple output paths (e.g. a word and a number).

The functional API can be a lot of fun when you get used to it.

For more on the functional API, see:

Now that we are familiar with the model life-cycle and the two APIs that can be used to define models, let’s look at developing some standard models.

In this section, you will discover how to develop, evaluate, and make predictions with standard deep learning models, including Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).

A Multilayer Perceptron model, or MLP for short, is a standard fully connected neural network model.

It is comprised of layers of nodes where each node is connected to all outputs from the previous layer and the output of each node is connected to all inputs for nodes in the next layer.
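To see what “fully connected” means in practice, here is a small pure-Python sketch of the forward pass through one layer. The weights, biases, and inputs are made-up values, and the helper names `dense_forward` and `relu` are my own; a *Dense* layer performs the same kind of calculation with learned weights.

```python
# forward pass of one fully connected layer: each node sees every input
def dense_forward(inputs, weights, biases):
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # weighted sum of all inputs plus the node's bias
        outputs.append(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
    return outputs

# rectified linear activation: negative values become zero
def relu(values):
    return [max(0.0, v) for v in values]

# hypothetical layer with 3 inputs and 2 nodes
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, 1.0, 0.25], [-0.5, 0.5, -1.0]]
biases = [0.1, 0.0]
hidden = relu(dense_forward(inputs, weights, biases))
print(hidden)
```

The output of this layer would then become the input to every node in the next layer.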

An MLP is created with one or more *Dense* layers. This model is appropriate for tabular data, that is data as it looks in a table or spreadsheet with one column for each variable and one row for each observation. There are three predictive modeling problems you may want to explore with an MLP; they are binary classification, multiclass classification, and regression.

Let’s fit a model on a real dataset for each of these cases.

Note, the models in this section are effective, but not optimized. See if you can improve their performance. Post your findings in the comments below.

We will use the Ionosphere binary (two-class) classification dataset to demonstrate an MLP for binary classification.

This dataset involves predicting whether a structure is in the atmosphere or not given radar returns.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

We will use a LabelEncoder to encode the string labels to integer values 0 and 1. The model will be fit on 67 percent of the data, and the remaining 33 percent will be used for evaluation, split using the train_test_split() function.
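The idea behind the label encoding step can be sketched in a few lines of plain Python: collect the distinct labels, sort them, and replace each label with its position. The helper `encode_labels` and the example labels are assumptions for illustration, not the scikit-learn implementation.

```python
# map string class labels to integers in sorted order
def encode_labels(labels):
    classes = sorted(set(labels))
    lookup = {name: index for index, name in enumerate(classes)}
    return [lookup[name] for name in labels], classes

# hypothetical string labels for a two-class problem
encoded, classes = encode_labels(['g', 'b', 'g', 'g', 'b'])
print(classes, encoded)
```

Keeping the sorted list of classes around is useful later, because it lets you map a predicted integer back to the original string label.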

It is a good practice to use ‘*relu*‘ activation with a ‘*he_normal*‘ weight initialization. This combination goes a long way to overcome the problem of vanishing gradients when training deep neural network models. For more on ReLU, see the tutorial:

The model predicts the probability of class 1 and uses the sigmoid activation function. The model is optimized using the adam version of stochastic gradient descent and seeks to minimize the cross-entropy loss.

The complete example is listed below.

    # mlp for binary classification
    from numpy import asarray
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    # load the dataset
    path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
    df = read_csv(path, header=None)
    # split into input and output columns
    X, y = df.values[:, :-1], df.values[:, -1]
    # ensure all data are floating point values
    X = X.astype('float32')
    # encode strings to integer
    y = LabelEncoder().fit_transform(y)
    # split into train and test datasets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    # determine the number of input features
    n_features = X_train.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    # fit the model
    model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)
    # evaluate the model
    loss, acc = model.evaluate(X_test, y_test, verbose=0)
    print('Test Accuracy: %.3f' % acc)
    # make a prediction (the row is wrapped in an array with shape [1, n_features])
    row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]
    yhat = model.predict(asarray([row]))
    print('Predicted: %.3f' % yhat[0][0])

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 94 percent and then predicted a probability of about 0.99 that the one row of data belongs to class 1.

    (235, 34) (116, 34) (235,) (116,)
    Test Accuracy: 0.940
    Predicted: 0.991
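The model outputs a probability, not a class label. A common convention, and an assumption on my part here, is to map the probability to a label using a threshold of 0.5. The sketch below shows the sigmoid function and that mapping in plain Python; `to_class_label` is a hypothetical helper name.

```python
from math import exp

# squash any real value into the range (0, 1)
def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# convert a probability of class 1 into a crisp class label
def to_class_label(probability, threshold=0.5):
    return 1 if probability >= threshold else 0

# the probability reported above maps to class 1
print(to_class_label(0.991))
# a negative weighted sum gives a probability below 0.5, so class 0
print(sigmoid(-2.0), to_class_label(sigmoid(-2.0)))
```

A different threshold can be used when the costs of false positives and false negatives are not equal.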

We will use the Iris flowers multiclass classification dataset to demonstrate an MLP for multiclass classification.

This problem involves predicting the species of iris flower given measures of the flower.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

Given that it is a multiclass classification, the model must have one node for each class in the output layer and use the softmax activation function. The loss function is the ‘*sparse_categorical_crossentropy*‘, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.)

The complete example of fitting and evaluating an MLP on the iris flowers dataset is listed below.

    # mlp for multiclass classification
    from numpy import argmax
    from numpy import asarray
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    # load the dataset
    path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
    df = read_csv(path, header=None)
    # split into input and output columns
    X, y = df.values[:, :-1], df.values[:, -1]
    # ensure all data are floating point values
    X = X.astype('float32')
    # encode strings to integer
    y = LabelEncoder().fit_transform(y)
    # split into train and test datasets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    # determine the number of input features
    n_features = X_train.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(3, activation='softmax'))
    # compile the model
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # fit the model
    model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)
    # evaluate the model
    loss, acc = model.evaluate(X_test, y_test, verbose=0)
    print('Test Accuracy: %.3f' % acc)
    # make a prediction (the row is wrapped in an array with shape [1, n_features])
    row = [5.1,3.5,1.4,0.2]
    yhat = model.predict(asarray([row]))
    print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

In this case, we can see that the model achieved a classification accuracy of about 98 percent and then predicted the probability of a row of data belonging to each class, with class 0 having the highest probability.

    (100, 4) (50, 4) (100,) (50,)
    Test Accuracy: 0.980
    Predicted: [[0.8680804 0.12356871 0.00835086]] (class=0)
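The two steps at the end of this example, softmax in the output layer and argmax on the prediction, can be sketched in plain Python. The raw scores passed to `softmax` below are made-up values, and both helpers are my own minimal versions rather than the library functions.

```python
from math import exp

# convert raw output-layer values into probabilities that sum to one
def softmax(scores):
    shift = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# index of the largest value, i.e. the predicted class label
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

# hypothetical raw scores for the three classes
probs = softmax([2.0, 0.05, -2.6])
print(probs, sum(probs))
# the probabilities reported above map to class 0
print(argmax([0.8680804, 0.12356871, 0.00835086]))
```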

We will use the Boston housing regression dataset to demonstrate an MLP for regression predictive modeling.

This problem involves predicting house value based on properties of the house and neighborhood.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

This is a regression problem that involves predicting a single numerical value. As such, the output layer has a single node and uses the default or linear activation function (no activation function). The mean squared error (mse) loss is minimized when fitting the model.

Recall that this is a regression, not classification; therefore, we cannot calculate classification accuracy. For more on this, see the tutorial:

The complete example of fitting and evaluating an MLP on the Boston housing dataset is listed below.

    # mlp for regression
    from numpy import sqrt
    from numpy import asarray
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    # load the dataset
    path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
    df = read_csv(path, header=None)
    # split into input and output columns
    X, y = df.values[:, :-1], df.values[:, -1]
    # split into train and test datasets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    # determine the number of input features
    n_features = X_train.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1))
    # compile the model
    model.compile(optimizer='adam', loss='mse')
    # fit the model
    model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)
    # evaluate the model
    error = model.evaluate(X_test, y_test, verbose=0)
    print('MSE: %.3f, RMSE: %.3f' % (error, sqrt(error)))
    # make a prediction (the row is wrapped in an array with shape [1, n_features])
    row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
    yhat = model.predict(asarray([row]))
    print('Predicted: %.3f' % yhat[0][0])

Running the example first reports the shape of the dataset then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

In this case, we can see that the model achieved an MSE of about 60, which is an RMSE of about 7.8 (units are thousands of dollars). A value of about 27 is then predicted for the single example.

    (339, 13) (167, 13) (339,) (167,)
    MSE: 60.751, RMSE: 7.794
    Predicted: 26.983
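The RMSE is reported alongside the MSE because it is in the same units as the target variable, which makes it easier to interpret. As a quick check, the sketch below recomputes the RMSE from the MSE reported above; the variable names are my own.

```python
from math import sqrt

# the mse reported by the run above (your value will differ)
reported_mse = 60.751
# taking the square root returns the error to the units of the target
rmse = sqrt(reported_mse)
print('RMSE: %.3f' % rmse)
```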

Convolutional Neural Networks, or CNNs for short, are a type of network designed for image input.

They are composed of convolutional layers that extract features (called feature maps) and pooling layers that distill features down to the most salient elements.

CNNs are most well-suited to image classification tasks, although they can be used on a wide array of tasks that take images as input.

A popular image classification task is the MNIST handwritten digit classification. It involves tens of thousands of handwritten digits that must be classified as a number between 0 and 9.

The tf.keras API provides a convenience function to download and load this dataset directly.

The example below loads the dataset and plots the first few images.

    # example of loading and plotting the mnist dataset
    from tensorflow.keras.datasets.mnist import load_data
    from matplotlib import pyplot
    # load dataset
    (trainX, trainy), (testX, testy) = load_data()
    # summarize loaded dataset
    print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
    print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
    # plot first few images
    for i in range(25):
        # define subplot
        pyplot.subplot(5, 5, i+1)
        # plot raw pixel data
        pyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))
    # show the figure
    pyplot.show()

Running the example loads the MNIST dataset, then summarizes the default train and test datasets.

    Train: X=(60000, 28, 28), y=(60000,)
    Test: X=(10000, 28, 28), y=(10000,)

A plot is then created showing a grid of examples of handwritten images in the training dataset.

We can train a CNN model to classify the images in the MNIST dataset.

Note that the images are arrays of grayscale pixel data; therefore, we must add a channel dimension to the data before we can use the images as input to the model. The reason is that CNN models expect images in a channels-last format, that is each example to the network has the dimensions [rows, columns, channels], where channels represent the color channels of the image data.
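To picture what adding a channel dimension does, the sketch below uses nested Python lists in place of an array. The tiny 2×2 image and the helper names are assumptions for illustration; the same sketch also rescales the pixel values from 0-255 to 0-1, which is a common preparation step for image data.

```python
# a hypothetical 2x2 grayscale image with pixel values in 0-255
image = [[0, 255], [128, 64]]

# wrap each pixel in its own list to get [rows, columns, channels]
def add_channel_and_scale(image, max_value=255.0):
    return [[[pixel / max_value] for pixel in row] for row in image]

# report the dimensions of a regular nested list
def shape(nested):
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

prepared = add_channel_and_scale(image)
print(shape(image), '->', shape(prepared))
```

The pixel values themselves are unchanged apart from the scaling; only the way they are nested differs.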

It is also a good idea to scale the pixel values from the default range of 0-255 to 0-1 when training a CNN. For more on scaling pixel values, see the tutorial:

The complete example of fitting and evaluating a CNN model on the MNIST dataset is listed below.

    # example of a cnn for image classification
    from numpy import asarray
    from numpy import unique
    from numpy import argmax
    from tensorflow.keras.datasets.mnist import load_data
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.layers import Conv2D
    from tensorflow.keras.layers import MaxPool2D
    from tensorflow.keras.layers import Flatten
    from tensorflow.keras.layers import Dropout
    # load dataset
    (x_train, y_train), (x_test, y_test) = load_data()
    # reshape data to have a single channel
    x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], x_train.shape[2], 1))
    x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], x_test.shape[2], 1))
    # determine the shape of the input images
    in_shape = x_train.shape[1:]
    # determine the number of classes
    n_classes = len(unique(y_train))
    print(in_shape, n_classes)
    # normalize pixel values
    x_train = x_train.astype('float32') / 255.0
    x_test = x_test.astype('float32') / 255.0
    # define model
    model = Sequential()
    model.add(Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', input_shape=in_shape))
    model.add(MaxPool2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dropout(0.5))
    model.add(Dense(n_classes, activation='softmax'))
    # define loss and optimizer
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # fit the model
    model.fit(x_train, y_train, epochs=10, batch_size=128, verbose=0)
    # evaluate the model
    loss, acc = model.evaluate(x_test, y_test, verbose=0)
    print('Accuracy: %.3f' % acc)
    # make a prediction
    image = x_train[0]
    yhat = model.predict(asarray([image]))
    print('Predicted: class=%d' % argmax(yhat))

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single image.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

First, the shape of each image is reported along with the number of classes; we can see that each image is 28×28 pixels and there are 10 classes as we expected.

In this case, we can see that the model achieved a classification accuracy of about 98 percent on the test dataset. We can then see that the model predicted class 5 for the first image in the training set.

    (28, 28, 1) 10
    Accuracy: 0.987
    Predicted: class=5
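It can be useful to trace by hand how the image shape changes as it passes through the layers of this model. The sketch below does the arithmetic, assuming the tf.keras defaults (no padding and a stride of 1 for *Conv2D*, and a stride equal to the pool size for *MaxPool2D*); the helper `output_size` is my own.

```python
# output width or height of a convolution or pooling window without padding
def output_size(size, window, stride=1):
    return (size - window) // stride + 1

rows = cols = 28
# Conv2D(32, (3,3)): a 3x3 window with stride 1 gives 26x26 feature maps
rows, cols = output_size(rows, 3), output_size(cols, 3)
# MaxPool2D((2, 2)): a 2x2 window with stride 2 gives 13x13 feature maps
rows, cols = output_size(rows, 2, 2), output_size(cols, 2, 2)
# Flatten: 32 feature maps of 13x13 become one long vector
flat = rows * cols * 32
print(rows, cols, flat)
```

You can confirm the same numbers by calling *model.summary()* on the model.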

Recurrent Neural Networks, or RNNs for short, are designed to operate upon sequences of data.

They have proven to be very effective for natural language processing problems where sequences of text are provided as input to the model. RNNs have also seen some modest success for time series forecasting and speech recognition.

The most popular type of RNN is the Long Short-Term Memory network, or LSTM for short. LSTMs can be used in a model to accept a sequence of input data and make a prediction, such as assign a class label or predict a numerical value like the next value or values in the sequence.

We will use the car sales dataset to demonstrate an LSTM RNN for univariate time series forecasting.

This problem involves predicting the number of car sales per month.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

We will frame the problem to take a window of the last five months of data to predict the current month’s data.

To achieve this, we will define a new function named *split_sequence()* that will split the input sequence into windows of data appropriate for fitting a supervised learning model, like an LSTM.

For example, if the sequence was:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Then the samples for training the model will look like:

    Input            Output
    1, 2, 3, 4, 5    6
    2, 3, 4, 5, 6    7
    3, 4, 5, 6, 7    8
    ...
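The windowing shown above can be reproduced with a few lines of plain Python. This is a simplified sketch with a hypothetical helper named `make_windows`; it uses lists only to make the input and output pairs easy to inspect.

```python
# split a sequence into (input window, next value) pairs
def make_windows(sequence, n_steps):
    samples = []
    for i in range(len(sequence) - n_steps):
        # n_steps values as input, the value that follows as output
        samples.append((sequence[i:i + n_steps], sequence[i + n_steps]))
    return samples

windows = make_windows(list(range(1, 11)), n_steps=5)
for inputs, output in windows:
    print(inputs, output)
```

A sequence of 10 values with a window of 5 gives 5 samples, because the first 5 values have no earlier window to predict them from.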

We will use the last 12 months of data as the test dataset.

LSTMs expect each sample in the dataset to have two dimensions; the first is the number of time steps (in this case it is 5), and the second is the number of observations per time step (in this case it is 1).
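The shapes of the train and test sets follow directly from the window size and the size of the test set. The sketch below works through the arithmetic, assuming a series of 108 monthly observations, which is consistent with the shapes reported when the complete example is run.

```python
# assumed series length (consistent with the shapes reported by the complete example)
n_observations = 108
n_steps = 5
n_test = 12
# each sample needs n_steps prior values, so the first n_steps values yield no sample
n_samples = n_observations - n_steps
n_train = n_samples - n_test
# [samples, timesteps, features] with one observation per time step
train_shape = (n_train, n_steps, 1)
test_shape = (n_test, n_steps, 1)
print(train_shape, test_shape)
# a single window reshaped to [timesteps, features]
window = [[value] for value in [1, 2, 3, 4, 5]]
print(window)
```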

Because it is a regression type problem, we will use a linear activation function (no activation function) in the output layer and optimize the mean squared error loss function. We will also evaluate the model using the mean absolute error (MAE) metric.

The complete example of fitting and evaluating an LSTM for a univariate time series forecasting problem is listed below.

    # lstm for time series forecasting
    from numpy import sqrt
    from numpy import asarray
    from pandas import read_csv
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.layers import LSTM
    # split a univariate sequence into samples
    def split_sequence(sequence, n_steps):
        X, y = list(), list()
        for i in range(len(sequence)):
            # find the end of this pattern
            end_ix = i + n_steps
            # check if we are beyond the sequence
            if end_ix > len(sequence)-1:
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return asarray(X), asarray(y)
    # load the dataset as a series (the squeeze argument to read_csv was removed in pandas 2.0)
    path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
    df = read_csv(path, header=0, index_col=0).squeeze('columns')
    # retrieve the values
    values = df.values.astype('float32')
    # specify the window size
    n_steps = 5
    # split into samples
    X, y = split_sequence(values, n_steps)
    # reshape into [samples, timesteps, features]
    X = X.reshape((X.shape[0], X.shape[1], 1))
    # split into train/test
    n_test = 12
    X_train, X_test, y_train, y_test = X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    # define model
    model = Sequential()
    model.add(LSTM(100, activation='relu', kernel_initializer='he_normal', input_shape=(n_steps,1)))
    model.add(Dense(50, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(50, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1))
    # compile the model
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    # fit the model
    model.fit(X_train, y_train, epochs=350, batch_size=32, verbose=2, validation_data=(X_test, y_test))
    # evaluate the model
    mse, mae = model.evaluate(X_test, y_test, verbose=0)
    print('MSE: %.3f, RMSE: %.3f, MAE: %.3f' % (mse, sqrt(mse), mae))
    # make a prediction
    row = asarray([18024.0, 16722.0, 14385.0, 21342.0, 17180.0]).reshape((1, n_steps, 1))
    yhat = model.predict(row)
    print('Predicted: %.3f' % yhat[0][0])

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single example.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

**What results did you get?** Can you change the model to do better?

Post your findings to the comments below.

First, the shape of the train and test datasets is displayed, confirming that the last 12 examples are used for model evaluation.

In this case, the model achieved an MAE of about 2,800 and predicted the next value in the sequence from the test set as 13,199, where the expected value is 14,577 (pretty close).

    (91, 5, 1) (12, 5, 1) (91,) (12,)
    MSE: 12755421.000, RMSE: 3571.473, MAE: 2856.084
    Predicted: 13199.325

**Note**: it is good practice to scale the data and make the series stationary prior to fitting the model. I recommend this as an extension in order to achieve better performance. For more on preparing time series data for modeling, see the tutorial:

In this section, you will discover how to use some of the slightly more advanced model features, such as reviewing learning curves and saving models for later use.

The architecture of deep learning models can quickly become large and complex.

As such, it is important to have a clear idea of the connections and data flow in your model. This is especially important if you are using the functional API to ensure you have indeed connected the layers of the model in the way you intended.

There are two tools you can use to visualize your model: a text description and a plot.

A text description of your model can be displayed by calling the summary() function on your model.

The example below defines a small model with three layers and then summarizes the structure.

    # example of summarizing a model
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(8,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1, activation='sigmoid'))
    # summarize the model
    model.summary()

Running the example prints a summary of each layer, as well as a total summary.

This is an invaluable diagnostic for checking the output shapes and number of parameters (weights) in your model.

    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    dense (Dense)                (None, 10)                90
    _________________________________________________________________
    dense_1 (Dense)              (None, 8)                 88
    _________________________________________________________________
    dense_2 (Dense)              (None, 1)                 9
    =================================================================
    Total params: 187
    Trainable params: 187
    Non-trainable params: 0
    _________________________________________________________________
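The parameter counts in the summary can be checked by hand: each node in a *Dense* layer has one weight per input plus one bias. The sketch below repeats that calculation for the layer sizes used in this example; the helper `dense_params` is my own.

```python
# weights (inputs x nodes) plus one bias per node
def dense_params(n_inputs, n_nodes):
    return n_inputs * n_nodes + n_nodes

# 8 inputs, then layers with 10, 8, and 1 nodes
layer_sizes = [8, 10, 8, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))
```

The result matches the 90, 88, and 9 parameters reported for the three layers and the total of 187.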

You can create a plot of your model by calling the plot_model() function.

This will create an image file that contains a box and line diagram of the layers in your model.

The example below creates a small three-layer model and saves a plot of the model architecture to ‘*model.png*‘ that includes input and output shapes.

    # example of plotting a model
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.utils import plot_model
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(8,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1, activation='sigmoid'))
    # plot the model
    plot_model(model, 'model.png', show_shapes=True)

Running the example creates a plot of the model showing a box for each layer with shape information, and arrows that connect the layers, showing the flow of data through the network.

Learning curves are a plot of neural network model performance over time, such as calculated at the end of each training epoch.

Plots of learning curves provide insight into the learning dynamics of the model, such as whether the model is learning well, whether it is underfitting the training dataset, or whether it is overfitting the training dataset.

For a gentle introduction to learning curves and how to use them to diagnose learning dynamics of models, see the tutorial:

You can easily create learning curves for your deep learning models.

First, you must update your call to the fit function to include reference to a validation dataset. This is a portion of the training set not used to fit the model, and is instead used to evaluate the performance of the model during training.

You can split the data manually and specify the *validation_data* argument, or you can use the *validation_split* argument and specify a percentage split of the training dataset and let the API perform the split for you. The latter is simpler for now.

The fit function will return a *history* object that contains a trace of performance metrics recorded at the end of each training epoch. This includes the chosen loss function and each configured metric, such as accuracy, and each loss and metric is calculated for the training and validation datasets.

A learning curve is a plot of the loss on the training dataset and the validation dataset. We can create this plot from the *history* object using the Matplotlib library.

The example below fits a small neural network on a synthetic binary classification problem. A validation split of 30 percent is used to evaluate the model during training, and the cross-entropy loss on the train and validation datasets is then graphed using a line plot.

    # example of plotting learning curves
    from sklearn.datasets import make_classification
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import SGD
    from matplotlib import pyplot
    # create the dataset
    X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    sgd = SGD(learning_rate=0.001, momentum=0.8)
    model.compile(optimizer=sgd, loss='binary_crossentropy')
    # fit the model
    history = model.fit(X, y, epochs=100, batch_size=32, verbose=0, validation_split=0.3)
    # plot learning curves
    pyplot.title('Learning Curves')
    pyplot.xlabel('Epoch')
    pyplot.ylabel('Cross Entropy')
    pyplot.plot(history.history['loss'], label='train')
    pyplot.plot(history.history['val_loss'], label='val')
    pyplot.legend()
    pyplot.show()

Running the example fits the model on the dataset. At the end of the run, the *history* object is returned and used as the basis for creating the line plot.

The cross-entropy loss for the training dataset is accessed via the ‘*loss*‘ key and the loss on the validation dataset is accessed via the ‘*val_loss*‘ key on the history attribute of the history object.
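As a rough sketch of what else can be done with these keys, the snippet below uses a plain dictionary with made-up loss values as a stand-in for *history.history* and finds the epoch with the lowest validation loss. The numbers are invented for illustration and are not results from the example above.

```python
# stand-in for history.history; the values are invented for illustration
history_dict = {
    'loss':     [0.90, 0.70, 0.55, 0.45, 0.38, 0.33],
    'val_loss': [0.92, 0.75, 0.62, 0.58, 0.60, 0.66],
}

val_loss = history_dict['val_loss']
# index of the epoch with the lowest validation loss (0-based)
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
# gap between validation and training loss at the final epoch
final_gap = val_loss[-1] - history_dict['loss'][-1]
print('Best epoch: %d, val_loss=%.2f' % (best_epoch + 1, val_loss[best_epoch]))
print('Final train/val gap: %.2f' % final_gap)
```

A validation loss that starts rising while the training loss keeps falling, as in these invented values, is the usual sign that the model has begun to overfit.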

Training and evaluating models is great, but we may want to use a model later without retraining it each time.

This can be achieved by saving the model to file and later loading it and using it to make predictions.

The model is saved by calling the *save()* function on the model. It can be loaded later using the load_model() function.

The model is saved in H5 format, an efficient array storage format. As such, you must ensure that the h5py library is installed on your workstation. This can be achieved using *pip*; for example:

pip install h5py

The example below fits a simple model on a synthetic binary classification problem and then saves the model file.

    # example of saving a fit model
    from sklearn.datasets import make_classification
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import SGD
    # create the dataset
    X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1)
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    sgd = SGD(learning_rate=0.001, momentum=0.8)
    model.compile(optimizer=sgd, loss='binary_crossentropy')
    # fit the model
    model.fit(X, y, epochs=100, batch_size=32, verbose=0, validation_split=0.3)
    # save model to file
    model.save('model.h5')

Running the example fits the model and saves it to file with the name ‘*model.h5*‘.

We can then load the model and use it to make a prediction, or continue training it, or do whatever we wish with it.

The example below loads the model and uses it to make a prediction.

    # example of loading a saved model
    from numpy import asarray
    from sklearn.datasets import make_classification
    from tensorflow.keras.models import load_model
    # create the dataset
    X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1)
    # load the model from file
    model = load_model('model.h5')
    # make a prediction for a single row, passed as a 2D array of shape (1, 4)
    row = [1.91518414, 1.14995454, -1.52847073, 0.79430654]
    yhat = model.predict(asarray([row]))
    # yhat has shape (1, 1), so index the single value before formatting
    print('Predicted: %.3f' % yhat[0][0])

Running the example loads the model from file, then uses it to make a prediction on a new row of data and prints the result.

Predicted: 0.831

In this section, you will discover some of the techniques that you can use to improve the performance of your deep learning models.

A big part of improving deep learning performance involves avoiding overfitting by slowing down the learning process or stopping the learning process at the right time.

Dropout is a clever regularization method that reduces overfitting of the training dataset and makes the model more robust.

This is achieved during training, where some number of layer outputs are randomly ignored or “*dropped out*.” This has the effect of making the layer look like – and be treated like – a layer with a different number of nodes and connectivity to the prior layer.

Dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.

For more on how dropout works, see this tutorial:

- A Gentle Introduction to Dropout for Regularizing Deep Neural Networks

You can add dropout to your models as a new layer prior to the layer that you want to have input connections dropped-out.

This involves adding a layer called Dropout() that takes an argument specifying the probability that each output from the previous layer is dropped. For example, 0.4 means that 40 percent of inputs will be dropped on each update to the model.
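To make the idea concrete, here is a minimal NumPy sketch of what a dropout layer does to its inputs during training. It is not the Keras implementation; the function name `apply_dropout` and the fixed seed are assumptions for illustration. It uses the common "inverted dropout" convention, in which the surviving values are scaled up by 1 / (1 - rate) so that the expected sum of the inputs is unchanged.

```python
import numpy as np

def apply_dropout(x, rate, rng):
    # keep each value with probability (1 - rate)
    keep_mask = rng.random(x.shape) >= rate
    # zero the dropped values and rescale the survivors
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(1)
x = np.ones((1000, 10))
dropped = apply_dropout(x, rate=0.4, rng=rng)
print('Fraction dropped: %.2f' % np.mean(dropped == 0))
print('Mean activation: %.2f' % dropped.mean())
```

The fraction of zeroed values should be close to the dropout rate, and the mean activation should stay close to the original value of 1.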

You can add Dropout layers in MLP, CNN, and RNN models, although there are also specialized versions of dropout for use with CNN and RNN models that you might also want to explore.

The example below fits a small neural network model on a synthetic binary classification problem.

A dropout layer with 50 percent dropout is inserted between the first hidden layer and the output layer.

    # example of using dropout
    from sklearn.datasets import make_classification
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.layers import Dropout
    from matplotlib import pyplot
    # create the dataset
    X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # fit the model
    model.fit(X, y, epochs=100, batch_size=32, verbose=0)

The scale and distribution of inputs to a layer can greatly impact how easy or quickly that layer can be trained.

This is generally why it is a good idea to scale input data prior to modeling it with a neural network model.

Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.
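The standardization step can be sketched in a few lines of NumPy. This is a simplified illustration of the training-time calculation only, under the assumption that statistics are computed per feature across the mini-batch; the function name `batch_norm` and the example data are made up, and a real layer also learns the scale and shift values and tracks running statistics for use at prediction time.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    # statistics are computed per feature across the mini-batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # eps is a small constant that avoids division by zero
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta stand in for the learned scale and shift
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
# a mini-batch of 32 rows and 3 features with very different scales
batch = rng.normal(loc=[0.0, 50.0, -5.0], scale=[1.0, 10.0, 0.5], size=(32, 3))
normed = batch_norm(batch)
print(normed.mean(axis=0).round(3))
print(normed.std(axis=0).round(3))
```

After the transform, each feature should have a mean close to zero and a standard deviation close to one, regardless of its original scale.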

For more on how batch normalization works, see this tutorial:

- A Gentle Introduction to Batch Normalization for Deep Neural Networks

You can use batch normalization in your network by adding a batch normalization layer prior to the layer that you wish to have standardized inputs. You can use batch normalization with MLP, CNN, and RNN models.

This can be achieved by adding the BatchNormalization layer directly.

The example below defines a small MLP network for a binary classification prediction problem with a batch normalization layer between the first hidden layer and the output layer.

    # example of using batch normalization
    from sklearn.datasets import make_classification
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.layers import BatchNormalization
    from matplotlib import pyplot
    # create the dataset
    X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # fit the model
    model.fit(X, y, epochs=100, batch_size=32, verbose=0)

Also, tf.keras has a range of other normalization layers you might like to explore; see:

Neural networks are challenging to train.

Too little training and the model is underfit; too much training and the model overfits the training dataset. Both cases result in a model that is less effective than it could be.

One approach to solving this problem is to use early stopping. This involves monitoring the loss on the training dataset and a validation dataset (a subset of the training set not used to fit the model). As soon as loss for the validation set starts to show signs of overfitting, the training process can be stopped.

For more on early stopping, see the tutorial:

- A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks

Early stopping can be used with your model by first ensuring that you have a validation dataset. You can define the validation dataset manually via the *validation_data* argument to the *fit()* function, or you can use the *validation_split* and specify the amount of the training dataset to hold back for validation.

You can then define an EarlyStopping callback and instruct it on which performance measure to monitor, such as ‘*val_loss*‘ for loss on the validation dataset, and the number of epochs without improvement to tolerate before taking action, e.g. 5.

This configured EarlyStopping callback can then be provided to the *fit()* function via the “*callbacks*” argument that takes a list of callbacks.

This allows you to set the number of epochs to a large number and be confident that training will end as soon as the model starts overfitting. You might also like to create a learning curve to discover more insights into the learning dynamics of the run and when training was halted.
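The patience logic described above can be sketched in plain Python. This is a simplified illustration rather than the Keras source: the function name `early_stopping_epoch` is hypothetical, the validation losses are invented, and options such as a minimum improvement threshold are ignored.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) as 0-based indices, or
    (None, best_epoch) if the patience is never exhausted."""
    best_loss = float('inf')
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            # no improvement: count the epoch against the patience
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# invented validation losses for illustration
losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.90]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
print(early_stopping_epoch(losses, patience=5))  # (None, 2)
```

With a patience of 3, training would stop at the sixth epoch (index 5) because the best loss at index 2 was not beaten for three epochs in a row; with a patience of 5, the run finishes without stopping early.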

The example below demonstrates a small neural network on a synthetic binary classification problem that uses early stopping to halt training as soon as the model starts overfitting (after about 50 epochs).

    # example of using early stopping
    from sklearn.datasets import make_classification
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.callbacks import EarlyStopping
    # create the dataset
    X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # configure early stopping
    es = EarlyStopping(monitor='val_loss', patience=5)
    # fit the model
    history = model.fit(X, y, epochs=200, batch_size=32, verbose=0, validation_split=0.3, callbacks=[es])

The tf.keras API provides a number of callbacks that you might like to explore; you can learn more here:

This section provides more resources on the topic if you are looking to go deeper.

- How to Control the Stability of Training Neural Networks With the Batch Size
- A Gentle Introduction to the Rectified Linear Unit (ReLU)
- Difference Between Classification and Regression in Machine Learning
- How to Manually Scale Image Pixel Data for Deep Learning
- 4 Common Machine Learning Data Transforms for Time Series Forecasting
- How to use Learning Curves to Diagnose Machine Learning Model Performance
- A Gentle Introduction to Dropout for Regularizing Deep Neural Networks
- A Gentle Introduction to Batch Normalization for Deep Neural Networks
- A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks

- Deep Learning, 2016.

- Install TensorFlow 2 Guide.
- TensorFlow Core: Keras
- Tensorflow Core: Keras Overview Guide
- The Keras functional API in TensorFlow
- Save and load models
- Normalization Layers Guide.

In this tutorial, you discovered a step-by-step guide to developing deep learning models in TensorFlow using the tf.keras API.

Specifically, you learned:

- The difference between Keras and tf.keras and how to install and confirm TensorFlow is working.
- The 5-step life-cycle of tf.keras models and how to use the sequential and functional APIs.
- How to develop MLP, CNN, and RNN models with tf.keras for regression, classification, and time series forecasting.
- How to use the advanced features of the tf.keras API to inspect and diagnose your model.
- How to improve the performance of your tf.keras model by reducing overfitting and accelerating training.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

The post TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras appeared first on Machine Learning Mastery.

The post 3 Ways to Encode Categorical Variables for Deep Learning appeared first on Machine Learning Mastery.

This means that if your data contains categorical data, you must encode it to numbers before you can fit and evaluate a model.

The two most popular techniques are an **integer encoding** and a **one hot encoding**, although a newer technique called **learned embedding** may provide a useful middle ground between these two methods.

In this tutorial, you will discover how to encode categorical data when developing neural network models in Keras.

After completing this tutorial, you will know:

- The challenge of working with categorical data when using machine learning and deep learning models.
- How to integer encode and one hot encode categorical variables for modeling.
- How to learn an embedding distributed representation as part of a neural network for categorical variables.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

This tutorial is divided into five parts; they are:

- The Challenge With Categorical Data
- Breast Cancer Categorical Dataset
- How to Ordinal Encode Categorical Data
- How to One Hot Encode Categorical Data
- How to Use a Learned Embedding for Categorical Data

A categorical variable is a variable whose values are labels rather than numbers.

For example, the variable may be “*color*” and may take on the values “*red*,” “*green*,” and “*blue*.”

Sometimes, the categorical data may have an ordered relationship between the categories, such as “*first*,” “*second*,” and “*third*.” This type of categorical data is referred to as ordinal and the additional ordering information can be useful.

Machine learning algorithms and deep learning neural networks require that input and output variables are numbers.

This means that categorical data must be encoded to numbers before we can use it to fit and evaluate a model.

There are many ways to encode categorical variables for modeling, although the three most common are as follows:

- **Integer Encoding**: Where each unique label is mapped to an integer.
- **One Hot Encoding**: Where each label is mapped to a binary vector.
- **Learned Embedding**: Where a distributed representation of the categories is learned.

We will take a closer look at how to encode categorical data for training a deep learning neural network in Keras using each one of these methods.

As the basis of this tutorial, we will use the so-called “Breast cancer” dataset that has been widely studied in machine learning since the 1980s.

The dataset classifies breast cancer patient data as either a recurrence or no recurrence of cancer. There are 286 examples and nine input variables. It is a binary classification problem.

A reasonable classification accuracy score on this dataset is between 68% and 73%. We will aim for this region, but note that the models in this tutorial are not optimized: *they are designed to demonstrate encoding schemes*.

You can download the dataset and save the file as “*breast-cancer.csv*” in your current working directory.

Looking at the data, we can see that all nine input variables are categorical.

Specifically, all variables are quoted strings; some are ordinal and some are not.

    '40-49','premeno','15-19','0-2','yes','3','right','left_up','no','recurrence-events'
    '50-59','ge40','15-19','0-2','no','1','right','central','no','no-recurrence-events'
    '50-59','ge40','35-39','0-2','no','2','left','left_low','no','recurrence-events'
    '40-49','premeno','35-39','0-2','yes','3','right','left_low','yes','no-recurrence-events'
    '40-49','premeno','30-34','3-5','yes','2','left','right_up','no','recurrence-events'
    ...

We can load this dataset into memory using the Pandas library.

    ...
    # load the dataset as a pandas DataFrame
    data = read_csv(filename, header=None)
    # retrieve numpy array
    dataset = data.values

Once loaded, we can split the columns into input (*X*) and output (*y*) for modeling.

    ...
    # split into input (X) and output (y) variables
    X = dataset[:, :-1]
    y = dataset[:,-1]

Finally, we can force all fields in the input data to be strings, just in case Pandas tried to map some automatically to numbers (it does try).

We can also reshape the output variable to be one column (e.g. a 2D shape).

    ...
    # format all fields as string
    X = X.astype(str)
    # reshape target to be a 2d array
    y = y.reshape((len(y), 1))

We can tie all of this together into a helpful function that we can reuse later.

    # load the dataset
    def load_dataset(filename):
        # load the dataset as a pandas DataFrame
        data = read_csv(filename, header=None)
        # retrieve numpy array
        dataset = data.values
        # split into input (X) and output (y) variables
        X = dataset[:, :-1]
        y = dataset[:,-1]
        # format all fields as string
        X = X.astype(str)
        # reshape target to be a 2d array
        y = y.reshape((len(y), 1))
        return X, y

Once loaded, we can split the data into training and test sets so that we can fit and evaluate a deep learning model.

We will use the train_test_split() function from scikit-learn and use 67% of the data for training and 33% for testing.

    ...
    # load the dataset
    X, y = load_dataset('breast-cancer.csv')
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

Tying all of these elements together, the complete example of loading, splitting, and summarizing the raw categorical dataset is listed below.

    # load and summarize the dataset
    from pandas import read_csv
    from sklearn.model_selection import train_test_split

    # load the dataset
    def load_dataset(filename):
        # load the dataset as a pandas DataFrame
        data = read_csv(filename, header=None)
        # retrieve numpy array
        dataset = data.values
        # split into input (X) and output (y) variables
        X = dataset[:, :-1]
        y = dataset[:,-1]
        # format all fields as string
        X = X.astype(str)
        # reshape target to be a 2d array
        y = y.reshape((len(y), 1))
        return X, y

    # load the dataset
    X, y = load_dataset('breast-cancer.csv')
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    # summarize
    print('Train', X_train.shape, y_train.shape)
    print('Test', X_test.shape, y_test.shape)

Running the example reports the size of the input and output elements of the train and test sets.

We can see that we have 191 examples for training and 95 for testing.

    Train (191, 9) (191, 1)
    Test (95, 9) (95, 1)
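The reported sizes can be reproduced with a little arithmetic. The sketch below assumes the test set size is rounded up to a whole number of rows and the training set takes whatever is left, which matches the shapes printed above.

```python
from math import ceil

n_samples = 286
test_size = 0.33
# assumption: the test set size is rounded up; the training set takes the rest
n_test = ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 191 95
```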

Now that we are familiar with the dataset, let’s look at how we can encode it for modeling.

An ordinal encoding involves mapping each unique label to an integer value.

As such, it is sometimes referred to simply as an integer encoding.

This type of encoding is really only appropriate if there is a known relationship between the categories.

This relationship does exist for some of the variables in the dataset, and ideally, this should be harnessed when preparing the data.

In this case, we will ignore any possible existing ordinal relationship and assume all variables are categorical. It can still be helpful to use an ordinal encoding, at least as a point of reference with other encoding schemes.

We can use the OrdinalEncoder() from scikit-learn to encode each variable to integers. This is a flexible class and does allow the order of the categories to be specified as arguments if any such order is known.

**Note**: I will leave it as an exercise for you to update the example below to try specifying the order for those variables that have a natural ordering and see if it has an impact on model performance.

The best practice when encoding variables is to fit the encoding on the training dataset, then apply it to the train and test datasets.
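The "fit on train, apply to both" pattern can be sketched in plain Python before turning to scikit-learn. The helper names `fit_integer_encoding` and `apply_integer_encoding` and the small color lists are made up for illustration; sorting the labels is an assumption used here to keep the mapping deterministic.

```python
def fit_integer_encoding(values):
    # sort the unique labels so the mapping is deterministic
    return {label: index for index, label in enumerate(sorted(set(values)))}

def apply_integer_encoding(mapping, values):
    # a label that was not seen during fitting raises a KeyError
    return [mapping[label] for label in values]

train = ['red', 'green', 'blue', 'green']
test = ['blue', 'red']

# the mapping is learned from the training data only
mapping = fit_integer_encoding(train)
print(mapping)                                 # {'blue': 0, 'green': 1, 'red': 2}
print(apply_integer_encoding(mapping, train))  # [2, 1, 0, 1]
print(apply_integer_encoding(mapping, test))   # [0, 2]
```

Learning the mapping from the training data alone means that nothing about the test set leaks into the data preparation, at the cost of needing a plan for labels that only appear in the test set.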

The function below, named *prepare_inputs()*, takes the input data for the train and test sets and encodes it using an ordinal encoding.

    # prepare input data
    def prepare_inputs(X_train, X_test):
        oe = OrdinalEncoder()
        oe.fit(X_train)
        X_train_enc = oe.transform(X_train)
        X_test_enc = oe.transform(X_test)
        return X_train_enc, X_test_enc

We also need to prepare the target variable.

It is a binary classification problem, so we need to map the two class labels to 0 and 1.

This is a type of ordinal encoding, and scikit-learn provides the LabelEncoder class specifically designed for this purpose. We could just as easily use the OrdinalEncoder and achieve the same result, although the *LabelEncoder* is designed for encoding a single variable.

The *prepare_targets()* function integer encodes the output data for the train and test sets.

    # prepare target
    def prepare_targets(y_train, y_test):
        le = LabelEncoder()
        le.fit(y_train)
        y_train_enc = le.transform(y_train)
        y_test_enc = le.transform(y_test)
        return y_train_enc, y_test_enc

We can call these functions to prepare our data.

    ...
    # prepare input data
    X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
    # prepare output data
    y_train_enc, y_test_enc = prepare_targets(y_train, y_test)

We can now define a neural network model.

We will use the same general model in all of these examples. Specifically, a MultiLayer Perceptron (MLP) neural network with one hidden layer with 10 nodes, and one node in the output layer for making binary classifications.

Without going into too much detail, the code below defines the model, fits it on the training dataset, and then evaluates it on the test dataset.

    ...
    # define the model
    model = Sequential()
    model.add(Dense(10, input_dim=X_train_enc.shape[1], activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X_train_enc, y_train_enc, epochs=100, batch_size=16, verbose=2)
    # evaluate the keras model
    _, accuracy = model.evaluate(X_test_enc, y_test_enc, verbose=0)
    print('Accuracy: %.2f' % (accuracy*100))

If you are new to developing neural networks in Keras, I recommend this tutorial:

Tying all of this together, the complete example of preparing the data with an ordinal encoding and fitting and evaluating a neural network on the data is listed below.

    # example of ordinal encoding for a neural network
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import OrdinalEncoder
    from keras.models import Sequential
    from keras.layers import Dense

    # load the dataset
    def load_dataset(filename):
        # load the dataset as a pandas DataFrame
        data = read_csv(filename, header=None)
        # retrieve numpy array
        dataset = data.values
        # split into input (X) and output (y) variables
        X = dataset[:, :-1]
        y = dataset[:,-1]
        # format all fields as string
        X = X.astype(str)
        # reshape target to be a 2d array
        y = y.reshape((len(y), 1))
        return X, y

    # prepare input data
    def prepare_inputs(X_train, X_test):
        oe = OrdinalEncoder()
        oe.fit(X_train)
        X_train_enc = oe.transform(X_train)
        X_test_enc = oe.transform(X_test)
        return X_train_enc, X_test_enc

    # prepare target
    def prepare_targets(y_train, y_test):
        le = LabelEncoder()
        le.fit(y_train)
        y_train_enc = le.transform(y_train)
        y_test_enc = le.transform(y_test)
        return y_train_enc, y_test_enc

    # load the dataset
    X, y = load_dataset('breast-cancer.csv')
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    # prepare input data
    X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
    # prepare output data
    y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
    # define the model
    model = Sequential()
    model.add(Dense(10, input_dim=X_train_enc.shape[1], activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X_train_enc, y_train_enc, epochs=100, batch_size=16, verbose=2)
    # evaluate the keras model
    _, accuracy = model.evaluate(X_test_enc, y_test_enc, verbose=0)
    print('Accuracy: %.2f' % (accuracy*100))

Running the example will fit the model in just a few seconds on any modern hardware (no GPU required).

The loss and the accuracy of the model are reported at the end of each training epoch, and finally, the accuracy of the model on the test dataset is reported.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieved an accuracy of about 70% on the test dataset.

Not bad, given that an ordinal relationship only exists for some of the input variables, and for those where it does, it was not honored in the encoding.

    ...
    Epoch 95/100
     - 0s - loss: 0.5349 - acc: 0.7696
    Epoch 96/100
     - 0s - loss: 0.5330 - acc: 0.7539
    Epoch 97/100
     - 0s - loss: 0.5316 - acc: 0.7592
    Epoch 98/100
     - 0s - loss: 0.5302 - acc: 0.7696
    Epoch 99/100
     - 0s - loss: 0.5291 - acc: 0.7644
    Epoch 100/100
     - 0s - loss: 0.5277 - acc: 0.7644
    Accuracy: 70.53

This provides a good starting point when working with categorical data.

A better and more general approach is to use a one hot encoding.

A one hot encoding is appropriate for categorical data where no relationship exists between categories.

It involves representing each categorical variable with a binary vector that has one element for each unique label and marking the class label with a 1 and all other elements 0.

For example, if our variable was “*color*” and the labels were “*red*,” “*green*,” and “*blue*,” we would encode each of these labels as a three-element binary vector as follows:

- Red: [1, 0, 0]
- Green: [0, 1, 0]
- Blue: [0, 0, 1]

Then each label in the dataset would be replaced with a vector (one column becomes three). This is done for all categorical variables so that our nine input variables or columns become 43 in the case of the breast cancer dataset.
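The color example above can be written out as a small plain-Python sketch. The helper name `one_hot` and the sample column are made up for illustration, and the category order is fixed by hand to match the list above; a library encoder may choose its own column order (for example, sorted labels).

```python
def one_hot(label, categories):
    # one element per category; 1 marks the position of the label
    return [1 if label == category else 0 for category in categories]

# category order chosen to match the example in the text
categories = ['red', 'green', 'blue']
column = ['green', 'blue', 'red', 'green']
encoded = [one_hot(label, categories) for label in column]
for label, vector in zip(column, encoded):
    print(label, vector)
```

Each row of the single input column has become a row of three binary values, which is how one column becomes three.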

The scikit-learn library provides the OneHotEncoder to automatically one hot encode one or more variables.

The *prepare_inputs()* function below provides a drop-in replacement function for the example in the previous section. Instead of using an *OrdinalEncoder*, it uses a *OneHotEncoder*.

    # prepare input data
    def prepare_inputs(X_train, X_test):
        ohe = OneHotEncoder()
        ohe.fit(X_train)
        X_train_enc = ohe.transform(X_train)
        X_test_enc = ohe.transform(X_test)
        return X_train_enc, X_test_enc

Tying this together, the complete example of one hot encoding the breast cancer categorical dataset and modeling it with a neural network is listed below.

    # example of one hot encoding for a neural network
    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import OneHotEncoder
    from keras.models import Sequential
    from keras.layers import Dense

    # load the dataset
    def load_dataset(filename):
        # load the dataset as a pandas DataFrame
        data = read_csv(filename, header=None)
        # retrieve numpy array
        dataset = data.values
        # split into input (X) and output (y) variables
        X = dataset[:, :-1]
        y = dataset[:,-1]
        # format all fields as string
        X = X.astype(str)
        # reshape target to be a 2d array
        y = y.reshape((len(y), 1))
        return X, y

    # prepare input data
    def prepare_inputs(X_train, X_test):
        ohe = OneHotEncoder()
        ohe.fit(X_train)
        X_train_enc = ohe.transform(X_train)
        X_test_enc = ohe.transform(X_test)
        return X_train_enc, X_test_enc

    # prepare target
    def prepare_targets(y_train, y_test):
        le = LabelEncoder()
        le.fit(y_train)
        y_train_enc = le.transform(y_train)
        y_test_enc = le.transform(y_test)
        return y_train_enc, y_test_enc

    # load the dataset
    X, y = load_dataset('breast-cancer.csv')
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    # prepare input data
    X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
    # prepare output data
    y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
    # define the model
    model = Sequential()
    model.add(Dense(10, input_dim=X_train_enc.shape[1], activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X_train_enc, y_train_enc, epochs=100, batch_size=16, verbose=2)
    # evaluate the keras model
    _, accuracy = model.evaluate(X_test_enc, y_test_enc, verbose=0)
    print('Accuracy: %.2f' % (accuracy*100))

The example one hot encodes the input categorical data, and also label encodes the target variable as we did in the previous section. The same neural network model is then fit on the prepared dataset.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, the model performs reasonably well, achieving an accuracy of about 72%, close to what was seen in the previous section.

A fairer comparison would be to run each configuration 10 or 30 times and compare performance using the mean accuracy. Recall that we are more focused on how to encode categorical data in this tutorial than on getting the best score on this specific dataset.

    ...
    Epoch 95/100
     - 0s - loss: 0.3837 - acc: 0.8272
    Epoch 96/100
     - 0s - loss: 0.3823 - acc: 0.8325
    Epoch 97/100
     - 0s - loss: 0.3814 - acc: 0.8325
    Epoch 98/100
     - 0s - loss: 0.3795 - acc: 0.8325
    Epoch 99/100
     - 0s - loss: 0.3788 - acc: 0.8325
    Epoch 100/100
     - 0s - loss: 0.3773 - acc: 0.8325
    Accuracy: 72.63

Ordinal and one hot encoding are perhaps the two most popular methods.

A newer technique, called a learned embedding, is similar to one hot encoding and was designed for use with neural networks.

A learned embedding, or simply an “*embedding*,” is a distributed representation for categorical data.

Each category is mapped to a distinct vector, and the properties of the vector are adapted or learned while training a neural network. The vector space provides a projection of the categories, allowing those categories that are close or related to cluster together naturally.

This provides both the benefits of an ordinal relationship by allowing any such relationships to be learned from data, and a one hot encoding in providing a vector representation for each category. Unlike one hot encoding, the input vectors are not sparse (do not have lots of zeros). The downside is that it requires learning as part of the model and the creation of many more input variables (columns).

The technique was originally developed to provide a distributed representation for words, e.g. allowing similar words to have similar vector representations. As such, the technique is often referred to as a word embedding, and in the case of text data, algorithms have been developed to learn a representation independent of a neural network. For more on this topic, see the post:

An additional benefit of using an embedding is that the learned vectors that each category is mapped to can be fit in a model that has modest skill, but the vectors can be extracted and used generally as input for the category on a range of different models and applications. That is, they can be learned and reused.

Embeddings can be used in Keras via the *Embedding* layer.

For an example of learning word embeddings for text data in Keras, see the post:

One embedding layer is required for each categorical variable, and the embedding expects the categories to be ordinal encoded, although no relationship between the categories is assumed.

Each embedding also requires the number of dimensions to use for the distributed representation (vector space). It is common in natural language applications to use 50, 100, or 300 dimensions. For our small example, we will fix the number of dimensions at 10, but this is arbitrary; you should experiment with other values.

First, we can prepare the input data using an ordinal encoding.

The model we will develop will have one separate embedding for each input variable. Therefore, the model will take nine different input datasets. As such, we will split the input variables and ordinal encode (integer encoding) each separately using the *LabelEncoder* and return a list of separate prepared train and test input datasets.

The *prepare_inputs()* function below implements this, enumerating over each input variable, integer encoding each correctly using best practices, and returning lists of encoded train and test variables (or one-variable datasets) that can be used as input for our model later.

```python
# prepare input data
def prepare_inputs(X_train, X_test):
    X_train_enc, X_test_enc = list(), list()
    # label encode each column
    for i in range(X_train.shape[1]):
        le = LabelEncoder()
        le.fit(X_train[:, i])
        # encode
        train_enc = le.transform(X_train[:, i])
        test_enc = le.transform(X_test[:, i])
        # store
        X_train_enc.append(train_enc)
        X_test_enc.append(test_enc)
    return X_train_enc, X_test_enc
```
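To see what the integer encoding of a single column involves, the idea can be sketched in plain NumPy. This is a stand-in for illustration only, not the *LabelEncoder* implementation: *fit_labels()* and *transform_labels()* are hypothetical helpers, and the column values are made up.

```python
# sketch: integer encode one column, fitting on train and reusing on test
import numpy as np

def fit_labels(column):
    # sorted unique category values seen in the training data
    return np.unique(column)

def transform_labels(classes, column):
    # the position of each value in the sorted class list is its integer code
    codes = np.searchsorted(classes, column)
    # guard against categories that did not appear in the training data
    codes_clipped = np.minimum(codes, len(classes) - 1)
    if not np.array_equal(classes[codes_clipped], column):
        raise ValueError('column contains unseen categories')
    return codes

train_col = np.array(['40-49', '50-59', '30-39', '40-49'])  # made-up values
test_col = np.array(['30-39', '50-59'])
classes = fit_labels(train_col)
print(transform_labels(classes, train_col))
print(transform_labels(classes, test_col))
```

Fitting on the training column only, then applying the same mapping to the test column, keeps the integer codes consistent between the two datasets. A category that appears only in the test data cannot be mapped, which is why the sketch raises an error in that case.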

Now we can construct the model.

We must construct the model differently in this case because we will have nine input layers, with nine embeddings the outputs of which (the nine different 10-element vectors) need to be concatenated into one long vector before being passed as input to the dense layers.

We can achieve this using the functional Keras API. If you are new to the Keras functional API, see the post:

First, we can enumerate each variable and construct an input layer and connect it to an embedding layer, and store both layers in lists. We need a reference to all of the input layers when defining the model, and we need a reference to each embedding layer to concatenate them with a merge layer.

```python
...
# prepare each input head
in_layers = list()
em_layers = list()
for i in range(len(X_train_enc)):
    # calculate the number of unique inputs
    n_labels = len(unique(X_train_enc[i]))
    # define input layer
    in_layer = Input(shape=(1,))
    # define embedding layer
    em_layer = Embedding(n_labels, 10)(in_layer)
    # store layers
    in_layers.append(in_layer)
    em_layers.append(em_layer)
```

We can then merge all of the embedding layers, define the hidden layer and output layer, then define the model.

```python
...
# concat all embeddings
merge = concatenate(em_layers)
dense = Dense(10, activation='relu', kernel_initializer='he_normal')(merge)
output = Dense(1, activation='sigmoid')(dense)
model = Model(inputs=in_layers, outputs=output)
```
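The shape arithmetic behind the merge can be checked with a small NumPy sketch. It uses arrays of zeros as stand-ins for the nine embedding outputs and assumes each one has the shape (batch, 1, 10), which is what an embedding applied to a one-element input produces; the batch size of 5 is arbitrary.

```python
# sketch: nine (batch, 1, 10) embedding outputs joined into (batch, 1, 90)
import numpy as np

batch = 5        # hypothetical number of rows in a batch
n_inputs = 9     # one embedding per input variable
n_dims = 10      # embedding size used in the tutorial

# stand-ins for the nine embedding outputs, each shaped (batch, 1, 10)
outputs = [np.zeros((batch, 1, n_dims)) for _ in range(n_inputs)]

# join along the last axis, as a concatenation of the embeddings would
merged = np.concatenate(outputs, axis=-1)
print(merged.shape)
```

Because the merged tensor keeps the middle dimension of size 1, the dense layers that follow produce a three-dimensional output, which is consistent with the target being reshaped to three dimensions in the complete example.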

When using a model with multiple inputs, we will need to specify a list that has one dataset for each input, e.g. a list of nine arrays each with one column in the case of our dataset. Thankfully, this is the format we returned from our *prepare_inputs()* function.

Therefore, fitting and evaluating the model looks like it does in the previous section.

Additionally, we will plot the model by calling the *plot_model()* function and save it to file. This requires that pygraphviz and pydot are installed, which can be a pain on some systems. **If you have trouble**, just comment out the import statement and call to *plot_model()*.

```python
...
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# plot graph
plot_model(model, show_shapes=True, to_file='embeddings.png')
# fit the keras model on the dataset
model.fit(X_train_enc, y_train_enc, epochs=20, batch_size=16, verbose=2)
# evaluate the keras model
_, accuracy = model.evaluate(X_test_enc, y_test_enc, verbose=0)
print('Accuracy: %.2f' % (accuracy*100))
```

Tying this all together, the complete example of using a separate embedding for each categorical input variable in a multi-input layer model is listed below.

```python
# example of learned embedding encoding for a neural network
from numpy import unique
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import concatenate
from keras.utils import plot_model

# load the dataset
def load_dataset(filename):
    # load the dataset as a pandas DataFrame
    data = read_csv(filename, header=None)
    # retrieve numpy array
    dataset = data.values
    # split into input (X) and output (y) variables
    X = dataset[:, :-1]
    y = dataset[:,-1]
    # format all fields as string
    X = X.astype(str)
    # reshape target to be a 2d array
    y = y.reshape((len(y), 1))
    return X, y

# prepare input data
def prepare_inputs(X_train, X_test):
    X_train_enc, X_test_enc = list(), list()
    # label encode each column
    for i in range(X_train.shape[1]):
        le = LabelEncoder()
        le.fit(X_train[:, i])
        # encode
        train_enc = le.transform(X_train[:, i])
        test_enc = le.transform(X_test[:, i])
        # store
        X_train_enc.append(train_enc)
        X_test_enc.append(test_enc)
    return X_train_enc, X_test_enc

# prepare target
def prepare_targets(y_train, y_test):
    le = LabelEncoder()
    le.fit(y_train)
    y_train_enc = le.transform(y_train)
    y_test_enc = le.transform(y_test)
    return y_train_enc, y_test_enc

# load the dataset
X, y = load_dataset('breast-cancer.csv')
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# prepare input data
X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
# prepare output data
y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
# make output 3d
y_train_enc = y_train_enc.reshape((len(y_train_enc), 1, 1))
y_test_enc = y_test_enc.reshape((len(y_test_enc), 1, 1))
# prepare each input head
in_layers = list()
em_layers = list()
for i in range(len(X_train_enc)):
    # calculate the number of unique inputs
    n_labels = len(unique(X_train_enc[i]))
    # define input layer
    in_layer = Input(shape=(1,))
    # define embedding layer
    em_layer = Embedding(n_labels, 10)(in_layer)
    # store layers
    in_layers.append(in_layer)
    em_layers.append(em_layer)
# concat all embeddings
merge = concatenate(em_layers)
dense = Dense(10, activation='relu', kernel_initializer='he_normal')(merge)
output = Dense(1, activation='sigmoid')(dense)
model = Model(inputs=in_layers, outputs=output)
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# plot graph
plot_model(model, show_shapes=True, to_file='embeddings.png')
# fit the keras model on the dataset
model.fit(X_train_enc, y_train_enc, epochs=20, batch_size=16, verbose=2)
# evaluate the keras model
_, accuracy = model.evaluate(X_test_enc, y_test_enc, verbose=0)
print('Accuracy: %.2f' % (accuracy*100))
```

Running the example prepares the data as described above, fits the model, and reports the performance.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, the model performs reasonably well, matching what we saw for the one hot encoding in the previous section.

As the learned vectors were trained in a skilled model, it is possible to save them and use them as a general representation for these variables in other models that operate on the same data. This is a useful and compelling reason to explore this encoding.

```
...
Epoch 15/20
 - 0s - loss: 0.4891 - acc: 0.7696
Epoch 16/20
 - 0s - loss: 0.4845 - acc: 0.7749
Epoch 17/20
 - 0s - loss: 0.4783 - acc: 0.7749
Epoch 18/20
 - 0s - loss: 0.4763 - acc: 0.7906
Epoch 19/20
 - 0s - loss: 0.4696 - acc: 0.7906
Epoch 20/20
 - 0s - loss: 0.4660 - acc: 0.7958

Accuracy: 72.63
```

To confirm our understanding of the model, a plot is created and saved to the file embeddings.png in the current working directory.

The plot shows the nine inputs each mapped to a 10 element vector, meaning that the actual input to the model is a 90 element vector.


This section lists some common questions and answers when encoding categorical data.

Or, what if I have a mixture of categorical and ordinal data?

You will need to prepare or encode each variable (column) in your dataset separately, then concatenate all of the prepared variables back together into a single array for fitting or evaluating the model.
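As a rough illustration of preparing columns separately and joining them afterwards, consider the sketch below. The two columns, their values, and the category order are all made up for the example, and the encoding is done by hand in NumPy rather than with the scikit-learn encoders used in this tutorial.

```python
# sketch: encode an ordinal and a nominal column separately, then join them
import numpy as np

# hypothetical columns: one ordinal variable and one nominal variable
size = np.array(['small', 'large', 'medium', 'small'])
color = np.array(['red', 'blue', 'red', 'green'])

# ordinal encode with an explicit, meaningful order
size_order = ['small', 'medium', 'large']
size_enc = np.array([size_order.index(v) for v in size]).reshape(-1, 1)

# one hot encode the nominal variable
color_labels, color_codes = np.unique(color, return_inverse=True)
color_enc = np.eye(len(color_labels))[color_codes]

# concatenate the prepared columns back into a single input array
X = np.hstack([size_enc, color_enc])
print(X.shape)
```

The result has one column for the ordinal variable plus one column per category of the nominal variable.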

Or, what if I concatenate many one hot encoded vectors to create a many thousand element input vector?

You can use a one hot encoding up to thousands and tens of thousands of categories. Also, having large vectors as input sounds intimidating, but the models can generally handle it.

Try an embedding; it offers the benefit of a smaller vector space (a projection) and the representation can have more meaning.

This is unknowable.

Test each technique (and more) on your dataset with your chosen model and discover what works best for your case.

This section provides more resources on the topic if you are looking to go deeper.

- Develop Your First Neural Network in Python Step-By-Step
- Why One-Hot Encode Data in Machine Learning?
- Data Preparation for Gradient Boosting with XGBoost in Python
- What Are Word Embeddings for Text?
- How to Use Word Embedding Layers for Deep Learning with Keras
- How to Use the Keras Functional API for Deep Learning

- sklearn.model_selection.train_test_split API.
- sklearn.preprocessing.OrdinalEncoder API.
- sklearn.preprocessing.LabelEncoder API.
- Embedding Keras API.
- Visualization Keras API.

- Breast Cancer Data Set, UCI Machine Learning Repository.
- Breast Cancer Raw Dataset
- Breast Cancer Description

In this tutorial, you discovered how to encode categorical data when developing neural network models in Keras.

Specifically, you learned:

- The challenge of working with categorical data when using machine learning and deep learning models.
- How to integer encode and one hot encode categorical variables for modeling.
- How to learn an embedding distributed representation as part of a neural network for categorical variables.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post 3 Ways to Encode Categorical Variables for Deep Learning appeared first on Machine Learning Mastery.

The post What is Deep Learning? appeared first on Machine Learning Mastery.

If you are just starting out in the field of deep learning or you had some experience with neural networks some time ago, you may be confused. I know I was confused initially and so were many of my colleagues and friends who learned and used neural networks in the 1990s and early 2000s.

The leaders and experts in the field have ideas of what deep learning is and these specific and nuanced perspectives shed a lot of light on what deep learning is all about.

In this post, you will discover exactly what deep learning is by hearing from a range of experts and leaders in the field.


Let’s dive in.

Andrew Ng from Coursera and Chief Scientist at Baidu Research formally founded Google Brain, which eventually resulted in the productization of deep learning technologies across a large number of Google services.

He has spoken and written a lot about what deep learning is and is a good place to start.

In early talks on deep learning, Andrew described deep learning in the context of traditional artificial neural networks. In the 2013 talk titled “Deep Learning, Self-Taught Learning and Unsupervised Feature Learning” he described the idea of deep learning as:

Using brain simulations, hope to:

– Make learning algorithms much better and easier to use.

– Make revolutionary advances in machine learning and AI.

I believe this is our best shot at progress towards real AI

Later his comments became more nuanced.

The core of deep learning according to Andrew is that we now have fast enough computers and enough data to actually train large neural networks. When discussing why now is the time that deep learning is taking off at ExtractConf 2015 in a talk titled “What data scientists should know about deep learning“, he commented:

very large neural networks we can now have and … huge amounts of data that we have access to

He also commented on the important point that it is all about scale. That as we construct larger neural networks and train them with more and more data, their performance continues to increase. This is generally different to other machine learning techniques that reach a plateau in performance.

for most flavors of the old generations of learning algorithms … performance will plateau. … deep learning … is the first class of algorithms … that is scalable. … performance just keeps getting better as you feed them more data

He provides a nice cartoon of this in his slides:

Finally, he is clear to point out that the benefits from deep learning that we are seeing in practice come from supervised learning. From the 2015 ExtractConf talk, he commented:

almost all the value today of deep learning is through supervised learning or learning from labeled data

Earlier at a talk to Stanford University titled “Deep Learning” in 2014 he made a similar comment:

one reason that deep learning has taken off like crazy is because it is fantastic at supervised learning

Andrew often mentions that we should and will see more benefits coming from the unsupervised side of the tracks as the field matures to deal with the **abundance of unlabeled data available**.

Jeff Dean is a Wizard and Google Senior Fellow in the Systems and Infrastructure Group at Google and has been involved in, and perhaps partially responsible for, the scaling and adoption of deep learning within Google. Jeff was involved in the Google Brain project and the development of large-scale deep learning software DistBelief and later TensorFlow.

In a 2016 talk titled “Deep Learning for Building Intelligent Computer Systems” he made a comment in a similar vein, that deep learning is really all about large neural networks.

When you hear the term deep learning, just think of a large deep neural net. Deep refers to the number of layers typically and so this kind of the popular term that’s been adopted in the press. I think of them as deep neural networks generally.

He has given this talk a few times, and in a modified set of slides for the same talk, he highlights the **scalability of neural networks** indicating that results get better with more data and larger models, that in turn require more computation to train.

In addition to scalability, another often cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Yoshua Bengio is another leader in deep learning, although he began with a strong interest in the automatic feature learning that large neural networks are capable of achieving.

He describes deep learning in terms of the algorithm’s ability to discover and learn good representations using feature learning. In his 2012 paper titled “Deep Learning of Representations for Unsupervised and Transfer Learning” he commented:

Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features

An elaborated perspective of deep learning along these lines is provided in his 2009 technical report titled “Learning deep architectures for AI” where he emphasizes the importance of the hierarchy in feature learning.

Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features. Automatically learning features at multiple levels of abstraction allow a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features.

In the soon to be published book titled “Deep Learning” co-authored with Ian Goodfellow and Aaron Courville, they define deep learning in terms of the depth of the architecture of the models.

The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.

This is an important book and will likely become the definitive resource for the field for some time. The book goes on to describe multilayer perceptrons as an algorithm used in the field of deep learning, giving the idea that deep learning has subsumed artificial neural networks.

The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP).

Peter Norvig is the Director of Research at Google and famous for his textbook on AI titled “Artificial Intelligence: A Modern Approach“.

In a 2016 talk he gave titled “Deep Learning and Understandability versus Software Engineering and Verification” he defined deep learning in a very similar way to Yoshua, focusing on the power of abstraction permitted by using a deeper network structure.

a kind of learning where the representation you form have several levels of abstraction, rather than a direct input to output

Why Not Just “

Geoffrey Hinton is a pioneer in the field of artificial neural networks and co-published the first paper on the backpropagation algorithm for training multilayer perceptron networks.

He may have started the introduction of the phrasing “*deep*” to describe the development of large artificial neural networks.

He co-authored a paper in 2006 titled “A Fast Learning Algorithm for Deep Belief Nets” in which they describe an approach to training a “deep” (as in many-layered) network of restricted Boltzmann machines.

Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

This paper and the related paper Geoff co-authored titled “Deep Boltzmann Machines” on an undirected deep network were well received by the community (now cited many hundreds of times) because they were successful examples of greedy layer-wise training of networks, allowing many more layers in feedforward networks.

In a co-authored article in Science titled “Reducing the Dimensionality of Data with Neural Networks” they stuck with the same description of “deep” to describe their approach to developing networks with many more layers than was previously typical.

We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

In the same article, they make an interesting comment that meshes with Andrew Ng’s comment about the recent increase in compute power and access to large datasets that has unleashed the untapped capability of neural networks when used at larger scale.

It has been obvious since the 1980s that backpropagation through deep autoencoders would be very effective for nonlinear dimensionality reduction, provided that computers were fast enough, data sets were big enough, and the initial weights were close enough to a good solution. All three conditions are now satisfied.

In a talk to the Royal Society in 2016 titled “Deep Learning”, Geoff commented that Deep Belief Networks were the start of deep learning in 2006 and that the first successful application of this new wave of deep learning was to speech recognition in 2009, in a paper titled “Acoustic Modeling using Deep Belief Networks”, achieving state of the art results.

It was the results that made the speech recognition and the neural network communities take notice, and the use of “deep” as a differentiator from previous neural network techniques that probably resulted in the name change.

The descriptions of deep learning in the Royal Society talk are very backpropagation centric as you would expect. Interestingly, he gives 4 reasons why backpropagation (read “deep learning”) did not take off last time around in the 1990s. The first two points match comments by Andrew Ng above about datasets being too small and computers being too slow.

Deep learning excels on problem domains where the inputs (and even output) are analog. Meaning, they are not a few quantities in a tabular format but instead are **images of pixel data, documents of text data or files of audio data**.

Yann LeCun is the director of Facebook Research and is the father of the network architecture that excels at object recognition in image data called the Convolutional Neural Network (CNN). This technique is seeing great success because like multilayer perceptron feedforward neural networks, the technique scales with data and model size and can be trained with backpropagation.

This biases his definition of deep learning as the development of very large CNNs, which have had great success on object recognition in photographs.

In a 2016 talk at Lawrence Livermore National Laboratory titled “Accelerating Understanding: Deep Learning, Intelligent Applications, and GPUs” he described deep learning generally as learning hierarchical representations and defines it as a scalable approach to building object recognition systems:

deep learning [is] … a pipeline of modules all of which are trainable. … deep because [has] multiple stages in the process of recognizing an object and all of those stages are part of the training

Jurgen Schmidhuber is the father of another popular algorithm that like MLPs and CNNs also scales with model size and dataset size and can be trained with backpropagation, but is instead tailored to learning sequence data, called the Long Short-Term Memory Network (LSTM), a type of recurrent neural network.

We do see some confusion in the phrasing of the field as “deep learning”. In his 2014 paper titled “Deep Learning in Neural Networks: An Overview” he does comment on the problematic naming of the field and the differentiation of deep from shallow learning. He also interestingly describes depth in terms of the complexity of the problem rather than the model used to solve the problem.

At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts have not yet yielded a conclusive response to this question. […], let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning.

Demis Hassabis is the founder of DeepMind, later acquired by Google. DeepMind made the breakthrough of combining deep learning techniques with reinforcement learning to handle complex learning problems like game playing, famously demonstrated in playing Atari games and the game Go with AlphaGo.

In keeping with the naming, they called their new technique a Deep Q-Network, combining Deep Learning with Q-Learning. They also name the broader field of study “Deep Reinforcement Learning”.

In their 2015 Nature paper titled “Human-level control through deep reinforcement learning” they comment on the important role of deep neural networks in their breakthrough and highlight the need for hierarchical abstraction.

To achieve this, we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks. Notably, recent advances in deep neural networks, in which several layers of nodes are used to build up progressively more abstract representations of the data, have made it possible for artificial neural networks to learn concepts such as object categories directly from raw sensory data.

Finally, in what may be considered a defining paper in the field, Yann LeCun, Yoshua Bengio and Geoffrey Hinton published a paper in Nature titled simply “Deep Learning“. In it, they open with a clean definition of deep learning highlighting the multi-layered approach.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.

Later the multi-layered approach is described in terms of **representation learning and abstraction**.

Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. […] The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.

This is a nice and generic description, and could easily describe most artificial neural network algorithms. It is also a good note to end on.

In this post you discovered that deep learning is just very big neural networks on a lot more data, requiring bigger computers.

Although early approaches published by Hinton and collaborators focus on greedy layerwise training and unsupervised methods like autoencoders, modern state-of-the-art deep learning is focused on training deep (many layered) neural network models using the backpropagation algorithm. The most popular techniques are:

- Multilayer Perceptron Networks.
- Convolutional Neural Networks.
- Long Short-Term Memory Recurrent Neural Networks.

I hope this has cleared up what deep learning is and how leading definitions fit together under the one umbrella.

**If you have any questions about deep learning** or about this post, ask your questions in the comments below and I will do my best to answer them.


The post Your First Deep Learning Project in Python with Keras Step-By-Step appeared first on Machine Learning Mastery.

Keras wraps the efficient numerical computation libraries Theano and TensorFlow and allows you to define and train neural network models in just a few lines of code.

In this tutorial, you will discover how to create your first deep learning neural network model in Python using Keras.


*Let’s get started.*

- **Update Feb/2017**: Updated prediction example so rounding works in Python 2 and 3.
- **Update Mar/2017**: Updated example for the latest versions of Keras and TensorFlow.
- **Update Mar/2018**: Added alternate link to download the dataset.
- **Update Jul/2019**: Expanded and added more useful resources.
- **Update Sep/2019**: Updated for Keras v2.2.5 API.
- **Update Oct/2019**: Updated for Keras v2.3.0 API and TensorFlow v2.0.0.
- **Update Aug/2020**: Updated for Keras v2.4.3 and TensorFlow v2.3.

There is not a lot of code required, but we are going to step over it slowly so that you will know how to create your own models in the future.

*The steps you are going to cover in this tutorial are as follows:*

- Load Data.
- Define Keras Model.
- Compile Keras Model.
- Fit Keras Model.
- Evaluate Keras Model.
- Tie It All Together.
- Make Predictions

**This Keras tutorial has a few requirements:**

- You have Python 2 or 3 installed and configured.
- You have SciPy (including NumPy) installed and configured.
- You have Keras and a backend (Theano or TensorFlow) installed and configured.

If you need help with your environment, see the tutorial:

Create a new file called **keras_first_network.py** and type or copy-and-paste the code into the file as you go.


The first step is to define the functions and classes we intend to use in this tutorial.

We will use the NumPy library to load our dataset and we will use two classes from the Keras library to define our model.

The imports required are listed below.

```python
# first neural network with keras tutorial
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
...
```

We can now load our dataset.

In this Keras tutorial, we are going to use the Pima Indians onset of diabetes dataset. This is a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.

As such, it is a binary classification problem (onset of diabetes as 1 or not as 0). All of the input variables that describe each patient are numerical. This makes it easy to use directly with neural networks that expect numerical input and output values, and ideal for our first neural network in Keras.

The dataset is available from here:

Download the dataset and place it in your local working directory, the same location as your python file.

Save it with the filename:

pima-indians-diabetes.csv

Take a look inside the file, you should see rows of data like the following:

```
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
...
```

We can now load the file as a matrix of numbers using the NumPy function loadtxt().

There are eight input variables and one output variable (the last column). We will be learning a model to map rows of input variables (X) to an output variable (y), which we often summarize as *y = f(X)*.

The variables can be summarized as follows:

Input Variables (X):

- Number of times pregnant
- Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- Diastolic blood pressure (mm Hg)
- Triceps skin fold thickness (mm)
- 2-Hour serum insulin (mu U/ml)
- Body mass index (weight in kg/(height in m)^2)
- Diabetes pedigree function
- Age (years)

Output Variables (y):

- Class variable (0 or 1)

Once the CSV file is loaded into memory, we can split the columns of data into input and output variables.

The data will be stored in a 2D array where the first dimension is rows and the second dimension is columns, e.g. [rows, columns].

We can split the array into two arrays by selecting subsets of columns using the standard NumPy slice operator, “:”. We can select the first 8 columns from index 0 to index 7 via the slice 0:8. We can then select the output column (the 9th variable) via index 8.

```python
...
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
...
```
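If you want to check the slicing without downloading the file, the same steps can be tried on the five sample rows shown earlier. This sketch assumes nothing beyond NumPy; the rows are held in an in-memory string rather than read from *pima-indians-diabetes.csv*.

```python
# sketch: load the five sample rows from memory and split into X and y
from io import StringIO
from numpy import loadtxt

# the sample rows shown earlier, held in memory instead of a CSV file
rows = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1"""

dataset = loadtxt(StringIO(rows), delimiter=',')
X = dataset[:, 0:8]   # columns 0 to 7: the eight input variables
y = dataset[:, 8]     # column 8: the class label
print(dataset.shape, X.shape, y.shape)
print(y)
```

The printed shapes confirm that X holds eight columns and y holds a single value per row.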

We are now ready to define our neural network model.

**Note**, the dataset has 9 columns and the range 0:8 will select columns from 0 to 7, stopping before index 8. If this is new to you, then you can learn more about array slicing and ranges in this post:

Models in Keras are defined as a sequence of layers.

We create a *Sequential model* and add layers one at a time until we are happy with our network architecture.

The first thing to get right is to ensure the input layer has the right number of input features. This can be specified when creating the first layer with the **input_dim** argument and setting it to 8 for the 8 input variables.

How do we know the number of layers and their types?

This is a very hard question. There are heuristics that we can use and often the best network structure is found through a process of trial and error experimentation (I explain more about this here). Generally, you need a network large enough to capture the structure of the problem.

In this example, we will use a fully-connected network structure with three layers.

Fully connected layers are defined using the Dense class. We can specify the number of neurons or nodes in the layer as the first argument, and specify the activation function using the **activation** argument.

We will use the rectified linear unit activation function referred to as ReLU on the first two layers and the Sigmoid function in the output layer.

It used to be the case that Sigmoid and Tanh activation functions were preferred for all layers. These days, better performance is achieved using the ReLU activation function. We use a sigmoid on the output layer to ensure our network output is between 0 and 1 and easy to map to either a probability of class 1 or snap to a hard classification of either class with a default threshold of 0.5.
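To make the two activation functions concrete, here is a minimal sketch of each for a single input value, written in plain Python rather than Keras. The function names relu and sigmoid and the test values are chosen for illustration, and treating a probability of exactly 0.5 as class 0 is an assumption of this sketch.

```python
# a minimal sketch of the two activation functions for a single value
from math import exp

def relu(z):
    # rectified linear unit: pass positive values through, clip negatives to 0
    return max(0.0, z)

def sigmoid(z):
    # logistic function: squash any real value into the range (0, 1)
    return 1.0 / (1.0 + exp(-z))

for z in [-2.0, 0.0, 2.0]:
    p = sigmoid(z)
    # snap the probability to a hard class using the 0.5 threshold
    label = 1 if p > 0.5 else 0
    print('z=%.1f relu=%.1f sigmoid=%.3f class=%d' % (z, relu(z), p, label))
```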

We can piece it all together by adding each layer:

- The model expects rows of data with 8 variables (the *input_dim=8* argument).
- The first hidden layer has 12 nodes and uses the relu activation function.
- The second hidden layer has 8 nodes and uses the relu activation function.
- The output layer has one node and uses the sigmoid activation function.

    ...
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    ...

**Note**, the most confusing thing here is that the shape of the input to the model is defined as an argument on the first hidden layer. This means that the line of code that adds the first Dense layer is doing 2 things, defining the input or visible layer and the first hidden layer.
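One way to see that the input layer adds no weights of its own is to count the trainable parameters by hand. The sketch below is a rough illustration in plain Python, assuming each Dense layer keeps its default bias term; the name layer_sizes is introduced here for the example.

```python
# a minimal sketch: count the trainable weights in the 8-12-8-1 network
layer_sizes = [8, 12, 8, 1]  # inputs, first hidden, second hidden, output

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # each node has one weight per input plus one bias
    n_params = n_in * n_out + n_out
    total += n_params
    print('%d -> %d: %d parameters' % (n_in, n_out, n_params))
print('Total: %d' % total)
```

The 8 inputs only determine how many weights each node in the first hidden layer needs, which is why their shape is given on that layer.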

Now that the model is defined, *we can compile it*.

Compiling the model uses the efficient numerical libraries under the covers (the so-called backend) such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware, such as CPU or GPU or even distributed.

When compiling, we must specify some additional properties required when training the network. Remember training a network means finding the best set of weights to map inputs to outputs in our dataset.

We must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training.

In this case, we will use cross entropy as the **loss** argument. This loss is for binary classification problems and is defined in Keras as “**binary_crossentropy**”. You can learn more about choosing loss functions based on your problem here:

We will define the **optimizer** as the efficient stochastic gradient descent algorithm “**adam**”. This is a popular version of gradient descent because it automatically tunes itself and gives good results in a wide range of problems. To learn more about the Adam version of stochastic gradient descent see the post:

Finally, because it is a classification problem, we will collect and report the classification accuracy, defined via the **metrics** argument.

    ...
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    ...
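To give a feel for what the loss measures, the following is a minimal sketch of binary cross entropy in plain Python. It is not the Keras implementation: it assumes every predicted probability is strictly between 0 and 1, and the labels and probabilities are made-up values for illustration.

```python
# a minimal sketch of binary cross entropy averaged over a few examples
from math import log

def binary_crossentropy(y_true, y_prob):
    # average negative log-likelihood of the true labels
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total += -(t * log(p) + (1 - t) * log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # hypothetical predictions close to the labels
unsure = [0.6, 0.4, 0.5, 0.5]     # hypothetical predictions near 0.5
print('confident: %.4f' % binary_crossentropy(y_true, confident))
print('unsure:    %.4f' % binary_crossentropy(y_true, unsure))
```

Predictions that sit closer to the true labels give a lower loss, which is the quantity the optimizer tries to reduce.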

We have defined our model and compiled it ready for efficient computation.

Now it is time to execute the model on some data.

We can train or fit our model on our loaded data by calling the **fit()** function on the model.

Training occurs over epochs and each epoch is split into batches.

- **Epoch**: One pass through all of the rows in the training dataset.
- **Batch**: One or more samples considered by the model within an epoch before weights are updated.

One epoch is made up of one or more batches, based on the chosen batch size, and the model is fit for many epochs. For more on the difference between epochs and batches, see the post:

The training process will run for a fixed number of iterations through the dataset called epochs, that we must specify using the **epochs** argument. We must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size and set using the **batch_size** argument.

For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10.

These configurations can be chosen experimentally by trial and error. We want to train the model enough so that it learns a good (or good enough) mapping of rows of input data to the output classification. The model will always have some error, but the amount of error will level out after some point for a given model configuration. This is called model convergence.

    ...
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=10)
    ...
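The relationship between rows, batches, and epochs can be worked out with a little arithmetic. This sketch assumes the dataset has 768 rows (the count reported in the training output later in this tutorial) and that the final batch in each epoch simply holds whatever rows are left over.

```python
# a minimal sketch: how many weight updates the chosen configuration implies
from math import ceil

n_rows = 768      # rows in the dataset
batch_size = 10   # rows considered before each weight update
epochs = 150      # passes through the whole dataset

batches_per_epoch = ceil(n_rows / batch_size)
last_batch_rows = n_rows - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print('Batches per epoch: %d' % batches_per_epoch)
print('Rows in the last batch: %d' % last_batch_rows)
print('Total weight updates: %d' % total_updates)
```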

This is where the work happens on your CPU or GPU.

No GPU is required for this example, but if you’re interested in how to run large models on GPU hardware cheaply in the cloud, see this post:

We have trained our neural network on the entire dataset and we can evaluate the performance of the network on the same dataset.

This will only give us an idea of how well we have modeled the dataset (e.g. train accuracy), but no idea of how well the algorithm might perform on new data. We have done this for simplicity, but ideally, you could separate your data into train and test datasets for training and evaluation of your model.
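A train and test split could be sketched with NumPy along the following lines. This is only one possible approach: the placeholder array stands in for the loaded dataset, and the 67/33 ratio, the seed, and the variable names are assumptions made for the example.

```python
# a minimal sketch: shuffle the rows and hold back a third of them for testing
from numpy import arange
from numpy.random import default_rng

# stand-in for the loaded data: 768 rows, 9 columns (values are placeholders)
dataset = arange(768 * 9, dtype=float).reshape(768, 9)

# shuffle the row indices, then keep the first 67% of rows for training
rng = default_rng(1)
indices = rng.permutation(len(dataset))
n_train = int(len(dataset) * 0.67)
train, test = dataset[indices[:n_train]], dataset[indices[n_train:]]

# split each subset into input (X) and output (y) variables as before
X_train, y_train = train[:, 0:8], train[:, 8]
X_test, y_test = test[:, 0:8], test[:, 8]
print(X_train.shape, X_test.shape)
```

The model would then be fit on X_train and y_train and evaluated on X_test and y_test.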

You can evaluate your model on your training dataset using the **evaluate()** function on your model and pass it the same input and output used to train the model.

This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.

The **evaluate()** function will return a list with two values. The first will be the loss of the model on the dataset and the second will be the accuracy of the model on the dataset. We are only interested in reporting the accuracy, so we will ignore the loss value.

    ...
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy*100))

You have just seen how you can easily create your first neural network model in Keras.

Let’s tie it all together into a complete code example.

    # first neural network with keras tutorial
    from numpy import loadtxt
    from keras.models import Sequential
    from keras.layers import Dense
    # load the dataset
    dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
    # split into input (X) and output (y) variables
    X = dataset[:,0:8]
    y = dataset[:,8]
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=10)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy*100))

You can copy all of the code into your Python file and save it as “**keras_first_network.py**” in the same directory as your data file “**pima-indians-diabetes.csv**”. You can then run the Python file as a script from your command line (command prompt) as follows:

python keras_first_network.py

Running this example, you should see a message for each of the 150 epochs printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.

It takes about 10 seconds to execute on my workstation running on the CPU.

Ideally, we would like the loss to go to zero and accuracy to go to 1.0 (e.g. 100%). This is not possible for any but the most trivial machine learning problems. Instead, we will always have some error in our model. The goal is to choose a model configuration and training configuration that achieve the lowest loss and highest accuracy possible for a given dataset.

    ...
    768/768 [==============================] - 0s 63us/step - loss: 0.4817 - acc: 0.7708
    Epoch 147/150
    768/768 [==============================] - 0s 63us/step - loss: 0.4764 - acc: 0.7747
    Epoch 148/150
    768/768 [==============================] - 0s 63us/step - loss: 0.4737 - acc: 0.7682
    Epoch 149/150
    768/768 [==============================] - 0s 64us/step - loss: 0.4730 - acc: 0.7747
    Epoch 150/150
    768/768 [==============================] - 0s 63us/step - loss: 0.4754 - acc: 0.7799
    768/768 [==============================] - 0s 38us/step
    Accuracy: 76.56

**Note,** if you try running this example in an IPython or Jupyter notebook you may get an error.

The reason is the output progress bars during training. You can easily turn these off by setting **verbose=0** in the call to the **fit()** and **evaluate()** functions, for example:

    ...
    # fit the keras model on the dataset without progress bars
    model.fit(X, y, epochs=150, batch_size=10, verbose=0)
    # evaluate the keras model
    _, accuracy = model.evaluate(X, y, verbose=0)
    ...

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

**What score did you get?**

Post your results in the comments below.

Neural networks are a stochastic algorithm, meaning that the same algorithm on the same data can train a different model with different skill each time the code is run. This is a feature, not a bug. You can learn more about this in the post:

The variance in the performance of the model means that to get a reasonable approximation of how well your model is performing, you may need to fit it many times and calculate the average of the accuracy scores. For more on this approach to evaluating neural networks, see the post:

For example, below are the accuracy scores from re-running the example 5 times:

    Accuracy: 75.00
    Accuracy: 77.73
    Accuracy: 77.60
    Accuracy: 78.12
    Accuracy: 76.17

We can see that all accuracy scores are around 77% and the average is 76.924%.
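The average can be reproduced from the five scores with the standard library. The sketch below also reports the sample standard deviation as a rough measure of the spread between runs; reporting the spread is an addition made here for illustration.

```python
# a minimal sketch: summarize the accuracy scores from the five runs
from statistics import mean, stdev

scores = [75.00, 77.73, 77.60, 78.12, 76.17]
print('Mean accuracy: %.3f' % mean(scores))
print('Standard deviation: %.3f' % stdev(scores))
```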

The number one question I get asked is:

After I train my model, how can I use it to make predictions on new data?

Great question.

We can adapt the above example and use it to generate predictions on the training dataset, pretending it is a new dataset we have not seen before.

Making predictions is as easy as calling the **predict()** function on the model. We are using a sigmoid activation function on the output layer, so the predictions will be a probability in the range between 0 and 1. We can easily convert them into a crisp binary prediction for this classification task by rounding them.

For example:

    ...
    # make probability predictions with the model
    predictions = model.predict(X)
    # round predictions
    rounded = [round(x[0]) for x in predictions]
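The rounding step can be tried on its own without a trained model. In this sketch the probabilities are hypothetical values shaped like the output of predict() (one single-element row per example), and the explicit 0.5 threshold is shown alongside rounding as an equivalent way to get class labels for these values.

```python
# a minimal sketch: turn sigmoid outputs into crisp 0/1 class labels
# hypothetical probabilities, one single-element row per example
predictions = [[0.12], [0.48], [0.51], [0.97]]

# round each probability to the nearest class
rounded = [round(x[0]) for x in predictions]
# or compare each probability against the 0.5 threshold
thresholded = [1 if x[0] > 0.5 else 0 for x in predictions]

print(rounded)
print(thresholded)
```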

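Alternately, in older versions of Keras, we can call the **predict_classes()** function on the model to predict crisp classes directly. This function was removed in TensorFlow 2.6; with newer versions, the same result is obtained by thresholding the output of **predict()**, e.g. *(model.predict(X) > 0.5).astype(int)*. For example, with an older version:

    ...
    # make class predictions with the model
    predictions = model.predict_classes(X)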

The complete example below makes predictions for each example in the dataset, then prints the input data, predicted class and expected class for the first 5 examples in the dataset.

    # first neural network with keras make predictions
    from numpy import loadtxt
    from keras.models import Sequential
    from keras.layers import Dense
    # load the dataset
    dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
    # split into input (X) and output (y) variables
    X = dataset[:,0:8]
    y = dataset[:,8]
    # define the keras model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=10, verbose=0)
    # make class predictions with the model by thresholding the probabilities
    # (predict_classes() was removed in TensorFlow 2.6; on older versions,
    # model.predict_classes(X) gives the same labels)
    predictions = (model.predict(X) > 0.5).astype(int)
    # summarize the first 5 cases
    for i in range(5):
        print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i][0], y[i]))

Running the example does not show the progress bar as before as we have set the verbose argument to 0.

After the model is fit, predictions are made for all examples in the dataset, and the input rows and predicted class values for the first 5 examples are printed and compared to the expected class values.

We can see that most rows are correctly predicted. In fact, we would expect about 76.9% of the rows to be correctly predicted based on our estimated performance of the model in the previous section.

    [6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 0 (expected 1)
    [1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0)
    [8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1)
    [1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0)
    [0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)
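The five rows printed above can be scored by hand to see how accuracy is computed. The sketch below copies the predicted and expected labels from that output; the variable names are chosen for illustration, and five rows are far too few to estimate the skill of the model.

```python
# a minimal sketch: accuracy over the five printed examples
predicted = [0, 0, 1, 0, 1]  # predicted classes from the output above
expected = [1, 0, 1, 0, 1]   # expected classes from the output above

# count the rows where the prediction matches the expected class
correct = sum(1 for p, e in zip(predicted, expected) if p == e)
accuracy = correct / len(expected)
print('Correct: %d of %d (%.0f%%)' % (correct, len(expected), accuracy * 100))
```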

If you would like to know more about how to make predictions with Keras models, see the post:

In this post, you discovered how to create your first neural network model using the powerful Keras Python library for deep learning.

Specifically, you learned the six key steps in using Keras to create a neural network or deep learning model, step-by-step including:

- How to load data.
- How to define a neural network in Keras.
- How to compile a Keras model using the efficient numerical backend.
- How to train a model on data.
- How to evaluate a model on data.
- How to make predictions with the model.

Do you have any questions about Keras or about this tutorial?

Ask your question in the comments and I will do my best to answer.

Well done, you have successfully developed your first neural network using the Keras deep learning library in Python.

This section provides some extensions to this tutorial that you might want to explore.

- **Tune the Model**. Change the configuration of the model or training process and see if you can improve the performance of the model, e.g. achieve better than 76% accuracy.
- **Save the Model**. Update the tutorial to save the model to file, then load it later and use it to make predictions (see this tutorial).
- **Summarize the Model**. Update the tutorial to summarize the model and create a plot of model layers (see this tutorial).
- **Separate Train and Test Datasets**. Split the loaded dataset into a train and test set (split based on rows) and use one set to train the model and the other set to estimate the performance of the model on new data.
- **Plot Learning Curves**. The fit() function returns a history object that summarizes the loss and accuracy at the end of each epoch. Create line plots of this data, called learning curves (see this tutorial).
- **Learn a New Dataset**. Update the tutorial to use a different tabular dataset, perhaps from the UCI Machine Learning Repository.
- **Use Functional API**. Update the tutorial to use the Keras Functional API for defining the model (see this tutorial).

Are you looking for some more Deep Learning tutorials with Python and Keras?

Take a look at some of these:

- 5 Step Life-Cycle for Neural Network Models in Keras
- Multi-Class Classification Tutorial with the Keras Deep Learning Library
- Regression Tutorial with the Keras Deep Learning Library in Python
- How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras

- Deep Learning (Textbook), 2016.
- Deep Learning with Python (my book).

**How did you go? Do you have any questions about deep learning?**

Post your questions in the comments below and I will do my best to help.

The post Your First Deep Learning Project in Python with Keras Step-By-Step appeared first on Machine Learning Mastery.

]]>