The post How to Calculate Precision, Recall, F1, and More for Deep Learning Models appeared first on MachineLearningMastery.com.

This is critical, as the reported performance allows you both to choose between candidate models and to communicate to stakeholders how good the model is at solving the problem.

The Keras deep learning API is very limited in terms of the metrics that you can use to report model performance.

I am frequently asked questions such as:

How can I calculate the precision and recall for my model?

And:

How can I calculate the F1-score or confusion matrix for my model?

In this tutorial, you will discover how to calculate metrics to evaluate your deep learning neural network model with a step-by-step example.

After completing this tutorial, you will know:

- How to use the scikit-learn metrics API to evaluate a deep learning model.
- How to make both class and probability predictions with a final model required by the scikit-learn API.
- How to calculate precision, recall, F1-score, ROC AUC, and more with the scikit-learn API for a model.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

- **Mar/2019**: First published
- **Update Jan/2020**: Updated API for Keras 2.3 and TensorFlow 2.0

This tutorial is divided into three parts; they are:

- Binary Classification Problem
- Multilayer Perceptron Model
- How to Calculate Model Metrics

We will use a standard binary classification problem as the basis for this tutorial, called the “*two circles*” problem.

It is called the two circles problem because the problem is comprised of points that when plotted, show two concentric circles, one for each class. As such, this is an example of a binary classification problem. The problem has two inputs that can be interpreted as x and y coordinates on a graph. Each point belongs to either the inner or outer circle.

The make_circles() function in the scikit-learn library allows you to generate samples from the two circles problem. The “*n_samples*” argument allows you to specify the number of samples to generate, divided evenly between the two classes. The “*noise*” argument allows you to specify how much random statistical noise is added to the inputs or coordinates of each point, making the classification task more challenging. The “*random_state*” argument specifies the seed for the pseudorandom number generator, ensuring that the same samples are generated each time the code is run.

The example below generates 1,000 samples, with 0.1 statistical noise and a seed of 1.

```python
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
```

Once generated, we can create a plot of the dataset to get an idea of how challenging the classification task is.

The example below generates samples and plots them, coloring each point according to the class, where points belonging to class 0 (outer circle) are colored blue and points that belong to class 1 (inner circle) are colored orange.

```python
# example of generating samples from the two circles problem
from sklearn.datasets import make_circles
from matplotlib import pyplot
from numpy import where
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# scatter plot, dots colored by class value
for i in range(2):
    samples_ix = where(y == i)
    pyplot.scatter(X[samples_ix, 0], X[samples_ix, 1])
pyplot.show()
```

Running the example generates the dataset and plots the points on a graph, clearly showing two concentric circles for points belonging to class 0 and class 1.

We will develop a Multilayer Perceptron, or MLP, model to address the binary classification problem.

This model is not optimized for the problem, but it is skillful (better than random).

After the samples for the dataset are generated, we will split them into two equal parts: one for training the model and one for evaluating the trained model.

```python
# split into train and test
n_test = 500
trainX, testX = X[:n_test, :], X[n_test:, :]
trainy, testy = y[:n_test], y[n_test:]
```

Next, we can define our MLP model. The model is simple, expecting 2 input variables from the dataset, a single hidden layer with 100 nodes, and a ReLU activation function, then an output layer with a single node and a sigmoid activation function.

The model will predict a value between 0 and 1 that will be interpreted as to whether the input example belongs to class 0 or class 1.

```python
# define model
model = Sequential()
model.add(Dense(100, input_shape=(2,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
```

The model will be fit using the binary cross entropy loss function and we will use the efficient Adam version of stochastic gradient descent. The model will also monitor the classification accuracy metric.

```python
# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

We will fit the model for 300 training epochs with the default batch size of 32 samples and evaluate the performance of the model at the end of each training epoch on the test dataset.

```python
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=300, verbose=0)
```

At the end of training, we will evaluate the final model once more on the train and test datasets and report the classification accuracy.

```python
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
```

Finally, the performance of the model on the train and test sets recorded during training will be graphed using a line plot, one for each of the loss and the classification accuracy.

```python
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
```

Tying all of these elements together, the complete code listing of training and evaluating an MLP on the two circles problem is listed below.

```python
# multilayer perceptron model for the two circles problem
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot
# generate dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# split into train and test
n_test = 500
trainX, testX = X[:n_test, :], X[n_test:, :]
trainy, testy = y[:n_test], y[n_test:]
# define model
model = Sequential()
model.add(Dense(100, input_shape=(2,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=300, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
```

Running the example fits the model very quickly on the CPU (no GPU is required).

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The model is evaluated, reporting classification accuracy on the train and test sets of about 84% and 85%, respectively.

Train: 0.838, Test: 0.850

A figure is created showing two line plots: one for the learning curves of the loss on the train and test sets, and one for the classification accuracy on the train and test sets.

The plots suggest that the model has a good fit on the problem.

Perhaps you need to evaluate your deep learning neural network model using additional metrics that are not supported by the Keras metrics API.

The Keras metrics API is limited and you may want to calculate metrics such as precision, recall, F1, and more.

One approach to calculating new metrics is to implement them yourself in the Keras API and have Keras calculate them for you during model training and during model evaluation.

For help with this approach, see the tutorial How to Use Metrics for Deep Learning With Keras in Python.

This can be technically challenging.

A much simpler alternative is to use your final model to make a prediction for the test dataset, then calculate any metric you wish using the scikit-learn metrics API.

Three metrics, in addition to classification accuracy, that are commonly required for a neural network model on a binary classification problem are:

- Precision
- Recall
- F1 Score

In this section, we will calculate these three metrics, as well as classification accuracy using the scikit-learn metrics API, and we will also calculate three additional metrics that are less common but may be useful. They are:

- Cohen’s Kappa
- ROC AUC
- Confusion Matrix.

This is not a complete list of metrics for classification models supported by scikit-learn; nevertheless, calculating these metrics will show you how to calculate any metrics you may require using the scikit-learn API.

For a full list of supported metrics, see the sklearn.metrics API documentation.

The example in this section will calculate metrics for an MLP model, but the same code for calculating metrics can be used for other models, such as RNNs and CNNs.

We can use the same code from the previous sections for preparing the dataset, as well as defining and fitting the model. To make the example simpler, we will put the code for these steps into simple functions.

First, we can define a function called *get_data()* that will generate the dataset and split it into train and test sets.

```python
# generate and prepare the dataset
def get_data():
    # generate dataset
    X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
    # split into train and test
    n_test = 500
    trainX, testX = X[:n_test, :], X[n_test:, :]
    trainy, testy = y[:n_test], y[n_test:]
    return trainX, trainy, testX, testy
```

Next, we will define a function called *get_model()* that will define the MLP model and fit it on the training dataset.

```python
# define and fit the model
def get_model(trainX, trainy):
    # define model
    model = Sequential()
    model.add(Dense(100, input_shape=(2,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit model
    model.fit(trainX, trainy, epochs=300, verbose=0)
    return model
```

We can then call the *get_data()* function to prepare the dataset and the *get_model()* function to fit and return the model.

```python
# generate data
trainX, trainy, testX, testy = get_data()
# fit model
model = get_model(trainX, trainy)
```

Now that we have a model fit on the training dataset, we can evaluate it using metrics from the scikit-learn metrics API.

First, we must use the model to make predictions. Most of the metric functions require a comparison between the true class values (e.g. *testy*) and the predicted class values (*yhat_classes*). We can predict the class values directly with our model using the *predict_classes()* function on the model.

Some metrics, like the ROC AUC, require a prediction of class probabilities (*yhat_probs*). These can be retrieved by calling the *predict()* function on the model.
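Note that *predict_classes()* was removed from the Keras API in recent releases (TensorFlow 2.6 and later). If you hit an *AttributeError* on a newer version, a minimal sketch of the equivalent for a sigmoid output layer is to threshold the predicted probabilities at 0.5; the probability values below are illustrative stand-ins for the output of *model.predict(testX)*:

```python
import numpy as np

# yhat_probs stands in for model.predict(testX): one sigmoid probability per row
yhat_probs = np.array([[0.91], [0.12], [0.55]])

# thresholding at 0.5 recovers the crisp class labels that
# predict_classes() used to return for a sigmoid output layer
yhat_classes = (yhat_probs > 0.5).astype('int32')
print(yhat_classes[:, 0])  # [1 0 1]
```

The rest of the tutorial works unchanged once *yhat_classes* is produced this way.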

For more help with making predictions using a Keras model, see the post How to Make Predictions With Keras.

We can make the class and probability predictions with the model.

```python
# predict probabilities for test set
yhat_probs = model.predict(testX, verbose=0)
# predict crisp classes for test set
yhat_classes = model.predict_classes(testX, verbose=0)
```

The predictions are returned in a two-dimensional array, with one row for each example in the test dataset and one column for the prediction.

The scikit-learn metrics API expects a 1D array of actual and predicted values for comparison, therefore, we must reduce the 2D prediction arrays to 1D arrays.

```python
# reduce to 1d array
yhat_probs = yhat_probs[:, 0]
yhat_classes = yhat_classes[:, 0]
```

We are now ready to calculate metrics for our deep learning neural network model. We can start by calculating the classification accuracy, precision, recall, and F1 scores.

```python
# accuracy: (tp + tn) / (p + n)
accuracy = accuracy_score(testy, yhat_classes)
print('Accuracy: %f' % accuracy)
# precision: tp / (tp + fp)
precision = precision_score(testy, yhat_classes)
print('Precision: %f' % precision)
# recall: tp / (tp + fn)
recall = recall_score(testy, yhat_classes)
print('Recall: %f' % recall)
# f1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(testy, yhat_classes)
print('F1 score: %f' % f1)
```

Notice that calculating a metric is as simple as choosing the metric that interests us and calling the function passing in the true class values (*testy*) and the predicted class values (*yhat_classes*).

We can also calculate some additional metrics, such as Cohen’s kappa, ROC AUC, and the confusion matrix.

Notice that the ROC AUC requires the predicted class probabilities (*yhat_probs*) as an argument instead of the predicted classes (*yhat_classes*).

```python
# kappa
kappa = cohen_kappa_score(testy, yhat_classes)
print('Cohens kappa: %f' % kappa)
# ROC AUC
auc = roc_auc_score(testy, yhat_probs)
print('ROC AUC: %f' % auc)
# confusion matrix
matrix = confusion_matrix(testy, yhat_classes)
print(matrix)
```

Now that we know how to calculate metrics for a deep learning neural network using the scikit-learn API, we can tie all of these elements together into a complete example, listed below.

```python
# demonstration of calculating metrics for a neural network model using sklearn
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import confusion_matrix
from keras.models import Sequential
from keras.layers import Dense

# generate and prepare the dataset
def get_data():
    # generate dataset
    X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
    # split into train and test
    n_test = 500
    trainX, testX = X[:n_test, :], X[n_test:, :]
    trainy, testy = y[:n_test], y[n_test:]
    return trainX, trainy, testX, testy

# define and fit the model
def get_model(trainX, trainy):
    # define model
    model = Sequential()
    model.add(Dense(100, input_shape=(2,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit model
    model.fit(trainX, trainy, epochs=300, verbose=0)
    return model

# generate data
trainX, trainy, testX, testy = get_data()
# fit model
model = get_model(trainX, trainy)
# predict probabilities for test set
yhat_probs = model.predict(testX, verbose=0)
# predict crisp classes for test set
yhat_classes = model.predict_classes(testX, verbose=0)
# reduce to 1d array
yhat_probs = yhat_probs[:, 0]
yhat_classes = yhat_classes[:, 0]
# accuracy: (tp + tn) / (p + n)
accuracy = accuracy_score(testy, yhat_classes)
print('Accuracy: %f' % accuracy)
# precision: tp / (tp + fp)
precision = precision_score(testy, yhat_classes)
print('Precision: %f' % precision)
# recall: tp / (tp + fn)
recall = recall_score(testy, yhat_classes)
print('Recall: %f' % recall)
# f1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(testy, yhat_classes)
print('F1 score: %f' % f1)
# kappa
kappa = cohen_kappa_score(testy, yhat_classes)
print('Cohens kappa: %f' % kappa)
# ROC AUC
auc = roc_auc_score(testy, yhat_probs)
print('ROC AUC: %f' % auc)
# confusion matrix
matrix = confusion_matrix(testy, yhat_classes)
print(matrix)
```

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example prepares the dataset, fits the model, then calculates and reports the metrics for the model evaluated on the test dataset.

```
Accuracy: 0.842000
Precision: 0.836576
Recall: 0.853175
F1 score: 0.844794
Cohens kappa: 0.683929
ROC AUC: 0.923739
[[206  42]
 [ 37 215]]
```
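As a quick sanity check (a sketch, not part of the original tutorial), the accuracy, precision, recall, and F1 score reported above can be recomputed by hand from the confusion matrix, whose rows are actual classes and whose columns are predicted classes:

```python
# counts taken from the confusion matrix printed above
tn, fp = 206, 42   # actual class 0: true negatives, false positives
fn, tp = 37, 215   # actual class 1: false negatives, true positives

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 421 / 500 = 0.842
precision = tp / (tp + fp)                          # ~0.836576
recall = tp / (tp + fn)                             # ~0.853175
f1 = 2 * precision * recall / (precision + recall)  # ~0.844794
print(accuracy, precision, recall, f1)
```

Working through the counts this way is a useful habit for verifying that you are reading the confusion matrix orientation correctly.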

If you need help interpreting a given metric, perhaps start with the “Classification Metrics Guide” in the scikit-learn API documentation.

Also, check out the Wikipedia page for your metric; for example, Precision and recall on Wikipedia.

This section provides more resources on the topic if you are looking to go deeper.

- How to Use Metrics for Deep Learning With Keras in Python
- How to Generate Test Datasets in Python With scikit-learn
- How to Make Predictions With Keras

- sklearn.metrics: Metrics API
- Classification Metrics Guide
- Keras Metrics API
- sklearn.datasets.make_circles API

- Evaluation of binary classifiers, Wikipedia.
- Confusion Matrix, Wikipedia.
- Precision and recall, Wikipedia.

In this tutorial, you discovered how to calculate metrics to evaluate your deep learning neural network model with a step-by-step example.

Specifically, you learned:

- How to use the scikit-learn metrics API to evaluate a deep learning model.
- How to make both class and probability predictions with a final model required by the scikit-learn API.
- How to calculate precision, recall, F1-score, ROC AUC, and more with the scikit-learn API for a model.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.


The post How to Make Predictions with Keras appeared first on MachineLearningMastery.com.

There is some confusion amongst beginners about how exactly to do this. I often see questions such as:

How do I make predictions with my model in Keras?

In this tutorial, you will discover exactly how you can make classification and regression predictions with a finalized deep learning model with the Keras Python library.

After completing this tutorial, you will know:

- How to finalize a model in order to make it ready for making predictions.
- How to make class and probability predictions for classification problems in Keras.
- How to make regression predictions in Keras.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

- **Apr/2018**: First published
- **Updated Jan/2020**: Updated for changes in scikit-learn v0.22 API
- **Updated Aug/2022**: Updated for TensorFlow 2.x syntax

This tutorial is divided into 3 parts; they are:

- Finalize Model
- Classification Predictions
- Regression Predictions

Before you can make predictions, you must train a final model.

You may have trained models using k-fold cross validation or train/test splits of your data. This was done in order to give you an estimate of the skill of the model on out of sample data, e.g. new data.

These models have served their purpose and can now be discarded.

You now must train a final model on all of your available data. You can learn more about how to train a final model in the post How to Train a Final Machine Learning Model.

Classification problems are those where the model learns a mapping between input features and an output feature that is a label, such as “*spam*” and “*not spam*“.

Below is an example of a finalized neural network model in Keras developed for a simple two-class (binary) classification problem.

If developing a neural network model in Keras is new to you, see this Keras tutorial.

```python
# example of training a final classification model
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=200, verbose=0)
```

After finalizing, you may want to save the model to file, e.g. via the Keras API. Once saved, you can load the model any time and use it to make predictions. For an example of this, see the post Save and Load Your Keras Deep Learning Models.

For simplicity, we will skip this step for the examples in this tutorial.

There are two types of classification predictions we may wish to make with our finalized model; they are class predictions and probability predictions.

A class prediction is when, given the finalized model and one or more data instances, we predict the class for each data instance.

We do not know the outcome classes for the new data. That is why we need the model in the first place.

We can predict the class for new data instances using our finalized classification model in Keras using the *predict_classes()* function. Note that this function is only available on *Sequential* models, not those models developed using the functional API.

For example, suppose we have one or more data instances in an array called *Xnew*. This array can be passed to the *predict_classes()* function on our model in order to predict the class value for each instance.

```python
Xnew = [[...], [...]]
ynew = model.predict_classes(Xnew)
```
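Note that *predict_classes()* has been removed in recent Keras/TensorFlow releases (TensorFlow 2.6 and later). On newer versions, a minimal sketch of the equivalent for a sigmoid output layer is to threshold *predict()* at 0.5; the probabilities below are illustrative stand-ins for the output of *model.predict(Xnew)*:

```python
import numpy as np

# probs stands in for model.predict(Xnew): one sigmoid probability per instance
probs = np.array([[0.03], [0.94]])

# thresholding at 0.5 yields the same crisp 0/1 class labels
ynew = (probs > 0.5).astype('int32')
print(ynew.ravel())  # [0 1]
```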

Let’s make this concrete with an example:

```python
# example making new class predictions for a classification problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=500, verbose=0)
# new instances where we do not know the answer
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)
Xnew = scalar.transform(Xnew)
# make a prediction
ynew = model.predict_classes(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
```

Running the example predicts the class for the three new data instances, then prints the data and the predictions together.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

```
X=[0.89337759 0.65864154], Predicted=[0]
X=[0.29097707 0.12978982], Predicted=[1]
X=[0.78082614 0.75391697], Predicted=[0]
```

If you had just one new data instance, you could provide it wrapped in an array and pass it to the *predict_classes()* function; for example:

```python
# example making new class prediction for a classification problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
from numpy import array
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=500, verbose=0)
# new instance where we do not know the answer
Xnew = array([[0.89337759, 0.65864154]])
# make a prediction
ynew = model.predict_classes(Xnew)
# show the inputs and predicted outputs
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
```

Running the example prints the single instance and the predicted class.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

X=[0.89337759 0.65864154], Predicted=[0]

Note that when you prepared your data, you will have mapped the class values from your domain (such as strings) to integer values. You may have used a *LabelEncoder*.

This *LabelEncoder* can be used to convert the integers back into string values via the *inverse_transform()* function.

For this reason, you may want to save (pickle) the *LabelEncoder* used to encode your *y* values when fitting your final model.
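As a sketch of this round trip (the label strings here are illustrative, not from the tutorial), *LabelEncoder* assigns integers to the sorted class names and *inverse_transform()* maps integer predictions back:

```python
from sklearn.preprocessing import LabelEncoder

# encode string labels to integers for training
# classes are sorted alphabetically: 'not spam' -> 0, 'spam' -> 1
encoder = LabelEncoder()
y = encoder.fit_transform(['spam', 'not spam', 'spam'])  # [1, 0, 1]

# later, map integer predictions back to the original strings
labels = encoder.inverse_transform([0, 1, 1])
print(list(labels))  # ['not spam', 'spam', 'spam']
```

Pickling *encoder* alongside the final model keeps this mapping available at prediction time.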

Another type of prediction you may wish to make is the probability of the data instance belonging to each class.

This is called a probability prediction where, given a new instance, the model returns the probability for each outcome class as a value between 0 and 1.

You can make these types of predictions in Keras by calling the *predict_proba()* function; for example:

```python
Xnew = [[...], [...]]
ynew = model.predict_proba(Xnew)
```

In the case of a two-class (binary) classification problem, the sigmoid activation function is often used in the output layer. The predicted probability is taken as the likelihood of the observation belonging to class 1, or inverted (1 – probability) to give the probability for class 0.
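This inversion can be sketched with plain NumPy (the probability values below are illustrative stand-ins for the model's sigmoid output):

```python
import numpy as np

# p1 stands in for model.predict(Xnew): probability of class 1 from a sigmoid output
p1 = np.array([[0.0087], [0.8202], [0.0069]])
p0 = 1.0 - p1  # probability of class 0

# column 0: class 0, column 1: class 1; each row sums to 1
probs = np.hstack([p0, p1])
print(probs.shape)  # (3, 2)
```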

In the case of a multi-class classification problem, the softmax activation function is often used on the output layer and the likelihood of the observation for each class is returned as a vector.

The example below makes a probability prediction for each example in the *Xnew* array of data instances.

```python
# example making new probability predictions for a classification problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
scalar = MinMaxScaler()
scalar.fit(X)
X = scalar.transform(X)
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=500, verbose=0)
# new instances where we do not know the answer
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)
Xnew = scalar.transform(Xnew)
# make a prediction
ynew = model.predict_proba(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
```

Running the example makes the probability predictions and then prints the input data instances and the probability of each instance belonging to class 1.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

```
X=[0.89337759 0.65864154], Predicted=[0.0087348]
X=[0.29097707 0.12978982], Predicted=[0.82020265]
X=[0.78082614 0.75391697], Predicted=[0.00693122]
```

This can be helpful in your application if you want to present the probabilities to the user for expert interpretation.

Regression is a supervised learning problem where given input examples, the model learns a mapping to suitable output quantities, such as “0.1” and “0.2”, etc.

Below is an example of a finalized Keras model for regression.

```python
# example of training a final regression model
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100, 1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100, 1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
```

We can predict quantities with the finalized regression model by calling the *predict()* function on the finalized model.

The *predict()* function takes an array of one or more data instances.

The example below demonstrates how to make regression predictions on multiple data instances with an unknown expected outcome.

```python
# example of making predictions for a regression problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100, 1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100, 1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
# new instances where we do not know the answer
Xnew, a = make_regression(n_samples=3, n_features=2, noise=0.1, random_state=1)
Xnew = scalarX.transform(Xnew)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
```

Running the example makes multiple predictions, then prints the inputs and predictions side by side for review.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

```
X=[0.29466096 0.30317302], Predicted=[0.17097184]
X=[0.39445118 0.79390858], Predicted=[0.7475489]
X=[0.02884127 0.6208843 ], Predicted=[0.43370453]
```
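Note that because the targets were scaled with *MinMaxScaler*, these predictions are in the scaled [0, 1] space. A hedged sketch of mapping them back to the original units uses *inverse_transform()* on the target scaler; the fitted values and predictions below are illustrative stand-ins for *scalarY* and *model.predict(Xnew)* above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# scalerY stands in for the scalarY fitted on the training targets above
y_train = np.array([[10.0], [20.0], [30.0], [40.0]])
scalerY = MinMaxScaler().fit(y_train)

# ynew stands in for model.predict(Xnew): predictions in the scaled [0, 1] space
ynew = np.array([[0.17], [0.74], [0.43]])
y_original = scalerY.inverse_transform(ynew)  # back to the original target units
print(y_original.ravel())  # approximately [15.1, 32.2, 22.9]
```

Reporting predictions in the original units is usually what stakeholders expect.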

The same function can be used to make a prediction for a single data instance, as long as it is suitably wrapped in a surrounding list or array.

For example:

```python
# example of making predictions for a regression problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from numpy import array
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100, 1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100, 1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_shape=(2,), activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
# new instance where we do not know the answer
Xnew = array([[0.29466096, 0.30317302]])
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
```

Running the example makes a single prediction and prints the data instance and prediction for review.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

```
X=[0.29466096 0.30317302], Predicted=[0.17333156]
```

This section provides more resources on the topic if you are looking to go deeper.

- How to Train a Final Machine Learning Model
- Save and Load Your Keras Deep Learning Models
- Develop Your First Neural Network in Python With Keras Step-By-Step
- The 5 Step Life-Cycle for Long Short-Term Memory Models in Keras
- How to Make Predictions with Long Short-Term Memory Models in Keras

In this tutorial, you discovered how you can make classification and regression predictions with a finalized deep learning model with the Keras Python library.

Specifically, you learned:

- How to finalize a model in order to make it ready for making predictions.
- How to make class and probability predictions for classification problems in Keras.
- How to make regression predictions in Keras.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Make Predictions with Keras appeared first on MachineLearningMastery.com.

]]>The post Why Initialize a Neural Network with Random Weights? appeared first on MachineLearningMastery.com.

]]>This is because random initialization is an expectation of the stochastic optimization algorithm used to train the model, called stochastic gradient descent.

To understand this approach to problem solving, you must first understand the role of nondeterministic and randomized algorithms as well as the need for stochastic optimization algorithms to harness randomness in their search process.

In this post, you will discover the full background as to why neural network weights must be randomly initialized.

After reading this post, you will know:

- About the need for nondeterministic and randomized algorithms for challenging problems.
- The use of randomness during initialization and search in stochastic optimization algorithms.
- That stochastic gradient descent is a stochastic optimization algorithm and requires the random initialization of network weights.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

This post is divided into four parts; they are:

- Deterministic and Non-Deterministic Algorithms
- Stochastic Search Algorithms
- Random Initialization in Neural Networks
- Initialization Methods

Classical algorithms are deterministic.

An example is an algorithm to sort a list.

Given an unsorted list, the sorting algorithm, say bubble sort or quick sort, will systematically sort the list until you have an ordered result. Deterministic means that each time the algorithm is given the same list, it will execute in exactly the same way. It will make the same moves at each step of the procedure.

Deterministic algorithms are great as they can make guarantees about best, worst, and average running time. The problem is, they are not suitable for all problems.

Some problems are hard for computers. Perhaps because of the number of combinations; perhaps because of the size of data. They are so hard because a deterministic algorithm cannot be used to solve them efficiently. The algorithm may run, but will continue running until the heat death of the universe.

An alternate solution is to use nondeterministic algorithms. These are algorithms that use elements of randomness when making decisions during the execution of the algorithm. This means that a different order of steps will be followed when the same algorithm is rerun on the same data.

They can rapidly speed up the process of getting a solution, but the solution will be approximate, or “*good*,” but often not the “*best*.” Nondeterministic algorithms often cannot make strong guarantees about running time or the quality of the solution found.

This is often fine as the problems are so hard that any good solution will often be satisfactory.

Search problems are often very challenging and require the use of nondeterministic algorithms that make heavy use of randomness.

The algorithms are not random per se; instead they make careful use of randomness. They are random within a bound and are referred to as stochastic algorithms.

The incremental, or step-wise, nature of the search often means the process and the algorithms are referred to as an optimization from an initial state or position to a final state or position, as in a stochastic optimization problem or a stochastic optimization algorithm.

Some examples include the genetic algorithm, simulated annealing, and stochastic gradient descent.

The search process is incremental from a starting point in the space of possible solutions toward some good enough solution.

They share common features in their use of randomness, such as:

- Use of randomness during initialization.
- Use of randomness during the progression of the search.

We know nothing about the structure of the search space. Therefore, to remove bias from the search process, we start from a randomly chosen position.

As the search process unfolds, there is a risk that we are stuck in an unfavorable area of the search space. Using randomness during the search process gives some likelihood of getting unstuck and finding a better final candidate solution.

The idea of getting stuck and returning a less-good solution is referred to as getting stuck in a local optimum.

These two elements of random initialization and randomness during the search work together.

They work together better if we consider any solution found by the search as provisional, or a candidate, and that the search process can be performed multiple times.

This gives the stochastic search process multiple opportunities to start and traverse the space of candidate solutions in search of a better candidate solution, a so-called global optimum.

The navigation of the space of candidate solutions is often described using the analogy of a one- or two-dimensional landscape of mountains and valleys (e.g. a fitness landscape). If we are maximizing a score during the search, we can think of small hills in the landscape as local optima and the largest hill as the global optimum.

This is a fascinating area of research, an area where I have some background. For example, see my book:

Artificial neural networks are trained using a stochastic optimization algorithm called stochastic gradient descent.

The algorithm uses randomness in order to find a good enough set of weights for the specific mapping function from inputs to outputs in your data that is being learned. This means that each time the training algorithm is run on your specific training data, it will fit a different network with different model skill.

This is a feature, not a bug.

I write about this issue more in the post:

As described in the previous section, stochastic optimization algorithms such as stochastic gradient descent use randomness in selecting a starting point for the search and in the progression of the search.

Specifically, stochastic gradient descent requires that the weights of the network are initialized to small random values (random, but close to zero, such as in [0.0, 0.1]). Randomness is also used during the search process in the shuffling of the training dataset prior to each epoch, which in turn results in differences in the gradient estimate for each batch.
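These two uses of randomness can be sketched with plain numpy (this is an illustration, not Keras internals; the sizes and the [0.0, 0.1] range are just examples):

```python
import numpy as np

rng = np.random.default_rng(1)

# weights initialized to small random values, e.g. uniform in [0.0, 0.1]
weights = rng.uniform(low=0.0, high=0.1, size=(2, 4))

# the training samples are shuffled prior to each epoch
X = np.arange(10).reshape(10, 1)
for epoch in range(3):
    X = X[rng.permutation(len(X))]  # new sample order -> different batches
```

Each epoch sees the same samples, but in a different order, so each batch yields a slightly different gradient estimate.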

You can learn more about stochastic gradient descent in this post:

The progression of the search or learning of a neural network is referred to as convergence. The discovery of a sub-optimal solution or local optimum is referred to as premature convergence.

Training algorithms for deep learning models are usually iterative in nature and thus require the user to specify some initial point from which to begin the iterations. Moreover, training deep models is a sufficiently difficult task that most algorithms are strongly affected by the choice of initialization.

— Page 301, Deep Learning, 2016.

The most effective way to evaluate the skill of a neural network configuration is to repeat the search process multiple times and report the average performance of the model over those repeats. This gives the configuration the best chance to search the space from multiple different sets of initial conditions. Sometimes this is called a multiple restart or multiple-restart search.

You can learn more about the effective evaluation of neural networks in this post:

We can use the same set of weights each time we train the network; for example, you could use the values of 0.0 for all weights.

In this case, the equations of the learning algorithm would fail to make any changes to the network weights, and the model will be stuck. It is important to note that the bias weight in each neuron is set to zero by default, not a small random value.

Specifically, nodes that are side-by-side in a hidden layer connected to the same inputs must have different weights for the learning algorithm to update the weights.

This is often referred to as the need to break symmetry during training.

Perhaps the only property known with complete certainty is that the initial parameters need to “break symmetry” between different units. If two hidden units with the same activation function are connected to the same inputs, then these units must have different initial parameters. If they have the same initial parameters, then a deterministic learning algorithm applied to a deterministic cost and model will constantly update both of these units in the same way.

— Page 301, Deep Learning, 2016.
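A toy numpy demonstration of this lockstep effect (a sketch with arbitrary numbers, using a simple linear network): two hidden units that start with identical weights receive identical gradients, so a deterministic update can never make them different.

```python
import numpy as np

x = np.array([0.5, -1.0])        # same inputs to both hidden units
W1 = np.full((2, 2), 0.3)        # both units start with identical weights
w2 = np.array([0.7, 0.7])        # identical output weights
target = 1.0

h = W1 @ x                       # both hidden activations are equal
y = w2 @ h                       # network output
d = y - target                   # output error
grad_W1 = np.outer(d * w2, x)    # per-unit gradients: identical rows

W1 -= 0.1 * grad_W1              # one deterministic update
print(W1[0] == W1[1])            # both rows still identical
```

Random initialization makes the two rows of `W1` differ from the start, so their gradients, and therefore their learned features, diverge.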

We could use the same set of random numbers each time the network is trained.

This would not be helpful when evaluating network configurations.

It may be helpful in order to train the same final set of network weights given a training dataset in the case where a model is being used in a production environment.

You can learn more about fixing the random seed for neural networks developed with Keras in this post:

Traditionally, the weights of a neural network were set to small random numbers.

The initialization of the weights of neural networks is a whole field of study as the careful initialization of the network can speed up the learning process.

Modern deep learning libraries, such as Keras, offer a host of network initialization methods, all of which are variations of initializing the weights with small random numbers.

For example, the following methods are available in Keras at the time of writing for all network types:

- **Zeros**: Initializer that generates tensors initialized to 0.
- **Ones**: Initializer that generates tensors initialized to 1.
- **Constant**: Initializer that generates tensors initialized to a constant value.
- **RandomNormal**: Initializer that generates tensors with a normal distribution.
- **RandomUniform**: Initializer that generates tensors with a uniform distribution.
- **TruncatedNormal**: Initializer that generates a truncated normal distribution.
- **VarianceScaling**: Initializer capable of adapting its scale to the shape of weights.
- **Orthogonal**: Initializer that generates a random orthogonal matrix.
- **Identity**: Initializer that generates the identity matrix.
- **lecun_uniform**: LeCun uniform initializer.
- **glorot_normal**: Glorot normal initializer, also called Xavier normal initializer.
- **glorot_uniform**: Glorot uniform initializer, also called Xavier uniform initializer.
- **he_normal**: He normal initializer.
- **lecun_normal**: LeCun normal initializer.
- **he_uniform**: He uniform variance scaling initializer.

See the documentation for more details.
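As a small sketch (not code from this post; the shape `(2, 4)` is an arbitrary example), these initializers can be looked up by name and sampled directly:

```python
from keras import initializers
import numpy as np

# look up an initializer by one of the names listed above
init = initializers.get('glorot_uniform')
weights = init(shape=(2, 4))     # sample a weight matrix of the given shape

# or configure one explicitly, e.g. small uniform values in [0.0, 0.1]
small_random = initializers.RandomUniform(minval=0.0, maxval=0.1)
w2 = small_random(shape=(2, 4))
```

In a layer definition, the same names are typically passed via the `kernel_initializer` argument, e.g. `Dense(4, kernel_initializer='he_normal')`.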

Out of interest, the default initializers chosen by Keras developers for different layer types are as follows:

- **Dense** (e.g. MLP): *glorot_uniform*
- **LSTM**: *glorot_uniform*
- **CNN**: *glorot_uniform*

You can learn more about “*glorot_uniform*“, also called “*Xavier uniform*“, named for the developer of the method Xavier Glorot, in the paper:

There is no single best way to initialize the weights of a neural network.

Modern initialization strategies are simple and heuristic. Designing improved initialization strategies is a difficult task because neural network optimization is not yet well understood. […] Our understanding of how the initial point affects generalization is especially primitive, offering little to no guidance for how to select the initial point.

— Page 301, Deep Learning, 2016.

It is one more hyperparameter for you to explore and test and experiment with on your specific predictive modeling problem.

Do you have a favorite method for weight initialization?

Let me know in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- Deep Learning, 2016.

- Nondeterministic algorithm on Wikipedia
- Randomized algorithm on Wikipedia
- Stochastic optimization on Wikipedia
- Stochastic gradient descent on Wikipedia
- Fitness landscape on Wikipedia
- Neural Network FAQ
- Keras Weight Initialization
- Understanding the difficulty of training deep feedforward neural networks, 2010.

- What are good initial weights in a neural network?
- Why should weights of Neural Networks be initialized to random numbers?
- What are good initial weights in a neural network?

In this post, you discovered why neural network weights must be randomly initialized.

Specifically, you learned:

- About the need for nondeterministic and randomized algorithms for challenging problems.
- The use of randomness during initialization and search in stochastic optimization algorithms.
- That stochastic gradient descent is a stochastic optimization algorithm and requires the random initialization of network weights.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Why Initialize a Neural Network with Random Weights? appeared first on MachineLearningMastery.com.

]]>The post When to Use MLP, CNN, and RNN Neural Networks appeared first on MachineLearningMastery.com.

]]>It can be difficult for a beginner to the field of deep learning to know what type of network to use. There are so many types of networks to choose from and new methods being published and discussed every day.

To make things worse, most neural networks are flexible enough that they work (make a prediction) even when used with the wrong type of data or prediction problem.

In this post, you will discover the suggested use for the three main classes of artificial neural networks.

After reading this post, you will know:

- Which types of neural networks to focus on when working on a predictive modeling problem.
- When to use, not use, and possibly try using an MLP, CNN, and RNN on a project.
- To consider the use of hybrid models and to have a clear idea of your project goals before selecting a model.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

This post is divided into five sections; they are:

- What Neural Networks to Focus on?
- When to Use Multilayer Perceptrons?
- When to Use Convolutional Neural Networks?
- When to Use Recurrent Neural Networks?
- Hybrid Network Models

Deep learning is the application of artificial neural networks using modern hardware.

It allows the development, training, and use of neural networks that are much larger (more layers) than was previously thought possible.

There are thousands of types of specific neural networks proposed by researchers as modifications or tweaks to existing models, and sometimes as wholly new approaches.

As a practitioner, I recommend waiting until a model emerges as generally applicable. It is hard to tease out the signal of what works well generally from the noise of the vast number of publications released daily or weekly.

There are three classes of artificial neural networks that I recommend that you focus on in general. They are:

- Multilayer Perceptrons (MLPs)
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)

These three classes of networks provide a lot of flexibility and have proven themselves over decades to be useful and reliable in a wide range of problems. They also have many subtypes to help specialize them to the quirks of different framings of prediction problems and different datasets.

Now that we know what networks to focus on, let’s look at when we can use each class of neural network.

Multilayer Perceptrons, or MLPs for short, are the classical type of neural network.

They are comprised of one or more layers of neurons. Data is fed to the input layer, there may be one or more hidden layers providing levels of abstraction, and predictions are made on the output layer, also called the visible layer.

For more details on the MLP, see the post:

MLPs are suitable for classification prediction problems where inputs are assigned a class or label.

They are also suitable for regression prediction problems where a real-valued quantity is predicted given a set of inputs. Data is often provided in a tabular format, such as you would see in a CSV file or a spreadsheet.

**Use MLPs For:**

- Tabular datasets
- Classification prediction problems
- Regression prediction problems

They are very flexible and can be used generally to learn a mapping from inputs to outputs.

This flexibility allows them to be applied to other types of data. For example, the pixels of an image can be reduced down to one long row of data and fed into an MLP. The words of a document can also be reduced to one long row of data and fed to an MLP. Even the lag observations for a time series prediction problem can be reduced to a long row of data and fed to an MLP.

As such, if your data is in a form other than a tabular dataset, such as an image, document, or time series, I would recommend at least testing an MLP on your problem. The results can be used as a baseline point of comparison to confirm that other models that may appear better suited add value.

**Try MLPs On:**

- Image data
- Text Data
- Time series data
- Other types of data
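The "one long row of data" idea can be made concrete with numpy (a sketch with hypothetical sizes): an 8x8 grayscale image becomes a 64-element input vector for an MLP.

```python
import numpy as np

image = np.arange(64).reshape(8, 8)   # stand-in for an 8x8 grayscale image
row = image.flatten()                  # one long row of 64 pixel values
batch = row.reshape(1, -1)             # shaped as a single sample for a model
print(row.shape, batch.shape)          # (64,) (1, 64)
```

Any spatial structure in the image is discarded by the flattening, which is exactly what CNNs avoid.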

Convolutional Neural Networks, or CNNs, were designed to map image data to an output variable.

They have proven so effective that they are the go-to method for any type of prediction problem involving image data as an input.

For more details on CNNs, see the post:

The benefit of using CNNs is their ability to develop an internal representation of a two-dimensional image. This allows the model to learn position- and scale-invariant structures in the data, which is important when working with images.

**Use CNNs For:**

- Image data
- Classification prediction problems
- Regression prediction problems

More generally, CNNs work well with data that has a spatial relationship.

The CNN input is traditionally two-dimensional, a field or matrix, but can also be changed to be one-dimensional, allowing it to develop an internal representation of a one-dimensional sequence.

This allows the CNN to be used more generally on other types of data that have a spatial relationship. For example, there is an ordered relationship between words in a document of text. There is an ordered relationship in the time steps of a time series.

Although not specifically developed for non-image data, CNNs achieve state-of-the-art results on problems such as document classification used in sentiment analysis and related problems.

**Try CNNs On:**

- Text data
- Time series data
- Sequence input data

Recurrent Neural Networks, or RNNs, were designed to work with sequence prediction problems.

Sequence prediction problems come in many forms and are best described by the types of inputs and outputs supported.

Some examples of sequence prediction problems include:

- **One-to-Many**: An observation as input mapped to a sequence with multiple steps as an output.
- **Many-to-One**: A sequence of multiple steps as input mapped to class or quantity prediction.
- **Many-to-Many**: A sequence of multiple steps as input mapped to a sequence with multiple steps as output.

The Many-to-Many problem is often referred to as sequence-to-sequence, or seq2seq for short.

For more details on the types of sequence prediction problems, see the post:

Recurrent neural networks were traditionally difficult to train.

The Long Short-Term Memory, or LSTM, network is perhaps the most successful RNN because it overcomes the problems of training a recurrent network and in turn has been used on a wide range of applications.

For more details on RNNs, see the post:

RNNs in general and LSTMs in particular have received the most success when working with sequences of words and paragraphs, generally called natural language processing.

This includes both sequences of text and sequences of spoken language represented as a time series. They are also used as generative models that require a sequence output, not only with text, but on applications such as generating handwriting.

**Use RNNs For:**

- Text data
- Speech data
- Classification prediction problems
- Regression prediction problems
- Generative models

Recurrent neural networks are not appropriate for tabular datasets as you would see in a CSV file or spreadsheet. They are also not appropriate for image data input.

**Don’t Use RNNs For:**

- Tabular data
- Image data

RNNs and LSTMs have been tested on time series forecasting problems, but the results have been poor, to say the least. Autoregression methods, even linear methods, often perform much better. LSTMs are often outperformed by simple MLPs applied on the same data.

For more on this topic, see the post:

Nevertheless, it remains an active area.

**Perhaps Try RNNs on:**

- Time series data

A CNN or RNN model is rarely used alone.

These types of networks are used as layers in a broader model that also has one or more MLP layers. Technically, these are a hybrid type of neural network architecture.

Perhaps the most interesting work comes from the mixing of the different types of networks together into hybrid models.

For example, consider a model that uses a stack of layers with a CNN on the input, LSTM in the middle, and MLP at the output. A model like this can read a sequence of image inputs, such as a video, and generate a prediction. This is called a CNN LSTM architecture.

The network types can also be stacked in specific architectures to unlock new capabilities, such as the reusable image recognition models that use very deep CNN and MLP networks and that can be added to a new LSTM model and used for captioning photos. There are also the encoder-decoder LSTM networks that allow input and output sequences of differing lengths.
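A CNN LSTM stack like the one described above can be sketched in Keras using the `TimeDistributed` wrapper (a hedged example, not code from this post; the 5-frame 16x16 input and the layer sizes are arbitrary):

```python
from keras import Input
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense

model = Sequential()
model.add(Input(shape=(5, 16, 16, 1)))  # a sequence of 5 small grayscale frames
# CNN layers applied to each frame independently
model.add(TimeDistributed(Conv2D(4, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))
# LSTM reads the per-frame features in order
model.add(LSTM(8))
# MLP output layer makes the final prediction
model.add(Dense(1, activation='sigmoid'))
```

The CNN acts as a feature extractor per time step, the LSTM integrates those features over time, and the dense layer maps the result to a prediction.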

It is important to think clearly about what you and your stakeholders require from the project first, then seek out a network architecture (or develop one) that meets your specific project needs.

For a good framework to help you think about your data and prediction problems, see the post:

This section provides more resources on the topic if you are looking to go deeper.

- What Is Deep Learning?
- Crash Course On Multi-Layer Perceptron Neural Networks
- Crash Course in Convolutional Neural Networks for Machine Learning
- Crash Course in Recurrent Neural Networks for Deep Learning
- Gentle Introduction to Models for Sequence Prediction with Recurrent Neural Networks
- How to Define Your Machine Learning Problem

In this post, you discovered the suggested use for the three main classes of artificial neural networks.

Specifically, you learned:

- Which types of neural networks to focus on when working on a predictive modeling problem.
- When to use, not use, and possibly try using an MLP, CNN, and RNN on a project.
- To consider the use of hybrid models and to have a clear idea of your project goals before selecting a model.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post When to Use MLP, CNN, and RNN Neural Networks appeared first on MachineLearningMastery.com.

]]>The post Difference Between a Batch and an Epoch in a Neural Network appeared first on MachineLearningMastery.com.

]]>Two hyperparameters that often confuse beginners are the batch size and number of epochs. They are both integer values and seem to do the same thing.

In this post, you will discover the difference between batches and epochs in stochastic gradient descent.

After reading this post, you will know:

- Stochastic gradient descent is an iterative learning algorithm that uses a training dataset to update a model.
- The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
- The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

This post is divided into five parts; they are:

- Stochastic Gradient Descent
- What Is a Sample?
- What Is a Batch?
- What Is an Epoch?
- What Is the Difference Between Batch and Epoch?

Stochastic Gradient Descent, or SGD for short, is an optimization algorithm used to train machine learning algorithms, most notably artificial neural networks used in deep learning.

The job of the algorithm is to find a set of internal model parameters that perform well against some performance measure such as logarithmic loss or mean squared error.

Optimization is a type of searching process and you can think of this search as learning. The optimization algorithm is called “*gradient descent*“, where “*gradient*” refers to the calculation of an error gradient or slope of error and “*descent*” refers to moving down along that slope towards some minimum level of error.

The algorithm is iterative. This means that the search process occurs over multiple discrete steps, each step hopefully slightly improving the model parameters.

Each step involves using the model with the current set of internal parameters to make predictions on some samples, comparing the predictions to the real expected outcomes, calculating the error, and using the error to update the internal model parameters.

This update procedure is different for different algorithms, but in the case of artificial neural networks, the backpropagation update algorithm is used.

Before we dive into batches and epochs, let’s take a look at what we mean by sample.

Learn more about gradient descent here:

A sample is a single row of data.

It contains inputs that are fed into the algorithm and an output that is used to compare to the prediction and calculate an error.

A training dataset is comprised of many rows of data, e.g. many samples. A sample may also be called an instance, an observation, an input vector, or a feature vector.

Now that we know what a sample is, let’s define a batch.

The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.

Think of a batch as a for-loop iterating over one or more samples and making predictions. At the end of the batch, the predictions are compared to the expected output variables and an error is calculated. From this error, the update algorithm is used to improve the model, e.g. move down along the error gradient.

A training dataset can be divided into one or more batches.

When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.

- **Batch Gradient Descent**. Batch Size = Size of Training Set
- **Stochastic Gradient Descent**. Batch Size = 1
- **Mini-Batch Gradient Descent**. 1 < Batch Size < Size of Training Set

In the case of mini-batch gradient descent, popular batch sizes include 32, 64, and 128 samples. You may see these values used in models in the literature and in tutorials.
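The difference between the three variants is easy to see as updates per epoch (illustrative arithmetic only; the 1,000-sample dataset is hypothetical):

```python
import math

n_samples = 1000
for name, batch_size in [('batch', n_samples), ('stochastic', 1), ('mini-batch', 32)]:
    # one model update per batch; a partial final batch still counts as one
    updates_per_epoch = math.ceil(n_samples / batch_size)
    print(name, updates_per_epoch)  # batch: 1, stochastic: 1000, mini-batch: 32
```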

**What if the dataset does not divide evenly by the batch size?**

This can and does happen often when training a model. It simply means that the final batch has fewer samples than the other batches.

Alternatively, you can remove some samples from the dataset or change the batch size such that the number of samples in the dataset does divide evenly by the batch size.
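For example (with hypothetical numbers), splitting 1,003 samples with a batch size of 32 leaves a smaller final batch:

```python
n_samples, batch_size = 1003, 32
n_full_batches = n_samples // batch_size   # 31 full batches of 32 samples
last_batch_size = n_samples % batch_size   # 11 samples in the final batch
print(n_full_batches, last_batch_size)     # 31 11
```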

For more on the differences between these variations of gradient descent, see the post:

For more on the effect of batch size on the learning process, see the post:

A batch involves an update to the model using samples; next, let’s look at an epoch.

The number of epochs is a hyperparameter that defines the number of times that the learning algorithm will work through the entire training dataset.

One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters. An epoch is comprised of one or more batches. For example, as above, an epoch that has one batch is called the batch gradient descent learning algorithm.

You can think of a for-loop over the number of epochs where each loop proceeds over the training dataset. Within this for-loop is another nested for-loop that iterates over each batch of samples, where one batch has the specified “batch size” number of samples.
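The two nested for-loops can be written out schematically (the "model update" here is just a counter, and the sizes are arbitrary):

```python
n_samples, batch_size, n_epochs = 12, 5, 3

updates = 0
for epoch in range(n_epochs):                      # outer loop: epochs
    for start in range(0, n_samples, batch_size):  # inner loop: batches
        batch = range(start, min(start + batch_size, n_samples))
        updates += 1                               # one model update per batch
print(updates)  # 3 epochs x 3 batches per epoch = 9 updates
```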

The number of epochs is traditionally large, often hundreds or thousands, allowing the learning algorithm to run until the error from the model has been sufficiently minimized. You may see examples of the number of epochs in the literature and in tutorials set to 10, 100, 500, 1000, and larger.

It is common to create line plots that show epochs along the x-axis as time and the error or skill of the model on the y-axis. These plots are sometimes called learning curves. They can help to diagnose whether the model has over-learned, under-learned, or is suitably fit to the training dataset.

For more on diagnostics via learning curves with LSTM networks, see the post:

In case it is still not clear, let’s look at the differences between batches and epochs.

The batch size is a number of samples processed before the model is updated.

The number of epochs is the number of complete passes through the training dataset.

The size of a batch must be greater than or equal to one and less than or equal to the number of samples in the training dataset.

The number of epochs can be set to an integer value between one and infinity. You can run the algorithm for as long as you like and even stop it using other criteria besides a fixed number of epochs, such as a change (or lack of change) in model error over time.

They are both integer values and they are both hyperparameters for the learning algorithm, e.g. parameters for the learning process, not internal model parameters found by the learning process.

You must specify the batch size and number of epochs for a learning algorithm.

There are no magic rules for how to configure these parameters. You must try different values and see what works best for your problem.

Finally, let’s make this concrete with a small example.

Assume you have a dataset with 200 samples (rows of data) and you choose a batch size of 5 and 1,000 epochs.

This means that the dataset will be divided into 40 batches, each with five samples. The model weights will be updated after each batch of five samples.

This also means that one epoch will involve 40 batches or 40 updates to the model.

With 1,000 epochs, the model will be exposed to or pass through the whole dataset 1,000 times. That is a total of 40,000 batches during the entire training process.
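The arithmetic of this worked example can be checked directly:

```python
# 200 samples, batch size 5, 1,000 epochs (the numbers from the example above)
n_samples, batch_size, n_epochs = 200, 5, 1000

batches_per_epoch = n_samples // batch_size    # 40 batches, i.e. 40 updates per epoch
total_batches = batches_per_epoch * n_epochs   # 40,000 batches over all training
print(batches_per_epoch, total_batches)        # 40 40000
```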

This section provides more resources on the topic if you are looking to go deeper.

- Gradient Descent For Machine Learning
- How to Control the Speed and Stability of Training Neural Networks Batch Size
- A Gentle Introduction to Mini-Batch Gradient Descent and How to Configure Batch Size
- A Gentle Introduction to Learning Curves for Diagnosing Model Performance
- Stochastic gradient descent on Wikipedia
- Backpropagation on Wikipedia

In this post, you discovered the difference between batches and epochs in stochastic gradient descent.

Specifically, you learned:

- Stochastic gradient descent is an iterative learning algorithm that uses a training dataset to update a model.
- The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
- The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Difference Between a Batch and an Epoch in a Neural Network appeared first on MachineLearningMastery.com.

The post Image Augmentation with Keras Preprocessing Layers and tf.image appeared first on MachineLearningMastery.com.

There are many ways to do image augmentation. You may use external libraries or write your own functions for it. There are also some modules in TensorFlow and Keras for augmentation.

In this post, you will discover how you can use the Keras preprocessing layers as well as the `tf.image` module in TensorFlow for image augmentation.

After reading this post, you will know:

- What the Keras preprocessing layers are, and how to use them
- What functions are provided by the `tf.image` module for image augmentation
- How to use augmentation together with the `tf.data` dataset

Let’s get started.

This article is divided into five sections; they are:

- Getting Images
- Visualizing the Images
- Keras Preprocessing Layers
- Using tf.image API for Augmentation
- Using Preprocessing Layers in Neural Networks

Before you see how you can do augmentation, you need to get the images. Ultimately, you need the images to be represented as arrays, for example, as H×W×3 arrays of 8-bit integers for the RGB pixel values. There are many ways to get the images. Some can be downloaded as a ZIP file. If you’re using TensorFlow, you may get some image datasets from the `tensorflow_datasets` library.

In this tutorial, you will use the citrus leaves images, a small dataset of less than 100MB. It can be downloaded from `tensorflow_datasets` as follows:

```python
import tensorflow_datasets as tfds

ds, meta = tfds.load('citrus_leaves', with_info=True, split='train', shuffle_files=True)
```

Running this code the first time will download the image dataset into your computer with the following output:

```
Downloading and preparing dataset 63.87 MiB (download: 63.87 MiB, generated: 37.89 MiB, total: 101.76 MiB) to ~/tensorflow_datasets/citrus_leaves/0.1.2...
Extraction completed...: 100%|██████████████████████████████| 1/1 [00:06<00:00, 6.54s/ file]
Dl Size...: 100%|██████████████████████████████████████████| 63/63 [00:06<00:00, 9.63 MiB/s]
Dl Completed...: 100%|███████████████████████████████████████| 1/1 [00:06<00:00, 6.54s/ url]
Dataset citrus_leaves downloaded and prepared to ~/tensorflow_datasets/citrus_leaves/0.1.2. Subsequent calls will reuse this data.
```

The function above returns the images as a `tf.data` dataset object, together with the metadata. This is a classification dataset. You can print the training labels with the following:

```python
...
for i in range(meta.features['label'].num_classes):
    print(meta.features['label'].int2str(i))
```

This prints:

```
Black spot
canker
greening
healthy
```

If you run this code again at a later time, it will reuse the downloaded images. Another way to load the downloaded images into a `tf.data` dataset is to use the `image_dataset_from_directory()` function.

As you can see from the screen output above, the dataset is downloaded into the directory `~/tensorflow_datasets`. If you look inside, you will see the following directory structure:

```
.../Citrus/Leaves
├── Black spot
├── Melanose
├── canker
├── greening
└── healthy
```

The directories are the labels, and the images are files stored under their corresponding directory. You can let the function read the directory recursively into a dataset:

```python
import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory

# set to fixed image size 256x256
PATH = ".../Citrus/Leaves"
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="bilinear",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)
```

You may want to set `batch_size=None` if you do not want the dataset to be batched. Usually, you want the dataset to be batched for training a neural network model.
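To illustrate the effect of batching, here is a toy `tf.data` pipeline; plain integers stand in for images, so this is not part of the citrus example:

```python
import tensorflow as tf

# ten elements, standing in for ten images
ds = tf.data.Dataset.range(10)

# batch into groups of 3; the final batch holds the remainder
batched = ds.batch(3)
shapes = [int(t.shape[0]) for t in batched]
print(shapes)  # [3, 3, 3, 1]
```

With `batch_size=None`, the dataset would instead yield individual elements, as the unbatched `ds` above does.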

It is important to visualize the augmentation result so you can verify it is what you want it to be. You can use matplotlib for this.

In matplotlib, you have the `imshow()` function to display an image. However, for the image to be displayed correctly, it should be presented as an array of 8-bit unsigned integers (uint8).

Given a dataset created using `image_dataset_from_directory()`, you can get the first batch (of 32 images) and display a few of them using `imshow()` as follows:

```python
...
import matplotlib.pyplot as plt

fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
for images, labels in ds.take(1):
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
            ax[i][j].set_title(ds.class_names[labels[i*3+j]])
plt.show()
```

Here, you see a display of nine images in a grid, labeled with their corresponding classification labels using `ds.class_names`. The images should be converted to NumPy arrays in uint8 for display. This code displays an image like the following:

The complete code from loading the image to display is as follows:

```python
from tensorflow.keras.utils import image_dataset_from_directory
import matplotlib.pyplot as plt

# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)

# Take one batch from dataset and display the images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
for images, labels in ds.take(1):
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
            ax[i][j].set_title(ds.class_names[labels[i*3+j]])
plt.show()
```

Note that if you’re using `tensorflow_datasets` to get the images, the samples are presented as a dictionary instead of a tuple of (image, label). You should change your code slightly to the following:

```python
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

# use tfds.load() or image_dataset_from_directory() to load images
ds, meta = tfds.load('citrus_leaves', with_info=True, split='train', shuffle_files=True)
ds = ds.batch(32)

# Take one batch from dataset and display the images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
for sample in ds.take(1):
    images, labels = sample["image"], sample["label"]
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
            ax[i][j].set_title(meta.features['label'].int2str(labels[i*3+j]))
plt.show()
```

For the rest of this post, assume the dataset is created using `image_dataset_from_directory()`. You may need to tweak the code slightly if your dataset is created differently.

Keras comes with many neural network layers, such as convolution layers, that you need to train. There are also layers with no parameters to train, such as flatten layers to convert an array like an image into a vector.

The preprocessing layers in Keras are specifically designed to use in the early stages of a neural network. You can use them for image preprocessing, such as to resize or rotate the image or adjust the brightness and contrast. While the preprocessing layers are supposed to be part of a larger neural network, you can also use them as functions. Below is how you can use the resizing layer as a function to transform some images and display them side-by-side with the original:

```python
...
# create a resizing layer
out_height, out_width = 128, 256
resize = tf.keras.layers.Resizing(out_height, out_width)

# show original vs resized
fig, ax = plt.subplots(2, 3, figsize=(6,4))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
plt.show()
```

The images are 256×256 pixels, and the resizing layer will make them 128×256 pixels (the layer takes height first, then width). The output of the above code is as follows:

Since the resizing layer is a function, you can chain it to the dataset itself. For example:

```python
...
def augment(image, label):
    return resize(image), label

resized_ds = ds.map(augment)

for image, label in resized_ds:
    ...
```

The dataset `ds` has samples in the form of `(image, label)` tuples. Hence, you create a function that takes in such a tuple and preprocesses the image with the resizing layer. You then assign this function as an argument to the dataset’s `map()` method. When you draw a sample from the new dataset created with `map()`, the image will be a transformed one.

There are more preprocessing layers available. Some are demonstrated below.

As you saw above, you can resize the image. You can also randomly enlarge or shrink the height or width of an image. Similarly, you can zoom in or zoom out on an image. Below is an example of manipulating the image size in various ways for a maximum of 30% increase or decrease:

```python
...
# Create preprocessing layers
out_height, out_width = 128, 256
resize = tf.keras.layers.Resizing(out_height, out_width)
height = tf.keras.layers.RandomHeight(0.3)
width = tf.keras.layers.RandomWidth(0.3)
zoom = tf.keras.layers.RandomZoom(0.3)

# Visualize images and augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # height
        ax[2][i].imshow(height(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("height")
        # width
        ax[3][i].imshow(width(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("width")
        # zoom
        ax[4][i].imshow(zoom(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("zoom")
plt.show()
```

This code shows images as follows:

While you specified a fixed dimension in resize, you have a random amount of manipulation in other augmentations.

You can also do flipping, rotation, cropping, and geometric translation using preprocessing layers:

```python
...
# Create preprocessing layers
flip = tf.keras.layers.RandomFlip("horizontal_and_vertical")  # or "horizontal", "vertical"
rotate = tf.keras.layers.RandomRotation(0.2)
crop = tf.keras.layers.RandomCrop(out_height, out_width)
translation = tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2)

# Visualize augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        ax[1][i].imshow(flip(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("flip")
        # crop
        ax[2][i].imshow(crop(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # translation
        ax[3][i].imshow(translation(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("translation")
        # rotate
        ax[4][i].imshow(rotate(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("rotate")
plt.show()
```

This code shows the following images:

And finally, you can do augmentations on color adjustments as well:

```python
...
brightness = tf.keras.layers.RandomBrightness([-0.8,0.8])
contrast = tf.keras.layers.RandomContrast(0.2)

# Visualize augmentation
fig, ax = plt.subplots(3, 3, figsize=(6,7))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        ax[1][i].imshow(brightness(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(contrast(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
plt.show()
```

This shows the images as follows:

For completeness, below is the code to display the result of various augmentations:

```python
from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt

# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)

# Create preprocessing layers
out_height, out_width = 128, 256
resize = tf.keras.layers.Resizing(out_height, out_width)
height = tf.keras.layers.RandomHeight(0.3)
width = tf.keras.layers.RandomWidth(0.3)
zoom = tf.keras.layers.RandomZoom(0.3)
flip = tf.keras.layers.RandomFlip("horizontal_and_vertical")
rotate = tf.keras.layers.RandomRotation(0.2)
crop = tf.keras.layers.RandomCrop(out_height, out_width)
translation = tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2)
brightness = tf.keras.layers.RandomBrightness([-0.8,0.8])
contrast = tf.keras.layers.RandomContrast(0.2)

# Visualize images and augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # height
        ax[2][i].imshow(height(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("height")
        # width
        ax[3][i].imshow(width(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("width")
        # zoom
        ax[4][i].imshow(zoom(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("zoom")
plt.show()

fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        ax[1][i].imshow(flip(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("flip")
        # crop
        ax[2][i].imshow(crop(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # translation
        ax[3][i].imshow(translation(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("translation")
        # rotate
        ax[4][i].imshow(rotate(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("rotate")
plt.show()

fig, ax = plt.subplots(3, 3, figsize=(6,7))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        ax[1][i].imshow(brightness(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(contrast(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
plt.show()
```

Finally, it is important to point out that most neural network models work better if the input images are scaled. While an image usually stores pixel values as 8-bit unsigned integers (e.g., for display using `imshow()` as above), a neural network prefers pixel values between 0 and 1 or between -1 and +1. This can be done with preprocessing layers too. Below is how you can update one of the examples above to add a scaling layer to the augmentation:

```python
...
out_height, out_width = 128, 256
resize = tf.keras.layers.Resizing(out_height, out_width)
rescale = tf.keras.layers.Rescaling(1/127.5, offset=-1)  # rescale pixel values to [-1,1]

def augment(image, label):
    return rescale(resize(image)), label

rescaled_resized_ds = ds.map(augment)

for image, label in rescaled_resized_ds:
    ...
```

Besides the preprocessing layers, the `tf.image` module also provides some functions for augmentation. Unlike the preprocessing layers, these functions are intended to be used inside a user-defined function that is assigned to a dataset using `map()`, as you saw above.
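As a sketch of that pattern, a `tf.image` function can be wrapped in a user-defined function and applied with `map()`. Synthetic random tensors stand in for the citrus images here:

```python
import tensorflow as tf

# synthetic stand-in for a batched image dataset
images = tf.random.uniform((8, 256, 256, 3), maxval=255)
labels = tf.zeros((8,), dtype=tf.int32)
ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

def augment(image, label):
    # tf.image functions are called inside a user-defined function...
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.3)
    return image, label

# ...which is then assigned to the dataset with map()
augmented_ds = ds.map(augment)
for image, label in augmented_ds.take(1):
    print(image.shape)  # (4, 256, 256, 3)
```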

The functions provided by `tf.image` are not duplicates of the preprocessing layers, although there is some overlap. Below is an example of using the `tf.image` functions to resize and crop images:

```python
...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        # original
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        h = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        w = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        ax[1][i].imshow(tf.image.resize(images[i], [h,w]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # crop
        y, x, h, w = (128 * tf.random.uniform((4,))).numpy().astype("uint8")
        ax[2][i].imshow(tf.image.crop_to_bounding_box(images[i], y, x, h, w).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # central crop
        x = tf.random.uniform([], minval=0.4, maxval=1.0)
        ax[3][i].imshow(tf.image.central_crop(images[i], x).numpy().astype("uint8"))
        ax[3][i].set_title("central crop")
        # crop to (h,w) at random offset
        h, w = (256 * tf.random.uniform((2,))).numpy().astype("uint8")
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[4][i].imshow(tf.image.stateless_random_crop(images[i], [h,w,3], seed).numpy().astype("uint8"))
        ax[4][i].set_title("random crop")
plt.show()
```

Below is the output of the above code:

While the display of images matches what you might expect from the code, the use of `tf.image` functions is quite different from that of the preprocessing layers. Every `tf.image` function is different. For example, the `crop_to_bounding_box()` function takes pixel coordinates, but the `central_crop()` function takes a fraction ratio as its argument.

These functions also differ in how randomness is handled. Some of them have no random behavior at all; hence a random resize needs the exact output size to be generated with a random number generator separately before calling the resize function. Other functions, such as `stateless_random_crop()`, can do augmentation randomly, but a pair of random seeds of type `int32` must be specified explicitly.
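A minimal sketch of the stateless behavior: passing the same seed pair twice produces the identical crop (a random tensor is used as a stand-in image):

```python
import tensorflow as tf

image = tf.random.uniform((64, 64, 3), maxval=255)
seed = (1, 2)  # the pair of int32 seeds the stateless functions require

# the same seed gives the same "random" crop every time
a = tf.image.stateless_random_crop(image, size=[32, 32, 3], seed=seed)
b = tf.image.stateless_random_crop(image, size=[32, 32, 3], seed=seed)
print(bool(tf.reduce_all(a == b)))  # True
```

To get a different crop per call, you generate a fresh seed pair each time, as the examples above do.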

To continue the example, here are the functions for flipping an image and extracting the Sobel edges:

```python
...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_flip_left_right(images[i], seed).numpy().astype("uint8"))
        ax[1][i].set_title("flip left-right")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[2][i].imshow(tf.image.stateless_random_flip_up_down(images[i], seed).numpy().astype("uint8"))
        ax[2][i].set_title("flip up-down")
        # sobel edge
        sobel = tf.image.sobel_edges(images[i:i+1])
        ax[3][i].imshow(sobel[0, ..., 0].numpy().astype("uint8"))
        ax[3][i].set_title("sobel y")
        # sobel edge
        ax[4][i].imshow(sobel[0, ..., 1].numpy().astype("uint8"))
        ax[4][i].set_title("sobel x")
plt.show()
```

This shows the following:

And the following are the functions to manipulate the brightness, contrast, and colors:

```python
...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_brightness(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(tf.image.stateless_random_contrast(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
        # saturation
        ax[3][i].imshow(tf.image.stateless_random_saturation(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[3][i].set_title("saturation")
        # hue
        ax[4][i].imshow(tf.image.stateless_random_hue(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[4][i].set_title("hue")
plt.show()
```

This code shows the following:

Below is the complete code to display all of the above:

```python
from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt

# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)

# Visualize tf.image augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        # original
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        h = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        w = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        ax[1][i].imshow(tf.image.resize(images[i], [h,w]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # crop
        y, x, h, w = (128 * tf.random.uniform((4,))).numpy().astype("uint8")
        ax[2][i].imshow(tf.image.crop_to_bounding_box(images[i], y, x, h, w).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # central crop
        x = tf.random.uniform([], minval=0.4, maxval=1.0)
        ax[3][i].imshow(tf.image.central_crop(images[i], x).numpy().astype("uint8"))
        ax[3][i].set_title("central crop")
        # crop to (h,w) at random offset
        h, w = (256 * tf.random.uniform((2,))).numpy().astype("uint8")
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[4][i].imshow(tf.image.stateless_random_crop(images[i], [h,w,3], seed).numpy().astype("uint8"))
        ax[4][i].set_title("random crop")
plt.show()

fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_flip_left_right(images[i], seed).numpy().astype("uint8"))
        ax[1][i].set_title("flip left-right")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[2][i].imshow(tf.image.stateless_random_flip_up_down(images[i], seed).numpy().astype("uint8"))
        ax[2][i].set_title("flip up-down")
        # sobel edge
        sobel = tf.image.sobel_edges(images[i:i+1])
        ax[3][i].imshow(sobel[0, ..., 0].numpy().astype("uint8"))
        ax[3][i].set_title("sobel y")
        # sobel edge
        ax[4][i].imshow(sobel[0, ..., 1].numpy().astype("uint8"))
        ax[4][i].set_title("sobel x")
plt.show()

fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_brightness(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(tf.image.stateless_random_contrast(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
        # saturation
        ax[3][i].imshow(tf.image.stateless_random_saturation(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[3][i].set_title("saturation")
        # hue
        ax[4][i].imshow(tf.image.stateless_random_hue(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[4][i].set_title("hue")
plt.show()
```

These augmentation functions should be enough for most uses. But if you have some specific ideas for augmentation, you will probably want a dedicated image processing library. OpenCV and Pillow are common and powerful libraries that give you finer control over image transformations.

You used the Keras preprocessing layers as functions in the examples above. But they can also be used as layers in a neural network. It is trivial to use. Below is an example of how you can incorporate a preprocessing layer into a classification network and train it using a dataset:

```python
from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt

# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)

AUTOTUNE = tf.data.AUTOTUNE
ds = ds.cache().prefetch(buffer_size=AUTOTUNE)

num_classes = 5
model = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.Rescaling(1/127.5, offset=-1),  # scale pixel values to [-1,1]
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(ds, epochs=3)
```

Running this code gives the following output:

```
Found 609 files belonging to 5 classes.
Using 488 files for training.
Epoch 1/3
16/16 [==============================] - 5s 253ms/step - loss: 1.4114 - accuracy: 0.4283
Epoch 2/3
16/16 [==============================] - 4s 259ms/step - loss: 0.8101 - accuracy: 0.6475
Epoch 3/3
16/16 [==============================] - 4s 267ms/step - loss: 0.7015 - accuracy: 0.7111
```

In the code above, you created the dataset with `cache()` and `prefetch()`. This is a performance technique that allows the dataset to prepare data asynchronously while the neural network is trained. It is especially significant if the dataset has other augmentation assigned using the `map()` function.
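A sketch of such a pipeline, with an augmentation assigned via `map()` plus `cache()` and `prefetch()`; toy tensors stand in for the image batches here:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# toy stand-in: 32 tiny "images", batched into 4 batches of 8
ds = tf.data.Dataset.from_tensor_slices(tf.random.uniform((32, 8, 8, 3))).batch(8)

def augment(images):
    return tf.image.random_flip_left_right(images)

# cache() before map() so the random augmentation still differs each epoch;
# prefetch() lets data preparation overlap with model training
ds = ds.cache().map(augment, num_parallel_calls=AUTOTUNE).prefetch(AUTOTUNE)

batches = list(ds)  # 4 batches of shape (8, 8, 8, 3)
```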

You will see some improvement in accuracy if you remove the `RandomFlip` and `RandomRotation` layers, because doing so makes the problem easier. However, if you want the network to predict well across a wide variety of image quality and properties, augmentation can help your resulting network become more powerful.

Below is some documentation from TensorFlow that is related to the examples above:

- `tf.data.Dataset` API
- Citrus leaves dataset
- Load and preprocess images
- Data augmentation
- `tf.image` API
- `tf.data` performance

In this post, you have seen how you can use the `tf.data` dataset with image augmentation functions from Keras and TensorFlow.

Specifically, you learned:

- How to use the preprocessing layers from Keras, both as a function and as part of a neural network
- How to create your own image augmentation function and apply it to the dataset using the `map()` function
- How to use the functions provided by the `tf.image` module for image augmentation

The post Image Augmentation with Keras Preprocessing Layers and tf.image appeared first on MachineLearningMastery.com.

The post Image Augmentation for Deep Learning with Keras appeared first on MachineLearningMastery.com.

In this post, you will discover how to use data preparation and data augmentation with your image datasets when developing and evaluating deep learning models in Python with Keras.

After reading this post, you will know:

- About the image augmentation API provided by Keras and how to use it with your models
- How to perform feature standardization
- How to perform ZCA whitening of your images
- How to augment data with random rotations, shifts, and flips
- How to save augmented image data to disk

**Kick-start your project** with my new book Deep Learning With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

- **Jun/2016**: First published
- **Update Aug/2016**: The examples in this post were updated for the latest Keras API. The datagen.next() function was removed
- **Update Oct/2016**: Updated for Keras 1.1.0, TensorFlow 0.10.0 and scikit-learn v0.18
- **Update Jan/2017**: Updated for Keras 1.2.0 and TensorFlow 0.12.1
- **Update Mar/2017**: Updated for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0
- **Update Sep/2019**: Updated for Keras 2.2.5 API
- **Update Jul/2022**: Updated for TensorFlow 2.x API with a workaround on the feature standardization issue

For an extended tutorial on the ImageDataGenerator for image data augmentation, see:

Like the rest of Keras, the image augmentation API is simple and powerful.

Keras provides the ImageDataGenerator class that defines the configuration for image data preparation and augmentation. This includes capabilities such as:

- Sample-wise standardization
- Feature-wise standardization
- ZCA whitening
- Random rotation, shifts, shear, and flips
- Dimension reordering
- Save augmented images to disk

An augmented image generator can be created as follows:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()
```

Rather than performing the operations on your entire image dataset in memory, the API is designed to be iterated by the deep learning model fitting process, creating augmented image data for you just in time. This reduces your memory overhead but adds some additional time cost during model training.

After you have created and configured your **ImageDataGenerator**, you must fit it on your data. This will calculate any statistics required to actually perform the transforms to your image data. You can do this by calling the **fit()** function on the data generator and passing it to your training dataset.

```python
datagen.fit(train)
```

The data generator itself is, in fact, an iterator, returning batches of image samples when requested. You can configure the batch size and prepare the data generator and get batches of images by calling the **flow()** function.

```python
# flow() returns an iterator; next() retrieves one batch from it
X_batch, y_batch = next(datagen.flow(train, train, batch_size=32))
```

Finally, you can make use of the data generator. Instead of calling the **fit()** function on your model, you must call the **fit_generator()** function and pass in the data generator and the desired length of an epoch as well as the total number of epochs on which to train.

```python
model.fit_generator(datagen.flow(train, train, batch_size=32),
                    steps_per_epoch=len(train) // 32, epochs=100)
```

You can learn more about the Keras image data generator API in the Keras documentation.

Take my free 2-week email course and discover MLPs, CNNs and LSTMs (with code).

Click to sign-up now and also get a free PDF Ebook version of the course.

Now that you know how the image augmentation API in Keras works, let’s look at some examples.

We will use the MNIST handwritten digit recognition task in these examples. To begin with, let’s take a look at the first nine images in the training dataset.

```python
# Plot images
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
    for j in range(3):
        ax[i][j].imshow(X_train[i*3+j], cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
```

Running this example provides the following image that you can use as a point of comparison with the image preparation and augmentation in the examples below.

It is also possible to standardize pixel values across the entire dataset. This is called feature standardization and mirrors the type of standardization often performed for each column in a tabular dataset.

You can perform feature standardization by setting the `featurewise_center` and `featurewise_std_normalization` arguments to True on the `ImageDataGenerator` class. These are set to False by default. However, recent versions of Keras have a bug in feature standardization so that the mean and standard deviation are calculated across all pixels. If you use the `fit()` function from the `ImageDataGenerator` class, you will see an image similar to the one above:

```python
# Standardize images across the dataset, mean=0, stdev=1
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# fit parameters from data
datagen.fit(X_train)

# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    print(X_batch.min(), X_batch.mean(), X_batch.max())
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            # drop the channel dimension for plotting
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break
```

For example, the minimum, mean, and maximum values from the batch printed above are:

-0.42407447 -0.04093817 2.8215446

And the image displayed is as follows:

The workaround is to compute the feature standardization manually. Each pixel should have a separate mean and standard deviation, computed across different samples but independently of other pixels in the same sample. You just need to replace the `fit()` function with your own computation:

```python
# Standardize images across the dataset, every pixel has mean=0, stdev=1
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# set parameters manually instead of calling fit(): per-pixel mean and std
datagen.mean = X_train.mean(axis=0)
datagen.std = X_train.std(axis=0)

# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    print(X_batch.min(), X_batch.mean(), X_batch.max())
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            # drop the channel dimension for plotting
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break
```

The minimum, mean, and maximum as printed now have a wider range:

-1.2742625 -0.028436039 17.46127

Running this example, you can see that the effect is different, seemingly darkening and lightening different digits.

A whitening transform of an image is a linear algebraic operation that reduces the redundancy in the matrix of pixel images.

Less redundancy in the image is intended to better highlight the structures and features in the image to the learning algorithm.

Typically, image whitening is performed using the Principal Component Analysis (PCA) technique. More recently, an alternative called ZCA (learn more in Appendix A of this tech report) shows better results: the transformed images keep all the original dimensions, and unlike with PCA, they still look like their originals. Precisely, whitening converts each image into a white noise vector, i.e., each element in the vector has zero mean and unit standard deviation and is statistically independent of the others.
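As a rough sketch of the linear algebra involved (this is an illustration, not the Keras implementation, and the `zca_whiten` helper and toy data are hypothetical), ZCA whitening zero-centers the data, eigen-decomposes the covariance matrix, and applies the symmetric whitening matrix $W = U \, \mathrm{diag}(1/\sqrt{s + \epsilon}) \, U^T$:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten a (samples, features) matrix. eps stabilizes small eigenvalues."""
    X = X - X.mean(axis=0)                 # zero-center each feature
    cov = np.cov(X, rowvar=False)          # feature covariance matrix
    U, S, _ = np.linalg.svd(cov)           # eigen-decomposition of the symmetric cov
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # ZCA whitening matrix
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated toy features
Xw = zca_whiten(X)
# After whitening, the covariance is approximately the identity matrix
print(np.round(np.cov(Xw, rowvar=False), 2))
```

Because $W$ is symmetric (unlike the PCA whitening matrix), the whitened data stays as close as possible to the original, which is why ZCA-whitened images still resemble their originals.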

You can perform a ZCA whitening transform by setting the `zca_whitening` argument to True. But due to the same issue as feature standardization, you must first zero-center your input data separately:

```python
# ZCA Whitening
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True, zca_whitening=True)
# fit parameters from data
X_mean = X_train.mean(axis=0)
datagen.fit(X_train - X_mean)

# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train - X_mean, y_train, batch_size=9, shuffle=False):
    print(X_batch.min(), X_batch.mean(), X_batch.max())
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break
```

Running the example, you can see the same general structure in the images and how the outline of each digit has been highlighted.

Sometimes images in your sample data may have varying and different rotations in the scene.

You can train your model to better handle rotations of images by artificially and randomly rotating images from your dataset during training.

The example below creates random rotations of the MNIST digits up to 90 degrees by setting the `rotation_range` argument.

```python
# Random Rotations
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# define data preparation
datagen = ImageDataGenerator(rotation_range=90)

# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break
```

Running the example, you can see that images have been rotated left and right up to a limit of 90 degrees. This is not helpful on this problem because the MNIST digits have a normalized orientation, but this transform might be of help when learning from photographs where the objects may have different orientations.

Objects in your images may not be centered in the frame. They may be off-center in a variety of different ways.

You can train your deep learning network to expect and correctly handle off-center objects by artificially creating shifted versions of your training data. Keras supports separate horizontal and vertical random shifting of training data with the `width_shift_range` and `height_shift_range` arguments.

```python
# Random Shifts
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# define data preparation
shift = 0.2
datagen = ImageDataGenerator(width_shift_range=shift, height_shift_range=shift)

# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break
```

Running this example creates shifted versions of the digits. Again, this is not required for MNIST as the handwritten digits are already centered, but you can see how this might be useful on more complex problem domains.

Another augmentation to your image data that can improve performance on large and complex problems is to create random flips of images in your training data.

Keras supports random flipping along both the vertical and horizontal axes using the `vertical_flip` and `horizontal_flip` arguments.

```python
# Random Flips
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# define data preparation
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)

# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break
```

Running this example, you can see flipped digits. Flipping digits is not useful as they will always have the correct left and right orientation, but this may be useful for problems with photographs of objects in a scene that can have a varied orientation.

The data preparation and augmentation are performed just in time by Keras.

This is efficient in terms of memory, but you may require the exact images used during training. For example, perhaps you would like to use them with a different software package later or only generate them once and use them on multiple different deep learning models or configurations.

Keras allows you to save the images generated during training. The directory, filename prefix, and image file type can be specified to the `flow()` function before training. Then, during training, the generated images will be written to file.

The example below demonstrates this and writes nine images to an “`images`” subdirectory with the prefix “`aug`” and the file type of PNG.

```python
# Save augmented images to file
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# define data preparation
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)

# configure batch size and retrieve one batch of images
# (note: the 'images' directory must already exist)
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False,
                                     save_to_dir='images', save_prefix='aug',
                                     save_format='png'):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break
```

Running the example, you can see that images are only written when they are generated.

Image data is unique in that you can review the data and transformed copies of the data and quickly get an idea of how the model may perceive it.

Below are some tips for getting the most from image data preparation and augmentation for deep learning.

- **Review Dataset**. Take some time to review your dataset in great detail. Look at the images. Take note of image preparation and augmentations that might benefit the training process of your model, such as the need to handle different shifts, rotations, or flips of objects in the scene.
- **Review Augmentations**. Review sample images after the augmentation has been performed. It is one thing to intellectually know what image transforms you are using; it is a very different thing to look at examples. Review images both with individual augmentations you are using as well as the full set of augmentations you plan to use. You may see ways to simplify or further enhance your model training process.
- **Evaluate a Suite of Transforms**. Try more than one image data preparation and augmentation scheme. Often you can be surprised by the results of a data preparation scheme you did not think would be beneficial.

In this post, you discovered image data preparation and augmentation.

You discovered a range of techniques you can use easily in Python with Keras for deep learning models. You learned about:

- The ImageDataGenerator API in Keras for generating transformed images just in time
- Sample-wise and Feature-wise pixel standardization
- The ZCA whitening transform
- Random rotations, shifts, and flips of images
- How to save transformed images to file for later reuse

Do you have any questions about image data augmentation or this post? Ask your questions in the comments, and I will do my best to answer.

The post Image Augmentation for Deep Learning with Keras appeared first on MachineLearningMastery.com.

The post Loss Functions in TensorFlow appeared first on MachineLearningMastery.com.

In this post, you will learn what loss functions are and delve into some commonly used loss functions and how you can apply them to your neural networks.

After reading this article, you will learn:

- What are loss functions, and how they are different from metrics
- Common loss functions for regression and classification problems
- How to use loss functions in your TensorFlow model

Let’s get started!

This article is divided into five sections; they are:

- What are loss functions?
- Mean absolute error
- Mean squared error
- Categorical cross-entropy
- Loss functions in practice

In neural networks, loss functions help optimize the performance of the model. They are usually used to measure some penalty that the model incurs on its predictions, such as the deviation of the prediction from the ground truth label. Loss functions are usually differentiable across their domain (though the gradient may be undefined at isolated points, such as x = 0, which is ignored in practice). In the training loop, they are differentiated with respect to the parameters, and these gradients are used for your backpropagation and gradient descent steps to optimize your model on the training set.

Loss functions are also slightly different from metrics. While loss functions can tell you the performance of your model, they might not be of direct interest or easily explainable by humans. This is where metrics come in. Metrics such as accuracy are much more useful for humans to understand the performance of a neural network, even though they might not be good choices for loss functions since they might not be differentiable.

In the following, let’s explore some common loss functions: the mean absolute error, mean squared error, and categorical cross entropy.

The mean absolute error (MAE) measures the absolute difference between predicted values and the ground truth labels and takes the mean of the difference across all training examples. Mathematically, it is equal to $\frac{1}{m}\sum_{i=1}^m\lvert\hat{y}_i–y_i\rvert$ where $m$ is the number of training examples and $y_i$ and $\hat{y}_i$ are the ground truth and predicted values, respectively, averaged over all training examples.

The MAE is never negative and would be zero only if the prediction matched the ground truth perfectly. It is an intuitive loss function and might also be used as one of your metrics, specifically for regression problems, since you want to minimize the error in your predictions.

Let’s look at what the mean absolute error loss function looks like graphically:

Similar to activation functions, you might also be interested in what the gradient of the loss function looks like since you are using the gradient later to do backpropagation to train your model’s parameters.

You might notice a discontinuity in the gradient function for the mean absolute loss function. Many tend to ignore it since it occurs only at x = 0, which, in practice, rarely happens since it is the probability of a single point in a continuous distribution.

Let’s take a look at how to implement this loss function in TensorFlow using the Keras losses module:

```python
import tensorflow as tf
from tensorflow.keras.losses import MeanAbsoluteError

y_true = [1., 0.]
y_pred = [2., 3.]
mae_loss = MeanAbsoluteError()
print(mae_loss(y_true, y_pred).numpy())
```

This gives you `2.0` as the output as expected, since $\frac{1}{2}(\lvert 2-1\rvert + \lvert 3-0\rvert) = \frac{1}{2}(4) = 2$. Next, let’s explore another loss function for regression models with slightly different properties, the mean squared error.

Another popular loss function for regression models is the mean squared error (MSE), which is equal to $\frac{1}{m}\sum_{i=1}^m(\hat{y}_i–y_i)^2$. It is similar to the mean absolute error as it also measures the deviation of the predicted value from the ground truth value. However, the mean squared error squares this difference (always non-negative since squares of real numbers are always non-negative), which gives it slightly different properties.

One notable one is that the mean squared error favors a large number of small errors over a small number of large errors, which leads to models with fewer outliers or at least outliers that are less severe than models trained with a mean absolute error. This is because a large error would have a significantly larger impact on the error and, consequently, the gradient of the error when compared to a small error.

Graphically,

Then, looking at the gradient,

Notice that larger errors would lead to a larger magnitude for the gradient and a larger loss. Hence, for example, two training examples that deviate from their ground truths by 1 unit would lead to a loss of 2, while a single training example that deviates from its ground truth by 2 units would lead to a loss of 4, hence having a larger impact.
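A quick check of the arithmetic above in plain Python (using sums of squared errors rather than the mean, to match the comparison in the text):

```python
# Sum of squared errors: two deviations of 1 unit vs. one deviation of 2 units
errors_small = [1.0, 1.0]   # two examples, each off by 1 unit
errors_large = [2.0]        # one example, off by 2 units

sse_small = sum(e ** 2 for e in errors_small)  # 1 + 1 = 2
sse_large = sum(e ** 2 for e in errors_large)  # 2**2 = 4
print(sse_small, sse_large)  # -> 2.0 4.0
```

The same total deviation (2 units) costs twice as much when concentrated in a single example, which is why MSE discourages large individual errors.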

Let’s look at how to implement the mean squared loss in TensorFlow.

```python
import tensorflow as tf
from tensorflow.keras.losses import MeanSquaredError

y_true = [1., 0.]
y_pred = [2., 3.]
mse_loss = MeanSquaredError()
print(mse_loss(y_true, y_pred).numpy())
```

This gives the output `5.0` as expected, since $\frac{1}{2}[(2-1)^2 + (3-0)^2] = \frac{1}{2}(10) = 5$. Notice that the second example, with a predicted value of 3 and actual value of 0, contributes 90% of the error under the mean squared error vs. 75% under the mean absolute error.

Sometimes, you may see people use root mean squared error (RMSE) as a metric. This will take the square root of MSE. From the perspective of a loss function, MSE and RMSE are equivalent.
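A minimal sketch of that equivalence, using the same toy values as the MSE example above: RMSE is just the square root of MSE, and since the square root is monotonic, minimizing one minimizes the other.

```python
import math

y_true = [1.0, 0.0]
y_pred = [2.0, 3.0]

# mean squared error computed by hand
mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
rmse = math.sqrt(mse)  # RMSE = sqrt(MSE) = sqrt(5) ~ 2.236
print(mse, rmse)
```

The benefit of RMSE as a metric is purely interpretability: it is in the same units as the target, while MSE is in squared units.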

Both MAE and MSE measure values in a continuous range. Hence they are for regression problems. For classification problems, you can use categorical cross-entropy.

The previous two loss functions are for regression models, where the output could be any real number. However, for classification problems, there is a small, discrete set of numbers that the output could take. Furthermore, the number used to label-encode the classes is arbitrary and with no semantic meaning (e.g., using the labels 0 for cat, 1 for dog, and 2 for horse does not represent that a dog is half cat and half horse). Therefore, it should not have an impact on the performance of the model.

In a classification problem, the model’s output is a vector of probabilities, one for each category. In Keras models, this vector is usually expected to be either “logits,” i.e., real numbers to be transformed into probabilities using the softmax function, or the output of a softmax activation function.

The cross-entropy between two probability distributions is a measure of the difference between the two probability distributions. Precisely, it is $-\sum_i P(X = x_i) \log Q(X = x_i)$ for probability $P$ and $Q$. In machine learning, we usually have the probability $P$ provided by the training data and $Q$ predicted by the model, in which $P$ is 1 for the correct class and 0 for every other class. The predicted probability $Q$, however, is usually valued between 0 and 1. Hence when used for classification problems in machine learning, this formula can be simplified into: $$\text{categorical cross entropy} = – \log p_{gt}$$ where $p_{gt}$ is the model-predicted probability of the ground truth class for that particular sample.
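A quick numeric check of this simplification, using a hypothetical prediction vector where the ground truth class has probability 0.75: the full cross-entropy sum collapses to $-\log p_{gt}$ because every term with $P(X = x_i) = 0$ vanishes.

```python
import math

# Cross-entropy between a one-hot P and a predicted Q reduces to -log(Q[gt])
p = [0.0, 1.0, 0.0]        # one-hot ground truth (class 1)
q = [0.15, 0.75, 0.10]     # model-predicted probabilities (illustrative)

ce_full = -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)
ce_short = -math.log(q[1])  # -log of the ground-truth class probability
print(ce_full, ce_short)    # both ~ 0.2877
```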

Cross-entropy metrics have a negative sign because $\log(x)$ tends to negative infinity as $x$ tends to zero. We want a higher loss when the probability approaches 0 and a lower loss when the probability approaches 1. Graphically,

Notice that the loss is exactly 0 if the probability of the ground truth class is 1 as desired. Also, as the probability of the ground truth class tends to 0, the loss tends to positive infinity as well, hence substantially penalizing bad predictions. You might recognize this loss function for logistic regression, which is similar except the logistic regression loss is specific to the case of binary classes.

Now, looking at the gradient of the cross entropy loss,

Looking at the gradient, you can see that the gradient is generally negative, which is also expected since, to decrease this loss, you would want the probability on the ground truth class to be as high as possible. Recall that gradient descent goes in the opposite direction of the gradient.

There are two different ways to implement categorical cross entropy in TensorFlow. The first method takes in one-hot vectors as input:

```python
import tensorflow as tf
from tensorflow.keras.losses import CategoricalCrossentropy

# using one hot vector representation
y_true = [[0, 1, 0], [1, 0, 0]]
y_pred = [[0.15, 0.75, 0.1], [0.75, 0.15, 0.1]]
cross_entropy_loss = CategoricalCrossentropy()
print(cross_entropy_loss(y_true, y_pred).numpy())
```

This gives the output `0.2876821`, which is equal to $-\log(0.75)$ as expected. The other way of implementing the categorical cross-entropy loss in TensorFlow is to use a label-encoded representation of the class, where the class is represented by a single non-negative integer indicating the ground truth class.

```python
import tensorflow as tf
from tensorflow.keras.losses import SparseCategoricalCrossentropy

y_true = [1, 0]
y_pred = [[0.15, 0.75, 0.1], [0.75, 0.15, 0.1]]
cross_entropy_loss = SparseCategoricalCrossentropy()
print(cross_entropy_loss(y_true, y_pred).numpy())
```

This likewise gives the output `0.2876821`.

Now that you’ve explored loss functions for both regression and classification models, let’s take a look at how you can use loss functions in your machine learning models.

Let’s explore how to use loss functions in practice. You’ll explore this through a simple dense model on the MNIST digit classification dataset.

First, download the data from the Keras datasets module:

```python
import tensorflow.keras as keras

(trainX, trainY), (testX, testY) = keras.datasets.mnist.load_data()
```

Then, build your model:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten

model = Sequential([
    Input(shape=(28,28,1,)),
    Flatten(),
    Dense(units=84, activation="relu"),
    Dense(units=10, activation="softmax"),
])
print(model.summary())
```

And look at the model architecture outputted from the above code:

```
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 flatten_1 (Flatten)         (None, 784)               0

 dense_2 (Dense)             (None, 84)                65940

 dense_3 (Dense)             (None, 10)                850

=================================================================
Total params: 66,790
Trainable params: 66,790
Non-trainable params: 0
_________________________________________________________________
```

You can then compile your model, which is also where you introduce the loss function. Since this is a classification problem, use the cross-entropy loss. In particular, since the MNIST dataset in Keras datasets represents labels as integers instead of one-hot vectors, use the `SparseCategoricalCrossentropy` loss.

```python
import tensorflow as tf

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics="acc")
```

And finally, you train your model:

history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))

And your model successfully trains with the following output:

```
Epoch 1/10
235/235 [==============================] - 2s 6ms/step - loss: 7.8607 - acc: 0.8184 - val_loss: 1.7445 - val_acc: 0.8789
Epoch 2/10
235/235 [==============================] - 1s 6ms/step - loss: 1.1011 - acc: 0.8854 - val_loss: 0.9082 - val_acc: 0.8821
Epoch 3/10
235/235 [==============================] - 1s 6ms/step - loss: 0.5729 - acc: 0.8998 - val_loss: 0.6689 - val_acc: 0.8927
Epoch 4/10
235/235 [==============================] - 1s 5ms/step - loss: 0.3911 - acc: 0.9203 - val_loss: 0.5406 - val_acc: 0.9097
Epoch 5/10
235/235 [==============================] - 1s 6ms/step - loss: 0.3016 - acc: 0.9306 - val_loss: 0.5024 - val_acc: 0.9182
Epoch 6/10
235/235 [==============================] - 1s 6ms/step - loss: 0.2443 - acc: 0.9405 - val_loss: 0.4571 - val_acc: 0.9242
Epoch 7/10
235/235 [==============================] - 1s 5ms/step - loss: 0.2076 - acc: 0.9469 - val_loss: 0.4173 - val_acc: 0.9282
Epoch 8/10
235/235 [==============================] - 1s 5ms/step - loss: 0.1852 - acc: 0.9514 - val_loss: 0.4335 - val_acc: 0.9287
Epoch 9/10
235/235 [==============================] - 1s 6ms/step - loss: 0.1576 - acc: 0.9577 - val_loss: 0.4217 - val_acc: 0.9342
Epoch 10/10
235/235 [==============================] - 1s 5ms/step - loss: 0.1455 - acc: 0.9597 - val_loss: 0.4151 - val_acc: 0.9344
```

And that’s one example of how to use a loss function in a TensorFlow model.

Below is some documentation on loss functions from TensorFlow/Keras:

- Mean absolute error: https://www.tensorflow.org/api_docs/python/tf/keras/losses/MeanAbsoluteError
- Mean squared error: https://www.tensorflow.org/api_docs/python/tf/keras/losses/MeanSquaredError
- Categorical cross entropy: https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy
- Sparse categorical cross entropy: https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy

In this post, you have seen loss functions and the role that they play in a neural network. You have also seen some popular loss functions used in regression and classification models, as well as how to use the cross entropy loss function in a TensorFlow model.

Specifically, you learned:

- What are loss functions, and how they are different from metrics
- Common loss functions for regression and classification problems
- How to use loss functions in your TensorFlow model

The post Loss Functions in TensorFlow appeared first on MachineLearningMastery.com.

The post Understanding the Design of a Convolutional Neural Network appeared first on MachineLearningMastery.com.

In this tutorial, you will make sense of the operation of convolutional layers and their role in a larger convolutional neural network.

After finishing this tutorial, you will learn:

- How convolutional layers extract features from an image
- How different convolutional layers can stack up to build a neural network

Let’s get started.

This article is divided into three sections; they are:

- An Example Network
- Showing the Feature Maps
- Effect of the Convolutional Layers

The following is a program to do image classification on the CIFAR-10 dataset:

```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, MaxPooling2D, Flatten, Dense
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.datasets.cifar10 import load_data

(X_train, y_train), (X_test, y_test) = load_data()

# rescale image
X_train_scaled = X_train / 255.0
X_test_scaled = X_test / 255.0

model = Sequential([
    Conv2D(32, (3,3), input_shape=(32, 32, 3), padding="same", activation="relu",
           kernel_constraint=MaxNorm(3)),
    Dropout(0.3),
    Conv2D(32, (3,3), padding="same", activation="relu", kernel_constraint=MaxNorm(3)),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation="relu", kernel_constraint=MaxNorm(3)),
    Dropout(0.5),
    Dense(10, activation="sigmoid")
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics="sparse_categorical_accuracy")
model.fit(X_train_scaled, y_train, validation_data=(X_test_scaled, y_test),
          epochs=25, batch_size=32)
```

This network should be able to achieve around 70% accuracy in classification. The images are 32×32 pixels in RGB color. They belong to 10 different classes, and the labels are integers from 0 to 9.

You can print the network using Keras’s `summary()` function:

```python
...
model.summary()
```

In this network, the following will be shown on the screen:

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d (Conv2D)             (None, 32, 32, 32)        896

 dropout (Dropout)           (None, 32, 32, 32)        0

 conv2d_1 (Conv2D)           (None, 32, 32, 32)        9248

 max_pooling2d (MaxPooling2D (None, 16, 16, 32)        0
 )

 flatten (Flatten)           (None, 8192)              0

 dense (Dense)               (None, 512)               4194816

 dropout_1 (Dropout)         (None, 512)               0

 dense_1 (Dense)             (None, 10)                5130

=================================================================
Total params: 4,210,090
Trainable params: 4,210,090
Non-trainable params: 0
_________________________________________________________________
```

It is typical in a network for image classification to be comprised of convolutional layers at an early stage, with dropout and pooling layers interleaved. Then, at a later stage, the output from convolutional layers is flattened and processed by some fully connected layers.

In the above network, there are two convolutional layers (`Conv2D`). The first layer is defined as follows:

Conv2D(32, (3,3), input_shape=(32, 32, 3), padding="same", activation="relu", kernel_constraint=MaxNorm(3))

This means the convolutional layer will have a 3×3 kernel and be applied to an input image of 32×32 pixels with three channels (the RGB colors). The layer produces 32 output channels (feature maps).

In order to make sense of the convolutional layer, you can check out its kernel. The variable `model` holds the network, and you can find the kernel of the first convolutional layer with the following:

```python
...
print(model.layers[0].kernel)
```

This prints:

```
<tf.Variable 'conv2d/kernel:0' shape=(3, 3, 3, 32) dtype=float32, numpy=
array([[[[-2.30068922e-01,  1.41024575e-01, -1.93124503e-01,
          -2.03153938e-01,  7.71819279e-02,  4.81446862e-01,
          -1.11971676e-01, -1.75487325e-01, -4.01797555e-02,
...
           4.64215249e-01,  4.10646647e-02,  4.99733612e-02,
          -5.22711873e-02, -9.20209661e-03, -1.16479330e-01,
           9.25614685e-02, -4.43541892e-02]]]], dtype=float32)>
```

You can tell that `model.layers[0]` is the correct layer by comparing the name `conv2d` in the above output to the output of `model.summary()`. This layer has a kernel of shape `(3, 3, 3, 32)`, which corresponds to the height, width, input channels, and output feature maps, respectively.

Assume the kernel is a NumPy array `k`. The convolutional layer will take its kernel `k[:, :, 0, n]` (a 3×3 array) and apply it to the first channel of the image, then apply `k[:, :, 1, n]` to the second channel, and so on. Afterward, the results of the convolution on all the channels are added up to become feature map `n` of the output, where `n`, in this case, runs from 0 to 31 for the 32 output feature maps.
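To make this concrete, here is a minimal NumPy sketch of the per-channel multiply-and-sum for a single output pixel of feature map `n`. Random data stands in for a real image and a trained kernel:

```python
import numpy as np

# One output pixel of feature map n: apply each 3x3 kernel slice to the
# matching input channel, then sum the per-channel results.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))   # H x W x channels
k = rng.standard_normal((3, 3, 3, 32))     # h x w x in_channels x out_maps

n = 0                                      # which output feature map
patch = image[0:3, 0:3, :]                 # one 3x3 receptive field
pixel = sum((patch[:, :, c] * k[:, :, c, n]).sum() for c in range(3))

# The same value computed in one einsum over all channels:
pixel2 = np.einsum("ijc,ijc->", patch, k[:, :, :, n])
print(np.isclose(pixel, pixel2))           # True
```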

In Keras, you can extract the output of each layer using an extractor model. In the following, you will create a batch with one input image and send it to the network, and then look at the feature maps of the first convolutional layer:

```python
...
# Extract output from each layer
extractor = tf.keras.Model(inputs=model.inputs,
                           outputs=[layer.output for layer in model.layers])
features = extractor(np.expand_dims(X_train[7], 0))

# Show the 32 feature maps from the first layer
l0_features = features[0].numpy()[0]

fig, ax = plt.subplots(4, 8, sharex=True, sharey=True, figsize=(16,8))
for i in range(0, 32):
    row, col = i//8, i%8
    ax[row][col].imshow(l0_features[..., i])

plt.show()
```

The above code will print the feature maps like the following:

This corresponds to the following input image:

You can see why they are called feature maps: each one highlights certain features of the input image. A feature is identified using a small window (in this case, a 3×3 pixel filter). The input image has three color channels, and each channel has a different filter applied; their results are combined into one output feature map.

You can similarly display the feature map from the output of the second convolutional layer as follows:

```python
...
# Show the 32 feature maps from the second convolutional layer (layer index 2)
l2_features = features[2].numpy()[0]

fig, ax = plt.subplots(4, 8, sharex=True, sharey=True, figsize=(16,8))
for i in range(0, 32):
    row, col = i//8, i%8
    ax[row][col].imshow(l2_features[..., i])

plt.show()
```

This shows the following:

From the above, you can see that the features extracted are more abstract and less recognizable.

The most important hyperparameter of a convolutional layer is the size of the filter. It is usually square, and you can think of it as a **window** or **receptive field** over the input image. Therefore, the higher the resolution of the image, the larger the filter you can expect to use.

On the other hand, too large a filter will blur the detailed features, because all pixels within the receptive field are combined into one pixel of the output feature map. There is therefore a trade-off in choosing the appropriate filter size.

Stacking two convolutional layers (without any other layers in between) covers the same receptive field as a single convolutional layer with a larger filter. But the typical design nowadays is to stack two layers with small filters rather than use one layer with a larger filter, as the stack has fewer parameters to train.
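The saving is easy to quantify. As a sketch, ignoring biases and assuming `C` channels both in and out, two stacked 3×3 layers cover the same 5×5 receptive field as one 5×5 layer, with fewer weights:

```python
# Weight count of two stacked 3x3 conv layers vs. one 5x5 conv layer
# (biases ignored; C input and output channels assumed for each layer)
C = 32
stacked = 2 * (3 * 3 * C * C)   # two 3x3 layers
single = 5 * 5 * C * C          # one 5x5 layer
print(stacked, single)          # 18432 25600
```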

The exception is a convolutional layer with a 1×1 filter, usually found as the beginning layer of a network. The purpose of such a layer is to combine the input channels rather than to transform neighboring pixels. Conceptually, this can convert a color image into grayscale, but usually, multiple such combinations are used to create more input channels than merely RGB for the network.

Also, note that the above network uses `Conv2D` for a 2D filter. There is also a `Conv3D` layer for a 3D filter. The difference is whether you apply the filter separately to each channel or feature map, or consider the input feature maps stacked up as a 3D array and apply a single filter to transform them altogether. Usually the former is used, as it is more reasonable to assume no particular order in which the feature maps should be stacked.

This section provides more resources on the topic if you are looking to go deeper.

**Articles**

- Convolutional Neural Networks (CNNs / ConvNets) from the Stanford Course CS231n

**Tutorials**

- How Do Convolutional Layers Work in Deep Learning Neural Networks?
- A Gentle Introduction to 1×1 Convolutions to Manage Model Complexity
- A Gentle Introduction to Padding and Stride for Convolutional Neural Networks

In this post, you have seen how to visualize the feature maps from a convolutional neural network and how the network extracts them.

Specifically, you learned:

- The structure of a typical convolutional neural network
- What is the effect of the filter size on a convolutional layer
- What is the effect of stacking convolutional layers in a network

The post Understanding the Design of a Convolutional Neural Network appeared first on MachineLearningMastery.com.


In this tutorial, you will see how you can use the `tf.data` dataset for a Keras model. After finishing this tutorial, you will learn:

- How to create and use the `tf.data` dataset
- The benefit of doing so compared to a generator function

Let’s get started.

This article is divided into four sections; they are:

- Training a Keras Model with NumPy Array and Generator Function
- Creating a Dataset using `tf.data`
- Creating a Dataset from Generator Function
- Data with Prefetch

Before you see how the `tf.data` API works, let's review how you might usually train a Keras model.

First, you need a dataset. An example is the fashion MNIST dataset that comes with the Keras API. This dataset has 60,000 training samples and 10,000 test samples of 28×28 pixels in grayscale, and the corresponding classification label is encoded with integers 0 to 9.

The dataset is a NumPy array. You can then build a Keras model for classification, and with the model's `fit()` function, provide the NumPy array as data.

The complete code is as follows:

```python
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

(train_image, train_label), (test_image, test_label) = load_data()
print(train_image.shape)
print(train_label.shape)
print(test_image.shape)
print(test_label.shape)

model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(100, activation="relu"),
    Dense(100, activation="relu"),
    Dense(10, activation="softmax")  # softmax for multi-class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])
history = model.fit(train_image, train_label, batch_size=32, epochs=50,
                    validation_data=(test_image, test_label), verbose=0)
print(model.evaluate(test_image, test_label))

plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.show()
```

Running this code will print out the following:

```
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)
313/313 [==============================] - 0s 392us/step - loss: 0.5114 - sparse_categorical_accuracy: 0.8446
[0.5113903284072876, 0.8446000218391418]
```

It also creates the following plot of validation accuracy over the 50 epochs of training:

The other way of training the same network is to provide the data from a Python generator function instead of a NumPy array. A generator function is one with a `yield` statement that emits data while the function runs in parallel with the data consumer. A generator for the fashion MNIST dataset can be created as follows:

```python
def batch_generator(image, label, batchsize):
    N = len(image)
    i = 0
    while True:
        yield image[i:i+batchsize], label[i:i+batchsize]
        i = i + batchsize
        if i + batchsize > N:
            i = 0
```

This function is meant to be called as `batch_generator(train_image, train_label, 32)`. It will scan the input arrays in batches indefinitely: once it reaches the end of the array, it restarts from the beginning.
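You can see the wrap-around on a tiny array. Note one side effect of this simple logic: with 10 samples and a batch size of 4, the index resets before an incomplete batch would be produced, so the last two samples are never yielded.

```python
import numpy as np

# The generator from above, demonstrated on a tiny array: after the last
# full batch, the index resets and scanning starts over from the beginning.
def batch_generator(image, label, batchsize):
    N = len(image)
    i = 0
    while True:
        yield image[i:i+batchsize], label[i:i+batchsize]
        i = i + batchsize
        if i + batchsize > N:
            i = 0

X = np.arange(10)
y = np.arange(10) * 10
gen = batch_generator(X, y, 4)
batches = [next(gen)[0].tolist() for _ in range(4)]
print(batches)   # [[0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 2, 3], [4, 5, 6, 7]]
```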

Training a Keras model with a generator is similar, using the `fit()` function:

```python
history = model.fit(batch_generator(train_image, train_label, 32),
                    steps_per_epoch=len(train_image)//32,
                    epochs=50, validation_data=(test_image, test_label),
                    verbose=0)
```

Instead of providing the data and labels separately, you just provide the generator, as it gives out both. When data is presented as a NumPy array, Keras can tell how many samples there are from the length of the array, so it knows when an epoch is complete. Your generator function, however, emits batches indefinitely, so you need to tell Keras when an epoch ends, using the `steps_per_epoch` argument to the `fit()` function.

In the above code, the validation data was provided as a NumPy array, but you can use a generator instead and specify the `validation_steps` argument.

The following is the complete code using a generator function, in which the output is the same as the previous example:

```python
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

(train_image, train_label), (test_image, test_label) = load_data()
print(train_image.shape)
print(train_label.shape)
print(test_image.shape)
print(test_label.shape)

model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(100, activation="relu"),
    Dense(100, activation="relu"),
    Dense(10, activation="softmax")  # softmax for multi-class probabilities
])

def batch_generator(image, label, batchsize):
    N = len(image)
    i = 0
    while True:
        yield image[i:i+batchsize], label[i:i+batchsize]
        i = i + batchsize
        if i + batchsize > N:
            i = 0

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])
history = model.fit(batch_generator(train_image, train_label, 32),
                    steps_per_epoch=len(train_image)//32,
                    epochs=50, validation_data=(test_image, test_label),
                    verbose=0)
print(model.evaluate(test_image, test_label))

plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.show()
```

Given that you have the fashion MNIST data loaded, you can convert it into a `tf.data` dataset like the following:

```python
...
dataset = tf.data.Dataset.from_tensor_slices((train_image, train_label))
print(dataset.element_spec)
```

This prints the dataset’s spec as follows:

```
(TensorSpec(shape=(28, 28), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.uint8, name=None))
```

You can see the data is a tuple (as a tuple was passed to the `from_tensor_slices()` function), where the first element is of shape `(28,28)` and the second element is a scalar. Both elements are stored as 8-bit unsigned integers.

If you did not present the data as a tuple of two NumPy arrays when you created the dataset, you can also combine them later. The following creates the same dataset, but first creates separate datasets for the image data and the labels before zipping them together:

```python
...
train_image_data = tf.data.Dataset.from_tensor_slices(train_image)
train_label_data = tf.data.Dataset.from_tensor_slices(train_label)
dataset = tf.data.Dataset.zip((train_image_data, train_label_data))
print(dataset.element_spec)
```

This will print the same spec:

```
(TensorSpec(shape=(28, 28), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.uint8, name=None))
```

The `zip()` function of the dataset works like the `zip()` function in Python: it matches elements one by one from multiple datasets into a tuple.
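As a reminder of that analogy, the built-in `zip()` pairs elements the same way (hypothetical image names and labels here):

```python
# Python's built-in zip() pairs elements from two sequences one by one,
# just as Dataset.zip() pairs samples from two datasets.
images = ["img0", "img1", "img2"]
labels = [9, 0, 3]
pairs = list(zip(images, labels))
print(pairs)   # [('img0', 9), ('img1', 0), ('img2', 3)]
```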

One benefit of using a `tf.data` dataset is the flexibility in handling the data. Below is the complete code for training a Keras model with a dataset, in which batching is applied to the dataset:

```python
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

(train_image, train_label), (test_image, test_label) = load_data()
dataset = tf.data.Dataset.from_tensor_slices((train_image, train_label))

model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(100, activation="relu"),
    Dense(100, activation="relu"),
    Dense(10, activation="softmax")  # softmax for multi-class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])
history = model.fit(dataset.batch(32), epochs=50,
                    validation_data=(test_image, test_label), verbose=0)
print(model.evaluate(test_image, test_label))

plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.show()
```

This is the simplest use case of a dataset. If you dive deeper, you can see that a dataset is just an iterable, so you can print each sample using the following:

```python
for image, label in dataset:
    print(image)  # array of shape (28,28) in tf.Tensor
    print(label)  # integer label in tf.Tensor
```

The dataset has many functions built in; the `batch()` used before is one of them. If you create batches from a dataset and print them, you have the following:

```python
for image, label in dataset.batch(32):
    print(image)  # array of shape (32,28,28) in tf.Tensor
    print(label)  # array of shape (32,) in tf.Tensor
```

Here, each item from a batch is not a sample but a batch of samples. You also have functions such as `map()`, `filter()`, and `reduce()` for sequence transformation, `concatenate()` and `interleave()` for combining with another dataset, and `repeat()`, `take()`, `take_while()`, and `skip()`, like their familiar counterparts in Python's `itertools` module. A full list of the functions can be found in the API documentation.

So far, you have seen how a dataset can be used in place of a NumPy array in training a Keras model. Indeed, a dataset can also be created from a generator function. But instead of a generator function that yields a **batch**, as in the example above, you now make one that yields one sample at a time:

```python
import numpy as np
import tensorflow as tf

def shuffle_generator(image, label, seed):
    idx = np.arange(len(image))
    np.random.default_rng(seed).shuffle(idx)
    for i in idx:
        yield image[i], label[i]

dataset = tf.data.Dataset.from_generator(
    shuffle_generator,
    args=[train_image, train_label, 42],
    output_signature=(
        tf.TensorSpec(shape=(28,28), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.uint8)))
print(dataset.element_spec)
```

This function shuffles the input arrays by randomizing an index vector, then yields one sample at a time. Unlike the previous example, this generator ends when the samples from the array are exhausted.
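The seeded shuffle is easy to verify on a tiny array: the same seed reproduces the same order, and the generator stops after one pass over the data.

```python
import numpy as np

# The shuffle_generator from above, demonstrated on 5 samples
def shuffle_generator(image, label, seed):
    idx = np.arange(len(image))
    np.random.default_rng(seed).shuffle(idx)
    for i in idx:
        yield image[i], label[i]

X = np.arange(5)
y = np.arange(5) * 10
order1 = [int(x) for x, _ in shuffle_generator(X, y, 42)]
order2 = [int(x) for x, _ in shuffle_generator(X, y, 42)]
print(order1 == order2)   # True: same seed, same order
print(len(order1))        # 5: one epoch, then the generator is exhausted
```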

You create a dataset from the function using `from_generator()`. You need to provide the name of the generator function (not an instantiated generator) and also the output signature of the dataset. This is required because the `tf.data.Dataset` API cannot infer the dataset spec before the generator is consumed.

Running the above code will print the same spec as before:

```
(TensorSpec(shape=(28, 28), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.uint8, name=None))
```

Such a dataset is functionally equivalent to the one you created previously, so you can use it for training as before. The following is the complete code:

```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

(train_image, train_label), (test_image, test_label) = load_data()

def shuffle_generator(image, label, seed):
    idx = np.arange(len(image))
    np.random.default_rng(seed).shuffle(idx)
    for i in idx:
        yield image[i], label[i]

dataset = tf.data.Dataset.from_generator(
    shuffle_generator,
    args=[train_image, train_label, 42],
    output_signature=(
        tf.TensorSpec(shape=(28,28), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.uint8)))

model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(100, activation="relu"),
    Dense(100, activation="relu"),
    Dense(10, activation="softmax")  # softmax for multi-class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])
history = model.fit(dataset.batch(32), epochs=50,
                    validation_data=(test_image, test_label), verbose=0)
print(model.evaluate(test_image, test_label))

plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.show()
```

The real benefit of using a dataset is `prefetch()`.

Using a NumPy array for training is probably the best choice for performance, but it means loading all data into memory. A generator function lets you prepare one batch at a time, with data loaded from disk on demand, for example. However, with a generator function, either the training loop or the generator is running at any one time, and it is not easy to make them run in parallel.

The dataset API allows the generator and the training loop to run in parallel. If you have a generator that is computationally expensive (e.g., performing image augmentation in real time), you can create a dataset from it and then use `prefetch()`, as follows:

```python
...
history = model.fit(dataset.batch(32).prefetch(3), epochs=50,
                    validation_data=(test_image, test_label), verbose=0)
```

The number argument to `prefetch()` is the size of the buffer. Here, the dataset is asked to keep three batches in memory ready for the training loop to consume. Whenever a batch is consumed, the dataset API resumes the generator function to refill the buffer asynchronously in the background. This allows the training loop and the data preparation inside the generator function to run in parallel.
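This is not how `tf.data` is implemented internally, but the idea behind `prefetch()` can be sketched with a bounded queue and a producer thread: the producer keeps the buffer filled while the consumer (standing in for the training loop) works.

```python
import queue
import threading
import time

# A rough sketch of a prefetch buffer of size 3: data preparation in the
# producer thread overlaps with the consumer's "training" work.
buffer = queue.Queue(maxsize=3)    # the "prefetch buffer"

def producer(n_batches):
    for i in range(n_batches):
        time.sleep(0.01)           # stand-in for expensive augmentation
        buffer.put(f"batch-{i}")
    buffer.put(None)               # sentinel: no more data

thread = threading.Thread(target=producer, args=(5,))
thread.start()

consumed = []
while True:
    batch = buffer.get()
    if batch is None:
        break
    consumed.append(batch)
    time.sleep(0.01)               # stand-in for a training step

thread.join()
print(consumed)   # ['batch-0', 'batch-1', 'batch-2', 'batch-3', 'batch-4']
```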

It’s worth mentioning that, in the previous section, you created a shuffling generator for the dataset API. Indeed the dataset API also has a `shuffle()`

function to do the same, but you may not want to use it unless the dataset is small enough to fit in memory.

The `shuffle()` function, like `prefetch()`, takes a buffer-size argument. The shuffle algorithm fills the buffer from the dataset and draws one element from it at random; the consumed element is then replaced with the next element from the dataset. Hence you need the buffer to be as large as the dataset itself for a truly random shuffle. This limitation is demonstrated with the following snippet:

```python
import tensorflow as tf
import numpy as np

n_dataset = tf.data.Dataset.from_tensor_slices(np.arange(10000))
for n in n_dataset.shuffle(10).take(20):
    print(n.numpy())
```

The output from the above looks like the following:

```
9
6
2
7
5
1
4
14
11
17
19
18
3
16
15
22
10
23
21
13
```

Here you can see that the numbers are shuffled only within a small neighborhood: with a buffer of size 10, the k-th number drawn can come only from among the first k+9 elements, so you never see large numbers early in the output.
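The limitation comes from how a buffered shuffle works. Here is a pure-Python simulation (not the actual `tf.data` implementation): fill a small buffer, draw one element at random, and refill from the stream.

```python
import random

# Simulate a buffered shuffle: with a buffer of 10 over 10,000 items,
# the k-th draw can only come from the first k + 9 items of the stream.
def buffered_shuffle(stream, buffer_size, seed=42):
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) > buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:                  # drain the buffer at the end
        yield buffer.pop(rng.randrange(len(buffer)))

first_20 = [n for _, n in zip(range(20), buffered_shuffle(range(10000), 10))]
print(first_20)
print(max(first_20))   # at most 29: the 20th draw cannot exceed 20 + 10 - 1
```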

More about the `tf.data` dataset can be found in its API documentation.

In this post, you have seen how you can use the `tf.data` dataset and how it can be used to train a Keras model.

Specifically, you learned:

- How to train a model using data from a NumPy array, a generator, and a dataset
- How to create a dataset using a NumPy array or a generator function
- How to use prefetch with a dataset to make the generator and training loop run in parallel

The post A Gentle Introduction to the tensorflow.data API appeared first on MachineLearningMastery.com.
