Line Plots of KL Divergence Loss and Classification Accuracy over Training Epochs on the Blobs Multi-Class Classification Problem

How to Choose Loss Functions When Training Deep Learning Neural Networks

Deep learning neural networks are trained using the stochastic gradient descent optimization algorithm. As part of the optimization algorithm, the error for the current state of the model must be estimated repeatedly. This requires the choice of an error function, conventionally called a loss function, that can be used to estimate the loss of the […]
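The loss is specified when the model is compiled. As a minimal sketch (the make_blobs settings and network shape below are illustrative assumptions, not taken from the article), choosing a loss for a multi-class problem in Keras might look like this:

```python
# Minimal sketch: the loss function is chosen at compile time.
# The dataset settings and network shape here are illustrative.
from sklearn.datasets import make_blobs
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# three-class blobs problem with one-hot encoded targets
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, random_state=2)
y = to_categorical(y)

model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(3, activation='softmax'))
# for one-hot multi-class targets, 'categorical_crossentropy' and
# 'kl_divergence' are both valid choices of loss
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X, y, epochs=100, verbose=0)
```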

Continue Reading
Line Plots of Train and Test Accuracy for a Suite of Learning Rates on the Blobs Classification Problem

Understand the Impact of Learning Rate on Model Performance With Deep Learning Neural Networks

Deep learning neural networks are trained using the stochastic gradient descent optimization algorithm. The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging as a value too small may result in a […]
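In Keras, the learning rate is a parameter of the optimizer rather than the model. A minimal sketch of comparing several rates on the same problem, assuming an illustrative blobs dataset and network (not the article's exact configuration):

```python
# Minimal sketch: compare a few learning rates on the same problem.
# Dataset, network shape, and the rates compared are illustrative.
from sklearn.datasets import make_blobs
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

X, y = make_blobs(n_samples=1000, centers=3, n_features=2, random_state=2)
y = to_categorical(y)

for lr in [1.0, 0.1, 0.01, 0.001]:
    model = Sequential()
    model.add(Dense(50, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    # the learning rate is set on the optimizer, not the model
    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(learning_rate=lr), metrics=['accuracy'])
    history = model.fit(X, y, epochs=100, verbose=0)
    print(lr, history.history['accuracy'][-1])
```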

Continue Reading
How to Configure the Learning Rate Hyperparameter When Training Deep Learning Neural Networks

The weights of a neural network cannot be calculated using an analytical method. Instead, the weights must be discovered via an empirical optimization procedure called stochastic gradient descent. The optimization problem addressed by stochastic gradient descent for neural networks is challenging and the space of solutions (sets of weights) may be composed of many good […]
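As a minimal illustration of the update rule behind this procedure, here is gradient descent on a toy one-dimensional objective f(w) = w**2; the objective, step size, and starting point are all illustrative assumptions:

```python
# Minimal sketch of the gradient descent update rule on a toy objective.
def gradient(w):
    return 2.0 * w  # derivative of f(w) = w**2

w, learning_rate = 5.0, 0.1  # arbitrary starting weight and step size
for step in range(50):
    # move the weight a small step against the gradient of the error
    w -= learning_rate * gradient(w)
print(w)  # close to the minimum at w = 0
```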

Continue Reading
Line Plots of Classification Accuracy on Train and Test Datasets With Different Batch Sizes

How to Control the Speed and Stability of Training Neural Networks With Gradient Descent Batch Size

Neural networks are trained using gradient descent where the estimate of the error used to update the weights is calculated based on a subset of the training dataset. The number of examples from the training dataset used in the estimate of the error gradient is called the batch size and is an important hyperparameter that […]
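In Keras, the batch size is set on the call to fit(). A minimal sketch, assuming an illustrative blobs dataset and network rather than the article's exact configuration:

```python
# Minimal sketch: the batch size is an argument to fit().
# Dataset settings and network shape here are illustrative.
from sklearn.datasets import make_blobs
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

X, y = make_blobs(n_samples=1000, centers=3, n_features=2, random_state=2)
y = to_categorical(y)

model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# batch_size=1      -> stochastic gradient descent
# batch_size=len(X) -> batch gradient descent
# anything between  -> minibatch gradient descent
model.fit(X, y, epochs=100, batch_size=32, verbose=0)
```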

Continue Reading
Line Plot Classification Accuracy of MLP With Batch Normalization After Activation Function on Train and Test Datasets Over Training Epochs

How to Accelerate Learning of Deep Neural Networks With Batch Normalization

Batch normalization is a technique designed to automatically standardize the inputs to a layer in a deep learning neural network. Once implemented, batch normalization has the effect of dramatically accelerating the training process of a neural network, and in some cases improves the performance of the model via a modest regularization effect. In this tutorial, […]
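A minimal sketch of adding batch normalization in Keras, with illustrative layer sizes: the BatchNormalization layer can be placed after a layer's activation, or between its linear output and the activation:

```python
# Minimal sketch of the two common placements of BatchNormalization.
# Layer sizes here are illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, BatchNormalization

model = Sequential()
# placement 1: normalize after the activation function
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(BatchNormalization())
# placement 2: normalize between the linear output and the activation
model.add(Dense(50))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
```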

Continue Reading
Overview of Course Structure

Practical Deep Learning for Coders (Review)

Practical deep learning is a challenging subject in which to get started. It is often taught in a bottom-up manner, requiring that you first get familiar with linear algebra, calculus, and mathematical optimization before eventually learning the neural network techniques. This can take years, and most of the background theory will not help you to […]

Continue Reading
Line Plot of Train and Test Set Accuracy over Training Epochs for a Deep MLP with ReLU and 15 Hidden Layers

How to Fix Vanishing Gradients Using the Rectified Linear Activation Function

The vanishing gradients problem is one example of unstable behavior that you may encounter when training a deep neural network. It describes the situation where a deep multilayer feed-forward network or a recurrent neural network is unable to propagate useful gradient information from the output end of the model back to the layers near the […]
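A minimal sketch of the remedy in Keras, assuming illustrative layer sizes: a deep MLP using the rectified linear activation (paired, as is common, with He weight initialization) in each of its 15 hidden layers:

```python
# Minimal sketch: a deep MLP with ReLU in every hidden layer.
# Layer sizes and depth here are illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# He weight initialization is the common pairing with ReLU
model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
# with sigmoid or tanh activations, gradients would shrink layer by layer
# on the way back through a stack this deep; ReLU keeps them usable
for _ in range(14):
    model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
```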

Continue Reading