Deep learning neural networks are likely to quickly overfit a training dataset with few examples. Ensembles of neural networks with different model configurations are known to reduce overfitting, but require the additional computational expense of training and maintaining multiple models. A single model can be used to simulate having a large number of different network […]

# Archive | Better Deep Learning

## How to Reduce Generalization Error in Deep Neural Networks With Activity Regularization in Keras

Activity regularization provides an approach to encourage a neural network to learn sparse features or internal representations of raw observations. It is common to seek sparse learned representations in autoencoders, called sparse autoencoders, and in encoder-decoder models, although the approach can also be used generally to reduce overfitting and improve a model’s ability to generalize […]

## Activation Regularization for Reducing Generalization Error in Deep Learning Neural Networks

Deep learning models are capable of automatically learning a rich internal representation from raw input data. This is called feature or representation learning. Better learned representations, in turn, can lead to better insights into the domain, e.g. via visualization of learned features, and to better predictive models that make use of the learned features. A […]

## How to Reduce Overfitting in Deep Neural Networks Using Weight Constraints in Keras

Weight constraints provide an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set. There are multiple types of weight constraints, such as maximum and unit vector norms, and some require a hyperparameter […]

## A Gentle Introduction to Weight Constraints to Reduce Generalization Error in Deep Learning

Weight regularization methods like weight decay introduce a penalty to the loss function when training a neural network to encourage the network to use small weights. Smaller weights in a neural network can result in a model that is more stable and less likely to overfit the training dataset, in turn having better performance when […]

## How to Reduce Overfitting of a Deep Learning Model with Weight Regularization

Weight regularization provides an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set. There are multiple types of weight regularization, such as L1 and L2 vector norms, and each requires a hyperparameter […]

## Use Weight Regularization to Reduce Overfitting of Deep Learning Models

Neural networks learn a set of weights that best map inputs to outputs. A network with large network weights can be a sign of an unstable network where small changes in the input can lead to large changes in the output. This can be a sign that the network has overfit the training dataset and […]

## How to Configure the Number of Layers and Nodes in a Neural Network

Artificial neural networks have two main hyperparameters that control the architecture or topology of the network: the number of layers and the number of nodes in each hidden layer. You must specify values for these parameters when configuring your network. The most reliable way to configure these hyperparameters for your specific predictive modeling problem is […]

## A Gentle Introduction to Transfer Learning for Deep Learning

Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. It is a popular approach in deep learning where pre-trained models are used as the starting point on computer vision and natural language processing tasks given the vast […]

## Gentle Introduction to the Adam Optimization Algorithm for Deep Learning

The choice of optimization algorithm for your deep learning model can mean the difference between good results in minutes, hours, and days. The Adam optimization algorithm is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing. In this post, you will […]