How to Update Neural Network Models With More Data

By Jason Brownlee on February 22, 2021 in Deep Learning

Deep learning neural network models used for predictive modeling may need to be updated.

This may be because the data has changed since the model was developed and deployed, or because additional labeled data has become available and is expected to improve the performance of the model.

It is important to experiment with and evaluate a range of different approaches when updating neural network models for new data, especially if model updating will be automated, such as on a periodic schedule.

There are many ways to update neural network models, although the two main approaches involve either using the existing model as a starting point and retraining it, or leaving the existing model unchanged and combining the predictions from the existing model with a new model.

In this tutorial, you will discover how to update deep learning neural network models in response to new data.

After completing this tutorial, you will know:

Neural network models may need to be updated when the underlying data changes or when new labeled data is made available.
How to update trained neural network models with just new data or combinations of old and new data.
How to create an ensemble of existing and new models trained on just new data or combinations of old and new data.

Let’s get started.

How to Update Neural Network Models With More Data
Photo by Judy Gallagher, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

Updating Neural Network Models
Retraining Update Strategies
1. Update Model on New Data Only
2. Update Model on Old and New Data
Ensemble Update Strategies
1. Ensemble Model With Model on New Data Only
2. Ensemble Model With Model on Old and New Data

Updating Neural Network Models

Selecting and finalizing a deep learning neural network model for a predictive modeling project is just the beginning.

You can then start using the model to make predictions on new data.

One possible problem that you may encounter is that the nature of the prediction problem may change over time.

You may notice this when the effectiveness of predictions begins to decline over time. This may be because the assumptions made and captured in the model are changing or no longer hold.

Generally, this is referred to as the problem of “concept drift” where the underlying probability distributions of variables and relationships between variables change over time, which can negatively impact the model built from the data.

For more on concept drift, see the tutorial:

A Gentle Introduction to Concept Drift in Machine Learning

Concept drift may affect your model at different times and depends specifically on the prediction problem you are solving and the model chosen to address it.

It can be helpful to monitor the performance of a model over time and use a clear drop in model performance as a trigger to make a change to your model, such as re-training it on new data.
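As a minimal sketch of such a trigger, the snippet below tracks whether recent predictions were correct and flags when accuracy over a sliding window falls below a baseline by more than a tolerance. The class name PerformanceMonitor, the baseline of 0.90, the tolerance of 0.05, and the window size are all hypothetical choices for illustration; they are not part of any library and would need tuning for your problem.

```python
# sketch: flag a model for updating when recent accuracy drops below a baseline
from collections import deque


class PerformanceMonitor:
    """Hypothetical helper that tracks accuracy over a sliding window."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline    # accuracy measured when the model was deployed
        self.tolerance = tolerance  # allowed drop before an update is triggered
        self.outcomes = deque(maxlen=window)

    def record(self, y_true, y_pred):
        # store 1.0 for a correct prediction and 0.0 for an incorrect one
        self.outcomes.append(1.0 if y_true == y_pred else 0.0)

    def needs_update(self):
        # wait until the window is full before making a decision
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        recent = sum(self.outcomes) / len(self.outcomes)
        return recent < self.baseline - self.tolerance


# made-up outcomes: 7 correct and 3 incorrect predictions
monitor = PerformanceMonitor(baseline=0.90, tolerance=0.05, window=10)
for y_true, y_pred in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(y_true, y_pred)
print(monitor.needs_update())
```

In practice, the true outcomes often arrive some time after the predictions are made, so a monitor like this can only react once labels are known.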

Alternately, you may know that data in your domain changes frequently enough that a change to the model is required periodically, such as weekly, monthly, or annually.

Finally, you may operate your model for a while and accumulate additional data with known outcomes that you wish to use to update your model, with the hopes of improving predictive performance.

Importantly, you have a lot of flexibility when it comes to responding to a change to the problem or the availability of new data.

For example, you can take the trained neural network model and update the model weights using the new data. Or you might leave the existing model untouched and combine its predictions with those of a new model fit on the newly available data.

These approaches represent two general themes in updating neural network models in response to new data:

Retraining Update Strategies.
Ensemble Update Strategies.

Let’s take a closer look at each in turn.

Retraining Update Strategies

A benefit of neural network models is that their weights can be updated at any time with continued training.

When responding to changes in the underlying data or the availability of new data, there are a few different strategies to choose from when updating a neural network model, such as:

Continue training the model on the new data only.
Continue training the model on the old and new data.

We might also imagine variations on the above strategies, such as using a sample of the new data or a sample of new and old data instead of all available data, as well as possible instance-based weightings on sampled data.
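One such variation might be sketched as follows: keep a random sample of the old data, combine it with all of the new data, and give the new examples a larger instance weight. The arrays here are random stand-ins for the old and new data, and the 50 percent sample fraction and the weights of 1.0 and 2.0 are arbitrary assumptions chosen only for illustration.

```python
# sketch: sample the old data and weight new examples more heavily
from numpy import vstack, hstack, full
from numpy.random import default_rng

rng = default_rng(1)
# stand-in arrays; in practice these would be your real old and new datasets
X_old, y_old = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)
X_new, y_new = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)

# keep a random 50 percent sample of the old data (arbitrary fraction)
n_keep = int(0.5 * len(X_old))
ix = rng.choice(len(X_old), size=n_keep, replace=False)
X_old_sample, y_old_sample = X_old[ix], y_old[ix]

# combine the sampled old data with all of the new data
X_both = vstack((X_old_sample, X_new))
y_both = hstack((y_old_sample, y_new))

# instance weights: 1.0 for old examples, 2.0 for new examples (arbitrary values)
weights = hstack((full(n_keep, 1.0), full(len(X_new), 2.0)))
print(X_both.shape, y_both.shape, weights.shape)
```

The weights array could then be passed to a Keras model via the sample_weight argument of fit(), for example model.fit(X_both, y_both, sample_weight=weights, ...).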

We might also consider extensions of the model that freeze the layers of the existing model (e.g. so model weights cannot change during training), then add new layers with model weights that can change, grafting on extensions to the model to handle any change in the data. Perhaps this is a variation of the retraining and the ensemble approach in the next section, and we’ll leave it for now.
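Although we will not explore it further, the idea of freezing and extending a model might be sketched as below. This is only one possible reading of the approach: it assumes the hidden layers of the old model are frozen, the old output layer is dropped, and two new trainable layers are added on top. The size of the new layer and the reduced epoch counts are arbitrary choices to keep the sketch quick.

```python
# sketch: freeze the hidden layers of an existing model and add new trainable layers
from numpy import array_equal
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import SGD

# define dataset and split into old and new data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
n_features = X.shape[1]
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)

# fit the old model on old data (fewer epochs than usual to keep the sketch quick)
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
old_model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='binary_crossentropy')
old_model.fit(X_old, y_old, epochs=20, batch_size=32, verbose=0)

# freeze the hidden layers so their weights cannot change during training
# note: the layers are shared, so they are also frozen in old_model
hidden_layers = old_model.layers[:-1]
for layer in hidden_layers:
    layer.trainable = False
frozen_before = [w.copy() for layer in hidden_layers for w in layer.get_weights()]

# build an extended model: frozen hidden layers plus new trainable layers
extended = Sequential()
extended.add(Input(shape=(n_features,)))
for layer in hidden_layers:
    extended.add(layer)
extended.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
extended.add(Dense(1, activation='sigmoid'))
extended.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='binary_crossentropy')

# fit only the new layers on the new data
extended.fit(X_new, y_new, epochs=20, batch_size=32, verbose=0)

# confirm the frozen weights did not change
frozen_after = [w for layer in hidden_layers for w in layer.get_weights()]
unchanged = all(array_equal(a, b) for a, b in zip(frozen_before, frozen_after))
print('frozen weights unchanged:', unchanged)
```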

Nevertheless, these are the two main strategies to consider.

Let’s make these approaches concrete with a worked example.

Update Model on New Data Only

We can update the model on the new data only.

One extreme version of this approach is to not use any new data and simply re-train the model on the old data. This might be the same as “do nothing” in response to the new data. At the other extreme, a model could be fit on the new data only, discarding the old data and old model.

Ignore new data, do nothing.
Update existing model on new data.
Fit new model on new data, discard old model and data.

We will focus on the middle ground in this example, but it might be interesting to test all three approaches on your problem and see what works best.

First, we can define a synthetic binary classification dataset and split it in half, then use one portion as “old data” and the other portion as “new data.”

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)

We can then define a Multilayer Perceptron model (MLP) and fit it on the old data only.

...
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

We can then imagine saving the model and using it for some time.
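Saving and reloading might look like the sketch below, which uses the save() and load_model() functions in Keras. The file name old_model.keras and the temporary directory are hypothetical; the '.keras' extension is the native format in recent Keras releases, and older installations may need an '.h5' file name instead. The epoch count is reduced to keep the sketch quick.

```python
# sketch: save a fit model to file and load it again later
import os
import tempfile
from numpy import allclose
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# define dataset and split into old and new data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
n_features = X.shape[1]
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)

# define and fit the model on old data (fewer epochs to keep the sketch quick)
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='binary_crossentropy')
model.fit(X_old, y_old, epochs=20, batch_size=32, verbose=0)

# save the model to a hypothetical location
path = os.path.join(tempfile.mkdtemp(), 'old_model.keras')
model.save(path)

# later: load the model from file
loaded = load_model(path)

# confirm the loaded model makes the same predictions as the original
same = allclose(model.predict(X_new, verbose=0), loaded.predict(X_new, verbose=0), atol=1e-6)
print('predictions match after reload:', same)
```

The loaded model can be compiled again with a new optimizer and trained further, which is what an update amounts to.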

Time passes, and we wish to update it on new data that has become available.

Updating the model involves using a much smaller learning rate than normal so that we do not wash away the weights learned on the old data.

Note: you will need to discover a learning rate that is appropriate for your model and dataset that achieves better performance than simply fitting a new model from scratch.

...
# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')

We can then fit the model on the new data only with this smaller learning rate.

...
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)

Tying this together, the complete example of updating a neural network model on new data only is listed below.

# update neural network with new data only
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)
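
The example above does not evaluate the updated model. One way to check whether an update learning rate beats fitting a new model from scratch might be sketched as below. It assumes part of the new data is held back for evaluation, which is not done in the example above; the helper functions build_model() and holdout_accuracy(), the candidate learning rates, and the reduced epoch counts are all hypothetical choices. Results will vary between runs given the stochastic nature of training.

```python
# sketch: compare update learning rates against a model fit from scratch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Sequential, clone_model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD


def build_model(n_features):
    # same architecture as used in the tutorial
    model = Sequential()
    model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
    model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model


def holdout_accuracy(model, X, y):
    # convert predicted probabilities to class labels and score them
    yhat = (model.predict(X, verbose=0) > 0.5).astype('int32').ravel()
    return accuracy_score(y, yhat)


# define dataset and split into old and new data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
n_features = X.shape[1]
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# assumption: hold back 30 percent of the new data for evaluation
X_fit, X_val, y_fit, y_val = train_test_split(X_new, y_new, test_size=0.30, random_state=1)

# fit the old model on old data (fewer epochs to keep the sketch quick)
old_model = build_model(n_features)
old_model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='binary_crossentropy')
old_model.fit(X_old, y_old, epochs=50, batch_size=32, verbose=0)

scores = {}
# baseline: a new model fit from scratch on the new data only
scratch = build_model(n_features)
scratch.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='binary_crossentropy')
scratch.fit(X_fit, y_fit, epochs=50, batch_size=32, verbose=0)
scores['scratch'] = holdout_accuracy(scratch, X_val, y_val)

# candidates: copies of the old model updated with different learning rates
for lr in [0.01, 0.001, 0.0001]:
    candidate = clone_model(old_model)
    candidate.set_weights(old_model.get_weights())
    candidate.compile(optimizer=SGD(learning_rate=lr, momentum=0.9), loss='binary_crossentropy')
    candidate.fit(X_fit, y_fit, epochs=30, batch_size=32, verbose=0)
    scores[f'update lr={lr}'] = holdout_accuracy(candidate, X_val, y_val)

# report holdout accuracy for each strategy
for name, acc in scores.items():
    print(name, acc)
```

Each candidate starts from a copy of the old model's weights, so the old model itself is left unchanged and the comparison is repeatable across learning rates.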

Next, let’s look at updating the model on new and old data.

Update Model on Old and New Data

We can update the model on a combination of both old and new data.

An extreme version of this approach is to discard the model and simply fit a new model on all available data, new and old. A less extreme version would be to use the existing model as a starting point and update it based on the combined dataset.

Again, it is a good idea to test both strategies and see what works well for your dataset.

We will focus on the less extreme update strategy in this case.

The synthetic dataset and model can be fit on the old dataset as before.

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

New data becomes available and we wish to update the model on a combination of both old and new data.

First, we must use a much smaller learning rate in an attempt to use the current weights as a starting point for the search.

Note: you will need to discover a learning rate that is appropriate for your model and dataset that achieves better performance than simply fitting a new model from scratch.

...
# update model with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')

We can then create a composite dataset composed of old and new data.

...
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))

Finally, we can update the model on this composite dataset.

...
# fit the model on old and new data
model.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)

Tying this together, the complete example of updating a neural network model on both old and new data is listed below.

# update neural network with both old and new data
from numpy import vstack
from numpy import hstack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# update model with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the model on old and new data
model.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)

Next, let’s look at how to use ensemble models to respond to new data.

Ensemble Update Strategies

An ensemble is a predictive model that is composed of multiple other models.

There are many different types of ensemble models, although perhaps the simplest approach is to average the predictions from multiple different models.

For more on ensemble algorithms for deep learning neural networks, see the tutorial:

Ensemble Learning Methods for Deep Learning Neural Networks

We can use an ensemble model as a strategy when responding to changes in the underlying data or availability of new data.

Mirroring the approaches in the previous section, we might consider two approaches to ensemble learning algorithms as strategies for responding to new data; they are:

Ensemble of existing model and new model fit on new data only.
Ensemble of existing model and new model fit on old and new data.

Again, we might consider variations on these approaches, such as samples of old and new data, and more than one existing or additional models included in the ensemble.
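An ensemble of more than two models might be sketched as below. The function name ensemble_predict() is hypothetical and the prediction arrays are made-up numbers standing in for the output of predict() from three models. The optional weights are an additional assumption that lets some models count for more than others; with no weights, the result is a plain average.

```python
# sketch: average predictions from any number of models, optionally weighted
from numpy import array, average, hstack


def ensemble_predict(predictions, weights=None):
    # each item in predictions has shape (n_samples, 1), as returned by predict()
    combined = hstack(predictions)
    # with weights=None this is the plain mean across models
    return average(combined, axis=-1, weights=weights)


# made-up predicted probabilities from three models for two examples
yhat1 = array([[0.9], [0.2]])
yhat2 = array([[0.7], [0.4]])
yhat3 = array([[0.5], [0.6]])

# plain average across the three models
plain = ensemble_predict([yhat1, yhat2, yhat3])
# weighted average that counts the third model twice as much (arbitrary weights)
weighted = ensemble_predict([yhat1, yhat2, yhat3], weights=[1, 1, 2])
print(plain)
print(weighted)
```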

Nevertheless, these are the two main strategies to consider.

Let’s make these approaches concrete with a worked example.

Ensemble Model With Model on New Data Only

We can create an ensemble of the existing model and a new model fit on only the new data.

The expectation is that the ensemble predictions perform better or are more stable (lower variance) than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.
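Such a check might be sketched as below by comparing the accuracy of each model and of the ensemble on the same examples. The probabilities are made-up numbers chosen for illustration, not results from real models, and the accuracy() helper is hypothetical. In practice, the predictions should come from a holdout set that was not used to train either model.

```python
# sketch: compare the old model, new model, and ensemble on the same examples
from numpy import array, hstack, mean


def accuracy(y_true, probs):
    # threshold probabilities at 0.5 to get class labels, then score them
    labels = (probs.ravel() > 0.5).astype(int)
    return mean(labels == y_true)


# made-up true labels and predicted probabilities for six holdout examples
y_true = array([1, 0, 1, 0, 1, 0])
yhat_old = array([[0.9], [0.6], [0.4], [0.1], [0.8], [0.3]])
yhat_new = array([[0.3], [0.2], [0.9], [0.2], [0.7], [0.6]])

# ensemble prediction as the mean of the two models
yhat_ens = mean(hstack((yhat_old, yhat_new)), axis=-1)

acc_old = accuracy(y_true, yhat_old)
acc_new = accuracy(y_true, yhat_new)
acc_ens = accuracy(y_true, yhat_ens)
print('old: %.3f, new: %.3f, ensemble: %.3f' % (acc_old, acc_new, acc_ens))
```

The numbers were chosen so that the two models make different mistakes, which is the situation where averaging helps most. An ensemble of models that make the same mistakes offers little benefit.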

First, we can prepare the dataset and fit the old model, as we did in the previous sections.

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

Some time passes and new data becomes available.

We can then fit a new model on the new data, naturally discovering a model and configuration that works well or best on the new dataset only.

In this case, we’ll simply use the same model architecture and configuration as the old model.

...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')

We can then fit this new model on the new data only.

...
# fit the model on new data
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)

Now that we have the two models, we can make predictions with each model, and calculate the average of the predictions as the “ensemble prediction.”

...
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on new data only is listed below.

# ensemble old neural network with new model fit on new data only
from numpy import hstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)

# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

Ensemble Model With Model on Old and New Data

We can create an ensemble of the existing model and a new model fit on both the old and the new data.

First, we can prepare the dataset and fit the old model, as we did in the previous sections.

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

Some time passes and new data becomes available.

We can then fit a new model on a composite of the old and new data, naturally discovering a model and configuration that works well or best on the composite dataset.

In this case, we’ll simply use the same model architecture and configuration as the old model.

...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')

We can create a composite dataset from the old and new data, then fit the new model on this dataset.

...
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the model on old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)

...

# create a composite dataset of old and new data

X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))

# fit the model on old data

new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)

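To make the stacking step concrete, the small sketch below uses made-up toy arrays (the names ending in _demo are hypothetical and not part of the tutorial's dataset) to show how vstack() joins the input rows while hstack() joins the one-dimensional label arrays end to end.

```python
# sketch: how vstack and hstack build a composite dataset (toy arrays for illustration only)
from numpy import hstack
from numpy import vstack
from numpy import zeros
from numpy import ones
# toy stand-ins for the old and new data: 3 old rows and 2 new rows, 4 features each
X_old_demo, y_old_demo = zeros((3, 4)), zeros(3)
X_new_demo, y_new_demo = ones((2, 4)), ones(2)
# stack the input rows vertically and join the 1D label arrays end to end
X_both_demo = vstack((X_old_demo, X_new_demo))
y_both_demo = hstack((y_old_demo, y_new_demo))
# the row count is the sum of the old and new rows; the feature count is unchanged
print(X_both_demo.shape, y_both_demo.shape)
```

The old rows come first and the new rows follow, so the labels stay aligned with their inputs.
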
Finally, we can use both models together to make ensemble predictions.

...
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)


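The averaged values are probabilities, not class labels. The sketch below shows one possible way to turn them into crisp labels. It uses hand-written toy probabilities in place of real model output, shaped as one column per model to mimic what predict() returns for a single sigmoid output, and it assumes a cutoff of 0.5, which is a common default rather than something fixed by the tutorial.

```python
# sketch: average two sets of predicted probabilities and convert to class labels
from numpy import array
from numpy import hstack
from numpy import mean
# toy stand-ins for the output of old_model.predict() and new_model.predict()
yhat1_demo = array([[0.9], [0.2], [0.6]])
yhat2_demo = array([[0.7], [0.4], [0.3]])
# combine predictions into a single array with one column per model
combined_demo = hstack((yhat1_demo, yhat2_demo))
# calculate outcome as mean of predictions across the model columns
yhat_demo = mean(combined_demo, axis=-1)
# assumed cutoff of 0.5 to map each averaged probability to class 0 or 1
labels_demo = (yhat_demo >= 0.5).astype('int')
print(yhat_demo)
print(labels_demo)
```

Note that the third example is assigned to class 0 even though the first model predicted class 1 on its own, because the second model's lower probability pulls the average below the cutoff.
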
Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on the old and new data is listed below.

# ensemble old neural network with new model fit on old and new data
from numpy import hstack
from numpy import vstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the model on old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)

# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)


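The complete example stops at making predictions and does not score them. As a rough sketch of how the ensemble might be evaluated, the hypothetical helper below (ensemble_accuracy() is not part of the tutorial or of any library) averages any number of probability columns, thresholds at an assumed 0.5, and reports classification accuracy. It is demonstrated on made-up labels and probabilities. In practice the known labels would ideally come from a holdout set that neither model saw during training, which is not the case for X_new in the example above.

```python
# sketch: score averaged ensemble predictions against known labels (hypothetical helper)
from numpy import array
from numpy import hstack
from numpy import mean

def ensemble_accuracy(y_true, *probabilities, threshold=0.5):
    # join the (n, 1) probability columns, then average across models for each row
    averaged = mean(hstack(probabilities), axis=-1)
    # map averaged probabilities to class labels using the assumed cutoff
    labels = (averaged >= threshold).astype('int')
    # fraction of rows where the predicted label matches the known label
    return mean(labels == y_true)

# toy labels and toy predicted probabilities from two models
y_true_demo = array([1, 0, 1, 0])
p_old_demo = array([[0.9], [0.4], [0.4], [0.2]])
p_new_demo = array([[0.8], [0.7], [0.9], [0.1]])
print(ensemble_accuracy(y_true_demo, p_old_demo, p_new_demo))
```

Passing a single probability array to the same helper gives the accuracy of one model alone, which makes it easy to check whether the ensemble is actually an improvement.
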
Summary

In this tutorial, you discovered how to update deep learning neural network models in response to new data.

Specifically, you learned:

Neural network models may need to be updated when the underlying data changes or when new labeled data is made available.
How to update trained neural network models with just new data or combinations of old and new data.
How to create an ensemble of existing and new models trained on just new data or combinations of old and new data.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

42 Responses to How to Update Neural Network Models With More Data

Jack March 5, 2021 at 6:59 pm #

hi, Dr. Jason Brownlee

Good example
but i cannot see the full code in the website, because i can not move the screenshot and see the all code, can i see the all code?

Reply
- Jason Brownlee March 6, 2021 at 5:15 am #
  
  Sorry, I don’t understand. Can you please elaborate?
  
  What do you mean, you cannot see the full code example?
  
  Reply
- Cpc March 6, 2021 at 6:49 am #
  
  Try to press and hold for a few seconds on the code box. I did that and a horizontal bar appeared. Now I can read all the code.
  
  Reply
  - Jason Brownlee March 6, 2021 at 9:02 am #
    
    Copy-paste has been fixed, sorry about that.
    
    Reply
Vidya March 8, 2021 at 1:18 pm #

Hi Jason.

What does kernel_initailizer = ‘he_normal’ do ? Tried understanding from tensorflow.keras website but not clear .

Thanks so much!

Reply
- Jason Brownlee March 8, 2021 at 1:35 pm #
  
  It defines the way we set random weights before we start training, called weight initialization.
  
  You can learn more here:
  https://machinelearningmastery.com/weight-initialization-for-deep-learning-neural-networks/
  
  Reply
  - Vidya March 8, 2021 at 9:56 pm #
    
    Thank you!
    
    Reply
Vidya March 8, 2021 at 1:24 pm #

Hi Jason .

Thanks for the post.
When we get new data , isn’t it always better to add to the old data and retrain ? As more the historical data , more better is the inferencing . But does it vary with case ?

Reply
- Jason Brownlee March 8, 2021 at 1:36 pm #
  
  You’re welcome.
  
  No good general rules. It really depends on how much new data there is and how different it is.
  
  Reply
Niranjan March 12, 2021 at 11:30 pm #

Hi Jason,

Thanks for this post.

I am just wondering to know that what are your thoughts on doing addition of new labels into the existing model.
Lets say, I have Model A already trained on 10 classes and then after some time I want to add 2 more classes dataset to the existing, one option is that retrain whole model with all classes 12 and use previous weights.
But my concern is that, do you think we can retrain on 12 classes [10 old + 2 new] without using older data or some percentage of older data and whole new dataset.?? We can use previous trained model weights.

Thanks

Reply
- Jason Brownlee March 13, 2021 at 5:33 am #
  
  It feels more like transfer learning than model updating. I’d probably cut off the output layer and fit a new output layer.
  
  Nevertheless, experiment and discover what works well for your dataset.
  
  Reply
  - Nishant Kumar February 3, 2022 at 4:23 pm #
    
    HI Jason ,
    
    This is for neural networks.
    
    Let’s say I have a robust sklearn model but it does not support partial fit .
    
    In this case, how should we proceed ?
    
    Reply
    - James Carmichael February 4, 2022 at 10:21 am #
      
      Hi Nishant…Please clarify what you are wanting to accomplish with your model so that I may better assist you.
      
      Reply
yan April 28, 2021 at 10:06 am #

Think you very much for this work.

Reply
- Jason Brownlee April 29, 2021 at 6:21 am #
  
  You’re welcome.
  
  Reply
Mostafa Amin RIZK July 14, 2021 at 12:26 am #

Hello,
Is there any published research articles about this topic?

Reply
- Jason Brownlee July 14, 2021 at 5:30 am #
  
  Perhaps search on: scholar.google.com
  
  Reply
ishaque July 14, 2021 at 8:38 pm #

Hi sir, my question is that for implementing the Update Model on New Data, I need to change the input_shape of the model to train new data on the new model, so please tell me how I can change the input_shape of the existing model.

Reply
- Jason Brownlee July 15, 2021 at 5:28 am #
  
  You can remove the input layer from the model and define a new input layer with the desired shape. This is easily done using the functional API and is done all the time in transfer learning (see some examples on the blog).
  
  Reply
  - ishaque July 16, 2021 at 11:00 pm #
    
    could you please give me code for ” remove the input layer from the model and define a new input layer with the desired shape in functional API” I have been searching for the solution but I didn’t find a solution?
    
    FYI I have read the blog for functional API and that was great thank you.
    help
    
    Reply
    - Jason Brownlee July 17, 2021 at 5:23 am #
      
      There are many examples of this on the blog, use the search and look at examples for feature selection on image classification datasets.
      
      Reply
      - ishaque July 17, 2021 at 4:32 pm #
        
        plz, share the code for changing the input of existing I am so confused. kindly help
      - Jason Brownlee July 18, 2021 at 5:21 am #
        
        Sorry, I don’t have the capacity to prepare custom code for you.
ww July 26, 2021 at 11:23 am #

Hi, sir. Thanks for your post.
Now I use NN model to forecasting time series, when I have new time series data, can i update my NN model using the new time series data?

Reply
- Jason Brownlee July 27, 2021 at 5:04 am #
  
  The above tutorial shows you how to update your model.
  
  Reply
recohut October 26, 2021 at 4:07 am #

Isn’t this retraining strategy cause catastrophic forgetting…..can u also do some tutorial on incremental learning based retraining please?

Reply
- Adrian Tam October 27, 2021 at 2:51 am #
  
  Yes, that can be catastrophic forgetting so that’s why, for example, we use a smaller learning rate in retraining.
  
  Reply
Saurav Pandey February 14, 2022 at 5:56 pm #

Sir,
First of all thank you very much for these blogs.I have some question regarding retraining of model on new data.
1. In the strategy where we update our old model on new and old dataset,isn’t this strategy will be very prone to overfitting on old dataset?
2. What if we use sample of old data and whole new data for updating the model.Will this strategy work?

Reply
Shruti February 23, 2022 at 7:17 am #

Hello Jason,

Is it possible to change the cost function to update the trained model? Suppose I started training with MSE as a cost function but now with new dataset I want to update with MAE as cost function.

Reply
- James Carmichael February 23, 2022 at 12:18 pm #
  
  Hello Shruti…You may find the following of interest:
  
  https://stackoverflow.com/questions/60996892/how-to-replace-loss-function-during-training-tensorflow-keras
  
  Reply
Mara March 25, 2022 at 8:50 pm #

Hi,
First of all, thank you for your posts! They are extremely valuable.
Now, I was was wondering how to handle normalized data. To elaborate: I train my model with normalized data, train, validation and test set for the model are all scaled with the min and max value of the training set. My question is now: do I scale new data also with the min and max of the original training data or do I scale it with their own min and max?

Reply
- James Carmichael March 28, 2022 at 8:03 am #
  
  Hi Mara…New validation data does not need to be scaled. Normalization and scaling are used to improve training.
  
  Reply
Peter Allan April 5, 2022 at 10:55 pm #

I have new data which I want to combine with my old data and update my pretrained network in light of this new data. My starting point will be the previous best weights and the learning rate used when I stopped the training previously (I’m using a learning rate scheduler)

Firstly, what is the best way to perform the test-train split? It looks like you add all the new data to the training set, i.e. reserve non for a test set. Would it not be better to split the new data in the same split as the old, e.g. 80:20 and add the data to the old train and test sets?

In addition, I use various transformers (min-max, box-cox etc.) on my old data prior to training. What is the best strategy here? Refit the transformers on the combined old and new data before proceeding to update the network? Or apply the transform fitted to the old data on the new? I see pros and cons of both approaches here.

Reply
- James Carmichael April 6, 2022 at 8:38 am #
  
  Hi Peter…You may find the following beneficial:
  
  https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
  
  Reply
Emily Hunter April 6, 2022 at 1:35 pm #

Hi Jason,

Thanks for the amazing post.

I have a query regarding updating the model. I have developed a XGBoost model for classification that I have trained & tuned using events from a period of 24 months. I want to run the model in production. However, I want to retrain the model using the latest month data (whenever it is available) but with the same set of hyperparameters that I have tuned. The approach is that I will discard the oldest month events and add new month so I’ll always have a 24 months window.
Does this make sense or I’d have to tune the hyperparameters in each run too?

Reply
- James Carmichael April 7, 2022 at 9:50 am #
  
  Hi Emily…The following resource may help add clarity regarding updating a model with new data:
  
  https://machinelearningmastery.com/update-neural-network-models-with-more-data/
  
  Reply
Peter Allan April 8, 2022 at 6:12 pm #

I’m aware of how to split data my question was more aimed at how to update old data with new data and transform, but thank you for your help

Reply
- James Carmichael April 9, 2022 at 8:45 am #
  
  You are very welcome Peter!
  
  Reply
Darshan S.P September 1, 2022 at 2:22 pm #

Hi Sir,

I am new to this model training part.

Should we use always model.compile after loading pre-trained model.

This will starts as fresh weights and bias for new data training right?

I need to know when to use compile=True or False while retraining the pretrained models.

Reply
- James Carmichael September 2, 2022 at 9:25 am #
  
  Hi Darshan…The following discussion should add clarity:
  
  https://stackoverflow.com/questions/47995324/does-model-compile-initialize-all-the-weights-and-biases-in-keras-tensorflow
  
  Reply
Ziming Wang October 11, 2022 at 12:11 pm #

Hi Sir, thanks for your post!
I have two question.
Firstly, what is the native difference in the training process with only new data and the training process with combination of new data and old data?
Secondly, if we could directly modify the weights of the trained model through tf? If we could get the weights of the trained model and modify it?

Reply
- James Carmichael October 12, 2022 at 7:57 am #
  
  Hi Ziming…Please clarify your first question so that we may better assist you. Regarding your second question, the following resource may be of interest:
  
  https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/
  
  Reply
