Update Model on New Data Only
We can update the model on the new data only.
One extreme version of this approach is to ignore the new data and simply re-train the model on the old data, which is effectively the same as doing nothing in response to the new data. At the other extreme, a model could be fit on the new data only, discarding the old data and the old model.
- Ignore new data, do nothing.
- Update existing model on new data.
- Fit new model on new data, discard old model and data.
We will focus on the middle ground in this example, but it might be interesting to test all three approaches on your problem and see what works best; for contrast, a minimal sketch of the third option follows the data preparation below.
First, we can define a synthetic binary classification dataset and split it in half, using one portion as “old data” and the other portion as “new data.”
...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
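For contrast, the third option above, fitting a new model on the new data only, would discard the old model entirely. A minimal sketch of what that might look like (the architecture mirrors the MLP defined next; the epochs and batch size are illustrative choices):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# define a fresh model with no knowledge of the old data or the old model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
new_model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='binary_crossentropy')
# fit on the new data only, ignoring the old data entirely
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)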
We can then define a Multilayer Perceptron (MLP) model and fit it on the old data only.
...
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
We can then imagine saving the model and using it for some time.
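The save and load steps are elided in the complete example below, but with the Keras API each can be a single call; a minimal sketch (the 'old_model.keras' filename is an illustrative choice):

from tensorflow.keras.models import load_model

# save the model fitted on the old data (the filename is an assumption for illustration)
model.save('old_model.keras')
# ...later, load it back, ready to be updated on the new data
model = load_model('old_model.keras')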
Time passes, and we wish to update it on new data that has become available.
This would involve using a much smaller learning rate than normal so that we do not wash away the weights learned on the old data.
Note: you will need to discover a learning rate appropriate for your model and dataset that achieves better performance than simply fitting a new model from scratch; a sketch of one rough way to search for it appears after the update code below.
...
# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
We can then fit the model on the new data only with this smaller learning rate.
...
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)
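As noted above, the update learning rate is worth searching rather than guessing. One rough approach is to restart from the same saved weights with a few candidate rates and compare each on a held-out slice of the new data; a sketch, assuming the model was saved to 'old_model.keras' as in the earlier sketch:

from sklearn.model_selection import train_test_split
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import SGD

# hold out part of the new data to score each candidate rate (an illustrative split)
X_fit, X_val, y_fit, y_val = train_test_split(X_new, y_new, test_size=0.3, random_state=1)
for lr in [0.01, 0.001, 0.0001]:
    # restart from the same old-data weights each time for a fair comparison
    candidate = load_model('old_model.keras')
    candidate.compile(optimizer=SGD(learning_rate=lr, momentum=0.9), loss='binary_crossentropy')
    candidate.fit(X_fit, y_fit, epochs=100, batch_size=32, verbose=0)
    print('lr=%.4f, validation loss=%.4f' % (lr, candidate.evaluate(X_val, y_val, verbose=0)))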
Tying this together, the complete example of updating a neural network model on new data only is listed below.
# update neural network with new data only
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
# save model...
# load model...
# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)
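The complete example updates the model but never scores it. For a quick sanity check, you could hold back part of the new data and compare classification accuracy before and after the update, replacing the final fit on X_new above; a minimal sketch (the split size and the 0.5 threshold are illustrative choices):

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# hold back part of the new data for evaluation (an illustrative split)
X_upd, X_test, y_upd, y_test = train_test_split(X_new, y_new, test_size=0.3, random_state=1)

def holdout_accuracy(m):
    # turn predicted probabilities into crisp class labels at a 0.5 threshold
    yhat = (m.predict(X_test, verbose=0) > 0.5).astype(int).flatten()
    return accuracy_score(y_test, yhat)

print('accuracy before update: %.3f' % holdout_accuracy(model))
# update on the held-in portion only, using the smaller learning rate compiled above
model.fit(X_upd, y_upd, epochs=100, batch_size=32, verbose=0)
print('accuracy after update: %.3f' % holdout_accuracy(model))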