We can create an ensemble of the existing model and a new model fit on both the old and the new data.
The expectation is that the ensemble's predictions perform better, or are more stable (lower variance), than those of either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.
First, we can prepare the dataset and fit the old model, as we did in the previous sections.
...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
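In a real application, the old model would typically be saved to disk at this point and reloaded once new data arrives, rather than kept in memory. A minimal sketch using the standard Keras save/load API follows; the filename is an arbitrary choice for illustration, not part of the original example.

# save the fitted old model to file (filename is an assumption)
old_model.save('old_model.h5')
# ...later, when new data becomes available, reload it
from tensorflow.keras.models import load_model
old_model = load_model('old_model.h5')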
Some time passes and new data becomes available.
We can then fit a new model on a composite of the old and new data, which leaves us free to choose a model architecture and configuration that works well, or best, on the combined dataset.
In this case, we’ll simply use the same model architecture and configuration as the old model.
...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
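Because the new model deliberately repeats the old architecture, the duplicated definition can be factored into a small helper function. This is only a refactoring sketch, not part of the original listing:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# helper that builds and compiles a fresh model with the shared architecture
def build_model(n_features):
	model = Sequential()
	model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
	model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='binary_crossentropy')
	return model

old_model = build_model(n_features)
new_model = build_model(n_features)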
We can create a composite dataset from the old and new data, then fit the new model on this dataset.
...
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the new model on the composite of old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)
Finally, we can use both models together to make ensemble predictions.
...
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into a single array
combined = hstack((yhat1, yhat2))
# calculate outcome as the mean of the predictions
yhat = mean(combined, axis=-1)
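The mean gives both models equal weight. If you believe one model is more trustworthy, for example the new model because it has seen all of the data, a weighted average is a natural variation; the weights below are illustrative assumptions, not values from the original example.

from numpy import average
# weight the new model's predictions more heavily (weights are illustrative)
yhat = average(combined, axis=-1, weights=[0.4, 0.6])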
Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on the old and new data is listed below.
# ensemble old neural network with new model fit on old and new data
from numpy import hstack
from numpy import vstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
# save model...
# load model...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the new model on the composite of old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into a single array
combined = hstack((yhat1, yhat2))
# calculate outcome as the mean of the predictions
yhat = mean(combined, axis=-1)
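As noted at the start, the benefit of the ensemble should be verified rather than assumed. A minimal sketch of such a check is below; it assumes labeled holdout data (X_holdout, y_holdout) that neither model saw during fitting, which the example above does not set aside, and it thresholds the sigmoid probabilities at 0.5 to obtain class labels.

from numpy import hstack
from numpy import mean
from sklearn.metrics import accuracy_score
# X_holdout, y_holdout are assumed: labeled data unseen by both models
p_old = old_model.predict(X_holdout)
p_new = new_model.predict(X_holdout)
p_ens = mean(hstack((p_old, p_new)), axis=-1)
# threshold probabilities at 0.5 to obtain crisp class labels
print('old model: %.3f' % accuracy_score(y_holdout, (p_old >= 0.5).astype(int).ravel()))
print('new model: %.3f' % accuracy_score(y_holdout, (p_new >= 0.5).astype(int).ravel()))
print('ensemble:  %.3f' % accuracy_score(y_holdout, (p_ens >= 0.5).astype(int)))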