Ensemble Model With Model on New Data Only

We can create an ensemble of the existing model and a new model fit on only the new data.

The expectation is that the ensemble predictions perform better or are more stable (lower variance) than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.
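Since the benefit depends on your data, a minimal sketch of such a check follows, using scikit-learn `LogisticRegression` models as quick stand-ins for the two neural networks and accuracy on a held-out split as the metric (both choices are assumptions for illustration, not part of the tutorial's example):

```python
from numpy import hstack, mean
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic binary classification problem, held-out test split
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# stand-ins for the "old" and "new" models, each fit on half the training data
old_model = LogisticRegression(max_iter=1000).fit(X_train[:350], y_train[:350])
new_model = LogisticRegression(max_iter=1000).fit(X_train[350:], y_train[350:])

# average the predicted positive-class probabilities, then threshold
probs = hstack((old_model.predict_proba(X_test)[:, 1:], new_model.predict_proba(X_test)[:, 1:]))
yhat = (mean(probs, axis=-1) > 0.5).astype(int)

# compare held-out accuracy of each model against the ensemble
for name, pred in [('old', old_model.predict(X_test)),
                   ('new', new_model.predict(X_test)),
                   ('ensemble', yhat)]:
    print(name, (pred == y_test).mean())
```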

First, we can prepare the dataset and fit the old model, as we did in the previous sections.

# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
Some time passes and new data becomes available.

We can then fit a new model on the new data alone, which gives us the opportunity to discover a model architecture and configuration that works well, or best, on the new dataset.

In this case, we’ll simply use the same model architecture and configuration as the old model.

# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
We can then fit this new model on the new data only.

# fit the model on new data
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)
Now that we have the two models, we can make predictions with each model, and calculate the average of the predictions as the “ensemble prediction.”

# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)
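The averaged `yhat` values are still probabilities from the sigmoid output layer; to obtain crisp class labels they can be thresholded at 0.5. A small self-contained sketch with toy probabilities (the values are made up for illustration, standing in for the two models' predictions):

```python
from numpy import array, hstack, mean

# toy sigmoid outputs standing in for old_model and new_model predictions
yhat1 = array([[0.9], [0.2], [0.6]])
yhat2 = array([[0.7], [0.4], [0.2]])
# combine predictions into a single (n_samples, 2) array
combined = hstack((yhat1, yhat2))
# average the two probabilities for each sample
yhat = mean(combined, axis=-1)
# threshold at 0.5 to obtain crisp class labels
labels = (yhat > 0.5).astype(int)
print(labels)  # → [1 0 0]
```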
Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on new data only is listed below.

# ensemble old neural network with new model fit on new data only
from numpy import hstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)
# save model...
# load model...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)
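As a possible extension (not part of the example above), the simple mean can be replaced with a weighted average, for instance to favor the model trained on the more recent data. The weights below are illustrative only and would need tuning on your dataset:

```python
from numpy import array, average, hstack

# toy predicted probabilities standing in for the old and new models
yhat1 = array([[0.9], [0.2], [0.6]])
yhat2 = array([[0.5], [0.8], [0.4]])
# combine predictions into a single (n_samples, 2) array
combined = hstack((yhat1, yhat2))
# weight the new model's predictions more heavily (weights are illustrative)
weights = [0.4, 0.6]
yhat = average(combined, axis=-1, weights=weights)
```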