How to Develop a Neural Net for Predicting Car Insurance Payout (Part 2)

First MLP and Learning Dynamics

We will develop a Multilayer Perceptron (MLP) model for the dataset using TensorFlow.

We cannot know what model architecture or learning hyperparameters would be good or best for this dataset, so we must experiment and discover what works well.

Given that the dataset is small, a small batch size is probably a good idea, e.g. 8 or 16 rows. Using the Adam version of stochastic gradient descent is a sensible default when getting started, as it automatically adapts the learning rate and works well on most datasets.
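
If the defaults prove unstable later, the optimizer can also be constructed explicitly rather than passed by name, which makes the learning rate easy to tune. A minimal sketch; the value shown is simply the Keras default, not a recommendation:

from tensorflow.keras.optimizers import Adam
# build the optimizer explicitly so the learning rate can be changed later
opt = Adam(learning_rate=0.001)  # 0.001 is the Keras default
# the object is then passed to compile, e.g. model.compile(optimizer=opt, loss='mse')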

Before we evaluate models in earnest, it is a good idea to review the learning dynamics and tune the model architecture and learning configuration until we have stable learning dynamics, then look at getting the most out of the model.

We can do this by using a simple train/test split of the data and reviewing plots of the learning curves. These will help us see whether we are over-learning or under-learning; we can then adapt the configuration accordingly.

First, we can split the dataset into input and output variables, then into 67/33 train and test sets.

# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

Next, we can define a minimal MLP model. In this case, we will use one hidden layer with 10 nodes and one output layer (chosen arbitrarily). We will use the ReLU activation function in the hidden layer and the “he_normal” weight initialization, as together they are a good practice.

The output of the model is a linear activation (no activation) and we will minimize mean squared error (MSE) loss.

# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')

We will fit the model for 100 training epochs (chosen arbitrarily) with a batch size of eight because it is a small dataset.

We are fitting the model on raw data, which we think might be a bad idea, but it is an important starting point.

# fit the model
history = model.fit(X_train, y_train, epochs=100, batch_size=8, verbose=0, validation_data=(X_test,y_test))

At the end of training, we will evaluate the model’s performance on the test dataset and report performance as the mean absolute error (MAE), which I typically prefer over MSE or RMSE.

# predict test set
yhat = model.predict(X_test)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)
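
For comparison, the RMSE can be computed from the same predictions; like MAE it is in the units of the target, but it penalizes large errors more heavily. A minimal sketch, assuming y_test and yhat from the snippet above:

from numpy import sqrt
from sklearn.metrics import mean_squared_error
# RMSE on the same predictions, for comparison with the MAE
rmse = sqrt(mean_squared_error(y_test, yhat))
print('RMSE: %.3f' % rmse)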

Finally, we will plot learning curves of the MSE loss on the train and test sets during training.

# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Tying this all together, the complete example of evaluating our first MLP on the auto insurance dataset is listed below.

# fit a simple mlp model and review learning curves
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=100, batch_size=8, verbose=0, validation_data=(X_test,y_test))
# predict test set
yhat = model.predict(X_test)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)
# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Running the example first fits the model on the training dataset, then reports the MAE on the test dataset.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieved an MAE of about 33.2, which provides a baseline level of performance that we might be able to improve upon.

MAE: 33.233
Line plots of the MSE on the train and test sets are then created.

We can see that the model has a good fit and converges nicely. The configuration of the model is a good starting point.

Learning Curves of Simple MLP on Auto Insurance Dataset

The learning dynamics are good so far, but the MAE is only a rough estimate and should not be relied upon.
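
If we did want a more trustworthy estimate at this point, a simple option is to repeat the fit-and-evaluate procedure several times and summarize the distribution of scores. A minimal sketch of this idea is below; note that get_model() is a hypothetical helper that builds and compiles the MLP defined above:

from numpy import mean, std
# repeat the fit/evaluate procedure and summarize the MAE scores
scores = list()
for _ in range(10):
    model = get_model()  # hypothetical: returns a freshly compiled MLP
    model.fit(X_train, y_train, epochs=100, batch_size=8, verbose=0)
    yhat = model.predict(X_test)
    scores.append(mean_absolute_error(y_test, yhat))
print('MAE: %.3f (%.3f)' % (mean(scores), std(scores)))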

We can probably increase the capacity of the model a little and expect similar learning dynamics. For example, we can add a second hidden layer with eight nodes (chosen arbitrarily) and double the number of training epochs to 200.

# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0, validation_data=(X_test,y_test))

The complete example is listed below.

# fit a deeper mlp model and review learning curves
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0, validation_data=(X_test,y_test))
# predict test set
yhat = model.predict(X_test)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)
# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Running the example first fits the model on the training dataset, then reports the MAE on the test dataset.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see a slight improvement in MAE to about 27.9, although the high variance of the train/test split means that this evaluation is not reliable.

MAE: 27.939
Learning curves of the MSE on the train and test sets are then plotted. We can see that, as expected, the model achieves a good fit and converges within a reasonable number of iterations.

Learning Curves of Deeper MLP on Auto Insurance Dataset

Finally, we can try transforming the data and see how this impacts the learning dynamics.

In this case, we will use a power transform to make the data distribution less skewed. This will also automatically standardize the variables so that they have a mean of zero and a standard deviation of one, which is a good practice when modeling with a neural network.
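
As a quick sanity check of this standardization behavior, we can fit a PowerTransformer on a small, made-up skewed sample and inspect the result. A minimal sketch:

from numpy import asarray
from sklearn.preprocessing import PowerTransformer
# a small right-skewed sample, shaped as a single column
data = asarray([1.0, 2.0, 2.0, 3.0, 5.0, 9.0, 20.0]).reshape(-1, 1)
pt = PowerTransformer()  # standardize=True by default
result = pt.fit_transform(data)
print(result.mean(), result.std())  # approximately 0.0 and 1.0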

First, we must ensure that the target variable is a two-dimensional array, as scikit-learn transformers like the PowerTransformer expect 2D input.

# ensure that the target variable is a 2d array
y_train, y_test = y_train.reshape((len(y_train),1)), y_test.reshape((len(y_test),1))

Next, we can apply a PowerTransformer to the input and target variables.

This can be achieved by first fitting the transform on the training data, then transforming the train and test sets.

This process is applied separately to the input and output variables. Fitting each transform on the training data only, rather than on the whole dataset, avoids data leakage.

# power transform input data
pt1 = PowerTransformer()
pt1.fit(X_train)
X_train = pt1.transform(X_train)
X_test = pt1.transform(X_test)
# power transform output data
pt2 = PowerTransformer()
pt2.fit(y_train)
y_train = pt2.transform(y_train)
y_test = pt2.transform(y_test)

The data is then used to fit the model.

The transform can then be inverted on both the predictions made by the model and the expected target values from the test set, so that we can calculate the MAE in the original scale, as before.

# inverse transforms on target variable
y_test = pt2.inverse_transform(y_test)
yhat = pt2.inverse_transform(yhat)

Tying this together, the complete example of fitting and evaluating an MLP with transformed data and creating learning curves of the model is listed below.

# fit an mlp model with data transforms and review learning curves
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import PowerTransformer
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# ensure that the target variable is a 2d array
y_train, y_test = y_train.reshape((len(y_train),1)), y_test.reshape((len(y_test),1))
# power transform input data
pt1 = PowerTransformer()
pt1.fit(X_train)
X_train = pt1.transform(X_train)
X_test = pt1.transform(X_test)
# power transform output data
pt2 = PowerTransformer()
pt2.fit(y_train)
y_train = pt2.transform(y_train)
y_test = pt2.transform(y_test)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0, validation_data=(X_test,y_test))
# predict test set
yhat = model.predict(X_test)
# inverse transforms on target variable
y_test = pt2.inverse_transform(y_test)
yhat = pt2.inverse_transform(yhat)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)
# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Running the example first fits the model on the training dataset, then reports the MAE on the test dataset.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, the model achieves a reasonable MAE score, although worse than the performance reported previously. We will ignore model performance for now.

MAE: 34.320
Line plots of the learning curves are then created, showing that the model achieved a reasonable fit and had more than enough time to converge.