In this section, we will develop a prediction interval using the regression problem and model developed in the previous section.

Calculating prediction intervals for nonlinear regression algorithms like neural networks is challenging compared to linear methods like linear regression where the prediction interval calculation is trivial. There is no standard technique.

There are many ways to calculate an effective prediction interval for neural network models. I recommend some of the papers listed in the “further reading” section to learn more.

In this tutorial, we will use a very simple approach that has plenty of room for extension. I refer to it as “quick and dirty” because it is fast and easy to calculate, but is limited.

It involves fitting multiple final models (e.g. 10 to 30). The distribution of the point predictions from ensemble members is then used to calculate both a point prediction and a prediction interval.

For example, a point prediction can be taken as the mean of the point predictions from ensemble members, and a 95% prediction interval can be taken as 1.96 standard deviations from the mean.

This is a simple Gaussian prediction interval, although alternatives could be used, such as the min and max of the point predictions. Alternatively, the bootstrap method could be used to train each ensemble member on a different bootstrap sample and the 2.5th and 97.5th percentiles of the point predictions can be used as prediction intervals.

For more on the bootstrap method, see the tutorial:

A Gentle Introduction to the Bootstrap Method

These extensions are left as exercises; we will stick with the simple Gaussian prediction interval.

Let’s assume that the training dataset, defined in the previous section, is the entire dataset and we are training a final model or models on this entire dataset. We can then make predictions with prediction intervals on the test set and evaluate how effective the interval might be in the future.

We can simplify the code by dividing the elements developed in the previous section into functions.

First, let’s define a function for loading and preparing a regression dataset defined by a URL.

# load and prepare the dataset

def load_dataset(url):

dataframe = read_csv(url, header=None)

values = dataframe.values

# split into input and output values

X, y = values[:, :-1], values[:,-1]

# split into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)

# scale input data

scaler = MinMaxScaler()

scaler.fit(X_train)

X_train = scaler.transform(X_train)

X_test = scaler.transform(X_test)

return X_train, X_test, y_train, y_test

1

2

3

4

5

6

7

8

9

10

11

12

13

14

# load and prepare the dataset

def load_dataset(url):

dataframe = read_csv(url, header=None)

values = dataframe.values

# split into input and output values

X, y = values[:, :-1], values[:,-1]

# split into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)

# scale input data

scaler = MinMaxScaler()

scaler.fit(X_train)

X_train = scaler.transform(X_train)

X_test = scaler.transform(X_test)

return X_train, X_test, y_train, y_test

Next, we can define a function that will define and train an MLP model given the training dataset, then return the fit model ready for making predictions.

# define and fit the model

def fit_model(X_train, y_train):

# define neural network model

features = X_train.shape[1]

model = Sequential()

model.add(Dense(20, kernel_initializer=‘he_normal’, activation=‘relu’, input_dim=features))

model.add(Dense(5, kernel_initializer=‘he_normal’, activation=‘relu’))

model.add(Dense(1))

# compile the model and specify loss and optimizer

opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)

model.compile(optimizer=opt, loss=‘mse’)

# fit the model on the training dataset

model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)

return model

1

2

3

4

5

6

7

8

9

10

11

12

13

14

# define and fit the model

def fit_model(X_train, y_train):

# define neural network model

features = X_train.shape[1]

model = Sequential()

model.add(Dense(20, kernel_initializer=‘he_normal’, activation=‘relu’, input_dim=features))

model.add(Dense(5, kernel_initializer=‘he_normal’, activation=‘relu’))

model.add(Dense(1))

# compile the model and specify loss and optimizer

opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)

model.compile(optimizer=opt, loss=‘mse’)

# fit the model on the training dataset

model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)

return model

We require multiple models to make point predictions that will define a distribution of point predictions from which we can estimate the interval.

As such, we will need to fit multiple models on the training dataset. Each model must be different so that it makes different predictions. This can be achieved given the stochastic nature of training MLP models, given the random initial weights, and given the use of the stochastic gradient descent optimization algorithm.

The more models, the better the point predictions will estimate the capability of the model. I would recommend at least 10 models, and perhaps not much benefit beyond 30 models.

The function below fits an ensemble of models and stores them in a list that is returned.

For interest, each fit model is also evaluated on the test set which is reported after each model is fit. We would expect that each model will have a slightly different estimated performance on the hold-out test set and the reported scores will help us confirm this expectation.

# fit an ensemble of models

def fit_ensemble(n_members, X_train, X_test, y_train, y_test):

ensemble = list()

for i in range(n_members):

# define and fit the model on the training set

model = fit_model(X_train, y_train)

# evaluate model on the test set

yhat = model.predict(X_test, verbose=0)

mae = mean_absolute_error(y_test, yhat)

print(’>%d, MAE: %.3f’ % (i+1, mae))

# store the model

ensemble.append(model)

return ensemble

1

2

3

4

5

6

7

8

9

10

11

12

13

# fit an ensemble of models

def fit_ensemble(n_members, X_train, X_test, y_train, y_test):

ensemble = list()

for i in range(n_members):

# define and fit the model on the training set

model = fit_model(X_train, y_train)

# evaluate model on the test set

yhat = model.predict(X_test, verbose=0)

mae = mean_absolute_error(y_test, yhat)

print(’>%d, MAE: %.3f’ % (i+1, mae))

# store the model

ensemble.append(model)

return ensemble

Finally, we can use the trained ensemble of models to make point predictions, which can be summarized into a prediction interval.

The function below implements this. First, each model makes a point prediction on the input data, then the 95% prediction interval is calculated and the lower, mean, and upper values of the interval are returned.

The function is designed to take a single row as input, but could easily be adapted for multiple rows.

# make predictions with the ensemble and calculate a prediction interval

def predict_with_pi(ensemble, X):

# make predictions

yhat = [model.predict(X, verbose=0) for model in ensemble]

yhat = asarray(yhat)

# calculate 95% gaussian prediction interval

interval = 1.96 * yhat.std()

lower, upper = yhat.mean() - interval, yhat.mean() + interval

return lower, yhat.mean(), upper

1

2

3

4

5

6

7

8

9

# make predictions with the ensemble and calculate a prediction interval

def predict_with_pi(ensemble, X):

# make predictions

yhat = [model.predict(X, verbose=0) for model in ensemble]

yhat = asarray(yhat)

# calculate 95% gaussian prediction interval

interval = 1.96 * yhat.std()

lower, upper = yhat.mean() - interval, yhat.mean() + interval

return lower, yhat.mean(), upper

Finally, we can call these functions.

First, the dataset is loaded and prepared, then the ensemble is defined and fit.

…

# load dataset

url = ‘https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv’

X_train, X_test, y_train, y_test = load_dataset(url)

# fit ensemble

n_members = 30

ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)

1

2

3

4

5

6

7

…

# load dataset

url = ‘https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv’

X_train, X_test, y_train, y_test = load_dataset(url)

# fit ensemble

n_members = 30

ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)

We can then use a single row of data from the test set and make a prediction with a prediction interval, the results of which are then reported.

We also report the expected value which we would expect would be covered by the prediction interval (perhaps close to 95% of the time; this is not entirely accurate, but is a rough approximation).

…

# make predictions with prediction interval

newX = asarray([X_test[0, :]])

lower, mean, upper = predict_with_pi(ensemble, newX)

print(‘Point prediction: %.3f’ % mean)

print(‘95%% prediction interval: [%.3f, %.3f]’ % (lower, upper))

print(‘True value: %.3f’ % y_test[0])

1

2

3

4

5

6

7

…

# make predictions with prediction interval

newX = asarray([X_test[0, :]])

lower, mean, upper = predict_with_pi(ensemble, newX)

print(‘Point prediction: %.3f’ % mean)

print(‘95%% prediction interval: [%.3f, %.3f]’ % (lower, upper))

print(‘True value: %.3f’ % y_test[0])

Tying this together, the complete example of making predictions with a prediction interval with a Multilayer Perceptron neural network is listed below.

# prediction interval for mlps on the housing regression dataset

from numpy import asarray

from pandas import read_csv

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_absolute_error

from sklearn.preprocessing import MinMaxScaler

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

from tensorflow.keras.optimizers import Adam

# load and prepare the dataset

def load_dataset(url):

dataframe = read_csv(url, header=None)

values = dataframe.values

# split into input and output values

X, y = values[:, :-1], values[:,-1]

# split into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)

# scale input data

scaler = MinMaxScaler()

scaler.fit(X_train)

X_train = scaler.transform(X_train)

X_test = scaler.transform(X_test)

return X_train, X_test, y_train, y_test

# define and fit the model

def fit_model(X_train, y_train):

# define neural network model

features = X_train.shape[1]

model = Sequential()

model.add(Dense(20, kernel_initializer=‘he_normal’, activation=‘relu’, input_dim=features))

model.add(Dense(5, kernel_initializer=‘he_normal’, activation=‘relu’))

model.add(Dense(1))

# compile the model and specify loss and optimizer

opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)

model.compile(optimizer=opt, loss=‘mse’)

# fit the model on the training dataset

model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)

return model

# fit an ensemble of models

def fit_ensemble(n_members, X_train, X_test, y_train, y_test):

ensemble = list()

for i in range(n_members):

# define and fit the model on the training set

model = fit_model(X_train, y_train)

# evaluate model on the test set

yhat = model.predict(X_test, verbose=0)

mae = mean_absolute_error(y_test, yhat)

print(’>%d, MAE: %.3f’ % (i+1, mae))

# store the model

ensemble.append(model)

return ensemble

# make predictions with the ensemble and calculate a prediction interval

def predict_with_pi(ensemble, X):

# make predictions

yhat = [model.predict(X, verbose=0) for model in ensemble]

yhat = asarray(yhat)

# calculate 95% gaussian prediction interval

interval = 1.96 * yhat.std()

lower, upper = yhat.mean() - interval, yhat.mean() + interval

return lower, yhat.mean(), upper

# load dataset

url = ‘https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv’

X_train, X_test, y_train, y_test = load_dataset(url)

# fit ensemble

n_members = 30

ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)

# make predictions with prediction interval

newX = asarray([X_test[0, :]])

lower, mean, upper = predict_with_pi(ensemble, newX)

print(‘Point prediction: %.3f’ % mean)

print(‘95%% prediction interval: [%.3f, %.3f]’ % (lower, upper))

print(‘True value: %.3f’ % y_test[0])

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

# prediction interval for mlps on the housing regression dataset

from numpy import asarray

from pandas import read_csv

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_absolute_error

from sklearn.preprocessing import MinMaxScaler

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

from tensorflow.keras.optimizers import Adam

# load and prepare the dataset

def load_dataset(url):

dataframe = read_csv(url, header=None)

values = dataframe.values

# split into input and output values

X, y = values[:, :-1], values[:,-1]

# split into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)

# scale input data

scaler = MinMaxScaler()

scaler.fit(X_train)

X_train = scaler.transform(X_train)

X_test = scaler.transform(X_test)

return X_train, X_test, y_train, y_test

# define and fit the model

def fit_model(X_train, y_train):

# define neural network model

features = X_train.shape[1]

model = Sequential()

model.add(Dense(20, kernel_initializer=‘he_normal’, activation=‘relu’, input_dim=features))

model.add(Dense(5, kernel_initializer=‘he_normal’, activation=‘relu’))

model.add(Dense(1))

# compile the model and specify loss and optimizer

opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)

model.compile(optimizer=opt, loss=‘mse’)

# fit the model on the training dataset

model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)

return model

# fit an ensemble of models

def fit_ensemble(n_members, X_train, X_test, y_train, y_test):

ensemble = list()

for i in range(n_members):

# define and fit the model on the training set

model = fit_model(X_train, y_train)

# evaluate model on the test set

yhat = model.predict(X_test, verbose=0)

mae = mean_absolute_error(y_test, yhat)

print(’>%d, MAE: %.3f’ % (i+1, mae))

# store the model

ensemble.append(model)

return ensemble

# make predictions with the ensemble and calculate a prediction interval

def predict_with_pi(ensemble, X):

# make predictions

yhat = [model.predict(X, verbose=0) for model in ensemble]

yhat = asarray(yhat)

# calculate 95% gaussian prediction interval

interval = 1.96 * yhat.std()

lower, upper = yhat.mean() - interval, yhat.mean() + interval

return lower, yhat.mean(), upper

# load dataset

X_train, X_test, y_train, y_test = load_dataset(url)

# fit ensemble

n_members = 30

ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)

# make predictions with prediction interval

newX = asarray([X_test[0, :]])

lower, mean, upper = predict_with_pi(ensemble, newX)

print(‘Point prediction: %.3f’ % mean)

print(‘95%% prediction interval: [%.3f, %.3f]’ % (lower, upper))

print(‘True value: %.3f’ % y_test[0])

Running the example fits each ensemble member in turn and reports its estimated performance on the hold out tests set; finally, a single prediction with prediction interval is made and reported.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that each model has a slightly different performance, confirming our expectation that the models are indeed different.

Finally, we can see that the ensemble made a point prediction of about 30.5 with a 95% prediction interval of [26.287, 34.822]. We can also see that the true value was 28.2 and that the interval does capture this value, which is great.

1, MAE: 2.259

2, MAE: 2.144

3, MAE: 2.732

4, MAE: 2.628

5, MAE: 2.483

6, MAE: 2.551

7, MAE: 2.505

8, MAE: 2.299

9, MAE: 2.706

10, MAE: 2.145

11, MAE: 2.765

12, MAE: 3.244

13, MAE: 2.385

14, MAE: 2.592

15, MAE: 2.418

16, MAE: 2.493

17, MAE: 2.367

18, MAE: 2.569

19, MAE: 2.664

20, MAE: 2.233

21, MAE: 2.228

22, MAE: 2.646

23, MAE: 2.641

24, MAE: 2.492

25, MAE: 2.558

26, MAE: 2.416

27, MAE: 2.328

28, MAE: 2.383

29, MAE: 2.215

30, MAE: 2.408

Point prediction: 30.555

95% prediction interval: [26.287, 34.822]

True value: 28.200

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

1, MAE: 2.259

2, MAE: 2.144

3, MAE: 2.732

4, MAE: 2.628

5, MAE: 2.483

6, MAE: 2.551

7, MAE: 2.505

8, MAE: 2.299

9, MAE: 2.706

10, MAE: 2.145

11, MAE: 2.765

12, MAE: 3.244

13, MAE: 2.385

14, MAE: 2.592

15, MAE: 2.418

16, MAE: 2.493

17, MAE: 2.367

18, MAE: 2.569

19, MAE: 2.664

20, MAE: 2.233

21, MAE: 2.228

22, MAE: 2.646

23, MAE: 2.641

24, MAE: 2.492

25, MAE: 2.558

26, MAE: 2.416

27, MAE: 2.328

28, MAE: 2.383

29, MAE: 2.215

30, MAE: 2.408

Point prediction: 30.555

95% prediction interval: [26.287, 34.822]

True value: 28.200

This is a quick and dirty technique for making predictions with a prediction interval for neural networks, as we discussed above.