What is meant by ‘Training set’ and ‘Test Set’?


Training Set

This dataset corresponds to Step 1 in the previous section. It contains the input examples that the model is fit to (trained on). The model learns by adjusting its parameters, such as the weights of a neural network, so that it fits these examples.
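To make "fitting to the training set" concrete, here is a minimal sketch in plain Python. It assumes a toy one-parameter model y ≈ w·x trained by gradient descent on mean squared error; the data, learning rate, epoch count and the helper name `fit_weight` are hypothetical choices for illustration. The point is that the single weight `w` is adjusted using only the training examples.

```python
def fit_weight(train, lr=0.01, epochs=200):
    """Fit y ≈ w * x to (x, y) training pairs by gradient descent on mean squared error."""
    w = 0.0  # the model's only parameter ("weight"), starting from zero
    for _ in range(epochs):
        # d/dw of (1/n) * sum((w*x - y)^2) is (1/n) * sum(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad  # move the weight against the gradient
    return w

# Hypothetical training examples generated from y = 3x (no noise)
train = [(x, 3.0 * x) for x in range(1, 6)]
w = fit_weight(train)
print(round(w, 3))  # prints 3.0
```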

Validation Set

While the model is being trained, it needs to be evaluated periodically (Step 2), and that is exactly what the validation set is for. By computing the loss (or a metric such as the error rate) on the validation set at a given point, we can estimate how well the model performs on data it has not been fit to. Based on these regular evaluations, we tune the hyperparameters (for example, the learning rate or the number of training epochs) and decide when to stop training; the parameters themselves are still learned from the training set only.
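A minimal sketch of how a validation set can guide a hyperparameter choice, assuming the same kind of toy model y ≈ w·x. The candidate learning rates, the data, and the helper names `fit_weight` and `mse` are hypothetical. Each candidate is trained on the training set only, and the validation loss decides which one to keep.

```python
def mse(w, data):
    """Mean squared error of the model y ≈ w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def fit_weight(train, lr, epochs=50):
    """Gradient descent on training MSE; only the training pairs are used here."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad
    return w

# Hypothetical data from y = 3x, split into training and validation pairs
train = [(x, 3.0 * x) for x in range(1, 6)]
val = [(x, 3.0 * x) for x in (6, 7)]

scores = {}
for lr in (0.001, 0.01, 0.1):
    w = fit_weight(train, lr)   # parameters learned from the training set
    scores[lr] = mse(w, val)    # hyperparameter judged on the validation set
best_lr = min(scores, key=scores.get)
print(best_lr)  # prints 0.01
```

Here 0.001 learns too slowly within 50 epochs and 0.1 makes gradient descent diverge, so the validation loss singles out 0.01.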

Test Set

This corresponds to the final evaluation the model goes through after the training phase (which uses the training and validation sets) is complete. This step is critical for testing how well the model generalizes (Step 3). The accuracy measured on this set is our estimate of how the model will perform on new, unseen data.

It is worth mentioning that we need to be objective and honest by not exposing the model to the test set until the training phase is over. Only then can we consider the final accuracy measure reliable.
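A minimal sketch of the final, one-time test evaluation, assuming a hypothetical one-dimensional classification task (the true label is 1 when x ≥ 5) and a simple threshold classifier. The data and the helper names `accuracy` and `model` are illustrative. The threshold is chosen using the training data only; the test set is touched exactly once at the end to measure accuracy.

```python
def accuracy(predict, data):
    """Fraction of (x, label) pairs that predict() classifies correctly."""
    correct = sum(1 for x, label in data if predict(x) == label)
    return correct / len(data)

# Hypothetical data: the true label is 1 when x >= 5, else 0
train = [(x, int(x >= 5)) for x in (0, 1, 2, 3, 6, 7, 8, 9)]
test = [(x, int(x >= 5)) for x in (4, 5, 10, 11)]

# Training: pick the threshold t that maximises accuracy on the training set
# (max() returns the first t with the highest score)
best_t = max(range(11), key=lambda t: accuracy(lambda x: int(x >= t), train))

def model(x):
    return int(x >= best_t)

train_acc = accuracy(model, train)
# Final evaluation: the test set is used once, after training is finished
test_acc = accuracy(model, test)
print(best_t, train_acc, test_acc)  # prints 4 1.0 0.75
```

No training point lies between 3 and 6, so thresholds 4, 5 and 6 all score perfectly on the training set. The unseen test point x = 4 reveals that the chosen threshold does not generalize perfectly, which is exactly the kind of gap the test set is meant to expose.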

The training and test sets are both subsets of your main data set, and they do not overlap.

Say you have a data set of 1000 data points. A common choice is a 70/30 split: 700 data points become your training data and the remaining 300 become your test data.
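A minimal sketch of the 70/30 split in plain Python, assuming the 1000 data points can be represented by their indices and that a random shuffle is appropriate (the fixed seed only makes the result repeatable). The helper name `split_train_test` is hypothetical; scikit-learn offers a similar `train_test_split` function in `sklearn.model_selection`.

```python
import random

def split_train_test(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and hold out test_fraction of it as the test set."""
    items = list(data)
    random.Random(seed).shuffle(items)  # shuffle so the split does not depend on the original order
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]  # (training data, test data)

data = list(range(1000))  # stand-ins for 1000 data points
train, test = split_train_test(data)
print(len(train), len(test))  # prints 700 300
```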

Essentially, you train your model on the training data, while the test data stays hidden from the model during training: none of the test data points are used to fit the model.

You can further divide the 700-point training data into a training set and a smaller validation set. The training set is used to fit your ML algorithm, and the validation set is used to evaluate and tune the model during training.
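A minimal sketch of the resulting three-way split: 30% of the data is held out for testing, and then a smaller share of the remaining training data is held out for validation. The 20% validation fraction (giving 560/140/300 points) and the helper name `three_way_split` are assumptions for illustration, not values from the text.

```python
import random

def three_way_split(data, test_fraction=0.3, val_fraction=0.2, seed=0):
    """Return (train, validation, test) lists that do not overlap."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    test, rest = items[:n_test], items[n_test:]
    # The validation set is carved out of the training portion, never from the test set
    n_val = round(len(rest) * val_fraction)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))  # prints 560 140 300
```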