Define different categories in which data is split while creating a model

mayur-srivastava-969b4813 · 20 June 2021 16:49

While building a model, we divided the data into three categories:

Training set: The training set is what we use to create the model and change the model’s variables. However, we cannot rely on the model built on top of the training set to be right. When fresh inputs are fed into the model, it may produce erroneous results.

Validation set: We use a validation set to examine the model’s behaviour in the absence of samples from the training dataset. Then, using the estimated benchmark of the validation data, we’ll tweak hyperparameters.

Test set: The test set is a subset of the actual dataset that hasn’t been used to train the model yet. This dataset is unknown to the model. As a result, we can compute the response of the constructed model on concealed data using the test dataset. The test dataset is used to assess the model’s performance.