
How much data should you allocate for your training, validation, and test sets?


You have to find a balance, and there’s no right answer for every problem. If your test set is too small, your estimate of model performance will be unreliable (the performance statistic will have high variance). If your training set is too small, the fitted model parameters will have high variance.
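To put a rough number on the test-set side of that trade-off, here is a minimal sketch. It assumes the metric is accuracy and that test examples are independent, so the observed accuracy behaves like a binomial proportion with standard error sqrt(p(1-p)/n). The function name and the 0.9 accuracy figure are illustrative assumptions, not something from the question.

```python
import math


def accuracy_standard_error(true_accuracy: float, n_test: int) -> float:
    """Standard error of the observed accuracy on n_test independent examples.

    Uses the binomial approximation sqrt(p * (1 - p) / n).
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return math.sqrt(true_accuracy * (1.0 - true_accuracy) / n_test)


# Assumed true accuracy of 0.9, purely for illustration.
for n in (100, 1_000, 10_000):
    half_width = 1.96 * accuracy_standard_error(0.9, n)
    print(f"n_test={n:>6}: accuracy 0.90 +/- {half_width:.3f} (approx. 95% interval)")
```

Under these assumptions, 100 test examples give an interval of roughly ±0.06, while 10,000 examples narrow it to roughly ±0.006. The uncertainty shrinks with the square root of the test set size, so each extra digit of precision costs about 100 times more test data.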

A good rule of thumb is to start with an 80/20 train/test split. The training portion can then be split further into train/validation sets, or into folds for cross-validation.
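As a sketch of that two-stage split in plain Python: the function name `split_indices` and the 25% validation share of the training portion are assumptions chosen for illustration (the 80/20 figure is the only one given above). It does a simple random shuffle and does not stratify by class.

```python
import random


def split_indices(n_samples, test_fraction=0.2, val_fraction=0.25, seed=0):
    """Shuffle sample indices, hold out a test set, then carve a
    validation set out of the remaining training portion.

    val_fraction is a fraction of the training portion, not of the
    whole dataset (an assumed value, not a recommendation).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_test = int(round(n_samples * test_fraction))
    test = indices[:n_test]
    remainder = indices[n_test:]

    n_val = int(round(len(remainder) * val_fraction))
    val = remainder[:n_val]
    train = remainder[n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```

With these defaults, 1,000 samples end up as 60% train, 20% validation, and 20% test overall. For cross-validation you would keep the same held-out test set and rotate the validation fold within the remaining 80% instead of fixing one validation split.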