Should one build ROC curve on train data or test data?

Let’s first understand what each choice tells us.

Building an AUC ROC curve shows the model’s maximum possible class separability. Computing AUC ROC on the train set tells you whether the model is confident in its learning. This is like a student re-answering the same question the teacher just taught in class.

AUC ROC on the test set tells you how well the model performs on unseen data. This is like answering a different question based on a concept already learnt.

If the difference between the two is very large, you know the student doesn’t understand the concepts well, which is nothing but model “overfitting”. However, if the training set’s AUC ROC is itself bad, then the model is not learning any pattern in the first place.
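A minimal sketch of this comparison, using scikit-learn with a synthetic dataset and a logistic regression model as stand-ins (the dataset, model, and split ratio are all assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, just for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC ROC needs predicted probabilities (or scores), not hard labels.
auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"train AUC: {auc_train:.3f}")
print(f"test  AUC: {auc_test:.3f}")

# A large gap (say train 0.99 vs. test 0.75) suggests overfitting;
# a low train AUC suggests the model is not learning any pattern at all.
```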

It depends what you want. If your focus is just good predictions, then a decent AUC ROC on the test set is good to go. But if you want to delve deeper into feature importance and model characteristics, check the metrics on the train set too.

Some people also go for a train/validation/test split, where they tune parameters on the train set, do AUC-ROC-based model selection and threshold selection on the validation set, and run the final assessment on the test set.
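That three-way workflow can be sketched as follows. This is only one possible recipe: the 60/20/20 split, the logistic regression model, and the use of Youden’s J statistic for threshold selection are all assumptions, not something prescribed by the answer above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

# 60% train, 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0
)

# Fit on the train set only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Threshold selection on the validation set only,
# here via Youden's J statistic (J = TPR - FPR).
val_scores = model.predict_proba(X_val)[:, 1]
fpr, tpr, thresholds = roc_curve(y_val, val_scores)
best_threshold = thresholds[np.argmax(tpr - fpr)]

# Final, one-time assessment on the untouched test set.
test_scores = model.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)
test_acc = accuracy_score(y_test, test_scores >= best_threshold)

print(f"chosen threshold: {best_threshold:.3f}")
print(f"test AUC: {test_auc:.3f}, test accuracy at threshold: {test_acc:.3f}")
```

The key design point is that the test set is touched exactly once, after every decision (model choice, threshold) has already been made on the other two sets.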

Test dataset. Calculate the ROC on the test set, because that is the data that actually lets you estimate generalised performance: it was not used to train the model in any way, and that is the definition of a test set. If the AUC score on it is good, the model is good; if it is bad (while the train score is good), the model is overfitting.