Getting a very good F1-score or accuracy is not the end of the analysis. In fact, these metrics can be highly unstable when the test set is small: they may look extremely good or extremely bad, and in either case they do not reflect the generalizability of the ML model.
Generalizability is the ability of an ML model to classify or forecast correctly on unseen data.
Some of the best ways to ensure generalizability:
- Make sure there is sufficient data in the test set.
- Consider stratified sampling when splitting the data into train and test sets.
- Run k-fold cross-validation to check whether the model is able to learn consistently across random splits.
- Prevent any data leakage into the test set during model development.
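The points above can be sketched with scikit-learn. This is a minimal illustration on a synthetic imbalanced dataset (the dataset, model, and parameters are assumptions, not from the original text): a stratified train/test split, a pipeline so preprocessing is fit only on training folds (avoiding leakage), and stratified k-fold cross-validation to see whether scores are stable across splits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced binary dataset (assumed for illustration).
X, y = make_classification(n_samples=1000, n_classes=2,
                           weights=[0.8, 0.2], random_state=42)

# Stratified split preserves the class proportions in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Wrapping the scaler in a pipeline means it is fit only on each
# training fold, so no test-fold statistics leak into training.
model = Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression())])

# Stratified k-fold cross-validation: a low spread across fold scores
# suggests the model learns consistently, not just on one lucky split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")
print(f"F1 per fold: {scores}")
print(f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```

A large standard deviation across folds here would be a warning sign that the headline F1-score on a single test set is not trustworthy.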