Just like any other scientific experiment, a data science experiment should begin with aims and hypotheses to test. Drafting a set of hypotheses before starting ML exploration also gives the business team and other stakeholders an early report on the direction of the team's thinking.
Let’s take an example:
In fraud detection, the data is heavily class-imbalanced and fraud patterns change over time. The following hypotheses could therefore be tested:
1. Undersampling works better than upsampling or no sampling.
2. Data generated more than 2 years ago tends to be outdated.
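The first hypothesis can be framed as a controlled comparison: resample only the training data, fit the same model under each strategy, and score every model on an untouched test set that keeps the real class imbalance. The sketch below is a minimal illustration under loud assumptions. The generator `make_data` (one feature, 5% fraud rate), the decision stump chosen to maximize training accuracy, and fraud-class precision and recall as the metrics are all hypothetical choices, not a recommended fraud model.

```python
import random

def make_data(n, fraud_rate, seed):
    """Hypothetical generator: one feature x; fraud (y=1) is rare and shifted to higher x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < fraud_rate else 0
        x = rng.gauss(2.0 if y else 0.0, 1.0)
        rows.append((x, y))
    return rows

def no_sampling(rows, rng):
    return list(rows)

def undersample(rows, rng):
    """Randomly drop legitimate rows until both classes have the same size."""
    fraud = [r for r in rows if r[1] == 1]
    legit = [r for r in rows if r[1] == 0]
    return fraud + rng.sample(legit, len(fraud))

def upsample(rows, rng):
    """Randomly duplicate fraud rows until both classes have the same size."""
    fraud = [r for r in rows if r[1] == 1]
    legit = [r for r in rows if r[1] == 0]
    return legit + [rng.choice(fraud) for _ in range(len(legit))]

def fit_stump(rows):
    """Pick the threshold t (predict fraud if x >= t) that maximizes training accuracy."""
    data = sorted(rows)
    total_pos = sum(y for _, y in data)
    best_t, best_correct = float("inf"), len(data) - total_pos  # t=inf flags nothing
    neg_below = pos_below = 0
    for i, (x, y) in enumerate(data):
        # Evaluate only at the first occurrence of each x so duplicated rows are handled.
        if i == 0 or data[i - 1][0] != x:
            correct = neg_below + (total_pos - pos_below)
            if correct > best_correct:
                best_t, best_correct = x, correct
        if y == 1:
            pos_below += 1
        else:
            neg_below += 1
    return best_t

def evaluate(threshold, rows):
    """Precision and recall for the fraud class."""
    tp = sum(1 for x, y in rows if y == 1 and x >= threshold)
    fp = sum(1 for x, y in rows if y == 0 and x >= threshold)
    fn = sum(1 for x, y in rows if y == 1 and x < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}

def compare_strategies(seed=0):
    rng = random.Random(seed)
    train = make_data(4000, 0.05, seed)
    test = make_data(2000, 0.05, seed + 1)  # never resampled: keeps the real imbalance
    results = {}
    for name, strategy in [("no_sampling", no_sampling),
                           ("undersample", undersample),
                           ("upsample", upsample)]:
        threshold = fit_stump(strategy(train, rng))
        results[name] = evaluate(threshold, test)
    return results

if __name__ == "__main__":
    for name, m in compare_strategies().items():
        print(f"{name:12s} precision={m['precision']:.2f} recall={m['recall']:.2f}")
```

The stump is deliberately sensitive to class priors: on the raw data it pushes its threshold toward the fraud class to favour the majority, which is the behaviour resampling is meant to change. Which strategy "works better" depends on the metric agreed with the business team, so that metric belongs in the hypothesis itself.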
The implications of each hypothesis have to be understood up front:
- The result of hypothesis 1 will determine which techniques to use and what ML pipeline to set up.
- The result of hypothesis 2 will determine which MLOps deployment architecture to follow.
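The second hypothesis can be tested by comparing training windows: fit the same model on only the most recent 12 or 24 months and on the full history, then score each on the latest months, which none of them saw during training. In the minimal sketch below, the 5-year history, the drifting fraud signal in the hypothetical `make_history`, the 1-D nearest-centroid model and balanced accuracy as the metric are all assumptions. The drift is built into the synthetic data, so the result says nothing about real transactions; only the evaluation setup carries over.

```python
import random

MONTHS = 60  # hypothetical: 5 years of training history

def make_history(months, rows_per_month, fraud_rate, seed):
    """Synthetic (month, x, label) rows; the fraud signal drifts over time (assumption)."""
    rng = random.Random(seed)
    rows = []
    for m in range(months):
        fraud_mean = -2.0 + 0.08 * m  # hypothetical drift of the fraud pattern
        for _ in range(rows_per_month):
            y = 1 if rng.random() < fraud_rate else 0
            x = rng.gauss(fraud_mean if y else 0.0, 1.0)
            rows.append((m, x, y))
    return rows

def fit_centroid_threshold(rows):
    """1-D nearest-centroid model: threshold halfway between the two class means."""
    fraud = [x for _, x, y in rows if y == 1]
    legit = [x for _, x, y in rows if y == 0]
    mu_f = sum(fraud) / len(fraud)
    mu_l = sum(legit) / len(legit)
    return (mu_f + mu_l) / 2, mu_f > mu_l

def balanced_accuracy(model, rows):
    """Mean of fraud recall and legitimate-class recall."""
    threshold, fraud_is_high = model
    tp = fn = tn = fp = 0
    for _, x, y in rows:
        pred = (x >= threshold) if fraud_is_high else (x < threshold)
        if y == 1:
            tp, fn = (tp + 1, fn) if pred else (tp, fn + 1)
        else:
            fp, tn = (fp + 1, tn) if pred else (fp, tn + 1)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def compare_windows(windows=(12, 24, None), seed=0):
    data = make_history(MONTHS + 6, 200, 0.1, seed)
    train = [r for r in data if r[0] < MONTHS]
    test = [r for r in data if r[0] >= MONTHS]  # latest 6 months, never trained on
    scores = {}
    for w in windows:  # None means "use the full history"
        recent = train if w is None else [r for r in train if r[0] >= MONTHS - w]
        scores[w] = balanced_accuracy(fit_centroid_threshold(recent), test)
    return scores

if __name__ == "__main__":
    for window, score in compare_windows().items():
        label = "all history" if window is None else f"last {window} months"
        print(f"{label:15s} balanced accuracy={score:.3f}")
```

If a recent window consistently wins on real data, that points toward a deployment that retrains regularly on a rolling window instead of on all stored history; if the full history wins, older data is still worth keeping in the training set.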