Hypothesis testing in ML experiments

rajanikant-ghate · 27 February 2022 17:43

Like any other scientific experiment, a data science experiment should begin with clear aims and hypotheses to test. Drafting a set of hypotheses before starting ML exploration also gives the business team and other stakeholders a first-hand view of the direction of thinking.

Let’s take an example:
Fraud detection is a class-imbalance problem with dynamically changing trends. So the following could be hypotheses to test:

1. Undersampling works better than upsampling or no sampling
2. Data generated more than 2 years ago tends to be outdated
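
To make the first hypothesis testable, each sampling strategy needs its own training set built from the same raw data. Below is a minimal sketch of that setup using only the Python standard library. The function names (`undersample`, `oversample`) and the toy data are my own assumptions for illustration, not part of any specific library. Resampling should be applied to the training split only, so that every arm is evaluated on the same untouched validation data.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    pairs = []
    for label, group in by_class.items():
        pairs.extend((row, label) for row in rng.sample(group, target))
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    pairs = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        pairs.extend((row, label) for row in group + extra)
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


# Toy training data for illustration only: 95 legitimate (0) and 5 fraud (1) rows.
rows = list(range(100))
labels = [0] * 95 + [1] * 5

# One training set per experiment arm of hypothesis 1.
arms = {
    "no_sampling": (rows, labels),
    "undersampling": undersample(rows, labels),
    "upsampling": oversample(rows, labels),
}
for name, (_, arm_labels) in arms.items():
    print(name, dict(Counter(arm_labels)))
```

Each arm can then be fed to the same model and scored with the same metric, so the sampling strategy is the only thing that differs between runs.
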

The implications of each one have to be understood.
The result of 1 will determine which techniques and ML pipeline to set up.
The result of 2 will determine which MLOps deployment architecture to follow.
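
Once two variants (for example, undersampling vs. upsampling, or recent data vs. all data) have been scored on the same cross-validation folds, the hypothesis can be checked with an actual statistical test rather than by eyeballing averages. One possible choice, and it is my assumption rather than the only valid option, is an exact paired sign-flip permutation test. The function name `paired_sign_flip_pvalue` is hypothetical.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """One-sided exact p-value for "variant A scores higher than variant B".

    Under the null hypothesis that the two variants are interchangeable,
    each paired difference is equally likely to be positive or negative,
    so all 2**n sign assignments are enumerated.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs)
    n_as_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        n_total += 1
        flipped = sum(sign * diff for sign, diff in zip(signs, diffs))
        # Small tolerance so floating-point rounding does not exclude ties.
        if flipped >= observed - 1e-12:
            n_as_extreme += 1
    return n_as_extreme / n_total
```

The function would be called with the per-fold scores of the two variants, where position i in both lists comes from the same fold. Note that with n folds the smallest possible p-value is 1 / 2**n, so 5 folds can never go below 0.03125. Repeated cross-validation gives more pairs, but the number of sign assignments doubles with every extra pair, so full enumeration is only practical for a few dozen pairs at most.
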