What is mixed modeling in data science?

rajanikant-ghate · 16 May 2021 16:51

A business problem is often nested. For e.g. imagine the data being collected by clinical trials done for a covid vaccine. The main question to address, what drug components to maintain in what proportions for different age groups & BMIs.
Had there been no complexity involved of age and bmi, a logistic regression model would have been ideal to predict for x1, x2, … concentrations of components, is the vaccine effective. However, it’s hypothetically possible that two contradictory models work well on under-weight and over-weight people.

A mixed modeling approach would use unsupervised clustering approach to create clusters of patients, and then a logistic regression model can be built for each of the cluster.

PS: The example was just for illustration purpose. In reality, it’s much more complex.

chirag-garg · 15 August 2021 16:50

Looking at the market in general, probably 85% of the MMM specialists would be content with using multiple linear regression and apply Ordinary Least square to estimate the coefficients of the model.

However, a linear model may not be adequate to model the real world. In fact, consider the following equation and the following scenarios:

Ice Cream Sales=β0+β1 Price+β2 Distribution+β3 Temperature

If the price is zero: sales should be infinite (provided stock is available)
If the distribution is zero: sales should be zero as the product cannot be sold
If the temperature significantly decreases, the sales cannot become negative.

Linear model treats all variables the same way.

Relative variables like Price and distribution are treated the same as other incremental regressors:
When these variables are equal to zero we still get a finite volume.

Linear model is suitable only if the regressors’ values are within a contained range