What is mixed modeling in data science?

A business problem is often nested. For e.g. imagine the data being collected by clinical trials done for a covid vaccine. The main question to address, what drug components to maintain in what proportions for different age groups & BMIs.
Had there been no complexity involved of age and bmi, a logistic regression model would have been ideal to predict for x1, x2, … concentrations of components, is the vaccine effective. However, it’s hypothetically possible that two contradictory models work well on under-weight and over-weight people.

A mixed modeling approach would use unsupervised clustering approach to create clusters of patients, and then a logistic regression model can be built for each of the cluster.

PS: The example was just for illustration purpose. In reality, it’s much more complex.

Looking at the market in general, probably 85% of the MMM specialists would be content with using multiple linear regression and apply Ordinary Least square to estimate the coefficients of the model.

However, a linear model may not be adequate to model the real world. In fact, consider the following equation and the following scenarios:

Ice Cream Sales=β0+β1 Price+β2 Distribution+β3 Temperature

  • If the price is zero: sales should be infinite (provided stock is available)
  • If the distribution is zero: sales should be zero as the product cannot be sold
  • If the temperature significantly decreases, the sales cannot become negative.

Linear model treats all variables the same way.

  • Relative variables like Price and distribution are treated the same as other incremental regressors:
  • When these variables are equal to zero we still get a finite volume.

Linear model is suitable only if the regressors’ values are within a contained range