How to define/select metrics?

There are two parts to selecting the right set of metrics for analysis:

  1. The business shortlisting
    This is more of a domain-knowledge-driven exercise in which you try to list every metric that could plausibly matter for the business problem. For example, if you are trying to analyze which type of customer will buy a particular product, you would want to consider metrics such as the price of the product, the discount, the closest competing product and its price, and customer demographics such as age, location, and income. The list should be as exhaustive as possible: include anything you can think of that might determine whether the customer will buy.

  2. Then comes the statistics and math, often called “feature engineering”, which is about analyzing the data you have for the metrics shortlisted above and choosing what goes into a model. This is done in a variety of ways at various stages of modeling:
    a. Check correlation among variables - including multiple variables that are highly correlated with each other is not recommended, since they carry largely the same information
    b. Check information gain / correlation with what you are trying to predict or analyze - this tells you how important the variable is
    c. Check the beta coefficient of the metric in the final model
    d. Based on the above, you can remove the variable, keep it, transform it, or create new derived variables by combining it with others, including interaction variables
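
As a rough illustration of steps (a) and (b), here is a minimal Python sketch. It assumes plain Pearson correlation as the measure of both redundancy and importance (information gain would be a reasonable alternative), and the function names, the 0.9 threshold, and the toy data are all hypothetical choices made for this example. Step (c) is not covered, since it needs a fitted model.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(features, target, redundancy_threshold=0.9):
    """Drop one of each highly correlated pair, then rank by relevance.

    features: dict mapping metric name -> list of values
    target:   list of values for what is being predicted
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    dropped = set()
    for a, b in combinations(features, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(features[a], features[b])) > redundancy_threshold:
            # Step (a): of a redundant pair, keep the one more relevant to the target.
            dropped.add(a if relevance[a] < relevance[b] else b)
    kept = [name for name in features if name not in dropped]
    # Step (b): most relevant metrics first.
    return sorted(kept, key=lambda name: relevance[name], reverse=True)


# Hypothetical toy data: price in two currencies is redundant by construction.
data = {
    "price_usd": [10, 12, 15, 18, 20, 25],
    "price_eur": [9.0, 10.8, 13.5, 16.2, 18.0, 22.5],
    "discount_pct": [0, 5, 0, 10, 20, 15],
    "customer_age": [34, 51, 29, 42, 38, 60],
}
purchased = [1, 1, 1, 0, 1, 0]

print(select_features(data, purchased))
```

Only one of the two price columns survives, because they are perfectly correlated with each other. In practice the threshold and the choice of relevance measure depend on the data and the model.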

In the area of software metrics for statistical decision making or control system programming, the selection of metrics follows the same standard guidelines as all software development.

  • top-down vs. bottom-up - you can start with what you need and move towards what you have, or start with what you have and interpret it as you need.
  • resiliency - if you commit to an implementation with a particular set of metrics, will it stay robust over time, across version updates, and under different operating scenarios?
  • complexity vs. value - often, the same result can be calculated in several different ways. What is the simplest form that achieves what you need?
  • operating model - collected metrics are more useful when you understand what they are actually measuring.
  • redundancy and self-validation - metrics are often a handle to an observation in the real world, so an independent means of arriving at approximately the same value can be used to quickly identify unexpected results.
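
The redundancy and self-validation point can be sketched in a few lines of Python. This is only an illustration: the function name, the 5% tolerance, and the example of comparing a load balancer count against a count derived from application logs are assumptions made here, not part of any particular system.

```python
from math import isclose


def cross_check(primary, independent, rel_tolerance=0.05):
    """Return True when two independent estimates of one quantity roughly agree."""
    # isclose compares the difference against rel_tolerance * max(|a|, |b|).
    return isclose(primary, independent, rel_tol=rel_tolerance)


# Hypothetical example: requests per minute measured two independent ways.
requests_from_load_balancer = 1000
requests_from_app_logs = 1020

if not cross_check(requests_from_load_balancer, requests_from_app_logs):
    print("Metrics disagree; investigate before trusting either value")
```

A disagreement does not say which source is wrong, only that one of the two measurements (or the assumption that they measure the same thing) needs a closer look.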