Suppose you are given survey data that contains missing values. How would you deal with them?

This is one of the important data science interview questions. There are two main techniques for dealing with missing values:

  • Debugging Techniques

This is a data cleaning process that evaluates the quality of the collected information and improves it, so that the analysis does not rest on careless data. The most popular debugging techniques are:

Searching the list of values: This means searching the data matrix for values that fall outside the valid response range. Such values can be treated as missing, or the correct value can be estimated from other variables.
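As a rough illustration of this range check, here is a minimal Python sketch. The variable names (age, satisfaction) and the valid ranges are assumptions made up for the example, and out-of-range values are simply marked as missing with None rather than estimated from other variables.

```python
# Hypothetical survey rows; the variable names and ranges are illustrative only.
responses = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": 212, "satisfaction": 5},  # age outside the plausible range
    {"id": 3, "age": 27, "satisfaction": 9},   # satisfaction outside the 1-5 scale
]

# Assumed valid response ranges (inclusive) for each variable.
VALID_RANGES = {"age": (18, 99), "satisfaction": (1, 5)}

def mark_out_of_range(rows, valid_ranges):
    """Return copies of the rows with out-of-range values replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for var, (low, high) in valid_ranges.items():
            value = new_row.get(var)
            if value is not None and not (low <= value <= high):
                new_row[var] = None  # treat the invalid answer as missing
        cleaned.append(new_row)
    return cleaned

cleaned = mark_out_of_range(responses, VALID_RANGES)
```

Marking the value as missing keeps the decision about how to fill it separate from the decision that it is invalid.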

Filtering questions: This means comparing the number of responses to a filter question with the number of responses to the question it filters. If an anomaly is found that cannot be resolved, the value is treated as missing.
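One possible way to run this comparison is sketched below in Python. The filter question ("owns_car") and its follow-up ("car_brand") are hypothetical. The sketch compares the two counts and also lists the respondents whose follow-up answer does not agree with their filter answer.

```python
# Hypothetical rows: "owns_car" is the filter question, "car_brand" the follow-up.
rows = [
    {"id": 1, "owns_car": "yes", "car_brand": "Toyota"},
    {"id": 2, "owns_car": "no", "car_brand": None},
    {"id": 3, "owns_car": "no", "car_brand": "Ford"},   # answered despite the filter
    {"id": 4, "owns_car": "yes", "car_brand": "Honda"},
]

def filter_mismatches(rows, filter_var, pass_value, follow_up):
    """Compare how many respondents passed the filter with how many answered
    the follow-up, and list the ids where the two disagree."""
    passed = sum(1 for r in rows if r[filter_var] == pass_value)
    answered = sum(1 for r in rows if r[follow_up] is not None)
    bad_ids = [
        r["id"]
        for r in rows
        if (r[filter_var] == pass_value) != (r[follow_up] is not None)
    ]
    return passed, answered, bad_ids

passed, answered, bad_ids = filter_mismatches(rows, "owns_car", "yes", "car_brand")
```

If the anomaly for a listed respondent cannot be resolved, the follow-up value can then be set to missing.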

Checking for logical consistency: Answers that may contradict each other are checked against one another.
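A consistency check of this kind might look like the following Python sketch. The rule used here (a respondent cannot have been employed for more years than their age) and the variable names are assumptions chosen only for illustration.

```python
# Hypothetical rows; None marks an unanswered item.
rows = [
    {"id": 1, "age": 40, "years_employed": 15},
    {"id": 2, "age": 22, "years_employed": 30},   # contradicts the stated age
    {"id": 3, "age": 35, "years_employed": None},
]

def violates_employment_rule(row):
    """Assumed rule: years employed cannot exceed age.
    Rows with a missing value cannot be checked, so they are not flagged."""
    age, years = row["age"], row["years_employed"]
    if age is None or years is None:
        return False
    return years > age

inconsistent_ids = [r["id"] for r in rows if violates_employment_rule(r)]
```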

Counting the level of representativeness: The number of responses obtained for each variable is counted. If the number of unanswered questions is very high, one can either assume that non-respondents would have answered like the respondents, or impute the missing answers.
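The count itself is simple to sketch in Python. The data and the 50% threshold for "very high" non-response below are assumptions for the example; the source does not state a cutoff.

```python
# Hypothetical rows; None marks an unanswered question.
rows = [
    {"id": 1, "age": 34, "income": None},
    {"id": 2, "age": None, "income": None},
    {"id": 3, "age": 27, "income": 52},
    {"id": 4, "age": 45, "income": None},
]

def nonresponse_rates(rows, variables):
    """Return the share of unanswered items for each variable."""
    total = len(rows)
    return {
        var: sum(1 for r in rows if r.get(var) is None) / total
        for var in variables
    }

rates = nonresponse_rates(rows, ["age", "income"])

# Assumed cutoff: flag variables where more than half the answers are missing.
THRESHOLD = 0.5
flagged = [var for var, rate in rates.items() if rate > THRESHOLD]
```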

  • Imputation Technique

This technique consists of replacing the missing values with valid values or answers by estimating them. There are three types of imputation:

  • Random imputation
  • Hot Deck imputation
  • Imputation of the mean of subclasses
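
The three imputation types listed above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: the data, the variable names (region, income) and the helper functions are hypothetical, and hot deck imputation has several variants, of which a simple sequential one (copy the value of the most recent donor in the same class) is used here.

```python
import random
from statistics import mean

# Hypothetical survey rows; None marks a missing income.
rows = [
    {"id": 1, "region": "north", "income": 30},
    {"id": 2, "region": "north", "income": 50},
    {"id": 3, "region": "north", "income": None},
    {"id": 4, "region": "south", "income": 80},
    {"id": 5, "region": "south", "income": None},
]

def random_imputation(rows, var, rng):
    """Fill each gap with a value drawn at random from the observed values."""
    observed = [r[var] for r in rows if r[var] is not None]
    filled = []
    for r in rows:
        new_r = dict(r)
        if new_r[var] is None:
            new_r[var] = rng.choice(observed)
        filled.append(new_r)
    return filled

def hot_deck_imputation(rows, var, class_var):
    """Sequential hot deck: copy the value of the most recent donor
    (a respondent with an observed value) in the same class."""
    last_donor = {}
    filled = []
    for r in rows:
        new_r = dict(r)
        if new_r[var] is not None:
            last_donor[new_r[class_var]] = new_r[var]
        elif new_r[class_var] in last_donor:
            new_r[var] = last_donor[new_r[class_var]]
        filled.append(new_r)
    return filled

def subclass_mean_imputation(rows, var, class_var):
    """Fill each gap with the mean of the observed values in its subclass."""
    groups = {}
    for r in rows:
        if r[var] is not None:
            groups.setdefault(r[class_var], []).append(r[var])
    means = {cls: mean(values) for cls, values in groups.items()}
    filled = []
    for r in rows:
        new_r = dict(r)
        if new_r[var] is None and new_r[class_var] in means:
            new_r[var] = means[new_r[class_var]]
        filled.append(new_r)
    return filled

rng = random.Random(0)  # seeded so the random draw is repeatable
by_random = random_imputation(rows, "income", rng)
by_hot_deck = hot_deck_imputation(rows, "income", "region")
by_mean = subclass_mean_imputation(rows, "income", "region")
```

Each function returns new rows and leaves the original data untouched, which makes it easy to compare the results of the three approaches on the same survey.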