How would you handle suspected or missing data?

How would you handle suspected or missing data?


We can handle missing values in two ways, depending on the type of data. First, categorise each column as numeric or categorical, then fill the gaps using the mean, median, or mode.
For numeric data we generally use the mean.
For categorical data we use the mode.
As a rule of thumb, if more than 30% of a column's values are missing, we drop that column; if up to about 5% are missing, we fill them in using the fillna method.
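
A minimal sketch of that approach in pandas, assuming the data is already in a DataFrame. The function name `impute_missing`, the column names, and the sample values are made up for illustration. The 30% cut-off comes from the rule above; filling every remaining column, including ones between 5% and 30% missing, is an assumption of this sketch.

```python
import pandas as pd


def impute_missing(df, drop_threshold=0.30):
    """Drop columns that are mostly empty, then fill the remaining gaps."""
    result = df.copy()

    # Fraction of missing values per column (isna() gives booleans, mean() averages them).
    missing_share = result.isna().mean()

    # Drop columns where more than drop_threshold of the values are missing.
    too_sparse = missing_share[missing_share > drop_threshold].index
    result = result.drop(columns=too_sparse)

    for col in result.columns:
        if not result[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(result[col]):
            # Numeric column: fill with the mean of the observed values.
            result[col] = result[col].fillna(result[col].mean())
        else:
            # Categorical column: fill with the most frequent value (the mode).
            result[col] = result[col].fillna(result[col].mode().iloc[0])
    return result


# Illustrative data: 20 rows, made-up columns.
ages = [25, 30, 35, 30] * 5
ages[2] = None                       # 5% missing, numeric
cities = ["Pune", "Delhi", "Pune", "Mumbai"] * 5
cities[3] = None                     # 5% missing, categorical
notes = [None] * 8 + ["ok"] * 12     # 40% missing, will be dropped

df = pd.DataFrame({"age": ages, "city": cities, "notes": notes})
clean = impute_missing(df)
print(clean.head())
```

If a numeric column is heavily skewed, swapping `mean()` for `median()` in the numeric branch is a common alternative.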


By “suspected and missing data”, I assume you mean data that fails data quality checks, or in other words data that errors out during ingestion or the initial load.

  1. Errored-out or suspected data gives you an opportunity to improve your DQA model. You can identify patterns in what type of data is rejected and why.
  2. Data might be rejected because of a type mismatch, a failed range check, or a key-value pair mismatch. You can improve your model based on the missing data.
  3. By suspected data, I believe you mean the flagged dataset that met a few of the rules but not all of the DQA requirements. It might be rejected because of a failed null check, a missing key identifier, or a primary attribute being null.
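
To make the points above concrete, here is a rough sketch of rule-based checks that flag records and count the reasons, so you can see which patterns cause the most rejections. Everything in it is hypothetical: the field names `id` and `amount`, the 0 to 10,000 range, and the function names are assumptions for illustration, not part of any particular DQA tool.

```python
from collections import Counter


def check_record(record):
    """Return the list of reasons a record fails the checks (empty if it passes)."""
    reasons = []

    # Key identifier check: the hypothetical primary key must be present.
    if record.get("id") is None:
        reasons.append("key identifier missing")

    amount = record.get("amount")
    if amount is None:
        # Null check on a primary attribute.
        reasons.append("null check")
    elif isinstance(amount, bool) or not isinstance(amount, (int, float)):
        # Type mismatch: expected a number.
        reasons.append("type mismatch")
    elif not 0 <= amount <= 10_000:
        # Range check with assumed bounds.
        reasons.append("range check")
    return reasons


def triage(records):
    """Split records into accepted and flagged, and tally why records were flagged."""
    accepted, flagged, reason_counts = [], [], Counter()
    for record in records:
        reasons = check_record(record)
        if reasons:
            flagged.append((record, reasons))
            reason_counts.update(reasons)
        else:
            accepted.append(record)
    return accepted, flagged, reason_counts


# Made-up sample records.
records = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": "12O"},   # string instead of a number
    {"id": None, "amount": 50},   # missing key
    {"id": 4, "amount": -5},      # out of range
    {"id": 5, "amount": None},    # null attribute
]

accepted, flagged, reason_counts = triage(records)
for reason, count in reason_counts.most_common():
    print(reason, count)
```

Reviewing the tally over time shows which rules reject the most data, which is where the checks or the upstream source most likely need attention.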