How Would You Approach a Dataset That’s Missing More Than 30 Percent of Its Values?

nikhil-waghalkar-46dbf38a · 23 April 2022 18:22

The approach depends on the size of the dataset. If the dataset is large, the quickest method is simply to remove the rows that contain missing values. Because plenty of rows remain, this usually won't hurt the model's ability to produce results, provided the values are missing completely at random; if the gaps follow a pattern, dropping rows can bias what is left.
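A minimal sketch of row deletion in plain Python, assuming each row is a dict in which None marks a missing value (the function name and sample data are hypothetical). With pandas, DataFrame.dropna() does the same job on a DataFrame.

```python
from typing import Any, Optional

# Hypothetical representation: each row is a dict, and None marks a missing value.
Row = dict[str, Optional[Any]]


def drop_incomplete_rows(rows: list[Row]) -> list[Row]:
    """Listwise deletion: keep only rows in which every field has a value."""
    return [row for row in rows if all(value is not None for value in row.values())]


data = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 58000},
]
complete = drop_incomplete_rows(data)
print(f"kept {len(complete)} of {len(data)} rows")  # kept 2 of 4 rows
```

In this toy example half the rows disappear, which is why the shortcut only pays off when the dataset is large.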

If the dataset is small, dropping rows is not practical because too much data would be lost. In that case, it is better to calculate the mean (for numeric features) or the mode (for categorical features) of the affected feature and impute that value into the missing entries.
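A minimal sketch of mean/mode imputation using only the standard library, again assuming dict rows with None for missing values; the function name, the `strategy` parameter and the sample data are hypothetical. In pandas the equivalent would typically be a call to fillna() with the column's mean() or mode().

```python
import statistics
from typing import Any, Optional

# Hypothetical representation: each row is a dict, and None marks a missing value.
Row = dict[str, Optional[Any]]


def impute_column(rows: list[Row], column: str, strategy: str = "mean") -> list[Row]:
    """Return new rows with missing values in `column` replaced by its mean or mode."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values to impute from")
    if strategy == "mean":
        fill = statistics.mean(observed)  # numeric features
    elif strategy == "mode":
        fill = statistics.mode(observed)  # categorical features; first mode wins on ties (Python 3.8+)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Build new dicts so the caller's original rows are left untouched.
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]


data = [
    {"age": 34, "color": "red"},
    {"age": None, "color": "blue"},
    {"age": 29, "color": None},
    {"age": 41, "color": "red"},
]
filled = impute_column(data, "age", "mean")
filled = impute_column(filled, "color", "mode")
print(filled[1]["age"], filled[2]["color"])
```

Returning new dicts rather than mutating in place makes it easy to compare the imputed data against the original afterwards.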

Another approach is to use a machine learning algorithm to predict the missing values from the other features. This can yield accurate results, unless some entries deviate strongly from the rest of the dataset (outliers), since those distort what the model learns.
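The answer doesn't name a specific algorithm; as one simple stand-in, the sketch below fits an ordinary least-squares line from a complete feature to the incomplete one and uses it to predict the gaps. The function name, column names and data are hypothetical. Libraries such as scikit-learn provide ready-made alternatives, for example KNNImputer.

```python
import statistics
from typing import Optional

# Hypothetical representation: each row is a dict, and None marks a missing value.
Row = dict[str, Optional[float]]


def regression_impute(rows: list[Row], target: str, predictor: str) -> list[Row]:
    """Fill missing `target` values using a least-squares line fitted on `predictor`.

    Only rows where both columns are present are used for fitting. A row whose
    predictor is also missing keeps its gap, since there is nothing to predict from.
    """
    pairs = [(row[predictor], row[target]) for row in rows
             if row[predictor] is not None and row[target] is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete rows to fit a line")
    x_bar = statistics.mean(x for x, _ in pairs)
    y_bar = statistics.mean(y for _, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor is constant in the complete rows; cannot fit a slope")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sxx
    intercept = y_bar - slope * x_bar

    filled = []
    for row in rows:
        if row[target] is None and row[predictor] is not None:
            # New dict for the imputed row; complete rows are passed through as-is.
            row = {**row, target: intercept + slope * row[predictor]}
        filled.append(row)
    return filled


data = [
    {"years": 1, "salary_k": 12},
    {"years": 2, "salary_k": 14},
    {"years": 3, "salary_k": 16},
    {"years": 4, "salary_k": None},
    {"years": 5, "salary_k": 20},
]
filled = regression_impute(data, target="salary_k", predictor="years")
print(filled[3]["salary_k"])  # 18.0
```

Because the line is fitted by least squares, a single extreme row among the complete rows can pull the slope noticeably, which is exactly the outlier weakness mentioned above.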