Why is data cleansing essential, and which methods do you use to maintain clean data?
Data collected from various sources (as described in https://medium.com/@TheDataGyan/day-6-getting-data-in-r-9b704ac9c31d) can be very untidy:
- It may not be organized by feature, and it may not come in a clean tabular format.
- It can contain redundant records, missing values and outliers (values that lie far outside the expected range of a feature).
- It may be hard to understand.
- It may not follow a well-defined format.
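To make the points about tabular structure and format concrete, here is a minimal sketch in plain Python (the linked series works in R) that turns a messy semicolon-separated export into a list of row dictionaries. The sample text `RAW`, the helper `to_table` and the `age` column are all hypothetical, chosen only to show stray whitespace, mixed-case headers, an empty field and a blank line being normalised.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, mixed-case headers,
# an empty field and a blank line (assumed for illustration only).
RAW = """Name ; AGE;city
 alice ;30; Pune

Bob;  ;Delhi
carol;41 ;  mumbai
"""


def to_table(raw: str, delimiter: str = ";", int_columns=("age",)) -> list[dict]:
    """Parse delimited text into rows with normalised headers and values."""
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    # Normalise header names so every row uses the same feature keys.
    header = [h.strip().lower() for h in next(reader)]
    rows = []
    for fields in reader:
        values = [f.strip() for f in fields]
        if not any(values):
            continue  # skip blank lines
        # Empty strings become None so missing values are explicit.
        row = {k: (v or None) for k, v in zip(header, values)}
        # Convert the (hypothetical) numeric columns to int where present.
        for col in int_columns:
            if row.get(col) is not None:
                row[col] = int(row[col])
        rows.append(row)
    return rows


rows = to_table(RAW)
```

Here `rows[0]` becomes `{"name": "alice", "age": 30, "city": "Pune"}`, and Bob's empty age field becomes `None`, so later steps can treat it as a missing value.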
So before the data is fed to a model, it needs to be processed. This processing is called data cleaning.
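The section does not name a specific method, so the following is only one minimal sketch of a cleaning pass that addresses the redundant records, outliers and missing values listed above. It assumes records are Python dictionaries, uses a hypothetical helper `clean`, and treats an outlier, as defined above, as a value outside a caller-supplied desired range; dropping outliers and imputing the median are common choices, not the only ones.

```python
from statistics import median


def clean(records: list[dict], column: str, valid_range: tuple) -> list[dict]:
    """Minimal cleaning pass for one numeric feature (illustrative sketch)."""
    lo, hi = valid_range
    # 1. Remove exact duplicate records, keeping the first occurrence.
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    # 2. Drop outliers: values outside the desired range of the feature.
    kept = [r for r in unique if r[column] is None or lo <= r[column] <= hi]
    # 3. Impute missing values with the median of the remaining values
    #    (raises StatisticsError if no non-missing values remain).
    fill = median(r[column] for r in kept if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in kept]


data = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},    # redundant record
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 420},   # outlier for an assumed 0-120 age range
    {"id": 4, "age": 40},
]
cleaned = clean(data, "age", (0, 120))
```

Outliers are removed before the median is computed so they cannot influence the imputed value; here the missing age is filled with 35.0, the median of 30 and 40. Capping outliers at the range limits instead of dropping them is an equally valid design choice when losing rows is costly.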