Data Cleaning Techniques
Data, when collected from various sources (as described in https://medium.com/@TheDataGyan/day-6-getting-data-in-r-9b704ac9c31d), can be really untidy.
- It may not be organized by feature, nor may it be available in a clean tabular format.
- It can be redundant, or full of missing values and outliers (values that lie far outside the expected range of a feature).
- It may be hard to understand.
- It may not have a well-defined format.
So, before the data is fed to a model, it needs to be processed. This processing is called data cleaning.
REF: https://medium.com/sciforce/data-cleaning-and-preprocessing-for-beginners-25748ee00743
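As a first step, it helps to measure how untidy the data actually is. The sketch below profiles a small toy dataset for three of the problems listed above: duplicate rows, missing values, and outliers. The records and the helper names (count_duplicates, count_missing, iqr_outliers) are hypothetical and chosen for illustration only; outliers are flagged with Tukey's IQR fences, which is one common rule among several, not the only way to define an outlier.

```python
from statistics import quantiles

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 1, "age": 34, "city": "Pune"},     # exact duplicate of the first row
    {"id": 3, "age": 29, "city": None},
    {"id": 4, "age": 310, "city": "Mumbai"},  # implausible age, a likely outlier
    {"id": 5, "age": 41, "city": "Delhi"},
]


def count_duplicates(rows):
    """Count rows that exactly repeat an earlier row."""
    seen, dups = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups


def count_missing(rows):
    """Count missing (None) values per column."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + (1 if value is None else 0)
    return counts


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


print(count_duplicates(records))                  # 1
print(count_missing(records))                     # {'id': 0, 'age': 1, 'city': 1}
print(iqr_outliers([r["age"] for r in records]))  # [310]
```

On a sample this small the quartiles are unstable, so in practice the fence multiplier k (1.5 here) should be checked against domain knowledge, such as a plausible age range.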
In general, data cleaning reduces errors and improves data quality. Correcting errors in the data and eliminating bad records can be a time-consuming and tedious process, but it cannot be skipped. Data mining is one key technique for data cleaning.
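Correcting errors and eliminating bad records can be sketched as a short pipeline over the same kind of toy data: drop exact duplicates, drop outliers, fill missing numeric values with the column median, and discard records that are still incomplete. The records, helper names and the order of steps are illustrative assumptions, not a prescribed method; median imputation and Tukey's IQR fences are just two common choices. pandas offers `DataFrame.drop_duplicates`, `DataFrame.fillna` and `DataFrame.dropna` for similar steps on tabular data.

```python
from statistics import median, quantiles

# Hypothetical toy records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 1, "age": 34, "city": "Pune"},     # exact duplicate
    {"id": 3, "age": 29, "city": None},       # missing city
    {"id": 4, "age": 310, "city": "Mumbai"},  # implausible age
    {"id": 5, "age": 41, "city": "Delhi"},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each exactly repeated row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_outliers(rows, col, k=1.5):
    """Drop rows whose value in `col` lies outside Tukey's IQR fences."""
    values = [r[col] for r in rows if r[col] is not None]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Rows with a missing value are kept here; imputation handles them next.
    return [r for r in rows if r[col] is None or low <= r[col] <= high]


def impute_median(rows, col):
    """Replace missing values in `col` with the median of the present values."""
    fill = median([r[col] for r in rows if r[col] is not None])
    return [{**r, col: fill} if r[col] is None else r for r in rows]


def clean(rows):
    """Deduplicate, remove age outliers, impute missing ages, drop rows without a city."""
    rows = drop_duplicates(rows)
    rows = drop_outliers(rows, "age")  # before imputing, so the outlier cannot shift the fill value
    rows = impute_median(rows, "age")
    return [r for r in rows if r["city"] is not None]


for row in clean(records):
    print(row)
# {'id': 1, 'age': 34, 'city': 'Pune'}
# {'id': 2, 'age': 34, 'city': 'Delhi'}
# {'id': 5, 'age': 41, 'city': 'Delhi'}
```

Removing outliers before imputing is a deliberate choice: had the age 310 still been present, the median used to fill the gap would have moved from 34 to 37.5.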
Data mining is a technique for discovering interesting information in data: it automatically extracts hidden, intrinsic patterns from collections of data. Data quality mining is a recent approach that applies data mining techniques to identify and correct data quality problems in large databases. Several data mining techniques are suitable for data cleaning; three major ones are functional dependency mining, association rule mining, and bagging SVMs (ensembles of support vector machines trained with bootstrap aggregating).
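To show how the first two methods can expose dirty records, here is a minimal sketch on hypothetical address data. It assumes the functional dependency zip → city and the rule {zip=411001} → {city=Pune} are already known. Real functional dependency mining and association rule mining (for example, the Apriori algorithm) discover such dependencies and rules automatically, and that discovery step is not shown. The helper names and the 0.75 confidence threshold are illustrative assumptions. Bagging SVMs are not covered here.

```python
from collections import defaultdict

# Hypothetical address records: the zip code is expected to determine the city.
rows = (
    [{"zip": "411001", "city": "Pune"} for _ in range(4)]
    + [{"zip": "411001", "city": "Pnue"}]  # misspelled city
    + [{"zip": "110001", "city": "Delhi"} for _ in range(2)]
)


def fd_violations(rows, lhs, rhs):
    """Return lhs values mapped to more than one rhs value, i.e. where lhs -> rhs fails."""
    images = defaultdict(set)
    for r in rows:
        images[r[lhs]].add(r[rhs])
    return {key: vals for key, vals in images.items() if len(vals) > 1}


def matches(row, condition):
    """True if the row has every column=value pair in `condition`."""
    return all(row[col] == val for col, val in condition.items())


def rule_confidence(rows, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    covered = [r for r in rows if matches(r, antecedent)]
    if not covered:
        return 0.0
    return sum(matches(r, consequent) for r in covered) / len(covered)


def rule_violations(rows, antecedent, consequent, min_conf=0.75):
    """Rows breaking a rule whose confidence reaches min_conf; candidates for cleaning."""
    if rule_confidence(rows, antecedent, consequent) < min_conf:
        return []
    return [r for r in rows if matches(r, antecedent) and not matches(r, consequent)]


print(fd_violations(rows, "zip", "city"))  # zip 411001 maps to both 'Pune' and 'Pnue'
print(rule_confidence(rows, {"zip": "411001"}, {"city": "Pune"}))  # 0.8
print(rule_violations(rows, {"zip": "411001"}, {"city": "Pune"}))
# [{'zip': '411001', 'city': 'Pnue'}]
```

The two checks complement each other. A functional dependency is all-or-nothing, so a single typo breaks it. An association rule tolerates exceptions, and its confidence score indicates how strongly the majority pattern holds before its violators are treated as suspicious.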