What is the purpose of data cleaning in data analysis?

Data cleaning can be a daunting task because, as the number of data sources grows, the time required to clean the data can grow very quickly.

This is due to the vast volume of data generated by the additional sources. Data cleaning alone can take up to 80% of the total time required to carry out a data analysis task.

Nevertheless, there are several reasons to clean data before analyzing it. Two of the most important ones are:

  • Cleaning data from different sources helps transform the data into a format that is easy to work with
  • Data cleaning increases the accuracy of a machine learning model
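
To make the first point concrete, here is a minimal sketch in Python (standard library only) of bringing records from two sources into one consistent format. The sample records, the field names, and the helpers `clean_record` and `clean_all` are all hypothetical and chosen purely for illustration. Real cleaning rules depend on the data at hand.

```python
from datetime import datetime

# Hypothetical records from two sources that use different field names,
# date formats, and value types for the same information.
source_a = [
    {"Name": " Alice ", "signup": "2023-01-05", "age": "34"},
    {"Name": "Bob", "signup": "2023-02-10", "age": ""},  # missing age
]
source_b = [
    {"full_name": "alice", "signup_date": "05/01/2023", "age": 34},
    {"full_name": "Carol", "signup_date": "17/03/2023", "age": 29},
]


def clean_record(name, date_text, date_format, age):
    """Return a record in the common schema, or None if a required field is missing."""
    name = name.strip().title()  # trim whitespace, normalize capitalization
    if not name or age in ("", None):
        return None
    # Convert every date to ISO 8601 (YYYY-MM-DD).
    signup = datetime.strptime(date_text, date_format).date().isoformat()
    return {"name": name, "signup": signup, "age": int(age)}


def clean_all():
    """Normalize both sources, drop incomplete rows, and remove duplicates."""
    cleaned = [
        clean_record(r["Name"], r["signup"], "%Y-%m-%d", r["age"])
        for r in source_a
    ]
    cleaned += [
        clean_record(r["full_name"], r["signup_date"], "%d/%m/%Y", r["age"])
        for r in source_b
    ]
    unique = {}
    for rec in cleaned:
        if rec is not None:
            # Treat records with the same name and signup date as duplicates.
            unique.setdefault((rec["name"], rec["signup"]), rec)
    return list(unique.values())


if __name__ == "__main__":
    for record in clean_all():
        print(record)
```

In this sketch the incomplete record is simply dropped and duplicates are matched on name and signup date. Both are assumptions; depending on the analysis, filling in missing values or using a different duplicate key may be more appropriate.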