Explain data preprocessing in detail.

Data preprocessing, a component of data preparation, describes any type of processing performed on raw data to prepare it for another data processing procedure. It has traditionally been an important preliminary step for the data mining process.

A typical data preprocessing workflow in machine learning consists of seven main steps:

  1. Acquire the dataset.
  2. Import the required libraries.
  3. Import the dataset.
  4. Identify and handle missing values.
  5. Encode categorical data.
  6. Split the dataset into training and test sets.
  7. Apply feature scaling.
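
Steps 2 to 7 above can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not a reference implementation: the inline CSV, its columns (country, age, salary, purchased) and the helper functions are hypothetical, and mean imputation, one-hot encoding, an 80/20 random split and z-score standardization are just one common choice for each step. In practice these steps are usually done with libraries such as pandas and scikit-learn.

```python
import csv
import io
import random
import statistics

# Hypothetical toy dataset: an inline CSV string stands in for a real file.
# Empty cells represent missing values.
RAW_CSV = """country,age,salary,purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""


def load_rows(text):
    """Step 3: parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def impute_mean(rows, column):
    """Step 4: fill empty cells of a numeric column with the mean of the present values."""
    mean = statistics.mean(float(r[column]) for r in rows if r[column] != "")
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else mean
    return mean


def one_hot(rows, column):
    """Step 5: replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for c in categories:
            r[f"{column}_{c}"] = 1.0 if value == c else 0.0
    return categories


def split_train_test(rows, test_ratio=0.2, seed=0):
    """Step 6 (hypothetical helper): shuffle a copy of the rows and split off a test subset."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def standardize(train, test, column):
    """Step 7: z-score scaling with mean/std computed on the training set only."""
    values = [r[column] for r in train]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values) or 1.0  # guard against a constant column
    for r in train + test:
        r[column] = (r[column] - mu) / sigma
    return mu, sigma


rows = load_rows(RAW_CSV)
means = {col: impute_mean(rows, col) for col in ("age", "salary")}
one_hot(rows, "country")
for r in rows:  # binary target label encoded as 0/1
    r["purchased"] = 1.0 if r["purchased"] == "Yes" else 0.0
train_rows, test_rows = split_train_test(rows, test_ratio=0.2)
for col in ("age", "salary"):
    standardize(train_rows, test_rows, col)
print(len(train_rows), len(test_rows))  # 8 2
```

Imputation runs before the split here only to follow the order listed above; computing the imputation statistics on the training set alone, as the scaling step does, avoids leaking information from the test set.
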

The major tasks performed during data preprocessing are:

  • Data Cleaning: Also known as scrubbing, this task involves filling in missing values, smoothing or removing noisy data and outliers, and resolving inconsistencies.

  • Data Transformation: An essential preprocessing technique applied before data mining so that the resulting patterns are easier to understand. It changes the format, structure, or values of the data (for example, by normalizing numeric attributes) and converts them into clean, usable data.

  • Data Reduction: This step reduces the volume of data by lowering the number of records or the number of attributes (dimensions). It is performed so that the reduced data still produces the same, or nearly the same, analytical results as the original data.

  • Data Discretization: Converting continuous attribute values into a finite set of intervals or categories. It is considered part of data reduction.
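
The preprocessing tasks above can be illustrated with small pure-Python helpers. This is a sketch under assumptions: the records, field names and helper functions are hypothetical, and each helper shows only one common technique for its task (duplicate removal and the 1.5 x IQR outlier rule for cleaning, min-max normalization for transformation, dropping constant attributes and random sampling for reduction, equal-width binning for discretization).

```python
import random
import statistics


def drop_duplicates(records):
    """Cleaning: keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def remove_outliers_iqr(records, column, k=1.5):
    """Cleaning: drop records whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles([r[column] for r in records], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[column] <= high]


def min_max_normalize(records, column):
    """Transformation: rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant column
    return [{**r, column: (r[column] - lo) / span} for r in records]


def drop_constant_attributes(records):
    """Reduction (attributes): remove columns with a single distinct value.
    Assumes every record has the same keys."""
    keep = [c for c in records[0] if len({r[c] for r in records}) > 1]
    return [{c: r[c] for c in keep} for r in records]


def sample_records(records, fraction, seed=0):
    """Reduction (records): simple random sample without replacement."""
    k = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, k)


def equal_width_bins(records, column, n_bins):
    """Discretization: replace a numeric value with an interval index 0..n_bins-1."""
    values = [r[column] for r in records]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [{**r, column: min(int((r[column] - lo) / width), n_bins - 1)} for r in records]


# Hypothetical records: "source" is constant and one income value is an outlier.
data = [
    {"age": 23, "income": 31000, "source": "web"},
    {"age": 23, "income": 31000, "source": "web"},  # exact duplicate
    {"age": 35, "income": 52000, "source": "web"},
    {"age": 41, "income": 58000, "source": "web"},
    {"age": 29, "income": 45000, "source": "web"},
    {"age": 52, "income": 61000, "source": "web"},
    {"age": 38, "income": 990000, "source": "web"},  # likely an entry error
]

cleaned = remove_outliers_iqr(drop_duplicates(data), "income")
transformed = min_max_normalize(cleaned, "income")
reduced = drop_constant_attributes(transformed)
discretized = equal_width_bins(reduced, "age", n_bins=3)
sample = sample_records(discretized, fraction=0.4)
print([r["age"] for r in discretized])  # [0, 1, 1, 0, 2]
```

Each helper returns new records instead of mutating its input, so the stages can be chained or reordered. The IQR rule and equal-width binning are simple defaults; alternatives such as z-score filtering or equal-frequency binning may suit skewed data better.
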