What is EDA in Data Science?

One objective of Exploratory Data Analysis (EDA) is to check a dataset for duplicates, incorrect values, and null (missing) values so they can be dealt with before modeling. When developing a model, EDA also helps identify the relevant characteristics (features) in the dataset and the unwanted noise that might impair the accuracy of the results.
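These checks can be sketched with pandas. The sketch assumes a small hypothetical DataFrame with `age` and `city` columns. It also treats ages outside 0 to 120 as "incorrect"; that range is an illustrative assumption, not a rule from the text.

```python
import pandas as pd

# Hypothetical toy data: one duplicate row, one null, one implausible age
df = pd.DataFrame({
    "age": [25, 32, 32, None, 240],
    "city": ["Paris", "Lyon", "Lyon", "Nice", "Paris"],
})

# Count rows that fully duplicate an earlier row
n_duplicates = int(df.duplicated().sum())

# Count missing (null/NaN) values per column
null_counts = df.isna().sum()

# Flag ages outside an assumed plausible range (NaN compares False, so nulls are not flagged)
out_of_range = df[(df["age"] < 0) | (df["age"] > 120)]

print("duplicate rows:", n_duplicates)
print("nulls per column:")
print(null_counts)
print("out-of-range ages:")
print(out_of_range)
```

Each check only reports a problem. Deciding whether to drop, fix, or keep the affected rows is a separate step.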

EDA is commonly used to achieve the following objectives:

  • Evaluating a single variable and examining patterns across time
  • Error-checking the data
  • Confirming assumptions
  • Analyzing the connections between variables
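Several of these objectives map onto a few pandas calls. The sketch below assumes a hypothetical DataFrame of 14 daily readings with `date`, `temperature`, and `sales` columns:

- `describe()` summarizes a single variable.
- Weekly resampling shows a pattern across time.
- `corr()` gives pairwise Pearson correlations between numeric columns.

```python
import pandas as pd

# Hypothetical daily data: 14 days of temperature and sales
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=14, freq="D"),
    "temperature": [5, 6, 7, 6, 8, 9, 10, 11, 10, 12, 13, 12, 14, 15],
    "sales": [50, 55, 60, 58, 65, 70, 75, 80, 78, 85, 90, 88, 95, 100],
})

# Single variable: count, mean, std, min, quartiles, max
summary = df["sales"].describe()
print(summary)

# Pattern across time: mean sales per calendar week (weeks end on Sunday)
weekly = df.set_index("date")["sales"].resample("W").mean()
print(weekly)

# Relationship between variables: Pearson correlation matrix
corr = df[["temperature", "sales"]].corr()
print(corr)
```

A high correlation here only indicates a linear association in this toy data. Confirming an assumption such as causation requires more than EDA.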

Much of the time in a machine learning project is spent on understanding and cleaning the data. The data is first collected, and then Exploratory Data Analysis (EDA) is performed. The more data you collect, the more features can potentially be generated from it.

The most popular basic Python libraries for EDA are:

  1. NumPy
  2. pandas
  3. Matplotlib
  4. Seaborn
  5. scikit-learn

For a machine learning model to work accurately, the data needs to be fed to it in the right form. During EDA, the data is cleaned and visualized so that it reaches that form.
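Here is a minimal sketch of what "the right form" can mean in practice. It assumes a hypothetical DataFrame with a numeric `age` column and a categorical `city` column, and it does three things:

1. Drops duplicate rows.
2. Fills missing ages with the median.
3. One-hot encodes the categorical column so every feature is numeric.

Median imputation and one-hot encoding are common choices, not the only ones.

```python
import pandas as pd

# Hypothetical raw data with a duplicate row and a missing value
raw = pd.DataFrame({
    "age": [25, 32, 32, None, 41],
    "city": ["Paris", "Lyon", "Lyon", "Nice", "Paris"],
})

# 1. Remove exact duplicate rows and renumber the index
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Fill missing ages with the median of the observed ages
clean["age"] = clean["age"].fillna(clean["age"].median())

# 3. One-hot encode the categorical column as 0/1 integer columns
features = pd.get_dummies(clean, columns=["city"], dtype=int)

print(features)
```

The resulting table is entirely numeric with no nulls, which is the form most scikit-learn estimators expect as input.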