Some common Machine Learning, Statistics and Data Science terms that start with E

E

[su_table]

|Word|Description|
|Early Stopping|Early stopping is a technique for avoiding overfitting when training a machine learning model with an iterative method. Training is halted once performance on a held-out validation set stops improving.

For example, in XGBoost, adding more and more trees eventually overfits the training dataset. Early stopping lets you specify a validation dataset and a number of iterations after which training should stop if the score on the validation dataset has not improved.|
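
The stopping rule described above can be sketched in plain Python. This is a minimal illustration, not XGBoost's implementation: the function name `train_with_early_stopping`, the `patience` parameter, and the list of validation scores are all hypothetical, and the sketch assumes that a higher score is better.

```python
def train_with_early_stopping(validation_scores, patience=3):
    """Walk through per-round validation scores and stop once the score
    has not improved for `patience` consecutive rounds.

    `validation_scores` stands in for the score a real training loop
    would compute on the held-out set after each round (assumption).
    Returns (best_round, best_score, rounds_run).
    """
    best_score = float("-inf")
    best_round = -1
    rounds_run = 0
    for round_idx, score in enumerate(validation_scores):
        rounds_run = round_idx + 1
        if score > best_score:
            # New best score: remember it and reset the waiting period.
            best_score = score
            best_round = round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop training.
            break
    return best_round, best_score, rounds_run


# Hypothetical validation scores: they plateau after round 3.
scores = [0.70, 0.74, 0.77, 0.78, 0.78, 0.77, 0.76, 0.79, 0.80]
best_round, best_score, rounds_run = train_with_early_stopping(scores, patience=3)
print(best_round, best_score, rounds_run)
```

Note that the loop stops before it ever sees the later scores of 0.79 and 0.80. That is the trade-off behind the patience value: a small value saves training time but may stop before a late improvement.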
|EDA|EDA, or exploratory data analysis, is the phase of a data science pipeline in which the focus is on understanding the data and drawing insights from it through visualization or statistical analysis.

The steps involved in EDA are:

  1. Variable identification: in this step, we identify the data type and category of each variable
  2. Univariate analysis
  3. Multivariate analysis|

|ETL|ETL is the acronym for Extract, Transform and Load. An ETL system has the following properties:

  • It extracts data from the source systems
  • It enforces data quality and consistency standards
  • It delivers data in a presentation-ready format

This data can then be used by application developers to build applications and by end users to make decisions.|
|Evaluation Metrics|The purpose of an evaluation metric is to measure the quality of a statistical or machine learning model. A few examples of evaluation metrics are:

  1. AUC
  2. ROC score
  3. F-Score
  4. Log-Loss|
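
To make the metrics above concrete, here is a minimal sketch that computes F-score, log-loss, and AUC for a binary classifier using only the Python standard library. The function names, the 0.5 decision threshold, and the sample labels and probabilities are assumptions made for illustration; in practice a library such as scikit-learn would normally be used.

```python
import math


def f_score(y_true, y_pred, beta=1.0):
    """F-beta score from 0/1 labels and 0/1 predictions (beta=1 gives F1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print("F1:", f_score(y_true, y_pred))
print("Log-loss:", log_loss(y_true, y_prob))
print("AUC:", auc(y_true, y_prob))
```

F-score depends on the chosen threshold because it works on hard predictions, whereas log-loss and AUC are computed directly from the predicted probabilities.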