Common Machine Learning, Statistics, and Data Science Terms That Start with E

vishrut-singhal · 28 May 2021 17:47

E

[su_table]

### Word	### Description
Early Stopping	Early stopping is a technique for avoiding overfitting when training a machine learning model with an iterative method. Training is configured to stop once performance on a held-out validation set stops improving.

For example, in XGBoost, as you train more and more trees, the model will eventually overfit the training dataset. Early stopping lets you specify a validation dataset and a number of iterations; training stops if the score on the validation dataset has not improved within that many iterations.|
|EDA|EDA, or exploratory data analysis, is a phase of the data science pipeline in which the focus is on understanding the data and drawing insights from it through visualization or statistical analysis.

The steps involved in EDA are:

Variable identification: in this step, we identify the data type and category of each variable
Univariate analysis
Multivariate analysis

|ETL|ETL is the acronym for Extract, Transform and Load. An ETL system has the following properties:

It extracts data from the source systems
It enforces data quality and consistency standards
It delivers data in a presentation-ready format

Application developers can use this data to build applications, and end users can use it to make decisions.|
|Evaluation Metrics|The purpose of an evaluation metric is to measure the quality of a statistical or machine learning model. A few examples of evaluation metrics are:

AUC-ROC (area under the ROC curve)
F-Score
Log-Loss|
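
To make the Evaluation Metrics entry more concrete, here is a minimal sketch in plain Python of three of the metrics listed above for a binary classifier. The function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration; in practice you would normally rely on a well-tested library implementation.

```python
import math


def f1_score(y_true, y_pred):
    """F-Score with beta = 1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
ll = log_loss(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
```

Note that the F-Score is computed from hard class predictions, while Log-Loss and AUC use the predicted probabilities directly, so the choice of threshold only affects the first of the three.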
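
The stopping rule described in the Early Stopping entry above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `early_stopping_round`, the `patience` parameter, and the list of validation losses are hypothetical, and lower loss is assumed to be better. XGBoost exposes the same idea through its `early_stopping_rounds` setting.

```python
from typing import Iterable, Tuple


def early_stopping_round(
    validation_losses: Iterable[float], patience: int
) -> Tuple[int, float]:
    """Return (best_round, best_loss).

    Iteration stops once the validation loss has failed to improve
    for `patience` consecutive rounds.
    """
    best_loss = float("inf")
    best_round = -1
    for round_idx, loss in enumerate(validation_losses):
        if loss < best_loss:
            # New best score: remember it and keep training.
            best_loss, best_round = loss, round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop training.
            break
    return best_round, best_loss


# Hypothetical validation losses, one per training round.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.40]
best_round, best_loss = early_stopping_round(losses, patience=3)
```

With `patience=3`, the loop stops after three rounds without improvement on the best loss seen at round 3, so the later value of 0.40 is never reached. That is the trade-off of the technique: a small patience saves training time but can stop before a late improvement.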