E
[su_table]
### Word | ### Description |
---|---|
|Early Stopping|Early stopping is a technique for avoiding overfitting when training a machine learning model with an iterative method. It is configured so that training stops once performance on a held-out validation set has stopped improving.
For example, in XGBoost, as you train more and more trees, you will eventually overfit your training dataset. Early stopping lets you specify a validation dataset and a number of iterations after which the algorithm stops if the score on the validation dataset has not improved.|
|EDA|EDA, or exploratory data analysis, is a phase of the data science pipeline in which the focus is on understanding the data through visualization or statistical analysis.
The steps involved in EDA are:
- Variable identification: in this step, we identify the data type and category of each variable
- Univariate analysis
- Multivariate analysis|
|ETL|ETL is the acronym for Extract, Transform and Load. An ETL system has the following properties:
- It extracts data from the source systems
- It enforces data quality and consistency standards
- It delivers data in a presentation-ready format
This data can be used by application developers to build applications and by end users to make decisions.|
|Evaluation Metrics|The purpose of an evaluation metric is to measure the quality of a statistical or machine learning model. For example, below are a few evaluation metrics:
- AUC
- ROC score
- F-Score
- Log-Loss|
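
To make two of the metrics listed above concrete, and to show how such a metric can drive the early stopping rule described earlier in this section, here is a minimal sketch in plain Python. The function names (`f_score`, `log_loss`, `early_stopping_round`) and the `patience` parameter are illustrative choices, not the API of XGBoost or any other library. The sketch assumes binary labels (1 = positive, 0 = negative) and a validation metric where lower is better, such as log-loss.

```python
import math


def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels (beta=1 gives the usual F1 score)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: report 0 rather than divide by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def early_stopping_round(val_losses, patience):
    """Index of the round at which training would stop.

    Stops once the validation loss has failed to improve on its best value
    for `patience` consecutive rounds. If that never happens, the index of
    the last round is returned. Assumes `val_losses` is non-empty.
    """
    best = float("inf")
    rounds_without_improvement = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                return i
    return len(val_losses) - 1


# Hypothetical validation losses, one per boosting round: the loss improves
# for three rounds and then starts to rise as the model overfits.
history = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63]
stop_at = early_stopping_round(history, patience=2)  # stops at round index 4
```

In practice a library would also keep the model from the best round (index 2 in this example) rather than from the round where training stopped. For a metric where higher is better, such as AUC or F-score, the comparison would be reversed.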