Evaluation is good practice in any field, and in machine learning it is essential. The following are the criteria to keep in mind when evaluating a machine learning model.
 Confusion Matrix: It is an N x N matrix, where N is the number of classes or categories to be predicted. Suppose our practice problem is a binary classification. Then N = 2, so we get a 2 x 2 matrix.
There are 4 terms you should keep in mind:
 True Positives: It is the case where we predicted Yes and the real output was also Yes.
 True Negatives: It is the case where we predicted No and the real output was also No.
 False Positives: It is the case where we predicted Yes but it was actually No.
 False Negatives: It is the case where we predicted No but it was actually Yes.
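
To make the four terms concrete, here is a minimal sketch that counts them for a small binary problem. The labels (1 for Yes, 0 for No), the sample data, and the function name confusion_counts are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = Yes, 0 = No)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted Yes, actually Yes
        elif predicted == 0 and actual == 0:
            tn += 1  # predicted No, actually No
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted Yes, actually No
        else:
            fn += 1  # predicted No, actually Yes
    return tp, tn, fp, fn


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# One common 2 x 2 layout: rows are actual classes, columns are predicted classes.
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)
```

The four counts always add up to the number of samples, which is a quick sanity check on any confusion matrix.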

Classification Accuracy: This is what we generally mean whenever we use the term accuracy. It is the ratio of the number of correct predictions to the total number of input samples.
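
As a rough sketch, the ratio can be computed in a few lines. The sample labels and the function name accuracy are made up for this example.

```python
def accuracy(y_true, y_pred):
    """Ratio of correct predictions to the total number of samples."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred)
                  if actual == predicted)
    return correct / len(y_true)


# Made-up labels for illustration: 6 of the 8 predictions match.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

acc = accuracy(y_true, y_pred)
print(acc)  # 6 / 8 = 0.75
```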

Logarithmic loss: It is also known as log loss. Its basic working principle is to penalize false classifications, and the penalty grows the more confident the classifier was in the wrong answer. It usually works well with multiclass classification. To use log loss, the classifier must assign a probability to each and every class for all the samples.
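
The sketch below shows one way log loss could be computed for a three-class problem, assuming each sample comes with a list of per-class probabilities. The data, the function name log_loss, and the eps clipping value are illustrative assumptions.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Average negative log of the probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a probability is exactly 0 or 1.
        p = min(max(row[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Made-up data: three samples whose true classes are 0, 1 and 2.
y_true = [0, 1, 2]
confident_right = [[0.90, 0.05, 0.05],
                   [0.05, 0.90, 0.05],
                   [0.05, 0.05, 0.90]]
confident_wrong = [[0.05, 0.90, 0.05],
                   [0.90, 0.05, 0.05],
                   [0.05, 0.05, 0.90]]

loss_right = log_loss(y_true, confident_right)
loss_wrong = log_loss(y_true, confident_wrong)
print(loss_right, loss_wrong)
```

The second set of predictions gets two of the three samples confidently wrong, so its loss is much larger than the first.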

Area under Curve: AUC is one of the most widely used metrics and is mainly used for binary classification. The AUC of a classifier is the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. The curve in question is the ROC curve, which is built from two basic terms: the true positive rate, TP / (TP + FN), and the false positive rate, FP / (FP + TN), plotted against each other at different classification thresholds.
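
The ranking definition of AUC can be checked directly by comparing every positive example with every negative one. This is a minimal sketch on made-up scores. The function name auc_by_ranking is hypothetical, and counting ties as half a win is a common convention assumed here.

```python
def auc_by_ranking(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed convention: a tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up labels and classifier scores for illustration.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

auc = auc_by_ranking(y_true, scores)
print(auc)  # 5 of the 6 positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, so it is only meant to illustrate the definition on small data.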

F1 score: It is the harmonic mean of precision and recall. Its range is [0, 1]. This metric tells us how precise our classifier is (how many of the instances it labels positive are actually positive) and how robust it is (it does not miss a significant number of positive instances).
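
Here is a small sketch of how precision, recall and the F1 score follow from the confusion matrix counts. The counts used (8 true positives, 2 false positives, 4 false negatives) and the function name precision_recall_f1 are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for illustration.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(precision, recall, f1)
```

With these counts precision is 8/10 and recall is 8/12, and the F1 score of 8/11 sits closer to the lower of the two, which is what the harmonic mean is meant to do.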

Mean Absolute Error: It is the average of the absolute differences between the predicted and original values. It tells us how far our predictions are from the actual output. However, there is one limitation: it doesn't give any idea about the direction of the error, that is, whether we are underpredicting or overpredicting.
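
The direction limitation is easy to see in a small sketch. The values below are made up, and the function name mean_absolute_error is chosen only for this example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: one model always overpredicts by 1, the other underpredicts by 1.
y_true = [3.0, 5.0, 2.0, 7.0]
overpredicted = [4.0, 6.0, 3.0, 8.0]
underpredicted = [2.0, 4.0, 1.0, 6.0]

mae_over = mean_absolute_error(y_true, overpredicted)
mae_under = mean_absolute_error(y_true, underpredicted)
print(mae_over, mae_under)  # both are 1.0
```

Both models get the same score even though their errors point in opposite directions.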

Mean Squared Error: It is similar to mean absolute error, but it takes the average of the squares of the differences between the predicted and original values. The main advantage of this metric is that its gradient is easy to calculate, whereas mean absolute error is not differentiable where the error is zero, so calculating its gradient needs extra handling. Because the errors are squared, larger errors are also penalized more heavily.
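
A minimal sketch of mean squared error and its gradient with respect to each prediction is shown below. The data and the function names mean_squared_error and mse_gradient are assumptions for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def mse_gradient(y_true, y_pred):
    """Derivative of MSE with respect to each prediction: 2 * (p - a) / n."""
    n = len(y_true)
    return [2 * (p - a) / n for a, p in zip(y_true, y_pred)]


# Made-up values for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
grad = mse_gradient(y_true, y_pred)
print(mse)   # (0.25 + 0 + 4 + 1) / 4 = 1.3125
print(grad)  # [-0.25, 0.0, 1.0, 0.5]
```

The gradient is a simple linear function of the error, and its sign shows the direction in which each prediction should move.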