What do you understand about Information Gain? Also, explain the mathematical formulation associated with it.

Information gain is the difference between the entropy of a data segment before a split and its entropy after the split, i.e., the reduction in impurity obtained by splitting on a selected attribute.

Some points to keep in mind about information gain:

  • A large difference represents high information gain.
  • A larger difference implies lower entropy across the data segments resulting from the split.
  • Thus, the higher the difference, the higher the information gain, and the better the feature used for the split.

Mathematically, information gain can be computed as follows:

Information Gain = E(S1) – E(S2)

E(S1) denotes the entropy of data belonging to the node before the split.

E(S2) denotes the weighted sum of the entropies of the child nodes, where each weight is the proportion of data instances falling in that child node.
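
As a rough illustration, the computation above can be sketched in Python. This is a minimal sketch, not a reference implementation: the function names entropy and information_gain and the toy labels are hypothetical, and entropy is assumed to be Shannon entropy with log base 2 (the answer above does not fix a particular entropy formula).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits (assumed definition) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """E(S1) - E(S2): parent entropy minus the weighted entropy of the children."""
    total = len(parent)
    # Each child is weighted by the proportion of instances that fall into it.
    weighted = sum((len(child) / total) * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy node with 5 "yes" and 5 "no" instances.
parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely.
perfect_split = [["yes"] * 5, ["no"] * 5]

# A split whose children keep the same class mix as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes"] * 3 + ["no"] * 3]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

Under these assumptions, the split that separates the classes completely yields the largest possible gain for this node, while the split that leaves every child with the parent's class mix yields no gain, which matches the points listed above.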