Information gain is the difference between the entropy of a data segment before a split and the entropy after the split, i.e., the reduction in impurity achieved by splitting on a chosen attribute.
Some points to keep in mind about information gain:
- A larger difference means higher information gain.
- A larger difference also implies lower entropy in the data segments produced by the split.
- Thus, the higher the information gain, the better the feature is for the split.
Mathematically, information gain is computed as follows:
Information Gain = E(S1) – E(S2)
- E(S1) denotes the entropy of the data at the node before the split.
- E(S2) denotes the weighted sum of the entropies of the child nodes, where each child's weight is the proportion of data instances that fall into it.
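As a concrete illustration, here is a minimal NumPy sketch of this computation (the function names `entropy` and `information_gain` are ours, not from any particular library):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(parent_labels, child_label_groups):
    """E(S1) minus E(S2): parent entropy minus the
    proportion-weighted sum of child-node entropies."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        (len(child) / n) * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Example: a binary-labelled node split into two pure children.
parent = np.array([1, 1, 1, 0, 0, 0, 1, 0])    # E(S1) = 1.0 bit
left = np.array([1, 1, 1, 1])                  # entropy 0
right = np.array([0, 0, 0, 0])                 # entropy 0
print(information_gain(parent, [left, right])) # 1.0, the maximum possible gain
```

In the example, the split separates the two classes perfectly, so E(S2) is 0 and the information gain equals the parent's full entropy of 1 bit.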