A node’s Gini impurity is generally lower than its parent’s, because the CART training algorithm’s cost function splits each node in a way that minimizes the weighted sum of its children’s Gini impurities. However, a node can still have a higher Gini impurity than its parent, as long as this increase is more than compensated for by a decrease in the other child’s impurity.
The following example illustrates this:
Consider a node containing four samples of class A and one sample of class B.
Its Gini impurity is $1 - (1/5)^2 - (4/5)^2 = 0.32$.
Now suppose the dataset is one-dimensional and the instances are lined up in the following order: A, B, A, A, A. We can verify that the algorithm will split this node after the second instance, producing one child node with instances A, B, and the other child node with instances A, A, A.
The first child node’s Gini impurity is $1 - (1/2)^2 - (1/2)^2 = 0.5$, which is higher than its parent’s. This is compensated for by the fact that the other node is pure, so the overall weighted Gini impurity of the two children is $\frac{2}{5} \times 0.5 + \frac{3}{5} \times 0 = 0.2$, which is lower than the parent’s Gini impurity of 0.32.
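The arithmetic above can be checked with a short Python sketch. It is only an illustration, not how any particular library implements CART: the helper names `gini` and `weighted_gini` are made up for this example, and the split search is simplified to trying every cut point in the ordered list A, B, A, A, A.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class ratios."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Weighted sum of the two children's Gini impurities (the quantity CART minimizes)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical one-dimensional node from the example, already sorted by feature value.
samples = ["A", "B", "A", "A", "A"]

print(f"parent Gini: {gini(samples):.3f}")

# Try every cut point: the left child gets the first i instances, the right child the rest.
for i in range(1, len(samples)):
    left, right = samples[:i], samples[i:]
    print(
        f"split after instance {i}: "
        f"left={gini(left):.3f}, right={gini(right):.3f}, "
        f"weighted={weighted_gini(left, right):.3f}"
    )

# The cut point with the lowest weighted impurity is the one the example describes.
best = min(
    range(1, len(samples)),
    key=lambda i: weighted_gini(samples[:i], samples[i:]),
)
print(f"best split: after instance {best}")
```

Under these assumptions, the scan selects the split after the second instance, where the left child (A, B) is more impure than the parent even though the weighted impurity of the two children is lower.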