What is pruning in a decision tree?
Pruning is applied to a decision tree after the training phase (this form is often called post-pruning). We first let the tree grow freely, as far as its settings allow, without extra restrictions aimed at limiting its size. We then cut the branches that are not sufficiently populated, that is, branches reached by only a few training samples, to avoid overfitting the training data. Such branches are likely fitting a handful of unusual or noisy points rather than a general pattern, so removing them should help the tree generalize to new, unseen data.
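The grow-then-prune idea can be sketched in Python. This is a minimal illustration under stated assumptions, not a library implementation: the `Node` class and the `grow`, `prune_sparse`, `predict`, and `count_leaves` functions are hypothetical names, splits use Gini impurity on numeric features, and a branch counts as "not sufficiently populated" when either child of a split holds fewer than `min_samples` training samples (an assumed threshold).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:  # hypothetical tree node for this sketch
    labels: list                     # training labels that reached this node
    feature: Optional[int] = None    # split feature index; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def majority(self):
        return Counter(self.labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(X, y):
    """Grow with no size restriction: keep splitting until every leaf is pure."""
    node = Node(labels=list(y))
    if len(set(y)) == 1:
        return node
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # identical inputs with mixed labels: no split possible
        return node
    _, node.feature, node.threshold, left, right = best
    node.left = grow([X[i] for i in left], [y[i] for i in left])
    node.right = grow([X[i] for i in right], [y[i] for i in right])
    return node

def prune_sparse(node, min_samples):
    """Post-prune bottom-up: collapse any split with a child holding < min_samples."""
    if node.feature is None:
        return
    prune_sparse(node.left, min_samples)
    prune_sparse(node.right, min_samples)
    if min(len(node.left.labels), len(node.right.labels)) < min_samples:
        node.feature, node.left, node.right = None, None, None  # now a leaf

def predict(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority()

def count_leaves(node):
    if node.feature is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)

X = [[x] for x in range(1, 11)]
y = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]   # x=3 carries a lone, probably noisy, label
tree = grow(X, y)
print(count_leaves(tree), predict(tree, [3]))   # 4 1
prune_sparse(tree, min_samples=2)
print(count_leaves(tree), predict(tree, [3]))   # 3 0
```

The fully grown tree gives the single point x=3 its own leaf; after pruning, that thinly populated leaf is merged into its parent, which predicts the local majority label 0. A raw sample count is only one simple criterion; the methods listed below use error-based criteria instead.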
Pruning is the process of removing unnecessary nodes from a tree to obtain a smaller tree that generalizes better. A tree that is too large increases the risk of overfitting, while a tree that is too small may fail to capture important structure in the dataset. Pruning therefore aims to reduce the size of the tree without reducing its accuracy on unseen data.
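One standard way to make this size-versus-accuracy trade-off precise is the cost-complexity criterion used by cost complexity pruning (from CART). For a tree $T$, let $R(T)$ be its training misclassification error summed over its leaves, $|\tilde{T}|$ its number of leaves, and $\alpha \ge 0$ a penalty per leaf:

$$R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|$$

With $\alpha = 0$ the full tree is preferred; larger $\alpha$ favors smaller subtrees. For an internal node $t$ with subtree $T_t$, where $R(t)$ is the error if $t$ were turned into a leaf, the "weakest link" is the node with the smallest

$$g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}$$

Repeatedly pruning the weakest link yields a nested sequence of subtrees, and the final tree is usually chosen from that sequence using a validation set or cross-validation.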
There are two main types of tree pruning techniques:
- Cost Complexity Pruning
- Reduced Error Pruning
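Reduced error pruning can be sketched as follows, assuming a separate validation set held out from training. Working bottom-up, each internal node is tentatively replaced by a leaf that predicts the majority training label at that node, and the change is kept if validation accuracy does not drop. The `Node` structure, the function names, the hand-built tree, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:  # hypothetical tree node for this sketch
    labels: list                     # training labels that reached this node
    feature: Optional[int] = None    # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def majority(self):
        return Counter(self.labels).most_common(1)[0][0]

def leaf(labels):
    return Node(labels=labels)

def predict(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority()

def accuracy(root, X, y):
    return sum(predict(root, xi) == yi for xi, yi in zip(X, y)) / len(y)

def reduced_error_prune(root, node, X_val, y_val):
    """Bottom-up: turn node into a leaf unless that lowers validation accuracy."""
    if node.feature is None:
        return
    reduced_error_prune(root, node.left, X_val, y_val)
    reduced_error_prune(root, node.right, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    saved = (node.feature, node.left, node.right)
    node.feature, node.left, node.right = None, None, None   # tentatively collapse
    if accuracy(root, X_val, y_val) < before:
        node.feature, node.left, node.right = saved          # revert: pruning hurt

# An overgrown tree on one feature: the split at x <= 3 isolates a single
# training point (x=3, label 1) that is probably noise.
tree = Node([0, 0, 1, 0, 0, 1, 1, 1, 1, 1], 0, 5,
            Node([0, 0, 1, 0, 0], 0, 2,
                 leaf([0, 0]),
                 Node([1, 0, 0], 0, 3, leaf([1]), leaf([0, 0]))),
            leaf([1] * 5))

X_val = [[1.5], [3], [4.5], [7], [9]]
y_val = [0, 0, 0, 1, 1]
reduced_error_prune(tree, tree, X_val, y_val)
print(tree.left.feature, tree.right.feature)   # None None
print([predict(tree, x) for x in X_val])       # [0, 0, 0, 1, 1]
```

Here the two splits below the root are removed because validation accuracy stays the same or improves, while collapsing the root is reverted because it would drop accuracy. Recomputing accuracy over the whole validation set at every node keeps the sketch short; a practical implementation would typically re-evaluate only the validation samples that reach the node being considered.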