What is Pruning in Decision Trees, and How Is It Done?

Pruning is a technique in machine learning that reduces the size of decision trees. It reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.

Pruning can occur in:

  • Top-down fashion. It will traverse nodes and trim subtrees starting at the root
  • Bottom-up fashion. It will begin at the leaf nodes

There is a popular pruning algorithm called reduced error pruning, in which:

  • Starting at the leaves, each node is replaced with its most popular class
  • If the prediction accuracy is not affected, the change is kept
  • There is an advantage of simplicity and speed