Detailed Example 01: How Decision Trees Decide Which Feature to Split#
Consider an example where we are building a decision tree to predict whether a loan given to a person will result in a write-off or not. Our entire population consists of 30 instances: 16 belong to the write-off class and the other 14 belong to the non-write-off class. We have two features: “Balance”, which can take two values (“<50K” or “>50K”), and “Residence”, which can take three values (“OWN”, “RENT”, or “OTHER”).
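To make the split decision concrete, here is a minimal NumPy sketch that computes entropy and information gain for this example. The per-branch class counts (e.g., 12 write-offs vs. 1 non-write-off under Balance “<50K”) are hypothetical illustrations chosen only to be consistent with the 16/14 totals above; the section itself does not list them. The feature whose split yields the larger information gain is the one the tree picks.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (base 2) of a vector of class counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]          # skip empty classes to avoid log2(0)
    return -np.sum(p * np.log2(p))

def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = sum(sum(c) for c in child_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted

# Parent node: 16 write-off vs. 14 non-write-off (the totals stated above).
parent = [16, 14]

# Hypothetical per-branch counts for the two candidate splits -- purely
# illustrative, since the text only gives the overall class totals.
balance_children   = [[12, 1], [4, 13]]        # "<50K" vs. ">50K"
residence_children = [[7, 3], [6, 8], [3, 3]]  # "OWN", "RENT", "OTHER"

print("IG(Balance):  ", information_gain(parent, balance_children))
print("IG(Residence):", information_gain(parent, residence_children))
# The tree splits on the feature with the larger information gain.
```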
Overfitting and Pruning#
Overfitting in Decision Tree#
Decision trees are prone to overfitting: they can grow too complex and capture noise in the training data (fitting it perfectly) while performing poorly on unseen data (poor generalization). This typically happens with very deep trees whose many nodes achieve perfect classification on the training set (pure leaf nodes).
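As a quick illustration, the sketch below (assuming scikit-learn’s DecisionTreeClassifier and a synthetic noisy dataset from make_classification, neither of which is part of the loan example above) grows an unconstrained tree and compares training and test accuracy; the gap between the two is the overfitting described here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A noisy synthetic dataset: flip_y injects label noise that a deep tree will memorize.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No depth or leaf-size limits: the tree keeps splitting until every leaf is pure.
deep_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print("Tree depth:       ", deep_tree.get_depth())
print("Training accuracy:", deep_tree.score(X_train, y_train))  # typically 1.0
print("Test accuracy:    ", deep_tree.score(X_test, y_test))    # noticeably lower
```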
Pruning Methods#
Pruning is the technique used to combat overfitting by reducing the size of the tree. There are two main strategies:
Pre-pruning (Early Stopping): Stop growing the tree before it becomes too complex by setting parameters such as:
Maximum depth of the tree.
Minimum number of samples required to split a node.
Minimum number of samples required to be in a leaf node.
Post-pruning: Grow a full tree and then remove branches that do not improve accuracy on validation data. The most common technique is Reduced Error Pruning. Both strategies are illustrated in the sketch below.
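A minimal scikit-learn sketch of both strategies, assuming a synthetic dataset. Note that scikit-learn’s built-in post-pruning is cost-complexity pruning (controlled by ccp_alpha) rather than Reduced Error Pruning, but the workflow is analogous: grow a full tree, then keep the pruned version that scores best on held-out validation data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: constrain tree growth up front with the parameters listed above.
pre_pruned = DecisionTreeClassifier(
    max_depth=4,           # maximum depth of the tree
    min_samples_split=20,  # minimum samples required to split an internal node
    min_samples_leaf=10,   # minimum samples required in a leaf node
    random_state=0,
).fit(X_train, y_train)
print("Pre-pruned validation accuracy:", pre_pruned.score(X_val, y_val))

# Post-pruning: grow a full tree, then prune it back. scikit-learn implements
# cost-complexity pruning (ccp_alpha); we pick the alpha whose pruned tree
# scores best on the validation set.
full_tree = DecisionTreeClassifier(random_state=0)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score
print(f"Best ccp_alpha={best_alpha:.4f}, validation accuracy={best_score:.3f}")
```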