How Decision Trees Decide Which Feature to Split (Using Entropy and Information Gain)#

Decision Trees use Entropy and Information Gain to decide how to split the data at each node. The goal is to create subgroups (child nodes) that are as pure as possible with respect to the target variable.

The core idea of a Decision Tree is recursive splitting: at each node, we pick the feature that best separates the data into purer subgroups.
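
For reference, the Entropy of a node S is computed from the proportions p_i of each class present in that node (measured in bits):

\[ Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i \]

A node with a 50/50 class mix has entropy 1 bit, while a perfectly pure node has entropy 0.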


Step-by-Step Process#

  1. Start at the root node

    • The root node contains the entire dataset.

    • Calculate the Entropy of the root node to measure the overall uncertainty.

  2. Evaluate each feature

    • For each feature, split the dataset according to its possible values.

    • Compute the entropy of each child node and the weighted average entropy:

\[ Entropy_{after\_split} = \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v) \]

    • Calculate Information Gain:

\[ IG(S, A) = Entropy(S) - Entropy_{after\_split} \]
    • A higher IG means the feature produces purer child nodes (a worked Python sketch follows this list).

  3. Choose the best feature to split

    • Compare the Information Gain of all features.

    • Select the feature with the highest IG for splitting.

    • This ensures each split removes as much uncertainty as possible, creating child nodes that are more homogeneous than the parent.

  4. Repeat recursively for each child node

    • Treat each child node as a new node and repeat Steps 1–3 (a minimal recursive sketch follows this list).

    • Continue until:

      • Nodes are pure (entropy = 0), or

      • A stopping condition is reached (e.g., max depth, min samples per node).
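
The short Python sketch below (referenced in Steps 2–3) walks through the calculation on a small made-up weather-style dataset; the feature names, values, and labels are purely illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """IG(S, A) = Entropy(S) - weighted average entropy of the child nodes."""
    parent_entropy = entropy(labels)
    # Group the labels by the value of the chosen feature (one child per value).
    children = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(child) / len(labels) * entropy(child)
                   for child in children.values())
    return parent_entropy - weighted

# Hypothetical toy dataset: features = (Outlook, Wind), target = Play
rows = [("Sunny", "Weak"), ("Sunny", "Strong"), ("Rain", "Weak"),
        ("Rain", "Strong"), ("Overcast", "Weak"), ("Overcast", "Strong")]
labels = ["No", "No", "Yes", "No", "Yes", "Yes"]

feature_names = ["Outlook", "Wind"]
gains = {name: information_gain(rows, labels, i)
         for i, name in enumerate(feature_names)}
print(gains)                                    # IG for every candidate feature
print("Split on:", max(gains, key=gains.get))   # feature with the highest IG
```

On this toy data, Outlook gives a much larger Information Gain than Wind (about 0.67 vs 0.08 bits), so it would be chosen for the first split.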

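Step 4 is this gain calculation wrapped in recursion with explicit stopping rules. A minimal sketch, reusing the entropy and information_gain helpers from the block above (the max_depth and min_samples values are illustrative defaults, not fixed rules):

```python
from collections import Counter  # also used by the helpers above

def build_tree(rows, labels, feature_names, depth=0, max_depth=3, min_samples=2):
    """Recursively split until a node is pure or a stopping condition is reached."""
    # Stopping conditions: pure node, too few samples, or maximum depth reached.
    if entropy(labels) == 0 or len(labels) < min_samples or depth >= max_depth:
        return {"leaf": True, "prediction": Counter(labels).most_common(1)[0][0]}

    # Steps 2-3: pick the feature with the highest Information Gain.
    gains = {i: information_gain(rows, labels, i) for i in range(len(feature_names))}
    best = max(gains, key=gains.get)

    # Split on that feature and recurse into each child node.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[best], []).append((row, label))

    return {
        "leaf": False,
        "feature": feature_names[best],
        "children": {
            value: build_tree([r for r, _ in grp], [l for _, l in grp],
                              feature_names, depth + 1, max_depth, min_samples)
            for value, grp in groups.items()
        },
    }
```

Calling build_tree(rows, labels, feature_names) on the toy data above returns a nested dictionary that splits on Outlook at the root.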

Key Idea#

  • Entropy tells us how mixed the node is.

  • Information Gain tells us how good a feature is at reducing uncertainty.

  • Entropy-based Decision Trees always pick the feature that maximizes Information Gain to make the next split.
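
In practice you rarely code this by hand. As a minimal usage sketch (assuming scikit-learn is installed), DecisionTreeClassifier accepts criterion="entropy", which makes each split maximize the entropy reduction described above, alongside the usual stopping conditions:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="entropy" selects splits by Information Gain;
# max_depth and min_samples_leaf are typical stopping conditions.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                             min_samples_leaf=2, random_state=0)
clf.fit(X, y)

print(clf.feature_importances_)  # overall impurity reduction attributed to each feature
```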