datarekha

Decision Trees: Entropy, Gini & Info Gain

A decision tree splits data to drive down impurity. Entropy and Gini measure that impurity; information gain picks the split — a recurring GATE DA NAT.

9 min read Intermediate GATE DA Lesson 90 of 122

What you'll learn

  • Entropy H = −Σ pᵢ·log2(pᵢ) and Gini = 1 − Σ pᵢ² as impurity measures
  • Information gain = parent impurity minus the weighted child impurity
  • Computing information gain for one split by hand, step by step
  • Why fully grown trees overfit, and how pruning or depth limits help

Before you start

A decision tree asks a sequence of yes/no questions — “is this feature above that threshold?” — and each answer sends a sample down a branch. The art is which question to ask: at every node the tree picks the split that leaves its two children as pure as possible (each child dominated by one class). “Pure” needs a number, and that number is an impurity measure. This same split-on-best-gain idea powers the random forests and gradient-boosted trees that still win most tabular-data problems in industry — so the arithmetic here is the engine inside tools you will actually ship.

Measuring impurity

For a node where class i holds a fraction pᵢ of the samples, two measures dominate GATE:

EntropyH = −∑ pᵢ log2 pᵢGiniG = 1 − ∑ pᵢ²Both are 0 for a pure node, maximal for a 50/50 split.
Two ways to score disorder. Entropy uses log2; Gini uses squared probabilities — they agree closely.
  • Entropy H = −Σ pᵢ·log2(pᵢ) — measured in bits. A pure node (p = 1) has H = 0; a 50/50 binary node has H = 1 (maximal disorder).
  • Gini G = 1 − Σ pᵢ² — a pure node has G = 0; a 50/50 node has G = 0.5. It is the sklearn default, being slightly cheaper than a logarithm.

A split is judged by how much it cuts impurity:

Information gain = H(parent) − Σ (weighted) H(child), where each child is weighted by the fraction of samples it receives.

Higher gain means a better split. Below, add axis-aligned cuts and watch each one carve a region, recolour by majority class, and drop the Gini impurity — the greedy best cut is hinted in green.

How GATE asks this

The bread-and-butter version is a NAT: you are handed a parent node and one proposed split into two children (with class counts), and asked for the information gain — or just the entropy or Gini of a single node. Numbers are kept clean (3/1 splits, 9/1 splits, 50/50 nodes) so the arithmetic is doable in a minute. Conceptual MCQ/MSQ questions probe the rest of the ceiling: a pure node has impurity 0, higher gain is the better split, and a fully grown tree overfits (high variance).

Worked example — information gain of a split

A parent node has 10 positive and 10 negative samples. A candidate split sends them to child A = [9 pos, 1 neg] and child B = [1 pos, 9 neg], each holding 10 samples. What is the information gain?

Step 1 — parent entropy. The parent is 10/20 and 10/20, a perfect 50/50:

H(parent) = −0.5·log2(0.5) − 0.5·log2(0.5)
          = −0.5·(−1) − 0.5·(−1) = 1.0

Step 2 — child entropy. Child A is p = 0.9 and 0.1. Using log2(0.9) ≈ −0.152 and log2(0.1) ≈ −3.322:

H(A) = −0.9·log2(0.9) − 0.1·log2(0.1)
     = −0.9·(−0.152) − 0.1·(−3.322)
     = 0.137 + 0.332 = 0.469

Child B is the mirror image (p = 0.1, 0.9), so H(B) = 0.469 as well.

Step 3 — weighted child entropy. Each child holds half the samples (weight 10/20 = 0.5):

Weighted H = 0.5·0.469 + 0.5·0.469 = 0.469

Step 4 — information gain.

IG = H(parent) − Weighted H = 1.0 − 0.469 = 0.531

So this split buys 0.531 bits of information — a large gain, because it turns a useless 50/50 node into two nearly pure children.

Quick check

Quick check

0/6
Q1A node contains 3 positive and 1 negative sample. What is its entropy in bits? (2 decimals)numerical answer — type a number
Q2A parent node (entropy 1.0) of 8 samples splits into a PURE child of 4 and another PURE child of 4. What is the information gain? (1 decimal)numerical answer — type a number
Q3What is the Gini impurity of a node that is 50% positive and 50% negative? (2 decimals)numerical answer — type a number
Q4Which statements about decision-tree impurity and splitting are TRUE? (select all that apply)select all that apply
Q5Entropy and Gini both measure node impurity. Which fact distinguishes them?
Q6A parent of 10 samples (entropy 1.0, a 5/5 split) splits into child A = [4 pos, 1 neg] and child B = [1 pos, 4 neg]. Each child has entropy ≈ 0.722. What is the information gain? (3 decimals)numerical answer — type a number

Practice this in an interview

All questions
What is the difference between Gini impurity and entropy as splitting criteria in decision trees?

Both measure node impurity but differ in computation and sensitivity. Gini is faster to compute and slightly favors larger partitions, while entropy (information gain) is more sensitive to class probability changes near 0.5. In practice the splits they produce are nearly identical.

What is information gain and how does it relate to entropy in a decision tree split?

Information gain measures how much a split reduces uncertainty (entropy) in the target variable. It is the difference between the parent node's entropy and the weighted average entropy of the child nodes. The split that maximises information gain is selected at each node.

Walk me through exactly how a decision tree chooses a split at each node.

At each node the algorithm iterates over every feature and every candidate threshold, scores each candidate split by the weighted impurity of the two child nodes, and selects the pair that gives the largest impurity reduction. It then recurses on each child until a stopping criterion is met.

What are the pitfalls of impurity-based feature importance in tree ensembles, and how do you get a more reliable estimate?

Impurity-based importance (mean decrease in impurity) is systematically biased toward high-cardinality and continuous features because they offer more candidate splits. Permutation importance and SHAP values are less biased alternatives that measure actual predictive contribution on held-out data.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content