datarekha

Binary Search Trees

A tree that keeps the binary-search invariant alive — left < node < right — so every search and insert follows a single root-to-leaf path instead of scanning all n elements.

8 min read Intermediate Data Structures & Algorithms Lesson 17 of 32

What you'll learn

  • How the BST invariant (left < node < right) makes search and insert O(h) by following one root-to-leaf path
  • Why height h equals log₂n for a balanced tree but degenerates to n when values are inserted in sorted order
  • How in-order traversal visits BST nodes in ascending order — a free sort at O(n)
  • What "balancing" means and why AVL trees and red-black trees exist to prevent the sorted-input worst case

Before you start

A binary search tree is the simplest data structure that keeps data sorted at all times — not as a sorted array you rebuild, but as a live structural invariant you maintain with every insert and delete.

The rule is a single sentence: for every node, all values in the left subtree are smaller, and all values in the right subtree are larger.

That rule is the whole structure. Everything else — fast search, sorted traversal, efficient min/max — falls out of it automatically.

The invariant in action

Before reading further, build a tree. Insert a few numbers and watch how each one gets routed: at every node it visits, the algorithm asks one question — “am I smaller or larger?” — and takes one step left or right. The value lands as soon as there is no child in the chosen direction.

Notice that inserting the values in sorted order collapses the tree into a right-skewed chain — each new value is larger than all previous ones, so it always goes right. That is the degeneration problem balanced BSTs exist to solve. Load the “Sorted (degenerate)” preset to see it clearly, then compare the height vs. log₂n readout.

Insert and search: O(h)

Both insert and search follow the same logic — start at the root and walk toward a leaf, one comparison per level. Here h is the height of the tree — the number of edges on the longest root-to-leaf path.

class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    # equal → duplicate, ignore
    return root

def search(root, val):
    if root is None:
        return False
    if val == root.val:
        return True
    if val < root.val:
        return search(root.left, val)
    return search(root.right, val)

The number of comparisons is at most the height h of the tree — you visit one node per level from root to leaf. Nothing else is touched.

  • Balanced tree: h ≈ log₂n, so operations are O(log n). This holds only when the tree is kept balanced — a plain BST makes no such guarantee.
  • Degenerate tree (sorted inserts): h = n, so operations are O(n) — no better than a linked list scan.

Delete: the tricky case

Deleting is more involved because removing a node can break the invariant for everything below it.

Three cases:

  1. Leaf node (no children): just remove it.
  2. One child: splice the node out — connect its parent directly to its one child.
  3. Two children: find the in-order successor (the smallest value in the right subtree), swap its value into the node being deleted, then delete the successor from the right subtree. The successor is a leaf or has one right child, so you recurse into case 1 or 2.
def delete(root, val):
    if root is None:
        return None
    if val < root.val:
        root.left = delete(root.left, val)
    elif val > root.val:
        root.right = delete(root.right, val)
    else:
        # This is the node to delete.
        if root.left is None:
            return root.right          # case 1 or 2
        if root.right is None:
            return root.left           # case 2
        # Case 3: two children — find in-order successor.
        successor = root.right
        while successor.left:
            successor = successor.left
        root.val = successor.val
        root.right = delete(root.right, successor.val)
    return root

Delete is also O(h) — the recursive walk follows one path, and the successor search descends one more path within the right subtree.

In-order traversal gives sorted output

Walk the tree left → root → right and you visit every value in ascending order. This is not a coincidence: the BST invariant guarantees that every left-subtree value is smaller and every right-subtree value is larger, so the in-order sequence is sorted.

def inorder(root, result=None):
    if result is None:
        result = []
    if root is None:
        return result
    inorder(root.left, result)
    result.append(root.val)
    inorder(root.right, result)
    return result

This gives you a free O(n) sort as a side effect — insert n items in O(n log n) for a balanced tree, traverse in O(n), and you have a sorted list.

Playground: insert + inorder

This snippet builds a BST from an arbitrary list, traverses it in order, and prints the sorted result. Modify the input list to experiment.

Balancing: why AVL and red-black trees exist

The degeneration problem is real. Any time you insert pre-sorted data into a naive BST — a common scenario when building a tree from a sorted file, a sorted database column, or a time-ordered log — every operation becomes O(n).

Balanced BSTs add a rebalancing rule on top of the standard insert/delete:

  • AVL tree: after every insert or delete, check the height difference between left and right subtrees at each ancestor. If the difference exceeds 1, perform a rotation to restore balance. Guarantees height ≤ 1.44 log₂n, so all operations are always O(log n).
  • Red-black tree: uses a two-color labeling scheme (red/black nodes) with five structural rules. Less strictly balanced than AVL, but rotations are cheaper on average. Used in Python’s sortedcontainers.SortedList, Java’s TreeMap, and C++‘s std::map.

You will rarely implement a balanced BST from scratch in application code — standard library implementations are highly optimised. But you need to understand the problem they solve to choose the right tool.

Complexity summary

OperationBalanced (h = log n)Degenerate (h = n)
SearchO(log n)O(n)
InsertO(log n)O(n)
DeleteO(log n)O(n)
In-order traversalO(n)O(n)
Min / MaxO(log n)O(n)

The key insight: all per-node operations are O(h), and the choice of BST variant determines what h is.

Quick check

Quick check

0/3
Q1You insert the values [1, 2, 3, 4, 5] in that order into a plain BST. What is the resulting tree height?
Q2What does in-order traversal of a valid BST always produce?
Q3A balanced BST has n = 1,000,000 nodes. What is the worst-case number of comparisons for a search?

Practice this in an interview

All questions

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content