datarekha

Heaps & Priority Queues

How a heap keeps the best element always at the top — O(1) peek, O(log n) push/pop — and why it is the natural priority queue for Dijkstra, beam search, and streaming top-K.

8 min read Intermediate Data Structures & Algorithms Lesson 15 of 32

What you'll learn

  • How a heap stores a complete binary tree in a plain array with the index formula i → 2i+1, 2i+2
  • What the heap property means and how sift-up / sift-down restore it in O(log n)
  • Why build-heap is O(n) even though it calls sift-down for every node
  • How a size-K min-heap finds the K largest items in a stream in O(n log K) time

Before you start

A heap answers one question very cheaply: what is the best item right now?

Think of it as a tournament bracket. Every match is played locally — a parent always beats both its children. The winner of the whole tournament sits at the root, ready to grab in O(1). When that winner is removed, one re-seeding pass (O(log n)) restores the bracket. That is the whole idea.

The everyday intuition

A hospital triage desk is a min-heap. The patient with the lowest priority number (most urgent) is always at the front. When a new patient arrives, staff slot them in and bubble them up toward the front only as far as their priority demands. When the most urgent patient is called, the desk re-sorts in seconds, not minutes. No one ever scans the whole waiting room.

A task scheduler, a graph shortest-path algorithm, a search engine’s “top-10 results” buffer — all of them are the same desk.

The backing array

A heap is a complete binary tree stored in a plain array. No pointers. The structure comes from index arithmetic:

  • Node at index i has its left child at 2i + 1 and right child at 2i + 2.
  • Node at index i has its parent at ⌊(i − 1) / 2⌋.
  • The root is always at index 0.

A “complete” tree fills every level left-to-right with no gaps, so the array is always dense and cache-friendly.

The heap property

For a min-heap: every parent is less than or equal to both its children. The root is the global minimum.

For a max-heap: every parent is greater than or equal to both its children. The root is the global maximum.

The property does not say anything about the relationship between siblings or cousins — only the direct parent-child line matters.

Use Push / Pop mode to build a heap by hand and watch each sift-up and sift-down step. Toggle between min-heap and max-heap. Then switch to Top-K streaming to see the classic pattern in action.

Sift-up (push)

To push a new value:

  1. Append it to the end of the array.
  2. Compare it with its parent. If it violates the heap property (e.g., it is smaller than its parent in a min-heap), swap them.
  3. Repeat from the parent’s position.

Each swap moves the element one level closer to the root. A complete tree with n nodes has height ⌊log₂ n⌋, so at most ⌊log₂ n⌋ swaps are ever needed — this is O(log n).

Sift-down (pop)

To pop the root:

  1. Record the root value — that is the answer.
  2. Move the last element in the array to the root position (shrink the array by 1).
  3. Compare the new root with both children. If it violates the heap property, swap it with the better child (the smaller one in a min-heap).
  4. Repeat from the new position.

Again O(log n) — at most one path from root to leaf.

Build-heap: O(n)

Given an arbitrary array, you can turn it into a heap in O(n) time — not O(n log n) as you might expect — by running sift-down on every non-leaf node from the bottom up:

import heapq

data = [9, 4, 7, 1, 8, 3, 6, 2, 5]
heapq.heapify(data)   # in-place, O(n)
print(data[0])        # 1 — the minimum is now at the root

The proof that this is O(n) relies on the fact that most nodes are near the bottom, where sift-down paths are short. Concretely: about half the nodes are leaves (sift-down cost 0), about a quarter are one level up (cost at most 1), an eighth are two levels up (cost at most 2), and so on. Summing this geometric series gives O(n) total, not O(n log n).

A heap is a priority queue

The heap is the standard way to implement a priority queue: a collection that always serves the item with the highest (or lowest) priority next. Push is O(log n), pop is O(log n), peek is O(1).

Python’s heapq module exposes this directly — it is always a min-heap. For a max-heap, negate the values.

Top-K in O(n log K)

Classic interview problem: given a stream of n numbers, find the K largest.

The naive approach sorts everything: O(n log n). The heap approach is faster:

  • Maintain a min-heap of size K.
  • For each new number: if it is larger than the current minimum (root), pop the root and push the new number. Otherwise discard.
  • The heap always contains exactly the K largest numbers seen so far.

Each operation is O(log K), and you do at most n of them: O(n log K) total. When K is much smaller than n, this is a substantial win.

See it live

Run this to explore heapq directly: push values, pop the min, try nlargest, and run the top-K streaming pattern on a list of numbers.

Quick check

Quick check

0/4
Q1A node is at index 6 in a heap's backing array. What index is its left child?
Q2You have a min-heap. You call heappop(). What is the time complexity, and what value is returned?
Q3You are merging 100 sorted log files into one sorted output. You keep a min-heap of size 100 (one current entry per file). Each step: pop the smallest entry, write it, push the next entry from that file. What is the total time complexity if the combined output has N lines?
Q4You need to find the 10 largest values in a stream of 10 million numbers. Which approach is most efficient?

Practice this in an interview

All questions

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content