Big-O & Complexity
How to measure whether an algorithm will still work when your data is 1,000× bigger — the one concept that separates working code from scalable code.
What you'll learn
- What Big-O notation actually measures (growth rate, not raw speed)
- The six canonical complexity classes with one-line real-world examples
- How to read a nested loop and estimate its complexity by inspection
- The three most common Big-O pitfalls and why they matter in practice
Before you start
Big-O notation answers one question: as the input grows, how does the work grow?
It does not tell you how fast your laptop is, how good your cache is, or how clever your constants are. It tells you the shape of the growth curve — and that shape is the most important thing to know before you commit to an algorithm.
The everyday intuition
Imagine you are trying to find a friend’s address.
- Phone book, sorted by name: you open to the middle, decide which half they are in, repeat. That is O(log n) — each step cuts the problem in half.
- Pile of unsorted sticky notes: you have to read every single one. That is O(n) — the work grows linearly with the pile.
- Checking every pair of sticky notes to see which two are closest together: that is O(n²) — comparing every pair in a pile of 1,000 notes is about half a million comparisons.
The notation does not care whether the pile is sticky notes or a million-row database table. The shape is the same.
One rule makes all of this work: for large enough n, the biggest term dwarfs everything else. So we keep only that term and throw the rest away. Work of 3n² + 500n + 9000 is simply O(n²) — push n high enough and the n² term buries the others. It is the same reason O(2n) collapses to O(n): Big-O keeps the shape of the growth and discards constants and smaller terms.
The six classes
Think of these as a map of growth shapes. The first few are your everyday companions; the last one, O(2ⁿ), is the cliff you learn to design around. We will walk each one right after.
| Class | Informal name | One-line example |
|---|---|---|
| O(1) | Constant | Dictionary key lookup (d["key"]) |
| O(log n) | Logarithmic | Binary search in a sorted list |
| O(n) | Linear | A single for loop over all items |
| O(n log n) | Linearithmic | Python’s sorted(), merge sort |
| O(n²) | Quadratic | A nested for loop (compare every pair) |
| O(2ⁿ) | Exponential | Generating all subsets of a set |
Drag the slider to n = 32 and notice: O(2ⁿ) has already reached over 4 billion operations while O(n log n) is sitting at 160. Toggle to the log y-axis to see the exponential curve squished enough to read the others. Click a curve in the legend to isolate it.
Walking each class
O(1) — constant time
The work does not change regardless of input size.
cache = {"user_42": {"name": "Alice", "score": 98}}
def get_user(uid):
return cache[uid] # hash lookup — always one step
One step whether the dictionary has 10 entries or 10 million.
O(log n) — logarithmic
Each step halves the remaining search space.
def binary_search(arr, target):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return -1
For a sorted list of 1,000,000 items, this takes at most 20 comparisons (log₂ 1,000,000 ≈ 20).
Why 20? Each comparison throws away half of what’s left: 1,000,000 → 500,000 → 250,000 → … → 1. You can only halve a million about 20 times before nothing remains. That count — how many times you can halve n — is exactly what log₂ n means. Logarithmic time is the signature of “halve the problem every step.”
O(n) — linear
One pass through the data.
def find_max(nums):
best = nums[0]
for x in nums: # visits every element once
if x > best:
best = x
return best
Double the list, double the work. Straightforward.
O(n log n) — linearithmic
The practical sorting floor. Linearithmic just means linear × logarithmic: you do an O(n) pass through the data, and you repeat it about log n times. Merge sort is the clearest picture of this — it splits the list in half log n times (that’s the log n), and stitching each level back together touches every element once (that’s the n). Timsort (Python’s default) achieves the same bound.
data = [5, 2, 8, 1, 9, 3]
data.sort() # O(n log n) — Tim sort under the hood
For 1,000,000 items this is about 20,000,000 operations, not 1,000,000,000,000. The difference is the entire reason efficient sorting algorithms exist.
O(n²) — quadratic
Nested loops over the same input.
def has_duplicate_pair(nums, target):
for i in range(len(nums)):
for j in range(i + 1, len(nums)): # second loop is the culprit
if nums[i] + nums[j] == target:
return True
return False
At n = 10,000, this touches ~50,000,000 pairs. At n = 100,000, ~5,000,000,000. It stops being viable surprisingly fast.
O(2ⁿ) — exponential
Generating all subsets (the power set) of a set requires 2ⁿ subsets. At n = 30, that is over a billion. At n = 60, it exceeds the number of atoms in a grain of sand. These algorithms can only run on small inputs by design.
See the difference live
Run this code. It times a single loop (O(n)) vs a nested loop (O(n²)) for the same input sizes and prints the ratio.
As n doubles, the quadratic time should roughly quadruple. That is the n² growth made visible.
Pitfalls
Quick check
Quick check
Questions about this lesson
What is Big-O notation?
Big-O describes how an algorithm's running time or memory grows as the input grows, ignoring constant factors. It's the language for comparing algorithms by scalability rather than by stopwatch on one machine.
What's the difference between O(1), O(n), and O(n squared)?
O(1) takes the same time regardless of input size; O(n) grows linearly, doubling when the input doubles; O(n squared) grows with the square, so doubling the input quadruples the work. The gaps widen dramatically at scale.
Does Big-O measure actual speed?
No — it describes growth rate, not wall-clock time. An O(n squared) algorithm can beat an O(n log n) one on small inputs because of constants and overhead; Big-O tells you which wins as the data gets large.
Practice this in an interview
All questionsAs the number of features grows, the volume of the feature space increases exponentially, so training data becomes exponentially sparse. Distance-based algorithms degrade because points become approximately equidistant; density estimation requires data that grows exponentially; and overfitting risk rises for any fixed training set size.
OLTP (Online Transaction Processing) systems handle high-throughput, low-latency reads and writes for individual records — think order placement, user authentication. OLAP (Online Analytical Processing) systems handle complex aggregations over millions of rows for business intelligence. Running heavy analytics directly on an OLTP database locks rows, competes for I/O, and slows application queries that customers feel.
OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.
Binary search halves the search space each iteration to find a target in O(log n). The tricky part is not the idea but the boundary conditions: closed vs. half-open intervals, how to update lo/hi, and when to use lo < hi vs. lo <= hi. One clean template eliminates all the classic bugs.