What is Big-O notation?

Big-O describes how an algorithm's running time or memory grows as the input grows, ignoring constant factors. It's the language for comparing algorithms by scalability rather than by stopwatch on one machine.

What's the difference between O(1), O(n), and O(n²)?

O(1) takes the same time regardless of input size; O(n) grows linearly, doubling when the input doubles; O(n²) grows with the square, so doubling the input quadruples the work. The gaps widen dramatically at scale.

Does Big-O measure actual speed?

No. It describes growth rate, not wall-clock time. An O(n²) algorithm can beat an O(n log n) one on small inputs because of constants and overhead; Big-O tells you which wins as the data gets large.

Big-O & Complexity — DSA

Suppose you need to find one friend’s phone number, and you have two ways to do it.

In the first, the numbers sit in a phone book sorted by name. You open to the middle, see whether your friend’s name falls before or after, and throw away the half that cannot contain them — again and again until one name is left. In the second, the numbers are scribbled on a pile of unsorted sticky notes, and the only way to find your friend is to read the notes one at a time until you hit the right one.

For ten numbers, both feel instant. The interesting question is what happens when the pile is not ten numbers but ten million.

The work grows differently

Let us put numbers on it. With the sorted phone book, each look halves what is left: ten million names shrink to five million, then to two and a half million, and you reach the answer in about twenty-three peeks. With the sticky-note pile, ten million notes might mean ten million reads.

Twenty-three versus ten million. Same task, same computer — but as the input grows, one method strolls while the other collapses. So the thing we really want to measure is not “how fast is this on my laptop today”. It is this: as the input grows, how does the amount of work grow? That question is what Big-O answers.
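To check the figure of twenty-three, here is a minimal sketch that counts the halvings directly. The function name `halvings_until_one` is made up for this illustration, and it models only the worst case, where the name turns up on the very last peek.

```python
def halvings_until_one(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2       # throw away half of what is left
        steps += 1
    return steps

n = 10_000_000
print("sticky notes, worst case:", n)                       # one read per note
print("phone book, worst case:  ", halvings_until_one(n))   # one peek per halving
```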

Naming the idea

The notation that captures how work grows with input size is called Big-O. We write the sticky-note search as O(n) — the work grows in step with n, the number of items — and the phone-book search as O(log n), because each step throws away half of what remains.

Big-O deliberately ignores some things. It does not care how fast your processor is, how clever your cache is, or what the constant factors are. It keeps only the shape of the growth. One rule makes this work: for a large enough input, the biggest term dwarfs everything else. Work of 3n² + 500n + 9000 is simply O(n²), because push n high enough and the n² term buries the other two. For the same reason O(2n) collapses to O(n): Big-O keeps the shape of the curve and discards constants and smaller terms.
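The "biggest term dwarfs everything else" rule can be seen numerically. The sketch below, with a hypothetical helper name, computes what fraction of 3n² + 500n + 9000 comes from the 3n² term alone. For small n the smaller terms dominate, but the fraction climbs towards 1 as n grows.

```python
def share_of_leading_term(n):
    """Fraction of 3n^2 + 500n + 9000 contributed by the 3n^2 term."""
    total = 3 * n**2 + 500 * n + 9000
    return (3 * n**2) / total

for n in [10, 1_000, 1_000_000]:
    # The share is tiny at n = 10 and very close to 1 at n = 1,000,000.
    print(n, round(share_of_leading_term(n), 4))
```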

Watching one search, step by step

Before any table of classes, let us trace a single concrete search and count the work. Suppose our list is

[8, -4, 7, 17, 0, 2, 19]

and we are looking for 17. With no order to exploit, we check from the left:

8?  no
-4? no
7?  no
17? yes — found, at the fourth position

Four comparisons. If we had been looking for 19, the last element, it would have taken all seven. This counting — how many steps for an input of size n — is exactly what Big-O describes, and it already hints at something important: the same search can cost wildly different amounts depending on what we are looking for. We will come back to that.
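The trace above can be reproduced with a small counting version of the search. This is only a sketch; `linear_search_with_count` is a name invented here, and it returns the zero-based index together with the number of comparisons made.

```python
def linear_search_with_count(items, target):
    """Scan left to right; return (index, comparisons), or (-1, comparisons) if absent."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons

numbers = [8, -4, 7, 17, 0, 2, 19]
print(linear_search_with_count(numbers, 17))   # (3, 4): fourth position, four comparisons
print(linear_search_with_count(numbers, 19))   # (6, 7): last position, all seven
```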

The six classes

Think of these as a map of growth shapes. The first few are your everyday companions; the last, O(2ⁿ), is the cliff you learn to design around.

Class	Informal name	One-line example
O(1)	Constant	Dictionary key lookup (`d["key"]`)
O(log n)	Logarithmic	Binary search in a sorted list
O(n)	Linear	A single `for` loop over all items
O(n log n)	Linearithmic	Python’s `sorted()`, merge sort
O(n²)	Quadratic	A nested `for` loop (compare every pair)
O(2ⁿ)	Exponential	Generating all subsets of a set

Same n, wildly different cost: the bottom two rows are why an O(n²) step that is fine in dev pages you in prod.
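To put rough numbers next to the table, the sketch below evaluates each growth shape for a few values of n. The helper `growth_row` is hypothetical, and the figures are idealised operation counts with all constants set to 1, not measurements.

```python
import math

def growth_row(n):
    """Idealised work for each class at input size n (constants ignored)."""
    return {
        "n": n,
        "log n": math.log2(n),
        "n log n": n * math.log2(n),
        "n^2": n**2,
        "2^n": 2**n,
    }

for n in [4, 16, 64]:
    print(growth_row(n))
```

Even at n = 64 the last column is already a twenty-digit number, while the others are still small.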

O(1) — constant time

The work does not change no matter how big the input is.

cache = {"user_42": {"name": "Alice", "score": 98}}

def get_user(uid):
    return cache[uid]   # hash lookup — always one step

One step whether the dictionary holds ten entries or ten million.

O(log n) — logarithmic

Each step halves what is left — the phone-book search from the start of the lesson.

def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

For a sorted list of 1,000,000 items this takes at most about 20 comparisons. Why 20? Each step throws away half: 1,000,000 → 500,000 → 250,000 → … → 1. You can only halve a million about twenty times before nothing is left, and how many times you can halve n is precisely what log₂ n means. Logarithmic time is the signature of “halve the problem every step”.
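The claim of about 20 can be checked by counting loop passes. The sketch below is the same binary search with a step counter added; `binary_search_steps` is a name chosen for this illustration, and `range(1_000_000)` stands in for a sorted list because it supports `len()` and indexing without building a million-element list.

```python
def binary_search_steps(arr, target):
    """Binary search that also reports how many loop passes it needed."""
    lo, hi = 0, len(arr) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid, steps
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

sorted_items = range(1_000_000)   # already sorted: 0, 1, 2, ...
for target in [0, 123_456, 999_999, 1_000_000]:   # the last one is absent
    print(target, binary_search_steps(sorted_items, target))
```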

O(n) — linear

One pass through the data — the sticky-note search.

def find_max(nums):
    best = nums[0]
    for x in nums:      # visits every element once
        if x > best:
            best = x
    return best

Double the list, double the work. Straightforward.

O(n log n) — linearithmic

The practical sorting floor. Linearithmic just means linear × logarithmic: you do an O(n) pass through the data, and you repeat it about log n times. Merge sort is the clearest picture — it splits the list in half about log n times (the log n), and stitching each level back together touches every element once (the n).

data = [5, 2, 8, 1, 9, 3]
data.sort()   # O(n log n) — Timsort under the hood

For 1,000,000 items that is about 20,000,000 operations, not the 1,000,000,000,000 an O(n²) sort would need. That gap is the entire reason efficient sorting algorithms exist.
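To see where the n and the log n come from, here is a minimal merge sort sketch that counts how many elements are written during merging. It is a teaching version, not what `sort()` actually runs; the function name and the one-element `counter` list are conventions invented for this example.

```python
def merge_sort(items, counter):
    """Return a sorted copy of items; counter[0] accumulates elements written by merges."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)     # split in half ...
    right = merge_sort(items[mid:], counter)    # ... about log n levels deep
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    counter[0] += len(merged)                   # each merge touches every element once
    return merged

counter = [0]
print(merge_sort([5, 2, 8, 1, 9, 3], counter), counter[0])

counter = [0]
merge_sort(list(range(1024, 0, -1)), counter)
print(counter[0])   # 1024 elements, 10 levels: 10240 writes
```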

O(n²) — quadratic

Nested loops over the same input.

def has_pair_summing_to(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):   # the second loop is the culprit
            if nums[i] + nums[j] == target:
                return True
    return False

At n = 10,000 this touches about 50,000,000 pairs. At n = 100,000, about 5,000,000,000. It stops being viable surprisingly fast.
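The pair counts quoted here come from the number of ways to choose 2 items out of n, which is n(n − 1)/2. The sketch below checks a small case by brute force and then uses `math.comb` for the larger sizes; `pairs_visited` is a hypothetical helper that mirrors the loop structure of `has_pair_summing_to` when no pair matches.

```python
import math

def pairs_visited(n):
    """Count the (i, j) pairs with i < j that the nested loops visit for n items."""
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1
    return count

print(pairs_visited(200), math.comb(200, 2))   # the two counts agree
print(math.comb(10_000, 2))                     # just under 50 million
print(math.comb(100_000, 2))                    # just under 5 billion
```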

O(2ⁿ) — exponential

Generating every subset of a set of n items produces 2ⁿ subsets. At n = 30 that is over a billion; at n = 60 it is over a quintillion (about 1.15 × 10¹⁸), which would take more than 36 years to enumerate even at a billion subsets per second. These algorithms run only on small inputs, by their nature.
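For small n the doubling is easy to watch. The sketch below builds every subset with `itertools.combinations` and prints how many there are; `all_subsets` is a name made up for this example, and it is only practical for small inputs.

```python
from itertools import combinations

def all_subsets(items):
    """Return every subset of items as a tuple, from the empty set up to the full set."""
    items = list(items)
    subsets = []
    for size in range(len(items) + 1):
        subsets.extend(combinations(items, size))
    return subsets

print(all_subsets("abc"))
for n in range(1, 6):
    print(n, len(all_subsets(range(n))))   # the count doubles with each extra item
```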

Best, worst, and average case

Here is the subtlety the table above quietly hides. Go back to the sticky-note search on our list [8, -4, 7, 17, 0, 2, 19]. We said it is O(n) — but is it always?

It depends entirely on where the value sits.

Best case. The value we want is the very first one we check. One comparison, and we are done — no matter whether the list holds seven items or seven million. That is O(1).
Worst case. The value is the last element, or not in the list at all. Then we check every single item before we can answer. For n items that is n comparisons — O(n).
Average case. If the value is equally likely to be anywhere, then on average we look through about half the list before finding it — roughly n / 2 comparisons. Big-O drops the constant ½, so the average case is also O(n).

So one algorithm has three honest complexities, depending on which input it meets: best O(1), worst O(n), average O(n).
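The three cases can be measured on the seven-item list by searching for each element in turn and recording the cost. This is a small sketch with a hypothetical helper, assuming every element is equally likely to be the target.

```python
def comparisons_to_find(items, target):
    """Number of comparisons a left-to-right scan makes before it can answer."""
    for count, value in enumerate(items, start=1):
        if value == target:
            return count
    return len(items)   # absent: every item was checked

numbers = [8, -4, 7, 17, 0, 2, 19]
costs = [comparisons_to_find(numbers, t) for t in numbers]

print("best:   ", min(costs))                # 1
print("worst:  ", max(costs))                # 7
print("average:", sum(costs) / len(costs))   # 4.0, roughly n / 2
```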

So which one does a plain “O(n)” usually refer to? By convention, a complexity quoted without qualification is the worst case. The worst case is a guarantee (“the work will never grow faster than this”), and a guarantee is what you want when you are deciding whether a system survives its busiest day. The average case is what you tend to feel in everyday use, and it matters too:

Quicksort is O(n²) in the worst case (a pathological pivot every time) but O(n log n) on average — which is why it is used constantly despite that scary worst case.
A Python dict lookup is O(1) on average, but O(n) in the rare worst case where many keys collide into the same hash bucket.
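The dict worst case can be provoked deliberately. The sketch below uses a made-up key class, `CollidingKey`, whose `__hash__` always returns the same number, and counts how often `__eq__` is called during one lookup. It assumes CPython's dict implementation; the exact counts are an implementation detail, so only the trend matters.

```python
class CollidingKey:
    """Hypothetical key type: every instance has the same hash, forcing collisions."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 1   # deliberately terrible: all keys collide

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value

def eq_calls_for_lookup(n):
    """Build a dict of n colliding keys, then count comparisons for one lookup."""
    table = {CollidingKey(i): i for i in range(n)}
    CollidingKey.eq_calls = 0
    value = table[CollidingKey(n - 1)]   # look up the key inserted last
    assert value == n - 1
    return CollidingKey.eq_calls

for n in [20, 200, 2000]:
    print(n, eq_calls_for_lookup(n))   # comparisons grow with n instead of staying flat
```

With ordinary keys such as strings or integers, collisions this bad do not arise in normal use, which is why the average case is the one usually quoted for dicts.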

When you read or quote a complexity, always know which case you mean.

Amortized cost: the occasional expensive step

There is one more case worth meeting, because it explains an everyday Python operation you have already relied on. Appending to a list, nums.append(x), is described as O(1), but that cannot be quite true of every call: a list sometimes has to grow its underlying storage, copying every element to a bigger block, which is O(n).

The resolution is amortized cost. When a Python list runs out of room it over-allocates, reserving extra capacity in proportion to its current size, so those expensive O(n) copies happen rarely, and the cheap O(1) appends happen constantly in between. Spread the rare expensive step across the many cheap ones, and the average cost per append works out to O(1). That averaged-over-a-sequence figure is what “O(1) amortized” means.

Notice the difference from average case. Average case averages over different possible inputs; amortized averages over a sequence of operations on the same structure. Both turn a scary occasional cost into a calm typical one — but they average over different things.
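The amortized argument can be checked with a simplified model. The sketch below assumes a growable array that doubles its capacity whenever it is full. That is a textbook simplification and not CPython's actual growth rule, but the reasoning has the same shape. It counts how many element copies the resizes cause over a run of appends.

```python
def copies_with_doubling(appends):
    """Total elements copied by resizes during `appends` appends (capacity doubles when full)."""
    capacity, size, copies = 1, 0, 0
    for _ in range(appends):
        if size == capacity:    # full: move everything to a block twice as big
            copies += size
            capacity *= 2
        size += 1               # the append itself is one cheap step
    return copies

for n in [10, 1_000, 1_000_000]:
    copies = copies_with_doubling(n)
    # Work per append = 1 for the write plus its share of the copying.
    print(n, copies, round(1 + copies / n, 2))
```

In this model the total copying stays below 2n however long the sequence gets, so the cost per append stays bounded by a constant.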

Watch the gap widen

Let us make the growth visible by counting the work rather than timing it — a count is exact and does not depend on how fast your machine is. A single loop does n steps; a nested loop does n × n.

def linear_steps(n):
    steps = 0
    for i in range(n):
        steps += 1             # one step per item
    return steps

def quadratic_steps(n):
    steps = 0
    for i in range(n):
        for j in range(n):     # a full inner loop for every outer step
            steps += 1
    return steps

for n in [500, 1000, 2000, 4000]:
    print(n, linear_steps(n), quadratic_steps(n))

This prints:

500    500      250000
1000   1000     1000000
2000   2000     4000000
4000   4000     16000000

Read down the two right-hand columns. Each time n doubles, the linear count merely doubles — 500, 1000, 2000 — while the quadratic count quadruples — 250,000, then 1,000,000, then 4,000,000. That four-fold jump for a two-fold input is the n² growth made visible, and it is exactly why an O(n²) step that feels instant on 500 rows can freeze on 50,000.

A thing to chew on

A function runs a single loop from 0 to n, and inside that loop it runs a binary search over a sorted array of size n. The loop is O(n); each binary search is O(log n). What is the complexity of the whole thing — and have you seen that class somewhere already in this lesson?

Summary

Big-O measures how the work grows as the input grows — not the speed of any one machine.
Keep only the dominant term; drop constants and smaller terms. 3n² + 500n is O(n²).
The same algorithm can have a best, worst, and average case; plain Big-O usually means the worst case (a guarantee).
Amortized O(1), like list.append, spreads a rare expensive step across many cheap ones.
O(n log n) is the practical floor for comparison sorting; O(2ⁿ) is the cliff you design around.

Practice

Recall. In one line each: what does Big-O measure, and which case does it quote by default?

Apply. A loop halves a number until it reaches 1 (n = n // 2 each step). Roughly how many steps, as a function of the starting n? Then check yourself below.

Quick check

Q1. A single loop runs n times, and inside each pass it runs a binary search (O(log n)) over a list of size n. What is the overall complexity?

Q2. Linear search through a list is best case O(1) and worst case O(n). When someone just says 'linear search is O(n)', which case are they quoting?

Q3. list.append is called O(1) amortized, even though it occasionally copies the whole list (O(n)) to grow. Why is 'amortized O(1)' fair?

Q4. A colleague says their O(2ⁿ) algorithm is fine because 'n only goes up to 60'. What is the problem?
