datarekha

Python Built-ins & Their Cost

Every built-in operation has a hidden Big-O. Knowing a handful of them is 80% of writing fast Python.

7 min read Beginner Data Structures & Algorithms Lesson 4 of 32

What you'll learn

  • The amortized complexity of the list, dict, set, and deque operations you use most
  • Why membership tests and insert(0) on lists silently cause O(n^2) programs
  • When to swap a list for a set, dict, or deque and why it matters
  • How to measure the difference with stdlib tools you already have

Before you start

Every time you write x in my_list or my_list.insert(0, x), Python is doing work proportional to the length of the list. Call either one inside a loop over a large dataset and your program goes from fast to unusably slow without any obvious error. The fix is almost always one data-structure swap — but only if you know which swap to make.

The complexity table

This covers the operations that show up constantly. “Avg” means amortized average case; worst-case for hashing is O(n) in pathological inputs (all keys collide), but in practice you never see it.

StructureOperationCost
listlst[i] — index by positionO(1)
listlst.append(x)O(1) amortized
listlst.pop() — remove lastO(1)
listlst.pop(0) — remove firstO(n)
listlst.insert(0, x) — prependO(n)
listx in lst — membership testO(n)
dictd[k], d[k] = v, del d[k]O(1) avg
dictk in d — membership testO(1) avg
setx in s — membership testO(1) avg
sets.add(x), s.discard(x)O(1) avg
dequedq.append(x), dq.pop()O(1)
dequedq.appendleft(x), dq.popleft()O(1)

The list stands out: O(1) at the tail, O(n) everywhere else. That asymmetry is the source of most Python performance bugs.

Why list is O(n) at the front

Python lists are backed by a contiguous array. When you call insert(0, x), Python has to shift every existing element one slot to the right to make room. A list with a million items means a million shifts — every single time. pop(0) is symmetric: all remaining elements shift left.

append and pop() at the tail avoid this entirely because nothing needs to move. Occasional resizing doubles the array capacity, but spreading that cost across all insertions makes the per-call cost O(1) on average — which is what amortized O(1) means: expensive rarely enough that the long-run average stays constant.

Proving it with time.perf_counter

The playground below runs two comparisons and prints timing. It uses only stdlib so it runs in Pyodide without any extra installs.

You should see the list membership test run roughly 100-500x slower than the set at 50k items, and the prepend loop gap widen as ITERS grows — that’s O(n) vs O(1) made visible.

The DS framing — why this shapes your choices

These aren’t arbitrary trivia. The complexity table is the reason for the canonical data-structure advice:

Use set for deduplication and membership. Converting a list to a set is O(n) once, after which every lookup is O(1). If you need “have I seen this ID before?”, a set is the right structure.

Use dict for lookups and joins. Building a {id: record} dict from a list is O(n). Looking up by key is O(1). If you join two datasets by key in a loop, the dict side must be built first — that’s the entire logic behind hash joins in SQL engines.

Use deque for sliding windows and queues. deque (double-ended queue) is a doubly-linked block structure with O(1) operations at both ends. Any time you consume from the left and append to the right (or vice versa), deque keeps every step O(1). A list doing the same work is O(n) per step, O(n^2) overall.

Quick check

Quick check

0/3
Q1You have a list of 200,000 transaction IDs and need to check, for each of 200,000 incoming events, whether its ID is already in that list. What is the complexity of the naive approach (`id in transaction_list` in a loop)?
Q2Which operation on a Python list is O(1) amortized?
Q3You need to build a queue that frequently adds items at one end and removes them from the other. Which structure gives O(1) for both operations?

What’s next

Now that you know why certain structures are fast, the next lesson puts that knowledge to work: we’ll look at the two most common algorithmic patterns in data processing — the frequency count (where Counter shines) and the two-pointer / sliding-window technique (where deque shines).

Practice this in an interview

All questions
What set operations does Python support, and where are they practically useful in data work?

Python sets support union, intersection, difference, and symmetric difference as both operators and methods, all running in O(min(m,n)) to O(m+n) time. They are useful for deduplication, membership testing in large collections, and computing overlaps between datasets — operations that would be expensive with lists.

How do you reverse a list and remove duplicates in Python, and what are the performance implications of each approach?

Reversing a list is O(n) whether you use slice notation or list.reverse(). Deduplication is O(n) with a set conversion but O(n²) if you check membership against a list. Understanding when order must be preserved changes which tool to reach for.

How do list, dict, and set comprehensions work in Python, and when should you avoid them?

Comprehensions are syntactic sugar for building a new collection by iterating over an iterable and optionally filtering elements. They are faster than equivalent for-loops because the iteration runs at the C level inside the interpreter. Avoid them when the expression is too complex to read at a glance — a plain loop with descriptive variable names is preferable.

What is the difference between CPU-bound and I/O-bound work, and how does the choice affect concurrency strategy in Python?

CPU-bound work keeps the processor busy the whole time — matrix multiplication, compression, parsing. I/O-bound work spends most of its time waiting for a slow external resource — network, disk, database. The distinction directly determines which concurrency primitive to reach for: multiprocessing for CPU-bound (bypasses the GIL), threading or asyncio for I/O-bound (GIL released during waits).

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content