datarekha
Python | Medium | Asked at: Google, Amazon, Meta, Databricks

What is the difference between a generator and a list, and when should you prefer a generator?

The short answer

A list materialises all values in memory at once; a generator produces values one at a time on demand, so its own footprint stays small and constant no matter how long the sequence is. Prefer generators for large or infinite sequences, for pipelines, and whenever you need neither random access nor more than one pass over the data.

How to think about it

The core trade-off

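A list is a finished container: all values are computed and stored upfront. A generator is a recipe: it computes the next value only when asked. The memory difference is dramatic. On 64-bit CPython, sys.getsizeof reports roughly 85 MB for a list of 10 million items, and that figure covers only the list's array of references, not the integer objects it points to. The equivalent generator object is on the order of 100 to 200 bytes, depending on the Python version.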

The trade-off is that generators are single-pass: once exhausted, they’re gone. No random access by index. If you need to iterate more than once, convert to a list first or recreate the generator.
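A minimal sketch of the single-pass behaviour and the two usual workarounds. The helper name squares is made up for illustration; everything else is standard Python.

```python
def squares(n):
    """Yield the squares of 0..n-1 lazily (hypothetical helper for illustration)."""
    for x in range(n):
        yield x * x

gen = squares(5)
first_pass = list(gen)    # consumes the generator: [0, 1, 4, 9, 16]
second_pass = list(gen)   # the same generator is now exhausted: []

# Generators do not support indexing; this raises TypeError.
try:
    squares(5)[2]
except TypeError as exc:
    indexing_error = type(exc).__name__

# Workaround 1: materialise once if the data fits in memory and is reused.
cached = list(squares(5))
total, largest = sum(cached), max(cached)

# Workaround 2: recreate the generator for each pass.
total_again = sum(squares(5))

print(first_pass, second_pass, indexing_error, total, largest, total_again)
```

Which workaround fits depends on the data: caching in a list gives up the memory saving, while recreating the generator repeats the computation.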

Syntax comparison

A list comprehension builds everything immediately. Swap [] for () and you get a lazy generator expression instead:

import sys

big_list = [x * x for x in range(10_000_000)]
print(sys.getsizeof(big_list))   # roughly 85 MB, reported in bytes; list object only, not its ints

big_gen = (x * x for x in range(10_000_000))
print(sys.getsizeof(big_gen))    # roughly 100-200 bytes, depending on Python version

Generator functions use yield to suspend and resume:

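def read_chunks(filepath, size=4096):
    """Yield successive blocks of up to `size` bytes from a binary file."""
    with open(filepath, "rb") as f:
        # The walrus operator (Python 3.8+) ends the loop when read() returns b"".
        while chunk := f.read(size):
            yield chunk

for chunk in read_chunks("dataset.bin"):
    process(chunk)   # process() stands in for your own handler; only one chunk in memory at a time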
Figure: Lists allocate all items upfront; generators yield one item per next() call.
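Generator functions are also a natural fit for the infinite sequences mentioned in the short answer, since a list could never hold them. The sketch below assumes a made-up helper, naturals_squared, and uses itertools to take a bounded slice of it.

```python
from itertools import count, islice, takewhile

def naturals_squared():
    """Yield 1, 4, 9, 16, ... without end (hypothetical helper for illustration)."""
    for n in count(1):   # count(1) yields 1, 2, 3, ... forever
        yield n * n

# islice stops after a fixed number of items.
first_five = list(islice(naturals_squared(), 5))

# takewhile stops at the first item that fails the predicate.
below_50 = list(takewhile(lambda v: v < 50, naturals_squared()))

print(first_five)   # [1, 4, 9, 16, 25]
print(below_50)     # [1, 4, 9, 16, 25, 36, 49]
```

Calling list(naturals_squared()) directly would never finish, so an infinite generator always needs a consumer that knows when to stop.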


Generators compose into pipelines

The real power is composability — each stage of a pipeline is a generator, the whole thing runs in constant memory, and sum (or any consumer) drives all stages:

with open("log.txt") as f:   # close the file once the pipeline has been consumed
    lines   = (line.strip() for line in f)
    records = (line.split(",") for line in lines if line)
    values  = (float(r[2]) for r in records)
    total   = sum(values)   # entire pipeline runs in constant memory, while the file is still open
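To make the "consumer drives all stages" point visible, here is a small sketch that records the order in which the stages run. The stage names (read_lines, parse), the trace list, and the in-memory sample data are all assumptions made for illustration.

```python
import io

trace = []  # records the order in which the stages do their work

def read_lines(stream):
    """Stage 1: yield stripped lines one at a time."""
    for line in stream:
        trace.append(f"read {line.strip()!r}")
        yield line.strip()

def parse(lines):
    """Stage 2: skip blank lines and yield the third comma-separated field as a float."""
    for line in lines:
        if line:
            trace.append(f"parse {line!r}")
            yield float(line.split(",")[2])

stream = io.StringIO("a,b,1.5\n\nc,d,2.5\n")   # stands in for a log file
values = parse(read_lines(stream))
ran_before_consuming = list(trace)   # still empty: building the pipeline runs nothing

total = sum(values)   # the consumer pulls each item through both stages in turn

print(ran_before_consuming)
print(trace)
print(total)   # 4.0
```

The trace shows the stages interleaving line by line rather than stage 1 finishing before stage 2 starts, which is why no intermediate list is ever built.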

The key insight

Every for loop, sum, list, and join drives a generator the same way: it calls next() repeatedly until the generator raises StopIteration. The generator just decides how much work to do per call. That's the entire model; yield is simply Python's way of letting a function pause and resume.
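Written out by hand, that model looks roughly like the sketch below. The countdown generator is a made-up example; iter(), next(), and StopIteration are the standard iterator protocol.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 (hypothetical example generator)."""
    while n > 0:
        yield n   # pause here and hand n to the caller
        n -= 1    # resume here on the following next() call

# Roughly what `for value in countdown(3): seen.append(value)` does.
iterator = iter(countdown(3))   # a generator is its own iterator
seen = []
while True:
    try:
        value = next(iterator)  # run the function body until its next yield
    except StopIteration:       # raised once the function body returns
        break
    seen.append(value)

print(seen)   # [3, 2, 1]
```

Consumers such as sum and list follow the same loop internally, which is why any of them can sit at the end of a generator pipeline.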
