Write Python to read a CSV file line by line and compute a column aggregate without loading the entire file into memory.
Using the csv module with a generator or a running accumulator keeps memory use constant — O(1) space — regardless of file size. This matters when files are larger than available RAM, a common situation in data engineering pipelines.
How to think about it
The interviewer is checking two things: whether you can use the csv module correctly, and whether you understand the memory implications of iterating row by row versus loading everything at once. The key phrase in the question is “without loading the entire file into memory”: it tells you to reach for a running accumulator, not pd.read_csv.
The trick for the playground: Pyodide (the browser Python) has no real filesystem, so we simulate the file with io.StringIO. The actual logic is identical to what you’d write against a real file.
Running sum — O(n) time, O(1) space
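A minimal sketch of the running sum, assuming a small made-up dataset held in io.StringIO in place of a file on disk. The sample rows, the column names and the sum_column helper are illustrative, not taken from any real dataset:

```python
import csv
import io

# Hypothetical sample data standing in for a real file on disk.
SAMPLE_CSV = """region,revenue
north,100.50
south,200.25
east,50.00
"""


def sum_column(fh, col: str) -> float:
    """Sum one numeric column from an open text handle, one row at a time."""
    total = 0.0  # running accumulator: the only state kept across rows
    for row in csv.DictReader(fh):
        total += float(row[col])
    return total


fh = io.StringIO(SAMPLE_CSV, newline="")
print(sum_column(fh, "revenue"))  # 350.75
```

The function takes an open handle rather than a path, so the same loop works unchanged on an io.StringIO object or on a file opened with open().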
Why this is O(1) space
csv.DictReader wraps the file handle, which is itself a lazy iterator over lines. Each iteration yields one dict, and the previous row's dict becomes garbage as soon as the loop variable is rebound to the next one. At any moment only about one row (plus a small read buffer) lives in memory, regardless of whether the file is 1 KB or 1 TB.
Compare that with calling list(reader) up front, which materialises every row into a list before you process any of them and can exhaust memory on large files.
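One way to see the difference is to measure peak allocations with the standard tracemalloc module. This is a rough sketch on generated data; the helper names and the row count are arbitrary choices, and the exact byte counts vary by Python version, so only the relative gap is meaningful:

```python
import csv
import io
import tracemalloc


def make_csv(n_rows: int) -> str:
    """Build a hypothetical two-column CSV as a string."""
    lines = ["id,revenue"]
    lines.extend(f"{i},{i * 0.5}" for i in range(n_rows))
    return "\n".join(lines) + "\n"


def streaming_sum(reader) -> float:
    total = 0.0
    for row in reader:  # each row dict is dropped on the next iteration
        total += float(row["revenue"])
    return total


def materialised_sum(reader) -> float:
    rows = list(reader)  # every row dict is alive at the same time
    return sum(float(row["revenue"]) for row in rows)


def peak_bytes(func, text: str) -> int:
    """Return the peak traced allocation while func consumes the CSV text."""
    fh = io.StringIO(text, newline="")  # created before tracing starts
    tracemalloc.start()
    func(csv.DictReader(fh))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


data = make_csv(20_000)
stream_peak = peak_bytes(streaming_sum, data)
list_peak = peak_bytes(materialised_sum, data)
print(f"streaming peak: {stream_peak} bytes")
print(f"list peak:      {list_peak} bytes")
```

The streaming peak should stay roughly flat as n_rows grows, while the list peak grows in proportion to the number of rows.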
Real file version
import csv


def sum_column(filepath: str, col: str) -> float:
    """Stream the CSV at filepath and return the sum of column col."""
    total = 0.0
    # newline="" lets the csv module handle line endings itself.
    with open(filepath, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        for row in reader:  # one row in memory at a time
            total += float(row[col])
    return total


revenue = sum_column("sales.csv", "revenue")
Pass newline="" to open() (not newline="\n", and not the default) so the csv module sees the raw line endings and handles them correctly across platforms, including newlines embedded in quoted fields. The with block ensures the file is closed even if an exception occurs mid-read.
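To illustrate what newline="" protects against, here is a small sketch that writes a temporary file with Windows line endings and a quoted field containing a line break, then reads it both ways. The file contents and the read_note helper are made up for the demonstration:

```python
import csv
import os
import tempfile

# Hypothetical file: Windows line endings, plus a quoted field that itself
# contains a line break.
RAW = b'id,note\r\n1,"line one\r\nline two"\r\n'


def read_note(path: str, **open_kwargs) -> str:
    """Return the note field of the first data row."""
    with open(path, encoding="utf-8", **open_kwargs) as fh:
        return next(csv.DictReader(fh))["note"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.csv")
    with open(path, "wb") as fh:
        fh.write(RAW)
    raw_note = read_note(path, newline="")  # line endings reach csv untouched
    translated_note = read_note(path)       # default mode rewrites \r\n to \n

print(repr(raw_note))         # 'line one\r\nline two'
print(repr(translated_note))  # 'line one\nline two'
```

With the default newline handling the field still parses, but its contents are silently changed, which is the kind of cross-platform discrepancy newline="" avoids.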