Python | Easy | Asked at Amazon | Asked at Databricks

Write Python to read a CSV file line by line and compute a column aggregate without loading the entire file into memory.

For: Data Analyst, Data Engineer, Data Scientist

The short answer

Iterate over the file with the csv module and keep a running accumulator (or feed a generator into an aggregate such as sum). Memory use then stays constant, O(1) space, regardless of file size. This matters when files are larger than available RAM, a common situation in data engineering pipelines.

How to think about it

The interviewer is checking two things: whether you can use the csv module correctly, and whether you understand the memory implications of iterating row by row versus loading everything at once. The key phrase in the question is “without loading the entire file into memory”: it tells you to reach for a running accumulator, not pd.read_csv.

The trick for the playground: Pyodide (Python running in the browser) cannot see files on your machine, so we simulate the file with io.StringIO. The actual logic is identical to what you’d write against a real file.

Running sum — O(n) time, O(1) space
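A minimal sketch of the running-sum approach, with io.StringIO standing in for a file as described above. The sample data, the column names, and the helper name sum_column_from_handle are illustrative assumptions, not part of the original question.

```python
import csv
import io

# Illustrative sample data standing in for a file on disk.
SAMPLE_CSV = """order_id,region,revenue
1,EU,100.50
2,US,200.25
3,EU,50.00
"""


def sum_column_from_handle(fh, col):
    """Sum one numeric column from an open, file-like CSV source."""
    total = 0.0
    # DictReader pulls one line at a time from fh; only the current
    # row and the running total are held while the loop runs.
    for row in csv.DictReader(fh):
        total += float(row[col])
    return total


# newline="" mirrors how a real CSV file should be opened.
fake_file = io.StringIO(SAMPLE_CSV, newline="")
total_revenue = sum_column_from_handle(fake_file, "revenue")
print(total_revenue)
```

Because the helper takes any file-like object, the same function should work unchanged with a handle returned by open().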

Why this is O(1) space

csv.DictReader wraps the file handle, which is itself a lazy iterator. Each iteration yields one dict, and once the loop variable is rebound to the next row, the previous dict can be garbage-collected. At any moment only the current row (plus the file object's small read buffer) lives in memory, whether the file is 1 KB or 1 TB.
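One way to see the laziness directly is to wrap the line source in a small counting iterator and check how many lines csv.DictReader has pulled at each step. The CountingLines class below is a hypothetical helper written for this illustration; csv.DictReader accepts any iterator of strings, so a list of lines is enough.

```python
import csv


class CountingLines:
    """Iterator wrapper that counts how many lines have been pulled."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.pulled = 0

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._lines)  # raises StopIteration when exhausted
        self.pulled += 1
        return line


lines = CountingLines(["id,revenue\n", "1,10.0\n", "2,20.0\n", "3,30.0\n"])
reader = csv.DictReader(lines)

# Constructing the reader does not read anything yet.
pulled_before = lines.pulled

# The first row needs the header line plus one data line.
first_row = next(reader)
pulled_after_first = lines.pulled

print(pulled_before, pulled_after_first, first_row)
```

The remaining lines stay unread until the loop asks for them, which is why memory use does not depend on how long the file is.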

Compare that to calling list(reader) before processing: that materialises every row into a list before you handle any of them, so memory grows with file size, O(n), and can be exhausted on large files.
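The difference can be measured roughly with the standard-library tracemalloc module. This is a sketch under assumptions: fake_lines is a hypothetical generator that produces rows on the fly (so the input itself is never held in memory), the row count is arbitrary, and the exact byte counts will vary by Python version and platform.

```python
import csv
import tracemalloc

N_ROWS = 20_000  # arbitrary size chosen for the illustration


def fake_lines(n):
    """Generate CSV lines lazily, so the source never sits in memory."""
    yield "id,revenue\n"
    for i in range(n):
        yield f"{i},{i * 0.5}\n"


def peak_bytes(fn):
    """Run fn and return the peak traced allocation in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def streaming():
    # Running accumulator: one row alive at a time.
    total = 0.0
    for row in csv.DictReader(fake_lines(N_ROWS)):
        total += float(row["revenue"])
    return total


def materialised():
    # list(reader) builds every row dict before any processing happens.
    rows = list(csv.DictReader(fake_lines(N_ROWS)))
    return sum(float(r["revenue"]) for r in rows)


peak_stream = peak_bytes(streaming)
peak_list = peak_bytes(materialised)
print(f"streaming peak:    {peak_stream:>12,} bytes")
print(f"materialised peak: {peak_list:>12,} bytes")
```

The streaming peak should stay roughly flat as N_ROWS grows, while the materialised peak should grow with it.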

Real file version

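import csv

def sum_column(filepath: str, col: str) -> float:
    """Sum a numeric column, reading the file one row at a time."""
    total = 0.0
    # newline="" lets the csv module handle line endings itself.
    with open(filepath, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)  # first line is used as the header
        for row in reader:
            # Raises ValueError on blank or non-numeric cells.
            total += float(row[col])
    return total

revenue = sum_column("sales.csv", "revenue")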

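Use newline="" (not newline="\n") when opening the file, as the csv documentation recommends, so the csv module does its own newline handling: line endings are parsed consistently across platforms, and newlines embedded in quoted fields are read correctly. The with block ensures the file is closed even if an exception occurs mid-read.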
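The same pattern extends to other aggregates and to the generator style mentioned in the short answer. The sketch below is one possible variant, not the only way to do it: iter_column and mean_column are hypothetical names, skipping blank cells is an assumption about how missing data should be treated, and the temporary file exists only to make the example self-contained.

```python
import csv
import os
import tempfile
from typing import Iterator


def iter_column(filepath: str, col: str) -> Iterator[float]:
    """Yield one parsed value at a time from a CSV column."""
    with open(filepath, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # Assumption: blank or missing cells are skipped, not treated as 0.
            value = (row.get(col) or "").strip()
            if value:
                yield float(value)


def mean_column(filepath: str, col: str) -> float:
    """Compute the mean in one pass with two running accumulators."""
    count = 0
    total = 0.0
    for value in iter_column(filepath, col):
        count += 1
        total += value
    return total / count if count else float("nan")


# Demo against a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.csv")
    with open(path, "w", newline="", encoding="utf-8") as out:
        out.write("id,revenue\n1,10\n2,20\n3,\n4,30\n")
    average = mean_column(path, "revenue")

print(average)
```

Separating the row source (iter_column) from the aggregate keeps both pieces lazy, so built-ins such as sum, min, or max can consume the generator directly without building a list.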
