When would you use a Python list versus a NumPy array, and what are the performance trade-offs?
Python lists are heterogeneous, pointer-based, and general-purpose. NumPy arrays are homogeneous, stored as contiguous typed memory, and support vectorised operations that run at C speed. For numerical work on more than a few hundred elements, NumPy is almost always faster and more memory-efficient.
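The sketch below makes "vectorised" concrete. The same `*` operator means different things on the two types: on a list it repeats the sequence, and on an ndarray it applies arithmetic to every element in one compiled call. The variable names are only for illustration.

```python
import numpy as np

py = [1.0, 2.0, 3.0]
arr = np.array([1.0, 2.0, 3.0])

# On a list, * means repetition: the list is concatenated with itself.
py_result = py * 2                 # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

# On an ndarray, * is applied elementwise inside NumPy's compiled loop.
arr_result = arr * 2               # array([2., 4., 6.])

# The list equivalent of the elementwise version needs a Python-level loop.
py_doubled = [x * 2 for x in py]   # [2.0, 4.0, 6.0]

print(py_result)
print(arr_result)
print(py_doubled)
```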
How to think about it
What the interviewer wants to hear
In a data science or ML interview, this question is really about whether you understand why NumPy is fast — not just that it is. The answer is all about memory layout. Python lists store pointers to arbitrary objects; NumPy stores raw values of a single type contiguously, like a C array.
Memory layout — the root of everything
A Python list stores an array of pointers to Python objects. Each element is a separate object with its own header (a reference count and a type pointer) in front of the value itself; on 64-bit CPython a float takes 24 bytes. For a list of a million distinct floats, you have a million separate objects scattered across the heap, plus a million 8-byte pointers.
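To see the per-object cost directly, the sketch below builds a list of distinct floats and adds up the list's pointer storage plus every float object it references. It assumes 64-bit CPython; the exact byte counts differ on other interpreters.

```python
import sys

n = 1_000
floats = [float(i) for i in range(n)]   # n distinct float objects

# Each float is a full Python object (24 bytes on 64-bit CPython:
# reference count + type pointer + the 8-byte double).
per_float = sys.getsizeof(floats[0])

# getsizeof on the list counts only the list header and its pointer slots...
list_only = sys.getsizeof(floats)
# ...so the real footprint must add every object the pointers refer to.
total = list_only + sum(sys.getsizeof(x) for x in floats)

# Every element really is a separate object with its own identity.
distinct_objects = len({id(x) for x in floats})

print(per_float, list_only, total, distinct_objects)
```

On a typical 64-bit build the total comes to roughly four times the 8 bytes per element a float64 array needs.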
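A NumPy array stores raw float64 values back-to-back, 8 bytes each, with a single header for the whole array and no per-element pointers. That means:
- Dramatically less memory: about 8 bytes per element instead of roughly 32 (an 8-byte pointer plus a 24-byte float object).
- Cache-friendly access: the CPU can prefetch contiguous memory efficiently.
- Operations run in compiled code outside the Python interpreter loop: elementwise arithmetic uses NumPy's C loops (ufuncs), and linear algebra such as `np.dot` or `@` is dispatched to BLAS/LAPACK routines.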
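```python
import sys
import numpy as np

py = [1.0] * 1_000                      # 1,000 pointers to the *same* float object
arr = np.ones(1_000, dtype=np.float64)  # 1,000 raw doubles in one buffer

print(sys.getsizeof(py))  # ~8,056 bytes on 64-bit CPython: list header + pointers, excludes the float objects
print(arr.nbytes)         # 8000 bytes: raw doubles only, excludes the small ndarray header
```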
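The contiguity and compiled-dispatch points can be inspected on the array itself. This sketch reads the dtype, item size, strides and contiguity flag, then runs a whole reduction in one call. Whether `np.dot` actually reaches a BLAS routine depends on how your NumPy build was linked, so treat that comment as typical behaviour, not a guarantee.

```python
import numpy as np

arr = np.arange(1_000, dtype=np.float64)

# One dtype for every element, 8 bytes each.
print(arr.dtype, arr.itemsize)      # float64 8

# Moving to the next element advances exactly itemsize bytes: no pointer chasing.
print(arr.strides)                  # (8,)
print(arr.flags["C_CONTIGUOUS"])    # True

# One call computes the sum of squares in compiled code; for float64 vectors
# this is typically handed to the linked BLAS library.
total = np.dot(arr, arr)
print(total)
```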
Benchmark: measure the gap
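A minimal benchmark you can run locally, assuming a sum of squares over one million float64 values is a fair stand-in for "numerical work". The helper names `py_sum_squares` and `np_sum_squares` are hypothetical, and the timings and speed-up depend entirely on your machine and NumPy build.

```python
import timeit

import numpy as np

n = 1_000_000
py = [float(i) for i in range(n)]
arr = np.arange(n, dtype=np.float64)


def py_sum_squares():
    # Interpreted loop: the generator runs bytecode per element and every
    # x * x creates a new Python float object.
    return sum(x * x for x in py)


def np_sum_squares():
    # One call: the loop over raw doubles runs in compiled code.
    return float(np.dot(arr, arr))


# Best-of-3 timing, averaged over 3 calls each, to reduce noise.
t_py = min(timeit.repeat(py_sum_squares, number=3, repeat=3)) / 3
t_np = min(timeit.repeat(np_sum_squares, number=3, repeat=3)) / 3

print(f"list:  {t_py * 1e3:.2f} ms per call")
print(f"numpy: {t_np * 1e3:.2f} ms per call")
print(f"speed-up: ~{t_py / t_np:.0f}x")
```

Both functions compute the same value (up to floating-point rounding), so the comparison isolates the cost of the interpreter loop and boxed floats.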
When to stick with a plain list
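- Heterogeneous elements, e.g. `[42, "hello", None, True]`
- Dynamic append-heavy workloads where you do not know the final size ahead of time: `list.append` is amortised O(1), whereas `np.append` copies the entire array on every call
- Small collections, where the fixed per-call overhead of creating an array and dispatching a NumPy operation (plus the one-off cost of importing NumPy) outweighs the gains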
- Non-numeric data (strings, objects, nested dicts)
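Two of these cases sketched in code: the common "accumulate in a list, convert once" pattern next to the quadratic `np.append` anti-pattern, and what happens when heterogeneous data is forced into an array. The variable names are illustrative only.

```python
import numpy as np

# Unknown final size: append to a list (amortised O(1) per append)...
readings = []
for i in range(10):
    readings.append(i * 0.5)

# ...then convert once when you are ready to compute.
arr = np.asarray(readings, dtype=np.float64)

# Anti-pattern: np.append returns a brand-new array each time, copying every
# existing element, so building an array this way is O(n^2) overall.
grown = np.empty(0, dtype=np.float64)
for i in range(10):
    grown = np.append(grown, i * 0.5)

# Heterogeneous data has no single machine type, so NumPy falls back to
# dtype=object: an array of pointers to Python objects, which gives up the
# compact layout and the compiled fast paths.
mixed = np.array([42, "hello", None, True])
print(mixed.dtype)  # object
```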
Choosing the right tool
| Need | Use |
|---|---|
| General-purpose mixed data | list |
| Numeric computation / ML features | numpy.ndarray |
| Tabular data with named columns | pandas.DataFrame |
| Typed compact sequence (stdlib only) | array.array |
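For the stdlib-only row, a short sketch of `array.array` with typecode `"d"` (C double): it stores raw values in one buffer like an ndarray, grows like a list, and rejects values of the wrong type. It has no vectorised arithmetic, which is the main reason to reach for NumPy instead. Byte counts assume 64-bit CPython.

```python
import sys
from array import array

n = 1_000
compact = array("d", (float(i) for i in range(n)))   # typecode 'd' = C double
boxed = [float(i) for i in range(n)]

# array.array's getsizeof includes its raw-double buffer.
compact_bytes = sys.getsizeof(compact)
# The list needs its pointer slots plus one float object per element.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)

print(compact.itemsize)             # 8 on mainstream platforms
print(compact_bytes, boxed_bytes)

compact.append(1.5)                 # grows like a list...
rejected = False
try:
    compact.append("not a number")  # ...but only accepts values of its typecode
except TypeError:
    rejected = True
print("rejected:", rejected)
```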