datarekha
Python Easy Asked at GoogleAsked at MetaAsked at NetflixAsked at Databricks

Why is NumPy significantly faster than Python for-loops for numerical computation, and what is vectorization?

The short answer

NumPy operations execute compiled C code over contiguous memory blocks in a single call, while a Python loop incurs interpreter overhead and dynamic type checks on every element. Vectorization means expressing an operation over an entire array at once so the hot path never re-enters the Python interpreter.

How to think about it

The mental model to lead with

Every Python object — even a plain integer — is a heap-allocated struct with a reference count, a type pointer, and then the actual value. A Python list is an array of pointers to those structs. A NumPy array is a raw block of C doubles (or ints) packed directly in memory, with no per-element Python overhead.

When NumPy does a + b, it dispatches once to a compiled C kernel that walks through raw memory with SIMD instructions. When a Python loop does the same thing, it executes that dispatch machinery on every single element.

The cost of a Python loop

Every iteration:

  1. Fetches the next boxed Python object (reference count check, type dispatch)
  2. Calls the operator through the interpreter bytecode
  3. Allocates a new Python object for the result

For 10 million numbers, that is 10 million type checks and 10 million heap allocations.

See the speedup yourself

Vectorization in practice

Instead of looping to apply an operation row by row, express it over the whole array:

# Avoid — Python loop defeats NumPy
result = []
for row in matrix:
    result.append(row.sum())

# Use — stays in C
row_sums = matrix.sum(axis=1)

Broadcasting rules let NumPy extend scalar and lower-dimensional operations across higher-dimensional arrays without copying data, compounding the advantage.

Keep practising

All Python questions

Explore further

Skip to content