Why is NumPy significantly faster than Python for-loops for numerical computation, and what is vectorization?

NumPy operations execute compiled C code over contiguous memory blocks in a single call, while a Python loop incurs interpreter overhead and dynamic type checks on every element. Vectorization means expressing an operation over an entire array at once so the hot path never re-enters the Python interpreter.
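Here is a minimal sketch of that contrast, assuming only NumPy. It writes the same element-wise operation twice: once as a Python loop and once as a single array expression. The function names are hypothetical and exist only for illustration.

```python
import numpy as np

def squares_loop(values):
    # Python-level loop: the interpreter handles every element individually
    out = []
    for v in values:
        out.append(v * v)
    return np.array(out)

def squares_vectorized(values):
    # One call: NumPy's compiled loop squares every element of the array
    return np.asarray(values) ** 2

data = np.arange(1_000, dtype=np.float64)
print(np.array_equal(squares_loop(data), squares_vectorized(data)))
```

Both functions return the same values. Only the second keeps the per-element work out of the interpreter.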

Why is pandas slow, and what are the main strategies to speed it up?

pandas is slow mainly when code falls back to Python. Row-wise loops such as apply and iterrows bypass NumPy's vectorized C kernels. Object-dtype columns hold Python objects, so they cannot use those fast typed paths (including SIMD). Keeping the entire dataset in memory also limits how far pandas scales. The main fixes are:

- vectorization;
- categorical encoding for repetitive string columns;
- eval/query for large frames;
- chunking for out-of-core data;
- switching to Polars or DuckDB for compute-heavy pipelines.
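The sketch below shows two of those fixes, assuming pandas is installed. The city names and the in-memory CSV are made-up data standing in for a real column and a real file. The `chunksize` parameter of `pd.read_csv` returns an iterator of smaller DataFrames.

```python
import io
import pandas as pd

# Categorical encoding: a low-cardinality string column stored as small integer codes
cities = pd.Series(["London", "Paris", "Tokyo"] * 100_000)  # plain string column
as_category = cities.astype("category")
print(cities.memory_usage(deep=True), as_category.memory_usage(deep=True))

# Chunking: aggregate a CSV without loading it all at once
csv_text = "value\n" + "\n".join(str(i) for i in range(10_000))
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2_500):
    total += chunk["value"].sum()  # still vectorized within each chunk
print(total)
```

Each chunk is processed with a vectorized `sum`, so chunking limits peak memory without reintroducing a per-row Python loop.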

When would you use a Python list versus a NumPy array, and what are the performance trade-offs?

Python lists are heterogeneous, pointer-based, and general-purpose. NumPy arrays are homogeneous and stored as contiguous typed memory. They support vectorized operations that run in compiled C loops. For numerical work on more than a few hundred elements, NumPy is almost always faster and more memory-efficient.
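You can measure the memory side of that trade-off directly. This is a rough sketch, assuming 64-bit CPython. `sys.getsizeof` reports only shallow sizes, so the list total adds each element object by hand.

```python
import sys
import numpy as np

n = 10_000
py_list = [float(i) for i in range(n)]
arr = np.arange(n, dtype=np.float64)

# A list stores pointers; each float is a separate heap object with its own header
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# An array stores raw 8-byte float64 values back to back
array_bytes = arr.nbytes

# Homogeneity: mixed ints and floats are coerced to a single dtype
mixed = np.array([1, 2.5, 3])
print(list_bytes, array_bytes, mixed.dtype)
```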

When should you use apply, map, or applymap versus vectorized pandas operations, and what are the performance implications?

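Vectorized pandas and NumPy operations work on whole columns or arrays inside compiled code and should be your first choice. The three alternatives each involve Python-level calls:

- apply calls a Python function once per row (axis=1) or once per column (axis=0).
- map transforms a single Series element by element.
- applymap (renamed DataFrame.map in pandas 2.1+) applies a function to every scalar.

When the function is ordinary Python operating on scalars, all three are typically one to two orders of magnitude slower than the vectorized equivalent. Column-wise apply with a function that is itself vectorized avoids most of that cost, and so does map with a dict.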
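Here is a small illustration, assuming a toy DataFrame with hypothetical columns `price`, `qty` and `grade`. `apply` with `axis=1` calls the lambda once per row, while the vectorized version multiplies whole columns. For the labels, `Series.map` with a dict performs the lookup inside pandas, and `np.where` shows the fully array-based form of a two-way condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 25.0, 40.0], "qty": [3, 1, 2],
                   "grade": ["a", "b", "a"]})

# apply(axis=1): one Python call per row
total_apply = df.apply(lambda row: row["price"] * row["qty"], axis=1)
# Vectorized: one column-level multiplication
total_vec = df["price"] * df["qty"]

# Series.map with a dict: element-wise lookup done by pandas
label_map = df["grade"].map({"a": "good", "b": "ok"})
# Conditional logic expressed over the whole column at once
label_vec = pd.Series(np.where(df["grade"] == "a", "good", "ok"), index=df.index)

print(total_vec.tolist(), label_vec.tolist())
```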

Vectorization vs Loops

What you'll learn

Why an O(n) Python loop and an O(n) NumPy call share complexity but differ hugely in constants

What vectorization means, and how NumPy runs the loop in C over contiguous typed memory

How broadcasting removes explicit loops without changing the algorithm

Why df.apply and iterrows are slow — a Python call per row

Big-O tells you how the work grows as the data grows. It says nothing about how fast a single step is — and that is exactly where numeric Python lives. Two algorithms can both be O(n) and still differ by 100× in real time.

Take summing a million numbers. A Python for loop and np.sum both touch every element once; both are O(n). Yet on a normal laptop the NumPy version finishes in around a millisecond while the Python loop takes fifty to a hundred times longer. Same complexity class, wildly different constant — and the constant is the whole story.
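You can see this on your own machine with a rough sketch, assuming NumPy and the standard-library `timeit`. The absolute numbers depend on hardware and Python version, so the ratio matters more than the milliseconds. `loop_sum` is a hypothetical helper written for the comparison.

```python
import timeit
import numpy as np

values = np.random.default_rng(0).random(1_000_000)
as_list = values.tolist()

def loop_sum(xs):
    total = 0.0
    for x in xs:  # one interpreter round-trip per element
        total += x
    return total

t_loop = timeit.timeit(lambda: loop_sum(as_list), number=5) / 5
t_numpy = timeit.timeit(lambda: np.sum(values), number=5) / 5
print(f"loop: {t_loop * 1e3:.1f} ms, np.sum: {t_numpy * 1e3:.2f} ms, "
      f"ratio ~{t_loop / t_numpy:.0f}x")
```

Both versions visit every element once, so the gap comes entirely from the per-element constant.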

Where the time actually goes

A Python loop does far more than arithmetic per step. For each element the interpreter:

1. fetches an object reference;
2. checks its type (it has no static guarantee the value is even a number);
3. unboxes the value from inside the Python object;
4. dispatches the + through a method table;
5. allocates a new object to hold the result.

That is five or more interpreter operations per element. The values themselves also live in separate heap objects that can be scattered across many cache lines.
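You can observe the boxing and allocation steps from Python. This is a small sketch, assuming 64-bit CPython and NumPy. The array's buffer holds plain 8-byte values, but reading an element out creates a new scalar object every time. A Python loop over an array pays that cost on every step.

```python
import sys
import numpy as np

arr = np.array([1.5, 2.5, 3.5])

# Inside the array each value is 8 raw bytes with no object header
print(arr.itemsize)

# Pulling a value out boxes it into a Python-level scalar object
first = arr[0]
print(type(first), sys.getsizeof(first))

# Each access builds a fresh object; the buffer itself holds no objects
print(arr[0] is arr[0])
```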

NumPy skips every one of those. An ndarray holds its values as raw typed bytes — a block of float64s packed side by side, no per-element object, no headers — and np.sum hands that block to one C function whose tight inner loop the CPU can pipeline and run with SIMD (one instruction adding four or eight floats at once). The difference is not the algorithm; it is the architecture: one loop runs in the Python interpreter, the other in compiled C with full hardware access.
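You can inspect the "packed side by side" layout through an array's strides and flags. This sketch assumes a 1-D float64 array. It only shows how to check the layout, not how fast the C loop runs over it.

```python
import numpy as np

arr = np.arange(8, dtype=np.float64)

# Contiguous float64 data: consecutive elements are 8 bytes apart
print(arr.strides, arr.flags["C_CONTIGUOUS"])  # (8,) True

# A strided view skips elements, so it is no longer contiguous
every_other = arr[::2]
print(every_other.strides, every_other.flags["C_CONTIGUOUS"])  # (16,) False

# np.ascontiguousarray copies the view back into a packed block
packed = np.ascontiguousarray(every_other)
print(packed.strides, packed.flags["C_CONTIGUOUS"])  # (8,) True
```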

The list chases a pointer to a fresh object per element; the array streams values the CPU prefetcher loves.

Vectorization is a mindset, not just `sum`

Vectorization means expressing an operation over a whole array at once, so a compiled C loop does the iterating instead of Python. It is not only about replacing sum — it is about writing whole-array math:

```python
import numpy as np

temps_c = np.array([0.0, 20.0, 37.0, 100.0])

# Slow: a Python-level loop (list comprehension), one interpreter round-trip per element
temps_f = [t * 9/5 + 32 for t in temps_c]

# Fast: broadcasting applies the scalars across the whole array inside C
temps_f = temps_c * 9/5 + 32
```

That second form uses broadcasting: NumPy stretches the scalars across the array inside compiled code, with no Python loop in sight. It reads like scalar math and runs like a batch operation. The same whole-array style covers element-wise A + B and A * B between arrays, boolean masks, and slicing. Each of these is handed to a precompiled C loop. Basic slicing does not loop at all; it returns a view of the same memory.

The same logic explains the pandas trap. df.apply(func, axis=1) calls your Python function once per row, and df.iterrows() builds a fresh Series for every row. Half a million rows therefore means half a million Python calls. Express the computation on whole columns instead, and pandas hands the work to NumPy:

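```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})  # small example frame

# Slow: a Python call per row
df["result"] = df.apply(lambda row: row["a"] * 2 + row["b"], axis=1)

# Fast: vectorized over columns, no Python loop
df["result"] = df["a"] * 2 + df["b"]
```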
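The same whole-array style extends beyond scalar arithmetic. In this minimal sketch with made-up values, two arrays of different but compatible shapes broadcast into a grid, and a boolean mask replaces an if-inside-a-loop.

```python
import numpy as np

# Broadcasting two arrays: shape (3, 1) combined with shape (4,) gives (3, 4)
rows = np.array([[0], [10], [20]])
cols = np.array([1, 2, 3, 4])
grid = rows + cols
print(grid.shape)  # (3, 4)

# Boolean mask: select and update elements without a loop
readings = np.array([12.0, -3.0, 7.5, -0.5, 4.0])
negatives = readings < 0
readings[negatives] = 0.0
print(readings)
```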

Both are O(n). The constants differ by a similar factor, and for row-wise apply often by more. That makes this the single highest-return habit in everyday data work: the moment you catch yourself looping over a NumPy array or a DataFrame, look for the one-line array expression that replaces it.
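This sketch uses made-up scores to show two common loop shapes and their one-line replacements. A per-element branch becomes `np.where`, and a running total becomes `np.cumsum`. Not every loop has a direct array equivalent, so treat these as patterns to look for.

```python
import numpy as np

scores = np.array([35, 72, 58, 90, 41])

# Before: a loop with a branch per element
passed_loop = []
for s in scores:
    passed_loop.append("pass" if s >= 50 else "fail")

# After: np.where evaluates the condition over the whole array at once
passed_vec = np.where(scores >= 50, "pass", "fail")

# Before: a running total accumulated in Python
running_loop, total = [], 0
for s in scores:
    total += s
    running_loop.append(total)

# After: np.cumsum does the accumulation in one call
running_vec = np.cumsum(scores)
```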

Practice

Quick check


Q1. A Python for-loop sum and np.sum over the same array are both O(n). Why is np.sum 50 to 100× faster?

Q2. What is the fastest way to compute result = a*2 + b for every row of a large DataFrame?

Q3. Vectorizing a list comprehension cuts a step from 8 s to 0.1 s. What changed?
