Hash Tables & Linear Probing
A hash function turns a key into a slot for ~O(1) lookup; when slots collide, linear probing walks to the next free one. The open-addressing idea GATE keeps testing.
What you'll learn
- A hash table maps keys to slots via a hash function for average O(1) lookup
- Collisions resolved two ways: chaining (a list per slot) vs open addressing
- Linear probing tries slots h, h+1, h+2, … mod m until one is empty
- Load factor alpha = n/m, and expected unsuccessful-search probes grow like 1/(1 - alpha)
Before you start
A dictionary lookup does not search and does not loop — it computes the address.
A hash table runs a hash function on the key to get a slot number, then
goes straight there. Store and retrieve both cost one computation plus one slot
read, so lookup is O(1) on average, whether the table holds ten keys or ten
million. It is the engine under Python’s dict, every GROUP BY and hash join in a
database, and the deduplication step in most data pipelines — which is why GATE keeps
testing it.
The catch: two different keys can land on the same slot — a collision. How you resolve collisions is the whole subject, and it is exactly what GATE drills.
Hashing into slots
Pick a table of m slots and a hash function h(key) that returns an index in
0 … m−1. A typical exam hash is arithmetic, e.g. h(x) = 3x mod 10 for a
size-10 table. Insert a key by computing its slot; look it up by recomputing the
same slot. Deterministic and direct.
When h sends two keys to the same slot, you need a collision strategy. There are
two classic families.
Chaining vs open addressing
- Chaining — each slot holds a list (linked list or small array) of all keys
that hash there. Insert appends to the list; lookup hashes to the slot, then
scans its list. Average lookup cost is
O(1 + alpha)wherealphais the load factor below. - Open addressing — every key lives in the array itself. On a collision you
probe for another slot. Linear probing tries the next slots in order:
h, h+1, h+2, … (mod m)until an empty one appears. Lookup follows the same path until it finds the key or hits an empty slot.
The load factor is alpha = n / m (items over slots). Chaining tolerates
alpha > 1; open addressing cannot exceed alpha = 1 (the array fills up) and
slows sharply well before that.
How GATE asks this
The signature question is a NAT: a small table, an arithmetic hash, linear
probing, and a fixed insertion order — “which slot does key k end up in?” You
trace the probe sequence by hand. GATE DA has asked this in both 2024 and 2025.
A second NAT flavour gives a load factor and asks for the expected number of
probes, using the open-addressing estimate 1/(1 − alpha) (asked 2024).
Worked example — a real GATE DA 2025 question
Table size
m = 10, hashh(x) = 3x mod 10, collisions resolved by linear probing. Insert1, 4, 5, 6, 14, 15in this order. Where do14and15land?
Compute each home slot, and when it is occupied, step forward +1 (mod 10) until
empty:
h(1) = 3·1 mod 10 = 3 → slot 3 empty ⇒ slot 3
h(4) = 3·4 mod 10 = 2 → slot 2 empty ⇒ slot 2
h(5) = 3·5 mod 10 = 5 → slot 5 empty ⇒ slot 5
h(6) = 3·6 mod 10 = 8 → slot 8 empty ⇒ slot 8
h(14) = 3·14 mod 10 = 2 → slot 2 holds 4 (collision)
→ probe 3 holds 1 (collision)
→ probe 4 empty ⇒ slot 4
h(15) = 3·15 mod 10 = 5 → slot 5 holds 5 (collision)
→ probe 6 empty ⇒ slot 6
So 14 lands in slot 4 and 15 lands in slot 6 — the answer to the 2025
question. The final table:
For the expected-probe variant (2024): an insertion behaves like an
unsuccessful search — it probes until it hits an empty slot — so its expected
probe count is about 1/(1 − alpha). At alpha = 0.5 that is 1/(1 − 0.5) = 2
probes; at alpha = 0.9 it explodes to 10.
Quick check
Quick check
Practice this in an interview
All questionsBinary search halves the search space each iteration to find a target in O(log n). The tricky part is not the idea but the boundary conditions: closed vs. half-open intervals, how to update lo/hi, and when to use lo < hi vs. lo <= hi. One clean template eliminates all the classic bugs.
CPython dicts are open-addressing hash tables. On lookup, Python calls __hash__ on the key to find a slot, then uses __eq__ to confirm the match. A valid dict key must be hashable — immutable by convention — and two objects that compare equal must have the same hash. Hash collisions are resolved by probing, which is why worst-case lookup degrades from O(1) to O(n).
Sort each word's characters to get a canonical key, then bucket words by that key using a hash map. This turns an O(n²) brute-force comparison into a clean O(n · k log k) single pass, where k is the max word length.
Use a variable-size sliding window with a hash map that records the most recent index of each character. When the right pointer hits a character already in the window, jump the left pointer to one past that character's last position — skipping over the repeat in one move rather than crawling one step at a time.