Bubble, Insertion & Selection Sort
The three elementary O(n²) sorts every programmer should understand — they teach the cost model that underpins every more efficient algorithm you'll ever use.
What you'll learn
- The mental model for each of the three elementary sorts in one sentence
- Why all three are O(n²) in comparisons yet differ sharply in swap count
- When insertion sort beats its O(n²) label — and why it lives inside Timsort
- How to instrument a sort with a comparison counter to measure cost directly
Before you start
Pick up a hand of playing cards and sort them without thinking about it. You probably pluck each new card from the right, scan left until you find where it belongs, and slide it in. That is insertion sort — the most natural sorting algorithm humans use, and one of the three O(n²) classics worth knowing cold.
None of these algorithms will appear in your production code for sorting large data. But they are the clearest possible demonstration of the cost model: how many comparisons does it take to find the right order, and how many swaps does it take to achieve it? Every efficient sort you ever encounter — merge sort, quicksort, Timsort — is designed to reduce those two numbers.
The mental model for each
Bubble sort — repeatedly walk the array and swap any adjacent pair that is out of order. Each full pass “bubbles” the largest unsorted element to its final position. One pass = one element settled.
Insertion sort — treat the left portion as a growing sorted hand. For each new element, walk left and swap it down until it finds its place. The hand stays sorted at all times.
Selection sort — find the smallest unsorted element, then swap it to the front. One scan = one element settled, but always with exactly one swap per pass (even when nothing needs moving, a no-op swap still costs a comparison scan).
The race makes the difference concrete. Insertion sort accumulates swaps only when elements are far from their final position; on a nearly-sorted array it barely moves anything. Selection sort’s swap count is always at most n−1, but its comparison count is identical to bubble sort. Bubble sort is the worst of both worlds — many comparisons and many swaps — yet it is the most commonly taught because the code is two nested loops and nothing else.
Why they are all O(n²)
Each algorithm has an outer loop that runs n times and an inner loop that, on average, also runs proportional to n times. The comparison count grows as the area of a triangle under the line n — roughly n²/2 comparisons for n elements.
| Algorithm | Comparisons (worst) | Swaps (worst) | Best case |
|---|---|---|---|
| Bubble | O(n²) | O(n²) | O(n) — already sorted |
| Insertion | O(n²) | O(n²) | O(n) — already sorted |
| Selection | O(n²) | O(n) | O(n²) — always scans fully |
The big-O class is the same for all three. The constant matters in practice, which is why insertion sort is the one that survives in real-world code while bubble and selection are teaching tools.
Live code: insertion sort with a counter
Run this to see the comparison count printed alongside the sorted result. The counter makes the cost visible rather than theoretical.
Notice the comparison counts: 11 for the already-sorted case (one per element, each breaks immediately), 66 for the reverse-sorted case (1 + 2 + … + 11). That is the O(n) best case vs. the O(n²) worst case in action for n = 12.
You rarely write these — but you read their fingerprint everywhere
When you call sorted() on a 10-element list, Timsort is invoking insertion sort internally. When you see a nested loop scanning an array, your first instinct should be “that is probably O(n²) — is there a sort or a hash map that eliminates the inner loop?” That instinct is built by understanding why these three algorithms look the way they do.
The broader lesson: every algorithm has a cost model — the dominant operation (comparisons, memory reads, cache misses) and how many times it executes. Elementary sorts make that model transparent because there is nowhere to hide.
Quick check
Quick check
Practice this in an interview
All questionsBecause the array is sorted, you can place one pointer at the start and one at the end, then squeeze them inward. If the sum is too big, move the right pointer left; if too small, move the left pointer right. This converges in one pass with O(1) extra space.
Even after rotation, one of the two halves around mid is always fully sorted. Check which half is sorted, then decide whether the target falls inside it. If yes, narrow to that half; if no, search the other. This keeps binary search's O(log n) guarantee.
Binary search halves the search space each iteration to find a target in O(log n). The tricky part is not the idea but the boundary conditions: closed vs. half-open intervals, how to update lo/hi, and when to use lo < hi vs. lo <= hi. One clean template eliminates all the classic bugs.
Maintain a min-heap of size k. Stream every element through: push it onto the heap, then if the heap exceeds size k, pop the minimum. After processing all elements, the heap's minimum is the kth largest — it is the smallest among the top-k values seen so far.