Python · Easy · Asked at Amazon, Microsoft, Stripe

Write a function that returns all duplicate values in a list. What is the optimal time complexity?

For: Data Analyst, Data Scientist, Data Engineer, ML Engineer

The short answer

The optimal solution runs in O(n) time with O(n) extra space: use one set to track seen elements and a second set to collect duplicates, which avoids any nested iteration. A Counter-based approach is also O(n) and often more readable.

How to think about it

The opening line to say out loud is “I want O(n), not O(n²).” The naive move, comparing every pair with a nested loop, is quadratic, and interviewers treat it as the thing to avoid, not the answer. The O(n) insight is a hash set: membership checks cost O(1) on average, so a single pass is enough. Two clean patterns both hit O(n), so pick by taste: the two-set form is a touch faster, while the Counter form reads better and gives you counts for free.
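For contrast, here is a minimal sketch of the quadratic approach that paragraph warns against. The function name and the comparison counter are illustrative additions, not part of the original answer; the counter is there only to make the n(n - 1)/2 growth visible.

```python
def find_duplicates_naive(nums):
    """Quadratic baseline: compare every pair (i, j) with i < j."""
    dupes = []
    comparisons = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            comparisons += 1
            # 'not in dupes' is itself a linear scan of a list
            if nums[i] == nums[j] and nums[i] not in dupes:
                dupes.append(nums[i])
    return sorted(dupes), comparisons


data = [1, 2, 3, 2, 4, 3, 5, 1, 1]
dupes, comparisons = find_duplicates_naive(data)
print(dupes)        # [1, 2, 3]
print(comparisons)  # 36, i.e. n * (n - 1) / 2 for n = 9
```

A set-based pass over the same nine elements performs nine membership checks instead of 36 pairwise comparisons, and the gap widens quadratically as the list grows.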

A worked example

from collections import Counter

# Two sets: 'seen' tracks everything, 'dupes' collects repeats — one pass
def find_duplicates_sets(nums):
    seen, dupes = set(), set()
    for n in nums:
        if n in seen:
            dupes.add(n)
        else:
            seen.add(n)
    return sorted(dupes)              # sorted only for stable output

# Counter: most readable, and trivially extends to "with counts"
def find_duplicates_counter(nums):
    return sorted(val for val, cnt in Counter(nums).items() if cnt > 1)

def duplicates_with_counts(nums):
    return {val: cnt for val, cnt in Counter(nums).items() if cnt > 1}

data = [1, 2, 3, 2, 4, 3, 5, 1, 1]
print("Two-set    :", find_duplicates_sets(data))
print("Counter    :", find_duplicates_counter(data))
print("With counts:", duplicates_with_counts(data))

Two-set    : [1, 2, 3]
Counter    : [1, 2, 3]
With counts: {1: 3, 2: 2, 3: 2}

Both functions return the same duplicates — 1, 2, 3 — in a single linear pass. The difference is what they can also tell you: the Counter form hands back exact counts (1 appeared three times) almost for free, which is why it’s the one to reach for when the follow-up asks “and how many times did each repeat?”

Why the two-set form is a touch faster

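The two-set loop does nothing but set lookups and inserts, and it knows a value is a duplicate the moment it sees the repeat. Counter has to complete a full frequency pass and then filter the counts afterwards. When the task is to return all duplicates, both still read the entire input, so the difference is only a constant factor, and constant factors depend on the interpreter, so measure if it matters. The early detection turns into a real early exit only when the question narrows to “is there any duplicate?” or “what is the first one?” Both are O(n) time and O(n) space, so either is a fine answer. Just don’t reach for sort-then-scan: it costs O(n log n) time, so it is not optimal here.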
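To make the early-exit point concrete, the sketch below shows a variant that stops at the first repeat, next to the sort-then-scan approach for comparison. Both function names are hypothetical and not part of the original answer, and `first_duplicate` answers a narrower question than "return all duplicates".

```python
def first_duplicate(nums):
    """Return the first value seen twice, or None. Stops at the first repeat."""
    seen = set()
    for n in nums:
        if n in seen:
            return n          # early exit: the rest of the input is never read
        seen.add(n)
    return None


def find_duplicates_sorted(nums):
    """Sort-then-scan: O(n log n) time, dominated by the sort."""
    ordered = sorted(nums)    # sorted() copies, so the caller's list is untouched
    dupes = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Equal neighbours mean a repeat; skip values already recorded
        if prev == cur and (not dupes or dupes[-1] != cur):
            dupes.append(cur)
    return dupes


data = [1, 2, 3, 2, 4, 3, 5, 1, 1]
print(first_duplicate(data))         # 2
print(find_duplicates_sorted(data))  # [1, 2, 3]
```

One practical difference: the sort-based version needs elements that can be ordered, while the set-based versions need elements that are hashable.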
