datarekha
Python Easy Asked at AmazonAsked at GoogleAsked at MetaAsked at Microsoft

Given a new data problem, how do you decide whether to use a list, dict, or set?

The short answer

Choose a list when order matters and you need indexed access or duplicates. Choose a dict when you need to map keys to values and look up by key in O(1). Choose a set when you need uniqueness, fast membership testing, or set-algebra operations. Getting this choice wrong usually means either incorrect results (keeping duplicates when you needed uniqueness) or avoidable O(n) lookups.

How to think about it

How a strong candidate frames this

This is a design question, not a syntax question. The interviewer wants to hear you reason about access patterns — how the data will be used — not just list the features of each type. Start with the question “what do I need to do with this data?” and let the answer drive the choice.

Decision criteria

QuestionAnswerUse
Do I need order or positional access?Yeslist
Do I need duplicates?Yeslist
Do I need to look up by a key?Yesdict
Do I need to associate extra data with each key?Yesdict
Do I only need membership, uniqueness, or set algebra?Yesset
Do I need iteration in insertion order?Yeslist or dict (ordered from 3.7+)

Concrete examples with the reasoning

# list — ordered pipeline of records to process sequentially
events = [{"ts": 1, "action": "click"}, {"ts": 2, "action": "buy"}]

# dict — O(1) lookup: given a user_id, get their profile
user_profiles = {user["id"]: user for user in raw_users}
profile = user_profiles[42]   # O(1) regardless of how many users

# set — deduplicate & test membership at O(1)
seen_ids = set()
for record in stream:
    if record["id"] in seen_ids:   # O(1) — far faster than `in list`
        continue
    seen_ids.add(record["id"])
    process(record)

Interactive demo — see the O(1) vs O(n) difference

Replace list membership checks with a set

This is the single most impactful swap in data pipelines:

# O(n) per check — slow for large lists
valid_codes = ["USD", "EUR", "GBP", "JPY"]
if code in valid_codes:   # linear scan every time
    ...

# O(1) per check — hash lookup
valid_codes = {"USD", "EUR", "GBP", "JPY"}
if code in valid_codes:
    ...

When you need both lookup and order

Use a dict (Python 3.7+ dicts preserve insertion order) or collections.OrderedDict for clarity. If you need sorted key access, sortedcontainers.SortedDict provides O(log n) operations.

Learn it properly Dictionaries

Keep practising

All Python questions

Explore further

Skip to content