Given a new data problem, how do you decide whether to use a list, dict, or set?
Choose a list when order matters and you need indexed access or duplicates. Choose a dict when you need to map keys to values and look up by key in O(1). Choose a set when you need uniqueness, fast membership testing, or set-algebra operations. Getting this choice wrong usually means either incorrect results (keeping duplicates when you needed uniqueness) or avoidable O(n) lookups.
How to think about it
How a strong candidate frames this
This is a design question, not a syntax question. The interviewer wants to hear you reason about access patterns — how the data will be used — not just list the features of each type. Start with the question “what do I need to do with this data?” and let the answer drive the choice.
Decision criteria
| Question | Answer | Use |
|---|---|---|
| Do I need order or positional access? | Yes | list |
| Do I need duplicates? | Yes | list |
| Do I need to look up by a key? | Yes | dict |
| Do I need to associate extra data with each key? | Yes | dict |
| Do I only need membership, uniqueness, or set algebra? | Yes | set |
| Do I need iteration in insertion order? | Yes | list or dict (ordered from 3.7+) |
Concrete examples with the reasoning
# list — ordered pipeline of records to process sequentially
events = [{"ts": 1, "action": "click"}, {"ts": 2, "action": "buy"}]
# dict — O(1) lookup: given a user_id, get their profile
user_profiles = {user["id"]: user for user in raw_users}
profile = user_profiles[42] # O(1) regardless of how many users
# set — deduplicate & test membership at O(1)
seen_ids = set()
for record in stream:
if record["id"] in seen_ids: # O(1) — far faster than `in list`
continue
seen_ids.add(record["id"])
process(record)
Interactive demo — see the O(1) vs O(n) difference
Replace list membership checks with a set
This is the single most impactful swap in data pipelines:
# O(n) per check — slow for large lists
valid_codes = ["USD", "EUR", "GBP", "JPY"]
if code in valid_codes: # linear scan every time
...
# O(1) per check — hash lookup
valid_codes = {"USD", "EUR", "GBP", "JPY"}
if code in valid_codes:
...
When you need both lookup and order
Use a dict (Python 3.7+ dicts preserve insertion order) or collections.OrderedDict for clarity. If you need sorted key access, sortedcontainers.SortedDict provides O(log n) operations.