What is the difference between a shallow copy and a deep copy, and when does it matter?
A shallow copy creates a new container but populates it with references to the same inner objects. A deep copy creates a new container and recursively copies every nested object. The difference only matters when the data structure contains mutable nested objects — for flat structures of immutables, shallow copy is sufficient and faster.
How to think about it
The question is really about Python’s reference model. A copy operation creates a new container, but how deep does it go? Shallow copy stops at the first level — it duplicates the container but shares the contents. Deep copy recurses all the way down, making every nested object independent.
The distinction only bites you when nested objects are mutable. If your list contains only ints or strings, shallow copy is perfectly safe and faster. If it contains lists, dicts, or class instances, a shallow copy gives you a false sense of independence.
Run the three scenarios
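The page does not say which three scenarios it means, so the sketch below assumes the three cases described above: a flat list of immutables, a nested list copied shallowly, and a nested list copied deeply. All variable names are illustrative.

```python
import copy

# Scenario 1: flat list of immutables, shallow copy.
flat = [1, 2, 3]
flat_copy = copy.copy(flat)
flat_copy[0] = 99            # rebinds a slot in the copy only
print(flat)                  # [1, 2, 3]

# Scenario 2: nested list, shallow copy. The inner lists are shared.
original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
shallow[0].append(99)        # mutates the inner list both containers point to
shallow.append([5, 6])       # changes only the copy's outer list
print(original)              # [[1, 2, 99], [3, 4]]

# Scenario 3: nested list, deep copy. The inner lists are independent.
original2 = [[1, 2], [3, 4]]
deep = copy.deepcopy(original2)
deep[0].append(99)
print(original2)             # [[1, 2], [3, 4]]
```

In scenario 2, appending to the outer list of the copy leaves the original untouched, while appending to a shared inner list shows up in both. That asymmetry is what makes shallow-copy bugs easy to miss.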
The mental model
original = [[1, 2], [3, 4]]
After shallow = copy.copy(original):
original ──► [ ref_A, ref_B ]
shallow ──► [ ref_A, ref_B ] ← different list, same inner refs
After deep = copy.deepcopy(original):
original ──► [ ref_A, ref_B ]
deep ──► [ ref_A2, ref_B2 ] ← entirely separate objects
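One way to check the diagram is to compare object identity with `is` instead of mutating anything. This is a minimal sketch that reuses the `original`, `shallow`, and `deep` names from the diagram:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Outer containers: both copies are new list objects.
print(shallow is original)          # False
print(deep is original)             # False

# Inner lists: shared by the shallow copy, duplicated by the deep copy.
print(shallow[0] is original[0])    # True
print(deep[0] is original[0])       # False
print(deep[0] == original[0])       # True, equal values in a distinct object
```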
When to use which
| Scenario | Use |
|---|---|
| Flat list of ints / strings / tuples of immutables | shallow copy (fast, safe) |
| Nested mutable structure you will modify | deep copy |
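| DataFrame (pandas) | df.copy(), where deep=True is the default; it copies the data and index, but Python objects stored in object-dtype columns are not copied recursively |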
| Object graph with circular references | deepcopy handles them; naive manual recursion does not terminate |
deepcopy is noticeably slower on large nested structures because it traverses every object and maintains a memo dict to handle shared references and cycles.
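The memo behaviour can be observed directly. The sketch below assumes a self-referencing list as the smallest possible cycle; `naive_deepcopy` is a hypothetical helper written only for contrast, not part of the standard library.

```python
import copy

# A list that contains itself: a minimal circular reference.
node = [1, 2]
node.append(node)

clone = copy.deepcopy(node)            # terminates thanks to the memo dict
print(clone is node)                   # False
print(clone[2] is clone)               # True, the cycle is reproduced in the copy

# Two slots pointing at the same inner list.
shared = [0]
pair = [shared, shared]
pair_copy = copy.deepcopy(pair)
print(pair_copy[0] is pair_copy[1])    # True, sharing is preserved
print(pair_copy[0] is shared)          # False, the shared list was copied once


def naive_deepcopy(obj):
    """Hypothetical recursive copy with no memo; it cannot handle cycles."""
    if isinstance(obj, list):
        return [naive_deepcopy(item) for item in obj]
    return obj


try:
    naive_deepcopy(node)
    hit_recursion_limit = False
except RecursionError:
    hit_recursion_limit = True
print(hit_recursion_limit)             # True
```

The memo dict is the bookkeeping that makes the first two results possible, and it is also part of the per-object overhead that makes deepcopy slower than a shallow copy.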