What set operations does Python support, and where are they practically useful in data work?

For: Data Analyst, Data Scientist, Data Engineer, ML Engineer

The short answer

Python sets support union, intersection, difference, and symmetric difference as both operators and methods. All of them run in average-case time that is linear in the operand sizes, from about O(min(m, n)) for intersection to about O(m + n) for union. They are useful for deduplication, membership testing in large collections, and computing overlaps between datasets, all of which would be expensive with lists.

How to think about it

Set operations are a one-liner answer to a whole family of data questions: what do two user cohorts share, which records sit in one dataset but not another, which IDs leaked from the training set into the test set. The syntax is tight and the performance is good: membership testing is O(1) on average, against O(n) for a list, which makes a set dramatically faster for anything that asks “is this in there?” over and over.
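As a minimal sketch of the leakage question above, assuming two hypothetical ID lists (train_ids and test_ids are invented for illustration), the check and the clean-up each take one line:

```python
# Hypothetical record IDs; in practice these would come from your own datasets.
train_ids = [101, 102, 103, 104, 105]
test_ids = [104, 105, 106, 107]

# Convert once; each later membership test is then a hash lookup (O(1) on average).
train_set = set(train_ids)

# IDs that appear in both splits: the intersection.
leaked = train_set & set(test_ids)
print("Leaked IDs    :", sorted(leaked))

# Drop the leaked IDs from the test split while keeping its original order.
clean_test_ids = [i for i in test_ids if i not in train_set]
print("Clean test IDs:", clean_test_ids)
```

Building train_set once and reusing it is the point: testing against the list directly would rescan it for every test ID.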

A worked example

The four core operations, then the cohort-analysis pattern they’re made for:

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print("union        (a | b):", a | b)          # everything in either
print("intersection (a & b):", a & b)          # in both
print("difference   (a - b):", a - b)          # in a, not b
print("symm diff    (a ^ b):", a ^ b)          # in exactly one

# Churn analysis falls straight out of set algebra
last_month = {"alice", "bob", "carol", "dave"}
this_month = {"bob", "carol", "eve", "frank"}

print("Returning:", sorted(last_month & this_month))   # stayed
print("Churned  :", sorted(last_month - this_month))   # left
print("New      :", sorted(this_month - last_month))   # joined
print("All ever :", sorted(last_month | this_month))

union        (a | b): {1, 2, 3, 4, 5, 6}
intersection (a & b): {3, 4}
difference   (a - b): {1, 2}
symm diff    (a ^ b): {1, 2, 5, 6}
Returning: ['bob', 'carol']
Churned  : ['alice', 'dave']
New      : ['eve', 'frank']
All ever : ['alice', 'bob', 'carol', 'dave', 'eve', 'frank']

Four lines of set algebra answer the four questions a churn report exists to answer. (The cohort results are wrapped in sorted() only so the output is stable to read — a set keeps no order of its own.)

Operators vs methods

The operator forms (|, &, -, ^) read best, but they demand that both sides already be sets (or frozensets). The method forms (union, intersection, …) accept any iterable, which is handier when one side is still a list:

a.union([5, 6])     # {1, 2, 3, 4, 5, 6} — a list argument is fine
a | [5, 6]          # TypeError — the operator needs a set on the right
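
A slightly fuller sketch of the same distinction, using illustrative values only: the method form takes several iterables of mixed types in one call, while the operator form works once the other side is converted explicitly.

```python
a = {1, 2, 3, 4}

# union() accepts any number of iterables: here a list, a tuple, and a range.
combined = a.union([5, 6], (7,), range(8, 10))
print(sorted(combined))

# To keep the operator form, convert the other operand to a set first.
also_combined = a | set([5, 6])
print(sorted(also_combined))

# The bare operator with a list raises TypeError, so guard it if the type is uncertain.
try:
    a | [5, 6]
except TypeError as exc:
    print("TypeError:", exc)
```

Neither form modifies a; both return a new set.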

Subset and superset checks

{1, 2}    <= {1, 2, 3}    # True  — subset (or equal)
{1, 2}    <  {1, 2, 3}    # True  — proper subset (strictly smaller)
{1, 2, 3} >= {1, 2}       # True  — superset

These read beautifully for permission checks, feature-flag validation, or confirming a DataFrame has every required column.
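
A minimal sketch of the required-columns check, assuming a hypothetical schema. The list named actual stands in for a DataFrame's column names so the example runs without pandas:

```python
# Hypothetical required schema and actual column names (stand-in for df.columns).
required = {"user_id", "signup_date", "plan"}
actual = ["user_id", "plan", "country"]

# Set difference names exactly which required columns are absent.
missing = required - set(actual)

# The subset test answers the yes/no question.
if required <= set(actual):
    print("All required columns present")
else:
    print("Missing columns:", sorted(missing))
```

The subset test gives the pass/fail answer, and the difference supplies an error message that says what to fix.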
