What set operations does Python support, and where are they practically useful in data work?
Python sets support union, intersection, difference, and symmetric difference as both operators and methods. On average, the cost ranges from O(min(m, n)) for intersection to O(m + n) for union, where m and n are the sizes of the two sets. They are useful for deduplication, membership testing in large collections, and computing overlaps between datasets, all of which would be expensive with lists.
How to think about it
Set operations offer a one-line solution to a whole class of data problems: finding what two user cohorts have in common, which records exist in one dataset but not another, or which IDs appear in both a training set and a test set. The syntax is concise and the performance is good: membership testing is O(1) on average, versus O(n) for a list, which makes sets dramatically faster for anything that involves repeated lookups.
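As a minimal sketch of the repeated-lookup case, the snippet below filters a list of events against a set of blocked user IDs. The names `events`, `blocked_raw`, and `blocked_ids` and their values are made up for illustration. The point is that the blocklist is converted to a set once, so each `in` check is a hash lookup rather than a scan.

```python
# Hypothetical data: (user_id, action) pairs and a blocklist that contains duplicates.
events = [(101, "click"), (102, "view"), (103, "click"), (101, "purchase")]
blocked_raw = [102, 104, 102]

# Converting to a set deduplicates and gives average O(1) membership tests.
blocked_ids = set(blocked_raw)

# Each `user_id not in blocked_ids` check is a hash lookup, not a list scan.
allowed_events = [
    (user_id, action)
    for user_id, action in events
    if user_id not in blocked_ids
]

print(len(blocked_ids))   # 2
print(allowed_events)     # [(101, 'click'), (103, 'click'), (101, 'purchase')]
```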
All the operations with real data patterns
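The four core operations can be sketched on the kinds of data mentioned above. The cohort names and IDs here are invented for illustration, not taken from a real dataset.

```python
# Hypothetical user cohorts, identified by integer user IDs.
january_users = {1, 2, 3, 4}
february_users = {3, 4, 5, 6}

# Union: everyone active in either month.
active_either = january_users | february_users      # {1, 2, 3, 4, 5, 6}

# Intersection: users retained across both months.
retained = january_users & february_users           # {3, 4}

# Difference: users seen in January but not in February.
churned = january_users - february_users            # {1, 2}

# Symmetric difference: users active in exactly one of the two months.
one_month_only = january_users ^ february_users     # {1, 2, 5, 6}

# Leakage check: IDs that appear in both a training set and a test set.
train_ids = {"a1", "b2", "c3"}
test_ids = {"c3", "d4"}
leaked = train_ids & test_ids                       # {'c3'}

# isdisjoint() answers the yes/no question without building a new set.
has_leakage = not train_ids.isdisjoint(test_ids)    # True
```

Set display order is not guaranteed, so printed output may list the elements in a different order than the comments show.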
Operators vs methods — which to prefer
The operator syntax (|, &, -, ^) is concise and readable but requires both operands to be sets (or frozensets). The method equivalents (union, intersection, difference, symmetric_difference) accept any iterable, which makes them more flexible when one side hasn’t been converted yet:
a = {1, 2, 3, 4}
# This works even though [5, 6] is a list, not a set
a.union([5, 6]) # {1, 2, 3, 4, 5, 6}
a | [5, 6] # TypeError: unsupported operand type(s) for |: 'set' and 'list'
Subset and superset checks
{1, 2} <= {1, 2, 3} # True — subset (or equal)
{1, 2} < {1, 2, 3} # True — proper subset (strictly smaller)
{1, 2, 3} >= {1, 2} # True — superset
These are handy for feature flag validation, permission checking, or confirming that a required set of columns is present in a DataFrame.
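The column check might look like the sketch below. It uses a plain list of column names so that it runs without pandas; with a real DataFrame, the same comparison would be made against `df.columns`. The names `REQUIRED_COLUMNS` and `missing_columns` are hypothetical.

```python
# Hypothetical schema requirement for an incoming table.
REQUIRED_COLUMNS = {"user_id", "timestamp", "amount"}


def missing_columns(columns):
    """Return the required column names that are absent from `columns`."""
    # difference() accepts any iterable, so a list of names works directly.
    return REQUIRED_COLUMNS.difference(columns)


good = ["user_id", "timestamp", "amount", "currency"]
bad = ["user_id", "amount"]

print(REQUIRED_COLUMNS <= set(good))   # True: every required column is present
print(missing_columns(good))           # set()
print(missing_columns(bad))            # {'timestamp'}
```

Returning the missing names, rather than only a boolean, makes the failure message more useful when validation fails.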