Type hints are documentation that can't go stale
A docstring lies without consequence, but a type hint that disagrees with the code is caught by your editor before you even save the file.
A senior engineer at a mid-size fintech pushed a refactor in late 2024. She renamed a function parameter — amount to amount_cents — to make the unit explicit. She updated the implementation, updated the call sites she could find, and updated the docstring. Three weeks later, production threw a TypeError at 2 AM from a call site she missed. The docstring had said amount_cents for three weeks. The code at that call site still passed a float in dollars. Nobody noticed because the docstring is read by humans, not machines.
Type hints would have caught this in under a second. That is the entire argument, really. Everything else is elaboration.
What type hints are and aren’t
A Python type hint (introduced in PEP 484, shipping in 3.5, maturing through 3.9-3.12) is an annotation on a variable, parameter, or return value. It looks like def compute_tax(income: float, rate: float) -> float:. The interpreter reads it, stores it in __annotations__, and then proceeds to ignore it at runtime entirely. Pass a string where a float is expected and Python will not complain — at least not until the string hits an operation that fails.
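A minimal sketch of that behavior, reusing the compute_tax signature from above. The function body and the sample values are assumptions added for illustration; the point is only that the interpreter stores the hints and never consults them.

```python
def compute_tax(income: float, rate: float) -> float:
    """Return the tax owed on `income` at a flat `rate` (illustrative body)."""
    return income * rate


# The hints are stored on the function object and nothing more.
print(compute_tax.__annotations__)

# A correctly typed call behaves as expected.
print(compute_tax(50_000.0, 0.25))

# A wrongly typed call is accepted by the interpreter. It only fails
# once the string reaches the multiplication inside the body.
try:
    compute_tax("50000", 0.25)
except TypeError as exc:
    print(f"runtime failure, not a hint violation: {exc}")
```

The TypeError comes from multiplying a str by a float, not from the annotation. A static checker would have rejected the third call without running anything.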
This trips people up. They expect enforcement and find none. They conclude the feature is cosmetic. They are wrong about what the feature is for.
Type hints are a specification language. Their target audience is not the Python runtime — it is the tooling layer sitting on top: type checkers like mypy and pyright (the engine behind Pylance in VS Code), editors doing autocomplete, documentation generators, and other automated systems. The hints are checked statically, meaning before the program ever runs, by tools that read the source code and reason about what types flow through it.
This is a different contract from a docstring. A docstring is prose. It can say anything. It has no enforcer. The most careful developer cannot guarantee that someone who renames a parameter also remembers to update every docstring that mentions it. Static type checking closes that loop mechanically.
The docstring trap
Here is what drift looks like in practice.
A function ships with a clear docstring: “Takes user_id (int) and days_back (int), returns a list of transaction dicts.” Six months later, the return type changes to a list of Transaction dataclass instances because the team decided to stop passing raw dicts around. The implementation is updated. The tests are updated. The docstring is not — either because the developer forgot, or because updating prose feels low priority, or because nobody had a failing test to remind them.
Now every new developer who reads that docstring gets a lie. The editor’s hover card shows the old description. The function actually returns dataclass instances. This is the rot. It is invisible until someone relies on the wrong information.
Type hints don’t rot this way, provided a checker is actually run. If the signature says -> list[Transaction], a reader can trust it, because the type checker has verified that every return statement in the body produces a list of Transaction instances. If the implementation were changed to return raw dicts without updating the annotation, mypy would report an error. The specification and the implementation are checked for consistency by a machine, every time the checker runs, with no human memory required.
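The drift scenario can be sketched as follows. The Transaction fields, the function names, and the sample data are all hypothetical; only the shape of the mistake matters. The second function runs without complaint, yet a checker such as mypy would report its return statement as incompatible with the annotation.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields, chosen for illustration.
    user_id: int
    amount_cents: int


def recent_transactions(user_id: int, days_back: int) -> list[Transaction]:
    """Annotation and body agree: the checker accepts this."""
    # In-memory stand-in for a real query.
    return [Transaction(user_id=user_id, amount_cents=1250)]


def recent_transactions_drifted(user_id: int, days_back: int) -> list[Transaction]:
    """Annotation and body disagree: the interpreter does not care."""
    # The body returns raw dicts while the annotation promises Transaction.
    # A static checker reports the list item as an incompatible type.
    return [{"user_id": user_id, "amount_cents": 1250}]


print(type(recent_transactions(7, 30)[0]).__name__)
print(type(recent_transactions_drifted(7, 30)[0]).__name__)
```

A stale docstring on the second function would stay stale indefinitely. The stale annotation fails the next checker run.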
Gradual typing: you don’t have to do it all at once
The designers of Python’s type system made a pragmatic choice. They did not require you to annotate everything. By default, mypy does not check the body of a function that has no annotations, and it treats that function’s parameters and return value as Any, the escape hatch that means “I have no information about this type.” This lets you adopt type hints incrementally, which is the only realistic path in an existing codebase.
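A small sketch of what that coexistence looks like. Both function names and the input string are hypothetical. The unannotated helper is invisible to the checker, and its result flows into the annotated function as Any without any complaint.

```python
def load_prices(raw):
    # No annotations: mypy treats `raw` and the return value as Any
    # and, by default, does not check this body.
    return [float(piece) for piece in raw.split(",")]


def typed_total(prices: list[float]) -> float:
    # Annotated: this body and the arguments at call sites are checked.
    return float(sum(prices))


# The Any coming out of load_prices is accepted wherever a
# list[float] is expected, so old and new code can be mixed freely.
total = typed_total(load_prices("1.5,2.5"))
print(total)
```

Annotating load_prices later tightens the check at this call site without touching typed_total.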
The idiom is to start at the edges. A codebase has natural seams: the public API of a module, the boundary where one team’s code hands data to another team’s, the inputs and outputs of a pipeline stage. These are the highest-leverage annotation points because they are where misunderstandings between callers and callees occur. Annotating the insides of private helper functions matters less than annotating the surfaces that face outward.
As type coverage grows, the checker gets more power. It can now trace type information from an annotated function through to its callers, catch incompatible assignments further downstream, and flag operations that only make sense on one type but not another. The returns compound: each annotation you add narrows the ambiguity for all code that touches it.
The friction is real but bounded. Writing -> dict[str, list[int]] takes a few seconds. Writing -> dict[str, list[Transaction]] requires knowing what Transaction is. Importing it into the type annotation requires thinking about circular imports, which occasionally requires from __future__ import annotations or a TYPE_CHECKING guard. These are solvable problems, not deep ones. The ceremony is on the order of minutes per function, not hours.
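A sketch of the TYPE_CHECKING guard, under the assumption that the guarded import is one that would create a cycle in a real project. Here datetime.date is only a stand-in for such a module, and label is a hypothetical function. The annotation is written as a string so the interpreter never tries to evaluate the name.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by the type checker only; never executed at runtime, so it
    # cannot take part in an import cycle. `date` is a stand-in here.
    from datetime import date


def label(day: "date") -> str:
    # The quoted annotation is stored as a plain string at runtime.
    return day.isoformat()


print(label.__annotations__["day"])
```

Putting from __future__ import annotations at the top of the module has a similar effect without the quotes, because it defers evaluation of every annotation in the file.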
Where the payoff concentrates
Not all code benefits equally. Three areas have dramatically higher signal-to-noise ratios.
Function boundaries. The signature of a function — its parameter types and return type — is a contract between the writer and every caller. This is the most valuable place to annotate because it is read most often and because violations here propagate. A bug in a private helper that you call once is contained. A bug in a public function that fifty callers depend on is a systemic issue. Type hints at function boundaries give the type checker the information it needs to trace types from call sites inward and from return values outward.
Data models. Anywhere you represent structured data — configuration objects, API request/response shapes, database rows, domain entities — type hints pay out disproportionately. The modern idiom is dataclass or pydantic.BaseModel. A dataclass with annotated fields gets you free __init__ generation and makes the shape of your data explicit. A Pydantic model goes further: it parses and validates incoming data, using the type annotations as the schema. Here the annotations do double duty — they are checked statically and they drive runtime behavior. That is unusual; usually type hints are purely static. Pydantic is the exception that proves why the annotations are worth having at all.
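As an illustration of the dataclass half of that paragraph, here is a sketch with a hypothetical ApiConfig model. The field names and the URL are invented. Pydantic is left out to keep the example within the standard library; note that a plain dataclass generates __init__ from the annotations but does not validate values at runtime.

```python
from dataclasses import dataclass, fields


@dataclass
class ApiConfig:
    # Hypothetical configuration shape. The annotations drive the
    # generated __init__, __repr__, and __eq__.
    base_url: str
    timeout_seconds: float = 5.0
    retries: int = 3


config = ApiConfig(base_url="https://example.invalid")
print(config)

# The declared shape is available for introspection as well.
print([(field.name, field.type) for field in fields(ApiConfig)])
```

A checker will reject ApiConfig(base_url=42) statically. Getting the same rejection at runtime is the extra step a validating library provides.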
Optional values. One of the most insidious bug classes in Python is the unexpected None. A function that says it returns a User but actually returns None in some edge case will blow up wherever the caller does user.email without checking. The type hint -> User | None (or -> Optional[User] in older syntax) forces the caller to handle both cases whenever a type checker is running; mypy and pyright both check Optional values strictly by default, not only in strict mode. It turns a latent runtime crash into a compile-time (or rather, type-check-time) error.
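A sketch of the Optional case, with a hypothetical User model and an in-memory lookup standing in for a real data store. It uses the older Optional[User] spelling so it also runs on Python versions before 3.10.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    email: str


# Hypothetical in-memory store standing in for a database.
_USERS = {1: User(name="Ada", email="ada@example.invalid")}


def find_user(user_id: int) -> Optional[User]:
    # dict.get returns None for a missing key, and the annotation says so.
    return _USERS.get(user_id)


def contact_email(user_id: int) -> str:
    user = find_user(user_id)
    # Writing `return user.email` here would be rejected by the checker,
    # because `user` may be None at this point.
    if user is None:
        return "unknown"
    return user.email


print(contact_email(1))
print(contact_email(2))
```

Had find_user been annotated as -> User, the checker would instead have flagged the return statement inside find_user, which is where the real mistake would be.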
The narrowing trick
One pattern that feels almost magical once you internalize it is type narrowing. When you write:
def process(value: str | None) -> str:
    if value is None:
        return "default"
    return value.upper()
the type checker understands that inside the if branch, value is None, and after the early return, value must be str. It narrows the type based on the control flow. This means you get errors precisely where they matter: if you tried to call value.upper() before the None check, mypy would flag it. The checker is not just verifying annotations — it is reasoning about your code’s logic.
This reasoning extends to isinstance checks, assert statements, and assignment expressions. The more your code uses explicit control flow, the more the checker can infer. Cryptic, clever code that collapses many branches into one expression is not just hard for humans to read — it is hard for the checker to reason about. Type hints reward clarity with verification power.
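The three other narrowing forms mentioned above can be sketched like this. All function names are hypothetical, and the comments describe the types a checker such as mypy or pyright would be expected to infer on each line.

```python
import re
from typing import Optional, Union


def describe(value: Union[int, str, list[int]]) -> str:
    if isinstance(value, int):
        return f"int:{value + 1}"      # value is int here
    if isinstance(value, str):
        return f"str:{value.upper()}"  # value is str here
    return f"list:{sum(value)}"        # only list[int] is left


def first_number(text: str) -> int:
    # Assignment expression: `match` is narrowed from an optional
    # match object to a definite one inside the branch.
    if (match := re.search(r"\d+", text)) is not None:
        return int(match.group())
    return 0


def require_name(name: Optional[str]) -> str:
    # After the assert, the checker treats `name` as str.
    assert name is not None
    return name.strip()


print(describe(1), describe("ab"), describe([1, 2, 3]))
print(first_number("order 42"), require_name("  Ada  "))
```

Each branch is an ordinary runtime check that the program needed anyway. The checker simply reads the same control flow a human reviewer would.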
The cost is honest
Type hints are not free. Annotations add characters. Complex generics add cognitive load. Callable[[str, int], list[float]] is not more readable than no annotation at all if you encounter it without context. Deep generic nesting can grow into something that looks more like a type algebra problem than a function signature. Some libraries have incomplete or incorrect stubs (the .pyi files that tell the type checker about a library’s API), which produces false positives that require # type: ignore comments — noise that has to be managed.
The right way to think about this cost is as an investment in a class of correctness you would otherwise not have at all. The alternative is not zero cost — the alternative is runtime surprises, incorrect docstrings, onboarding friction for new engineers who cannot trust what they read, and a growing cognitive load as the codebase ages. Docstrings are maintained by discipline alone; type hints are maintained by the checker, automatically, on every run.
The people who benefit most are not the original authors. The original author knows what type each function expects because they just wrote it. The people who benefit are the future maintainers — including you, six months from now, when you have forgotten what _process_batch returns and you need to call it from a new context. You hover over the function in your editor. You see the signature. The editor autocompletes the return type’s attributes. You do not have to open the implementation.
The posture
The productive stance is not “I will type-hint this codebase completely before anything else ships.” That way lies annotation paralysis. The productive stance is: annotate the surfaces that face outward, annotate the data models, and let mypy run in CI (continuous integration, the automated checks that run on every code change) at a level of strictness that matches your team’s current discipline.
mypy --strict on a codebase that has never seen type hints is a wall of errors. Start with mypy --ignore-missing-imports, which silences errors about imports mypy cannot resolve, typically third-party libraries that ship without type information. Add one annotated module at a time. Turn on --disallow-untyped-defs when you are ready to require annotations on new functions. Ratchet up slowly.
The goal is not 100% coverage as an end in itself. The goal is that the parts of your codebase that matter most are checked, and that the checker is running in CI so that drift is caught before it merges. At that point, your type hints are documentation that is enforced by a machine on every pull request. No docstring has ever been able to claim that.