datarekha

What are .pyc files and what role does Python bytecode play?

The short answer

When Python imports a module, it compiles the source to platform-independent bytecode and caches it in a .pyc file inside __pycache__. On subsequent imports the cached bytecode is loaded directly if the source is unchanged, skipping the parse-and-compile step. Bytecode is not machine code — it is still interpreted by the CPython virtual machine.

How to think about it

This question is testing whether you understand what happens between writing .py source code and the CPU actually running it. The answer has two layers: the compilation step that produces bytecode, and the caching step that avoids redoing that work on every import.

The compilation pipeline

When CPython imports a module and finds no valid cached bytecode, it runs your source through four stages:

source.py
    ↓
Tokeniser  →  tokens (keywords, names, literals, operators)
    ↓
Parser     →  AST (Abstract Syntax Tree)
    ↓
Compiler   →  bytecode (code object)
    ↓
CPython VM →  executes instructions

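The result of the compiler step is a code object, which is marshalled and saved to a file such as __pycache__/source.cpython-312.pyc (the tag shown is for CPython 3.12). The filename encodes the interpreter implementation and version, so caches written by different Python releases can sit side by side without colliding. Only imported modules are cached this way; a script run directly as the main program is compiled in memory each time.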
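The stages above can be walked through by hand with the standard library. This is a minimal sketch rather than the import system's actual implementation; the source string, the "<demo>" filename and the source.py path are made up for illustration.

```python
import ast
import importlib.util
import types

source = "x = 6 * 7\n"

# Parser: source text -> AST (tokenising happens inside ast.parse)
tree = ast.parse(source, filename="<demo>")

# Compiler: AST -> code object that holds the bytecode
code = compile(tree, filename="<demo>", mode="exec")
assert isinstance(code, types.CodeType)

# VM: execute the code object in a fresh namespace
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 42

# Where the import system would cache bytecode for a hypothetical source.py
cache_path = importlib.util.cache_from_source("source.py")
print(cache_path)  # e.g. __pycache__/source.cpython-312.pyc on CPython 3.12
```

The exact cache path depends on the interpreter version and on settings such as PYTHONPYCACHEPREFIX, so treat the printed value as an example only.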

What bytecode actually looks like

You can inspect it with dis:

import dis

def add(a, b):
    return a + b

dis.dis(add)
# Abridged output on CPython 3.12 (opcode names vary between versions):
# RESUME          0
# LOAD_FAST       0 (a)
# LOAD_FAST       1 (b)
# BINARY_OP       0 (+)
# RETURN_VALUE

These are stack-machine instructions interpreted by the CPython VM, not native machine code. Dispatching each instruction at run time adds overhead, which is one reason Python is slower than C for tight loops yet fast enough for most real work.
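To work with the instructions programmatically instead of reading a printed listing, dis.get_instructions yields one object per instruction. A small sketch follows. Opcode names differ between CPython versions, so it prints whatever the running interpreter produces rather than assuming a fixed list.

```python
import dis

def add(a, b):
    return a + b

# The raw bytecode is a plain bytes object stored on the function's code object
raw = add.__code__.co_code
print(type(raw).__name__, len(raw))

# Decode the bytes into named instructions
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```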

What is inside a .pyc file

A .pyc file is a 16-byte binary header followed by the marshalled code object (this layout applies to Python 3.7 and later):

Bytes 0–3   : magic number  (changes whenever a CPython release changes the bytecode format)
Bytes 4–7   : bit field     (0 = timestamp-based; bit 0 set = hash-based, bit 1 = check the source hash on import)
Bytes 8–15  : source mtime (4 bytes) + source size (4 bytes), or an 8-byte source hash if hash-based
Bytes 16+   : marshalled code object

The magic number prevents a Python 3.11 interpreter from loading a 3.12 .pyc by mistake.
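The header can be read back with struct and marshal. The sketch below compiles a throwaway module (demo.py is a hypothetical name) with timestamp-based invalidation requested explicitly, then splits the resulting file into the fields described above.

```python
import importlib.util
import marshal
import py_compile
import struct
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.py"           # hypothetical module
    src.write_bytes(b"x = 1\n")           # 6 bytes of source
    pyc_path = py_compile.compile(
        str(src),
        cfile=str(Path(tmp) / "demo.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    data = Path(pyc_path).read_bytes()

magic = data[:4]                                  # bytes 0-3
(flags,) = struct.unpack("<I", data[4:8])         # bytes 4-7
mtime, size = struct.unpack("<II", data[8:16])    # bytes 8-15
code = marshal.loads(data[16:])                   # bytes 16+

print(magic == importlib.util.MAGIC_NUMBER)  # True: written by this interpreter
print(flags)                                 # 0: timestamp-based
print(size)                                  # 6: length of the source in bytes
print(code.co_names)                         # names used by the module body
```

marshal.loads only understands data written by a compatible interpreter, which is why the magic number is compared before the payload is trusted.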

Invalidation

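By default CPython regenerates the .pyc when the source file’s mtime or size no longer matches the values stored in the header. You can opt into content-hash-based invalidation by generating hash-based .pyc files, for example with python -m compileall --invalidation-mode checked-hash or by passing invalidation_mode to py_compile.compile. The interpreter option --check-hash-based-pycs (default, always or never) then controls whether hash-based files are validated against the source at import time; it does not create them.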
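As a rough illustration of hash-based invalidation, the sketch below writes a checked-hash .pyc for a throwaway module (demo.py is a hypothetical name) and confirms that the header stores the hash importlib computes for the source bytes.

```python
import importlib.util
import py_compile
import struct
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.py"           # hypothetical module
    src.write_bytes(b"x = 1\n")
    pyc_path = py_compile.compile(
        str(src),
        cfile=str(Path(tmp) / "demo.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    header = Path(pyc_path).read_bytes()[:16]
    expected_hash = importlib.util.source_hash(src.read_bytes())

(flags,) = struct.unpack("<I", header[4:8])
is_hash_based = bool(flags & 0b01)   # bit 0: hash-based pyc
check_source = bool(flags & 0b10)    # bit 1: validate against the source on import
stored_hash = header[8:16]           # 8-byte hash replaces mtime + size

print(is_hash_based, check_source)
print(stored_hash == expected_hash)
```

Hash-based files are mainly useful when timestamps are unreliable, such as in reproducible builds, at the cost of hashing the source on import when the check is enabled.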

Deployment without source

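You can ship only .pyc files and the interpreter will run them, provided each file sits where its .py source would have been (module.pyc in the package directory, not inside __pycache__) and was compiled by the same Python version that will run it. Pre-compile a whole directory with the -b flag, which writes the .pyc files to those legacy locations, then remove the .py files from the deployed copy:

python -m compileall -b src/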

This is not a security measure — bytecode is trivially decompiled — but it does prevent accidental edits on production hosts.
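A sourceless import can be tried end to end in a temporary directory. This sketch uses compileall.compile_dir with legacy=True, the programmatic counterpart of the -b flag; shipped_mod is a hypothetical module name.

```python
import compileall
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "shipped_mod.py"    # hypothetical module
    src.write_text("ANSWER = 42\n")

    # legacy=True writes shipped_mod.pyc next to the source, not into __pycache__
    compileall.compile_dir(tmp, legacy=True, quiet=1)
    src.unlink()                          # keep only the .pyc, as a deployment would

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module("shipped_mod")
    finally:
        sys.path.remove(tmp)

print(mod.ANSWER)    # 42
print(mod.__file__)  # path ending in shipped_mod.pyc
```

Because the magic number is checked on load, a sourceless .pyc built with one CPython release will not import under another, so the build and runtime interpreters need to match.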
