NLP & LLMs Easy Asked at OpenAIAsked at AnthropicAsked at Google

What are tokens in an LLM and why is API pricing per token rather than per word or character?

For AI / LLM Engineer ML Engineer Data Scientist

The short answer

A token is the smallest unit a language model processes — typically a word, sub-word fragment, or punctuation mark produced by a byte-pair encoding (BPE) or similar algorithm. Pricing is per token because each token requires one forward-pass position in the attention matrix, directly driving compute and memory cost regardless of whether it maps to a full word or a single letter.

How to think about it

Text is continuous; neural networks need discrete, fixed-size inputs. Tokenization bridges the gap by splitting text into a vocabulary of sub-word units and mapping each to an integer index.

How tokenization works (BPE)

Byte-pair encoding (BPE) and its variants (WordPiece, SentencePiece, tiktoken) start from individual characters and iteratively merge the most frequent adjacent pair into a single token. The vocabulary is trained on a large corpus. Common English words like “the” or “cat” become single tokens; rarer or composite words are split — “unbelievable” might encode as [“un”, “believ”, “able”].

Rules of thumb

~1 token ≈ 4 characters of English prose.
1000 tokens ≈ 750 words.
Code and non-English text are generally less efficient (more tokens per word) because they were underrepresented in vocabulary training.
Numbers are tokenized digit-by-digit in many tokenizers, making arithmetic expensive.

Why pricing is per token, not per word

Every token occupies one position in the attention computation. A sequence of N tokens requires N² attention operations (standard attention) and N KV-cache entries. The model’s cost — measured in FLOPs and memory bandwidth — scales with token count, not word count. Charging per token aligns the price with the actual resource usage on the provider’s hardware.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Retrieval-augmented generation reduces hallucination.")
print(len(tokens))   # 8
print(tokens)        # [3880, 30079, 12, 2, 51214, 9659, 17990, 13]

Practical implications

Action	Token impact
Long system prompts	Charged on every API call, not once
Verbose few-shot examples	Multiply cost by number of examples
Whitespace padding	Every space may be its own token
JSON output	Brackets and quotes add tokens; compressed format saves cost