datarekha
NLP & LLMs Easy Asked at OpenAIAsked at AnthropicAsked at Google

What are tokens in an LLM and why is API pricing per token rather than per word or character?

The short answer

A token is the smallest unit a language model processes — typically a word, sub-word fragment, or punctuation mark produced by a byte-pair encoding (BPE) or similar algorithm. Pricing is per token because each token requires one forward-pass position in the attention matrix, directly driving compute and memory cost regardless of whether it maps to a full word or a single letter.

How to think about it

Text is continuous; neural networks need discrete, fixed-size inputs. Tokenization bridges the gap by splitting text into a vocabulary of sub-word units and mapping each to an integer index.

How tokenization works (BPE)

Byte-pair encoding (BPE) and its variants (WordPiece, SentencePiece, tiktoken) start from individual characters and iteratively merge the most frequent adjacent pair into a single token. The vocabulary is trained on a large corpus. Common English words like “the” or “cat” become single tokens; rarer or composite words are split — “unbelievable” might encode as [“un”, “believ”, “able”].

Rules of thumb

  • ~1 token ≈ 4 characters of English prose.
  • 1000 tokens ≈ 750 words.
  • Code and non-English text are generally less efficient (more tokens per word) because they were underrepresented in vocabulary training.
  • Numbers are tokenized digit-by-digit in many tokenizers, making arithmetic expensive.

Why pricing is per token, not per word

Every token occupies one position in the attention computation. A sequence of N tokens requires N² attention operations (standard attention) and N KV-cache entries. The model’s cost — measured in FLOPs and memory bandwidth — scales with token count, not word count. Charging per token aligns the price with the actual resource usage on the provider’s hardware.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Retrieval-augmented generation reduces hallucination.")
print(len(tokens))   # 8
print(tokens)        # [3880, 30079, 12, 2, 51214, 9659, 17990, 13]

Practical implications

ActionToken impact
Long system promptsCharged on every API call, not once
Verbose few-shot examplesMultiply cost by number of examples
Whitespace paddingEvery space may be its own token
JSON outputBrackets and quotes add tokens; compressed format saves cost

Keep practising

All NLP & LLMs questions

Explore further

Skip to content