datarekha
LLMs June 8, 2026

Beyond next-token: world models and the next paradigm

World models predict the next state of the world, not the next token — making them simulators agents can plan inside. The two camps racing past LLMs in 2026.

8 min read · by datarekha · ai, world-models, agents, embodied-ai, prediction

Large language models are, at heart, extraordinary next-token predictors. Predicting the next word across enough text turns out to require a startling amount of competence, but the result is still, fundamentally, a model of language. A growing camp of researchers argues that the path to the next leap is a model not of words, but of the world itself.

What a world model actually predicts

The difference is in what the model is trained to anticipate. An LLM, given “the cat sat on the,” predicts the next token: “mat.” A world model, given a state of the world and an action, predicts the next state of the world: where the ball rolls if you push it, what the room looks like after you take a step, what happens next if a car brakes. World models are learned representations and simulators that maintain state, predict dynamics, and support counterfactual reasoning for planning and control.
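To make the contrast concrete, here is a minimal Python sketch of the two interfaces. Everything in it is an illustrative assumption rather than any real system: `predict_next_token` is a lookup table standing in for an LLM, and `predict_next_state` uses hand-written ball physics (with made-up `dt` and `friction` values) where a real world model would use learned dynamics.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an LLM: maps a token context to the next token.
NEXT_TOKEN = {("the", "cat", "sat", "on", "the"): "mat"}


def predict_next_token(context: tuple[str, ...]) -> str:
    """Language-model interface: context in, next token out."""
    return NEXT_TOKEN.get(context, "<unk>")


@dataclass(frozen=True)
class BallState:
    position: float  # metres along a line
    velocity: float  # metres per second


def predict_next_state(
    state: BallState, push: float, dt: float = 0.1, friction: float = 0.5
) -> BallState:
    """World-model interface: (state, action) in, next state out.

    The dynamics here are hand-coded for illustration; a learned world
    model would replace this body with a trained network.
    """
    velocity = state.velocity + push * dt - friction * state.velocity * dt
    position = state.position + velocity * dt
    return BallState(position, velocity)


if __name__ == "__main__":
    print(predict_next_token(("the", "cat", "sat", "on", "the")))

    # Counterfactuals: the same starting state under different actions.
    start = BallState(position=0.0, velocity=0.0)
    for push in (0.0, 1.0, 5.0):
        print(push, predict_next_state(start, push))
```

The key difference is the signature: the world model takes an action as input, so the same state can be queried under several candidate actions before any of them is taken.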

That last part is the prize. If a model can simulate “what happens if I do X,” an agent can plan by imagining consequences before acting — which is exactly what you need for robotics, autonomous driving, and any agent that has to act in a physical or interactive environment, not just chat about one.
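Planning by imagination can be sketched in a few lines. The example below is a toy under stated assumptions, not how any production system works: `step` is a hand-written one-dimensional dynamics function standing in for a learned world model, the action set and cost function are invented for illustration, and the planner is plain exhaustive search over short action sequences, in the style of model-predictive control.

```python
from itertools import product

# Hypothetical action set: push left, do nothing, push right.
ACTIONS = (-1.0, 0.0, 1.0)


def step(state, action, dt=0.5):
    """Stand-in world model: predicts the next (position, velocity)."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return (pos, vel)


def imagine(state, actions):
    """Roll a sequence of actions forward inside the model only."""
    for action in actions:
        state = step(state, action)
    return state


def plan(state, goal, horizon=4):
    """Try every action sequence in imagination and return the first
    action of the sequence that ends closest to the goal, at rest."""

    def cost(sequence):
        pos, vel = imagine(state, sequence)
        return abs(pos - goal) + abs(vel)

    best = min(product(ACTIONS, repeat=horizon), key=cost)
    return best[0]


if __name__ == "__main__":
    state, goal = (0.0, 0.0), 2.0
    # Closed loop: imagine, act once in the "real" world, then replan.
    for t in range(8):
        action = plan(state, goal)
        state = step(state, action)  # here the real world equals the model
        print(t, action, state)
```

Nothing is executed in the environment until `plan` has compared the imagined outcomes, which is the property that matters for robots and vehicles. Exhaustive search grows exponentially with the horizon, so practical planners sample or optimize action sequences instead.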

[Figure: an LLM predicts the next token (“the cat sat on the” → “mat”); a world model takes state t plus an action (push), simulates the physics, and predicts state t+1.]

Two philosophies fighting it out

There is a genuine intellectual split over how to build one, and it is one of the more interesting debates in AI right now. One path compresses the world to understand it; the other renders the world to predict it.

Why this could matter more than another LLM

The honest framing is that world models are not a replacement for language models so much as a different substrate. Language is humanity’s compressed knowledge; the physical world is the thing language is about. An agent that only ever learned from text knows what people say happens when you drop a glass; an agent with a world model can simulate it. For anything embodied — robots, self-driving, agents acting in real or virtual environments — that difference is the whole game.

It is too early to crown a winner, and plenty of the hype will not survive contact with reality. But the underlying idea is one of the most exciting in AI: moving from models that predict our words to models that predict our world — and can therefore imagine, plan, and act within it. If the transformer was the architecture that defined the language era, the race now is to find the one that defines whatever comes after it.
