World Models, Playable

EP 01

Imagination — rehearsing futures that never happen

This agent (●) wants the goal (★). Before moving a muscle, it runs candidate futures through its internal model — the faint ghost paths are literally its imagination. Click the grid to add/remove walls and watch it re-dream instantly. Then let it act.

Ghost trails = imagined rollouts (brighter = judged better by the model). Solid trail = the one action sequence that actually gets executed.
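The rehearse-then-act loop can be sketched as random shooting. The agent samples many action sequences, rolls each one forward inside its model, scores the imagined outcome, and executes only the winner. This is a minimal sketch under stated assumptions, not the demo's actual planner. The grid layout, the scoring rule (arrive early, otherwise end close), and the names `model_step`, `imagine`, `score` and `plan_by_imagination` are all hypothetical. The agent's model is also assumed to be a perfect copy of the grid. Each rollout's trail is one ghost path, and its score is what would set that ghost's brightness.

```python
import random

# Hypothetical 5x6 world: '#' = wall, '.' = free. The agent's internal model
# is assumed to be a perfect copy of this layout.
GRID = [
    "......",
    ".####.",
    "......",
    ".#.##.",
    "......",
]
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def model_step(pos, action):
    """Imagined transition: move unless the model says a wall or the edge blocks it."""
    dr, dc = ACTIONS[action]
    r, c = pos[0] + dr, pos[1] + dc
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
        return (r, c)
    return pos  # blocked: the imagined agent stays put


def imagine(start, plan):
    """Roll a whole action sequence forward inside the model; return its ghost trail."""
    trail = [start]
    for action in plan:
        trail.append(model_step(trail[-1], action))
    return trail


def score(trail, goal):
    """Higher is better: reaching the goal early beats merely ending close to it."""
    for t, pos in enumerate(trail):
        if pos == goal:
            return 1000 - t  # first arrival counts
    r, c = trail[-1]
    return -(abs(r - goal[0]) + abs(c - goal[1]))  # negative Manhattan distance


def plan_by_imagination(start, goal, n_rollouts=200, horizon=12, seed=0):
    """Random shooting: imagine n_rollouts random plans, keep the best-scoring one."""
    rng = random.Random(seed)
    best_plan, best_score = None, float("-inf")
    for _ in range(n_rollouts):
        plan = [rng.choice("UDLR") for _ in range(horizon)]
        s = score(imagine(start, plan), goal)
        if s > best_score:
            best_plan, best_score = plan, s
    return best_plan, best_score


plan, best = plan_by_imagination(start=(0, 0), goal=(4, 5))
print("executed plan:", "".join(plan), "score:", best)
```

No real step is taken until the best plan has been chosen, so every call to `imagine` is free in the sense this episode cares about. Editing a wall in `GRID` and calling the planner again is the analogue of clicking the grid and watching the agent re-dream.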

The point: the agent “experienced” dozens of futures and paid for none of them. Imagination is cheap; reality is expensive. That asymmetry is the entire business case for world models.

EP 02

Surprise — when reality disagrees with the dream

The agent's model predicts this ball's flight — the dotted line. Now sabotage it: switch on a hidden wind the model doesn't know about. Prediction and reality split apart, and the gap between them — prediction error — is the red meter. That error signal is precisely what the model learns from.


After a windy flight, click “Update model” — the model absorbs the error, and its next prediction accounts for wind.

Surprise is the teacher. A world model isn't trained by being told the truth — it's trained by being wrong, measuring exactly how wrong, and adjusting. Babies do this with gravity; robots do it with gradient descent.
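Here is a minimal sketch of that loop. It assumes the model's only unknown is a constant horizontal wind acceleration, which is a hypothetical parameterization; the demo's physics may differ. `flight` predicts the trajectory, and `prediction_error` plays the role of the red meter by taking the mean squared distance between predicted and observed points. `update_wind` runs plain gradient descent on the wind estimate. The launch speeds, the hidden wind of 3.0, the learning rate and all function names are illustrative assumptions.

```python
G = 9.81  # gravity, m/s^2


def flight(v0x, v0y, wind, times):
    """Ball positions over time, launched from the origin.

    Assumption: wind acts as a constant horizontal acceleration.
    """
    return [(v0x * t + 0.5 * wind * t * t, v0y * t - 0.5 * G * t * t) for t in times]


def prediction_error(pred, obs):
    """Mean squared distance between predicted and observed points (the 'red meter')."""
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(pred, obs)) / len(obs)


def update_wind(wind_est, v0x, v0y, times, obs, lr=5.0, steps=50):
    """'Update model': gradient descent on the model's one unknown, its wind estimate."""
    for _ in range(steps):
        pred = flight(v0x, v0y, wind_est, times)
        # Only x depends on wind: d(x_pred)/d(wind) = 0.5 * t^2, so the gradient of
        # mean((x_pred - x_obs)^2) with respect to wind is mean((x_pred - x_obs) * t^2).
        grad = sum((px - ox) * t * t
                   for (px, _), (ox, _), t in zip(pred, obs, times)) / len(times)
        wind_est -= lr * grad
    return wind_est


times = [i / 10 for i in range(11)]  # sample the first second of flight
hidden_wind = 3.0                    # reality's wind, unknown to the model
observed = flight(4.0, 5.0, hidden_wind, times)

belief = 0.0  # model believes: no wind
print("error before:", prediction_error(flight(4.0, 5.0, belief, times), observed))
belief = update_wind(belief, 4.0, 5.0, times, observed)
print("learned wind:", belief)
print("error after:", prediction_error(flight(4.0, 5.0, belief, times), observed))
```

The error here is quadratic in a single parameter, so a few dozen gradient steps recover the hidden wind almost exactly. A learned world model with many parameters follows the same pattern: measure the error, then step downhill.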

EP 03

The race — trial-and-error vs. thinking ahead

Two agents, identical maze, same goal. Gray is model-free: it only learns by bumping into things, step after costly step. Red carries a world model: it plans the route internally first, then walks it. Count the steps.

The model-free agent uses random exploration with wall memory (a crude Q-learner's childhood). The planner runs breadth-first search inside its model, then executes the route it found.
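Both strategies fit in a few lines. This is a sketch under assumptions, not the demo's code. The maze is a hypothetical single-corridor layout, and the planner's model is assumed to be an exact copy of it. The model-free rules are one reading of "random exploration with wall memory": pick a random move not yet known to hit a wall, and count every attempt as a step, bumps included. `bfs_plan`, `model_free_run` and `find` are illustrative names.

```python
import random
from collections import deque

# Hypothetical maze: S = start, G = goal, '#' = wall.
MAZE = [
    "S...#...",
    "###.#.#.",
    "....#.#.",
    ".####.#.",
    "......#G",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def find(ch):
    """Locate a marker character in the maze."""
    for r, row in enumerate(MAZE):
        if ch in row:
            return (r, row.index(ch))
    raise ValueError(ch)


def step(pos, move):
    """One move in the maze; bumping a wall or the edge leaves you where you were."""
    r, c = pos[0] + MOVES[move][0], pos[1] + MOVES[move][1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c)
    return pos


def bfs_plan(start, goal):
    """Red planner: breadth-first search inside the model; returns the cell path."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for move in MOVES:
            nxt = step(cell, move)
            if nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return None


def model_free_run(start, goal, seed=0, max_steps=100_000):
    """Gray agent: random exploration that only remembers walls it has bumped into."""
    rng = random.Random(seed)
    known_walls = set()  # (cell, move) pairs that failed before
    pos, steps = start, 0
    while pos != goal and steps < max_steps:
        options = [m for m in MOVES if (pos, m) not in known_walls]
        move = rng.choice(options)
        steps += 1  # every real attempt costs a step, bumps included
        nxt = step(pos, move)
        if nxt == pos:
            known_walls.add((pos, move))
        pos = nxt
    return steps if pos == goal else None


start, goal = find("S"), find("G")
path = bfs_plan(start, goal)
planner_steps = len(path) - 1
gray_steps = model_free_run(start, goal, seed=1)
print("red (planner):", planner_steps, "steps")
print("gray (model-free):", gray_steps, "steps")
```

The planner always walks the shortest route, 25 steps for this maze, and pays for its search inside the model. The gray agent's count changes with `seed`, but it can never beat 25. Any walk from S to G needs at least that many successful moves, and every bump adds one more.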

Why this matters now: LLM agents that “think step by step” are inching toward this — simulating consequences in text before acting. The bet behind world-model research (LeCun, DeepMind, and others) is that real planning needs a real internal simulator, not just next-word reflexes.