Imagination — rehearsing futures that never happen
This agent (●) wants the goal (★). Before moving a muscle, it runs candidate futures through its internal model — the faint ghost paths are literally its imagination. Click the grid to add/remove walls and watch it re-dream instantly. Then let it act.
Ghost trails = imagined rollouts (brighter = judged better by the model). Solid trail = the one action sequence that actually gets executed.
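To make the idea of imagined rollouts concrete, here is a minimal sketch of one way an agent could "dream" before acting. It samples random action sequences, simulates each one inside its model, scores them, and keeps the best. The grid layout, the scoring rule, and the names `model_step`, `imagine`, `score`, and `plan` are all assumptions made for illustration; the demo's actual planner may work differently.

```python
import random

# Hypothetical 5x5 grid (0 = free, 1 = wall); not the demo's layout.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
START, GOAL = (0, 0), (4, 4)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def model_step(pos, move):
    """Internal model: predict the next cell; walls and edges block movement."""
    r, c = pos[0] + move[0], pos[1] + move[1]
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == 0:
        return (r, c)
    return pos


def imagine(start, horizon, rng):
    """One imagined rollout: a random action sequence simulated in the model."""
    pos, path = start, [start]
    for _ in range(horizon):
        pos = model_step(pos, rng.choice(MOVES))
        path.append(pos)
        if pos == GOAL:
            break
    return path


def score(path):
    """Higher is better: end closer to the goal, with shorter paths as a tie-break."""
    end = path[-1]
    dist = abs(end[0] - GOAL[0]) + abs(end[1] - GOAL[1])
    return -(dist * 100 + len(path))


def plan(start, n_rollouts=500, horizon=20, seed=0):
    """Imagine many futures, then return the best one plus all the 'ghosts'."""
    rng = random.Random(seed)
    rollouts = [imagine(start, horizon, rng) for _ in range(n_rollouts)]
    return max(rollouts, key=score), rollouts


best, ghosts = plan(START)
print(len(ghosts), "imagined rollouts;", len(best) - 1, "moves in the chosen one")
```

In this sketch every entry in `ghosts` corresponds to a ghost trail, its `score` plays the role of brightness, and `best` is the one sequence that would be executed. Random sampling is only one option; a real planner might search systematically instead.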
Surprise — when reality disagrees with the dream
The agent's model predicts this ball's flight, drawn as the dotted line. Now sabotage it: switch on a hidden wind the model doesn't know about. Prediction and reality split apart, and the gap between them, the prediction error, is shown on the red meter. That error signal is precisely what the model learns from.
After a windy flight, click “Update model” — the model absorbs the error, and its next prediction accounts for wind.
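The update step can be sketched in a few lines. The version below assumes simple projectile physics in which the wind acts as a constant horizontal acceleration, and it assumes the model learns from the error in the landing position alone. The class `BallModel`, the function `landing_x`, and all numeric values are hypothetical and are not taken from the demo.

```python
G = 9.81  # gravitational acceleration in m/s^2


def landing_x(vx, vy, wind_accel):
    """Landing position for a ball launched from the ground with velocity
    (vx, vy), under gravity and a constant horizontal wind acceleration."""
    t_flight = 2.0 * vy / G
    return vx * t_flight + 0.5 * wind_accel * t_flight ** 2


class BallModel:
    """The agent's internal model; it starts out believing there is no wind."""

    def __init__(self):
        self.wind_estimate = 0.0

    def predict(self, vx, vy):
        return landing_x(vx, vy, self.wind_estimate)

    def update(self, vx, vy, observed_x, learning_rate=1.0):
        """Absorb the prediction error into the wind estimate and return it."""
        error = observed_x - self.predict(vx, vy)
        t_flight = 2.0 * vy / G
        # error = 0.5 * (true_wind - wind_estimate) * t_flight^2, solved for the gap
        self.wind_estimate += learning_rate * 2.0 * error / t_flight ** 2
        return error


HIDDEN_WIND = -1.5  # assumed value; reality has it, the model does not know it
model = BallModel()
observed = landing_x(8.0, 10.0, HIDDEN_WIND)       # what actually happened
error_before = model.update(8.0, 10.0, observed)   # the "red meter" reading
error_after = observed - model.predict(8.0, 10.0)  # gap after the update
print(f"error before: {error_before:.3f} m, after: {error_after:.3f} m")
```

With `learning_rate=1.0` a single update absorbs the whole error, which mirrors the one-click behavior described above. A smaller rate would close the gap gradually over several flights.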
The race — trial-and-error vs. thinking ahead
Two agents, identical maze, same goal. Gray is model-free: it learns only by bumping into things, step after costly step. Red carries a world model: it plans the route internally first, then walks it. Count the steps.
The model-free agent explores at random and remembers only the walls it has hit (a crude Q-learner's childhood). The planner runs breadth-first search inside its model, then executes the route it found.
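The two strategies can be compared in a short sketch. It assumes a small hand-made maze and a planner whose model is a perfect copy of that maze; the layout and the names `bfs_plan` and `explore_randomly` are illustrative, not the demo's own code. The random explorer counts a bump into a wall as a step, since it only discovers the wall by paying for the attempt.

```python
import random
from collections import deque

# Hypothetical maze: S = start, G = goal, # = wall, . = free.
MAZE = [
    "S.#....",
    ".##.##.",
    "...#...",
    ".#...#.",
    "...#..G",
]
ROWS, COLS = len(MAZE), len(MAZE[0])
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def find(ch):
    for r, row in enumerate(MAZE):
        if ch in row:
            return (r, row.index(ch))
    raise ValueError(ch)


START, GOAL = find("S"), find("G")


def is_free(cell):
    r, c = cell
    return 0 <= r < ROWS and 0 <= c < COLS and MAZE[r][c] != "#"


def bfs_plan(start, goal):
    """Planner: breadth-first search inside the model, before taking any step."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk the parent links back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if is_free(nxt) and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None  # goal unreachable


def explore_randomly(start, goal, seed=0, max_steps=100_000):
    """Model-free stand-in: random moves, remembering (cell, move) pairs that hit a wall."""
    rng = random.Random(seed)
    blocked = set()
    pos, steps = start, 0
    while pos != goal and steps < max_steps:
        move = rng.choice([m for m in MOVES if (pos, m) not in blocked])
        nxt = (pos[0] + move[0], pos[1] + move[1])
        steps += 1  # a bump costs a step too
        if is_free(nxt):
            pos = nxt
        else:
            blocked.add((pos, move))
    return steps


planner_steps = len(bfs_plan(START, GOAL)) - 1
explorer_steps = explore_randomly(START, GOAL)
print("planner:", planner_steps, "steps; random explorer:", explorer_steps, "steps")
```

Because breadth-first search returns a shortest path, the explorer can never beat the planner on step count in this setup; the interesting quantity is how much worse it does. The comparison is only fair to the extent that the planner's model matches the real maze.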