Large Language Models, Playable

EP 01

Models don't read words — they read tokens

Before a model sees your text, it gets chopped into tokens — frequent chunks that may be whole words, pieces of words, or punctuation. Type anything below and watch it happen. Notice: common words survive whole, rare words get split.

Simplified subword tokenizer for illustration. Real tokenizers (byte-pair encoding, or BPE, is a common choice) learn their chunks from data, but they split text in the same spirit.

Why it matters: models are priced and limited per token, and odd splits sometimes confuse them. “Unbelievable” costing 3 tokens while “cat” costs 1 is why long, rare words eat your context window faster.
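To make the splitting concrete, here is a minimal sketch of a greedy longest-match subword tokenizer. It is not the code behind the demo and it is not BPE. The vocabulary `VOCAB` and the function name `tokenize` are made up for illustration, and a real tokenizer would learn a far larger vocabulary from data.

```python
# Hypothetical toy vocabulary, chosen by hand for this example.
# A real tokenizer learns tens of thousands of chunks from data.
VOCAB = {"cat", "the", "un", "believ", "able", "token", "s"}


def tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split a word greedily into the longest known chunks."""
    word = word.lower()
    tokens: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate chunk starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known chunk starts here: emit a single character
            # so the loop always makes progress.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("cat"))           # one token
print(tokenize("Unbelievable"))  # three tokens with this toy vocabulary
```

With this toy vocabulary, "cat" stays whole, while "Unbelievable" splits into "un", "believ", and "able". A word with no known chunks falls all the way back to single characters.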

EP 02

Watch it guess — a tiny LLM living in this page

This is a real (tiny!) language model trained right here in your browser on a few paragraphs of text. At every step it looks at the most recent words, computes a probability for each possible next word, and samples one. The bars are its actual internal probabilities. GPT-style models do the same thing over tokens rather than whole words, with 100,000+ options instead of a handful.
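The loop described above (look at recent words, compute probabilities, sample one) can be sketched with a bigram counter. This is an assumption about the simplest model that fits the description, not the page's actual code. `CORPUS`, `train_bigram`, `next_word_probs`, and `sample_next` are hypothetical names.

```python
import random
from collections import Counter, defaultdict

# Hypothetical training text; the demo uses its own paragraphs.
CORPUS = "the cat sat on the mat the cat ate the fish"


def train_bigram(text: str) -> dict[str, Counter]:
    """Count how often each word follows each other word."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts: dict[str, Counter], prev: str) -> dict[str, float]:
    """Turn follow-up counts for `prev` into probabilities."""
    options = counts.get(prev, Counter())
    total = sum(options.values())
    return {w: c / total for w, c in options.items()} if total else {}


def sample_next(counts: dict[str, Counter], prev: str, rng: random.Random):
    """Sample one next word in proportion to its probability."""
    probs = next_word_probs(counts, prev)
    if not probs:
        return None  # `prev` was never followed by anything in training
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


model = train_bigram(CORPUS)
print(next_word_probs(model, "the"))  # the "bars" for the word after "the"
print(sample_next(model, "the", random.Random(0)))
```

In this corpus "the" is followed by "cat" twice and by "mat" and "fish" once each, so the bars after "the" are 0.5, 0.25, and 0.25.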

Temperature 0.8 — balanced: mostly picks likely words, occasionally surprises.

Turn the knob: at temperature 0 it always takes the top bar (deterministic, repetitive). Near 2 it treats bad options almost like good ones (creative, then incoherent). That one slider explains a lot of the difference between a boring assistant and an incoherent one.
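One common way to implement the knob is to reshape the probabilities before sampling, as in the sketch below. The function name `apply_temperature` and the example numbers are hypothetical. Treating temperature 0 as "always take the top option" is the usual convention rather than something the formula itself produces, since dividing by zero is undefined.

```python
def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Sharpen (T < 1) or flatten (T > 1) a probability distribution."""
    if temperature <= 0:
        # Convention: temperature 0 means greedy choice of the top option.
        top = max(probs, key=probs.get)
        return {w: (1.0 if w == top else 0.0) for w in probs}
    # Raising p to the power 1/T and renormalizing is equivalent to
    # softmax(logits / T) when p = softmax(logits).
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}


# Hypothetical next-word probabilities.
bars = {"cat": 0.5, "mat": 0.25, "fish": 0.25}
for t in (0, 0.5, 1.0, 2.0):
    print(t, apply_temperature(bars, t))
```

With these numbers, the top option "cat" gets all the probability at temperature 0, about 0.67 at 0.5, exactly 0.5 at 1.0, and about 0.41 at 2.0, which is the flattening the slider shows.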

EP 03

Why context is everything

Same machine, one change: how many recent words it's allowed to see before guessing. Watch the predictions for the sentence below sharpen as you give the model more memory. This is the intuition behind “context windows” — and why models with amnesia ramble.

The pattern: blind → grammar soup. One word → plausible phrases. Two words → it locks onto the sentence. Real LLMs push this from 2 words to hundreds of thousands of tokens. That leap, plus scale, is a large part of the revolution.
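The same effect can be reproduced with a counting model whose context size is a parameter. This is a sketch under assumptions: the corpus and the names `WORDS`, `train_ngram`, and `predict` are hypothetical, and a real LLM uses a neural network rather than lookup tables.

```python
from collections import Counter, defaultdict

# Hypothetical training text.
WORDS = ("the cat sat on the mat the dog sat on the rug "
         "the cat ate the fish").split()


def train_ngram(words: list[str], context_size: int) -> dict[tuple, Counter]:
    """Count which word follows each run of `context_size` words."""
    counts: dict[tuple, Counter] = defaultdict(Counter)
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        counts[context][words[i]] += 1
    return counts


def predict(counts: dict[tuple, Counter], context_size: int,
            recent: list[str]) -> dict[str, float]:
    """Next-word probabilities given only the last `context_size` words."""
    # Guard the zero case: recent[-0:] would return the whole list.
    context = tuple(recent[-context_size:]) if context_size else ()
    options = counts.get(context, Counter())
    total = sum(options.values())
    return {w: c / total for w, c in options.items()} if total else {}


recent = ["cat", "ate", "the"]
for n in (0, 1, 2):
    probs = predict(train_ngram(WORDS, n), n, recent)
    print(n, len(probs), "candidates:", probs)
```

With no context the model spreads its guess over all 9 distinct words. Seeing only "the" narrows it to 5 candidates. Seeing "ate the" leaves a single candidate, "fish".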