Models don't read words — they read tokens
Before a model sees your text, it gets chopped into tokens — frequent chunks that may be whole words, pieces of words, or punctuation. Type anything below and watch it happen. Notice: common words survive whole, rare words get split.
Simplified subword tokenizer for illustration — real ones (BPE) learn their chunks from data, but split text the same way in spirit.
Watch it guess — a tiny LLM living in this page
This is a real (tiny!) language model trained right here in your browser on a few paragraphs of text. At every step it looks at recent words, computes a probability for each possible next word, and samples one. The bars are its actual internal probabilities — GPT does the same thing with 100,000+ options instead of a handful.
Temperature 0.8 — balanced: mostly picks likely words, occasionally surprises.
Why context is everything
Same machine, one change: how many recent words it's allowed to see before guessing. Watch the predictions for the sentence below sharpen as you give the model more memory. This is the intuition behind “context windows” — and why models with amnesia ramble.