Lesson 01 / 8 · 7 min · Free

How LLMs Actually Work

A mental model for why AI gives better answers when you communicate more clearly

Written by the RadarTrek editorial team · June 2026

Large Language Models (LLMs) like Claude, ChatGPT, and Gemini are trained to predict the most likely next word (strictly, the next token: a word or word fragment) given all the text before it. That sounds simple, but the implication is profound: the model produces the most probable response given your prompt, not necessarily the correct or most useful one.

The autocomplete analogy

An LLM is incredibly sophisticated autocomplete

Your phone's autocomplete predicts your next word from your typing history and common phrases. LLMs do the same thing at a vastly larger scale: they are trained on hundreds of billions to trillions of words, much of it from the internet, and they produce a statistically likely continuation of your text. This is why they sound fluent and confident even when wrong: fluent, confident text is common in their training data.
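To make the analogy concrete, here is a minimal sketch of next-word prediction using simple word-pair counts. It is a toy, not how a real LLM is built: the function names `train_bigrams` and `predict_next` and the tiny corpus are made up for illustration, and real models use neural networks over tokens rather than a lookup table.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text: "blue" follows "is" twice, "green" once.
corpus = "the sky is blue . the sky is blue . the sky is green ."
model = train_bigrams(corpus)

print(predict_next(model, "is"))  # blue
```

The toy model answers "blue" because that is the most common continuation in its training text, not because it checked the sky. That is the same gap between "most probable" and "correct" described above, at a much smaller scale.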

What this means for prompting

Context shapes the output: the model continues your text, so setting the right context tells it what kind of text to produce.
Vague prompts get vague answers: "Write a summary" could mean 3 sentences or 3 pages, and without specifics the model has to guess.
Models do not have opinions: asking "what should I do?" gets you the most common advice from the training data, not independent reasoning.
The model has no memory between sessions: every new conversation starts blank, and the model only knows what is in the current context window.
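The last two points can be sketched in a few lines. The `Conversation` class below is hypothetical and does not correspond to any vendor's API; it only illustrates the idea that the text in the current conversation is all the model has to go on.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Conversation:
    """Hypothetical container: everything the model 'knows' is in `messages`."""
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        """Append one turn (who spoke, what they said) to the history."""
        self.messages.append((role, text))

    def render(self) -> str:
        """Flatten the history into the text the model is asked to continue."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


# First session: the follow-up works because the earlier turns are still present.
first = Conversation()
first.add("user", "Summarize the attached report in 3 sentences for a non-technical manager.")
first.add("assistant", "(model reply)")
first.add("user", "Now shorten it to one sentence.")

# Second session: starts blank, so "it" refers to nothing the model can see.
second = Conversation()
second.add("user", "Now shorten it to one sentence.")

print(first.render())
print("---")
print(second.render())
```

In the first session, the opening request also states the length and the audience, which is the kind of context that turns a vague prompt into a specific one. In the second session, the same follow-up arrives with no history, so the model can only guess what "it" means.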

Tokens — the currency of context

Models do not process words; they process tokens. In English, a token is roughly 3-4 characters, or about 0.75 words. A context window of 100,000 tokens therefore holds about 75,000 words, roughly a novel. Claude 3.7 Sonnet has a 200,000-token context window. This limit is why models sometimes lose track of long documents: the model may not be able to hold the full document and your question at the same time.
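The rule of thumb above is easy to turn into a rough estimator. This is a sketch under stated assumptions, not a real tokenizer: `estimate_tokens` and `fits_in_context` are hypothetical helpers, the ratios are the approximate English figures from the text, and the `reserved_for_answer` budget is an arbitrary illustrative number. Actual token counts vary by model and language.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text (English)
CHARS_PER_TOKEN = 4     # upper end of the 3-4 characters-per-token estimate


def estimate_tokens(text: str) -> int:
    """Rough token estimate; takes the larger of the two heuristics to be conservative."""
    by_words = len(text.split()) / WORDS_PER_TOKEN
    by_chars = len(text) / CHARS_PER_TOKEN
    return round(max(by_words, by_chars))


def fits_in_context(document: str, question: str,
                    context_window: int = 200_000,
                    reserved_for_answer: int = 4_000) -> bool:
    """Check whether document + question (+ an assumed answer budget) fit the window."""
    needed = estimate_tokens(document) + estimate_tokens(question) + reserved_for_answer
    return needed <= context_window


# A stand-in for a novel-length document: 75,000 short words.
novel = "word " * 75_000
question = "What is the main point?"

print(estimate_tokens(novel))                                    # 100000
print(fits_in_context(novel, question))                          # True
print(fits_in_context(novel, question, context_window=100_000))  # False
```

The last line shows the failure case the text describes: a document that fills a 100,000-token window on its own leaves no room for the question, so something has to be cut or summarized first.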

Anatomy of a Good Prompt