How LLMs Actually Work
A mental model for why AI gives better answers when you communicate more clearly
Large Language Models (LLMs) like Claude, ChatGPT, and Gemini are trained to predict the most likely next word (more precisely, the next token) given all the words before it. That sounds simple, but the implication is profound: the model produces the most probable response given your prompt, not necessarily the correct or most useful one.
The autocomplete analogy
An LLM is incredibly sophisticated autocomplete
Your phone's autocomplete predicts your next word from your history and common phrases. LLMs do the same thing at a far larger scale: trained on hundreds of billions of words or more, much of it from the internet, they produce the most statistically likely continuation of your text. This is why they sound fluent and confident even when wrong: fluent, confident text is common on the internet.
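To make the analogy concrete, here is a minimal sketch of autocomplete built from word-pair counts. It is a toy, not how an LLM is implemented: the tiny corpus and the function names (`train_bigrams`, `predict_next`, `complete`) are invented for illustration, and a real model uses a neural network over tokens rather than a lookup table.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def complete(follows, prompt, max_words=5):
    """Extend the prompt one word at a time, always taking the likeliest next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# Hypothetical training text, chosen only to keep the example small.
corpus = "the cat sat on the mat . the cat sat on the sofa . the dog ate the bone ."
model = train_bigrams(corpus)

print(predict_next(model, "cat"))              # sat
print(complete(model, "the dog", max_words=3)) # the dog ate the cat
```

The second output reads fluently, yet the training text never said the dog ate the cat. The toy model simply picked the most common word after "the", which is the same "most probable, not necessarily correct" behavior described above, at a much smaller scale.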
What this means for prompting
- Context shapes the output: The model continues your text, so setting the right context tells it what kind of text to produce
- Vague prompts get vague answers: "Write a summary" could mean 3 sentences or 3 pages, so the model guesses from context
- Models do not have opinions: Asking "what should I do?" gets you the most common advice from training data, not independent reasoning
- The model has no memory between sessions: Every new conversation starts blank, and the model only knows what is in the current context window
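One way to act on these points is to state the context explicitly instead of leaving the model to guess. The sketch below is only an illustration: `build_prompt` and its field names (audience, length, format, background) are hypothetical, not part of any vendor's API, and the resulting text would be pasted or sent as an ordinary prompt.

```python
def build_prompt(task, audience=None, length=None, fmt=None, background=None):
    """Assemble a prompt that states the task plus whatever context is supplied."""
    parts = []
    if background:
        # A new conversation starts blank, so restate anything the model must know.
        parts.append(f"Background: {background}")
    parts.append(f"Task: {task}")
    if audience:
        parts.append(f"Audience: {audience}")
    if length:
        parts.append(f"Length: {length}")
    if fmt:
        parts.append(f"Format: {fmt}")
    return "\n".join(parts)

# Vague: the model has to guess the length, audience, and format.
vague = build_prompt("Write a summary")

# Specific: the same request with the guesswork removed.
specific = build_prompt(
    "Summarize the incident report pasted below",
    audience="engineering managers who did not attend the review",
    length="3 sentences",
    fmt="plain prose, no bullet points",
    background="This is a new conversation; the full report text follows.",
)

print(vague)
print(specific)
```

The vague version is a single line, while the specific version spells out what kind of text to continue, which is the lever the list above describes.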
Tokens: the currency of context
Models do not process words; they process tokens. For English text, a token is roughly 3-4 characters, or about 0.75 words. A typical context window of 100,000 tokens therefore holds about 75,000 words, roughly the length of a novel. Claude 3.7 Sonnet has a 200,000-token context window. This limit is why models sometimes lose track of long documents: the model may not be able to hold the full document and your question at the same time.
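The arithmetic above can be sketched as a rough size check. This is an estimate only: it uses the rule-of-thumb ratios from the text instead of a real tokenizer, actual counts vary by model and language, and the `reserve_for_answer` allowance is an assumption added here for illustration.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb: upper end of the 3-4 character range
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text

def estimate_tokens(text):
    """Approximate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def tokens_to_words(tokens):
    """Approximate how many English words fit in a given number of tokens."""
    return int(tokens * WORDS_PER_TOKEN)

def fits_in_context(document, question, context_window=200_000,
                    reserve_for_answer=4_000):
    """Check whether document + question + an assumed answer allowance fit."""
    needed = (estimate_tokens(document)
              + estimate_tokens(question)
              + reserve_for_answer)
    return needed <= context_window

print(tokens_to_words(100_000))   # 75000
print(tokens_to_words(200_000))   # 150000

question = "What is the main finding?"
short_doc = "a" * 400_000      # about 100,000 estimated tokens
long_doc = "a" * 1_000_000     # about 250,000 estimated tokens
print(fits_in_context(short_doc, question))  # True
print(fits_in_context(long_doc, question))   # False
```

For an exact count, use the tokenizer or token-counting tool published for the specific model instead of this character-based estimate.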