What RAG Is — and Why Fine-Tuning Is Usually the Wrong Answer
The mental model that clarifies when to use RAG, fine-tuning, or neither
Two questions come up constantly when builders add AI to their products: "Should I fine-tune the model on my data?" and "Should I use RAG?" They are solving different problems, and confusing them is expensive.
Fine-tuning vs RAG — the real distinction
- Use RAG when — Your data changes frequently, you want factual grounding from specific documents, or you need the model to cite its sources
- Use fine-tuning when — You need a specific response format or style the base model will not produce, or you need better performance on a narrow, stable task type
- Use neither when — A well-engineered system prompt with examples solves the problem — this is the case far more often than people expect
Fine-tuning changes how the model thinks. RAG changes what it can read.
Imagine hiring an expert consultant. Fine-tuning is like sending them to a 6-month retraining programme so they have new skills baked in. RAG is like giving them a briefing pack of relevant documents before each meeting. The briefing pack is faster, cheaper, and can be updated daily. Retraining only makes sense when the task itself requires a different skill set.
What RAG actually does
- Ingest phase (offline) — Split documents into chunks → embed each chunk as a vector → store in a vector database
- Query phase (real-time) — Embed the user's question → find the most similar stored chunks → include them in the Claude prompt
- Generation phase — Claude reads the retrieved chunks and answers the question based on your actual content — not training data
When RAG fails
- Retrieval misses the relevant chunk — The answer exists in your docs but the similarity search does not surface it — usually a chunking or embedding model problem
- The retrieved chunk is out of context — A chunk that makes sense in isolation fails when it is missing the surrounding document context
- The model ignores the retrieved context — This happens when the system prompt is not explicit enough about using the provided context
RAG does not prevent all hallucinations
If the answer is not in your knowledge base, Claude may still try to answer from general training data. Always instruct it explicitly: "If the answer is not in the provided context, say you do not have that information."
Try this
Write out three questions a user of your product would ask that require specific knowledge only you have (not public knowledge). These are the exact queries your RAG system needs to answer correctly. Keep this list — lesson 7 uses it to evaluate your retrieval quality.