Lesson 01 / 8 · 8 min · Free

What RAG Is — and Why Fine-Tuning Is Usually the Wrong Answer

The mental model that clarifies when to use RAG, fine-tuning, or neither

Two questions come up constantly when builders add AI to their products: "Should I fine-tune the model on my data?" and "Should I use RAG?" The two techniques solve different problems, and confusing them is expensive.

Fine-tuning vs RAG — the real distinction

Use RAG when — Your data changes frequently, you want factual grounding from specific documents, or you need the model to cite its sources
Use fine-tuning when — You need a specific response format or style the base model will not produce, or you need better performance on a narrow, stable task type
Use neither when — A well-engineered system prompt with examples solves the problem — this is the case far more often than people expect

💡

Fine-tuning changes how the model thinks. RAG changes what it can read.

Imagine hiring an expert consultant. Fine-tuning is like sending them to a 6-month retraining programme so they have new skills baked in. RAG is like giving them a briefing pack of relevant documents before each meeting. The briefing pack is faster, cheaper, and can be updated daily. Retraining only makes sense when the task itself requires a different skill set.

What RAG actually does

Ingest phase (offline) — Split documents into chunks → embed each chunk as a vector → store in a vector database
Query phase (real-time) — Embed the user's question → find the most similar stored chunks → include them in the Claude prompt
Generation phase — Claude reads the retrieved chunks and answers the question based on your actual content — not training data
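
The three phases above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: a word-count vector stands in for a real embedding model, a plain list stands in for a vector database, and the function names (`embed`, `ingest`, `retrieve`, `build_prompt`) are made up for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(document: str, max_words: int = 40) -> list[str]:
    """Ingest, step 1: split a document into fixed-size word windows."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def ingest(documents: list[str]) -> list[tuple[str, Counter]]:
    """Ingest, steps 2 and 3: embed each chunk and store it in an in-memory list."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Query phase: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Generation phase input: retrieved chunks followed by the user's question."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}"


docs = [
    "A refund is available within 30 days of purchase.",
    "Support staff work Monday to Friday.",
]
index = ingest(docs)
question = "What is the refund window in days?"
prompt = build_prompt(question, retrieve(index, question, k=1))
```

The string in `prompt` is what would be sent to the model. Swapping the toy `embed` for a real embedding model changes the quality of retrieval, not the shape of the pipeline.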

When RAG fails

Retrieval misses the relevant chunk — The answer exists in your docs but the similarity search does not surface it — usually a chunking or embedding model problem
The retrieved chunk is out of context — A chunk that made sense inside its document can lose its meaning, or mislead, once it is retrieved without the surrounding text
The model ignores the retrieved context — This happens when the system prompt is not explicit enough about using the provided context
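
One common way to reduce the out-of-context failure is to overlap neighbouring chunks and prefix each one with its document title. The sketch below assumes word-based splitting, and the function name and default sizes are illustrative rather than recommended values.

```python
def chunk_with_context(title: str, text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows, each prefixed with the document title."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(f"{title}: {' '.join(window)}")
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk, and the title gives the model a hint about where the chunk came from.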

⚠

RAG does not prevent all hallucinations

If the answer is not in your knowledge base, Claude may still try to answer from general training data. Always instruct it explicitly: "If the answer is not in the provided context, say you do not have that information."
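
A minimal sketch of a system prompt that carries this instruction is shown below. The helper name `build_system_prompt` and the `<context>` wrapper are assumptions made for illustration. The model call itself is left out.

```python
def build_system_prompt(chunks: list[str]) -> str:
    """Wrap retrieved chunks in a system prompt that tells the model to stay within them."""
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n"
        "If the answer is not in the provided context, "
        "say you do not have that information.\n\n"
        f"<context>\n{numbered}\n</context>"
    )


system_prompt = build_system_prompt(["Refunds are available within 30 days of purchase."])
```

Numbering the chunks also makes it easier to ask the model to cite which chunk an answer came from.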

🎯

Try this

Write out three questions a user of your product would ask that require specific knowledge only you have (not public knowledge). These are the exact queries your RAG system needs to answer correctly. Keep this list — lesson 7 uses it to evaluate your retrieval quality.
