Branch Module · Deep Dive

MCP & RAG

Give your agent a brain and real-world connections. RAG lets it search your documents. MCP lets it use tools. This is how serious AI is built.

75 min · Free · Advanced

Why context windows aren't enough

Every AI model has a context window: a limit on how much text it can hold in "working memory" at once. GPT-4 Turbo can hold roughly 128,000 tokens. Claude can hold 200,000. That sounds huge until you realise a single company's document library can run to millions of tokens. A café with three years of supplier emails can easily have more text than fits in either window.

So the simple question "what did we pay for coffee beans last October?" is unanswerable if you just paste your whole inbox into the AI. The context overflows. The AI can't see it all.

RAG and MCP are two different solutions to this problem — and together they're what separates a toy chatbot from a genuinely useful agent.

The bottleneck

A model with a 200k token context window can hold roughly 150,000 words — about two full novels. Your business data, emails, documents, and records almost certainly exceed this. RAG solves the "what does it know" problem. MCP solves the "what can it do" problem.
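To make the arithmetic concrete, here is a minimal sketch that estimates whether a pile of text fits in a context window. The 0.75 words-per-token ratio is simply derived from the figures above (200k tokens ≈ 150,000 words) and is only a rule of thumb; the inbox numbers are made up for illustration.

```python
# Rule of thumb from the figures above: 200,000 tokens ~ 150,000 words.
# Real tokenizers vary with language and content, so treat this as an estimate.
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens a text of `word_count` words would use."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_window_tokens: int = 200_000) -> bool:
    """Return True if the estimated token count fits in the context window."""
    return estimate_tokens(word_count) <= context_window_tokens


# Hypothetical inbox: 3 years of supplier emails, 20 emails a week, 150 words each.
inbox_words = 3 * 52 * 20 * 150
inbox_tokens = estimate_tokens(inbox_words)

print(f"{inbox_words} words is roughly {inbox_tokens} tokens")
print("Fits in a 200k window:", fits_in_context(inbox_words))
```

Even this modest, invented inbox lands at several times the window size, which is why searching first and reading second matters.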

Retrieval-Augmented Generation

RAG is a technique that gives an AI model access to a large knowledge base — your documents, your data — without needing to fit it all in the context window at once. Instead of loading everything, it searches first, then reads only the relevant parts.

The name describes the process: Retrieval (find the right documents) → Augmented (inject them into the prompt) → Generation (the AI answers using what it found). It's like giving an AI a library card instead of memorising every book.

Your question (user asks) → 🔍 Search (find relevant docs) → 📄 Inject (add to prompt) → 🧠 AI answers (using real data)

The "search" step typically uses vector similarity — your documents are converted into mathematical representations (embeddings) and stored in a vector database. When you ask a question, it's also converted into an embedding, and the database finds the documents whose embeddings are closest to your question. It's semantic search, not keyword search — it finds meaning, not just matching words.
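A small sketch of what "closest embedding" means in practice. It uses cosine similarity, one common way to compare vectors. The three-number vectors here are hand-made stand-ins; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (assumed values, purely for illustration).
stored = {
    "Oat milk is supplied weekly": [0.9, 0.1, 0.0],
    "We open at 7am on weekdays": [0.0, 0.2, 0.9],
    "Banana bread contains eggs": [0.1, 0.9, 0.1],
}

# Pretend this is the embedding of "who supplies our oat milk?"
question_vec = [0.8, 0.2, 0.1]


def closest(query: list[float], docs: dict[str, list[float]]) -> str:
    """Return the stored text whose vector is most similar to the query."""
    return max(docs, key=lambda text: cosine_similarity(query, docs[text]))


best = closest(question_vec, stored)
print(best)
```

A vector database does the same comparison, just over millions of vectors with indexes that avoid checking every one.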

📚 Knowledge base
Any text you want the AI to know: PDFs, emails, notes, menus, product specs, FAQs. The text is split into chunks, which are indexed in a vector database.

🗂️ Vector store
Stores document embeddings for fast semantic search. Common options: Pinecone, Chroma, Supabase pgvector, Qdrant. Many have free tiers.

🔎 Retrieval
The question is embedded and matched against stored chunks. The top N most relevant chunks are pulled out and fed to the model as context.

✍️ Grounded answer
The AI answers using real documents, not guessing from training data. It can cite sources, and hallucination typically drops.

Raglan use case: café knowledge bot

Upload your menu, supplier list, allergen info, and opening hours as text files. A RAG-powered chatbot can now answer "what's gluten-free on the menu?", "who supplies our oat milk?", or "are you open Good Friday?" from your actual documents instead of guessing.

    # Conceptual RAG flow (Python pseudocode)

    # 1. Index your documents (run once)
    documents = load("menu.pdf", "suppliers.txt", "faq.md")
    chunks = split_into_chunks(documents, size=500)
    embeddings = embed(chunks)              # converts text → numbers
    vector_db.store(chunks, embeddings)     # saved for later

    # 2. Answer a question (run every time)
    question = "Is the banana bread vegan?"
    q_embedding = embed(question)
    relevant_chunks = vector_db.search(q_embedding, top_k=3)
    prompt = f"Answer using this context: {relevant_chunks}\n\nQuestion: {question}"
    answer = claude.ask(prompt)             # grounded in real data
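If you want to see the pseudocode above actually run, here is a toy version with no external services. It is a sketch under loud assumptions: `embed` is a simple word count rather than a real embedding model, `ToyVectorStore` is a hypothetical in-memory stand-in for a vector database, the café documents are invented, and the final call to a model is left out so the script stops at the finished prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_into_chunks(text: str, size: int) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def store(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: similarity(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


# 1. Index (run once). Invented example documents.
documents = [
    "The banana bread is vegan and made with oat milk.",
    "Oat milk is supplied by a local dairy-free wholesaler.",
    "We are closed on Good Friday and open on Easter Monday.",
]
vector_db = ToyVectorStore()
for doc in documents:
    vector_db.store(split_into_chunks(doc, size=20))

# 2. Answer a question (run every time).
question = "Is the banana bread vegan?"
relevant_chunks = vector_db.search(question, top_k=2)
context = "\n".join(relevant_chunks)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this is what you would send to the model
```

Word counts only match exact words, so this toy misses synonyms. Swapping `embed` for a real embedding model is what turns it into the semantic search described earlier.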

Model Context Protocol

While RAG gives your AI knowledge, MCP gives it hands. The Model Context Protocol — published by Anthropic in November 2024 — is an open standard that lets AI models connect to external tools and data sources in a consistent, interoperable way.

Before MCP, every developer had to build their own custom integration to connect an AI to, say, a Google Calendar or a database. Each integration was a one-off. MCP standardises this — if a service offers an MCP server, any AI that supports MCP can connect to it immediately. It's like a USB standard for AI tools.
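To give a feel for what "standardised" means here: MCP messages follow JSON-RPC 2.0, and a client discovers tools with a `tools/list` request and runs one with `tools/call`. The sketch below imitates that message shape in plain Python. It is not the official SDK and skips the real protocol's connection setup and transport; the `check_calendar` tool and its reply are hypothetical.

```python
import json


def check_calendar(arguments: dict) -> str:
    """Hypothetical tool: a real one would query an actual calendar."""
    return f"No events on {arguments['date']}"


# Registry of tools this toy server offers (names and schemas are illustrative).
TOOLS = {
    "check_calendar": {
        "description": "List events on a given date.",
        "inputSchema": {
            "type": "object",
            "properties": {"date": {"type": "string"}},
            "required": ["date"],
        },
        "handler": check_calendar,
    },
}


def handle(request_json: str) -> str:
    """Answer one JSON-RPC request in the style of an MCP server."""
    request = json.loads(request_json)
    request_id = request.get("id")
    method = request["method"]

    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
                for name, t in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]]["handler"](params["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request_id, "error": error})

    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})


# The client side: the same request shape works for any tool on any server.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "check_calendar", "arguments": {"date": "2026-04-03"}},
}
response = json.loads(handle(json.dumps(call)))
print(response["result"]["content"][0]["text"])
```

Because every server describes its tools the same way, the AI side never needs code written for one specific service.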

🧠 Your AI Agent
Claude, Gemini, GPT-4, or any other MCP-compatible model
── MCP connections ──
📅 Google Calendar: check and create events
📊 Google Sheets: read & write rows
🌐 Web search: live search results
📦 Your database: query any SQL/NoSQL
📧 Email / Slack: send messages
🔧 Custom tools: any API you build

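When you use Claude Code (covered in another branch), it works through tools: reading files, running terminal commands, searching the web. Those core tools are built in, and MCP is how you plug in more, such as a database or a calendar. When Claude Code says "let me check your package.json" and actually reads the file, that's tool use at work, the same pattern MCP standardises for external services.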

MCP is also why AI assistants in 2026 can book things, update spreadsheets, and trigger workflows — not just answer questions. The "tool use" era of AI runs on protocols like MCP.

RAG vs MCP — when to use which

RAG and MCP solve different problems. In practice, agents often use both. Here's how to think about the distinction:

RAG: Retrieval
For reading static knowledge
  • Document libraries, PDFs, wikis
  • Knowledge that doesn't change minute-to-minute
  • Answering questions from your own data
  • Reducing hallucination with cited sources
  • Can work offline once indexed, if the models also run locally
  • Example: "What does our warranty say about water damage?"

MCP: Tools
For taking real-world actions
  • Live data (weather, bookings, stock)
  • Creating, updating, or deleting things
  • Triggering workflows in other systems
  • Reading data that changes constantly
  • Requires a live connection to the tool
  • Example: "Book a lesson for Wednesday 10am"

A real Raglan example: surf school agent

RAG handles: "What's your refund policy?" (read from a static policy doc)
MCP handles: "Book me in for Saturday 9am" (write to the booking Google Sheet)
Both together: an agent that knows your business and can actually do things in it.
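Here is a deliberately simple sketch of the "both together" idea for the surf school. It routes by keyword, which is an assumption made to keep the example small: in a real agent the model itself usually decides whether to look something up or call a tool. The policy text, the verb list, and the in-memory `bookings` list (standing in for the booking Google Sheet) are all hypothetical.

```python
# Words that suggest the user wants something done, not just answered (assumed list).
ACTION_VERBS = {"book", "cancel", "send", "update", "create", "delete"}

# Stand-ins for the real systems.
POLICY_DOC = "Refunds are available up to 24 hours before a lesson."  # invented text
bookings: list[str] = []  # plays the role of the booking spreadsheet


def route(message: str) -> str:
    """Pick 'mcp_tool' for action requests, 'rag_lookup' for knowledge questions."""
    words = {word.strip("?.,!").lower() for word in message.split()}
    return "mcp_tool" if words & ACTION_VERBS else "rag_lookup"


def handle_message(message: str) -> str:
    """Answer from the document or perform the action, depending on the route."""
    if route(message) == "mcp_tool":
        bookings.append(message)  # a real agent would call the booking tool here
        return "Done: request recorded."
    return f"From the policy doc: {POLICY_DOC}"


print(handle_message("What's your refund policy?"))
print(handle_message("Book me in for Saturday 9am"))
print(bookings)
```

The read path never changes anything, and the write path is the only place where state is modified. Keeping that split clear is useful even when the routing gets smarter.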

Where to begin without a developer

You don't need to build RAG or MCP from scratch. The ecosystem has matured fast — here are practical entry points for each level.

Design your knowledge architecture

The most valuable exercise isn't writing code — it's deciding what your agent needs to know and what it needs to be able to do. Answer these questions as if you were briefing a developer to build it.

Knowledge & Tool Architecture Canvas
Think about a business or project you care about. What would make an AI agent genuinely useful for it?

Use this as a brief if you ever commission someone to build it, or feed it to Claude Code as a starting point.
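If it helps to see the canvas as something structured, here is one possible way to write the brief down as data. This is only a sketch: the field names and the surf school entries are invented for illustration, and the rule "static knowledge goes to RAG, everything else needs a tool" is the simplification used in this module, not a hard law.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeSource:
    name: str
    changes_often: bool  # static documents suit RAG; fast-changing data suits a tool


@dataclass
class Tool:
    name: str
    action: str  # what the agent is allowed to do with it


@dataclass
class AgentBrief:
    purpose: str
    knowledge: list[KnowledgeSource] = field(default_factory=list)
    tools: list[Tool] = field(default_factory=list)

    def rag_candidates(self) -> list[str]:
        """Knowledge that is stable enough to index once and search."""
        return [k.name for k in self.knowledge if not k.changes_often]

    def needs_live_access(self) -> list[str]:
        """Knowledge that changes too often to index, so it needs a tool."""
        return [k.name for k in self.knowledge if k.changes_often]


# Hypothetical brief for the surf school example.
brief = AgentBrief(
    purpose="Answer customer questions and take lesson bookings",
    knowledge=[
        KnowledgeSource("Refund policy", changes_often=False),
        KnowledgeSource("Lesson price list", changes_often=False),
        KnowledgeSource("Lesson availability", changes_often=True),
    ],
    tools=[Tool("Booking sheet", "add a row for each new booking")],
)

print("Index for RAG:", brief.rag_candidates())
print("Needs a live tool:", brief.needs_live_access())
```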

🧠 You understand RAG and MCP
Two foundational concepts that separate shallow AI use from genuinely powerful agents. Most people who talk about AI have never heard of either.