TL;DR
- RAG (Retrieval Augmented Generation) is a technique that lets LLMs search external documents and answer based on them
- Mitigates LLM hallucination and the lack of up-to-date information
- Store documents in a vector database, find content similar to the question, and pass it to the LLM
- In this series, we build a complete Retrieval Augmented Generation system and AI chatbot with TypeScript
- GitHub: my-first-rag
1. What is RAG?
LLM Limitations: Hallucination and Information Gap
LLMs (Large Language Models) like ChatGPT and Claude show remarkable capabilities. However, they have critical limitations.
First Problem: Hallucination
Ask an LLM, “What’s our company’s vacation policy?” and it will answer confidently, even though the model has never seen your policy and the content is completely made up. This is the hallucination problem: LLMs tend to answer confidently even when they don’t know.
Second Problem: Lack of Up-to-date Information
LLMs only know data up to their training cutoff. A model trained in 2024 doesn’t know about libraries released in 2025 or APIs that have since changed. It certainly doesn’t know your internal documents or latest product information.
How RAG Solves These Problems
RAG stands for Retrieval Augmented Generation. The core idea of RAG is simple:
“Before the LLM answers, let it first search and read relevant documents”
Let’s look at how this system works:
- User asks a question: “What’s our company’s vacation policy?”
- The system searches for relevant documents in the vector database
- Passes the retrieved documents (e.g., company policy handbook) to the LLM
- The LLM generates accurate answers by referencing the documents
This significantly reduces LLM hallucination problems. If content isn’t in the documents, we can guide the LLM to answer “I cannot find this in the documents.”
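To make this flow concrete before we build it properly, here is a minimal conceptual sketch in TypeScript. The helpers `searchVectorDB` and `askLLM` are hypothetical placeholders for components we implement later in the series, not real functions:

```typescript
// Conceptual sketch of the RAG query flow.
// searchVectorDB and askLLM are hypothetical stand-ins for later components.
declare function searchVectorDB(question: string): Promise<string[]>;
declare function askLLM(prompt: string): Promise<string>;

async function answerWithRAG(question: string): Promise<string> {
  // 1. Retrieve document chunks relevant to the question
  const relevantChunks = await searchVectorDB(question);

  // 2. Put the retrieved chunks into the prompt as context
  const prompt = `Please answer the question by referring to the following documents.

Documents:
${relevantChunks.join('\n---\n')}

Question: ${question}
If the content is not in the documents, answer "I cannot find this in the documents."`;

  // 3. Let the LLM generate an answer grounded in that context
  return askLLM(prompt);
}
```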
Real-World RAG Applications
RAG technology is already being used in various places:
- Enterprise AI Chatbots: Q&A based on internal documents
- Customer Support Bots: Automatic responses based on FAQs and manuals
- Legal/Medical AI: Information provision based on case law or papers
- Code Assistants: Coding support based on project documentation
The system we’ll build in this series will also be the foundation for such AI chatbots.
2. Retrieval Augmented Generation vs Fine-tuning: What to Choose?
There are two main ways to improve LLMs for specific domains.
Fine-tuning Approach
Fine-tuning is a method of additionally training the LLM itself.
Pros:
- Model “internalizes” domain knowledge
- No additional search step needed during inference
Cons:
- GPU resources and time needed for training
- Retraining required when adding new information
- Costs increase as model size grows
Retrieval Augmented Generation Approach
Retrieval Augmented Generation provides external knowledge through search without modifying the LLM.
Pros:
- Document additions/modifications are immediately reflected
- No GPU training needed, cost-effective
- Can clearly show sources
- Mitigates LLM hallucination problems
Cons:
- Depends on search quality
- Requires vector database operation
- Context window limitations
When to Choose Retrieval Augmented Generation?
| Situation | Recommended Approach |
|---|---|
| Documents update frequently | Retrieval Augmented Generation |
| Source citation needed | Retrieval Augmented Generation |
| Quick deployment needed | Retrieval Augmented Generation |
| Domain-specific language/style needed | Fine-tuning |
| Offline environment | Fine-tuning |
For most enterprise AI chatbot projects, Retrieval Augmented Generation is the more practical choice. We’ll build our system using this approach in this series.
3. Retrieval Augmented Generation Architecture in Detail
A Retrieval Augmented Generation system consists of two main pipelines.
3.1 Indexing Pipeline (Offline)
This is the process of storing documents in the vector database. Once executed, documents become searchable.
[Documents] → [Chunking] → [Embedding] → [Vector Database Storage]
Step-by-step explanation:
- Document Loading: Convert documents in various formats (PDF, markdown, web pages) to plain text
- Chunking: Split long documents into smaller pieces (chunks), usually 500-1000 tokens each; a minimal chunker sketch follows this list
- Embedding: Convert each chunk into a vector (an array of numbers) so that texts with similar meanings end up as similar vectors
- Storage: Store the vectors in the vector database (Supabase, Pinecone, etc.)
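To give a feel for the chunking step, here is a minimal fixed-size chunker. It splits by characters rather than tokens for simplicity, and the size/overlap values are illustrative defaults, not the settings we tune in Day 2:

```typescript
// Minimal fixed-size chunker: splits text into overlapping windows.
// Character-based for simplicity; a token-aware version comes in Day 2.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap; // step forward, keeping some overlap for context
  }
  return chunks;
}

// Example: a 2,500-character document yields chunks starting at offsets 0, 800, 1600, 2400.
```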
3.2 Retrieval Pipeline (Online)
This is the process of answering user questions. It operates in real-time.
[Question] → [Embedding] → [Vector Search] → [Context Construction] → [LLM Answer Generation]
Step-by-step explanation:
- Question Embedding: Convert the user's question into a vector
- Vector Search: Search the vector database for document chunks similar to the question vector (see the similarity sketch after this list)
- Context Construction: Include the retrieved chunks in the LLM prompt
- Answer Generation: The LLM generates an answer referencing that context
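“Similar to the question vector” boils down to a similarity metric over embeddings, most commonly cosine similarity. The sketch below is purely illustrative; from Day 3 onward the vector database performs this search for us:

```typescript
// Cosine similarity between two embedding vectors (1 = same direction, 0 = unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieval is then: embed the question, score every stored chunk, keep the top k.
function topK(
  queryVector: number[],
  chunks: { vector: number[]; content: string }[],
  k = 3,
) {
  return chunks
    .map((c) => ({ content: c.content, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```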
3.3 Overall System Structure
// Basic System Interface
interface RAGSystem {
  // Indexing Pipeline
  ingest(documents: Document[]): Promise<void>;

  // Retrieval Pipeline
  query(question: string): Promise<Answer>;
}

interface Document {
  content: string;
  metadata: {
    source: string;
    title?: string;
    [key: string]: unknown;
  };
}

interface Answer {
  text: string;
  sources: Source[];
}

interface Source {
  content: string;
  metadata: Document['metadata'];
  score: number;
}
This interface is the skeleton of the Retrieval Augmented Generation system we’ll implement over 6 days.
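As a hypothetical usage sketch (the concrete implementation is built up over the following days), the interface would be used roughly like this:

```typescript
// Hypothetical usage of the RAGSystem interface above
declare const rag: RAGSystem;

async function demo() {
  await rag.ingest([
    { content: 'Employees receive 15 days of paid vacation per year.', metadata: { source: 'policy.md' } },
  ]);

  const answer = await rag.query("What's our company's vacation policy?");
  console.log(answer.text);                                  // answer grounded in the document
  console.log(answer.sources.map((s) => s.metadata.source)); // e.g. ['policy.md']
}
```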
4. Project Setup
Let’s start building our project now.
4.1 Create Repository
# Create project directory
mkdir my-first-rag
cd my-first-rag

# Initialize Git
git init

# Create package.json
npm init -y
4.2 TypeScript Configuration
# Install TypeScript and essential packages
npm install typescript @types/node tsx -D

# Create tsconfig.json
npx tsc --init
Modify tsconfig.json as follows:
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "outDir": "./dist",
    "rootDir": "./src",
    "declaration": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}
4.3 Install Required Packages
# Anthropic Claude SDK (LLM)
npm install @anthropic-ai/sdk

# Document processing
npm install pdf-parse gray-matter

# Embedding and Vector DB (detailed in Day 3)
npm install @supabase/supabase-js

# Utilities
npm install dotenv zod
4.4 Project Structure
my-first-rag/
├── src/
│   ├── index.ts               # Entry point
│   ├── rag/
│   │   ├── simple-rag.ts      # Basic class
│   │   ├── document-loader.ts # Document loader
│   │   ├── chunker.ts         # Chunking
│   │   ├── embedder.ts        # Embedding
│   │   ├── vector-store.ts    # Vector store
│   │   ├── retriever.ts       # Retrieval
│   │   └── generator.ts       # Answer generation
│   └── utils/
│       └── logger.ts          # Logging
├── examples/
│   └── day1-basic.ts          # Day 1 example
├── documents/                 # Documents to index
├── .env                       # Environment variables
├── package.json
└── tsconfig.json
4.5 Environment Variables Setup
Create a .env file:
# Anthropic API (Claude)
ANTHROPIC_API_KEY=your-api-key-here

# Supabase (Vector Database)
SUPABASE_URL=your-supabase-url
SUPABASE_KEY=your-supabase-anon-key

# Voyage AI (Embedding) - Used in Day 3
VOYAGE_API_KEY=your-voyage-api-key
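Since dotenv and zod are already installed, one optional way to fail fast on missing keys is to validate the environment at startup. This is just a suggestion rather than part of the series' official setup, and the path src/utils/env.ts is a hypothetical choice:

```typescript
// src/utils/env.ts (suggested location): validate required environment variables with zod
import 'dotenv/config';
import { z } from 'zod';

const envSchema = z.object({
  ANTHROPIC_API_KEY: z.string().min(1),
  SUPABASE_URL: z.string().url().optional(),    // used from Day 3
  SUPABASE_KEY: z.string().min(1).optional(),   // used from Day 3
  VOYAGE_API_KEY: z.string().min(1).optional(), // used from Day 3
});

// Throws a descriptive error at startup if a required variable is missing
export const env = envSchema.parse(process.env);
```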
4.6 Basic Skeleton Code
src/rag/simple-rag.ts:
import Anthropic from '@anthropic-ai/sdk';

export interface RAGConfig {
  anthropicApiKey: string;
  supabaseUrl?: string;
  supabaseKey?: string;
}

export class SimpleRAG {
  private client: Anthropic;

  constructor(config: RAGConfig) {
    this.client = new Anthropic({
      apiKey: config.anthropicApiKey,
    });
  }

  // Implement in Day 2: Document indexing
  async ingest(documents: string[]): Promise<void> {
    console.log(`${documents.length} documents to be indexed`);
    // TODO: Chunking → Embedding → Storage
  }

  // Complete in Day 5: Q&A
  async query(question: string): Promise<string> {
    console.log(`Question: ${question}`);

    // TODO: Find relevant documents via vector search
    const relevantDocs = '(Retrieved documents go here)';

    // Ask LLM with context
    const response = await this.client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [
        {
          role: 'user',
          content: `Please answer the question by referring to the following documents.
Documents:
${relevantDocs}
Question: ${question}
If the content is not in the documents, please answer "I cannot find this in the documents."`,
        },
      ],
    });

    const textBlock = response.content[0];
    if (textBlock.type === 'text') {
      return textBlock.text;
    }
    return 'Unable to generate an answer.';
  }
}
src/index.ts:
import { SimpleRAG } from './rag/simple-rag';
import 'dotenv/config';

async function main() {
  const rag = new SimpleRAG({
    anthropicApiKey: process.env.ANTHROPIC_API_KEY!,
  });

  // Day 1: Verify basic structure
  console.log('System initialization complete');

  // Test question (no search functionality yet)
  const answer = await rag.query('What is retrieval augmented generation?');
  console.log('Answer:', answer);
}

main().catch(console.error);
Let’s run it:
npx tsx src/index.ts
Since there’s no vector database search functionality yet, you won’t get meaningful answers. Starting with Day 2, we’ll implement each component in earnest.
5. Extending to an AI Chatbot
The Retrieval Augmented Generation system we’re building becomes the core engine of an AI chatbot. In Day 6, we’ll extend this system as follows:
- Conversation History Management: Maintain previous conversation context
- Streaming Responses: Real-time answer output (see the sketch after this list)
- Source Display: Provide links to the source documents behind each answer
- API Server: Expose the AI chatbot as a REST API
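As a taste of what the streaming and history pieces might look like, here is a minimal sketch using the Anthropic SDK's streaming mode. The history handling is deliberately simplified and is not the design we finalize in Day 6:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

// Keep previous turns so the model sees the conversation context
const history: { role: 'user' | 'assistant'; content: string }[] = [];

async function chat(userMessage: string): Promise<string> {
  history.push({ role: 'user', content: userMessage });

  // stream: true returns the answer as a sequence of events instead of one response
  const stream = await client.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: history,
    stream: true,
  });

  let assistantText = '';
  for await (const event of stream) {
    // Print text deltas as they arrive for a real-time feel
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      process.stdout.write(event.delta.text);
      assistantText += event.delta.text;
    }
  }

  history.push({ role: 'assistant', content: assistantText });
  return assistantText;
}
```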
Here’s why Retrieval Augmented Generation is important when building an AI chatbot:
| Regular Chatbot | Retrieval-based AI Chatbot |
|---|---|
| Relies only on training data | Can search latest documents |
| Serious hallucination problems | Accurate answers based on documents |
| Unclear sources | Clear source provision |
| Difficult to update | Immediate reflection by adding documents |
6. Conclusion and Preview
Let’s summarize what we learned today:
Key Points
- RAG (Retrieval Augmented Generation) mitigates LLM hallucination and information gap problems
- RAG Architecture: Indexing Pipeline + Retrieval Pipeline
- Store documents in vector database and search based on similarity
- More flexible and cost-effective than fine-tuning
- Core technology for AI chatbot development
Day 2 Preview: Document Processing and Chunking Strategies
In the next article, we’ll cover document processing, the first step of this approach:
- Loading PDF, markdown, and text files
- Effective chunking strategies (fixed-size vs semantic-based)
- Improving search quality with metadata management
Check out the full code on GitHub:
https://github.com/dh1789/my-first-rag
Series Index
- Day 1: RAG Concepts and Architecture (Current)
- Day 2: Document Processing and Chunking
- Day 3: Embeddings and Vector Database
- Day 4: Search Optimization and Reranking
- Day 5: Claude Integration and Answer Generation
- Day 6: Production Deployment and Optimization
GitHub Repository