> 🌐 **Translation**: This article was translated from [Korean](https://beomanro.com/?p=351).
TL;DR
- Context injection is the core technique for incorporating search results into LLM prompts
- Prompt engineering determines 80% of RAG answer generation quality
- “If you don’t know, say you don’t know” instruction is key to preventing hallucinations
- Source citations increase the reliability of generated answers
- Streaming responses improve user experience
💡 When I first integrated Claude into a RAG system, finding and understanding relevant resources was the most challenging part.
By comparing official documentation with various examples, I identified the core patterns of prompt engineering. In this post, I’ll share what I learned from context injection to answer generation.
## 1. Understanding Context Injection

### 1.1 What is Context Injection?
In RAG systems, Context Injection is the process of including retrieved documents in the LLM prompt. This is the heart of RAG and determines the quality of answer generation.
```typescript
// Basic structure of context injection
// (the Document shape below is assumed throughout this post)
interface Document {
  title?: string;
  content: string;
  source: string;
}

interface RAGContext {
  query: string;          // User question
  documents: Document[];  // Retrieved documents
  maxTokens: number;      // Maximum context length
}

async function injectContext(context: RAGContext): Promise<string> {
  const { query, documents } = context;

  // Combine documents into a single context
  const contextText = documents
    .map((doc, i) => `[Document ${i + 1}]\n${doc.content}\nSource: ${doc.source}`)
    .join('\n\n');

  return `Answer the question based on the following documents:

${contextText}

Question: ${query}`;
}
```
### 1.2 Context Window Management
Claude’s context window varies by model:
| Model | Context Window |
|---|---|
| Claude 3.5 Sonnet | 200K tokens |
| Claude 3.5 Haiku | 200K tokens |
| Claude 3 Opus | 200K tokens |
```typescript
// Context length management
function manageContextWindow(
  documents: Document[],
  maxTokens: number = 100000
): Document[] {
  let totalTokens = 0;
  const selectedDocs: Document[] = [];

  for (const doc of documents) {
    const docTokens = estimateTokens(doc.content);
    if (totalTokens + docTokens > maxTokens) {
      break;
    }
    selectedDocs.push(doc);
    totalTokens += docTokens;
  }

  console.log(`Context: ${selectedDocs.length} documents, ~${totalTokens} tokens`);
  return selectedDocs;
}

// Token count estimation (roughly 4 characters = 1 token)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```
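The 4-characters-per-token heuristic is quick but imprecise, particularly for non-English text and code. When exact numbers matter, the Anthropic SDK exposes a token counting endpoint. Here's a minimal sketch; it reuses an `Anthropic` client like the one configured in section 3.1, and each call is a network round trip, so it suits spot checks rather than per-document loops:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Count tokens exactly by asking the API how it would tokenize the text
async function countTokensExact(text: string): Promise<number> {
  const result = await client.messages.countTokens({
    model: 'claude-sonnet-4-20250514',
    messages: [{ role: 'user', content: text }],
  });
  return result.input_tokens;
}
```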
### 1.3 Long Context Handling Strategies
Even with 200K tokens, efficient usage is essential:
```typescript
// Priority-based chunk selection
interface RankedDocument extends Document {
  score: number;
  rank: number;
}

function selectTopDocuments(
  documents: RankedDocument[],
  options: {
    maxDocs: number;
    maxTokens: number;
    minScore: number;
  }
): Document[] {
  return documents
    .filter(doc => doc.score >= options.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, options.maxDocs)
    // Keep the longest prefix of top-ranked documents that fits the token budget
    .filter((_, i, arr) => {
      const totalTokens = arr
        .slice(0, i + 1)
        .reduce((sum, d) => sum + estimateTokens(d.content), 0);
      return totalTokens <= options.maxTokens;
    });
}
```
## 2. RAG Prompt Design
Prompt engineering is the critical factor that determines answer generation quality in RAG systems.
### 2.1 Basic RAG Prompt Template

Here's a battle-tested prompt template:
```typescript
const RAG_SYSTEM_PROMPT = `You are an AI assistant that provides accurate answers based on the provided documents.

## Core Rules

1. **Document-based answers**: Use ONLY information from the provided documents.
2. **Cite sources**: Indicate the source of information using [Document N] format.
3. **Acknowledge uncertainty**: If information isn't in the documents, say "I cannot find that information in the provided documents."
4. **No speculation**: Do not guess or fabricate content not explicitly stated in the documents.

## Answer Format

- Clear and structured responses
- Use markdown when helpful (lists, code blocks, etc.)
- List referenced document numbers at the end of answers`;

function buildRAGPrompt(query: string, documents: Document[]): string {
  const contextSection = documents
    .map((doc, i) => `[Document ${i + 1}]
Title: ${doc.title || 'N/A'}
Content: ${doc.content}
Source: ${doc.source}`)
    .join('\n\n---\n\n');

  return `${RAG_SYSTEM_PROMPT}

---

## Reference Documents

${contextSection}

---

## Question

${query}

---

Please answer the question based on the documents above.`;
}
```
### 2.2 Hallucination Prevention Techniques
A key aspect of prompt engineering is preventing Claude from fabricating content not present in the documents (hallucination):
```typescript
// Enhanced prompt for hallucination prevention
const ANTI_HALLUCINATION_PROMPT = `## Important: Hallucination Prevention Rules

You MUST say "I don't know" in these situations:
1. When the documents don't contain information relevant to the question
2. When document information is incomplete or ambiguous
3. When the question is outside the scope of the documents

Incorrect examples:
- "It's probably ~" (speculation)
- "Generally, ~" (using knowledge outside documents)
- Providing specific numbers or dates not in the documents

Correct examples:
- "I cannot find that information in the provided documents."
- "According to [Document N], ~. However, there is no information about ~."`;
```
### 2.3 Citation Requirements
Request clear source citations to increase answer reliability:
```typescript
const CITATION_PROMPT = `## Citation Rules

When writing answers, cite sources in the following format:

1. **Inline citations**: Add [Document N] at the end of sentences
   Example: "TypeScript is a statically typed language [Document 1]."

2. **Reference list**: Summarize used documents at the end
   Example:
   ---
   📚 References:
   - [Document 1] TypeScript Official Documentation
   - [Document 3] Project README

3. **Multiple sources**: List all when referencing multiple documents
   Example: "This feature was added in version 2.0 [Document 1, Document 2]."`;
```
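Neither of these auxiliary prompts is wired into `buildRAGPrompt` above. One straightforward way to apply them is to append both to the base system prompt; a minimal sketch (the composition helper is my own illustration, not part of the original pipeline):

```typescript
// Hypothetical helper: concatenate the base rules with the auxiliary prompts
// so a single system prompt carries all three rule sets.
function buildFullSystemPrompt(): string {
  return [RAG_SYSTEM_PROMPT, ANTI_HALLUCINATION_PROMPT, CITATION_PROMPT].join('\n\n');
}
```

You could then pass this combined string in place of `RAG_SYSTEM_PROMPT` inside `buildRAGPrompt`.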
## 3. Claude Integration Implementation

### 3.1 Anthropic SDK Setup
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

interface GenerateOptions {
  model?: string;
  maxTokens?: number;
  temperature?: number;
  stream?: boolean;
}

const DEFAULT_OPTIONS: GenerateOptions = {
  model: 'claude-sonnet-4-20250514',
  maxTokens: 4096,
  temperature: 0, // Low temperature recommended for RAG
  stream: false,
};
```
### 3.2 Answer Generation Function
```typescript
async function generateAnswer(
  query: string,
  documents: Document[],
  options: GenerateOptions = {}
): Promise<string> {
  const config = { ...DEFAULT_OPTIONS, ...options };

  // Context window management
  const selectedDocs = manageContextWindow(documents, 100000);

  // Build RAG prompt
  const prompt = buildRAGPrompt(query, selectedDocs);

  try {
    const response = await anthropic.messages.create({
      model: config.model!,
      max_tokens: config.maxTokens!,
      temperature: config.temperature!,
      messages: [
        {
          role: 'user',
          content: prompt,
        },
      ],
    });

    // Extract response text
    const textBlock = response.content.find(block => block.type === 'text');
    if (!textBlock || textBlock.type !== 'text') {
      throw new Error('No text response from Claude');
    }

    return textBlock.text;
  } catch (error) {
    console.error('Claude API error:', error);
    throw error;
  }
}
```
### 3.3 Streaming Response Implementation
Implement streaming responses for better user experience:
```typescript
async function* generateAnswerStream(
  query: string,
  documents: Document[],
  options: GenerateOptions = {}
): AsyncGenerator<string> {
  const config = { ...DEFAULT_OPTIONS, ...options, stream: true };
  const selectedDocs = manageContextWindow(documents, 100000);
  const prompt = buildRAGPrompt(query, selectedDocs);

  const stream = await anthropic.messages.stream({
    model: config.model!,
    max_tokens: config.maxTokens!,
    temperature: config.temperature!,
    messages: [
      {
        role: 'user',
        content: prompt,
      },
    ],
  });

  // Yield only the incremental text pieces
  for await (const event of stream) {
    if (
      event.type === 'content_block_delta' &&
      event.delta.type === 'text_delta'
    ) {
      yield event.delta.text;
    }
  }
}

// Usage example
async function streamExample() {
  const query = 'What are the main features of TypeScript?';
  // searchDocuments: the retrieval step built earlier in this series
  const documents = await searchDocuments(query);

  process.stdout.write('Answer: ');
  for await (const chunk of generateAnswerStream(query, documents)) {
    process.stdout.write(chunk);
  }
  console.log('\n--- Streaming complete ---');
}
```
## 4. Citation Extraction and Display

### 4.1 Citation Parsing
Extract citation information from Claude’s response:
```typescript
interface Citation {
  documentIndex: number;
  documentTitle: string;
  source: string;
}

function extractCitations(
  answer: string,
  documents: Document[]
): Citation[] {
  // Match [Document N] pattern
  const citationPattern = /\[Document\s*(\d+)\]/g;
  const matches = answer.matchAll(citationPattern);

  const citedIndices = new Set<number>();
  for (const match of matches) {
    const index = parseInt(match[1], 10) - 1;
    if (index >= 0 && index < documents.length) {
      citedIndices.add(index);
    }
  }

  return Array.from(citedIndices).map(index => ({
    documentIndex: index + 1,
    documentTitle: documents[index].title || `Document ${index + 1}`,
    source: documents[index].source,
  }));
}
```
### 4.2 Answer Formatting
Generate the final answer with citation information:
```typescript
interface FormattedAnswer {
  content: string;
  citations: Citation[];
  metadata: {
    model: string;
    documentsUsed: number;
    generatedAt: string;
  };
}

function formatAnswer(
  rawAnswer: string,
  documents: Document[],
  model: string
): FormattedAnswer {
  const citations = extractCitations(rawAnswer, documents);

  // Add citation section if not present
  let content = rawAnswer;
  if (!rawAnswer.includes('📚 References') && citations.length > 0) {
    content += '\n\n---\n📚 **References:**\n';
    content += citations
      .map(c => `- [Document ${c.documentIndex}] ${c.documentTitle}`)
      .join('\n');
  }

  return {
    content,
    citations,
    metadata: {
      model,
      documentsUsed: citations.length,
      generatedAt: new Date().toISOString(),
    },
  };
}
```
## 5. Complete RAG Pipeline Integration

### 5.1 Complete RAG Class
```typescript
import Anthropic from '@anthropic-ai/sdk';

interface RAGConfig {
  anthropicApiKey: string;
  model?: string;
  maxContextTokens?: number;
  temperature?: number;
}

class RAGGenerator {
  private anthropic: Anthropic;
  private config: Required<RAGConfig>;

  constructor(config: RAGConfig) {
    this.anthropic = new Anthropic({
      apiKey: config.anthropicApiKey,
    });
    this.config = {
      anthropicApiKey: config.anthropicApiKey,
      model: config.model || 'claude-sonnet-4-20250514',
      maxContextTokens: config.maxContextTokens || 100000,
      temperature: config.temperature || 0,
    };
  }

  async generate(
    query: string,
    documents: Document[]
  ): Promise<FormattedAnswer> {
    // 1. Context management
    const selectedDocs = manageContextWindow(
      documents,
      this.config.maxContextTokens
    );

    // 2. Build prompt
    const prompt = buildRAGPrompt(query, selectedDocs);

    // 3. Call Claude
    const response = await this.anthropic.messages.create({
      model: this.config.model,
      max_tokens: 4096,
      temperature: this.config.temperature,
      messages: [{ role: 'user', content: prompt }],
    });

    // 4. Extract response
    const textBlock = response.content.find(b => b.type === 'text');
    if (!textBlock || textBlock.type !== 'text') {
      throw new Error('No text response');
    }

    // 5. Format and extract citations
    return formatAnswer(textBlock.text, selectedDocs, this.config.model);
  }
}
```
### 5.2 Usage Example
```typescript
// Initialize RAG system
const rag = new RAGGenerator({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY!,
  model: 'claude-sonnet-4-20250514',
  temperature: 0,
});

// Search + Answer generation
async function askQuestion(query: string) {
  // 1. Search (hybrid search implemented in Day 4)
  const documents = await hybridSearch(query, {
    topK: 5,
    alpha: 0.7,
  });

  // 2. Generate answer
  const answer = await rag.generate(query, documents);

  console.log('=== Answer ===');
  console.log(answer.content);
  console.log('\n=== Metadata ===');
  console.log(`Documents used: ${answer.metadata.documentsUsed}`);
  console.log(`Model: ${answer.metadata.model}`);

  return answer;
}

// Execute
askQuestion('How do you use generics in TypeScript?');
```
## 6. Quality Improvement Tips
Here are tips to further enhance prompt engineering and answer generation quality.
### 6.1 Temperature Settings
Use low temperature for RAG answer generation:
```typescript
// Recommended RAG settings
const RAG_TEMPERATURE = 0; // Most deterministic answers

// When creative answers are needed
const CREATIVE_TEMPERATURE = 0.3;
```
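These constants slot into the `GenerateOptions` from section 3.1, so the temperature can be overridden per call; for example:

```typescript
// (inside an async context)
// Deterministic output for factual Q&A
const factual = await generateAnswer(query, documents, { temperature: RAG_TEMPERATURE });

// Slightly more varied phrasing, e.g., for summaries
const summary = await generateAnswer(query, documents, { temperature: CREATIVE_TEMPERATURE });
```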
### 6.2 Error Handling
```typescript
// Small delay helper for retries (not defined in the original snippet)
const sleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

async function safeGenerate(
  query: string,
  documents: Document[],
  retriesLeft: number = 3 // cap retries to avoid unbounded recursion
): Promise<FormattedAnswer | null> {
  try {
    return await rag.generate(query, documents);
  } catch (error) {
    if (error instanceof Anthropic.APIError) {
      console.error(`API error (${error.status}):`, error.message);

      if (error.status === 429 && retriesLeft > 0) {
        console.log('Rate limit - retrying shortly...');
        await sleep(5000);
        return safeGenerate(query, documents, retriesLeft - 1);
      }
    }
    return null;
  }
}
```
### 6.3 Answer Quality Validation
```typescript
function validateAnswer(answer: FormattedAnswer): boolean {
  // 1. Check for citations
  if (answer.citations.length === 0) {
    console.warn('Warning: Answer without citations');
    return false;
  }

  // 2. Check for "unknown" patterns
  const unknownPatterns = [
    'cannot find',
    'no information',
    'unable to confirm',
  ];
  const hasUnknown = unknownPatterns.some(p =>
    answer.content.toLowerCase().includes(p)
  );

  if (hasUnknown) {
    console.info('Info: Question appears to be outside document scope');
  }

  return true;
}
```
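Putting the last two helpers together, here is one way generation, validation, and a fallback could be wired up (the fallback message wording is my own):

```typescript
async function answerWithValidation(
  query: string,
  documents: Document[]
): Promise<string> {
  const answer = await safeGenerate(query, documents);

  // Fall back when generation failed or validation rejects the answer
  if (!answer || !validateAnswer(answer)) {
    return 'Sorry, I could not generate a reliable answer from the provided documents.';
  }
  return answer.content;
}
```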
## Conclusion

In this post, we covered the core of a RAG system, from Claude integration to answer generation:
- Context Injection: Effectively delivering search results to the LLM
- Prompt Engineering: Designing prompts for hallucination prevention and source citations
- Answer Generation: High-quality response generation using the Claude API
- Streaming Responses: Improving user experience
- Citation Extraction: Enhancing answer reliability
Mastering prompt engineering and context injection will significantly improve your RAG system’s answer generation quality.
In Day 6, we’ll cover production deployment and optimization.
## 📚 Series Index
RAG (6/6)
- Day 1: RAG Concepts and Architecture
- Day 2: Document Processing and Chunking
- Day 3: Embeddings and Vector Database
- Day 4: Search Optimization and Reranking
- 📍 Day 5: Claude Integration and Answer Generation (Current)
- Day 6: Production Deployment and Optimization
🔗 GitHub Repository