> 🌐 **Translation**: This article was translated from [Korean](https://beomanro.com/?p=351).

TL;DR

  • Context injection is the core technique for incorporating search results into LLM prompts
  • Prompt engineering determines 80% of RAG answer generation quality
  • An “if you don’t know, say you don’t know” instruction is key to preventing hallucinations
  • Source citations increase the reliability of generated answers
  • Streaming responses improve user experience

💡 When I first integrated Claude into a RAG system, finding and understanding the relevant resources was the hardest part. By comparing the official documentation with various examples, I identified the core patterns of prompt engineering. In this post, I’ll share what I learned, from context injection to answer generation.


1. Understanding Context Injection

1.1 What is Context Injection?

In RAG systems, Context Injection is the process of including retrieved documents in the LLM prompt. This is the heart of RAG and determines the quality of answer generation.

// Basic structure of context injection
// (the Document shape below is used throughout this post)
interface Document {
  title?: string;
  content: string;
  source: string;
}

interface RAGContext {
  query: string;           // User question
  documents: Document[];   // Retrieved documents
  maxTokens: number;       // Maximum context length (enforced in section 1.2)
}

// No I/O happens here, so this can be a plain synchronous function
function injectContext(context: RAGContext): string {
  const { query, documents } = context;

  // Combine documents into a single context
  const contextText = documents
    .map((doc, i) => `[Document ${i + 1}]\n${doc.content}\nSource: ${doc.source}`)
    .join('\n\n');

  return `Answer the question based on the following documents:

${contextText}

Question: ${query}`;
}

1.2 Context Window Management

The Claude 3 family models all offer a 200K-token context window:

| Model | Context Window |
|---|---|
| Claude 3.5 Sonnet | 200K tokens |
| Claude 3.5 Haiku | 200K tokens |
| Claude 3 Opus | 200K tokens |

// Context length management
function manageContextWindow(
  documents: Document[],
  maxTokens: number = 100000
): Document[] {
  let totalTokens = 0;
  const selectedDocs: Document[] = [];

  for (const doc of documents) {
    const docTokens = estimateTokens(doc.content);

    if (totalTokens + docTokens > maxTokens) {
      break;
    }

    selectedDocs.push(doc);
    totalTokens += docTokens;
  }

  console.log(`Context: ${selectedDocs.length} documents, ~${totalTokens} tokens`);
  return selectedDocs;
}

// Rough token estimate (≈4 characters per token for English text)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
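
The 4-characters-per-token heuristic is crude, and it skews for Korean or code-heavy text. For exact numbers, here is a minimal sketch assuming your version of @anthropic-ai/sdk exposes the token counting endpoint as messages.countTokens (check the SDK changelog before relying on it):

// Exact input token count via the token counting endpoint
// (assumption: messages.countTokens exists in your SDK version;
// `anthropic` is the client instance set up in section 3.1)
async function countTokensExact(text: string): Promise<number> {
  const result = await anthropic.messages.countTokens({
    model: 'claude-sonnet-4-20250514',
    messages: [{ role: 'user', content: text }],
  });
  return result.input_tokens;
}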

1.3 Long Context Handling Strategies

Even with 200K tokens, efficient usage is essential:

// Priority-based chunk selection
interface RankedDocument extends Document {
  score: number;
  rank: number;
}

function selectTopDocuments(
  documents: RankedDocument[],
  options: {
    maxDocs: number;
    maxTokens: number;
    minScore: number;
  }
): Document[] {
  return documents
    .filter(doc => doc.score >= options.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, options.maxDocs)
    // Keep only a prefix that fits the token budget: the running total
    // is recomputed for each element, so once it exceeds maxTokens,
    // every later element fails the check too (O(n²), fine for small lists)
    .filter((_, i, arr) => {
      const totalTokens = arr
        .slice(0, i + 1)
        .reduce((sum, d) => sum + estimateTokens(d.content), 0);
      return totalTokens <= options.maxTokens;
    });
}

2. RAG Prompt Design

Prompt engineering is the critical factor that determines answer generation quality in RAG systems.

2.1 Basic RAG Prompt Template

Here’s a battle-tested prompt template:

const RAG_SYSTEM_PROMPT = `You are an AI assistant that provides accurate answers based on the provided documents.

## Core Rules

1. **Document-based answers**: Use ONLY information from the provided documents.
2. **Cite sources**: Indicate the source of information using [Document N] format.
3. **Acknowledge uncertainty**: If information isn't in the documents, say "I cannot find that information in the provided documents."
4. **No speculation**: Do not guess or fabricate content not explicitly stated in the documents.

## Answer Format

- Clear and structured responses
- Use markdown when helpful (lists, code blocks, etc.)
- List referenced document numbers at the end of answers`;

function buildRAGPrompt(query: string, documents: Document[]): string {
  const contextSection = documents
    .map((doc, i) => `[Document ${i + 1}]
Title: ${doc.title || 'N/A'}
Content: ${doc.content}
Source: ${doc.source}`)
    .join('\n\n---\n\n');

  return `${RAG_SYSTEM_PROMPT}

---

## Reference Documents

${contextSection}

---

## Question

${query}

---

Please answer the question based on the documents above.`;
}

2.2 Hallucination Prevention Techniques

A key aspect of prompt engineering is preventing Claude from fabricating content not present in the documents (hallucination):

// Enhanced prompt for hallucination prevention
const ANTI_HALLUCINATION_PROMPT = `## Important: Hallucination Prevention Rules

You MUST say "I don't know" in these situations:

1. When the documents don't contain information relevant to the question
2. When document information is incomplete or ambiguous
3. When the question is outside the scope of the documents

Incorrect examples:
- "It's probably..." (speculation)
- "Generally, ..." (using knowledge outside the documents)
- Providing specific numbers or dates not in the documents

Correct examples:
- "I cannot find that information in the provided documents."
- "According to [Document N], ... However, there is no information about ..."`;

2.3 Citation Requirements

Request clear source citations to increase answer reliability:

const CITATION_PROMPT = `## Citation Rules

When writing answers, cite sources in the following format:

1. **Inline citations**: Add [Document N] at the end of sentences
   Example: "TypeScript is a statically typed language [Document 1]."

2. **Reference list**: Summarize used documents at the end
   Example:
   ---
   📚 References:
   - [Document 1] TypeScript Official Documentation
   - [Document 3] Project README

3. **Multiple sources**: List all when referencing multiple documents
   Example: "This feature was added in version 2.0 [Document 1, Document 2]."`;

3. Claude Integration Implementation

3.1 Anthropic SDK Setup

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

interface GenerateOptions {
  model?: string;
  maxTokens?: number;
  temperature?: number;
  stream?: boolean;
}

const DEFAULT_OPTIONS: GenerateOptions = {
  model: 'claude-sonnet-4-20250514',
  maxTokens: 4096,
  temperature: 0,  // Low temperature recommended for RAG
  stream: false,
};

3.2 Answer Generation Function

async function generateAnswer(
  query: string,
  documents: Document[],
  options: GenerateOptions = {}
): Promise<string> {
  const config = { ...DEFAULT_OPTIONS, ...options };

  // Context window management
  const selectedDocs = manageContextWindow(documents, 100000);

  // Build RAG prompt
  const prompt = buildRAGPrompt(query, selectedDocs);

  try {
    const response = await anthropic.messages.create({
      model: config.model!,
      max_tokens: config.maxTokens!,
      temperature: config.temperature!,
      messages: [
        {
          role: 'user',
          content: prompt,
        },
      ],
    });

    // Extract response text
    const textBlock = response.content.find(block => block.type === 'text');
    if (!textBlock || textBlock.type !== 'text') {
      throw new Error('No text response from Claude');
    }

    return textBlock.text;
  } catch (error) {
    console.error('Claude API error:', error);
    throw error;
  }
}
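
One thing to note: buildRAGPrompt packs the instructions into the user message. The Messages API also accepts a top-level system parameter, which keeps the rules separate from user content. A sketch of that variant, reusing the pieces above:

// Variant: pass the rules via the Messages API's system parameter
async function generateAnswerWithSystem(
  query: string,
  documents: Document[]
): Promise<string> {
  const selectedDocs = manageContextWindow(documents, 100000);
  const contextSection = selectedDocs
    .map((doc, i) => `[Document ${i + 1}]\n${doc.content}\nSource: ${doc.source}`)
    .join('\n\n---\n\n');

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 4096,
    temperature: 0,
    system: RAG_SYSTEM_PROMPT,  // instructions live here instead
    messages: [
      {
        role: 'user',
        content: `## Reference Documents\n\n${contextSection}\n\n## Question\n\n${query}`,
      },
    ],
  });

  const textBlock = response.content.find(b => b.type === 'text');
  if (!textBlock || textBlock.type !== 'text') {
    throw new Error('No text response from Claude');
  }
  return textBlock.text;
}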

3.3 Streaming Response Implementation

Implement streaming responses for better user experience:

async function* generateAnswerStream(
  query: string,
  documents: Document[],
  options: GenerateOptions = {}
): AsyncGenerator<string> {
  const config = { ...DEFAULT_OPTIONS, ...options, stream: true };
  const selectedDocs = manageContextWindow(documents, 100000);
  const prompt = buildRAGPrompt(query, selectedDocs);

  const stream = await anthropic.messages.stream({
    model: config.model!,
    max_tokens: config.maxTokens!,
    temperature: config.temperature!,
    messages: [
      {
        role: 'user',
        content: prompt,
      },
    ],
  });

  for await (const event of stream) {
    if (
      event.type === 'content_block_delta' &&
      event.delta.type === 'text_delta'
    ) {
      yield event.delta.text;
    }
  }
}

// Usage example (searchDocuments stands in for the retrieval step from earlier in the series)
async function streamExample() {
  const query = 'What are the main features of TypeScript?';
  const documents = await searchDocuments(query);

  process.stdout.write('Answer: ');

  for await (const chunk of generateAnswerStream(query, documents)) {
    process.stdout.write(chunk);
  }

  console.log('\n--- Streaming complete ---');
}
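
If you prefer callbacks over async generators, the SDK's MessageStream also exposes an event-style interface. A minimal sketch, assuming the on('text', …) and finalMessage() helpers in current versions of @anthropic-ai/sdk:

// Event-based alternative using the SDK's stream helpers
async function streamWithEvents(prompt: string): Promise<void> {
  const stream = anthropic.messages.stream({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 4096,
    messages: [{ role: 'user', content: prompt }],
  });

  // Fires for each text delta as it arrives
  stream.on('text', text => process.stdout.write(text));

  // Resolves once the complete message has been received
  const message = await stream.finalMessage();
  console.log(`\n[stop reason: ${message.stop_reason}]`);
}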

4. Citation Extraction and Display

4.1 Citation Parsing

Extract citation information from Claude’s response:

interface Citation {
  documentIndex: number;
  documentTitle: string;
  source: string;
}

function extractCitations(
  answer: string,
  documents: Document[]
): Citation[] {
  // Match both [Document N] and grouped [Document N, Document M] citations
  // (the grouped form is what CITATION_PROMPT asks for with multiple sources)
  const citationPattern = /\[Document\s*\d+(?:\s*,\s*(?:Document\s*)?\d+)*\]/g;
  const citedIndices = new Set<number>();

  for (const match of answer.matchAll(citationPattern)) {
    // Pull every document number out of the bracketed group
    for (const num of match[0].matchAll(/\d+/g)) {
      const index = parseInt(num[0], 10) - 1;
      if (index >= 0 && index < documents.length) {
        citedIndices.add(index);
      }
    }
  }

  return Array.from(citedIndices).map(index => ({
    documentIndex: index + 1,
    documentTitle: documents[index].title || `Document ${index + 1}`,
    source: documents[index].source,
  }));
}
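
A quick illustration with a made-up answer string (documents is whatever your search step returned):

// Illustrative only: the answer text here is hypothetical
const sampleAnswer =
  'TypeScript is a statically typed language [Document 1]. ' +
  'Generics enable reusable components [Document 1, Document 2].';

const cited = extractCitations(sampleAnswer, documents);
// → Document 1 and Document 2, deduplicated, in order of first citation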

4.2 Answer Formatting

Generate the final answer with citation information:

interface FormattedAnswer {
  content: string;
  citations: Citation[];
  metadata: {
    model: string;
    documentsUsed: number;
    generatedAt: string;
  };
}

function formatAnswer(
  rawAnswer: string,
  documents: Document[],
  model: string
): FormattedAnswer {
  const citations = extractCitations(rawAnswer, documents);

  // Add citation section if not present
  let content = rawAnswer;
  if (!rawAnswer.includes('📚 References') && citations.length > 0) {
    content += '\n\n---\n📚 **References:**\n';
    content += citations
      .map(c => `- [Document ${c.documentIndex}] ${c.documentTitle}`)
      .join('\n');
  }

  return {
    content,
    citations,
    metadata: {
      model,
      documentsUsed: citations.length,
      generatedAt: new Date().toISOString(),
    },
  };
}

5. Complete RAG Pipeline Integration

5.1 Complete RAG Class

import Anthropic from '@anthropic-ai/sdk';

interface RAGConfig {
  anthropicApiKey: string;
  model?: string;
  maxContextTokens?: number;
  temperature?: number;
}

class RAGGenerator {
  private anthropic: Anthropic;
  private config: Required<RAGConfig>;

  constructor(config: RAGConfig) {
    this.anthropic = new Anthropic({
      apiKey: config.anthropicApiKey,
    });

    this.config = {
      anthropicApiKey: config.anthropicApiKey,
      model: config.model || 'claude-sonnet-4-20250514',
      maxContextTokens: config.maxContextTokens || 100000,
      temperature: config.temperature ?? 0,  // ?? so an explicit 0 survives if the default ever changes
    };
  }

  async generate(
    query: string,
    documents: Document[]
  ): Promise<FormattedAnswer> {
    // 1. Context management
    const selectedDocs = manageContextWindow(
      documents,
      this.config.maxContextTokens
    );

    // 2. Build prompt
    const prompt = buildRAGPrompt(query, selectedDocs);

    // 3. Call Claude
    const response = await this.anthropic.messages.create({
      model: this.config.model,
      max_tokens: 4096,
      temperature: this.config.temperature,
      messages: [{ role: 'user', content: prompt }],
    });

    // 4. Extract response
    const textBlock = response.content.find(b => b.type === 'text');
    if (!textBlock || textBlock.type !== 'text') {
      throw new Error('No text response');
    }

    // 5. Format and extract citations
    return formatAnswer(textBlock.text, selectedDocs, this.config.model);
  }
}

5.2 Usage Example

// Initialize RAG system
const rag = new RAGGenerator({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY!,
  model: 'claude-sonnet-4-20250514',
  temperature: 0,
});

// Search + Answer generation
async function askQuestion(query: string) {
  // 1. Search (hybrid search implemented in Day 4)
  const documents = await hybridSearch(query, {
    topK: 5,
    alpha: 0.7,
  });

  // 2. Generate answer
  const answer = await rag.generate(query, documents);

  console.log('=== Answer ===');
  console.log(answer.content);
  console.log('\n=== Metadata ===');
  console.log(`Documents used: ${answer.metadata.documentsUsed}`);
  console.log(`Model: ${answer.metadata.model}`);

  return answer;
}

// Execute
askQuestion('How do you use generics in TypeScript?');

6. Quality Improvement Tips

Here are tips to further enhance prompt engineering and answer generation quality.

6.1 Temperature Settings

Use low temperature for RAG answer generation:

// Recommended RAG settings
const RAG_TEMPERATURE = 0;  // Most deterministic answers

// When creative answers are needed
const CREATIVE_TEMPERATURE = 0.3;
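
These constants plug straight into the GenerateOptions from section 3.1:

// Deterministic answers for factual RAG queries
const factual = await generateAnswer(query, documents, {
  temperature: RAG_TEMPERATURE,
});

// Slightly more varied phrasing, e.g. for summaries
const summary = await generateAnswer(query, documents, {
  temperature: CREATIVE_TEMPERATURE,
});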

6.2 Error Handling

// Simple delay helper used for the retry below
const sleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

async function safeGenerate(
  query: string,
  documents: Document[],
  retriesLeft: number = 3
): Promise<FormattedAnswer | null> {
  try {
    return await rag.generate(query, documents);
  } catch (error) {
    if (error instanceof Anthropic.APIError) {
      console.error(`API error (${error.status}):`, error.message);

      // Retry on rate limits, but cap the number of attempts
      if (error.status === 429 && retriesLeft > 0) {
        console.log('Rate limit - retrying shortly...');
        await sleep(5000);
        return safeGenerate(query, documents, retriesLeft - 1);
      }
    }

    return null;
  }
}

6.3 Answer Quality Validation

function validateAnswer(answer: FormattedAnswer): boolean {
  // 1. Check for citations
  if (answer.citations.length === 0) {
    console.warn('Warning: Answer without citations');
    return false;
  }

  // 2. Check for "unknown" patterns
  const unknownPatterns = [
    'cannot find',
    'no information',
    'unable to confirm',
  ];

  const hasUnknown = unknownPatterns.some(p =>
    answer.content.toLowerCase().includes(p)
  );

  if (hasUnknown) {
    console.info('Info: Question appears to be outside document scope');
  }

  return true;
}
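
Putting 6.2 and 6.3 together, a small end-to-end guard (my own glue around the functions above):

// Generate defensively, then gate on the quality check
async function answerWithGuards(query: string, documents: Document[]) {
  const answer = await safeGenerate(query, documents);

  if (!answer) {
    return 'Sorry, answer generation failed. Please try again.';
  }
  if (!validateAnswer(answer)) {
    console.warn('Serving an answer that has no citations');
  }
  return answer.content;
}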

Conclusion

In this post, we covered Claude integration and answer generation, the core of a RAG system:

  1. Context Injection: Effectively delivering search results to the LLM
  2. Prompt Engineering: Designing prompts for hallucination prevention and source citations
  3. Answer Generation: High-quality response generation using the Claude API
  4. Streaming Responses: Improving user experience
  5. Citation Extraction: Enhancing answer reliability

Mastering prompt engineering and context injection will significantly improve your RAG system’s answer generation quality.

In Day 6, we’ll cover production deployment and optimization.



📚 Series Index

RAG (6/6)

  1. Day 1: RAG Concepts and Architecture
  2. Day 2: Document Processing and Chunking
  3. Day 3: Embeddings and Vector Database
  4. Day 4: Search Optimization and Reranking
  5. 👉 Day 5: Claude Integration and Answer Generation (Current)
  6. Day 6: Production Deployment and Optimization

🔗 GitHub Repository