TL;DR
- Semantic search finds documents based on meaning but struggles with exact keyword matching
- Keyword search (BM25) excels at precise term matching but can’t understand synonyms or context
- Hybrid search combines both approaches to significantly improve search quality
- Reranking reorders search results to surface the most relevant documents
- Search parameter tuning optimizes RAG system performance
- GitHub: my-first-rag
💡 Why I wrote this article
Adapting to a new environment without proper documentation or guides was challenging. I wasted time repeatedly solving the same problems and searching for information that someone already knew. I’m writing this series to help others avoid these repetitive struggles and to deepen my own understanding through the process of organizing this knowledge.
1. The Core Challenge of RAG Search
1.1 Why Search Matters
In RAG systems, search optimization is the key factor that determines answer quality. No matter how powerful your LLM is, retrieving the wrong documents leads to wrong answers.
// Search quality determines RAG quality
const query = "How to use type guards in TypeScript";

// Poor search results -> Poor answers
const badResults = ["JavaScript basics", "Python type hints"];

// Good search results -> Good answers
const goodResults = [
  "TypeScript type guard patterns",
  "Implementing custom type guards"
];
1.2 Limitations of Semantic Search
The semantic search we implemented in Day 3 finds documents based on meaning. However, it has several limitations:
// Example of semantic search limitations
const query = "RFC 2119 MUST keyword";

// Semantic search results - semantically similar documents
const semanticResults = [
  "Standard document writing guidelines",  // Relevance: Medium
  "How to define mandatory requirements",  // Relevance: Medium
  "Documentation best practices"           // Relevance: Low
];

// The document we actually want
const expectedResult = "RFC 2119 Standard Keyword Definitions - MUST, SHOULD, MAY";
Problems with semantic search:
- May miss exact keywords (RFC 2119)
- Vulnerable to proper nouns and abbreviations
- Difficulty recognizing new terms or domain-specific terminology
2. Comparing Search Approaches
💪 To be honest…
When I first studied search optimization, I struggled because the concepts weren’t clear in my mind. BM25, TF-IDF, semantic search, reranking… There were many terms, but I couldn’t grasp how they differed or when to use each one. Eventually, I understood them one by one by writing code and comparing results. In this article, I’ll share the areas where I struggled.
2.1 Keyword Search (BM25)
BM25 is a traditional keyword-based search algorithm. It forms the foundation of search optimization.
// src/rag/retrievers/bm25-retriever.ts
import { BM25 } from 'bm25-ts';
export class BM25Retriever {
private index: BM25;
private documents: Document[];
constructor() {
this.index = new BM25();
this.documents = [];
}
async indexDocuments(documents: Document[]): Promise<void> {
this.documents = documents;
// Tokenize and index documents
const tokenizedDocs = documents.map(doc =>
this.tokenize(doc.content)
);
this.index.addDocuments(tokenizedDocs);
}
async search(query: string, topK: number = 5): Promise<SearchResult[]> {
const tokens = this.tokenize(query);
const scores = this.index.search(tokens);
return scores
.map((score, idx) => ({
document: this.documents[idx],
score,
method: 'bm25' as const
}))
.sort((a, b) => b.score - a.score)
.slice(0, topK);
}
private tokenize(text: string): string[] {
// Tokenization for Korean + English
return text
.toLowerCase()
.replace(/[^\w\s가-힣]/g, ' ')
.split(/\s+/)
.filter(token => token.length > 1);
}
}
Advantages of BM25:
- Precise keyword matching
- Fast search speed
- Higher weight for rare terms
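The "higher weight for rare terms" point is the heart of BM25. The following is a minimal scoring sketch of my own (not the internals of `bm25-ts`, which the retriever above delegates to) showing how the IDF term boosts rare words, using the standard defaults `k1 = 1.2` and `b = 0.75`:

```typescript
// Minimal BM25 scoring sketch, illustrative only.
// Rare terms get a higher IDF, so matching them contributes more to the score.
function bm25Score(
  queryTokens: string[],
  docTokens: string[],
  corpus: string[][],
  k1 = 1.2,
  b = 0.75
): number {
  const avgDocLen =
    corpus.reduce((sum, d) => sum + d.length, 0) / corpus.length;
  let score = 0;
  for (const term of queryTokens) {
    // Term frequency in this document
    const tf = docTokens.filter(t => t === term).length;
    if (tf === 0) continue;
    // Document frequency across the corpus -> IDF (rare terms score higher)
    const df = corpus.filter(d => d.includes(term)).length;
    const idf = Math.log((corpus.length - df + 0.5) / (df + 0.5) + 1);
    // Length normalization: long documents are penalized via b
    score +=
      (idf * tf * (k1 + 1)) /
      (tf + k1 * (1 - b + (b * docTokens.length) / avgDocLen));
  }
  return score;
}
```

In a corpus where "rfc" appears in one document and "must" in two, matching "rfc" scores higher than matching "must" in the same document, which is exactly why BM25 handles queries like "RFC 2119" better than semantic search.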
2.2 Semantic Search Implementation
Semantic search finds semantically related documents based on vector similarity.
// src/rag/retrievers/semantic-retriever.ts
import { SupabaseVectorStore } from '../stores/supabase-store';
import { VoyageEmbedder } from '../embedders/voyage-embedder';
export class SemanticRetriever {
constructor(
private vectorStore: SupabaseVectorStore,
private embedder: VoyageEmbedder
) {}
async search(query: string, topK: number = 5): Promise<SearchResult[]> {
// Convert query to vector
const queryVector = await this.embedder.embed(query, 'query');
// Vector similarity search
const results = await this.vectorStore.search(queryVector, topK);
return results.map(result => ({
document: result.document,
score: result.similarity,
method: 'semantic' as const
}));
}
}
2.3 Hybrid Search
Hybrid search combines BM25 and semantic search. It leverages the strengths of both approaches to achieve search optimization.
// src/rag/retrievers/hybrid-retriever.ts
export interface HybridConfig {
semanticWeight: number; // Semantic search weight (0-1)
bm25Weight: number; // BM25 weight (0-1)
topK: number;
fusionMethod: 'rrf' | 'weighted';
}
export class HybridRetriever {
constructor(
private semanticRetriever: SemanticRetriever,
private bm25Retriever: BM25Retriever,
private config: HybridConfig
) {}
async search(query: string): Promise<SearchResult[]> {
// Run both searches in parallel
const [semanticResults, bm25Results] = await Promise.all([
this.semanticRetriever.search(query, this.config.topK * 2),
this.bm25Retriever.search(query, this.config.topK * 2)
]);
// Fuse results
if (this.config.fusionMethod === 'rrf') {
return this.reciprocalRankFusion(semanticResults, bm25Results);
}
return this.weightedFusion(semanticResults, bm25Results);
}
// Reciprocal Rank Fusion - rank-based fusion
private reciprocalRankFusion(
semanticResults: SearchResult[],
bm25Results: SearchResult[]
): SearchResult[] {
const k = 60; // RRF constant
const scores = new Map<string, number>();
// Calculate scores for semantic search results
semanticResults.forEach((result, rank) => {
const docId = result.document.id;
const rrfScore = 1 / (k + rank + 1);
scores.set(docId, (scores.get(docId) || 0) + rrfScore * this.config.semanticWeight);
});
// Add BM25 result scores
bm25Results.forEach((result, rank) => {
const docId = result.document.id;
const rrfScore = 1 / (k + rank + 1);
scores.set(docId, (scores.get(docId) || 0) + rrfScore * this.config.bm25Weight);
});
// Sort by score
const allDocs = new Map<string, Document>(
  [...semanticResults, ...bm25Results].map(
    r => [r.document.id, r.document] as [string, Document]
  )
);
return Array.from(scores.entries())
.sort((a, b) => b[1] - a[1])
.slice(0, this.config.topK)
.map(([docId, score]) => ({
document: allDocs.get(docId)!,
score,
method: 'hybrid' as const
}));
}
// Weighted fusion
private weightedFusion(
semanticResults: SearchResult[],
bm25Results: SearchResult[]
): SearchResult[] {
const scores = new Map<string, { score: number; document: Document }>();
// Normalize scores and apply weights
// (the small floor guards against empty result sets and division by zero)
const maxSemantic = Math.max(...semanticResults.map(r => r.score), 1e-9);
const maxBm25 = Math.max(...bm25Results.map(r => r.score), 1e-9);
semanticResults.forEach(result => {
const normalizedScore = result.score / maxSemantic;
const weightedScore = normalizedScore * this.config.semanticWeight;
scores.set(result.document.id, {
score: weightedScore,
document: result.document
});
});
bm25Results.forEach(result => {
const normalizedScore = result.score / maxBm25;
const weightedScore = normalizedScore * this.config.bm25Weight;
const existing = scores.get(result.document.id);
if (existing) {
existing.score += weightedScore;
} else {
scores.set(result.document.id, {
score: weightedScore,
document: result.document
});
}
});
return Array.from(scores.values())
.sort((a, b) => b.score - a.score)
.slice(0, this.config.topK)
.map(item => ({
document: item.document,
score: item.score,
method: 'hybrid' as const
}));
}
}
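To see why RRF favors consensus between retrievers, here is a standalone numeric sketch, independent of the classes above (the function name `rrfFuse` is mine, not part of the pipeline):

```typescript
// Standalone RRF demo over ranked lists of document ids.
// k = 60, matching the constant used in the retriever above.
function rrfFuse(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based, so the top document contributes 1 / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// "docC" appears in BOTH rankings; "docA" and "docB" in only one each.
// RRF puts the consensus document first even though each individual
// retriever ranked a different document at the top.
const fused = rrfFuse([
  ["docA", "docC"], // semantic ranking
  ["docC", "docB"]  // BM25 ranking
]);
// fused[0][0] === "docC"
```

This is the practical appeal of RRF: it needs only ranks, not comparable scores, so it sidesteps the normalization problem that weighted fusion has to solve.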
3. Search Parameter Tuning
3.1 Top-K Configuration
In search optimization, the top-k value determines the number of search results.
// Top-k configuration guide
const TOP_K_CONFIG = {
  // General Q&A
  simple: 3,
  // Complex questions
  complex: 5,
  // Comprehensive analysis
  comprehensive: 10,
  // With reranking (search more, then filter)
  withReranking: 20
} as const;
// Dynamic top-k determination
function determineTopK(query: string): number {
const complexity = analyzeQueryComplexity(query);
if (complexity.isMultiHop) return 10;
if (complexity.requiresComparison) return 8;
if (complexity.isFactual) return 3;
return 5; // Default
}
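`analyzeQueryComplexity` is left undefined above. A purely heuristic sketch might look like the following; the keyword patterns are illustrative assumptions on my part, not a tested classifier (an LLM-based classifier would be a more robust alternative):

```typescript
// Hypothetical query complexity heuristic - illustrative only.
interface QueryComplexity {
  isMultiHop: boolean;
  requiresComparison: boolean;
  isFactual: boolean;
}

function analyzeQueryComplexity(query: string): QueryComplexity {
  const q = query.toLowerCase();
  return {
    // Multi-hop: answering requires chaining facts ("based on", "and then")
    isMultiHop: /\band then\b|\bbased on\b|\bafter\b/.test(q),
    // Comparison: "vs", "compare", "difference"
    requiresComparison: /\bvs\.?\b|\bcompare\b|\bdifference\b/.test(q),
    // Factual: short "what/when/who/where" lookups
    isFactual: /^(what|when|who|where)\b/.test(q) && q.split(/\s+/).length <= 8
  };
}
```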
3.2 Similarity Threshold
Filter out results with low similarity:
// src/rag/retrievers/filtered-retriever.ts
export class FilteredRetriever {
constructor(
private retriever: HybridRetriever,
private minScore: number = 0.7
) {}
async search(query: string, topK: number): Promise<SearchResult[]> {
const results = await this.retriever.search(query);
// Return only results above threshold
const filtered = results.filter(r => r.score >= this.minScore);
// Ensure minimum number of results
if (filtered.length < 2 && results.length >= 2) {
return results.slice(0, 2);
}
return filtered.slice(0, topK);
}
}
3.3 Metadata Filtering
Apply metadata-based filtering for search optimization:
// Metadata filter definition
interface MetadataFilter {
field: string;
operator: 'eq' | 'ne' | 'gt' | 'lt' | 'in' | 'contains';
value: any;
}
// Apply metadata filter in Supabase
async function searchWithFilter(
queryVector: number[],
filters: MetadataFilter[],
topK: number
): Promise<SearchResult[]> {
let query = supabase
.rpc('match_documents', {
query_embedding: queryVector,
match_count: topK
});
// Apply filters
filters.forEach(filter => {
switch (filter.operator) {
case 'eq':
query = query.eq(`metadata->>${filter.field}`, filter.value);
break;
case 'contains':
query = query.contains('metadata', { [filter.field]: filter.value });
break;
case 'in':
query = query.in(`metadata->>${filter.field}`, filter.value);
break;
}
});
const { data, error } = await query;
return data || [];
}
// Usage example
const results = await searchWithFilter(
queryVector,
[
{ field: 'category', operator: 'eq', value: 'typescript' },
{ field: 'date', operator: 'gt', value: '2024-01-01' }
],
10
);
4. Improving Search Quality with Reranking
4.1 Why Reranking is Needed
Reranking passes the initial search results through a more sophisticated model and reorders them by relevance. This second pass significantly improves search quality.
// Comparison before and after reranking
const query = "TypeScript generic type inference";
// Initial search results (semantic search)
const initialResults = [
{ title: "TypeScript basic types", score: 0.85 },
{ title: "Generic programming concepts", score: 0.83 },
{ title: "Advanced TypeScript generic type inference", score: 0.81 }, // Most relevant
{ title: "Type system comparison", score: 0.80 }
];
// Results after reranking
const rerankedResults = [
{ title: "Advanced TypeScript generic type inference", score: 0.95 }, // Moved to #1
{ title: "Generic programming concepts", score: 0.78 },
{ title: "TypeScript basic types", score: 0.65 },
{ title: "Type system comparison", score: 0.45 }
];
4.2 Cohere Rerank Implementation
Use the Cohere Rerank API for reranking:
// src/rag/rerankers/cohere-reranker.ts
import { CohereClient } from 'cohere-ai';
export class CohereReranker {
private client: CohereClient;
constructor(apiKey: string) {
this.client = new CohereClient({ token: apiKey });
}
async rerank(
query: string,
documents: SearchResult[],
topK: number = 5
): Promise<SearchResult[]> {
if (documents.length === 0) return [];
const response = await this.client.rerank({
model: 'rerank-multilingual-v3.0',
query,
documents: documents.map(d => d.document.content),
topN: topK,
returnDocuments: false
});
return response.results.map(result => ({
document: documents[result.index].document,
score: result.relevanceScore,
method: 'reranked' as const
}));
}
}
4.3 Cross-Encoder Reranking
Reranking using a Cross-Encoder model that runs locally:
// src/rag/rerankers/cross-encoder-reranker.ts
import { pipeline } from '@xenova/transformers';
export class CrossEncoderReranker {
private model: any;
private modelName = 'cross-encoder/ms-marco-MiniLM-L-6-v2';
async initialize(): Promise<void> {
this.model = await pipeline(
'text-classification',
this.modelName
);
}
async rerank(
query: string,
documents: SearchResult[],
topK: number = 5
): Promise<SearchResult[]> {
// Create query-document pairs
const pairs = documents.map(doc => ({
text: query,
text_pair: doc.document.content.slice(0, 512) // Truncate long content (character-based, not a true token limit)
}));
// Calculate relevance scores
const scores = await Promise.all(
pairs.map(async pair => {
const result = await this.model(pair.text, { text_pair: pair.text_pair });
return result[0].score;
})
);
// Sort by score
return documents
.map((doc, idx) => ({
...doc,
score: scores[idx],
method: 'cross-encoder' as const
}))
.sort((a, b) => b.score - a.score)
.slice(0, topK);
}
}
4.4 Integrating Reranking Pipeline
Integrate reranking into the complete search pipeline:
// src/rag/retrievers/reranking-pipeline.ts
export class RerankingPipeline {
constructor(
private retriever: HybridRetriever,
private reranker: CohereReranker,
private config: {
initialTopK: number;
finalTopK: number;
minScoreThreshold: number;
}
) {}
async search(query: string): Promise<SearchResult[]> {
// Step 1: Extract candidates with hybrid search
const candidates = await this.retriever.search(query);
console.log(`[Search] ${candidates.length} candidate documents retrieved`);
// Step 2: Reorder with reranking
const reranked = await this.reranker.rerank(
query,
candidates,
this.config.finalTopK * 2
);
console.log(`[Reranking] Top ${reranked.length} documents reordered`);
// Step 3: Threshold filtering
const filtered = reranked.filter(
r => r.score >= this.config.minScoreThreshold
);
return filtered.slice(0, this.config.finalTopK);
}
}
// Usage example
const pipeline = new RerankingPipeline(
hybridRetriever,
cohereReranker,
{
initialTopK: 20, // Initial search: 20
finalTopK: 5, // Final results: 5
minScoreThreshold: 0.5
}
);
5. Evaluating Search Performance
5.1 Evaluation Metrics
Metrics for measuring search optimization results:
// src/rag/evaluation/metrics.ts
export interface EvaluationMetrics {
// Precision@K: Ratio of relevant documents in top K
precisionAtK: number;
// Recall@K: Ratio of relevant documents included in top K
recallAtK: number;
// MRR: Mean reciprocal rank of first relevant document
mrr: number;
// NDCG: Relevance score considering ranking
ndcg: number;
}
export function calculateMetrics(
results: SearchResult[],
relevantDocIds: Set<string>,
k: number
): EvaluationMetrics {
const topK = results.slice(0, k);
// Precision@K
const relevantInTopK = topK.filter(r =>
relevantDocIds.has(r.document.id)
).length;
const precisionAtK = relevantInTopK / k;
// Recall@K
const recallAtK = relevantInTopK / relevantDocIds.size;
// MRR
const firstRelevantRank = results.findIndex(r =>
relevantDocIds.has(r.document.id)
);
const mrr = firstRelevantRank >= 0 ? 1 / (firstRelevantRank + 1) : 0;
// NDCG calculation
const ndcg = calculateNDCG(results, relevantDocIds, k);
return { precisionAtK, recallAtK, mrr, ndcg };
}
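`calculateNDCG` is referenced above but not shown. A minimal binary-relevance version (relevant = 1, irrelevant = 0) could look like this; note that graded relevance labels would give a more informative NDCG, but binary labels match the `relevantDocIds` set used by the other metrics:

```typescript
// Binary-relevance NDCG@K: DCG of the actual ranking divided by the
// ideal DCG (all relevant documents occupying the top ranks). Sketch only.
function calculateNDCG(
  results: { document: { id: string } }[],
  relevantDocIds: Set<string>,
  k: number
): number {
  const dcg = results.slice(0, k).reduce((sum, r, i) => {
    const rel = relevantDocIds.has(r.document.id) ? 1 : 0;
    // Positions are 1-based in the formula, hence log2(i + 2)
    return sum + rel / Math.log2(i + 2);
  }, 0);
  // Ideal DCG: every relevant doc ranked as high as possible
  const idealCount = Math.min(relevantDocIds.size, k);
  let idcg = 0;
  for (let i = 0; i < idealCount; i++) {
    idcg += 1 / Math.log2(i + 2);
  }
  return idcg > 0 ? dcg / idcg : 0;
}
```

A perfect ranking yields 1.0, and pushing a relevant document down the list lowers the score smoothly, which is what makes NDCG more informative than Precision@K for comparing rankings.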
5.2 A/B Testing
Compare the effectiveness of hybrid search and reranking:
// src/rag/evaluation/ab-test.ts
export async function runABTest(
queries: TestQuery[],
retrievers: {
semantic: SemanticRetriever;
hybrid: HybridRetriever;
reranking: RerankingPipeline;
}
): Promise<ABTestResults> {
const results = {
semantic: { precisionSum: 0, mrrSum: 0 },
hybrid: { precisionSum: 0, mrrSum: 0 },
reranking: { precisionSum: 0, mrrSum: 0 }
};
for (const { query, relevantDocs } of queries) {
// Search with each method
const semanticResults = await retrievers.semantic.search(query, 5);
const hybridResults = await retrievers.hybrid.search(query);
const rerankingResults = await retrievers.reranking.search(query);
// Calculate metrics
const relevantSet = new Set(relevantDocs);
const semanticMetrics = calculateMetrics(semanticResults, relevantSet, 5);
const hybridMetrics = calculateMetrics(hybridResults, relevantSet, 5);
const rerankingMetrics = calculateMetrics(rerankingResults, relevantSet, 5);
results.semantic.precisionSum += semanticMetrics.precisionAtK;
results.semantic.mrrSum += semanticMetrics.mrr;
results.hybrid.precisionSum += hybridMetrics.precisionAtK;
results.hybrid.mrrSum += hybridMetrics.mrr;
results.reranking.precisionSum += rerankingMetrics.precisionAtK;
results.reranking.mrrSum += rerankingMetrics.mrr;
}
const n = queries.length;
return {
semantic: {
avgPrecision: results.semantic.precisionSum / n,
avgMRR: results.semantic.mrrSum / n
},
hybrid: {
avgPrecision: results.hybrid.precisionSum / n,
avgMRR: results.hybrid.mrrSum / n
},
reranking: {
avgPrecision: results.reranking.precisionSum / n,
avgMRR: results.reranking.mrrSum / n
}
};
}
5.3 Performance Comparison Results
Example of actual test results:
| Method | Precision@5 | MRR | Response Time |
|---|---|---|---|
| Semantic Search | 0.65 | 0.72 | 120ms |
| Hybrid Search | 0.78 | 0.85 | 180ms |
| Hybrid + Reranking | 0.89 | 0.94 | 350ms |
Combining hybrid search with reranking significantly improves search quality.
6. Practical Application Tips
🛠️ How I plan to apply this
Our team has accumulated issue histories, cautions, and other information that needs to be remembered scattered across various places—Slack, Notion, Confluence, even personal notes. I plan to build a RAG system for this information to create a tool that quickly finds relevant documents when someone asks, “I think I’ve seen this error before.” Hybrid search was essential because it needs to handle both error codes (exact matching) and error situation descriptions (semantic matching) simultaneously.
6.1 Search Method Selection Guide
// Select search method based on context
function selectRetriever(context: QueryContext): Retriever {
// Exact term search (code names, API names, etc.)
if (context.hasExactTerms) {
return bm25Retriever;
}
// Conceptual questions
if (context.isConceptual) {
return semanticRetriever;
}
// Complex questions - hybrid + reranking
return rerankingPipeline;
}
6.2 Cost Optimization
Optimization considering reranking API costs:
// Conditional reranking
async function smartRerank(
query: string,
results: SearchResult[]
): Promise<SearchResult[]> {
// Skip reranking if top result score is high enough
if (results[0]?.score > 0.9 && results[1]?.score < 0.7) {
console.log('[Optimization] Clear result, skipping reranking');
return results;
}
// Perform reranking if top results have similar scores
const topScoreGap = (results[0]?.score ?? 0) - (results[4]?.score ?? 0);
if (topScoreGap < 0.1) {
console.log('[Optimization] Small score difference, performing reranking');
return await reranker.rerank(query, results, 5);
}
return results;
}
7. Complete Code Integration
7.1 Final Search System
// src/rag/search-system.ts
export class RAGSearchSystem {
private semanticRetriever: SemanticRetriever;
private bm25Retriever: BM25Retriever;
private hybridRetriever: HybridRetriever;
private reranker: CohereReranker;
constructor(config: SearchSystemConfig) {
this.semanticRetriever = new SemanticRetriever(
config.vectorStore,
config.embedder
);
this.bm25Retriever = new BM25Retriever();
this.hybridRetriever = new HybridRetriever(
this.semanticRetriever,
this.bm25Retriever,
{
semanticWeight: 0.7,
bm25Weight: 0.3,
topK: 20,
fusionMethod: 'rrf'
}
);
this.reranker = new CohereReranker(config.cohereApiKey);
}
async search(
query: string,
options: SearchOptions = {}
): Promise<SearchResult[]> {
const {
topK = 5,
useReranking = true,
filters = []
} = options;
// Hybrid search
let results = await this.hybridRetriever.search(query);
// Apply metadata filters
if (filters.length > 0) {
results = this.applyFilters(results, filters);
}
// Reranking
if (useReranking && results.length > topK) {
results = await this.reranker.rerank(query, results, topK);
}
return results.slice(0, topK);
}
private applyFilters(
results: SearchResult[],
filters: MetadataFilter[]
): SearchResult[] {
return results.filter(result =>
filters.every(filter =>
this.matchFilter(result.document.metadata, filter)
)
);
}
private matchFilter(metadata: any, filter: MetadataFilter): boolean {
const value = metadata[filter.field];
switch (filter.operator) {
case 'eq': return value === filter.value;
case 'ne': return value !== filter.value;
case 'contains': return value?.includes(filter.value);
case 'in': return filter.value.includes(value);
default: return true;
}
}
}
Conclusion
In Day 4, we covered search optimization for RAG systems:
- Limitations of semantic search and the need for BM25 keyword search
- Combining the strengths of both approaches with hybrid search
- Improving search result quality with reranking
- Search parameter tuning and performance evaluation
In Day 5, we’ll explore how to pass retrieved documents to Claude to generate answers.
📚 Series Index
RAG (6/6)
- Day 1: RAG Concepts and Architecture
- Day 2: Document Processing and Chunking
- Day 3: Embeddings and Vector Database
- 👉 Day 4: Search Optimization and Reranking (Current)
- Day 5: Claude Integration and Answer Generation
- Day 6: Production Deployment and Optimization