RAG Pipeline
Planned Architecture (Future Phases)
The RAG pipeline is scheduled for implementation in Phase 3 (Weeks 8–11).
Overview
Indexing
Document Chunking
# Chunking configuration
CHUNK_SIZE = 512
CHUNK_OVERLAP = 64
METADATA_FIELDS = ["company_id", "filing_id", "period", "section"]
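A minimal sketch of how these settings could drive a sliding-window chunker. The configuration does not say whether `CHUNK_SIZE` and `CHUNK_OVERLAP` count tokens or characters; this sketch assumes they count items in an already tokenized sequence. The helpers `chunk_tokens` and `attach_metadata` are hypothetical names used for illustration, not part of the project.

```python
from typing import Dict, List, Sequence

# Values copied from the chunking configuration above.
CHUNK_SIZE = 512
CHUNK_OVERLAP = 64
METADATA_FIELDS = ["company_id", "filing_id", "period", "section"]


def chunk_tokens(
    tokens: Sequence[str],
    size: int = CHUNK_SIZE,
    overlap: int = CHUNK_OVERLAP,
) -> List[List[str]]:
    """Split tokens into windows of `size` items; consecutive windows share `overlap` items."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # distance between the starts of consecutive windows
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def attach_metadata(chunks: List[List[str]], metadata: Dict[str, str]) -> List[dict]:
    """Pair each chunk with the configured metadata fields, dropping any other keys."""
    kept = {field: metadata[field] for field in METADATA_FIELDS}
    return [{"chunk_index": i, "tokens": chunk, **kept} for i, chunk in enumerate(chunks)]
```

With the configured values, each window starts 448 items after the previous one, so the last 64 items of one chunk reappear at the start of the next.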
Embedding Model
Indexing Pipeline
Retrieval
Dense Retrieval (pgvector)
SELECT content, 1 - (embedding <=> $1::vector) AS score
FROM document_chunks
ORDER BY embedding <=> $1::vector
LIMIT 10;
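In pgvector, `<=>` is the cosine distance operator, so `1 - (embedding <=> $1::vector)` is the cosine similarity between a stored chunk and the query embedding. The sketch below reproduces that scoring and ordering in plain Python for a handful of in-memory rows. It only illustrates what the query computes; `cosine_distance` and `top_k` are hypothetical helper names, and the two-dimensional vectors in the usage example are placeholders for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[str, Sequence[float]]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return (content, score) for the k rows closest to the query, nearest first."""
    with_distance = [(content, cosine_distance(emb, query)) for content, emb in rows]
    with_distance.sort(key=lambda row: row[1])  # ORDER BY distance ascending
    return [(content, 1.0 - dist) for content, dist in with_distance[:k]]  # LIMIT k


# Usage with toy two-dimensional "embeddings".
rows = [("chunk a", [1.0, 0.0]), ("chunk b", [0.0, 1.0]), ("chunk c", [1.0, 1.0])]
results = top_k([1.0, 0.0], rows, k=2)
```

Ordering by ascending distance is equivalent to ordering by descending score, which is why the SQL sorts on the raw `<=>` expression rather than on the `score` column.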
Sparse Retrieval (Elasticsearch)
Planned Architecture (Future Phases)
Elasticsearch hybrid retrieval is scheduled for implementation in Phase 3.
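This page does not say how dense and sparse results will be combined. One common option is reciprocal rank fusion (RRF), which merges ranked lists using only each item's rank position. The sketch below is an assumption for illustration, not the project's chosen method; `reciprocal_rank_fusion`, the constant `k = 60`, and the chunk ids are all hypothetical.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked id lists: each id scores the sum of 1 / (k + rank) over the lists it appears in."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Usage: ids as they might come back from the dense and sparse retrievers.
dense_ids = ["c1", "c2", "c3"]
sparse_ids = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([dense_ids, sparse_ids])
```

Because RRF uses ranks rather than raw scores, it avoids having to put cosine similarities and lexical relevance scores on a common scale.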