RAG · k=5
Production · KUB

RAG Knowledge System

Sole architect & developer · 2025–2026

5 min → 30 sec document lookup

Problem

KUB's 17-person analytics team relied on more than 20 internal documents to answer day-to-day questions. When a document's owner was unavailable, finding an answer could take anywhere from five minutes to over an hour. ChatGPT and other external APIs were not an option, because KUB's data governance policy does not allow internal operational data to leave the network.


What I Built

User query
  → Multi-query expansion (3 LLM-rephrased variants)
  → Hybrid retrieval: BM25 (keyword) + ChromaDB (dense vector)
  → Candidate pool: up to 60 pairs, content-hash deduped
  → Cross-encoder reranking → top-5 results
  → HAL (internal LLM via Ollama) generates answer
  → Streamed to user · feedback logged via LangSmith
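
The middle of the pipeline (pooling candidates from several retrievers, deduplicating by content hash, then reranking to the top 5) can be sketched roughly as below. This is an illustration under assumptions, not the production code: the function names, the whitespace and lowercase normalization before hashing, and the reading of "pairs" as (query, chunk) pairs are all hypothetical, and a toy word-overlap scorer stands in for the cross-encoder so the snippet runs with the standard library only.

```python
import hashlib
from typing import Callable

Retriever = Callable[[str], list[str]]   # query -> list of chunk texts
Scorer = Callable[[str, str], float]     # (query, chunk) -> relevance score


def content_hash(text: str) -> str:
    """Hash whitespace-normalized, lowercased text (normalization is an assumption)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_candidate_pool(
    queries: list[str],
    retrievers: list[Retriever],
    max_candidates: int = 60,
) -> list[tuple[str, str]]:
    """Collect (query, chunk) pairs from every retriever, skipping repeated chunks."""
    seen: set[str] = set()
    pool: list[tuple[str, str]] = []
    for query in queries:
        for retrieve in retrievers:
            for chunk in retrieve(query):
                key = content_hash(chunk)
                if key in seen:
                    continue  # same content already pooled by another query/retriever
                seen.add(key)
                pool.append((query, chunk))
                if len(pool) >= max_candidates:
                    return pool
    return pool


def rerank(user_query: str, pool: list[tuple[str, str]], score: Scorer, top_k: int = 5) -> list[str]:
    """Score every pooled chunk against the original user query and keep the best top_k."""
    scored = [(score(user_query, chunk), chunk) for _, chunk in pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def overlap_score(query: str, chunk: str) -> float:
    """Toy stand-in for a cross-encoder: number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))


# Hypothetical retrievers standing in for the BM25 and vector-store lookups.
def keyword_retriever(query: str) -> list[str]:
    return ["Gas meter readings are uploaded nightly.", "Outage reports are filed by dispatch."]


def vector_retriever(query: str) -> list[str]:
    return ["gas meter  readings are uploaded nightly.", "Budget reviews happen each quarter."]


if __name__ == "__main__":
    question = "when are gas meter readings uploaded"
    variants = [question, "schedule for uploading gas meter readings"]
    pool = build_candidate_pool(variants, [keyword_retriever, vector_retriever])
    for chunk in rerank(question, pool, overlap_score, top_k=2):
        print(chunk)
```

In a real deployment the scorer would be replaced by a cross-encoder model and the two retrievers by the BM25 index and the vector store. Deduplicating before reranking keeps the more expensive scoring step from evaluating the same chunk more than once.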

Results

Document lookups that used to take five minutes now finish in about 30 seconds, and analysts no longer need to track down a document's owner. All 17 analysts use the system daily across 20+ operational documents, and it runs without a single external API call.


Stack

LangChain · ChromaDB · Ollama · BM25 · Cross-encoder · RAGAS · LangSmith · Docker · Streamlit · Python
