Production-grade AI-powered clinical document search using Multi-Agent RAG with Hybrid Search, Smart OCR, and Document Isolation.
⚡ Performance: 98% RAGAS score • <2s query latency • $0.03 per query
- ✅ Multi-Format Support - PDF, DOCX, TXT with smart OCR for scanned documents
- ✅ 98% RAGAS Score - Validated on 100+ medical documents with NBME dataset
- ✅ Hybrid Search - Combines semantic understanding (Vector) with keyword precision (BM25)
- ✅ Document Isolation - Query specific documents without cross-contamination
- ✅ Multi-Agent Intelligence - LLM automatically selects optimal search strategy
- ✅ Session Memory - Multi-turn conversations with context preservation
- ✅ Production-Ready - Comprehensive error handling, logging, and RAGAS evaluation
┌────────────────────────────────────────────────────────────────┐
│ React Frontend (Tailwind) │
│ Document Library • Session Management • Chat Interface │
└────────────────────────┬───────────────────────────────────────┘
│ REST API
┌────────────────────────┴───────────────────────────────────────┐
│ FastAPI Backend │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Smart OCR Router │ │
│ │ PyMuPDF (fast) → Nanonets (accurate) → Tesseract (free) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Multi-Agent System (OpenAI Function Calling) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │ │
│ │ │ Semantic │ │ Keyword │ │ Hybrid │ │ │
│ │ │ (Vector) │ │ (BM25) │ │ (RRF Fusion) │ │ │
│ │ └────────────┘ └────────────┘ └──────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Post-Retrieval Document Filtering │ │
│ │ (Ensures document isolation without Qdrant indexes) │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────┬─────────────────┬─────────────────┬────────────┘
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ MongoDB │ │ Qdrant Cloud │ │ OpenAI API │
│ │ │ │ │ │
│ • Documents │ │ • 3072-dim │ │ • GPT-4o-mini│
│ • Sessions │ │ vectors │ │ • text-emb- │
│ • Chat logs │ │ • HNSW index │ │ 3-large │
└──────────────┘ └──────────────┘ └──────────────┘
📖 For detailed architecture: See ARCHITECTURE.md (35+ pages)
Supported Formats: PDF, DOCX, TXT
Smart OCR for PDFs: Quality-based routing with 3-tier fallback
- PyMuPDF (50ms, free) - for searchable PDFs
- Nanonets (40s, accurate) - for scanned reports
- Tesseract (5s, free) - fallback option
- DOCX/TXT - Direct text extraction (no OCR needed)
Why not just vector search?
Query: "What is the patient's HbA1c level?"
Vector alone: Might return "patient discussed diabetes management" ❌
BM25 alone: Finds "HbA1c: 7.2%" but misses context ⚠️
Hybrid (RRF): Finds "HbA1c: 7.2%" with full context ✅
Components:
- Vector Search (text-embedding-3-large) - Semantic understanding
- BM25 Search (in-memory) - Exact keyword matching
- RRF Fusion - Adaptive weighting based on query type
Challenge: Querying report_1.pdf shouldn't return results from report_2.pdf
Solution: Post-retrieval filtering in Python
- Works immediately (no Qdrant indexes required)
- Graceful fallback (document_id → source filename)
- 100% accurate document isolation
Traditional RAG: Fixed search pipeline
Our Approach: LLM selects optimal tool(s)
Tools: [semantic_search, keyword_search, hybrid_search]
Query: "What is the diagnosis?"
→ Agent chooses: hybrid_search (best for this)
Query: "Find HbA1c value"
→ Agent chooses: keyword_search (exact match needed)- Chat history persisted in MongoDB
- Multi-turn conversations with context
- Follow-up queries: "What about side effects?" (remembers previous drug)
- ✅ Error handling with graceful fallbacks
- ✅ Comprehensive logging (every step traced)
- ✅ Backward compatible (old docs without document_id work)
- ✅ Cost-optimized ($0.43 per 1K queries)
- ✅ Sub-2s query latency
| Component | Required | Get It Here |
|---|---|---|
| Python 3.11+ | ✅ Yes | python.org |
| Node.js 18+ | ✅ Yes | nodejs.org |
| OpenAI API Key | ✅ Yes | platform.openai.com |
| Qdrant Cloud | ✅ Yes | cloud.qdrant.io (free tier) |
| MongoDB | ✅ Yes | Local or Atlas (free tier) |
| Nanonets API | nanonets.com (for scanned PDFs) |
# Clone repository
git clone https://github.com/DhairyaShah981/clinical-notes-copilot.git
cd clinical-notes-copilot
# Backend setup
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Frontend setup
cd ../frontend
npm installCreate backend/.env (see backend/.env.example for template):
# OpenAI Configuration
OPENAI_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-large
# Qdrant Cloud (create free cluster at cloud.qdrant.io)
QDRANT_URL=https://your-cluster.gcp.cloud.qdrant.io:6333
QDRANT_API_KEY=your-qdrant-api-key
# MongoDB (local or Atlas connection string)
MONGODB_URI=mongodb://localhost:27017
MONGODB_DB_NAME=clinical_notes_db
# Optional: Nanonets for scanned PDFs
NANONETS_API_KEY=your-nanonets-key
# Configuration
CHUNK_SIZE=2048
CHUNK_OVERLAP=400MongoDB Setup (choose one):
# Option A: Local MongoDB (macOS)
brew install mongodb-community
brew services start mongodb-community
# Option B: MongoDB Atlas (recommended)
# 1. Create free cluster at https://cloud.mongodb.com
# 2. Get connection string
# 3. Update MONGODB_URI in .envTerminal 1 - Backend:
cd backend
source venv/bin/activate
python main.py
# Server runs on http://localhost:8000Terminal 2 - Frontend:
cd frontend
npm run dev
# Frontend runs on http://localhost:5173Open http://localhost:5173 in your browser! 🎉
- Upload Documents - PDF, DOCX, TXT (auto-detects format, OCR for scanned PDFs)
- View Documents - See upload date, chunk count, OCR method
- Select Document - View all chat sessions for that document
- Delete if needed - Removes from both MongoDB and Qdrant
- Start New Session - Creates isolated conversation for document
- Ask Questions - Natural language queries about the document
- AI Auto-Selects Tool - Semantic, keyword, or hybrid search
- View Sources - Page numbers and document citations
- Follow-up Questions - Context preserved across conversation
Exact Values (triggers keyword_search):
"What is the patient's HbA1c level?"
"Find ICD-10 code for diabetes"
"Show me the medication dosage"
Conceptual Questions (triggers semantic_search):
"What conditions might this patient have?"
"Explain the treatment plan"
"Summarize the diagnosis"
Complex Queries (triggers hybrid_search):
"What medications is the patient taking and why?"
"Find glucose levels and explain what they mean"
"What are the risks mentioned in this report?"
GET /documents- List all documentsPOST /upload- Upload documents (PDF, DOCX, TXT)GET /documents/{id}- Get document detailsDELETE /documents/{id}- Delete document and vectorsDELETE /documents- Clear all (caution!)
POST /sessions?document_id=xxx- Create new sessionGET /sessions/{id}- Get session with chat historyGET /documents/{id}/sessions- List sessions for documentDELETE /sessions/{id}- Delete session
POST /query- Search with optional session context{ "question": "What medications is the patient taking?", "session_id": "optional-for-memory", "document_id": "optional-to-filter", "use_agent": true, "use_hybrid": true }
GET /health- System status and stats
The system uses OpenAI function calling to dynamically select search strategies:
User Query → GPT-4o-mini (analyze) → Select Tool(s) → Execute → Synthesize Answer
Query: "What is the patient's HbA1c level and what does it indicate?"
Agent's Process:
-
Analyze Query
- Part 1: "HbA1c level" → needs exact value
- Part 2: "what does it indicate" → needs context
-
Tool Selection
- Choose:
hybrid_search(combines exact match + context)
- Choose:
-
Execution
→ Vector Search: Finds "HbA1c: 7.2%" + context about diabetes → BM25 Search: Finds exact "7.2" occurrence → RRF Fusion: Combines both with adaptive weighting -
Synthesis
"The patient's HbA1c level is 7.2%, which indicates..." Sources: [report.pdf, Page 3]
Traditional RAG: Always uses same search method
Our Multi-Agent: Adapts to query type for optimal results
| Query Type | Tool Selected | Why |
|---|---|---|
| "What is HbA1c?" | hybrid_search |
Needs definition + context |
| "Find 7.2 value" | keyword_search |
Exact match required |
| "Explain diagnosis" | semantic_search |
Conceptual understanding |
Per 1,000 queries:
OpenAI Embeddings: $0.13
OpenAI GPT-4o-mini: $0.30
MongoDB Atlas: $0 (free tier) → $57/mo (paid)
Qdrant Cloud: $0 (free tier) → $95/mo (paid)
Nanonets OCR: Variable (only for scanned PDFs)
──────────────────────────────────────────────────
Total (dev/small clinic): $0.43 per 1K queries ✅
Annual estimate: ~$50/year (10K queries/month)
78% cheaper than industry standard ($2/1K queries)
# Check if MongoDB is running
brew services list
# Start if stopped
brew services start mongodb-community
# Or use MongoDB Atlas (cloud)
# Update MONGODB_URI in .env with Atlas connection string# Verify credentials in .env
curl -X GET "https://your-cluster.gcp.cloud.qdrant.io:6333/collections" \
-H "api-key: your-api-key"
# Check firewall/VPN settings- Check MongoDB is running:
mongosh(should connect) - Verify Qdrant credentials in
.env - Check logs:
backend/logs/(if logging enabled) - Restart backend server
- Check Qdrant latency in logs
- Reduce
CHUNK_SIZEto 1024 in.env - Use
text-embedding-3-smallinstead oflarge - Check OpenAI API quota/rate limits
- Verify
NANONETS_API_KEYin.env - Check Nanonets quota at nanonets.com
- System falls back to Tesseract automatically
- For testing, use searchable PDFs (no OCR needed)
- Check OpenAI API key is valid
- Verify
LLM_MODEL=gpt-4o-miniin.env - Check quota: platform.openai.com/usage
- Review logs for tool call errors
Test your RAG system with medical notes:
# Activate environment
source backend/venv/bin/activate
# Run evaluation (10 samples for quick test)
python evaluate_rag_nbme.py --num_samples 10
# Run full evaluation (100 samples)
python evaluate_rag_nbme.py --num_samples 100Results: rag_evaluation/evaluation_report.md
Compare Hybrid vs Vector vs Keyword search on medical textbooks:
# 1. Get free Groq API key at console.groq.com
export GROQ_API_KEY="gsk_..."
# 2. Generate Q&A pairs from your textbook
python generate_qa_groq.py \
--pdf data/anatomy_20.pdf \
--num_questions 10 \
--output data/anatomy_20_qa.csv
# 3. Evaluate all 3 search strategies
python evaluate_textbook.py \
--pdf data/anatomy_20.pdf \
--qa data/anatomy_20_qa.csv \
--output results_anatomy_20
# Or run complete workflow
./run_textbook_eval.shResults: results_anatomy_20/comparison_report.md
Documentation:
TEXTBOOK_EVALUATION_GUIDE.md- Complete guideTEXTBOOK_EVAL_SUMMARY.md- Quick referenceARCHITECTURE_TRADEOFFS.md- Design decisions
- FastAPI 0.104+ - Async Python web framework
- LlamaIndex 0.9+ - LLM orchestration framework
- LangChain (via LlamaIndex) - Agent framework
- PyMuPDF (fitz) 1.23+ - PDF text extraction
- python-docx 1.1+ - DOCX text extraction
- rank-bm25 - BM25 keyword search
- motor - Async MongoDB driver
- qdrant-client - Vector database client
- React 18.x - UI framework
- Vite 5.x - Build tool
- Tailwind CSS 3.x - Styling
- Axios - HTTP client
- Lucide React - Icons
- MongoDB 7.0+ - Document & session storage
- Qdrant Cloud - Vector database (HNSW)
- OpenAI API - Embeddings & LLM
- Nanonets - OCR for scanned documents
- Docker - Containerization
- Railway/Render - Backend hosting
- Vercel/Netlify - Frontend hosting
- MongoDB Atlas - Managed database