# RAG System Evaluation
This report details the methodology and results of evaluating the DJ Rag system against acceptance criteria.
## Methodology
We evaluated the system using a "Gold Set" of 5 diverse queries designed to test different capabilities:
- Fact Retrieval: Extracting specific facts from documents.
- Summarization: Synthesizing information across sections.
- Context Awareness: Understanding implied context.
- Citation Accuracy: Verifying source mapping.
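For illustration, the sketch below shows one way a Gold Set entry could be represented in Python. The `GoldSetQuery` class and its field names are assumptions made for this report rather than the system's actual schema; the query text and expected key info values are taken from the Gold Set table in the next section.

```python
from dataclasses import dataclass

@dataclass
class GoldSetQuery:
    """One Gold Set entry (illustrative schema, not the system's actual one)."""
    id: int
    query: str                    # question sent to the RAG pipeline
    expected_key_info: list[str]  # facts the answer is expected to contain

GOLD_SET = [
    GoldSetQuery(1, "What are the exam rules?", ["mandatory feedback", "75% attendance"]),
    GoldSetQuery(2, "Explain the architecture.", ["Next.js", "FastAPI", "Pinecone"]),
    GoldSetQuery(3, "How to deploy?", ["Vercel", "Render/Railway"]),
    GoldSetQuery(4, "Summarize the document.", ["key points summary"]),
    GoldSetQuery(5, "What are the limitations?", ["rate limits", "file size"]),
]
```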
## Metrics Definition
| Metric | Definition |
|---|---|
| Precision | Fraction of retrieved chunks that are relevant to the query. |
| Recall | Fraction of the expected key information covered by the answer. |
| Citation Quality | Accuracy of the source links attached to the answer. |
| Success Rate | Percentage of Gold Set queries that pass all checks. |
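To make these four metrics concrete, here is a minimal scoring sketch. The substring-based checks for key facts are a simplification assumed for illustration, not the system's actual scoring logic.

```python
def precision(retrieved_chunks: list[str], relevant_chunks: set[str]) -> float:
    """Fraction of retrieved chunks that are relevant (0.0 if nothing retrieved)."""
    if not retrieved_chunks:
        return 0.0
    hits = sum(1 for chunk in retrieved_chunks if chunk in relevant_chunks)
    return hits / len(retrieved_chunks)

def recall(answer: str, expected_key_info: list[str]) -> float:
    """Fraction of expected key facts that appear in the answer (case-insensitive)."""
    if not expected_key_info:
        return 1.0
    answer_lower = answer.lower()
    covered = sum(1 for fact in expected_key_info if fact.lower() in answer_lower)
    return covered / len(expected_key_info)

def citation_quality(cited_sources: list[str], true_sources: set[str]) -> float:
    """Fraction of cited sources that actually back the answer."""
    if not cited_sources:
        return 0.0
    correct = sum(1 for src in cited_sources if src in true_sources)
    return correct / len(cited_sources)

def success_rate(per_query_pass: list[bool]) -> float:
    """Percentage of Gold Set queries that pass all checks."""
    return 100.0 * sum(per_query_pass) / len(per_query_pass) if per_query_pass else 0.0
```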
## Gold Set Evaluation
| ID | Query | Expected Key Info | Actual Result | Status |
|---|---|---|---|---|
| 1 | "What are the exam rules?" | Mandatory feedback, 75% attendance. | Pending Auto-Eval | Pending |
| 2 | "Explain the architecture." | Next.js, FastAPI, Pinecone. | Pending Auto-Eval | Pending |
| 3 | "How to deploy?" | Vercel, Render/Railway. | Pending Auto-Eval | Pending |
| 4 | "Summarize the document." | Key points summary. | Pending Auto-Eval | Pending |
| 5 | "What are the limitations?" | Rate limits, file size. | Pending Auto-Eval | Pending |
### Evaluation Note
The "Actual Result" column is populated automatically when you upload a document in the chat interface. The system generates 5 Q/A pairs to demonstrate its understanding of the specific content you uploaded.