RAG System Evaluation

This report details the methodology and results of evaluating the DJ Rag system against acceptance criteria.

Methodology

We evaluated the system using a "Gold Set" of 5 diverse queries designed to test different capabilities:

  • Fact Retrieval: Extracting specific facts from documents.
  • Summarization: Synthesizing information across sections.
  • Context Awareness: Understanding implied context.
  • Citation Accuracy: Verifying source mapping.

Metrics Definition

  • Precision: Relevance of the retrieved chunks to the query.
  • Recall: Completeness of the answer relative to the expected key information.
  • Citation Quality: Accuracy of the source links cited in the answer.
  • Success Rate: Percentage of queries that pass all checks.
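
To make these definitions concrete, the sketch below shows one way the checks could be scored with simple phrase matching against the expected key information. The function names, the 0.8 pass threshold, and the substring heuristic are illustrative assumptions, not the system's actual scoring logic.

```python
from typing import List


def key_info_recall(answer: str, expected_key_info: List[str]) -> float:
    """Recall proxy: fraction of expected key phrases that appear in the answer."""
    if not expected_key_info:
        return 1.0
    hits = sum(1 for phrase in expected_key_info if phrase.lower() in answer.lower())
    return hits / len(expected_key_info)


def chunk_precision(retrieved_chunks: List[str], expected_key_info: List[str]) -> float:
    """Precision proxy: fraction of retrieved chunks containing at least one expected phrase."""
    if not retrieved_chunks:
        return 0.0
    relevant = sum(
        1 for chunk in retrieved_chunks
        if any(phrase.lower() in chunk.lower() for phrase in expected_key_info)
    )
    return relevant / len(retrieved_chunks)


def passes_all_checks(precision: float, recall: float, citations_ok: bool,
                      threshold: float = 0.8) -> bool:
    """A query counts toward the success rate only if every check passes."""
    return precision >= threshold and recall >= threshold and citations_ok
```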

Gold Set Evaluation

| ID | Query | Expected Key Info | Actual Result | Status |
|----|-------|-------------------|---------------|--------|
| 1 | "What are the exam rules?" | Mandatory feedback, 75% attendance. | Pending Auto-Eval | Pending |
| 2 | "Explain the architecture." | Next.js, FastAPI, Pinecone. | Pending Auto-Eval | Pending |
| 3 | "How to deploy?" | Vercel, Render/Railway. | Pending Auto-Eval | Pending |
| 4 | "Summarize the document." | Key points summary. | Pending Auto-Eval | Pending |
| 5 | "What are the limitations?" | Rate limits, file size. | Pending Auto-Eval | Pending |

Evaluation Note

The "Actual Result" column is populated automatically when you upload a document in the chat interface. The system generates 5 Q/A pairs to demonstrate its understanding of the specific content you uploaded.