Master Your RAG with RAGAS Evals
8 days ago by Kyra Dresen
The RAGAS Evaluators are now available, providing specialized tools to evaluate retrieval-augmented generation (RAG) workflows. These evaluators make it easy to set up quality checks when integrating a Knowledge Base into a RAG system and can be used in Experiments and Deployments to ensure responses are accurate, relevant, and safe.
Key Features
- Out-of-the-Box Functionality: RAGAS Evaluators are ready to use and cannot be reconfigured, offering a consistent evaluation framework.
- Reference-Based Scoring: Some evaluators, such as Context Recall and Context Entity Recall, require a reference answer to compute their score.
- Scoring Scale: Evaluations return a score between 0 and 1, with higher scores indicating better performance (e.g., higher relevance or faithfulness).
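To make the scale concrete: in the underlying open-source RAGAS framework, Faithfulness, for instance, is computed as the fraction of claims in the generated answer that can be inferred from the retrieved context. An answer that makes four claims of which three are supported scores 3/4 = 0.75, while a fully supported answer scores 1.0.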
Included Evaluators
- Context Precision: Assesses how well the retrieved chunks align with the user’s query.
  - Example: Ensures chunks about “home insurance” are prioritized when the query asks about coverage, filtering out irrelevant topics like auto insurance.
- Response Relevancy: Evaluates how directly the generated answer addresses the query.
  - Example: For “What are the fees for international transfers?”, it checks that the answer is concise and focused on the fees, without unrelated details.
- Faithfulness: Ensures the response is factually consistent with the retrieved context.
  - Example: For “What is the company’s remote work policy?”, it checks that claims (e.g., “three days remote”) match the policy document.
- Context Entity Recall: Verifies that critical entities from the reference answer appear in the retrieved content.
  - Example: For “Tell me about the Taj Mahal,” it ensures entities like “Shah Jahan” and “Agra” are retrieved.
- Context Recall: Measures whether all necessary details from the reference are retrieved.
  - Example: For “What are the main benefits of product X?”, it ensures all benefits, such as “cost savings” and “improved efficiency,” are covered.
- Noise Sensitivity: Checks whether the system ignores irrelevant information in the retrieved context.
  - Example: For “What is LIC known for?”, it ensures the response focuses on LIC’s attributes, filtering out unrelated economic data.
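For readers who want to see what these checks look like in code, the evaluators above correspond to metrics in the open-source RAGAS library. The snippet below is a minimal sketch assuming a ragas 0.1-style API and an OpenAI key in the environment; metric names, import paths, and expected column names vary between ragas versions, and the sample data is purely illustrative.

```python
# Illustrative only: scoring one RAG sample with the open-source `ragas`
# library (0.1-style API). Names and imports differ across ragas versions.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    context_precision,      # retrieved chunks vs. the query
    answer_relevancy,       # generated answer vs. the query (Response Relevancy)
    faithfulness,           # generated answer vs. the retrieved context
    context_recall,         # retrieved context vs. the reference answer
    context_entity_recall,  # entities from the reference found in the context
)
# Noise Sensitivity ships in newer ragas releases as an additional metric.

# One evaluation record: the question, the retrieved chunks, the generated
# answer, and a reference ("ground_truth") for the reference-based metrics.
data = {
    "question": ["What are the fees for international transfers?"],
    "contexts": [[
        "International transfers cost a flat fee of 5 EUR per transaction.",
        "Domestic transfers are free of charge.",
    ]],
    "answer": ["International transfers cost a flat fee of 5 EUR."],
    "ground_truth": ["A flat fee of 5 EUR applies to international transfers."],
}

# ragas uses an LLM as judge (OpenAI by default), so OPENAI_API_KEY must be set.
result = evaluate(
    Dataset.from_dict(data),
    metrics=[
        context_precision,
        answer_relevancy,
        faithfulness,
        context_recall,
        context_entity_recall,
    ],
)

print(result)  # per-metric scores between 0 and 1, e.g. {'faithfulness': 1.0, ...}
```

Within the platform itself, none of this setup is needed: the evaluators come preconfigured and can simply be attached to Experiments and Deployments.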