Ragas Evaluator
What are Ragas Evaluators?
Ragas Evaluators are specialized tools designed to evaluate the performance of retrieval-augmented generation (RAG) workflows. They focus on metrics like context relevance, faithfulness, recall, and robustness, ensuring that outputs derived from external knowledge bases or retrieval systems are accurate and reliable.
Why use Ragas Evaluators?
If your system retrieves information from external sources, these evaluators are essential. They ensure that responses are factually consistent, include all necessary details, and stay focused on relevant context. For applications like customer support or document summarization, Ragas Evaluators help guarantee the integrity and quality of your AI’s outputs.
Entities and Parameters
Before diving into specific evaluators, it's essential to understand the key entities involved in RAG evaluation:
Entity | API Parameter | Description | Example |
---|---|---|---|
User Query | query | The original question or request from the user | "What are the benefits of our premium insurance plan?" |
Knowledge Base Retrievals | retrievals | Array of document chunks retrieved from your knowledge base | ["Premium plan includes 24/7 support...", "Coverage extends to international travel..."] |
Generated Response | output | The AI's answer based on the retrieved context | "Our premium plan offers comprehensive coverage including..." |
Reference Answer | reference | A high-quality answer to compare against (may be ground truth or expert-written) | Human-written ideal response for the query |
Model | model | The AI model used for evaluation | "openai/gpt-4o" |
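As a rough illustration of how these parameters fit together, here is a minimal sketch of an evaluation request. Only the field names come from the table above; the endpoint URL, the use of the `requests` library, and the example values are placeholders, not part of this documentation.

```python
# Hypothetical sketch only: the endpoint and HTTP client call are illustrative.
# The payload keys mirror the API parameters described in the table above.
import requests

payload = {
    "query": "What are the benefits of our premium insurance plan?",
    "retrievals": [
        "Premium plan includes 24/7 support...",
        "Coverage extends to international travel...",
    ],
    "output": "Our premium plan offers comprehensive coverage including...",
    "reference": "The premium plan provides 24/7 support and international travel coverage.",
    "model": "openai/gpt-4o",
}

# Placeholder URL: substitute the actual endpoint of the evaluator you are calling.
EVALUATOR_URL = "https://example.com/evaluators/ragas-faithfulness"
response = requests.post(EVALUATOR_URL, json=payload, timeout=30)
print(response.json())
```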
Ragas Evaluator Response
Ragas evaluators return a numeric score between 0 and 1. The closer the score is to 1, the more strongly the output exhibits the quality being measured (relevance, faithfulness, etc.).
Example: when measuring the relevancy of a response, a highly relevant answer returns a score close to 1.
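As a sketch of how you might act on that score in code, the snippet below checks it against a threshold. The response shape (`{"score": ...}`) and the 0.8 cut-off are assumptions for illustration, not part of this documentation.

```python
# Hypothetical result from a Ragas evaluator; the key name is an assumption.
result = {"score": 0.87}

RELEVANCE_THRESHOLD = 0.8  # example cut-off, tune for your own use case

if result["score"] >= RELEVANCE_THRESHOLD:
    print("Response considered relevant")
else:
    print("Response flagged for review")
```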
Example
Imagine a customer asks a chatbot, “What’s included in my insurance policy?” and the system retrieves chunks of information from a knowledge base. A Ragas Evaluator can verify if the retrieved chunks focus on the user’s question (e.g., home insurance details) and exclude irrelevant details (e.g., unrelated auto insurance policies). This ensures the response is accurate and useful.
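A sketch of what that scenario could look like as an evaluation payload is shown below. The retrieval texts and the overall structure are invented for illustration; a context-precision style evaluator should score below 1 because one retrieved chunk is off-topic.

```python
# Illustrative only: retrievals [0] and [1] are on-topic (home insurance),
# while [2] is an irrelevant auto-insurance chunk that lowers context precision.
payload = {
    "query": "What's included in my insurance policy?",
    "retrievals": [
        "Your home insurance policy covers fire, theft, and water damage.",
        "Personal liability up to $1M is included in the home policy.",
        "Our auto insurance add-on covers rental cars abroad.",  # irrelevant chunk
    ],
    "output": "Your policy covers fire, theft, water damage, and personal liability up to $1M.",
    "model": "openai/gpt-4o",
}
```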
Ragas Evaluators can be found in the Hub, where many ready-to-use evaluators are already available.
List of Ragas evaluators
The table below lists all Ragas evaluators with their parameter requirements and descriptions:
Name | Required Parameters | Optional Parameters | Description | Example |
---|---|---|---|---|
Ragas Coherence | query, output, model | reference | Checks if the generated response presents ideas in a logical, organized manner. | ✅ Good: "First, log into your account. Then, navigate to settings. Finally, click 'Change Password'." ❌ Poor: "Click settings. Your account has security features. Navigate first to login. Change password option exists." |
Ragas Conciseness | query, output, model | reference | Evaluates if the response conveys information clearly and efficiently, without unnecessary details. | ✅ Concise: "The meeting is at 2 PM." ❌ Verbose: "The meeting, which we scheduled earlier, is at 2 PM in the afternoon today." |
Ragas Context Entities Recall | query, output, model | reference, retrievals | Measures how well your retrieval system captures important entities (people, places, things) mentioned in the ideal answer. | Ground truth mentions "John Smith, Sarah Jones, New York office" but retrieved documents only mention "John Smith, Sarah Jones" = 67% recall. |
Ragas Context Precision | query, output, model | reference, retrievals | Measures what proportion of retrieved documents are actually relevant to the user's question. | User asks about "project deadlines" and 7 out of 10 retrieved documents discuss deadlines = 70% precision. |
Ragas Context Recall | model, reference | query, output, retrievals | Measures if the retrieved documents contain all the information needed to answer the question properly. | Ideal answer has 4 key facts, but retrieved context only contains 3 of them = 75% recall. |
Ragas Correctness | query, output, model | reference | Directly compares the AI's answer against the known correct answer for factual accuracy. | Generated: "The deadline is Friday" vs. Ground truth: "The deadline is Monday" = low correctness. |
Ragas Faithfulness | query, output, model | retrievals | Ensures the AI's answer is factually consistent with the source documents it was given. | Context: "Budget increased 10%" but Answer: "Budget doubled" = low faithfulness. |
Ragas Harmfulness | query, output, model | retrievals | Detects if the response could potentially cause harm to individuals, groups, or society. | A response containing discriminatory language or dangerous instructions would score high on harmfulness. |
Ragas Maliciousness | query, output, model | retrievals | Identifies responses that might be trying to deceive, manipulate, or exploit users. | A response trying to trick someone into sharing passwords or personal information. |
Ragas Noise Sensitivity | query, output, model | retrievals | Tests if the AI can maintain accuracy even when retrieved documents contain irrelevant information. | Correctly answering "What time is the meeting?" even when documents also contain unrelated budget information. |
Ragas Response Relevancy | query, output, model | retrievals | Assesses how well the AI's answer addresses the specific question asked. | Question: "How do I reset my password?" Relevant answer gives reset steps vs. irrelevant answer about email settings. |
Ragas Summarization | query, output, model | reference, retrievals | Evaluates how well a summary captures the important information from the source documents. | Summarizing a 20-page report by including all main points vs. missing key conclusions or adding irrelevant details. |
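To make the required/optional split concrete, here is a minimal sketch of assembling payloads for two of the evaluators above. Only the parameter names come from the table; the `build_payload` helper and the example values are assumptions for illustration.

```python
# Hypothetical helper: assembles an evaluation payload, omitting optional
# fields that are left as None. Not part of this documentation's API.
def build_payload(query, output, model, reference=None, retrievals=None):
    payload = {"query": query, "output": output, "model": model}
    if reference is not None:
        payload["reference"] = reference
    if retrievals is not None:
        payload["retrievals"] = retrievals
    return payload

# Ragas Faithfulness: query, output, model required; retrievals optional,
# but in practice they supply the sources the answer is checked against.
faithfulness_payload = build_payload(
    query="How much did the budget change?",
    output="The budget increased by 10%.",
    model="openai/gpt-4o",
    retrievals=["Q3 report: the budget increased 10% year over year."],
)

# Ragas Correctness: query, output, model required; reference optional,
# but it is what the generated answer gets compared against.
correctness_payload = build_payload(
    query="When is the deadline?",
    output="The deadline is Friday.",
    model="openai/gpt-4o",
    reference="The deadline is Monday.",
)
```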