Ragas Evaluator
What are Ragas Evaluators?
Ragas Evaluators are specialized tools designed to evaluate the performance of retrieval-augmented generation (RAG) workflows. They focus on metrics like context relevance, faithfulness, recall, and robustness, ensuring that outputs derived from external knowledge bases or retrieval systems are accurate and reliable.
Why use Ragas Evaluators?
If your system retrieves information from external sources, these evaluators are essential. They ensure that responses are factually consistent, include all necessary details, and stay focused on relevant context. For applications like customer support or document summarization, Ragas Evaluators help guarantee the integrity and quality of your AI’s outputs.
Ragas Evaluator Response
Ragas Evaluators return a numeric score between 0 and 1. The closer the score is to 1, the more strongly the measured property (relevance, faithfulness, etc.) is present in the response.
Example: when measuring the relevance of a response, a highly relevant answer returns a score close to 1.
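As an illustration, here is a minimal sketch using the open-source `ragas` Python package, which implements the same metrics. This is not the Hub API: the sample data is invented, the 0.1-style `evaluate` interface is assumed, and an OpenAI key is expected because the metric is judged by an LLM.

```python
# Minimal sketch with the open-source `ragas` package (0.1-style API).
# Assumes `pip install ragas datasets` and an OPENAI_API_KEY in the environment.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy

# One illustrative sample: question, generated answer, and retrieved contexts.
dataset = Dataset.from_dict({
    "question": ["What's included in my insurance policy?"],
    "answer": ["Your home policy covers fire, theft, and water damage."],
    "contexts": [["The home insurance policy covers fire, theft, and water damage."]],
})

result = evaluate(dataset, metrics=[answer_relevancy])
# Prints a score between 0 and 1; a relevant answer scores close to 1.
print(result)
```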
Example
Imagine a customer asks a chatbot, “What’s included in my insurance policy?” and the system retrieves chunks of information from a knowledge base. A Ragas Evaluator can verify if the retrieved chunks focus on the user’s question (e.g., home insurance details) and exclude irrelevant details (e.g., unrelated auto insurance policies). This ensures the response is accurate and useful.
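Following the same pattern as the sketch above, the scenario could be checked with the `context_precision` metric from the open-source `ragas` package, which scores how many of the retrieved chunks are actually relevant to the question. The chunks and reference answer below are illustrative only.

```python
# Sketch: does the retrieval focus on the user's question?
# Uses the open-source `ragas` context_precision metric with made-up data.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision

dataset = Dataset.from_dict({
    "question": ["What's included in my insurance policy?"],
    "contexts": [[
        "Home insurance covers fire, theft, and water damage up to $250,000.",  # relevant
        "Auto insurance premiums increase after an at-fault accident.",         # irrelevant
    ]],
    "ground_truth": ["The home policy covers fire, theft, and water damage up to $250,000."],
})

result = evaluate(dataset, metrics=[context_precision])
# The unrelated auto-insurance chunk pulls the context_precision score down.
print(result)
```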
Ragas Evaluators can be found in the Hub, which already offers many Evaluators ready to be used.
List of Ragas evaluators
Below is the list of Ragas Evaluators ready to be added to your project from the Hub.
Name | Description | Example |
---|---|---|
Ragas Coherence | Checks if the generated response presents ideas, information, or arguments in a logical and organized manner. | An email that explains a problem, then the solution, then next steps - versus one that jumps randomly between topics without clear connection. |
Ragas Conciseness | Checks if the generated response conveys information or ideas clearly and efficiently, without unnecessary or redundant details. | "The meeting is at 2 PM" vs "The meeting, which we scheduled earlier, is at 2 PM in the afternoon today." |
Ragas Context Entities Recall | Measures the recall of entities mentioned in the ground truth that are also present in the retrieved contexts. Evaluates how comprehensively the retrieval system captures relevant entities. | Ground truth mentions "John, Sarah, Mike," but retrieved documents only mention "John, Sarah" - 67% recall. |
Ragas Context Precision | Measures the proportion of relevant chunks in the retrieved contexts. | When searching for "project deadlines", 7 out of 10 retrieved documents actually discuss deadlines - 70% precision. |
Ragas Context Recall | Measures the fraction of relevant information from the reference answer that can be found in the retrieved context. Uses LLM-based evaluation when user input and reference are provided, otherwise... | Reference answer has 4 key facts, but the retrieved context only contains 3 of them - 75% recall. |
Ragas Correctness | Checks the accuracy of the generated LLM response when compared to the ground truth. | Generated: "The deadline is Friday" vs. Ground truth: "The deadline is Monday" - low correctness. |
Ragas Faithfulness | Measures the factual consistency of the generated answer against the given context. | Context says "budget increased 10%," but the answer states "budget doubled" - low faithfulness. |
Ragas Harmfulness | Checks the potential of the generated response to cause harm to individuals, groups, or society at large. | A response containing discriminatory language or dangerous instructions would score high on harmfulness. |
Ragas Maliciousness | Checks the potential of the generated response to harm, deceive, or exploit users. | A response that tries to trick someone into sharing passwords or personal information. |
Ragas Noise Sensitivity | Measures how sensitive the model response is to irrelevant information in the retrieved context. Evaluates if the model can maintain accuracy despite noisy or unrelated context. | Correctly answering "What time is the meeting?" even when documents also contain unrelated information. |
Ragas Response Relevancy | Focuses on assessing how pertinent the generated answer is to the given prompt. | Question: "How do I reset my password?" Relevant answer gives reset steps vs. irrelevant answer about email settings. |
Ragas Summarization | Gives a measure of how well the summary (response) captures the important information from the retrieved contexts. | Summarizing a 20-page report by including all main points vs missing key conclusions or adding irrelevant details. |
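To make the ratio-based metrics concrete, the short Python sketch below reproduces the arithmetic behind the examples in the table. The counts come straight from those examples; in practice the evaluators use an LLM to extract entities, identify facts, and judge relevance before the ratio is computed.

```python
# Illustrative arithmetic behind the ratio-based scores in the table above.

# Context Entities Recall: entities from the ground truth found in the contexts.
ground_truth_entities = {"John", "Sarah", "Mike"}
retrieved_entities = {"John", "Sarah"}
entities_recall = len(ground_truth_entities & retrieved_entities) / len(ground_truth_entities)
print(f"context entities recall: {entities_recall:.2f}")  # 0.67

# Context Precision: share of retrieved chunks that are actually relevant.
relevant_chunks, retrieved_chunks = 7, 10
context_precision = relevant_chunks / retrieved_chunks
print(f"context precision: {context_precision:.2f}")  # 0.70

# Context Recall: share of reference facts found in the retrieved context.
facts_in_reference, facts_found_in_context = 4, 3
context_recall = facts_found_in_context / facts_in_reference
print(f"context recall: {context_recall:.2f}")  # 0.75
```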