Ragas Evaluator

What are Ragas Evaluators?

Ragas Evaluators are specialized tools designed to evaluate the performance of retrieval-augmented generation (RAG) workflows. They focus on metrics like context relevance, faithfulness, recall, and robustness, ensuring that outputs derived from external knowledge bases or retrieval systems are accurate and reliable.


Why use Ragas Evaluators?

If your system retrieves information from external sources, these evaluators are essential. They ensure that responses are factually consistent, include all necessary details, and stay focused on relevant context. For applications like customer support or document summarization, Ragas Evaluators help guarantee the integrity and quality of your AI’s outputs.


Ragas Evaluator Response

Ragas Evaluators return a numeric score between 0 and 1. For each measurement (relevance, faithfulness, etc.), the score indicates how strongly the response exhibits that quality: the more strongly, the closer the score is to 1.

Example: When measuring the pertinence of a response, a highly pertinent answer returns a score close to 1.
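
For illustration, here is a minimal sketch of how such a score can be produced with the open-source ragas Python package (v0.1-style API). The sample record, model setup, and printed values are placeholders, and the sketch assumes an LLM provider key is configured for LLM-based metrics.

```python
# Minimal sketch using the open-source `ragas` package (v0.1-style API).
# Assumes an LLM provider key (e.g. OPENAI_API_KEY) is set for LLM-based metrics;
# the sample record below is illustrative only.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

record = {
    "question": ["What's included in my home insurance policy?"],
    "contexts": [["Your home insurance covers fire, theft, and water damage."]],
    "answer": ["The policy covers fire, theft, and water damage."],
    "ground_truth": ["Home insurance covers fire, theft, and water damage."],
}

result = evaluate(Dataset.from_dict(record), metrics=[answer_relevancy, faithfulness])

# Each metric reports a score between 0 and 1.
print(result)  # e.g. {'answer_relevancy': 0.97, 'faithfulness': 1.0}
```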


Example

Imagine a customer asks a chatbot, “What’s included in my insurance policy?” and the system retrieves chunks of information from a knowledge base. A Ragas Evaluator can verify if the retrieved chunks focus on the user’s question (e.g., home insurance details) and exclude irrelevant details (e.g., unrelated auto insurance policies). This ensures the response is accurate and useful.
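
To sketch how that check might look in code, here is a hedged example using the context precision metric from the ragas package (v0.1-style API); the question, chunk texts, and reference answer are made up for illustration.

```python
# Illustrative sketch (ragas v0.1-style API): score whether the retrieved chunks
# actually focus on the user's insurance question. Chunk texts are made up.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision

record = {
    "question": ["What's included in my insurance policy?"],
    "contexts": [[
        "Your home insurance policy covers fire, theft, and water damage.",  # on-topic
        "Our separate auto insurance plan includes roadside assistance.",    # off-topic
    ]],
    "ground_truth": ["The home insurance policy covers fire, theft, and water damage."],
}

result = evaluate(Dataset.from_dict(record), metrics=[context_precision])
print(result)  # a lower context_precision score flags off-topic retrieved chunks
```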


Ragas Evaluators can be found in the Hub, which already offers many Evaluators ready to be used.


List of Ragas Evaluators

Below you can find the list of Evaluators ready to be added to your project from within the Hub.

| Name | Description | Example |
| --- | --- | --- |
| Ragas Coherence | Checks if the generated response presents ideas, information, or arguments in a logical and organized manner. | An email that explains a problem, then the solution, then next steps - versus one that jumps randomly between topics without clear connection. |
| Ragas Conciseness | Checks if the generated response conveys information or ideas clearly and efficiently, without unnecessary or redundant details. | "The meeting is at 2 PM" vs "The meeting, which we scheduled earlier, is at 2 PM in the afternoon today." |
| Ragas Context Entities Recall | Measures the recall of entities mentioned in the ground truth that are also present in the retrieved contexts. Evaluates how comprehensively the retrieval system captures relevant entities. | Ground truth mentions "John, Sarah, Mike," but retrieved documents only mention "John, Sarah" - 67% recall. |
| Ragas Context Precision | Measures the proportion of relevant chunks in the retrieved contexts. | When searching for "project deadlines", 7 out of 10 retrieved documents actually discuss deadlines - 70% precision. |
| Ragas Context Recall | Measures the fraction of relevant information from the reference answer that can be found in the retrieved context. Uses LLM-based evaluation when user input and reference are provided, otherwise... | Reference answer has 4 key facts, but the retrieved context only contains 3 of them - 75% recall. |
| Ragas Correctness | Checks the accuracy of the generated LLM response when compared to the ground truth. | Generated: "The deadline is Friday" vs. Ground truth: "The deadline is Monday" - low correctness. |
| Ragas Faithfulness | Measures the factual consistency of the generated answer against the given context. | Context says "budget increased 10%," but the answer states "budget doubled" - low faithfulness. |
| Ragas Harmfulness | Checks the potential of the generated response to cause harm to individuals, groups, or society at large. | A response containing discriminatory language or dangerous instructions would score high on harmfulness. |
| Ragas Maliciousness | Checks the potential of the generated response to harm, deceive, or exploit users. | A response that tries to trick someone into sharing passwords or personal information. |
| Ragas Noise Sensitivity | Measures how sensitive the model response is to irrelevant information in the retrieved context. Evaluates if the model can maintain accuracy despite noisy or unrelated context. | Correctly answering "What time is the meeting?" even when documents also contain unrelated information. |
| Ragas Response Relevancy | Focuses on assessing how pertinent the generated answer is to the given prompt. | Question: "How do I reset my password?" A relevant answer gives reset steps vs. an irrelevant answer about email settings. |
| Ragas Summarization | Gives a measure of how well the summary (response) captures the important information from the retrieved contexts. | Summarizing a 20-page report by including all main points vs missing key conclusions or adding irrelevant details. |
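
As a rough sketch of how several of the metrics listed above can be combined in a single run (again assuming the open-source ragas package, v0.1-style API; the dataset contents below are placeholders), aggregate and per-sample scores can both be inspected:

```python
# Illustrative sketch: run several of the metrics listed above in one pass
# (ragas v0.1-style API); dataset contents are placeholders.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    context_precision,
    context_recall,
    faithfulness,
)

dataset = Dataset.from_dict({
    "question": ["When is the project deadline?"],
    "contexts": [["The project deadline was moved to Monday, June 3."]],
    "answer": ["The deadline is Monday, June 3."],
    "ground_truth": ["The project deadline is Monday, June 3."],
})

result = evaluate(
    dataset,
    metrics=[faithfulness, context_precision, context_recall, answer_correctness],
)

# Aggregate score per metric, each between 0 and 1 ...
print(result)
# ... and per-sample scores as a pandas DataFrame for closer inspection.
print(result.to_pandas())
```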