added
Master Your RAG with RAGAS Evals
about 1 month ago by Kyra Dresen
The RAGAS Evaluators are now available, providing specialized tools to evaluate retrieval-augmented generation (RAG) workflows. These evaluators make it easy to set up quality checks when integrating a Knowledge Base into a RAG system and can be used in Experiments and Deployments to ensure responses are accurate, relevant, and safe.
Key Features
- Out-of-the-Box Functionality: RAGAS Evaluators are ready to use and cannot be reconfigured, offering a consistent evaluation framework.
- Reference-Based Scoring: Some evaluators, such as Context Recall and Context Entity Recall, require a reference answer to compute their score.
- Scoring Scale: Evaluations return a score between 0 and 1, with higher scores indicating better performance (e.g., higher relevance or faithfulness).
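Because every evaluator returns a score between 0 and 1 where higher is better, a simple quality gate can compare each score against a minimum. The sketch below is illustrative only: the evaluator keys, the threshold values, and the `failing_evaluators` helper are assumptions made for this example, not part of the product's API.

```python
from typing import Dict, List

# Hypothetical minimum acceptable scores; tune these per use case.
THRESHOLDS: Dict[str, float] = {
    "faithfulness": 0.8,
    "response_relevancy": 0.7,
    "context_recall": 0.6,
}


def failing_evaluators(
    scores: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[str]:
    """Return the names of evaluators whose score is below its minimum."""
    failed = []
    for name, minimum in thresholds.items():
        score = scores.get(name)
        if score is None:
            # Evaluator was not run, e.g. no reference was available.
            continue
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} is outside the 0-1 range")
        if score < minimum:
            failed.append(name)
    return failed


scores = {"faithfulness": 0.92, "response_relevancy": 0.55}
print(failing_evaluators(scores))  # ['response_relevancy']
```

Skipping evaluators that have no score keeps the gate usable when a reference is missing, since reference-based evaluators cannot run in that case.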
Included Evaluators
- Context Precision: Assesses whether the retrieved chunks that are relevant to the user’s query are ranked ahead of irrelevant ones.
- Example: Ensures chunks about “home insurance” are prioritized when the query asks about coverage, filtering out irrelevant topics like auto insurance.
- Response Relevancy: Evaluates how directly the generated answer addresses the query.
- Example: For “What are the fees for international transfers?” it ensures the answer is concise and focused on the fees without unrelated details.
- Faithfulness: Ensures the response is factually consistent with the retrieved context.
- Example: For “What is the company’s remote work policy?” it checks if claims (e.g., “three days remote”) match the policy document.
- Context Entity Recall: Verifies that critical entities from the reference answer are included in the retrieved content.
- Example: For “Tell me about the Taj Mahal,” it ensures entities like “Shah Jahan” and “Agra” are retrieved.
- Context Recall: Measures if all necessary details from a reference are retrieved.
- Example: For “What are the main benefits of product X?” it ensures all benefits like “cost savings” and “improved efficiency” are included.
- Noise Sensitivity: Checks if the system ignores irrelevant information in the retrieved context.
- Example: For “What is LIC known for?” it ensures the response focuses on LIC’s attributes, filtering out unrelated economic data.
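To make the scores above more concrete, here is a minimal sketch of the arithmetic behind three of the evaluators, following the general metric definitions from the open-source Ragas project. It is an approximation under stated assumptions: the real evaluators use an LLM to judge relevance, extract claims, and extract entities, whereas this sketch takes those judgements as hand-labelled inputs. All function names are hypothetical, and the evaluators shipped in the platform may compute their scores differently.

```python
from typing import List, Set


def context_precision(relevant_at_rank: List[bool]) -> float:
    """Average of precision@k over the ranks that hold a relevant chunk.

    relevant_at_rank[i] says whether the chunk at rank i + 1 was judged
    relevant. Relevant chunks ranked earlier yield a higher score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_at_rank, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def claim_ratio(judgements: List[bool]) -> float:
    """Fraction of claims judged as supported.

    Faithfulness: claims in the response supported by the retrieved context.
    Context Recall: claims in the reference found in the retrieved context.
    """
    return sum(judgements) / len(judgements) if judgements else 0.0


def context_entity_recall(reference: Set[str], context: Set[str]) -> float:
    """Share of the reference's entities that also appear in the context."""
    if not reference:
        return 0.0
    return len(reference & context) / len(reference)


# Same relevant chunks, different ranking: earlier is better.
print(round(context_precision([True, False, True]), 2))   # 0.83
print(round(context_precision([False, True, True]), 2))   # 0.58

# Two of three response claims are supported by the context.
print(round(claim_ratio([True, True, False]), 2))          # 0.67

# "Shah Jahan" is missing from the retrieved context.
reference_entities = {"Taj Mahal", "Shah Jahan", "Agra"}
context_entities = {"Taj Mahal", "Agra", "Yamuna"}
print(round(context_entity_recall(reference_entities, context_entities), 2))  # 0.67
```

The two Context Precision calls retrieve the same chunks in a different order, which shows why the metric rewards ranking relevant chunks first rather than merely retrieving them.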