| Ragas Coherence | query,output,model | reference | Checks whether the generated response presents ideas in a logical, organized manner. | ✅ Good: “First, log into your account. Then, navigate to settings. Finally, click ‘Change Password’.” ❌ Poor: “Click settings. Your account has security features. Navigate first to login. Change password option exists.” | 
| Ragas Conciseness | query,output,model | reference | Evaluates whether the response conveys information clearly and efficiently, without unnecessary detail. | ✅ Concise: “The meeting is at 2 PM.” ❌ Verbose: “The meeting, which we scheduled earlier, is at 2 PM in the afternoon today.” | 
| Ragas Context Entities Recall | query,output,model | reference,retrievals | Measures how well your retrieval system captures important entities (people, places, things) mentioned in the ideal answer. | Ground truth mentions “John Smith, Sarah Jones, New York office” but retrieved documents only mention “John Smith, Sarah Jones” = 2 of 3 ≈ 67% recall. | 
| Ragas Context Precision | query,output,model | reference,retrievals | Measures what proportion of retrieved documents are actually relevant to the user’s question. | User asks about “project deadlines” and 7 out of 10 retrieved documents discuss deadlines = 70% precision. | 
| Ragas Context Recall | model,reference | query,output,retrievals | Measures whether the retrieved documents contain all the information needed to answer the question properly. | Ideal answer has 4 key facts, but retrieved context only contains 3 of them = 75% recall. | 
| Ragas Correctness | query,output,model | reference | Directly compares the AI’s answer against the known correct answer for factual accuracy. | Generated: “The deadline is Friday” vs. Ground truth: “The deadline is Monday” = low correctness. | 
| Ragas Faithfulness | query,output,model | retrievals | Ensures the AI’s answer is factually consistent with the source documents it was given. | Context: “Budget increased 10%” but Answer: “Budget doubled” = low faithfulness. | 
| Ragas Harmfulness | query,output,model | retrievals | Detects whether the response could cause harm to individuals, groups, or society. | A response containing discriminatory language or dangerous instructions would score high on harmfulness. | 
| Ragas Maliciousness | query,output,model | retrievals | Identifies responses that attempt to deceive, manipulate, or exploit users. | A response trying to trick someone into sharing passwords or personal information. | 
| Ragas Noise Sensitivity | query,output,model | retrievals | Tests whether the AI maintains accuracy even when retrieved documents contain irrelevant information. | Correctly answering “What time is the meeting?” even when documents also contain unrelated budget information. | 
| Ragas Response Relevancy | query,output,model | retrievals | Assesses how well the AI’s answer addresses the specific question asked. | Question: “How do I reset my password?” Relevant answer gives reset steps vs. irrelevant answer about email settings. | 
| Ragas Summarization | query,output,model | reference,retrievals | Evaluates how well a summary captures the important information from the source documents. | Summarizing a 20-page report by including all main points vs. missing key conclusions or adding irrelevant details. |
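The retrieval-side metrics above (Context Precision, Context Entities Recall, Context Recall) all reduce to simple ratios once a judge LLM has extracted the relevant items. The sketch below reproduces the table’s worked examples with hypothetical helper functions — these are illustrative only and are not the ragas library’s API:

```python
def context_precision(retrieved_relevant: int, retrieved_total: int) -> float:
    """Fraction of retrieved documents that are relevant to the query."""
    return retrieved_relevant / retrieved_total

def context_entities_recall(reference_entities: set, retrieved_entities: set) -> float:
    """Fraction of ground-truth entities that appear in the retrieved context."""
    return len(reference_entities & retrieved_entities) / len(reference_entities)

def context_recall(reference_facts: int, facts_in_context: int) -> float:
    """Fraction of key facts from the ideal answer covered by the context."""
    return facts_in_context / reference_facts

# Worked examples from the table:
# "project deadlines": 7 of 10 retrieved documents relevant
print(context_precision(7, 10))  # 0.7
# Ground truth has 3 entities, retrieval captures 2 of them
print(round(context_entities_recall(
    {"John Smith", "Sarah Jones", "New York office"},
    {"John Smith", "Sarah Jones"},
), 2))  # 0.67
# Ideal answer has 4 key facts, context contains 3
print(context_recall(4, 3))  # 0.75
```

In the real library these extraction steps are performed by the judge model specified in the `model` field; the final score is the same kind of ratio shown here.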