Using Evaluators via the API

The Evaluator Library is available via the API and SDKs; this page covers the available calls.

Prerequisite

To get started, you need an API key to use with the SDKs or the HTTP API.

📘

To create an API key, see Authentication.


Built-in Evaluators

All the built-in evaluators in our Evaluator Library are directly callable through the API.

| Name | Description | Path |
| --- | --- | --- |
| BERT Score | BERT-based semantic similarity scoring | /v2/evaluators/bert_score |
| BLEU Score | BLEU score for text comparison (translation quality) | /v2/evaluators/bleu_score |
| Contains | Check if text contains specific content | /v2/evaluators/contains |
| Contains All | Check if text contains all specified items | /v2/evaluators/contains_all |
| Contains Any | Check if text contains any of the specified items | /v2/evaluators/contains_any |
| Contains None | Check if text contains none of the specified items | /v2/evaluators/contains_none |
| Contains Email | Validate presence of email addresses | /v2/evaluators/contains_email |
| Contains URL | Check for URL presence | /v2/evaluators/contains_url |
| Contains Valid Link | Validate working links | /v2/evaluators/contains_valid_link |
| Ends With | Check text ending patterns | /v2/evaluators/ends_with |
| Exact Match | Exact string matching | /v2/evaluators/exact_match |
| Valid JSON | JSON format validation | /v2/evaluators/valid_json |
| Length Between | Check text length within range | /v2/evaluators/length_between |
| Length Greater Than | Minimum length validation | /v2/evaluators/length_greater_than |
| Length Less Than | Maximum length validation | /v2/evaluators/length_less_than |
| Age Appropriate | Age-appropriate content checking | /v2/evaluators/age_appropriate |
| Bot Detection | Detect bot-generated content | /v2/evaluators/bot_detection |
| Grammar | Grammar quality assessment | /v2/evaluators/grammar |
| PII | Personally Identifiable Information detection | /v2/evaluators/pii |
| Sentiment Classification | Sentiment analysis | /v2/evaluators/sentiment_classification |
| Tone of Voice | Tone analysis and classification | /v2/evaluators/tone_of_voice |
| Fact Checking Knowledge Base | Fact verification against knowledge base | /v2/evaluators/fact_checking_knowledge_base |
| Localization | Localization quality assessment | /v2/evaluators/localization |
| Summarization | Summary quality evaluation | /v2/evaluators/summarization |
| Translation | Translation quality assessment | /v2/evaluators/translation |
| RAGAs Coherence | Text coherence measurement | /v2/evaluators/ragas_coherence |
| RAGAs Conciseness | Conciseness evaluation | /v2/evaluators/ragas_conciseness |
| RAGAs Context Precision | Context precision in RAG systems | /v2/evaluators/ragas_context_precision |
| RAGAs Correctness | Factual correctness assessment | /v2/evaluators/ragas_correctness |
| RAGAs Faithfulness | Faithfulness to source material | /v2/evaluators/ragas_faithfulness |
| RAGAs Harmfulness | Harmful content detection | /v2/evaluators/ragas_harmfulness |
| RAGAs Maliciousness | Malicious content detection | /v2/evaluators/ragas_maliciousness |
| RAGAs Response Relevancy | Response relevance scoring | /v2/evaluators/ragas_response_relevancy |
| RAGAs Summarization | RAG-specific summarization quality | /v2/evaluators/ragas_summarization |
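
Each endpoint in the table accepts a POST request. Here is a minimal sketch of calling the Contains check directly over HTTP with the path from the table above; the request body fields are assumptions, so consult the API reference for each evaluator's exact schema:

import "node:process"; // Node 18+ provides global fetch

// Sketch: calling a built-in evaluator endpoint directly over HTTP.
// The path comes from the table above; the body fields are assumptions.
const response = await fetch("https://api.orq.ai/v2/evaluators/contains", {
  method: "POST",
  headers: {
    accept: "application/json",
    authorization: `Bearer ${process.env["ORQ_API_KEY"]}`,
    "content-type": "application/json",
  },
  body: JSON.stringify({
    output: "Thank you for contacting support.", // text under evaluation
    query: "support", // assumed field: the content to look for
  }),
});

console.log(await response.json());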

Example

We'll call the Tone of Voice endpoint. The query defines how the evaluator assesses the given output.

Here is an example call in cURL, Python, and TypeScript:

cURL:

curl --request POST \
     --url https://api.orq.ai/v2/evaluators/tone_of_voice \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY' \
     --header 'content-type: application/json' \
     --data '
{
  "query": "Validate the tone of voice if it is professional.",
  "output": "Hello, how are you ??",
  "model": "openai/gpt-4o"
}
'
Python:

from orq_ai_sdk import Orq
import os


with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    res = orq.evals.tone_of_voice(request={
      "query": "Validate the tone of voice if it is professional.",
      "output": "Hello, how are you ??",
      "model": "openai/gpt-4o"
    })

    assert res is not None

    # Handle response
    print(res)
TypeScript:

import { Orq } from "@orq-ai/node";

const orq = new Orq({
  apiKey: process.env["ORQ_API_KEY"] ?? "",
});

async function run() {
  const result = await orq.evals.toneOfVoice({
      query: "Validate the tone of voice if it is professional.",
      output: "Hello, how are you ??",
      model: "openai/gpt-4o", 
  });

  console.log(result);
}

run();

Here is the result returned by the API. The value field holds the result of the evaluator call for the given query:

{
  "value": {
    "value": false,
    "explanation": "The output does not align with a professional tone. The use of 'Hello, how are you ??' is informal and lacks the formality expected in professional communication. The double question marks and casual greeting are more suited to a casual or friendly context rather than a professional one. A professional tone would require a more formal greeting and a clear purpose for the communication."
  }
}
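
Note the nesting: the outer value wraps the evaluator's result, and for Tone of Voice the inner value is a boolean verdict paired with an explanation. Here is a minimal sketch of consuming it from the TypeScript example above, assuming the SDK returns the response body in the shape shown:

import { Orq } from "@orq-ai/node";

const orq = new Orq({ apiKey: process.env["ORQ_API_KEY"] ?? "" });

const result = await orq.evals.toneOfVoice({
  query: "Validate the tone of voice if it is professional.",
  output: "Hello, how are you ??",
  model: "openai/gpt-4o",
});

// The verdict sits at result.value.value, with a rationale alongside it.
// Assumes the SDK result mirrors the JSON body shown above.
const { value: verdict, explanation } = result.value;
console.log(verdict ? "Tone check passed" : `Tone check failed: ${explanation}`);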

Calling a custom evaluator

It is also possible to call a custom Evaluator built on Orq using the Invoke a Custom Evaluator API call.

You can fetch the Evaluator ID to pass to this call by listing your Evaluators with the Get all Evaluators API.
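
Putting the two calls together, the flow looks roughly like this. This is a hypothetical sketch: both URLs below are placeholders, and the real paths and payload schema are documented in the Get all Evaluators and Invoke a Custom Evaluator API reference pages:

// Hypothetical sketch: list evaluators, then invoke one by ID.
// Both paths below are placeholders -- take the real ones from the
// "Get all Evaluators" and "Invoke a Custom Evaluator" reference pages.
const headers = {
  accept: "application/json",
  authorization: `Bearer ${process.env["ORQ_API_KEY"]}`,
  "content-type": "application/json",
};

// 1. Find the Evaluator ID (placeholder path and response shape).
const list = await fetch("https://api.orq.ai/v2/evaluators", { headers });
const evaluators = await list.json();

// 2. Invoke it with whatever payload your Evaluator expects (placeholder path).
const invoked = await fetch(
  `https://api.orq.ai/v2/evaluators/${evaluators[0].id}/invoke`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({ output: "Text to evaluate" }),
  },
);
console.log(await invoked.json());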

Finally, in the Orq studio, find the View Code button on your Evaluator page. The following modal opens:

Evaluator code is available in cURL, Node.js, and Python

This code calls the current Evaluator through the API. Ensure the payload contains all the data the Evaluator needs to execute correctly.

Using EvaluatorQ

EvaluatorQ is a dedicated SDK for using Evaluators within your application.

It features the following capabilities:

  • Parallel Execution: Run multiple evaluation jobs concurrently with progress tracking (see the sketch after the usage example below)
  • Flexible Data Sources: Support for inline data, promises, and Orq platform datasets
  • Type-safe: Fully written in TypeScript

Installation:

npm install @orq-ai/evaluatorq

Usage example:

import { evaluatorq, job } from "@orq-ai/evaluatorq";

// A job produces the output to be evaluated for each data row.
const textAnalyzer = job("text-analyzer", async (data) => {
  const text = data.inputs.text;
  const analysis = {
    length: text.length,
    wordCount: text.split(" ").length,
    uppercase: text.toUpperCase(),
  };

  return analysis;
});

// Run the job over inline data and score each output with an evaluator.
await evaluatorq("text-analysis", {
  data: [
    { inputs: { text: "Hello world" } },
    { inputs: { text: "Testing evaluation" } },
  ],
  jobs: [textAnalyzer],
  evaluators: [
    {
      name: "length-check",
      // A scorer returns a value plus an explanation for the score.
      scorer: async ({ output }) => {
        const passesCheck = output.length > 10;
        return {
          value: passesCheck ? 1 : 0,
          explanation: passesCheck
            ? "Output length is sufficient"
            : `Output too short (${output.length} chars, need >10)`,
        };
      },
    },
  ],
});
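
Because jobs is an array, parallel execution follows naturally: pass several jobs and EvaluatorQ runs them concurrently, per the feature list above. Here is a minimal sketch with two independent jobs; the scorer shape mirrors the example above:

import { evaluatorq, job } from "@orq-ai/evaluatorq";

// Two independent jobs over the same data; EvaluatorQ runs them concurrently.
const wordCounter = job("word-counter", async (data) =>
  data.inputs.text.split(" ").length,
);
const charCounter = job("char-counter", async (data) => data.inputs.text.length);

await evaluatorq("parallel-analysis", {
  data: [{ inputs: { text: "Hello world" } }],
  jobs: [wordCounter, charCounter],
  evaluators: [
    {
      name: "non-empty",
      // Scores each job's numeric output; shape mirrors the example above.
      scorer: async ({ output }) => ({
        value: output > 0 ? 1 : 0,
        explanation: output > 0 ? "Job produced a result" : "Empty result",
      }),
    },
  ],
});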
📘

To learn more, see the EvaluatorQ repository on GitHub.