Using Evaluators via the API

The Evaluator Library is available via the API and SDKs; this page covers the available calls.

Prerequisite

To get started, you need an API key to use with the SDKs or the HTTP API.

📘

To create an API key, see Authentication.


Built-in Evaluators

All the built-in evaluators in our Evaluator Library are directly callable through the API.

| Name | Description | Path |
| --- | --- | --- |
| BERT Score | BERT-based semantic similarity scoring | /v2/evaluators/bert_score |
| BLEU Score | BLEU score for text comparison (translation quality) | /v2/evaluators/bleu_score |
| Contains | Check if text contains specific content | /v2/evaluators/contains |
| Contains All | Check if text contains all specified items | /v2/evaluators/contains_all |
| Contains Any | Check if text contains any of the specified items | /v2/evaluators/contains_any |
| Contains None | Check if text contains none of the specified items | /v2/evaluators/contains_none |
| Contains Email | Validate presence of email addresses | /v2/evaluators/contains_email |
| Contains URL | Check for URL presence | /v2/evaluators/contains_url |
| Contains Valid Link | Validate working links | /v2/evaluators/contains_valid_link |
| Ends With | Check text ending patterns | /v2/evaluators/ends_with |
| Exact Match | Exact string matching | /v2/evaluators/exact_match |
| Valid JSON | JSON format validation | /v2/evaluators/valid_json |
| Length Between | Check text length within range | /v2/evaluators/length_between |
| Length Greater Than | Minimum length validation | /v2/evaluators/length_greater_than |
| Length Less Than | Maximum length validation | /v2/evaluators/length_less_than |
| Age Appropriate | Age-appropriate content checking | /v2/evaluators/age_appropriate |
| Bot Detection | Detect bot-generated content | /v2/evaluators/bot_detection |
| Grammar | Grammar quality assessment | /v2/evaluators/grammar |
| PII | Personally Identifiable Information detection | /v2/evaluators/pii |
| Sentiment Classification | Sentiment analysis | /v2/evaluators/sentiment_classification |
| Tone of Voice | Tone analysis and classification | /v2/evaluators/tone_of_voice |
| Fact Checking Knowledge Base | Fact verification against knowledge base | /v2/evaluators/fact_checking_knowledge_base |
| Localization | Localization quality assessment | /v2/evaluators/localization |
| Summarization | Summary quality evaluation | /v2/evaluators/summarization |
| Translation | Translation quality assessment | /v2/evaluators/translation |
| RAGAs Coherence | Text coherence measurement | /v2/evaluators/ragas_coherence |
| RAGAs Conciseness | Conciseness evaluation | /v2/evaluators/ragas_conciseness |
| RAGAs Context Precision | Context precision in RAG systems | /v2/evaluators/ragas_context_precision |
| RAGAs Correctness | Factual correctness assessment | /v2/evaluators/ragas_correctness |
| RAGAs Faithfulness | Faithfulness to source material | /v2/evaluators/ragas_faithfulness |
| RAGAs Harmfulness | Harmful content detection | /v2/evaluators/ragas_harmfulness |
| RAGAs Maliciousness | Malicious content detection | /v2/evaluators/ragas_maliciousness |
| RAGAs Response Relevancy | Response relevance scoring | /v2/evaluators/ragas_response_relevancy |
| RAGAs Summarization | RAG-specific summarization quality | /v2/evaluators/ragas_summarization |
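
Each endpoint in the table accepts a POST request. Here is a minimal sketch of calling the Contains check directly over HTTP with the path from the table above; the request body fields are assumptions, so consult the API reference for each evaluator's exact schema:

import "node:process"; // Node 18+ provides global fetch

// Sketch: calling a built-in evaluator endpoint directly over HTTP.
// The path comes from the table above; the body fields are assumptions.
const response = await fetch("https://api.orq.ai/v2/evaluators/contains", {
  method: "POST",
  headers: {
    accept: "application/json",
    authorization: `Bearer ${process.env["ORQ_API_KEY"]}`,
    "content-type": "application/json",
  },
  body: JSON.stringify({
    output: "Thank you for contacting support.", // text under evaluation
    query: "support", // assumed field: the content to look for
  }),
});

console.log(await response.json());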

Example

We'll call the Tone of Voice endpoint. The query defines how the evaluator assesses the given output.

Here is an example call in cURL, Python, and TypeScript:

cURL:

curl --request POST \
     --url https://api.orq.ai/v2/evaluators/tone_of_voice \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY' \
     --header 'content-type: application/json' \
     --data '
{
  "query": "Validate the tone of voice if it is professional.",
  "output": "Hello, how are you ??",
  "model": "openai/gpt-4o"
}
'
Python:

from orq_ai_sdk import Orq
import os


with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    res = orq.evals.tone_of_voice(request={
      "query": "Validate the tone of voice if it is professional.",
      "output": "Hello, how are you ??",
      "model": "openai/gpt-4o"
    })

    assert res is not None

    # Handle response
    print(res)
TypeScript:

import { Orq } from "@orq-ai/node";

const orq = new Orq({
  apiKey: process.env["ORQ_API_KEY"] ?? "",
});

async function run() {
  const result = await orq.evals.toneOfVoice({
      query: "Validate the tone of voice if it is professional.",
      output: "Hello, how are you ??",
      model: "openai/gpt-4o", 
  });

  console.log(result);
}

run();

Here is the result returned by the API. The value field holds the result of the evaluator call for the given query:

{
  "value": {
    "value": false,
    "explanation": "The output does not align with a professional tone. The use of 'Hello, how are you ??' is informal and lacks the formality expected in professional communication. The double question marks and casual greeting are more suited to a casual or friendly context rather than a professional one. A professional tone would require a more formal greeting and a clear purpose for the communication."
  }
}
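
Note the nesting: the outer value wraps the evaluator's result, and for Tone of Voice the inner value is a boolean verdict paired with an explanation. Here is a minimal sketch of consuming it from the TypeScript example above, assuming the SDK returns the response body in the shape shown:

import { Orq } from "@orq-ai/node";

const orq = new Orq({ apiKey: process.env["ORQ_API_KEY"] ?? "" });

const result = await orq.evals.toneOfVoice({
  query: "Validate the tone of voice if it is professional.",
  output: "Hello, how are you ??",
  model: "openai/gpt-4o",
});

// The verdict sits at result.value.value, with a rationale alongside it.
// Assumes the SDK result mirrors the JSON body shown above.
const { value: verdict, explanation } = result.value;
console.log(verdict ? "Tone check passed" : `Tone check failed: ${explanation}`);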

Calling a custom evaluator

It is also possible to call a custom Evaluator built on Orq using the Invoke a Custom Evaluator API call.

You can fetch the Evaluator ID to pass to this call by listing your Evaluators with the Get all Evaluators API.
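
Putting the two calls together, the flow looks roughly like this. This is a hypothetical sketch: both URLs below are placeholders, and the real paths and payload schema are documented in the Get all Evaluators and Invoke a Custom Evaluator API reference pages:

// Hypothetical sketch: list evaluators, then invoke one by ID.
// Both paths below are placeholders -- take the real ones from the
// "Get all Evaluators" and "Invoke a Custom Evaluator" reference pages.
const headers = {
  accept: "application/json",
  authorization: `Bearer ${process.env["ORQ_API_KEY"]}`,
  "content-type": "application/json",
};

// 1. Find the Evaluator ID (placeholder path and response shape).
const list = await fetch("https://api.orq.ai/v2/evaluators", { headers });
const evaluators = await list.json();

// 2. Invoke it with whatever payload your Evaluator expects (placeholder path).
const invoked = await fetch(
  `https://api.orq.ai/v2/evaluators/${evaluators[0].id}/invoke`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({ output: "Text to evaluate" }),
  },
);
console.log(await invoked.json());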

Finally, in the Orq studio, find the View Code button on your Evaluator page. The following modal opens:

Evaluator code is available in cURL, Node.js, and Python

This code calls the current Evaluator through the API. Ensure the payload contains all the data the Evaluator needs to execute correctly.

Using EvaluatorQ

EvaluatorQ is a dedicated SDK for using Evaluators within your application.

It features the following capabilities:

  • Parallel Execution: Run multiple evaluation jobs concurrently with progress tracking (see the sketch after the usage example below)
  • Flexible Data Sources: Support for inline data, promises, and Orq platform datasets
  • Type-safe: Fully written in TypeScript

Installation:

npm install @orq-ai/evaluatorq

Usage example:

import { evaluatorq, job } from "@orq-ai/evaluatorq";

// A job produces the output to be evaluated for each data row.
const textAnalyzer = job("text-analyzer", async (data) => {
  const text = data.inputs.text;
  const analysis = {
    length: text.length,
    wordCount: text.split(" ").length,
    uppercase: text.toUpperCase(),
  };

  return analysis;
});

// Run the job over inline data and score each output with an evaluator.
await evaluatorq("text-analysis", {
  data: [
    { inputs: { text: "Hello world" } },
    { inputs: { text: "Testing evaluation" } },
  ],
  jobs: [textAnalyzer],
  evaluators: [
    {
      name: "length-check",
      // A scorer returns a value plus an explanation for the score.
      scorer: async ({ output }) => {
        const passesCheck = output.length > 10;
        return {
          value: passesCheck ? 1 : 0,
          explanation: passesCheck
            ? "Output length is sufficient"
            : `Output too short (${output.length} chars, need >10)`,
        };
      },
    },
  ],
});
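
Because jobs is an array, parallel execution follows naturally: pass several jobs and EvaluatorQ runs them concurrently, per the feature list above. Here is a minimal sketch with two independent jobs; the scorer shape mirrors the example above:

import { evaluatorq, job } from "@orq-ai/evaluatorq";

// Two independent jobs over the same data; EvaluatorQ runs them concurrently.
const wordCounter = job("word-counter", async (data) =>
  data.inputs.text.split(" ").length,
);
const charCounter = job("char-counter", async (data) => data.inputs.text.length);

await evaluatorq("parallel-analysis", {
  data: [{ inputs: { text: "Hello world" } }],
  jobs: [wordCounter, charCounter],
  evaluators: [
    {
      name: "non-empty",
      // Scores each job's numeric output; shape mirrors the example above.
      scorer: async ({ output }) => ({
        value: output > 0 ? 1 : 0,
        explanation: output > 0 ? "Job produced a result" : "Empty result",
      }),
    },
  ],
});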
📘

To learn more, see the EvaluatorQ repository on GitHub.