Introduction

orq.ai exposes an API to manage Evaluators programmatically. This page covers the common use cases for creating, listing, and using Evaluators through the API.

Prerequisite

To get started, you need an API key to use with the SDKs or the HTTP API.
To get an API key, see Authentication.

SDKs

SDKs are available for Node.js and Python.

Creating an Evaluator

To create an Evaluator, we’ll use the Create an Evaluator API call. We then need to decide which type of Evaluator to create:

HTTP Evaluator

Here is a valid payload to create an HTTP evaluator:
{
  "type": "http_eval",
  "method": "POST",
  "headers": {
    "header-key": "header-value"
  },
  "payload": {
    "body-key": "body-value"
  },
  "url": "https://myevaluatorendpoint.com/api",
  "path": "Default/Evaluators",
  "key": "MyEvaluator"
}
To learn more about building HTTP Evaluators, see Creating an HTTP Evaluator.

JSON Evaluator

Here is a valid payload to create a JSON evaluator:
Make sure to correctly escape the JSON Schema payload.
{
  "guardrail_config": {
    "enabled": true,
    "type": "boolean",
    "value": true
  },
  "type": "json_schema",
  "schema": "{   \"$schema\": \"http://json-schema.org/draft-07/schema#\",   \"$id\": \"https://example.com/person.schema.json\",   \"title\": \"Person\",   \"description\": \"A person object\",   \"type\": \"object\",   \"properties\": {     \"firstName\": {       \"type\": \"string\",       \"description\": \"The person's first name\"     },     \"lastName\": {       \"type\": \"string\",       \"description\": \"The person's last name\"     },     \"age\": {       \"type\": \"integer\",       \"minimum\": 0,       \"maximum\": 150,       \"description\": \"Age in years\"     },     \"email\": {       \"type\": \"string\",       \"format\": \"email\",       \"description\": \"Email address\"     },     \"address\": {       \"type\": \"object\",       \"properties\": {         \"street\": {           \"type\": \"string\"         },         \"city\": {           \"type\": \"string\"         },         \"zipCode\": {           \"type\": \"string\",           \"pattern\": \"^[0-9]{5}(-[0-9]{4})?$\"         }       },       \"required\": [\"street\", \"city\", \"zipCode\"]     },     \"hobbies\": {       \"type\": \"array\",       \"items\": {         \"type\": \"string\"       },       \"uniqueItems\": true     }   },   \"required\": [\"firstName\", \"lastName\", \"email\"],   \"additionalProperties\": false }"
}
To learn more about building JSON Evaluators, see Creating a JSON Evaluator.
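Rather than escaping the schema by hand, you can produce the `schema` string programmatically. A minimal Python sketch (using a trimmed-down version of the Person schema above for brevity): serializing the schema object with `json.dumps` yields the escaped string form the payload expects.

```python
import json

# A trimmed-down version of the Person schema shown above.
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Person",
    "type": "object",
    "properties": {"firstName": {"type": "string"}},
    "required": ["firstName"],
}

payload = {
    "guardrail_config": {"enabled": True, "type": "boolean", "value": True},
    "type": "json_schema",
    # json.dumps turns the schema object into an escaped JSON string,
    # which is the form the "schema" field expects.
    "schema": json.dumps(schema),
}
```

Serializing the full payload afterwards (for example with `json.dumps(payload)`) produces a request body equivalent to the example above.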

LLM Evaluator

Here is a valid payload to create an LLM evaluator:
{
  "type": "llm_eval",
  "prompt": "Give a number response from 0 to 1, 0 for inappropriate, 1 for perfectly appropriate {{log.output}}",
  "path": "Default/evaluators",
  "model": "openai/gpt-4o",
  "key": "myKey"
}
To learn more about building LLM Evaluators, see Creating an LLM Evaluator.

Python Evaluator

Here’s a valid Python Evaluator:
Use \n to indicate newlines in code.
{
  "type": "python_eval",
  "path": "Default/Evaluators",
  "key": "MyEvaluator",
  "code": "def evaluate(log):\n  output_size = len(log[\"output\"])\n  reference_size = len(log[\"reference\"])\n  return abs(output_size - reference_size)\n"
}
To learn more about building Python Evaluators, see Creating a Python Evaluator.
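The `code` field above, once unescaped, is an ordinary Python function. A quick local sanity check of what it returns for a sample log:

```python
# The evaluate function from the "code" field above, unescaped.
def evaluate(log):
    output_size = len(log["output"])
    reference_size = len(log["reference"])
    return abs(output_size - reference_size)

# "Hello world" is 11 characters and "Hello" is 5, so the score is 6.
score = evaluate({"output": "Hello world", "reference": "Hello"})
```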

Guardrail Configuration

For each Evaluator payload, you can also define a guardrail configuration and include it in the creation payload:
{
  "guardrail_config": {
    "enabled": true,
    "type": "number", // "number" or "boolean"
    "value": 5, // must match the declared type
    "operator": "lte" // the comparison operator applied to the value
  }
}
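To make the semantics concrete, here is an illustrative Python sketch of how such a guardrail compares an evaluator result against the configured value. The operator set is an assumption for illustration; only "lte" and "gt" actually appear in the examples on this page.

```python
import operator

# Assumed operator names; only "lte" and "gt" are confirmed by this page.
OPERATORS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}

def guardrail_passes(config, result):
    """Illustrative check: does an evaluator result satisfy the guardrail?"""
    if not config.get("enabled", False):
        return True  # a disabled guardrail never blocks
    if config["type"] == "boolean":
        return result == config["value"]
    # Numeric guardrail: compare the result against the configured value.
    return OPERATORS[config["operator"]](result, config["value"])
```

With the example config above ("lte" against 5), an evaluator score of 3 passes while a score of 8 does not.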

Calling the API

Here’s an example end-to-end API call and response:
curl --request POST \
     --url https://api.orq.ai/v2/evaluators \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY' \
     --header 'content-type: application/json' \
     --data '
{
  "guardrail_config": {
    "enabled": true,
    "type": "number",
    "value": 5,
    "operator": "lte"
  },
  "type": "python_eval",
  "path": "Default/Evaluators",
  "key": "MyEvaluator",
  "code": "def evaluate(log):\n  output_size = len(log[\"output\"])\n  reference_size = len(log[\"reference\"])\n  return abs(output_size - reference_size)\n"
}
'
The expected response is the following:
{
  "_id":"EVALUATOR_ID",
  "key":"MyEvaluator",
  "description":"",
  "created":"2025-06-26T11:37:02.132Z",
  "updated":"2025-06-26T11:37:02.132Z",
  "type":"python_eval",
  "code":"def evaluate(log):\n  output_size = len(log[\"output\"])\n  reference_size = len(log[\"reference\"])\n  return abs(output_size - reference_size)\n"
}

Listing Evaluators

To list Evaluators, we use the List Evaluators API and make the following call:
curl --request GET \
     --url https://api.orq.ai/v2/evaluators \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY'
The resulting payload is the following:
{
  "object": "list",
  "data": [
    {
      "_id": "EVALUATOR_ID",
      "key": "BERT Score",
      "description": "Computes the similarity of two sentences as a sum of cosine similarities between their tokens embeddings",
      "created": "2024-12-16T12:36:40.359Z",
      "updated": "2024-12-16T12:36:40.359Z",
      "guardrail_config": {
        "enabled": false,
        "type": "number",
        "value": 0.3,
        "operator": "gt"
      },
      "type": "function_eval",
      "function_params": {
        "type": "bert_score"
      }
    },
    ...
  ]
}
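A small sketch of pulling an Evaluator's `_id` out of this list response by its `key`, which is useful whenever a call needs an Evaluator ID:

```python
def find_evaluator_id(list_response, key):
    """Return the _id of the Evaluator with the given key, or None."""
    for evaluator in list_response.get("data", []):
        if evaluator.get("key") == key:
            return evaluator["_id"]
    return None

# The example response above, abbreviated to the fields we need.
sample = {
    "object": "list",
    "data": [{"_id": "EVALUATOR_ID", "key": "BERT Score"}],
}
```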

Using Evaluators

Calling an Evaluator from the Library

As an example, we'll call the Tone of Voice endpoint. Here is an example call:
The query defines how the evaluator judges the given output.
curl --request POST \
     --url https://api.orq.ai/v2/evaluators/tone_of_voice \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <ORQ_API_KEY>' \
     --header 'content-type: application/json' \
     --data '
{
  "query": "Validate the tone of voice if it is professional.",
  "output": "Hello, how are you ??",
  "model": "openai/gpt-4o"
}
'
Here is the result returned by the API.
The value field holds the result of the evaluator call for the given query.
{
  "value": {
    "value": false,
    "explanation": "The output does not align with a professional tone. The use of 'Hello, how are you ??' is informal and lacks the formality expected in professional communication. The double question marks and casual greeting are more suited to a casual or friendly context rather than a professional one. A professional tone would require a more formal greeting and a clear purpose for the communication."
  }
}

Calling a custom evaluator

It is also possible to call a custom Evaluator built on orq through the API. You can fetch the Evaluator ID for this call by searching for Evaluators with the Get all Evaluators API. Then run the following API call:
curl 'https://api.orq.ai/v2/evaluators/<evaluator_id>/invoke' \
-H 'Authorization: Bearer <ORQ_API_KEY>' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
--data-raw '{
    "query": "Your input text",
    "output": "Your output text",
    "reference": "Optional reference text",
    "messages": [
        {
            "role": "user",
            "content": "Your message"
        }
    ],
    "retrievals": ["Your retrieval content"]
}' \
--compressed
Finally, in the Orq studio, find the View Code button on your Evaluator page. The following modal opens:
This code calls the current Evaluator through the API. Ensure the payload contains all the data the Evaluator needs to execute correctly.

Guardrail Error Response

When a guardrail blocks a generation in a Deployment or Agent, Orq.ai returns an HTTP 422 Unprocessable Entity response with the following payload:
{
  "code": 422,
  "error": "Validation failed: Not all guardrails were met while validating the response.",
  "message": "Validation failed: Not all guardrails were met while validating the response.",
  "source": "system",
  "guardrails": [
    {
      "id": "01KMR75R90XDA80020YT8MHP2W",
      "status": "completed",
      "started_at": "2026-03-27T17:58:55.330Z",
      "finished_at": "2026-03-27T17:58:55.364Z",
      "related_entities": [
        {
          "type": "evaluator",
          "evaluator_id": "01KK9D8Z0JCEC1ASQJH8R28B57",
          "evaluator_metric_name": "python_evaluator"
        }
      ],
      "passed": false,
      "reason": null,
      "evaluator_type": "output_guardrail",
      "type": "boolean",
      "value": false
    }
  ]
}
Each guardrail entry contains the following fields:
  • id (string): Internal ID of the guardrail result.
  • status (string): Execution status of the guardrail: "completed" or "failed".
  • started_at (string): ISO 8601 timestamp when the guardrail evaluation started.
  • finished_at (string): ISO 8601 timestamp when the guardrail evaluation finished.
  • related_entities (array): References to the evaluator that ran as this guardrail. Each entry contains type, evaluator_id, and evaluator_metric_name.
  • passed (boolean): false for every entry in this error response, as the guardrail’s condition was not met. For example, a boolean guardrail configured to pass on true returns passed: false when the evaluator returns false.
  • reason (string or null): Explanation of the failure, when provided by the evaluator.
  • evaluator_type (string): "input_guardrail" if the guardrail ran before the model (request rejected before generation); "output_guardrail" if the guardrail ran after generation (response withheld).
  • type (string): The value type returned by the evaluator: "boolean" or "number".
  • value (boolean or number): The raw value returned by the evaluator.
When the evaluator fails to execute: If the evaluator itself fails to run (for example, a network error contacting an external HTTP evaluator, or a timeout), the guardrail is silently skipped and the generation proceeds. A broken evaluator does not block your users. Monitor skipped guardrail executions through Traces in the Orq.ai Studio.

When an LLM guardrail’s underlying model fails: If the model powering an LLM guardrail is unavailable, Orq.ai fails the entire request for safety. Since the guardrail could not run, there is no way to know whether it would have blocked the generation, so Orq.ai errs on the side of caution.
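Client-side, a guardrail block can be detected and reported from this error payload. A hedged sketch, assuming the response body matches the example above:

```python
def blocked_guardrail_metrics(status_code, body):
    """Collect the evaluator metric names of guardrails that blocked the call."""
    if status_code != 422:
        return []
    return [
        entity["evaluator_metric_name"]
        for guardrail in body.get("guardrails", [])
        if not guardrail["passed"]
        for entity in guardrail["related_entities"]
        if entity["type"] == "evaluator"
    ]

# The example error body from above, abbreviated to the fields we need.
error_body = {
    "code": 422,
    "guardrails": [
        {
            "passed": False,
            "related_entities": [
                {
                    "type": "evaluator",
                    "evaluator_id": "01KK9D8Z0JCEC1ASQJH8R28B57",
                    "evaluator_metric_name": "python_evaluator",
                }
            ],
        }
    ],
}
```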

Using EvaluatorQ

EvaluatorQ is a dedicated SDK for using Evaluators within your application. It features the following capabilities:
  • Parallel Execution: Run multiple evaluation jobs concurrently with progress tracking
  • Flexible Data Sources: Support for inline data, promises, and Orq platform datasets
  • Type-safe: Fully written in TypeScript
Installation:
npm install @orq-ai/evaluatorq
Usage example:
import { evaluatorq, job } from "@orq-ai/evaluatorq";

const textAnalyzer = job("text-analyzer", async (data) => {
  const text = data.inputs.text;
  const analysis = {
    length: text.length,
    wordCount: text.split(" ").length,
    uppercase: text.toUpperCase(),
  };

  return analysis;
});

await evaluatorq("text-analysis", {
  data: [
    { inputs: { text: "Hello world" } },
    { inputs: { text: "Testing evaluation" } },
  ],
  jobs: [textAnalyzer],
  evaluators: [
    {
      name: "length-check",
      scorer: async ({ output }) => {
        const passesCheck = output.length > 10;
        return {
          value: passesCheck ? 1 : 0,
          explanation: passesCheck
            ? "Output length is sufficient"
            : `Output too short (${output.length} chars, need >10)`,
        };
      },
    },
  ],
});
To learn more, see the Python EvaluatorQ and TypeScript EvaluatorQ repositories.