# Create Evaluators

> Automate LLM output assessment with custom evaluators. Create LLM-as-a-Judge, HTTP, JSON, and Python evaluators via the AI Studio or API, with MCP support for LLM and Python evaluators.

**Evaluators** are automated tools that assess model outputs within [Experiments](/docs/experiments/overview), [Deployments](/docs/deployments/overview), and [Agents](/docs/agents/build). They verify outputs against reference data, enforce compliance criteria, and power **Guardrails** that block non-compliant generations before they reach users.

Four evaluator types are available.

<CardGroup cols={2}>
  <Card title="LLM Evaluator" icon="robot" href="#llm-evaluator">
    Use a model to judge outputs against any criteria you define in a prompt.
  </Card>

  <Card title="Python Evaluator" icon="python" href="#python-evaluator">
    Write custom Python code for full flexibility. Use for statistical scoring, regex checks, length validation, or any custom evaluation logic.
  </Card>

  <Card title="HTTP Evaluator" icon="globe" href="#http-evaluator">
    Call an external API to evaluate outputs. Use for business-specific compliance checks, custom scoring services, or domain-specific validations.
  </Card>

  <Card title="JSON Evaluator" icon="brackets-curly" href="#json-evaluator">
    Validate model outputs against a JSON Schema. Use to enforce correct payload structure for incoming or outgoing model responses.
  </Card>
</CardGroup>

## Use Cases

<AccordionGroup>
  <Accordion title="Automated quality scoring" icon="star">
    Score model outputs on dimensions like tone, accuracy, or relevance without manual review. Use LLM-as-a-Judge evaluators with custom rubrics, or import pre-built scoring functions from the [Hub](/docs/hub/overview).
  </Accordion>

  <Accordion title="Output compliance checks" icon="shield-check">
    Verify that outputs meet specific format, content, or structural requirements. Use JSON evaluators for schema validation, Python evaluators for custom logic, or HTTP evaluators to call your own compliance APIs.
  </Accordion>

  <Accordion title="Guardrails in Deployments and Agents" icon="lock">
    Attach evaluators as guardrails to block generations that fail a pass condition. Input guardrails run before the model; output guardrails run after. A failed guardrail returns HTTP 422 to the caller.
  </Accordion>

  <Accordion title="Regression testing in Experiments" icon="flask">
    Run evaluators across a full dataset in an Experiment to track quality over time. Compare evaluator scores across runs and prompt variants to catch regressions before deploying changes.
  </Accordion>
</AccordionGroup>

## LLM Evaluator

LLM Evaluators use a model to judge outputs against any criteria you define in a prompt.

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    In a [Project](/docs/projects/overview) or folder, click the `+` button and select **LLM Evaluator**. Select the model to use for evaluation. It must be enabled in the [AI Router](/docs/model-garden/overview).
  </Tab>

  <Tab title="API & SDK" icon="code">
    Use the [Create an Evaluator API](/reference/evaluators/create-an-evaluator).

    ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
      "type": "llm_eval",
      "prompt": "Rate how appropriate the following output is on a scale from 0 to 1, where 0 is inappropriate and 1 is perfectly appropriate. Only output the number.\n\n[OUTPUT] {{log.output}}",
      "path": "Default/evaluators",
      "model": "openai/gpt-4o",
      "key": "myKey",
      "guardrail_config": {
        "enabled": true,
        "type": "number",
        "value": 0.7,
        "operator": "gte"
      }
    }
    ```
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    **Retrieve an evaluator's configuration:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the current configuration for the "tone-scorer" evaluator
    ```

    The assistant uses `search_entities` to resolve the evaluator ID, then `get_llm_eval` to retrieve the full configuration including prompt, model, and output type.

    ***

    **Create an LLM evaluator:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an LLM-as-a-Judge evaluator that scores responses on tone: professional, neutral, or aggressive
    ```

    The assistant uses `create_llm_eval` with a categorical scoring rubric and confirms the evaluator ID.

    ***

    **Update an existing LLM evaluator:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Update the "tone-scorer" evaluator to also check for formal language and return a boolean instead of a number
    ```

    The assistant uses `search_entities` to find the evaluator, then `update_llm_eval` with the updated `prompt` and `output_type: "boolean"`.
  </Tab>
</Tabs>

### Configure Prompt

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Your prompt has access to the following **string** variables:

    * `{{log.input}}`: the last message sent to the model
    * `{{log.output}}`: the output response generated by the evaluated model
    * `{{log.messages}}`: all messages sent to the model, excluding the last message
    * `{{input.all_messages}}`: the full conversation, including the last user message
    * `{{log.retrievals}}`: [Knowledge Base](/docs/knowledge/overview) retrievals
    * `{{log.reference}}`: the reference the output is compared against
    * `{{output.tools_called}}`: a numbered, human-readable summary of each tool call made during the run, with arguments and responses
    * `{{log.tool_calls}}`: alias of `{{output.tools_called}}`

    <Note>
      Nested indexing such as `{{log.tool_calls[0].tool_name}}` resolves only on the Python evaluator runtime. On the Go evaluator runtime the value is the rendered string above.
    </Note>
  </Tab>
</Tabs>
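
To make the substitution concrete, here is a minimal local sketch of how `{{log.*}}` placeholders could be filled in before a prompt reaches the judge model. It is not Orq.ai's renderer: the `render_prompt` helper and the sample `variables` dictionary are hypothetical, and leaving unknown placeholders untouched is an assumption made for this example.

```python
import re

# Matches placeholders such as {{log.output}} or {{ input.all_messages }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} with its string value; keep unknown names as-is."""
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values standing in for a real log entry.
variables = {
    "log.input": "What is the refund window?",
    "log.output": "You can request a refund within 30 days.",
}

template = "Rate the formality of the output.\n\n[OUTPUT] {{log.output}}"
print(render_prompt(template, variables))
```

Rendering a prompt locally like this can help you spot a misspelled variable name before you test the evaluator in the Studio.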

### Output Types

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Choose which type of output your model evaluation will provide. The output type also determines how the evaluator can be used as a Guardrail.

    <Tabs>
      <Tab title="Boolean" icon="bars">
        The model returns a **True** or **False** response. Use this for binary pass/fail checks.

        **Guardrail**: Select **True** or **False**. The guardrail passes when the model returns the selected value.
      </Tab>

      <Tab title="Number" icon="hashtag">
        The model returns a numeric score. Use any scale that fits your use case (e.g. 1-5, 0-100).

        **Guardrail**: Enter a threshold in **Pass if greater or equal than**. The guardrail passes when the score meets or exceeds the threshold.
      </Tab>

      <Tab title="Categorical" icon="grid-2">
        The model classifies the output into one of your predefined labels.

        When you select **Categorical**, a label editor appears below the output type selector. Add one label per row: enter a **Value** (the exact string the model must return) and an optional **Description** to guide the model. At least one label is required.

        **Guardrail**: Select one or more values in **Pass if output is one of**. The guardrail passes when the model's output matches any of the selected labels.

        <Frame caption="Configure which categorical labels must match for the guardrail to pass.">
          <img src="https://mintcdn.com/orqai/7yBnkUrxNQ0b0A6G/images/guardrail-categorical.png?fit=max&auto=format&n=7yBnkUrxNQ0b0A6G&q=85&s=84746abccab181820e86aa02c7b658aa" alt="Categorical guardrail configuration" width="415" height="441" data-path="images/guardrail-categorical.png" />
        </Frame>
      </Tab>

      <Tab title="String" icon="font">
        The model returns a free-form string response. Not available as a guardrail.
      </Tab>
    </Tabs>
  </Tab>
</Tabs>
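
The pass conditions for the three guardrail-capable output types can be summarized in a short local sketch. It illustrates the rules described above and is not Orq.ai's implementation; the function name `guardrail_passes` and its parameters are hypothetical.

```python
def guardrail_passes(result, output_type: str, expected) -> bool:
    """Return True when an evaluator result satisfies the pass condition."""
    if output_type == "boolean":
        # expected is the selected value (True or False).
        return isinstance(result, bool) and result is expected
    if output_type == "number":
        # expected is the threshold; pass when the score meets or exceeds it.
        if isinstance(result, bool) or not isinstance(result, (int, float)):
            return False
        return result >= expected
    if output_type == "categorical":
        # expected is the collection of labels selected as passing values.
        return isinstance(result, str) and result in expected
    # String outputs are not available as guardrails.
    raise ValueError(f"{output_type!r} cannot be used as a guardrail")


print(guardrail_passes(0.8, "number", 0.7))
print(guardrail_passes("aggressive", "categorical", {"professional", "neutral"}))
```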

### Examples

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    <AccordionGroup>
      <Accordion title="Evaluating formality on a 1-5 scale" icon="sliders">
        ```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
        Rate the formality of the following output on a scale of 1 to 5:
        - 1: Very casual/informal
        - 5: Very formal/professional

        Only output the number.

        [OUTPUT] {{log.output}}
        ```
      </Accordion>

      <Accordion title="Evaluating accuracy on a 0-100 scale" icon="bullseye">
        ```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
        Evaluate how accurate the response [OUTPUT] is compared to the query [INPUT].

        Score from 0 to 100, where:
        - 0: Completely inaccurate or irrelevant
        - 50: Partially accurate
        - 100: Perfectly accurate and complete

        Only output the score as a number.

        [INPUT] {{log.input}}
        [OUTPUT] {{log.output}}
        ```
      </Accordion>

      <Accordion title="Binary pass/fail with numeric output" icon="circle-check">
        ```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
        Evaluate if the response adequately answers the user's question.

        Return 1 if the response is satisfactory, 0 if it is not.

        [QUESTION] {{log.input}}
        [RESPONSE] {{log.output}}
        ```
      </Accordion>
    </AccordionGroup>
  </Tab>
</Tabs>

### Testing

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Configure the LLM payload in the Studio Playground:

    <Frame caption="Configure the LLM payload that will be sent to the evaluator.">
      <img src="https://mintcdn.com/orqai/XbJWQ7lqn4sIVHea/images/docs/d04c8a4424879b761bdbb59bb58193a0cb562cf0abc1d01cd7cc526af5c3b431-Screenshot_2025-06-27_at_11.12.01.png?fit=max&auto=format&n=XbJWQ7lqn4sIVHea&q=85&s=4555fbee055923f79fbfd15778d24a51" alt="Studio Playground panel for configuring the LLM payload sent to an LLM evaluator." width="806" height="558" data-path="images/docs/d04c8a4424879b761bdbb59bb58193a0cb562cf0abc1d01cd7cc526af5c3b431-Screenshot_2025-06-27_at_11.12.01.png" />
    </Frame>

    Click **Run** to execute the evaluator. The result appears in the **Response** field.

    <Frame caption="An LLM Evaluator test response.">
      <img src="https://mintcdn.com/orqai/ep9iJPTKd6tE7QFF/images/docs/a2c8694931c114e9305160eac7f7aedd285d3f07e6789090e84c226bf0ea090c-Screenshot_2025-06-27_at_11.23.38.png?fit=max&auto=format&n=ep9iJPTKd6tE7QFF&q=85&s=12ce3f6d102258f6dd7de0021709ad91" alt="Response field showing the result of an LLM evaluator test run." width="820" height="494" data-path="images/docs/a2c8694931c114e9305160eac7f7aedd285d3f07e6789090e84c226bf0ea090c-Screenshot_2025-06-27_at_11.23.38.png" />
    </Frame>

    <Info>
      Once created, this evaluator is available as a guardrail in **Deployments** and **Agents**. See [Evaluators and Guardrails in Deployments](/docs/deployments/creating#evaluators-and-guardrails) and [Evaluators and Guardrails in Agents](/docs/agents/agent-studio#evaluators-and-guardrails) to learn more.
    </Info>
  </Tab>
</Tabs>

## Python Evaluator

Python Evaluators let you write custom **Python code** for maximum flexibility: from simple validations (regex, length checks) to complex analyses (statistical scoring, custom algorithms).

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    In a [Project](/docs/projects/overview) or folder, click the `+` button and select **Python Evaluator**. You are taken to the code editor. Your evaluation function has access to the following fields from the evaluated model's log:

    * `log["input"]` `<str>`: the last message sent to generate the output
    * `log["output"]` `<str>`: the generated response from the model
    * `log["reference"]` `<str>`: the reference the output is compared against
    * `log["messages"]` `list<str>`: all previous messages sent to the model
    * `log["retrievals"]` `list<str>`: all [Knowledge Base](/docs/knowledge/overview) retrievals

    The evaluator can return two response types:

    * **Number**: return a numeric score
    * **Boolean**: return a true/false value

    For example, the following evaluator returns the absolute difference in length between the output and the reference:

    ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    def evaluate(log):
        output_size = len(log["output"])
        reference_size = len(log["reference"])
        return abs(output_size - reference_size)
    ```

    <Info>
      You can define multiple functions in the code editor. The last function defined is the entry point the Evaluator calls when it runs.
    </Info>
  </Tab>

  <Tab title="API & SDK" icon="code">
    Use the [Create an Evaluator API](/reference/evaluators/create-an-evaluator). The `code` field is a single JSON string, so write newlines as `\n` and escape double quotes as `\"`.

    ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
      "type": "python_eval",
      "path": "Default/Evaluators",
      "key": "MyEvaluator",
      "code": "def evaluate(log):\n  output_size = len(log[\"output\"])\n  reference_size = len(log[\"reference\"])\n  return abs(output_size - reference_size)\n",
      "guardrail_config": {
        "enabled": true,
        "type": "number",
        "value": 10,
        "operator": "lte"
      }
    }
    ```
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    **Retrieve a Python evaluator's configuration:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the current configuration for the "json-validator" evaluator
    ```

    The assistant uses `search_entities` to resolve the evaluator ID, then `get_python_eval` to retrieve the full configuration including code and output type.

    ***

    **Create a Python evaluator:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a Python evaluator that checks whether the response contains a valid JSON object
    ```

    The assistant writes a Python snippet that parses the response and validates JSON structure, then uses `create_python_eval` to register it in your workspace.

    ***

    **Update a Python evaluator:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Update the "json-validator" evaluator to also check that the JSON contains a "status" field
    ```

    The assistant uses `search_entities` to find the evaluator, then `update_python_eval` with the updated `code`.
  </Tab>
</Tabs>
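
Hand-escaping the `code` string is error-prone. A minimal sketch, assuming you build the request body in Python: keep the evaluator as an ordinary multi-line string and let `json.dumps` produce the escaped form. The field names mirror the API example above; the variable names `EVALUATOR_SOURCE`, `payload`, and `body` are illustrative only.

```python
import json

# Evaluator source, written as ordinary multi-line Python.
EVALUATOR_SOURCE = '''def evaluate(log):
    output_size = len(log["output"])
    reference_size = len(log["reference"])
    return abs(output_size - reference_size)
'''

payload = {
    "type": "python_eval",
    "path": "Default/Evaluators",
    "key": "MyEvaluator",
    "code": EVALUATOR_SOURCE,
    "guardrail_config": {
        "enabled": True,
        "type": "number",
        "value": 10,
        "operator": "lte",
    },
}

# json.dumps escapes newlines as \n and double quotes as \" automatically.
body = json.dumps(payload, indent=2)
print(body)
```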

### Environment and Libraries

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    The Python Evaluator runs in **Python 3.12** with the following preloaded libraries:

    ```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
    numpy==1.26.4
    nltk==3.9.1
    json
    re
    ```
  </Tab>
</Tabs>
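
As an illustration of the `json` and `re` modules listed above, the following sketch returns a Boolean: it passes when the output contains a JSON object with a `status` field. The helper name `extract_json_object` is hypothetical, and the explicit `import` lines are included so the snippet also runs locally; whether they are required in the editor is not stated here.

```python
import json
import re

# Greedy match from the first "{" to the last "}" in the text.
JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def extract_json_object(text):
    """Return the object spanning the first "{" to the last "}", or None."""
    match = JSON_OBJECT.search(text)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Entry point: defined last so the Evaluator calls it.
def evaluate(log):
    parsed = extract_json_object(log["output"])
    return parsed is not None and "status" in parsed
```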

### Guardrail Configuration

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Within a [Deployment](/docs/deployments/overview) or [Agent](/docs/agents/overview), use your Python Evaluator as a Guardrail to block generations that don't meet your custom evaluation logic.

    Use the **Pass condition** to define when the guardrail passes:

    * **Boolean evaluators**: select **True** or **False**. The guardrail passes when your function returns the selected value.
    * **Number evaluators**: enter a score threshold. The guardrail passes when your function's return value is greater than or equal to the threshold.
  </Tab>
</Tabs>

### Testing

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Configure the Python payload in the Studio Playground:

    <Frame caption="Configure the payload that will be sent to the Python evaluator.">
      <img src="https://mintcdn.com/orqai/EqUGDI2og-dnTmDI/images/docs/654f199a1a1ae3287e719db4c52f61b1c1725eed4bd92fb64368ed351d8daa51-Screenshot_2025-06-27_at_11.31.08.png?fit=max&auto=format&n=EqUGDI2og-dnTmDI&q=85&s=9a06182982b2b5a27ef630ef5c2f80f6" alt="Studio Playground panel for configuring the payload sent to a Python evaluator." width="830" height="546" data-path="images/docs/654f199a1a1ae3287e719db4c52f61b1c1725eed4bd92fb64368ed351d8daa51-Screenshot_2025-06-27_at_11.31.08.png" />
    </Frame>

    Click **Run** to execute the evaluator. The result appears in the **Response** field.

    <Frame caption="A Python test response.">
      <img src="https://mintcdn.com/orqai/8ublVIDMeb653NWy/images/docs/2892c6189ecf781ba25353fac32d5bba1d7a03ea9b04b1fd437301b80c5c2c6a-Screenshot_2025-06-27_at_11.31.10.png?fit=max&auto=format&n=8ublVIDMeb653NWy&q=85&s=9e8571c1b6449522eb9e6c464d79ab25" alt="Response field showing the result of a Python evaluator test run." width="820" height="480" data-path="images/docs/2892c6189ecf781ba25353fac32d5bba1d7a03ea9b04b1fd437301b80c5c2c6a-Screenshot_2025-06-27_at_11.31.10.png" />
    </Frame>
  </Tab>
</Tabs>

## HTTP Evaluator

HTTP evaluators call an external API to perform evaluation, enabling flexible assessments using your own or third-party endpoints. Use them for business-specific compliance checks, custom quality scoring, or domain-specific validations.

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    In a [Project](/docs/projects/overview) or folder, click the `+` button and select **HTTP Evaluator**. Define the following:

    | Field       | Description                                               |
    | ----------- | --------------------------------------------------------- |
    | **URL**     | The API endpoint.                                         |
    | **Headers** | Key-value pairs for HTTP headers sent during evaluation.  |
    | **Payload** | Key-value pairs for the HTTP body sent during evaluation. |
  </Tab>

  <Tab title="API & SDK" icon="code">
    Use the [Create an Evaluator API](/reference/evaluators/create-an-evaluator).

    ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
      "type": "http_eval",
      "method": "POST",
      "headers": {
        "header-key": "header-value"
      },
      "payload": {
        "body-key": "body-value"
      },
      "url": "https://myevaluatorendpoint.com/api",
      "path": "Default/Evaluators",
      "key": "MyEvaluator",
      "guardrail_config": {
        "enabled": true,
        "type": "number",
        "value": 5,
        "operator": "gte"
      }
    }
    ```
  </Tab>
</Tabs>

**Payload Detail**

The following variables are accessible in the payload sent to your endpoint:

```jsonc theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "query": "",           // last message sent to the model
  "response": "",        // assistant-generated response
  "expected_output": "", // dataset reference for the evaluation
  "retrieved_context": [] // knowledge base retrievals
}
```

**Expected Response Payload**

For an HTTP Evaluator to be valid, your endpoint must return a response payload in one of the following formats. If the response matches none of them, **Orq.ai** ignores the evaluator during processing.

<CodeGroup>
  ```json Boolean theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
      "type": "boolean",
      "value": true
  }
  ```

  ```json Number theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
      "type": "number",
      "value": 1
  }
  ```

  ```json String theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "type": "string",
    "value": "This response passed all compliance checks"
  }
  ```
</CodeGroup>
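
The request and response shapes above can be sketched as a small endpoint built on Python's standard library. This is an assumption-laden example rather than a reference implementation: the scoring rule in `score_request`, the handler class, and the `make_server` helper are all hypothetical, and a production endpoint would also need authentication and HTTPS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_request(payload: dict) -> dict:
    """Hypothetical check: 1 if the response contains the expected output, else 0."""
    response = payload.get("response", "")
    expected = payload.get("expected_output", "")
    matched = bool(expected) and expected.lower() in response.lower()
    return {"type": "number", "value": 1 if matched else 0}


class EvaluatorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            payload = None
        if not isinstance(payload, dict):
            self.send_error(400, "Request body must be a JSON object")
            return
        # Reply in the "number" format shown above.
        body = json.dumps(score_request(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build the server without starting it; call serve_forever() to run."""
    return HTTPServer(("127.0.0.1", port), EvaluatorHandler)
```

To try it locally, call `make_server().serve_forever()` and point the evaluator **URL** at an address where the server is reachable.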

### Guardrail Configuration

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Within a [Deployment](/docs/deployments/overview) or [Agent](/docs/agents/overview), you can use your HTTP Evaluator as a Guardrail to block responses based on the value returned by your endpoint.

    Use the **Pass condition** to set a numeric threshold. The guardrail passes when the value returned by your endpoint is greater than or equal to the threshold.
  </Tab>
</Tabs>

### Testing

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    A Playground is available in the Studio to test your evaluator against any output before using it in an Experiment or Deployment.

    Configure the request fields:

    <Frame caption="Configure all fields that will be sent to the evaluator.">
      <img src="https://mintcdn.com/orqai/ep9iJPTKd6tE7QFF/images/docs/ac154d257ac277a26bae5cabf622e7b8d3da59cf6e6060e7dbdcf11790122351-Screenshot_2025-06-27_at_11.12.01.png?fit=max&auto=format&n=ep9iJPTKd6tE7QFF&q=85&s=dd9f9c98329951357764bc6848070fc8" alt="Studio Playground panel for configuring the request fields sent to an HTTP evaluator." width="806" height="558" data-path="images/docs/ac154d257ac277a26bae5cabf622e7b8d3da59cf6e6060e7dbdcf11790122351-Screenshot_2025-06-27_at_11.12.01.png" />
    </Frame>

    Click **Run** to execute the evaluator. The result appears in the **Response** field.

    <Frame caption="An HTTP test response.">
      <img src="https://mintcdn.com/orqai/EqUGDI2og-dnTmDI/images/docs/90df89905f761bd97537cac54b99562397d23767c3bd79b1204d3004f71e1c3c-Screenshot_2025-06-27_at_11.25.22.png?fit=max&auto=format&n=EqUGDI2og-dnTmDI&q=85&s=4bac65d8451a98c7e2975edf13794ca0" alt="Response field showing the result of an HTTP evaluator test run." width="836" height="488" data-path="images/docs/90df89905f761bd97537cac54b99562397d23767c3bd79b1204d3004f71e1c3c-Screenshot_2025-06-27_at_11.25.22.png" />
    </Frame>
  </Tab>
</Tabs>

## JSON Evaluator

JSON Evaluators validate model outputs against a JSON Schema, ensuring correct payload structure for incoming or outgoing model responses.

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    In a [Project](/docs/projects/overview) or folder, click the `+` button and select **JSON Evaluator**. Specify a **JSON Schema** that defines which fields are required and their types. For example:

    ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "description": "The post title"
        },
        "length": {
          "type": "integer",
          "description": "The post length"
        }
      },
      "required": [ "title", "length" ]
    }
    ```
  </Tab>

  <Tab title="API & SDK" icon="code">
    Use the [Create an Evaluator API](/reference/evaluators/create-an-evaluator). The `schema` field takes the JSON Schema as a serialized string. Quote characters must be escaped as `\"`.

    ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
      "guardrail_config": {
        "enabled": true,
        "type": "boolean",
        "value": true
      },
      "type": "json_schema",
      "schema": "{   \"$schema\": \"http://json-schema.org/draft-07/schema#\",   \"$id\": \"https://example.com/person.schema.json\",   \"title\": \"Person\",   \"description\": \"A person object\",   \"type\": \"object\",   \"properties\": {     \"firstName\": {       \"type\": \"string\",       \"description\": \"The person's first name\"     },     \"lastName\": {       \"type\": \"string\",       \"description\": \"The person's last name\"     },     \"age\": {       \"type\": \"integer\",       \"minimum\": 0,       \"maximum\": 150,       \"description\": \"Age in years\"     },     \"email\": {       \"type\": \"string\",       \"format\": \"email\",       \"description\": \"Email address\"     }   },   \"required\": [\"firstName\", \"lastName\", \"email\"],   \"additionalProperties\": false }"
    }
    ```
  </Tab>
</Tabs>
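
Because the `schema` field is a string inside a JSON document, the schema ends up serialized twice. A minimal sketch, assuming the request body is built in Python: serialize the schema once for the field, then serialize the whole payload. The variable names are illustrative, and the schema is the short example from the AI Studio tab.

```python
import json

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "The post title"},
        "length": {"type": "integer", "description": "The post length"},
    },
    "required": ["title", "length"],
}

payload = {
    "type": "json_schema",
    # The schema is sent as a string, so it is serialized once here...
    "schema": json.dumps(schema),
    "guardrail_config": {"enabled": True, "type": "boolean", "value": True},
}

# ...and its quotes are escaped as \" when the whole payload is serialized.
body = json.dumps(payload)
print(body)
```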

### Guardrail Configuration

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Within a [Deployment](/docs/deployments/overview) or [Agent](/docs/agents/overview), use your JSON Evaluator as a Guardrail to block payloads that don't validate against the given JSON Schema. Enable the Guardrail toggle to block non-conforming payloads.
  </Tab>
</Tabs>

### Testing

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Configure the JSON payload in the Studio Playground:

    <Frame caption="Configure the JSON payload that will be sent to the evaluator.">
      <img src="https://mintcdn.com/orqai/XbJWQ7lqn4sIVHea/images/docs/ccf8dea2516c5da0298dc9227b2ef5a979880861f3e0c936bcd057610282e6d1-Screenshot_2025-06-27_at_11.28.30.png?fit=max&auto=format&n=XbJWQ7lqn4sIVHea&q=85&s=4624e74d9df4cad6bc3ec0cfeb9fd3a8" alt="Studio Playground panel for configuring the JSON payload sent to a JSON evaluator." width="820" height="554" data-path="images/docs/ccf8dea2516c5da0298dc9227b2ef5a979880861f3e0c936bcd057610282e6d1-Screenshot_2025-06-27_at_11.28.30.png" />
    </Frame>

    Click **Run** to execute the evaluator. The result appears in the **Response** field.

    <Frame caption="A JSON test response.">
      <img src="https://mintcdn.com/orqai/ep9iJPTKd6tE7QFF/images/docs/9ac09f00d61e085dfb2467abf49431f0589311ccb98b6bb4ec6ebf678220e549-Screenshot_2025-06-27_at_11.28.15.png?fit=max&auto=format&n=ep9iJPTKd6tE7QFF&q=85&s=58abbd3e1b38199d268aa60ecd598439" alt="Response field showing the result of a JSON evaluator test run." width="796" height="478" data-path="images/docs/9ac09f00d61e085dfb2467abf49431f0589311ccb98b6bb4ec6ebf678220e549-Screenshot_2025-06-27_at_11.28.15.png" />
    </Frame>
  </Tab>
</Tabs>

## Versions

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    When you are done editing, click **Publish** to save your changes. You will be prompted to write a commit message and choose a version bump:

    <Frame caption="Publish a new version of your Evaluator.">
      <img src="https://mintcdn.com/orqai/4EPXiu89-sAKjNI7/images/evaluator-publish.png?fit=max&auto=format&n=4EPXiu89-sAKjNI7&q=85&s=dd165b40b98d2b38da312ece11fc2bea" alt="Evaluator publish" width="516" height="366" data-path="images/evaluator-publish.png" />
    </Frame>

    * **Patch** (e.g. `v1.0.0` to `v1.0.1`): small fixes, no behaviour change
    * **Minor** (e.g. `v1.0.0` to `v1.1.0`): new functionality, backwards compatible
    * **Major** (e.g. `v1.0.0` to `v2.0.0`): breaking change or significant rework

    The **Versions** tab shows the full history with author and publish timestamp for each version.

    <Frame caption="Evaluator versions.">
      <img src="https://mintcdn.com/orqai/4EPXiu89-sAKjNI7/images/evaluators-versions.png?fit=max&auto=format&n=4EPXiu89-sAKjNI7&q=85&s=29095d162b428bac42c76bc00c1ed120" alt="Evaluator versions" width="577" height="411" data-path="images/evaluators-versions.png" />
    </Frame>

    Each published version has three action buttons:

    | Action      | Icon                        | Description                                                                                     |
    | ----------- | --------------------------- | ----------------------------------------------------------------------------------------------- |
    | Compare     | <Icon icon="right-left" />  | Open a diff view to see what changed between versions                                           |
    | Code        | <Icon icon="code" />        | Load a code snippet to invoke the evaluator at this exact version                               |
    | Environment | <Icon icon="layer-group" /> | Tag the version with an [Environment](/docs/administer/environments) (e.g. production, staging) |

    <Tip>
      Reference a specific version by appending `@` and the version number: `my-evaluator@1.0.1`. Reference an environment tag directly: `my-evaluator@production`. Without a suffix, the latest published version is used.
    </Tip>
  </Tab>
</Tabs>
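
The version bumps and the `@` reference format can be illustrated with two small helpers. This is a local sketch only: `parse_reference` and `bump` are hypothetical names, and the platform resolves references and assigns version numbers itself.

```python
from typing import Optional, Tuple


def parse_reference(reference: str) -> Tuple[str, Optional[str]]:
    """Split "key@suffix" into (key, suffix); suffix is None when absent."""
    key, separator, suffix = reference.partition("@")
    return key, (suffix if separator else None)


def bump(version: str, part: str) -> str:
    """Apply a patch, minor, or major bump to a version such as "v1.0.0"."""
    major, minor, patch = (int(n) for n in version.lstrip("v").split("."))
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {part!r}")


print(parse_reference("my-evaluator@production"))
print(bump("v1.0.0", "minor"))
```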

## List Evaluators

<Tabs>
  <Tab title="API & SDK" icon="code">
    Use the [List Evaluators API](/reference/evaluators/get-all-evaluators):

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request GET \
           --url https://api.orq.ai/v2/evaluators \
           --header 'accept: application/json' \
           --header 'authorization: Bearer ORQ_API_KEY'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      from orq_ai_sdk import Orq
      import os

      with Orq(api_key=os.getenv("ORQ_API_KEY", "")) as orq:
          res = orq.evals.all(limit=10)
          print(res)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import { Orq } from "@orq-ai/node";

      const orq = new Orq({ apiKey: process.env["ORQ_API_KEY"] ?? "" });

      const result = await orq.evals.all({});
      console.log(result);
      ```
    </CodeGroup>
  </Tab>
</Tabs>

## Invoke an Evaluator

<Tabs>
  <Tab title="API & SDK" icon="code">
    **Call a library evaluator** (for example, the **Tone of Voice** evaluator):

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/evaluators/tone_of_voice \
           --header 'accept: application/json' \
           --header 'authorization: Bearer ORQ_API_KEY' \
           --header 'content-type: application/json' \
           --data '{
        "query": "Validate the tone of voice if it is professional.",
        "output": "Hello, how are you ??",
        "model": "openai/gpt-4o"
      }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      from orq_ai_sdk import Orq
      import os

      with Orq(api_key=os.getenv("ORQ_API_KEY", "")) as orq:
          res = orq.evals.tone_of_voice(request={
              "query": "Validate the tone of voice if it is professional.",
              "output": "Hello, how are you ??",
              "model": "openai/gpt-4o"
          })
          print(res)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import { Orq } from "@orq-ai/node";

      const orq = new Orq({ apiKey: process.env["ORQ_API_KEY"] ?? "" });

      const result = await orq.evals.toneOfVoice({
          query: "Validate the tone of voice if it is professional.",
          output: "Hello, how are you ??",
          model: "openai/gpt-4o",
      });
      console.log(result);
      ```
    </CodeGroup>

    **Call a custom evaluator**: fetch the evaluator ID from the [List Evaluators API](/reference/evaluators/get-all-evaluators), then invoke it. Use the **View Code** button on your evaluator page in the AI Studio to get a pre-filled snippet.

    <Frame caption="The View Code button provides a ready-to-use invocation snippet.">
      <img src="https://mintcdn.com/orqai/ep9iJPTKd6tE7QFF/images/docs/a2dff60969c3a7632443bc2d8ed47a6275725ff4ee06d6d170f45655f8cf1bd5-image.png?fit=max&auto=format&n=ep9iJPTKd6tE7QFF&q=85&s=02c00a10411ab93f60252425b86d8d08" alt="Evaluator page with the View Code button exposing a pre-filled invocation snippet." width="885" height="923" data-path="images/docs/a2dff60969c3a7632443bc2d8ed47a6275725ff4ee06d6d170f45655f8cf1bd5-image.png" />
    </Frame>

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl 'https://api.orq.ai/v2/evaluators/<evaluator_id>/invoke' \
      -H 'Authorization: Bearer ORQ_API_KEY' \
      -H 'Content-Type: application/json' \
      -H 'Accept: application/json' \
      --data-raw '{
          "query": "Your input text",
          "output": "Your output text",
          "reference": "Optional reference text",
          "messages": [{"role": "user", "content": "Your message"}],
          "retrievals": ["Your retrieval content"]
      }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      from orq_ai_sdk import Orq
      import os

      orq = Orq(api_key=os.getenv("ORQ_API_KEY", ""))

      evaluation = orq.evals.invoke(
          id="01JN5J8W4J5JP8ZSD0TADK11GJ",  # example ID; replace with your evaluator ID
          query="Your input text",
          output="Your output text",
          reference="Optional reference text",
          messages=[{"role": "user", "content": "Your message"}],
          retrievals=["Your retrieval content"]
      )
      print(evaluation)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import { Orq } from "@orq-ai/node";

      const orq = new Orq({ apiKey: process.env["ORQ_API_KEY"] ?? "" });

      const evaluation = await orq.evals.invoke({
          id: "01JN5J8W4J5JP8ZSD0TADK11GJ", // example ID; replace with your evaluator ID
          requestBody: {
              query: "Your input text",
              output: "Your output text",
              reference: "Optional reference text",
              messages: [{ role: "user", content: "Your message" }],
              retrievals: ["Your retrieval content"]
          }
      });
      console.log(evaluation);
      ```
    </CodeGroup>
  </Tab>
</Tabs>

## Guardrail Error Response

When a guardrail evaluation fails, **Orq.ai** responds with HTTP `422 Unprocessable Entity`. For Deployments, the response body lists every guardrail that did not pass. Agent responses omit that list.

<Tabs>
  <Tab title="Deployments">
    ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
      "code": 422,
      "error": "Validation failed: Not all guardrails were met while validating the response.",
      "message": "Validation failed: Not all guardrails were met while validating the response.",
      "source": "system",
      "guardrails": [
        {
          "id": "01KMR75R90XDA80020YT8MHP2W",
          "status": "completed",
          "started_at": "2026-03-27T17:58:55.330Z",
          "finished_at": "2026-03-27T17:58:55.364Z",
          "related_entities": [
            {
              "type": "evaluator",
              "evaluator_id": "01KK9D8Z0JCEC1ASQJH8R28B57",
              "evaluator_metric_name": "python_evaluator"
            }
          ],
          "passed": false,
          "reason": null,
          "evaluator_type": "output_guardrail",
          "type": "boolean",
          "value": false
        }
      ]
    }
    ```

    | Field              | Type                       | Description                                                                                                            |
    | ------------------ | -------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
    | `id`               | string                     | Internal ID of the guardrail result.                                                                                   |
    | `status`           | string                     | Execution status: `"completed"` or `"failed"`.                                                                         |
    | `started_at`       | string                     | ISO 8601 timestamp when the guardrail evaluation started.                                                              |
    | `finished_at`      | string                     | ISO 8601 timestamp when the guardrail evaluation finished.                                                             |
    | `related_entities` | array                      | References to the evaluator that ran. Each entry contains `type`, `evaluator_id`, and `evaluator_metric_name`.         |
    | `passed`           | boolean                    | `false` for every entry in this error response.                                                                        |
    | `reason`           | string or null             | Explanation of the failure, when provided by the evaluator.                                                            |
    | `evaluator_type`   | string                     | `"input_guardrail"` if the guardrail ran before the model. `"output_guardrail"` if the guardrail ran after generation. |
    | `type`             | string                     | The value type returned by the evaluator: `"boolean"`, `"number"`, or `"categorical"`.                                 |
    | `value`            | boolean, number, or string | The raw value returned by the evaluator.                                                                               |
  </Tab>

  <Tab title="Agents">
    ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
      "code": 422,
      "error": "Validation failed: Not all guardrails were met while validating the messages.",
      "message": "Validation failed: Not all guardrails were met while validating the messages.",
      "source": "system"
    }
    ```

    The `guardrails` array is not included in Agent responses. Use [Traces](/docs/observability/traces) in the AI Studio to identify which guardrail failed.
  </Tab>
</Tabs>
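
On the client side, the `422` body can be reduced to a short summary of what failed. The following is a minimal sketch, not an official SDK helper: `summarize_guardrail_failures` is a hypothetical name, and the sample payload is the Deployments example shown above. It relies only on the fields documented in the table and returns an empty list when the `guardrails` array is absent, as in Agent responses.

```python
import json
from datetime import datetime, timedelta
from typing import Any, Optional


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is mapped to UTC."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_guardrail_failures(body: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one summary per failed guardrail in a 422 response body."""
    summaries = []
    for guardrail in body.get("guardrails", []):  # absent in Agent responses
        if guardrail.get("passed"):
            continue
        started = _parse_ts(guardrail.get("started_at"))
        finished = _parse_ts(guardrail.get("finished_at"))
        duration_ms = (
            (finished - started) // timedelta(milliseconds=1)
            if started and finished
            else None
        )
        summaries.append({
            "evaluators": [
                entity.get("evaluator_metric_name")
                for entity in guardrail.get("related_entities", [])
                if entity.get("type") == "evaluator"
            ],
            "stage": guardrail.get("evaluator_type"),
            "value": guardrail.get("value"),
            "reason": guardrail.get("reason"),
            "duration_ms": duration_ms,
        })
    return summaries


# Sample body copied from the Deployments example above.
sample_body = json.loads("""
{
  "code": 422,
  "error": "Validation failed: Not all guardrails were met while validating the response.",
  "message": "Validation failed: Not all guardrails were met while validating the response.",
  "source": "system",
  "guardrails": [
    {
      "id": "01KMR75R90XDA80020YT8MHP2W",
      "status": "completed",
      "started_at": "2026-03-27T17:58:55.330Z",
      "finished_at": "2026-03-27T17:58:55.364Z",
      "related_entities": [
        {
          "type": "evaluator",
          "evaluator_id": "01KK9D8Z0JCEC1ASQJH8R28B57",
          "evaluator_metric_name": "python_evaluator"
        }
      ],
      "passed": false,
      "reason": null,
      "evaluator_type": "output_guardrail",
      "type": "boolean",
      "value": false
    }
  ]
}
""")

print(summarize_guardrail_failures(sample_body))
```

How the body is obtained depends on the HTTP client or SDK in use, so the sketch starts from an already-decoded dictionary.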

<Info>
  **When the evaluator fails to execute:** If the evaluator itself fails to run (for example, a network error or timeout), the guardrail is silently skipped and the generation proceeds. Monitor skipped guardrail executions through [Traces](/docs/observability/traces).

  **When an LLM guardrail's underlying model fails:** If the model powering an LLM guardrail is unavailable, **Orq.ai** fails the entire request for safety. Since the guardrail could not run, there is no way to know whether it would have blocked the generation.
</Info>

## Evaluatorq

**Evaluatorq** is a dedicated SDK for running evaluations programmatically. It supports parallel job execution, flexible data sources (inline, CSV, Orq datasets), and syncs results to the **Orq.ai** AI Studio.

<Tabs>
  <Tab title="API & SDK" icon="code">
    **Install:**

    <CodeGroup>
      ```bash Node.js theme={"theme":{"light":"github-light","dark":"github-dark"}}
      npm install @orq-ai/evaluatorq
      ```

      ```bash Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      pip install evaluatorq
      ```
    </CodeGroup>

    **Usage example:**

    <CodeGroup>
      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import { evaluatorq, job } from "@orq-ai/evaluatorq";

      const textAnalyzer = job("text-analyzer", async (data) => {
          const text = data.inputs.text;
          return {
              length: text.length,
              wordCount: text.split(" ").length,
              uppercase: text.toUpperCase(),
          };
      });

      await evaluatorq("text-analysis", {
          data: [
              { inputs: { text: "Hello world" } },
              { inputs: { text: "Testing evaluation" } },
          ],
          jobs: [textAnalyzer],
          evaluators: [
              {
                  name: "length-check",
                  scorer: async ({ output }) => {
                      const passesCheck = output.length > 10;
                      return {
                          value: passesCheck ? 1 : 0,
                          explanation: passesCheck
                              ? "Output length is sufficient"
                              : `Output too short (${output.length} chars, need >10)`,
                      };
                  },
              },
          ],
      });
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import asyncio
      from evaluatorq import evaluatorq, job, DataPoint, EvaluationResult

      @job("text-analyzer")
      async def text_analyzer(data: DataPoint, row: int):
          text = data.inputs["text"]
          return {
              "length": len(text),
              "word_count": len(text.split()),
              "uppercase": text.upper(),
          }

      async def length_check_scorer(params):
          output = params["output"]
          passes_check = output["length"] > 10
          return EvaluationResult(
              value=1 if passes_check else 0,
              explanation=(
                  "Output length is sufficient"
                  if passes_check
                  else f"Output too short ({output['length']} chars, need >10)"
              )
          )

      async def main():
          await evaluatorq(
              "text-analysis",
              data=[
                  DataPoint(inputs={"text": "Hello world"}),
                  DataPoint(inputs={"text": "Testing evaluation"}),
              ],
              jobs=[text_analyzer],
              evaluators=[{"name": "length-check", "scorer": length_check_scorer}],
          )

      if __name__ == "__main__":
          asyncio.run(main())
      ```
    </CodeGroup>

    <Info>
      See the [Python Evaluatorq](https://github.com/orq-ai/orqkit/tree/main/packages/evaluatorq-py) and [TypeScript Evaluatorq](https://github.com/orq-ai/orqkit/tree/main/packages/evaluatorq) packages on GitHub for more details.
    </Info>

    <Card title="Cookbook: Running evaluations in parallel with Evaluatorq" icon="flask" href="/docs/tutorials/evaluator-q">
      Step-by-step walkthrough comparing agent variants with parallel evaluators, including DeepEval and RAGAS integration.
    </Card>
  </Tab>
</Tabs>
