What are Annotations in the AI Studio?

Annotations allow you to capture structured human feedback on AI responses directly in the AI Studio. Domain experts and reviewers can annotate traces with quality assessments, corrections, and custom feedback to improve your AI applications over time.

Adding Annotations

Annotation capabilities differ between Logs and Traces: Logs support both human feedback and corrections, while Traces support only human feedback annotations.
To annotate a trace, navigate to the Traces view and select a single trace. The Annotations panel opens, letting you apply human feedback to the AI response.

Using Human Reviews

Annotations are based on Human Review definitions that you create. Human Reviews define the structure and validation rules for your annotations:
  • Key: Unique identifier for the annotation (e.g., “rating”, “defects”)
  • Type: The data type (string, number, or array of strings)
  • Options: Available choices for select-type annotations
Create Human Reviews in Project Settings > Human Review before adding annotations. Each annotation key must match an existing Human Review.
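To make the key/type/options structure concrete, the sketch below models Human Review definitions as simple schemas and validates annotations against them. This is an illustrative assumption only: the dict layout and the `validate_annotation` helper are not the AI Studio's actual data model or API.

```python
# Hypothetical sketch: Human Review definitions as simple schemas.
# The dict layout and validator are illustrative assumptions, not
# the AI Studio's actual data model.

HUMAN_REVIEWS = {
    "rating":  {"type": "string", "options": ["good", "bad"]},
    "defects": {"type": "array",  "options": ["grammatical", "spelling", "hallucination"]},
    "score":   {"type": "number", "options": None},  # free numeric value
    "comment": {"type": "string", "options": None},  # free-form text
}

def validate_annotation(key, value):
    """Check an annotation against its Human Review definition."""
    review = HUMAN_REVIEWS.get(key)
    if review is None:
        # Each annotation key must match an existing Human Review.
        raise ValueError(f"No Human Review defined for key {key!r}")
    rtype, options = review["type"], review["options"]
    if rtype == "array":
        if not isinstance(value, list):
            raise TypeError(f"{key!r} expects an array of strings")
        bad = [v for v in value if options and v not in options]
    elif rtype == "number":
        if not isinstance(value, (int, float)):
            raise TypeError(f"{key!r} expects a number")
        bad = []
    else:  # string
        if not isinstance(value, str):
            raise TypeError(f"{key!r} expects a string")
        bad = [value] if options and value not in options else []
    if bad:
        raise ValueError(f"Invalid option(s) for {key!r}: {bad}")
    return True
```

An annotation such as `("rating", "good")` passes, while `("rating", "excellent")` is rejected because it is not among the defined options.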

Common Annotation Types

Rating

Rate the overall quality of AI responses:
Rating    Description
good      The response was helpful and accurate.
bad       The response was unhelpful or inaccurate.

Defects

Identify specific issues with AI responses:
Defect           Description
grammatical      Responses that contain grammatical errors
spelling         Responses that contain spelling errors
hallucination    Responses that contain hallucinations or factual inaccuracies
repetition       Responses that contain unnecessary repetition
inappropriate    Responses that are deemed inappropriate or offensive
off_topic        Responses that do not address the user’s query
incompleteness   Responses that are incomplete or partially address the query
ambiguity        Responses that are vague or unclear
You can select multiple defects for one response by using an array-type Human Review.
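As a data sketch, an array-type defects annotation could carry several values at once. The payload shape below (`key`/`value` fields) is an illustrative assumption, not the actual schema:

```python
# Hypothetical payload for an array-type "defects" Human Review.
# Field names are illustrative assumptions, not the actual schema.
defects_annotation = {
    "key": "defects",
    "value": ["spelling", "repetition"],  # multiple defects on one response
}

# The defect options defined for this Human Review (from the table above).
DEFECT_OPTIONS = {
    "grammatical", "spelling", "hallucination", "repetition",
    "inappropriate", "off_topic", "incompleteness", "ambiguity",
}

# Every selected defect must be one of the defined options.
assert all(d in DEFECT_OPTIONS for d in defects_annotation["value"])
```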

Custom Annotations

You can create custom annotation types for your specific use cases by defining custom Human Reviews. This allows you to track domain-specific quality metrics, compliance requirements, or any other human feedback relevant to your AI application.

Programmatic Annotations

Annotations can also be submitted programmatically via the API. Learn more in Annotations via the API.
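The exact endpoint and payload are documented in Annotations via the API. Purely as an illustration, a programmatic submission might be assembled like this; the URL, header names, and body fields below are placeholders, not the real API:

```python
import json

def build_annotation_request(trace_id, key, value, api_key):
    """Assemble a hypothetical HTTP request for submitting an annotation.
    The URL and field names are placeholders -- consult the API docs for
    the real endpoint and schema."""
    return {
        "method": "POST",
        "url": f"https://example.com/api/traces/{trace_id}/annotations",  # placeholder
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"key": key, "value": value}),
    }

req = build_annotation_request("trace_123", "rating", "good", "YOUR_API_KEY")
# An HTTP client such as `requests` could then send it:
# requests.post(req["url"], headers=req["headers"], data=req["body"])
```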

Annotation Queues

For systematic human review workflows, use Annotation Queues to organize and manage large volumes of traces that need human feedback. Queues automatically filter traces based on your criteria and streamline the review process.
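Conceptually, a queue's filter criteria select which traces land in front of reviewers. The sketch below is an assumption about how such filtering might behave (the trace fields and `build_queue` helper are hypothetical), not the product's implementation:

```python
# Hypothetical sketch of queue filtering: traces matching the criteria
# are enqueued for human review. Trace fields are illustrative.
traces = [
    {"id": "t1", "latency_ms": 900,  "annotated": False},
    {"id": "t2", "latency_ms": 3200, "annotated": False},
    {"id": "t3", "latency_ms": 2800, "annotated": True},
]

def build_queue(traces, min_latency_ms=2000):
    """Enqueue slow, not-yet-annotated traces for review."""
    return [t["id"] for t in traces
            if t["latency_ms"] >= min_latency_ms and not t["annotated"]]

queue = build_queue(traces)  # -> ["t2"]
```

Here only `t2` is enqueued: `t1` is fast enough to skip, and `t3` has already been annotated.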