MCP Integration

Access your Orq.ai workspace directly from Claude Code. Manage experiments, query traces, and configure agents using natural language.

MCP

Claude Code is Anthropic’s official CLI that brings Claude’s capabilities to your terminal and development workflow. With the Orq MCP integration, you can access all Orq.ai features directly through Claude Code’s conversational interface.

Prerequisites

Before installing, make sure you have:
  • Claude Code installed and available on your PATH
  • An Orq.ai API key

Installation

Add the Orq MCP server to Claude Code with a single command:
claude mcp add --transport http orq https://my.orq.ai/v2/mcp --header "Authorization: Bearer ${ORQ_API_KEY}"
Make sure to set your ORQ_API_KEY environment variable before running the command:
export ORQ_API_KEY="your-api-key-here"
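If the variable is empty when the MCP is added, the server is registered with a broken Authorization header. As a safeguard, you can gate the install on the key being present; a minimal sketch (the `require_env` helper is our own convenience function, not part of Claude Code):

```shell
# require_env: fail with a clear message when the named variable is unset or empty.
require_env() {
  eval "val=\${$1}"
  if [ -z "$val" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# Placeholder value for illustration; use your real key in practice.
export ORQ_API_KEY="your-api-key-here"

# Only register the server once the key check passes.
require_env ORQ_API_KEY && echo "ORQ_API_KEY present, safe to run claude mcp add"
```

The same helper can be reused before any command that interpolates `${ORQ_API_KEY}`.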

Verify Installation

Check that the Orq MCP is installed:
claude mcp list
You should see orq in the list of available MCP servers.

Available Commands

Once integrated, you can ask Claude Code to perform these operations:
  • Create an agent with custom instructions and tools
  • Get agent configuration for [agent-key]
  • Update agent [agent-key] with new instructions or model
  • Configure agent with evaluators and guardrails
  • Get analytics overview for my workspace
  • Show me workspace metrics for the last 7 days
  • Query analytics filtered by deployment ID
  • Create a dataset called "customer-queries"
  • List all datapoints in dataset [dataset-key]
  • Add datapoints to dataset [dataset-key]
  • Update datapoint [datapoint-id]
  • Delete specific datapoints in dataset [dataset-key]
  • Delete dataset [dataset-key]
  • Create an experiment from dataset [dataset-key]
  • List all experiment runs
  • Export experiment run [run-id] as CSV
  • Run experiment and auto-evaluate results
  • Get evaluator configuration for [evaluator-key]
  • Create an LLM-as-a-Judge evaluator for tone
  • Create a Python evaluator to check response length
  • Add evaluator to experiment [experiment-key]
  • Update evaluator [evaluator-key] with a new prompt
  • Update Python evaluator [evaluator-key] with revised code
  • List traces from the last 24 hours
  • Show me traces with errors
  • Get span details for trace [trace-id]
  • Find the slowest traces from today
  • Show all traces for thread [thread-id]
  • List all available chat models
  • List all available embedding models
  • List registry keys for filtering traces
  • List top values for [attribute-key]
  • Search the Orq.ai docs for [topic]
  • Delete agent [agent-key]
  • Delete experiment [experiment-key]
  • Delete evaluator [evaluator-key]
  • Delete prompt [prompt-key]
  • Delete knowledge base [knowledge-base-key]
Use delete_dataset to delete a dataset along with all its datapoints.

Usage Examples

Create an Experiment

Create an experiment called "GPT-5.2 vs Claude Sonnet 4.6 Comparison" using the "customer-queries" dataset
Claude Code will:
  1. Use search_entities to find the “customer-queries” dataset
  2. Use create_experiment with the specified name and dataset ID
  3. Configure task columns with GPT-5.2 and Claude Sonnet 4.6 models
  4. Return the experiment ID and configuration details

Query Trace Analytics

Has my system thrown any errors in the last 24 hours?
Claude Code will:
  1. Calculate the time range for the last 24 hours
  2. Use list_traces with error status filter
  3. Analyze the error data
  4. Provide a summary of total error count, error types and frequencies, affected traces, and time distribution
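Step 1 above is a plain timestamp computation. A sketch of how such a rolling 24-hour range could be derived in the shell (GNU `date` syntax; on macOS/BSD use `date -u -v-24H` instead of `-d '24 hours ago'`):

```shell
# End of the window: now, in UTC, ISO-8601.
end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# Start of the window: 24 hours earlier (GNU date relative-date syntax).
start=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
echo "error traces between $start and $end"
```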

Create a Synthetic Dataset

Create a dataset called "Product Questions" with 50 synthetic customer questions about e-commerce products
Claude Code will:
  1. Generate 50 synthetic customer questions about e-commerce products
  2. Use create_dataset to create a new dataset named “Product Questions”
  3. Use create_datapoints to add all 50 questions to the dataset
  4. Confirm creation with the dataset ID and summary
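To get a feel for the payload shape, here is a rough sketch of generating question datapoints as JSON lines before handing them to the dataset tools. The `inputs`/`question` field names are assumptions for illustration, not the documented create_datapoints schema:

```shell
# Emit one JSON object per synthetic question (3 shown here instead of 50).
for i in $(seq 1 3); do
  printf '{"inputs": {"question": "Sample e-commerce product question %s"}}\n' "$i"
done > datapoints.jsonl

# One line per datapoint.
wc -l < datapoints.jsonl
```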

Performance Analysis

Has my system's performance improved or decreased over the past week?
Claude Code will:
  1. Use query_analytics with a 7-day time range
  2. Analyze average latency trends over time
  3. Review token usage patterns and cost variations
  4. Compare error rate changes across the week
  5. Provide insights on model performance comparisons and trends

Complete Experiment Creation

I have a CSV file with 100 customer queries. Create a dataset, add an LLM evaluator for tone and accuracy, then run an experiment comparing GPT-5.2 and Claude Sonnet 4.6
Claude Code will:
  1. Read and parse your CSV file
  2. Use create_dataset to create a new dataset with an auto-generated name
  3. Use create_datapoints to add all 100 customer queries from the CSV
  4. Use create_llm_eval to create an LLM-as-a-Judge evaluator for tone
  5. Use create_llm_eval again to create an LLM-as-a-Judge evaluator for accuracy
  6. Use create_experiment with the dataset ID and auto-run enabled
  7. Configure two task columns (one for GPT-5.2, one for Claude Sonnet 4.6)
  8. Execute the experiment automatically via the auto-run option
  9. Summarize the results with evaluation scores for both models
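Steps 1-3 amount to turning CSV rows into datapoints. A rough shell sketch of that transformation, assuming a single `query` header column (the header name and JSON field names are illustrative, and the quoting is naive; real CSV values containing quotes or commas need proper escaping):

```shell
# A small stand-in for your CSV file.
cat > queries.csv <<'EOF'
query
How do I reset my password?
What is your return policy?
EOF

# Skip the header row, wrap each remaining row as a datapoint object.
tail -n +2 queries.csv | while IFS= read -r q; do
  printf '{"inputs": {"query": "%s"}}\n' "$q"
done > datapoints.jsonl

wc -l < datapoints.jsonl
```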

Trace Investigation

Show me the 10 slowest traces from yesterday and explain what might be causing the latency
Claude Code will:
  1. Calculate yesterday’s date range
  2. Use list_traces with latency sorting (descending) and limit of 10
  3. Use list_spans to retrieve span information for each trace
  4. Analyze the execution patterns and span durations
  5. Provide performance insights identifying bottlenecks
  6. Suggest optimization opportunities based on the data
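Step 1 here is a calendar-day window rather than the rolling window used earlier. A sketch with GNU `date` (macOS/BSD would use `date -u -v-1d +%Y-%m-%d`):

```shell
# Yesterday as a calendar date in UTC (GNU date).
day=$(date -u -d 'yesterday' +%Y-%m-%d)
start="${day}T00:00:00Z"
end="${day}T23:59:59Z"
echo "slowest traces between $start and $end"
```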

Troubleshooting

If authentication fails:
  1. Verify your API key is valid: echo $ORQ_API_KEY
  2. Check the API key has the necessary permissions
  3. Re-add the MCP with the correct API key

If the server is unreachable:
  1. Verify the endpoint URL is correct
  2. Check your internet connection
  3. Try removing and re-adding the integration

If the MCP does not appear or respond:
  1. Get MCP server details: claude mcp get orq
  2. Verify the MCP is properly installed: claude mcp list

Skills

Skills extend Claude Code with pre-built agentic workflows for the full Build, Evaluate, Optimize lifecycle. See the Skills page for the full reference.

Installation

# Installs skills, commands, agents, and the MCP server in one step
claude plugin marketplace add orq-ai/claude-plugins
claude plugin install orq-skills@orq-claude-plugin

Commands

Quick slash-command actions available in Claude Code:
| Command | Description |
| --- | --- |
| /orq:quickstart | Interactive onboarding: credentials, MCP setup, skills tour |
| /orq:workspace | Workspace overview: agents, deployments, prompts, datasets |
| /orq:traces | Query and summarize traces with filters |
| /orq:models | List available AI models by provider |
| /orq:analytics | Usage analytics: requests, cost, tokens, errors |

Available Skills

Skills are triggered by describing what you need; Claude Code picks the right skill automatically.

| Skill | Description |
| --- | --- |
| build-agent | Design, create, and configure an Orq.ai agent |
| build-evaluator | Create validated LLM-as-a-Judge evaluators |
| analyze-trace-failures | Read production traces and categorize failures |
| run-experiment | Create and run experiments with evaluation |
| generate-synthetic-dataset | Generate and curate evaluation datasets |
| optimize-prompt | Analyze and optimize system prompts |
| setup-observability | Instrument LLM applications with Orq.ai tracing: AI Router for zero-code traces, or OpenTelemetry for framework-level spans |
| compare-agents | Run cross-framework agent comparisons using evaluatorq |

AI Router (Beta)

Set the following environment variables before launching Claude Code. Once set, every model call Claude Code makes is automatically routed through the Orq.ai AI Router for the duration of that session.
export ANTHROPIC_BASE_URL="https://my.orq.ai/v3/router/"
export ANTHROPIC_AUTH_TOKEN="$ORQ_API_KEY"
export ANTHROPIC_API_KEY=""  # must be set to empty to prevent Claude Code from using the Anthropic API directly
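If you prefer not to export these variables globally, the same configuration can be scoped to a single run by prefixing the launch command; this is equivalent to the exports above and applies only to that one `claude` invocation:

```shell
# One-off routed session: the variables apply only to this claude process.
ANTHROPIC_BASE_URL="https://my.orq.ai/v3/router/" \
ANTHROPIC_AUTH_TOKEN="${ORQ_API_KEY}" \
ANTHROPIC_API_KEY="" \
claude
```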
Traces are not yet available for Claude Code routed through the AI Router.