MCP Integration

Access your Orq.ai workspace directly from Claude Code. Manage experiments, query traces, and configure agents using natural language.

MCP

Claude Code is Anthropic’s official CLI that brings Claude’s capabilities to your terminal and development workflow. With the Orq MCP integration, you can access all Orq.ai features directly through Claude Code’s conversational interface.

Prerequisites

Before installing, make sure you have:
  • Claude Code installed and available on your PATH
  • An Orq.ai API key

Installation

Add the Orq MCP server to Claude Code with a single command:
claude mcp add --transport http orq https://my.orq.ai/v2/mcp --header "Authorization: Bearer ${ORQ_API_KEY}"
Make sure to set your ORQ_API_KEY environment variable before running the command:
export ORQ_API_KEY="your-api-key-here"
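If the variable is empty when the MCP is added, the server is registered with a broken Authorization header. As a safeguard, you can gate the install on the key being present; a minimal sketch (the `require_env` helper is our own convenience function, not part of Claude Code):

```shell
# require_env: fail with a clear message when the named variable is unset or empty.
require_env() {
  eval "val=\${$1}"
  if [ -z "$val" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# Placeholder value for illustration; use your real key in practice.
export ORQ_API_KEY="your-api-key-here"

# Only register the server once the key check passes.
require_env ORQ_API_KEY && echo "ORQ_API_KEY present, safe to run claude mcp add"
```

The same helper can be reused before any command that interpolates `${ORQ_API_KEY}`.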

Verify Installation

Check that the Orq MCP is installed:
claude mcp list
You should see orq in the list of available MCP servers.

Available Commands

Once integrated, you can ask Claude Code to perform these operations:
  • Create an agent with custom instructions and tools
  • Get agent configuration for [agent-key]
  • Update agent [agent-key] with new instructions or model
  • Configure agent with evaluators and guardrails
  • Get analytics overview for my workspace
  • Show me workspace metrics for the last 7 days
  • Query analytics filtered by deployment ID
  • Create a dataset called "customer-queries"
  • List all datapoints in dataset [dataset-key]
  • Add datapoints to dataset [dataset-key]
  • Update datapoint [datapoint-id]
  • Delete specific datapoints in dataset [dataset-key]
  • Delete dataset [dataset-key]
  • Create an experiment from dataset [dataset-key]
  • List all experiment runs
  • Export experiment run [run-id] as CSV
  • Run experiment and auto-evaluate results
  • Get evaluator configuration for [evaluator-key]
  • Create an LLM-as-a-Judge evaluator for tone
  • Create a Python evaluator to check response length
  • Add evaluator to experiment [experiment-key]
  • Update evaluator [evaluator-key] with a new prompt
  • Update Python evaluator [evaluator-key] with revised code
  • List traces from the last 24 hours
  • Show me traces with errors
  • Get span details for trace [trace-id]
  • Find the slowest traces from today
  • Show all traces for thread [thread-id]
  • List all available chat models
  • List all available embedding models
  • List registry keys for filtering traces
  • List top values for [attribute-key]
  • Search the Orq.ai docs for [topic]
  • Delete agent [agent-key]
  • Delete experiment [experiment-key]
  • Delete evaluator [evaluator-key]
  • Delete prompt [prompt-key]
  • Delete knowledge base [knowledge-base-key]
Use delete_dataset to delete a dataset along with all its datapoints.

Usage Examples

Create an Experiment

Create an experiment called "GPT-5.2 vs Claude Sonnet 4.6 Comparison" using the "customer-queries" dataset
Claude Code will:
  1. Use search_entities to find the “customer-queries” dataset
  2. Use create_experiment with the specified name and dataset ID
  3. Configure task columns with GPT-5.2 and Claude Sonnet 4.6 models
  4. Return the experiment ID and configuration details

Query Trace Analytics

Has my system thrown any errors in the last 24 hours?
Claude Code will:
  1. Calculate the time range for the last 24 hours
  2. Use list_traces with error status filter
  3. Analyze the error data
  4. Provide a summary of total error count, error types and frequencies, affected traces, and time distribution
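Step 1 above is a plain timestamp computation. A sketch of how such a rolling 24-hour range could be derived in the shell (GNU `date` syntax; on macOS/BSD use `date -u -v-24H` instead of `-d '24 hours ago'`):

```shell
# End of the window: now, in UTC, ISO-8601.
end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# Start of the window: 24 hours earlier (GNU date relative-date syntax).
start=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
echo "error traces between $start and $end"
```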

Create a Synthetic Dataset

Create a dataset called "Product Questions" with 50 synthetic customer questions about e-commerce products
Claude Code will:
  1. Generate 50 synthetic customer questions about e-commerce products
  2. Use create_dataset to create a new dataset named “Product Questions”
  3. Use create_datapoints to add all 50 questions to the dataset
  4. Confirm creation with the dataset ID and summary
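To get a feel for the payload shape, here is a rough sketch of generating question datapoints as JSON lines before handing them to the dataset tools. The `inputs`/`question` field names are assumptions for illustration, not the documented create_datapoints schema:

```shell
# Emit one JSON object per synthetic question (3 shown here instead of 50).
for i in $(seq 1 3); do
  printf '{"inputs": {"question": "Sample e-commerce product question %s"}}\n' "$i"
done > datapoints.jsonl

# One line per datapoint.
wc -l < datapoints.jsonl
```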

Performance Analysis

Has my system's performance improved or decreased over the past week?
Claude Code will:
  1. Use query_analytics with a 7-day time range
  2. Analyze average latency trends over time
  3. Review token usage patterns and cost variations
  4. Compare error rate changes across the week
  5. Provide insights on model performance comparisons and trends

Complete Experiment Creation

I have a CSV file with 100 customer queries. Create a dataset, add an LLM evaluator for tone and accuracy, then run an experiment comparing GPT-5.2 and Claude Sonnet 4.6
Claude Code will:
  1. Read and parse your CSV file
  2. Use create_dataset to create a new dataset with an auto-generated name
  3. Use create_datapoints to add all 100 customer queries from the CSV
  4. Use create_llm_eval to create an LLM-as-a-Judge evaluator for tone
  5. Use create_llm_eval again to create an LLM-as-a-Judge evaluator for accuracy
  6. Use create_experiment with the dataset ID and auto-run enabled
  7. Configure two task columns (one for GPT-5.2, one for Claude Sonnet 4.6)
  8. Execute the experiment automatically via the auto-run option
  9. Summarize the results with evaluation scores for both models
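Steps 1-3 amount to turning CSV rows into datapoints. A rough shell sketch of that transformation, assuming a single `query` header column (the header name and JSON field names are illustrative, and the quoting is naive; real CSV values containing quotes or commas need proper escaping):

```shell
# A small stand-in for your CSV file.
cat > queries.csv <<'EOF'
query
How do I reset my password?
What is your return policy?
EOF

# Skip the header row, wrap each remaining row as a datapoint object.
tail -n +2 queries.csv | while IFS= read -r q; do
  printf '{"inputs": {"query": "%s"}}\n' "$q"
done > datapoints.jsonl

wc -l < datapoints.jsonl
```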

Trace Investigation

Show me the 10 slowest traces from yesterday and explain what might be causing the latency
Claude Code will:
  1. Calculate yesterday’s date range
  2. Use list_traces with latency sorting (descending) and limit of 10
  3. Use list_spans to retrieve span information for each trace
  4. Analyze the execution patterns and span durations
  5. Provide performance insights identifying bottlenecks
  6. Suggest optimization opportunities based on the data
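Step 1 here is a calendar-day window rather than the rolling window used earlier. A sketch with GNU `date` (macOS/BSD would use `date -u -v-1d +%Y-%m-%d`):

```shell
# Yesterday as a calendar date in UTC (GNU date).
day=$(date -u -d 'yesterday' +%Y-%m-%d)
start="${day}T00:00:00Z"
end="${day}T23:59:59Z"
echo "slowest traces between $start and $end"
```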

Troubleshooting

If authentication fails:
  1. Verify your API key is valid: echo $ORQ_API_KEY
  2. Check the API key has the necessary permissions
  3. Re-add the MCP with the correct API key

If the server is unreachable:
  1. Verify the endpoint URL is correct
  2. Check your internet connection
  3. Try removing and re-adding the integration

If the MCP does not appear or respond:
  1. Get MCP server details: claude mcp get orq
  2. Verify the MCP is properly installed: claude mcp list

Skills

Skills extend Claude Code with pre-built agentic workflows for the full Build, Evaluate, Optimize lifecycle. See the Skills page for the full reference.

Installation

# Installs skills, commands, agents, and the MCP server in one step
claude plugin marketplace add orq-ai/claude-plugins
claude plugin install orq-skills@orq-claude-plugin

Commands

Quick slash-command actions available in Claude Code:
| Command | Description |
| --- | --- |
| /orq:quickstart | Interactive onboarding: credentials, MCP setup, skills tour |
| /orq:workspace | Workspace overview: agents, deployments, prompts, datasets |
| /orq:traces | Query and summarize traces with filters |
| /orq:models | List available AI models by provider |
| /orq:analytics | Usage analytics: requests, cost, tokens, errors |

Available Skills

Skills are triggered by describing what you need; Claude Code picks the right skill automatically.

| Skill | Description |
| --- | --- |
| build-agent | Design, create, and configure an Orq.ai agent |
| build-evaluator | Create validated LLM-as-a-Judge evaluators |
| analyze-trace-failures | Read production traces and categorize failures |
| run-experiment | Create and run experiments with evaluation |
| generate-synthetic-dataset | Generate and curate evaluation datasets |
| optimize-prompt | Analyze and optimize system prompts |
| setup-observability | Instrument LLM applications with Orq.ai tracing: AI Router for zero-code traces, or OpenTelemetry for framework-level spans |
| compare-agents | Run cross-framework agent comparisons using evaluatorq |

AI Router (Beta)

Set the following environment variables before launching Claude Code. Once set, every model call Claude Code makes is automatically routed through the Orq.ai AI Router for the duration of that session.
export ANTHROPIC_BASE_URL="https://my.orq.ai/v3/router/"
export ANTHROPIC_AUTH_TOKEN="$ORQ_API_KEY"
export ANTHROPIC_API_KEY=""  # must be set to empty to prevent Claude Code from using the Anthropic API directly
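If you prefer not to export these variables globally, the same configuration can be scoped to a single run by prefixing the launch command; this is equivalent to the exports above and applies only to that one `claude` invocation:

```shell
# One-off routed session: the variables apply only to this claude process.
ANTHROPIC_BASE_URL="https://my.orq.ai/v3/router/" \
ANTHROPIC_AUTH_TOKEN="${ORQ_API_KEY}" \
ANTHROPIC_API_KEY="" \
claude
```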
Traces are not yet available for Claude Code routed through the AI Router.