# MCP Integration

Access your Orq.ai workspace directly from Claude Code. Manage experiments, query traces, and configure agents using natural language.

**AI Router (Beta):** Route Claude Code's model calls through the AI Router.

## MCP
Claude Code is Anthropic's official CLI that brings Claude's capabilities to your terminal and development workflow. With the Orq MCP integration, you can access all Orq.ai features directly through Claude Code's conversational interface.

### Prerequisites
- Claude Code CLI installed
- Active Orq.ai account
- Orq.ai API key
### Installation

Add the Orq MCP server to Claude Code with a single command. Make sure to set your `ORQ_API_KEY` environment variable before running it.
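The original install command is not reproduced in this copy. As a hedged sketch, assuming the Orq MCP server is exposed over HTTP at a hypothetical `https://mcp.orq.ai` endpoint (copy the exact command from the Orq.ai docs):

```shell
# Make sure the API key is set first
export ORQ_API_KEY="your-orq-api-key"

# Hypothetical endpoint URL -- replace with the one from the Orq.ai docs
claude mcp add --transport http orq https://mcp.orq.ai \
  --header "Authorization: Bearer ${ORQ_API_KEY}"
```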
### Verify Installation

Check that the Orq MCP is installed; you should see `orq` in the list of available MCP servers.
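The check itself uses the standard Claude Code CLI:

```shell
# List registered MCP servers; "orq" should appear in the output
claude mcp list
```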
### Available Commands

Once integrated, you can ask Claude Code to perform these operations:
#### Agents

- Create an agent with custom instructions and tools
- Get agent configuration for [agent-key]
- Update agent [agent-key] with new instructions or model
- Configure agent with evaluators and guardrails
#### Analytics

- Get analytics overview for my workspace
- Show me workspace metrics for the last 7 days
- Query analytics filtered by deployment ID
#### Datasets

- Create a dataset called "customer-queries"
- List all datapoints in dataset [dataset-key]
- Add datapoints to dataset [dataset-key]
- Update datapoint [datapoint-id]
- Delete specific datapoints in dataset [dataset-key]
- Delete dataset [dataset-key]
#### Experiments

- Create an experiment from dataset [dataset-key]
- List all experiment runs
- Export experiment run [run-id] as CSV
- Run experiment and auto-evaluate results
#### Evaluators

- Get evaluator configuration for [evaluator-key]
- Create an LLM-as-a-Judge evaluator for tone
- Create a Python evaluator to check response length
- Add evaluator to experiment [experiment-key]
- Update evaluator [evaluator-key] with a new prompt
- Update Python evaluator [evaluator-key] with revised code
#### Traces

- List traces from the last 24 hours
- Show me traces with errors
- Get span details for trace [trace-id]
- Find the slowest traces from today
- Show all traces for thread [thread-id]
#### Models

- List all available chat models
- List all available embedding models
#### Registry

- List registry keys for filtering traces
- List top values for [attribute-key]
#### Search

- Search for datasets named "customer"
- Find experiments in project [project-id]
- Find experiments in project [project-key]
#### Documentation

- Search the Orq.ai docs for [topic]
#### Managing Entities

- Delete agent [agent-key]
- Delete experiment [experiment-key]
- Delete evaluator [evaluator-key]
- Delete prompt [prompt-key]
- Delete knowledge base [knowledge-base-key]

Use `delete_dataset` to delete a dataset along with all its datapoints.

### Usage Examples
#### Create an Experiment

- Use `search_entities` to find the "customer-queries" dataset
- Use `create_experiment` with the specified name and dataset ID
- Configure task columns with GPT-5.2 and Claude Sonnet 4.6 models
- Return the experiment ID and configuration details
#### Query Trace Analytics

- Calculate the time range for the last 24 hours
- Use `list_traces` with an error status filter
- Analyze the error data
- Provide a summary of total error count, error types and frequencies, affected traces, and time distribution
#### Create a Synthetic Dataset

- Generate 50 synthetic customer questions about e-commerce products
- Use `create_dataset` to create a new dataset named "Product Questions"
- Use `create_datapoints` to add all 50 questions to the dataset
- Confirm creation with the dataset ID and summary
#### Performance Analysis

- Use `query_analytics` with a 7-day time range
- Analyze average latency trends over time
- Review token usage patterns and cost variations
- Compare error rate changes across the week
- Provide insights on model performance comparisons and trends
#### Complete Experiment Creation

- Read and parse your CSV file
- Use `create_dataset` to create a new dataset with an auto-generated name
- Use `create_datapoints` to add all 100 customer queries from the CSV
- Use `create_llm_eval` to create an LLM-as-a-Judge evaluator for tone
- Use `create_llm_eval` again to create an LLM-as-a-Judge evaluator for accuracy
- Use `create_experiment` with the dataset ID and auto-run enabled
- Configure two task columns (one for GPT-5.2, one for Claude Sonnet 4.6)
- Execute the experiment automatically via the auto-run option
- Summarize the results with evaluation scores for both models
#### Trace Investigation

- Calculate yesterday's date range
- Use `list_traces` with latency sorting (descending) and a limit of 10
- Use `list_spans` to retrieve span information for each trace
- Analyze the execution patterns and span durations
- Provide performance insights identifying bottlenecks
- Suggest optimization opportunities based on the data
### Troubleshooting

#### Authentication Errors

- Verify your API key is set: `echo $ORQ_API_KEY`
- Check that the API key has the necessary permissions
- Re-add the MCP with the correct API key
#### Connection Issues

- Verify the endpoint URL is correct
- Check your internet connection
- Try removing and re-adding the integration
#### Tool Not Found

- Get MCP server details: `claude mcp get orq`
- Verify the MCP is properly installed: `claude mcp list`
## Skills

Skills extend Claude Code with pre-built agentic workflows for the full Build, Evaluate, Optimize lifecycle. See the Skills page for the full reference.

### Installation
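The installation command was not preserved in this copy. As a sketch, under the assumptions that the skills ship as a Git repository (the URL below is hypothetical) and that your Claude Code version loads skills from `~/.claude/skills/`:

```shell
# Hypothetical repository URL -- see the Orq.ai Skills page for the real source
git clone https://github.com/orq-ai/claude-skills.git

# Claude Code discovers skills placed under ~/.claude/skills/
mkdir -p ~/.claude/skills
cp -r claude-skills/skills/* ~/.claude/skills/
```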
### Commands

Quick slash-command actions available in Claude Code:

| Command | Description |
|---|---|
| `/orq:quickstart` | Interactive onboarding: credentials, MCP setup, skills tour |
| `/orq:workspace` | Workspace overview: agents, deployments, prompts, datasets |
| `/orq:traces` | Query and summarize traces with filters |
| `/orq:models` | List available AI models by provider |
| `/orq:analytics` | Usage analytics: requests, cost, tokens, errors |
### Available Skills

Triggered by describing what you need; Claude Code picks the right skill automatically.

| Skill | Description |
|---|---|
| `build-agent` | Design, create, and configure an Orq.ai agent |
| `build-evaluator` | Create validated LLM-as-a-Judge evaluators |
| `analyze-trace-failures` | Read production traces and categorize failures |
| `run-experiment` | Create and run experiments with evaluation |
| `generate-synthetic-dataset` | Generate and curate evaluation datasets |
| `optimize-prompt` | Analyze and optimize system prompts |
| `setup-observability` | Instrument LLM applications with Orq.ai tracing: AI Router for zero-code traces, or OpenTelemetry for framework-level spans |
| `compare-agents` | Run cross-framework agent comparisons using evaluatorq |
## AI Router (Beta)

Set the following environment variables before launching Claude Code. Once set, every model call Claude Code makes is automatically routed through the Orq.ai AI Router for the duration of that session.

Note: Traces are not yet available for Claude Code sessions routed through the AI Router.
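The original variable list was not preserved in this copy. A minimal sketch of the session setup, assuming Claude Code's standard LLM-gateway variables (`ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`); the router URL shown is a placeholder, not the documented endpoint:

```shell
# Hypothetical router endpoint -- replace with the URL from the Orq.ai docs
export ANTHROPIC_BASE_URL="https://api.orq.ai/router"

# Authenticate to the router with your Orq.ai API key
export ANTHROPIC_AUTH_TOKEN="$ORQ_API_KEY"

# Launch Claude Code; model calls in this session now go through the AI Router
claude
```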