What is the Orq MCP?
The Orq Model Context Protocol (MCP) server gives AI code assistants direct access to your Orq.ai workspace. With 30 specialized tools, you can manage experiments, create datasets, configure evaluators, and analyze traces without leaving your IDE.

Key Capabilities
Agent Creation
Create, update, and configure agents with instructions, tools, models, evaluators, and guardrails
Experiment Management
Run experiments, compare prompts or models side-by-side, and export results
Dataset Operations
Create datasets, add or edit datapoints, and generate synthetic test data
Analytics & Insights
Query usage, cost, latency, and error metrics across your workspace
Evaluator & Guardrail Configuration
Create and update LLM-as-a-Judge and Python evaluators, and attach guardrails to agents
Docs Exploration
Search the Orq.ai documentation without leaving your IDE
Quickstart
Point your assistant at the MCP server and authenticate with your API key:

| Setting | Value |
|---|---|
| Endpoint | `https://my.orq.ai/v2/mcp` |
| Auth Header | `Authorization: Bearer YOUR_ORQ_API_KEY` |
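Most MCP-capable assistants accept a JSON server entry along these lines. The exact top-level key and header syntax vary by assistant, so treat this as an illustrative sketch and check the per-assistant pages below:

```json
{
  "mcpServers": {
    "orq": {
      "url": "https://my.orq.ai/v2/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_ORQ_API_KEY"
      }
    }
  }
}
```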
Code Assistants
See detailed documentation for the following code assistants:

Claude Code
Official Anthropic CLI for Claude with MCP integration
Claude Desktop
Use Orq MCP in Claude’s desktop application
Codex
AI coding assistant with MCP protocol support
Cursor
AI-first code editor with native MCP support
Warp
AI-powered terminal with native MCP support
Available Tools
The Orq MCP provides 30 tools across 10 categories:

| Category | Tool | Description |
|---|---|---|
| Agents | get_agent | Retrieve agent configuration and details |
| Agents | create_agent | Create a new agent with instructions, tools, models, evaluators, and guardrails |
| Agents | update_agent | Update an existing agent’s configuration (instructions, model, tools, evaluators, guardrails) |
| Analytics | get_analytics_overview | Get workspace snapshot (requests, cost, tokens, errors, error rate, latency, top models) |
| Analytics | query_analytics | Flexible drill-down with filtering and grouping |
| Dataset | create_dataset | Create a new dataset |
| Dataset | list_datapoints | List datapoints in a dataset |
| Dataset | create_datapoints | Create datapoints (max 100) |
| Dataset | update_datapoint | Update a datapoint |
| Dataset | delete_datapoints | Delete datapoints (max 100) |
| Dataset | delete_dataset | Delete a dataset and all datapoints |
| Evaluator | get_llm_eval | Retrieve an LLM-as-a-Judge evaluator configuration |
| Evaluator | get_python_eval | Retrieve a Python code evaluator configuration |
| Evaluator | create_llm_eval | Create LLM-as-a-Judge evaluator |
| Evaluator | create_python_eval | Create Python code evaluator |
| Evaluator | update_llm_eval | Update an existing LLM-as-a-Judge evaluator (prompt, model, output type) |
| Evaluator | update_python_eval | Update an existing Python code evaluator (code, output type) |
| Experiment | list_experiment_runs | List runs with cursor pagination |
| Experiment | get_experiment_run | Export run (JSON/JSONL/CSV) |
| Experiment | create_experiment | Create experiment from dataset with optional auto-run |
| Models | list_models | List available AI models by type (chat, embedding, image, tts, stt, and more) |
| Registry | list_registry_keys | List available attribute keys for filtering traces |
| Registry | list_registry_values | List top values for a specific attribute |
| Search | search_entities | Search any entity type: project, dataset, prompt, experiment, agent, evaluator, knowledge, memory store, or deployment (supports cursor pagination) |
| Search | search_directories | List directories within a project |
| Search | search_docs | Query the Orq.ai documentation for feature guidance and API reference |
| Traces | list_traces | List traces with filtering by model, type, project, thread ID, time range, and more |
| Traces | get_span | Retrieve a single span (compact or full mode) |
| Traces | list_spans | List all spans in a trace |
| Workspace | delete_entity | Delete any entity by type and ID. Supported types: agent, prompt, experiment, evaluator, knowledge, memory_store, prompt_snippet, sheet, tool. Use delete_dataset to delete a dataset along with all its datapoints |
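Under the hood, every tool in the table is invoked through the standard MCP `tools/call` JSON-RPC request, which your assistant constructs for you. As a minimal sketch (the `agent_123` ID is hypothetical; real IDs come from `search_entities`):

```python
import json

def mcp_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 `tools/call` request an MCP client sends."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# Hypothetical agent ID for illustration.
request = mcp_tool_call("get_agent", {"agent_id": "agent_123"})
print(request)
```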
Examples
Investigating Traces

Find errors from the last 24 hours

The assistant will:
- Calculate the unix timestamp for 24 hours ago
- Use `list_traces` with filter `status:=ERROR && timestamp:>TIMESTAMP`, sorted by `timestamp:desc`
- Display trace IDs, names, durations, and timestamps
- Summarize the most common error types and their frequency
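The timestamp arithmetic and filter string from the steps above can be sketched as follows. The `filter` and `sort` argument names mirror the strings shown and are illustrative, not a formal schema:

```python
import json
import time

# Unix timestamp for 24 hours ago.
cutoff = int(time.time()) - 24 * 60 * 60

arguments = {
    "filter": f"status:=ERROR && timestamp:>{cutoff}",
    "sort": "timestamp:desc",
}
print(json.dumps({"name": "list_traces", "arguments": arguments}, indent=2))
```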
Detect regressions after a model switch
The assistant will:
- Use `query_analytics` with `metric: "latency"` and `group_by: ["model"]` for the period before the switch
- Repeat for the period after the switch
- Compare average latency per model across both windows and surface any regressions
Find the slowest traces
The assistant will:
- Use `list_traces` sorted by `duration_ms:desc`, filtered to today, limit 5
- Use `list_spans` with each `trace_id` to retrieve the full span tree
- Surface bottlenecks and latency outliers
Filter traces by thread ID
The assistant will:
- Use `list_traces` with `thread_id: "thread_abc123"`
- Return all traces associated with that conversation thread
- Surface turn count, total cost, and any errors across the session
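Once the traces for a thread are back, the session summary in the last step is plain aggregation. The trace fields below are a hypothetical response shape used only for illustration:

```python
# Hypothetical list_traces results for thread_id "thread_abc123".
traces = [
    {"id": "tr_1", "status": "OK", "cost": 0.004},
    {"id": "tr_2", "status": "ERROR", "cost": 0.001},
    {"id": "tr_3", "status": "OK", "cost": 0.006},
]

turn_count = len(traces)
total_cost = sum(t["cost"] for t in traces)
errors = [t["id"] for t in traces if t["status"] == "ERROR"]
print(turn_count, round(total_cost, 3), errors)
```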
Running Experiments

Compare two models on an existing dataset

The assistant will:
- Search for the “user-queries” dataset using `search_entities`
- Use `create_experiment` with two model configurations and `auto_run` enabled
- Return the experiment ID once both configurations have run
Compare two prompt strategies
The assistant will:
- Search for the dataset using `search_entities`
- Use `create_experiment` with two prompt variants and `auto_run` enabled
- Use `get_experiment_run` to retrieve evaluation metrics
- Compare the variants and summarize which performed better
Export experiment results
The assistant will:
- Use `list_experiment_runs` to find the most recent run
- Use `get_experiment_run` with CSV export format
- Return a signed download URL for the CSV file
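A CSV export reduces to a single tool call. The `run_id` value and argument names below are hypothetical placeholders, not a formal schema; the supported export formats (JSON/JSONL/CSV) come from the tool table above:

```python
import json

# Illustrative tools/call arguments; run_123 is a hypothetical run ID.
call = {
    "name": "get_experiment_run",
    "arguments": {"run_id": "run_123", "format": "csv"},
}
print(json.dumps(call))
```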
Managing Datasets

Create a synthetic dataset

The assistant will:
- Generate 50 synthetic question/answer pairs
- Use `create_dataset` to create the dataset
- Use `create_datapoints` to add all entries in bulk, each formatted as `{ inputs: { question: "..." }, expected_output: "..." }`
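The datapoint shape above, combined with the 100-per-call limit on `create_datapoints`, suggests batching like this (the question/answer content is synthetic placeholder data):

```python
# 50 synthetic Q/A pairs in the documented datapoint shape.
datapoints = [
    {"inputs": {"question": f"Sample question {i}?"},
     "expected_output": f"Sample answer {i}"}
    for i in range(50)
]

# create_datapoints accepts at most 100 datapoints per call.
BATCH = 100
batches = [datapoints[i:i + BATCH] for i in range(0, len(datapoints), BATCH)]
print(len(datapoints), len(batches))
```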
Import data from code
The assistant will:
- Parse the JSON from your selection or context
- Use `create_dataset` with an appropriate name
- Use `create_datapoints` to add each entry as a datapoint
Update or clean up a dataset
The assistant will:
- Use `search_entities` to find the “staging-tests” dataset and retrieve its ID
- Use `list_datapoints` to retrieve all entries
- Filter for datapoints with empty `expected_output`
- Use `delete_datapoints` to remove them in batches
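The cleanup steps above amount to a filter plus batching, since `delete_datapoints` also caps out at 100 per call. The datapoint records here are a hypothetical response shape:

```python
# Hypothetical list_datapoints results; only the fields used here.
datapoints = [
    {"id": f"dp_{i}", "expected_output": "" if i % 3 == 0 else "ok"}
    for i in range(250)
]

# Keep only IDs with an empty expected_output, then batch the deletes
# to respect the 100-datapoint limit on delete_datapoints.
to_delete = [d["id"] for d in datapoints if not d["expected_output"]]
batches = [to_delete[i:i + 100] for i in range(0, len(to_delete), 100)]
print(len(to_delete), len(batches))
```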
Evaluators

Retrieve an evaluator’s configuration

The assistant will:
- Search for the evaluator using `search_entities` to resolve its ID
- Use `get_llm_eval` or `get_python_eval` to retrieve the full configuration
- Display the prompt, model, output type, and other settings
Create an LLM-as-a-Judge evaluator
The assistant will:
- Use `create_llm_eval` with a scoring rubric for tone classification
- Confirm the evaluator ID and configuration
Create a Python evaluator
The assistant will:
- Write a Python snippet that parses the response and validates JSON structure
- Use `create_python_eval` to register it in your workspace
Create an experiment with evaluators
The assistant will:
- Search for the dataset using `search_entities`
- Use `search_entities` to find the evaluator and get its key, or use the key returned by `create_llm_eval`/`create_python_eval` if created in the same session
- Use `create_experiment` with both the dataset ID and evaluator ID, with `auto_run` enabled
Update an existing evaluator
The assistant will:
- Search for the evaluator using `search_entities`
- Use `update_llm_eval` with the evaluator ID, updated `prompt`, and `output_type: "boolean"`
- Confirm the new configuration
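The evaluator update in the steps above can be sketched as a single call. The `eval_123` ID and the prompt text are hypothetical; the `prompt` and `output_type` fields follow the steps above:

```python
import json

# Illustrative tools/call arguments for update_llm_eval.
call = {
    "name": "update_llm_eval",
    "arguments": {
        "evaluator_id": "eval_123",  # hypothetical; resolved via search_entities
        "prompt": "Return true if the answer cites at least one source.",
        "output_type": "boolean",
    },
}
print(json.dumps(call, indent=2))
```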
Managing Entities

Delete a workspace entity

The assistant will:
- Search for the experiment using `search_entities`
- Use `delete_entity` with `type: "experiment"` and the resolved ID
- Confirm deletion

Supported `type` values: agent, prompt, experiment, evaluator, knowledge, memory_store, prompt_snippet, sheet, tool. Use `delete_dataset` to delete a dataset along with all its datapoints.
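A small sketch of how the supported-types rule plays out. The helper function and the `exp_123` ID are hypothetical, but the type list matches the one above, and datasets are deliberately rejected since they go through `delete_dataset`:

```python
# Supported entity types for delete_entity, per the list above.
SUPPORTED_TYPES = {
    "agent", "prompt", "experiment", "evaluator", "knowledge",
    "memory_store", "prompt_snippet", "sheet", "tool",
}

def delete_entity_call(entity_type: str, entity_id: str) -> dict:
    """Build an illustrative delete_entity invocation, rejecting
    unsupported types (datasets use delete_dataset instead)."""
    if entity_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {entity_type}")
    return {"name": "delete_entity",
            "arguments": {"type": entity_type, "id": entity_id}}

print(delete_entity_call("experiment", "exp_123"))
```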
Documentation Search
Look up a feature in the Orq.ai docs

The assistant will:
- Use `search_docs` with a relevant query
- Return matching documentation sections with guidance and examples
- Summarize the answer in context
Get started with a specific product area
The assistant will:
- Use `search_docs` to find Router onboarding content
- Return setup steps, configuration options, and quick-start examples
Analytics

Get a workspace snapshot

The assistant will:
- Use `get_analytics_overview` with a 7-day range
- Return total requests, cost, tokens, error rate, latency, and top models
Drill into a specific model’s performance
The assistant will:
- Use `query_analytics` with `metric: "errors"`, filtered by model and a 7-day range
- Use `query_analytics` with `metric: "cost"`, filtered by model and a 7-day range
- Surface error rate trends and cost breakdown side by side
Identify your most expensive models
The assistant will:
- Use `query_analytics` with `metric: "cost"`, `group_by: ["model"]`, and a 30-day range
- Aggregate cost per model across all time buckets and rank them by total spend
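The aggregate-and-rank step above is a straightforward group-by over the returned time buckets. The bucket shape and model names here are hypothetical placeholders for whatever `query_analytics` returns:

```python
from collections import defaultdict

# Hypothetical query_analytics buckets: one row per (time bucket, model).
buckets = [
    {"model": "model-a", "cost": 12.50},
    {"model": "model-b", "cost": 1.20},
    {"model": "model-a", "cost": 8.00},
    {"model": "model-b", "cost": 2.30},
]

# Sum cost per model across all time buckets.
totals = defaultdict(float)
for b in buckets:
    totals[b["model"]] += b["cost"]

# Rank models by total spend, highest first.
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```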