
# Orq MCP Server tools and quickstart

> Connect AI code assistants to your Orq.ai workspace via the Model Context Protocol. Reference for all 30 available tools with usage examples.

## What is the Orq MCP?

The Orq Model Context Protocol (MCP) server provides AI code assistants with direct access to your Orq.ai workspace. With 30 specialized tools, you can manage experiments, create datasets, configure evaluators, and analyze traces without leaving your IDE.

## Installation

Point your assistant at the MCP server and authenticate with your [API key](/docs/administer/api-keys):

|                 |                                          |
| --------------- | ---------------------------------------- |
| **Endpoint**    | `https://my.orq.ai/v2/mcp`               |
| **Auth Header** | `Authorization: Bearer YOUR_ORQ_API_KEY` |
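
Your code assistant performs the MCP handshake for you. Seeing the raw messages can still help when you debug a connection. The sketch below assumes the server uses the MCP Streamable HTTP transport, where JSON-RPC 2.0 messages are sent by HTTP POST. It builds the headers and the `initialize` request a client would send. The protocol version string, client name, and helper names are illustrative assumptions, not Orq-specific requirements.

```python
from __future__ import annotations

import json
import urllib.request

ORQ_MCP_ENDPOINT = "https://my.orq.ai/v2/mcp"

def orq_headers(api_key: str, session_id: str | None = None) -> dict[str, str]:
    """HTTP headers for one MCP request to the Orq endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Streamable HTTP servers may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Echo the session ID the server returned from `initialize`, if it sent one.
        headers["Mcp-Session-Id"] = session_id
    return headers

def jsonrpc(method: str, params: dict | None = None, request_id: int | None = 1) -> dict:
    """Build a JSON-RPC 2.0 message; request_id=None makes it a notification."""
    message: dict = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    if request_id is not None:
        message["id"] = request_id
    return message

def initialize_request() -> dict:
    """First message of the MCP handshake."""
    return jsonrpc("initialize", {
        "protocolVersion": "2025-06-18",  # assumption: use a version the server supports
        "capabilities": {},
        "clientInfo": {"name": "orq-mcp-sketch", "version": "0.1.0"},  # placeholder
    })

def post(api_key: str, message: dict, session_id: str | None = None):
    """Send one message; the caller reads the body and any Mcp-Session-Id header."""
    request = urllib.request.Request(
        ORQ_MCP_ENDPOINT,
        data=json.dumps(message).encode("utf-8"),
        headers=orq_headers(api_key, session_id),
        method="POST",
    )
    return urllib.request.urlopen(request)
```

After `initialize` succeeds, an MCP client sends the `notifications/initialized` notification (`jsonrpc("notifications/initialized", request_id=None)`) and can then call `tools/list` to discover the available tools. If the server returns an `Mcp-Session-Id` response header, pass it back on every later request.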

### Code Assistants

Each of the following code assistants has a dedicated setup guide:

<CardGroup cols={3}>
  <Card title="Claude Code" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/claude-code.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=1654adb689b324322f3d63dd9eef6ad0" href="/docs/integrations/code-assistants/claude-code" width="61" height="43" data-path="images/logos/claude-code.svg">
    Official Anthropic CLI for Claude with MCP integration
  </Card>

  <Card title="Claude Desktop" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/claude-desktop.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=bccf0e29820481de693393dd71aa84f3" href="/docs/integrations/code-assistants/claude-desktop" width="61" height="43" data-path="images/logos/claude-desktop.svg">
    Use Orq MCP in Claude's desktop application
  </Card>

  <Card title="Codex" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/codex.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=feb92a7651c85d15d424f900821581b6" href="/docs/integrations/code-assistants/codex" width="256" height="260" data-path="images/logos/codex.svg">
    OpenAI's coding agent with MCP support
  </Card>

  <Card title="Cursor" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/cursor.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=ba1ed7e66516eb920ae1462050041bdf" href="/docs/integrations/code-assistants/cursor" width="24" height="24" data-path="images/logos/cursor.svg">
    AI-first code editor with native MCP support
  </Card>

  <Card title="VS Code" icon="https://mintcdn.com/orqai/9VWlkBlGuTGPnFjG/images/logos/vscode.svg?fit=max&auto=format&n=9VWlkBlGuTGPnFjG&q=85&s=4c770da22e84990dee79094d1fab3179" href="/docs/integrations/code-assistants/vscode" width="24" height="24" data-path="images/logos/vscode.svg">
    AI-powered editor with GitHub Copilot and native MCP support
  </Card>

  <Card title="Warp" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/warp.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=2f4b99ef604241484380367ae742ea4d" href="/docs/integrations/code-assistants/warp" width="24" height="24" data-path="images/logos/warp.svg">
    AI-powered terminal with native MCP support
  </Card>
</CardGroup>

## Key Capabilities

<CardGroup cols={3}>
  <Card title="Agent Creation" icon="robot">
    Create, update, and configure agents with instructions, tools, models, evaluators, and guardrails
  </Card>

  <Card title="Experiment Management" icon="flask">
    Run experiments, compare prompts or models side-by-side, and export results
  </Card>

  <Card title="Dataset Operations" icon="database">
    Create datasets, add or edit datapoints, and generate synthetic test data
  </Card>

  <Card title="Analytics & Insights" icon="chart-line">
    Query usage, cost, latency, and error metrics across your workspace
  </Card>

  <Card title="Evaluator & Guardrail Configuration" icon="clipboard-check">
    Create and update LLM-as-a-Judge and Python evaluators, and attach guardrails to agents
  </Card>

  <Card title="Docs Exploration" icon="book-open">
    Search the **Orq.ai** documentation without leaving your IDE
  </Card>
</CardGroup>

## Available Tools

The Orq MCP provides 30 tools across 10 categories:

| Category   | Tool                         | Description                                                                                                                                                                                                                                      |
| ---------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Agents     | **get\_agent**               | Retrieve agent configuration and details                                                                                                                                                                                                         |
| Agents     | **create\_agent**            | Create a new agent with instructions, tools, models, evaluators, and guardrails                                                                                                                                                                  |
| Agents     | **update\_agent**            | Update an existing agent's configuration (instructions, model, tools, evaluators, guardrails)                                                                                                                                                    |
| Analytics  | **get\_analytics\_overview** | Get workspace snapshot (requests, cost, tokens, errors, error rate, latency, top models)                                                                                                                                                         |
| Analytics  | **query\_analytics**         | Flexible drill-down with filtering and grouping                                                                                                                                                                                                  |
| Dataset    | **create\_dataset**          | Create a new dataset                                                                                                                                                                                                                             |
| Dataset    | **list\_datapoints**         | List datapoints in a dataset                                                                                                                                                                                                                     |
| Dataset    | **create\_datapoints**       | Create datapoints (max 100)                                                                                                                                                                                                                      |
| Dataset    | **update\_datapoint**        | Update a datapoint                                                                                                                                                                                                                               |
| Dataset    | **delete\_datapoints**       | Delete datapoints (max 100)                                                                                                                                                                                                                      |
| Dataset    | **delete\_dataset**          | Delete a dataset and all datapoints                                                                                                                                                                                                              |
| Evaluator  | **get\_llm\_eval**           | Retrieve an LLM-as-a-Judge evaluator configuration                                                                                                                                                                                               |
| Evaluator  | **get\_python\_eval**        | Retrieve a Python code evaluator configuration                                                                                                                                                                                                   |
| Evaluator  | **create\_llm\_eval**        | Create LLM-as-a-Judge evaluator                                                                                                                                                                                                                  |
| Evaluator  | **create\_python\_eval**     | Create Python code evaluator                                                                                                                                                                                                                     |
| Evaluator  | **update\_llm\_eval**        | Update an existing LLM-as-a-Judge evaluator (prompt, model, output type)                                                                                                                                                                         |
| Evaluator  | **update\_python\_eval**     | Update an existing Python code evaluator (code, output type)                                                                                                                                                                                     |
| Experiment | **list\_experiment\_runs**   | List experiment runs with cursor pagination                                                                                                                                                                                                      |
| Experiment | **get\_experiment\_run**     | Retrieve or export an experiment run (JSON, JSONL, or CSV)                                                                                                                                                                                       |
| Experiment | **create\_experiment**       | Create experiment from dataset with optional auto-run                                                                                                                                                                                            |
| Models     | **list\_models**             | List available AI models by type (chat, embedding, image, tts, stt, and more)                                                                                                                                                                    |
| Registry   | **list\_registry\_keys**     | List available attribute keys for filtering traces                                                                                                                                                                                               |
| Registry   | **list\_registry\_values**   | List top values for a specific attribute                                                                                                                                                                                                         |
| Search     | **search\_entities**         | Search any entity type: project, dataset, prompt, experiment, agent, evaluator, knowledge, memory store, or deployment (supports cursor pagination)                                                                                              |
| Search     | **search\_directories**      | List directories within a project                                                                                                                                                                                                                |
| Search     | **search\_docs**             | Query the Orq.ai documentation for feature guidance and API reference                                                                                                                                                                            |
| Traces     | **list\_traces**             | List traces with filtering by model, type, project, thread ID, time range, and more                                                                                                                                                              |
| Traces     | **get\_span**                | Retrieve a single span (compact or full mode)                                                                                                                                                                                                    |
| Traces     | **list\_spans**              | List all spans in a trace                                                                                                                                                                                                                        |
| Workspace  | **delete\_entity**           | Delete any entity by type and ID. Supported types: `agent`, `prompt`, `experiment`, `evaluator`, `knowledge`, `memory_store`, `prompt_snippet` (Skills), `sheet`, `tool`. Use `delete_dataset` to delete a dataset along with all its datapoints |
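
Every tool in the table is invoked through the standard MCP `tools/call` method. The sketch below mirrors the table as a lookup and wraps a call in a JSON-RPC envelope. The authoritative argument schema for each tool comes from the server's `tools/list` response, so treat any arguments you pass here as assumptions to check against that schema.

```python
# Tool names from the table above, grouped by category.
ORQ_MCP_TOOLS = {
    "Agents": ["get_agent", "create_agent", "update_agent"],
    "Analytics": ["get_analytics_overview", "query_analytics"],
    "Dataset": ["create_dataset", "list_datapoints", "create_datapoints",
                "update_datapoint", "delete_datapoints", "delete_dataset"],
    "Evaluator": ["get_llm_eval", "get_python_eval", "create_llm_eval",
                  "create_python_eval", "update_llm_eval", "update_python_eval"],
    "Experiment": ["list_experiment_runs", "get_experiment_run", "create_experiment"],
    "Models": ["list_models"],
    "Registry": ["list_registry_keys", "list_registry_values"],
    "Search": ["search_entities", "search_directories", "search_docs"],
    "Traces": ["list_traces", "get_span", "list_spans"],
    "Workspace": ["delete_entity"],
}
ALL_TOOLS = frozenset(name for tools in ORQ_MCP_TOOLS.values() for name in tools)

def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap a tool invocation in an MCP `tools/call` JSON-RPC request."""
    if name not in ALL_TOOLS:
        raise ValueError(f"unknown Orq MCP tool: {name!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
```

For example, `tool_call("list_traces", {"thread_id": "thread_abc123"})` builds the request behind the thread ID example further down this page.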

## Examples

<AccordionGroup>
  <Accordion title="Building an Agent" icon="robot">
    **Create an agent from scratch**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a customer support agent called "Support Bot" that answers questions about our SaaS product. Use GPT-4.1 and give it a concise and professional tone.
    ```

    The assistant will:

    1. Use `create_agent` with the name, instructions, and model (`openai/gpt-4.1`)
    2. Return the agent key and configuration summary

    ***

    **Review and update agent instructions**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the current instructions for the "Support Bot" agent and update them to always respond in the user's language
    ```

    The assistant will:

    1. Use `get_agent` to retrieve the current configuration
    2. Display the existing instructions
    3. Use `update_agent` with the revised `instructions` field
    4. Confirm the update
  </Accordion>

  <Accordion title="Investigating Traces" icon="chart-bullet">
    **Find errors from the last 24 hours**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me all traces with errors from the last 24 hours
    ```

    The assistant will:

    1. Calculate the Unix timestamp for 24 hours ago
    2. Use `list_traces` with filter `status:=ERROR && timestamp:>TIMESTAMP` and sort by `timestamp:desc`
    3. Display trace IDs, names, durations, and timestamps
    4. Summarize the most common error types and their frequency
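
Steps 1 and 2 can be sketched as follows. The filter string format is copied from step 2. Whether `timestamp` is compared in seconds or milliseconds is an assumption here (seconds), so check it against the `list_traces` tool schema before relying on it.

```python
from datetime import datetime, timedelta, timezone

def error_filter_since(now: datetime, hours: int = 24) -> str:
    """Build the list_traces filter for ERROR traces newer than `hours` ago."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    since = now - timedelta(hours=hours)
    unix_seconds = int(since.timestamp())  # assumption: the filter expects seconds
    return f"status:=ERROR && timestamp:>{unix_seconds}"
```

Call it as `error_filter_since(datetime.now(timezone.utc))`; the `timestamp:desc` sort is passed to `list_traces` separately from the filter.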

    ***

    **Detect regressions after a model switch**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    After switching models yesterday, has latency increased or stabilized?
    ```

    The assistant will:

    1. Use `query_analytics` with `metric: "latency"` and `group_by: ["model"]` for the period before the switch
    2. Repeat for the period after the switch
    3. Compare average latency per model across both windows and surface any regressions
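
A minimal sketch of step 3, assuming each `query_analytics` result has been flattened into rows with hypothetical `model` and `latency_ms` fields (the real response shape may differ). A model switch usually means the old model only appears before the switch and the new one only after, so the sketch reports per-model averages for each window plus the overall change.

```python
from collections import defaultdict
from statistics import mean

def mean_latency_by_model(rows):
    """Average latency per model; rows use hypothetical `model`/`latency_ms` keys."""
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["model"]].append(row["latency_ms"])
    return {model: mean(values) for model, values in by_model.items()}

def compare_windows(before_rows, after_rows, threshold=0.10):
    """Flag a regression if overall latency rose by more than `threshold` (10% default)."""
    overall_before = mean(row["latency_ms"] for row in before_rows)
    overall_after = mean(row["latency_ms"] for row in after_rows)
    change = (overall_after - overall_before) / overall_before
    return {
        "before": mean_latency_by_model(before_rows),
        "after": mean_latency_by_model(after_rows),
        "overall_change": change,
        "regressed": change > threshold,
    }
```

The 10% threshold is an arbitrary illustrative default; pick one that matches your latency budget.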

    ***

    **Find the slowest traces**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Find the 5 slowest traces from today and show me their span details
    ```

    The assistant will:

    1. Use `list_traces` sorted by `duration_ms:desc`, filtered to today, limit 5
    2. Use `list_spans` with each `trace_id` to retrieve the full span tree
    3. Surface bottlenecks and latency outliers

    ***

    **Filter traces by thread ID**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me all traces for thread ID thread_abc123
    ```

    The assistant will:

    1. Use `list_traces` with `thread_id: "thread_abc123"`
    2. Return all traces associated with that conversation thread
    3. Surface turn count, total cost, and any errors across the session
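
Step 3 amounts to a small aggregation over the returned traces. This sketch assumes each trace is a dict with hypothetical `cost` and `status` fields and that each trace corresponds to one conversation turn; adjust the field names to what `list_traces` actually returns.

```python
def summarize_thread(traces):
    """Roll up traces from one thread (field names are hypothetical)."""
    return {
        "turns": len(traces),  # assumes one trace per conversation turn
        "total_cost": round(sum(t.get("cost", 0.0) for t in traces), 6),
        "errors": sum(1 for t in traces if t.get("status") == "ERROR"),
    }
```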
  </Accordion>

  <Accordion title="Running Experiments" icon="flask">
    **Compare two models on an existing dataset**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment comparing GPT-5.2 and Claude Sonnet 4.6 using the "user-queries" dataset
    ```

    The assistant will:

    1. Search for the "user-queries" dataset using `search_entities`
    2. Use `create_experiment` with two model configurations and `auto_run` enabled
    3. Return the experiment ID once both configurations have run

    ***

    **Compare two prompt strategies**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment using the "customer-feedback" dataset with two prompts: one focused on empathy and one on brevity. Run it and summarize the results.
    ```

    The assistant will:

    1. Search for the dataset using `search_entities`
    2. Use `create_experiment` with two prompt variants and `auto_run` enabled
    3. Use `get_experiment_run` to retrieve evaluation metrics
    4. Compare the variants and summarize which performed better

    ***

    **Export experiment results**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Export the latest experiment run as CSV
    ```

    The assistant will:

    1. Use `list_experiment_runs` to find the most recent run
    2. Use `get_experiment_run` with CSV export format
    3. Return a signed download URL for the CSV file
  </Accordion>

  <Accordion title="Managing Datasets" icon="database">
    **Create a synthetic dataset**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Generate 50 realistic customer support questions about a SaaS product and create a dataset called "Support Training Data"
    ```

    The assistant will:

    1. Generate 50 synthetic question/answer pairs
    2. Use `create_dataset` to create the dataset
    3. Use `create_datapoints` to add all entries in bulk, each formatted as `{ inputs: { question: "..." }, expected_output: "..." }`
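
Step 3's payload can be built from the generated pairs as sketched below. The field layout is taken from step 3 and the 100-item cap from the `create_datapoints` row in the tools table; skipping blank questions is an illustrative choice, not something the tool requires.

```python
MAX_DATAPOINTS_PER_CALL = 100  # create_datapoints accepts at most 100 per call

def to_datapoints(pairs):
    """Map (question, answer) pairs to the datapoint shape shown in step 3."""
    datapoints = [
        {"inputs": {"question": question}, "expected_output": answer}
        for question, answer in pairs
        if question.strip()  # skip blank generations
    ]
    if len(datapoints) > MAX_DATAPOINTS_PER_CALL:
        raise ValueError(
            f"{len(datapoints)} datapoints exceed the per-call limit of "
            f"{MAX_DATAPOINTS_PER_CALL}; split them across several calls"
        )
    return datapoints
```

With 50 questions the whole set fits in a single `create_datapoints` call.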

    ***

    **Import data from code**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a dataset from the JSON array above and add it to my workspace
    ```

    The assistant will:

    1. Parse the JSON from your selection or context
    2. Use `create_dataset` with an appropriate name
    3. Use `create_datapoints` to add each entry as a datapoint

    ***

    **Update or clean up a dataset**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Delete all datapoints in the "staging-tests" dataset that have an empty expected_output field
    ```

    The assistant will:

    1. Use `search_entities` to find the "staging-tests" dataset and retrieve its ID
    2. Use `list_datapoints` to retrieve all entries
    3. Filter for datapoints with empty `expected_output`
    4. Use `delete_datapoints` to remove them in batches of up to 100 (the per-call limit)
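
Steps 3 and 4 can be sketched as below: filter the entries, then split their IDs into chunks that respect the 100-datapoint limit of `delete_datapoints`. The `id` key is an assumption about how `list_datapoints` returns entries.

```python
from itertools import islice

MAX_PER_CALL = 100  # delete_datapoints accepts at most 100 datapoints per call

def empty_output_ids(datapoints):
    """IDs of datapoints whose expected_output is missing, None, or whitespace."""
    return [
        dp["id"]  # hypothetical field name for the datapoint ID
        for dp in datapoints
        if not str(dp.get("expected_output") or "").strip()
    ]

def batched(items, size=MAX_PER_CALL):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```

Each list yielded by `batched(empty_output_ids(datapoints))` maps to one `delete_datapoints` call.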
  </Accordion>

  <Accordion title="Evaluators" icon="clipboard-check">
    **Retrieve an evaluator's configuration**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the current configuration for the "tone-scorer" evaluator
    ```

    The assistant will:

    1. Search for the evaluator using `search_entities` to resolve its ID
    2. Use `get_llm_eval` or `get_python_eval` to retrieve the full configuration
    3. Display the prompt, model, output type, and other settings

    ***

    **Create an LLM-as-a-Judge evaluator**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an LLM-as-a-Judge evaluator that scores responses on tone: professional, neutral, or aggressive
    ```

    The assistant will:

    1. Use `create_llm_eval` with a scoring rubric for tone classification
    2. Confirm the evaluator ID and configuration

    ***

    **Create a Python evaluator**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a Python evaluator that checks whether the response contains a valid JSON object
    ```

    The assistant will:

    1. Write a Python snippet that parses the response and validates JSON structure
    2. Use `create_python_eval` to register it in your workspace
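
Step 1 might produce something like the check below. It scans the response for the first `{` that starts a parseable JSON object, so prose around the JSON is tolerated; requiring the whole response to be JSON would be a stricter alternative. The signature Orq passes to Python evaluators is not covered on this page, so treat the function name and arguments as assumptions and adapt them to the Python evaluator documentation.

```python
import json

def contains_json_object(response: str) -> bool:
    """Return True if `response` contains at least one valid JSON object."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value from the start of the string and
            # ignores anything after it; a value starting with "{" is an object.
            decoder.raw_decode(response[start:])
            return True
        except json.JSONDecodeError:
            start = response.find("{", start + 1)
    return False
```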

    ***

    **Create an experiment with evaluators**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment from the "qa-dataset" dataset with the "tone-scorer" evaluator attached
    ```

    The assistant will:

    1. Search for the dataset using `search_entities`
    2. Use `search_entities` to find the evaluator and get its ID, or use the ID returned by `create_llm_eval` / `create_python_eval` if it was created in the same session
    3. Use `create_experiment` with both the dataset ID and evaluator ID, with `auto_run` enabled

    ***

    **Update an existing evaluator**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Update the "tone-scorer" evaluator to also check for formal language and return a boolean instead of a number
    ```

    The assistant will:

    1. Search for the evaluator using `search_entities`
    2. Use `update_llm_eval` with the evaluator ID, updated `prompt`, and `output_type: "boolean"`
    3. Confirm the new configuration
  </Accordion>

  <Accordion title="Managing Entities" icon="trash">
    **Delete a workspace entity**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Delete the experiment named "GPT-5 Test Run" from my workspace
    ```

    The assistant will:

    1. Search for the experiment using `search_entities`
    2. Use `delete_entity` with `type: "experiment"` and the resolved ID
    3. Confirm deletion

    <Note>
      Supported `type` values: `agent`, `prompt`, `experiment`, `evaluator`, `knowledge`, `memory_store`, `prompt_snippet` (Skills), `sheet`, `tool`. Use `delete_dataset` to delete a dataset along with all its datapoints.
    </Note>
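
If you script deletions yourself, a small routing helper keeps datasets away from `delete_entity`. The sketch below only encodes the rule from the note above. The argument names `id` and `dataset_id` are hypothetical, so check the real names in the `tools/list` schema.

```python
# Types accepted by delete_entity, as listed in the note above.
DELETE_ENTITY_TYPES = frozenset({
    "agent", "prompt", "experiment", "evaluator", "knowledge",
    "memory_store", "prompt_snippet", "sheet", "tool",
})

def deletion_call(entity_type: str, entity_id: str) -> tuple[str, dict]:
    """Pick the deletion tool and arguments (argument names are hypothetical)."""
    if entity_type == "dataset":
        # Datasets use delete_dataset, which also removes all their datapoints.
        return "delete_dataset", {"dataset_id": entity_id}
    if entity_type not in DELETE_ENTITY_TYPES:
        raise ValueError(f"delete_entity does not support type {entity_type!r}")
    return "delete_entity", {"type": entity_type, "id": entity_id}
```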
  </Accordion>

  <Accordion title="Documentation Search" icon="book-open">
    **Look up a feature in the Orq.ai docs**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    How does prompt caching work in the AI Router?
    ```

    The assistant will:

    1. Use `search_docs` with a relevant query
    2. Return matching documentation sections with guidance and examples
    3. Summarize the answer in context

    ***

    **Get started with a specific product area**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me how to set up the AI Router
    ```

    The assistant will:

    1. Use `search_docs` to find AI Router onboarding content
    2. Return setup steps, configuration options, and quick-start examples
  </Accordion>

  <Accordion title="Analytics" icon="chart-line">
    **Get a workspace snapshot**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Give me an overview of my workspace metrics for the last 7 days
    ```

    The assistant will:

    1. Use `get_analytics_overview` with a 7-day range
    2. Return total requests, cost, tokens, error rate, latency, and top models

    ***

    **Drill into a specific model's performance**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    How has gpt-5.2 performed this week? Focus on error rate and cost.
    ```

    The assistant will:

    1. Use `query_analytics` with `metric: "errors"`, filtered by model and a 7-day range
    2. Use `query_analytics` with `metric: "cost"`, filtered by model and a 7-day range
    3. Surface error rate trends and cost breakdown side by side

    ***

    **Identify your most expensive models**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Which models are costing the most this month?
    ```

    The assistant will:

    1. Use `query_analytics` with `metric: "cost"`, `group_by: ["model"]`, and a 30-day range
    2. Aggregate cost per model across all time buckets and rank them by total spend
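
Step 2 is a group-and-sum. The sketch assumes each time bucket from `query_analytics` has been flattened into a dict with hypothetical `model` and `cost` keys.

```python
from collections import Counter

def rank_models_by_cost(buckets):
    """Sum cost per model across time buckets and sort by total spend, highest first."""
    totals = Counter()
    for bucket in buckets:
        totals[bucket["model"]] += bucket["cost"]
    return totals.most_common()
```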
  </Accordion>
</AccordionGroup>

## Skills

**Orq Skills** layer pre-built multi-step workflows on top of these MCP tools: build agents, run experiments, analyze trace failures, and more with a single command.

<Card title="Orq Skills" icon="wand-magic-sparkles" href="/docs/integrations/code-assistants/skills">
  Pre-built workflows and slash commands for the full Build, Evaluate, Optimize lifecycle
</Card>
