# Orq MCP Server tools and quickstart

> Connect AI code assistants to an Orq.ai workspace via the Model Context Protocol. Reference for all 38 available tools with usage examples.

## What is the Orq MCP?

The Orq Model Context Protocol (MCP) server gives AI code assistants direct access to an **Orq.ai** workspace. Its 38 specialized tools let you manage experiments, create datasets, configure evaluators, and analyze traces without leaving the IDE.

## Installation

Point the assistant at the MCP server and authenticate with an [API key](/docs/ai-studio/organization/api-keys):

|                 |                                          |
| --------------- | ---------------------------------------- |
| **Endpoint**    | `https://my.orq.ai/v2/mcp`               |
| **Auth Header** | `Authorization: Bearer YOUR_ORQ_API_KEY` |
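
As a rough sketch of what an MCP client sends on the assistant's behalf, the snippet below builds (but does not send) a JSON-RPC `tools/list` request for the endpoint above. It assumes the server uses MCP's streamable HTTP transport; the `ORQ_API_KEY` environment variable and the `build_tools_list_request` helper are illustrative names, not part of Orq.ai. A real client also performs the MCP `initialize` handshake first, which is omitted here.

```python
import json
import os
import urllib.request

MCP_ENDPOINT = "https://my.orq.ai/v2/mcp"  # endpoint from the table above


def build_tools_list_request(api_key: str) -> urllib.request.Request:
    """Build (without sending) a JSON-RPC 2.0 request that lists MCP tools."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    return urllib.request.Request(
        MCP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Assumption: streamable HTTP transport, which accepts both types.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# ORQ_API_KEY is a hypothetical environment variable name.
request = build_tools_list_request(os.environ.get("ORQ_API_KEY", "YOUR_ORQ_API_KEY"))
print(request.get_method(), request.full_url)
```

In practice the code assistant's MCP client handles this exchange; the sketch only shows where the endpoint and header from the table end up.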

### Code Assistants

For step-by-step setup, see the guide for your code assistant:

<CardGroup cols={3}>
  <Card title="Claude Code" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/claude-code.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=1654adb689b324322f3d63dd9eef6ad0" href="/docs/ai-studio/integrations/code-assistants/claude-code" width="61" height="43" data-path="images/logos/claude-code.svg">
    Official Anthropic CLI for Claude with MCP integration
  </Card>

  <Card title="Claude Desktop" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/claude-desktop.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=bccf0e29820481de693393dd71aa84f3" href="/docs/ai-studio/integrations/code-assistants/claude-desktop" width="61" height="43" data-path="images/logos/claude-desktop.svg">
    Use Orq MCP in Claude's desktop application
  </Card>

  <Card title="Codex" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/codex.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=feb92a7651c85d15d424f900821581b6" href="/docs/ai-studio/integrations/code-assistants/codex" width="256" height="260" data-path="images/logos/codex.svg">
    AI coding assistant with MCP support
  </Card>

  <Card title="Cursor" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/cursor.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=ba1ed7e66516eb920ae1462050041bdf" href="/docs/ai-studio/integrations/code-assistants/cursor" width="24" height="24" data-path="images/logos/cursor.svg">
    AI-first code editor with native MCP support
  </Card>

  <Card title="VS Code" icon="https://mintcdn.com/orqai/9VWlkBlGuTGPnFjG/images/logos/vscode.svg?fit=max&auto=format&n=9VWlkBlGuTGPnFjG&q=85&s=4c770da22e84990dee79094d1fab3179" href="/docs/ai-studio/integrations/code-assistants/vscode" width="24" height="24" data-path="images/logos/vscode.svg">
    AI-powered editor with GitHub Copilot and native MCP support
  </Card>

  <Card title="Warp" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/warp.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=2f4b99ef604241484380367ae742ea4d" href="/docs/ai-studio/integrations/code-assistants/warp" width="24" height="24" data-path="images/logos/warp.svg">
    AI-powered terminal with native MCP support
  </Card>
</CardGroup>

## Key Capabilities

<CardGroup cols={3}>
  <Card title="Agent Creation" icon="robot">
    Create, update, and configure agents with instructions, tools, models, evaluators, and guardrails
  </Card>

  <Card title="Experiment Management" icon="flask">
    Run experiments, compare prompts or models side-by-side, and export results
  </Card>

  <Card title="Dataset Operations" icon="database">
    Create datasets, add or edit datapoints, and generate synthetic test data
  </Card>

  <Card title="Analytics & Insights" icon="chart-line">
    Query usage, cost, latency, and error metrics across the workspace
  </Card>

  <Card title="Evaluator & Guardrail Configuration" icon="clipboard-check">
    Create and update LLM-as-a-Judge and Python evaluators, and attach guardrails to agents
  </Card>

  <Card title="Docs Exploration" icon="book-open">
    Search the **Orq.ai** documentation without leaving your IDE
  </Card>
</CardGroup>

## Available Tools

The Orq MCP provides 38 tools across 11 categories:

| Category    | Tool                          | Description                                                                                                                                                                                                                                      |
| ----------- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Agents      | **get\_agent**                | Retrieve agent configuration and details                                                                                                                                                                                                         |
| Agents      | **create\_agent**             | Create a new agent with instructions, tools, models, evaluators, and guardrails                                                                                                                                                                  |
| Agents      | **update\_agent**             | Update an existing agent's configuration and publish a new semantic version. Requires `versionIncrement` (`major`, `minor`, or `patch`) and `versionDescription` with every update                                                               |
| Agents      | **invoke\_agent**             | Invoke an agent via the Responses API. Supports multi-turn via `previous_response_id`, variables, and background mode                                                                                                                            |
| Agents      | **retrieve\_agent\_response** | Retrieve a previously created agent response by ID                                                                                                                                                                                               |
| Analytics   | **get\_analytics\_overview**  | Get workspace snapshot (requests, cost, tokens, errors, error rate, latency, top models)                                                                                                                                                         |
| Analytics   | **query\_analytics**          | Flexible drill-down with filtering and grouping                                                                                                                                                                                                  |
| Dataset     | **create\_dataset**           | Create a new dataset                                                                                                                                                                                                                             |
| Dataset     | **list\_datapoints**          | List datapoints in a dataset                                                                                                                                                                                                                     |
| Dataset     | **create\_datapoints**        | Create datapoints (max 100)                                                                                                                                                                                                                      |
| Dataset     | **update\_datapoint**         | Update a datapoint                                                                                                                                                                                                                               |
| Dataset     | **delete\_datapoints**        | Delete datapoints (max 100)                                                                                                                                                                                                                      |
| Dataset     | **delete\_dataset**           | Delete a dataset and all datapoints                                                                                                                                                                                                              |
| Deployments | **create\_deployment**        | Create a deployment                                                                                                                                                                                                                              |
| Deployments | **get\_deployment**           | Retrieve a deployment by key                                                                                                                                                                                                                     |
| Evaluator   | **get\_llm\_eval**            | Retrieve an LLM-as-a-Judge evaluator configuration                                                                                                                                                                                               |
| Evaluator   | **get\_python\_eval**         | Retrieve a Python code evaluator configuration                                                                                                                                                                                                   |
| Evaluator   | **create\_llm\_eval**         | Create LLM-as-a-Judge evaluator                                                                                                                                                                                                                  |
| Evaluator   | **create\_python\_eval**      | Create Python code evaluator                                                                                                                                                                                                                     |
| Evaluator   | **update\_llm\_eval**         | Update an existing LLM-as-a-Judge evaluator (prompt, model, output type)                                                                                                                                                                         |
| Evaluator   | **update\_python\_eval**      | Update an existing Python code evaluator (code, output type)                                                                                                                                                                                     |
| Experiment  | **list\_experiment\_runs**    | List runs with cursor pagination                                                                                                                                                                                                                 |
| Experiment  | **get\_experiment\_run**      | Export run (JSON/JSONL/CSV)                                                                                                                                                                                                                      |
| Experiment  | **create\_experiment**        | Create experiment from dataset with optional auto-run                                                                                                                                                                                            |
| Models      | **list\_models**              | List available AI models by type (chat, embedding, image, tts, stt, and more)                                                                                                                                                                    |
| Models      | **invoke\_model**             | Invoke any model directly via the Responses API. Supports reasoning effort control and response content inclusion                                                                                                                                |
| Search      | **search\_entities**          | Search any entity type: project, dataset, prompt, experiment, agent, evaluator, knowledge, memory store, or deployment (supports cursor pagination)                                                                                              |
| Search      | **search\_directories**       | List directories within a project                                                                                                                                                                                                                |
| Search      | **search\_docs**              | Query the Orq.ai documentation for feature guidance and API reference                                                                                                                                                                            |
| Skills      | **create\_skill**             | Create a reusable skill                                                                                                                                                                                                                          |
| Skills      | **update\_skill**             | Update an existing skill                                                                                                                                                                                                                         |
| Skills      | **get\_skill**                | Retrieve a skill by key                                                                                                                                                                                                                          |
| Skills      | **list\_skills**              | List all skills in the workspace                                                                                                                                                                                                                 |
| Skills      | **delete\_skill**             | Delete a skill                                                                                                                                                                                                                                   |
| Traces      | **list\_traces**              | List traces with filtering by model, type, project, thread ID, time range, and more                                                                                                                                                              |
| Traces      | **get\_span**                 | Retrieve a single span (compact or full mode)                                                                                                                                                                                                    |
| Traces      | **list\_spans**               | List all spans in a trace                                                                                                                                                                                                                        |
| Workspace   | **delete\_entity**            | Delete any entity by type and ID. Supported types: `agent`, `prompt`, `experiment`, `evaluator`, `knowledge`, `memory_store`, `prompt_snippet` (Skills), `sheet`, `tool`. Use `delete_dataset` to delete a dataset along with all its datapoints |
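
To illustrate how a tool from this table is called, here is a minimal sketch of an MCP `tools/call` payload for `update_agent`, including the `versionIncrement` and `versionDescription` fields the table marks as required. The `key` argument used to identify the agent and the `build_update_agent_call` helper are assumptions for illustration; check the tool's input schema for the real field names.

```python
import json

VERSION_INCREMENTS = {"major", "minor", "patch"}


def build_update_agent_call(agent_key: str, instructions: str,
                            version_increment: str, version_description: str) -> dict:
    """Build an MCP `tools/call` payload for the `update_agent` tool."""
    if version_increment not in VERSION_INCREMENTS:
        raise ValueError(f"versionIncrement must be one of {sorted(VERSION_INCREMENTS)}")
    if not version_description.strip():
        raise ValueError("versionDescription is required with every update")
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "update_agent",
            "arguments": {
                "key": agent_key,  # hypothetical identifier field name
                "instructions": instructions,
                "versionIncrement": version_increment,
                "versionDescription": version_description,
            },
        },
    }


payload = build_update_agent_call(
    "support-bot", "Always respond in the user's language.",
    "minor", "Respond in the user's language",
)
print(json.dumps(payload, indent=2))
```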

## Examples

<AccordionGroup>
  <Accordion title="Building an Agent" icon="robot">
    **Create an agent from scratch**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a customer support agent called "Support Bot" that answers questions about our SaaS product. Use GPT-4.1 and give it a concise and professional tone.
    ```

    The assistant will:

    1. Use `create_agent` with the name, instructions, and model (`openai/gpt-4.1`)
    2. Return the agent key and configuration summary

    ***

    **Review and update agent instructions**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the current instructions for the "Support Bot" agent and update them to always respond in the user's language
    ```

    The assistant will:

    1. Use `get_agent` to retrieve the current configuration
    2. Display the existing instructions
    3. Use `update_agent` with the revised `instructions` field, `versionIncrement`, and `versionDescription`
    4. Confirm the update and new version
  </Accordion>

  <Accordion title="Invoking a Model" icon="bolt">
    Use `invoke_model` to call any model directly via the Responses API.

    **Parameters**

    | Parameter   | Type   | Description                                                                                                                                                                     |
    | ----------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `model`     | string | Model ID in `provider/model` format (e.g. `openai/gpt-5`, `openai/o3`)                                                                                                          |
    | `reasoning` | object | Reasoning configuration. Supported on OpenAI GPT-5 and o-series models only. `effort`: `none`, `low`, `medium`, `high`, or `xhigh`. `summary`: `auto`, `concise`, or `detailed` |
    | `include`   | array  | Response content to include: `reasoning.encrypted_content`, `message.output_text.logprobs`                                                                                      |
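
    The parameter table can be read as a small validation routine. The sketch below assembles only the three documented parameters and rejects values the table does not list; the helper name is hypothetical, and the prompt input field is left out because it is not described here.

    ```python
    REASONING_EFFORTS = {"none", "low", "medium", "high", "xhigh"}
    REASONING_SUMMARIES = {"auto", "concise", "detailed"}
    INCLUDE_VALUES = {"reasoning.encrypted_content", "message.output_text.logprobs"}


    def build_invoke_model_args(model, effort=None, summary=None, include=()):
        """Validate and assemble the documented `invoke_model` parameters."""
        if "/" not in model:
            raise ValueError("model must use the provider/model format")
        args = {"model": model}
        reasoning = {}
        if effort is not None:
            if effort not in REASONING_EFFORTS:
                raise ValueError(f"unsupported effort: {effort}")
            reasoning["effort"] = effort
        if summary is not None:
            if summary not in REASONING_SUMMARIES:
                raise ValueError(f"unsupported summary: {summary}")
            reasoning["summary"] = summary
        if reasoning:
            args["reasoning"] = reasoning
        unknown = set(include) - INCLUDE_VALUES
        if unknown:
            raise ValueError(f"unsupported include values: {sorted(unknown)}")
        if include:
            args["include"] = list(include)
        return args


    print(build_invoke_model_args("openai/o3", effort="medium", summary="concise"))
    ```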

    ***

    **Call an o-series model with reasoning**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Use invoke_model to call openai/o3 with medium reasoning effort and return a concise reasoning summary
    ```

    The assistant will:

    1. Use `invoke_model` with `model: "openai/o3"` and `reasoning: { effort: "medium", summary: "concise" }`
    2. Return the model response along with the reasoning summary

    ***

    **Include encrypted reasoning content**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Invoke gpt-5 and include the encrypted reasoning content in the response
    ```

    The assistant will:

    1. Use `invoke_model` with `model: "openai/gpt-5"` and `include: ["reasoning.encrypted_content"]`
    2. Return the response with the encrypted reasoning block attached
  </Accordion>

  <Accordion title="Investigating Traces" icon="chart-bullet">
    **Find errors from the last 24 hours**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me all traces with errors from the last 24 hours
    ```

    The assistant will:

    1. Calculate the unix timestamp for 24 hours ago
    2. Use `list_traces` with filter `status:=ERROR && timestamp:>TIMESTAMP` and sort by `timestamp:desc`
    3. Display trace IDs, names, durations, and timestamps
    4. Summarize the most common error types and their frequency
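
    Steps 1 and 2 amount to a small timestamp calculation. The sketch below assumes the filter expects a unix timestamp in seconds (the unit is not stated here); `build_error_filter` is an illustrative helper, not an Orq.ai function.

    ```python
    import time


    def build_error_filter(now: float, hours: int = 24) -> str:
        """Return a list_traces filter for errors within the last `hours` hours."""
        since = int(now - hours * 3600)  # assumption: timestamps are in seconds
        return f"status:=ERROR && timestamp:>{since}"


    print(build_error_filter(time.time()))
    ```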

    ***

    **Detect regressions after a model switch**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    After switching models yesterday, has latency increased or stabilized?
    ```

    The assistant will:

    1. Use `query_analytics` with `metric: "latency"` and `group_by: ["model"]` for the period before the switch
    2. Repeat for the period after the switch
    3. Compare average latency per model across both windows and surface any regressions
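
    The comparison in step 3 could look like the sketch below. It assumes each `query_analytics` result has already been reduced to a mapping of model name to average latency in milliseconds; that shape, the 10% threshold, and the helper name are illustrative choices.

    ```python
    def latency_regressions(before: dict, after: dict, threshold: float = 0.10) -> dict:
        """Return models whose average latency rose by more than `threshold`."""
        regressions = {}
        for model, new_latency in after.items():
            old_latency = before.get(model)
            if old_latency is None or old_latency <= 0:
                continue  # no baseline to compare against
            change = (new_latency - old_latency) / old_latency
            if change > threshold:
                regressions[model] = round(change, 3)
        return regressions


    before = {"model-a": 100.0, "model-b": 200.0}  # made-up sample values
    after = {"model-a": 130.0, "model-b": 205.0}
    print(latency_regressions(before, after))
    ```

    Models that appear only after the switch have no baseline and are skipped, so they need a separate look.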

    ***

    **Find the slowest traces**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Find the 5 slowest traces from today and show me their span details
    ```

    The assistant will:

    1. Use `list_traces` sorted by `duration_ms:desc`, filtered to today, limit 5
    2. Use `list_spans` with each `trace_id` to retrieve the full span tree
    3. Surface bottlenecks and latency outliers

    ***

    **Filter traces by thread ID**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me all traces for thread ID thread_abc123
    ```

    The assistant will:

    1. Use `list_traces` with `thread_id: "thread_abc123"`
    2. Return all traces associated with that conversation thread
    3. Surface turn count, total cost, and any errors across the session
  </Accordion>

  <Accordion title="Running Experiments" icon="flask">
    **Compare two models on an existing dataset**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment comparing GPT-5.2 and Claude Sonnet 4.6 using the "user-queries" dataset
    ```

    The assistant will:

    1. Search for the "user-queries" dataset using `search_entities`
    2. Use `create_experiment` with two model configurations and `auto_run` enabled
    3. Return the experiment ID once both configurations have run

    ***

    **Compare two prompt strategies**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment using the "customer-feedback" dataset with two prompts: one focused on empathy and one on brevity. Run it and summarize the results.
    ```

    The assistant will:

    1. Search for the dataset using `search_entities`
    2. Use `create_experiment` with two prompt variants and `auto_run` enabled
    3. Use `get_experiment_run` to retrieve evaluation metrics
    4. Compare the variants and summarize which performed better

    ***

    **Export experiment results**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Export the latest experiment run as CSV
    ```

    The assistant will:

    1. Use `list_experiment_runs` to find the most recent run
    2. Use `get_experiment_run` with CSV export format
    3. Return a signed download URL for the CSV file
  </Accordion>

  <Accordion title="Managing Datasets" icon="database">
    **Create a synthetic dataset**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Generate 50 realistic customer support questions about a SaaS product and create a dataset called "Support Training Data"
    ```

    The assistant will:

    1. Generate 50 synthetic question/answer pairs
    2. Use `create_dataset` to create the dataset
    3. Use `create_datapoints` to add all entries in bulk, each formatted as `{ inputs: { question: "..." }, expected_output: "..." }`
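
    Because `create_datapoints` is listed with a maximum of 100 datapoints, a 50-item dataset fits in one call, but larger sets need batching. The sketch below formats entries in the shape from step 3 and splits them into batches, reading the limit as a per-call cap (an assumption); the helper names are hypothetical.

    ```python
    MAX_BATCH = 100  # create_datapoints is documented with a max of 100


    def to_datapoint(question: str, answer: str) -> dict:
        """Format one question/answer pair in the shape shown in step 3."""
        return {"inputs": {"question": question}, "expected_output": answer}


    def in_batches(items: list, size: int = MAX_BATCH) -> list:
        """Split items into consecutive lists of at most `size` elements."""
        return [items[start:start + size] for start in range(0, len(items), size)]


    pairs = [(f"Question {n}?", f"Answer {n}.") for n in range(250)]  # placeholder data
    datapoints = [to_datapoint(question, answer) for question, answer in pairs]
    print([len(batch) for batch in in_batches(datapoints)])
    ```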

    ***

    **Import data from code**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a dataset from the JSON array above and add it to my workspace
    ```

    The assistant will:

    1. Parse the JSON from the selection or context
    2. Use `create_dataset` with an appropriate name
    3. Use `create_datapoints` to add each entry as a datapoint

    ***

    **Update or clean up a dataset**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Delete all datapoints in the "staging-tests" dataset that have an empty expected_output field
    ```

    The assistant will:

    1. Use `search_entities` to find the "staging-tests" dataset and retrieve its ID
    2. Use `list_datapoints` to retrieve all entries
    3. Filter for datapoints with empty `expected_output`
    4. Use `delete_datapoints` to remove them in batches
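
    Steps 3 and 4 might be sketched as follows. The `id` field name and the assumption that `expected_output` is a string (or missing) are illustrative; the batch size follows the documented maximum of 100 for `delete_datapoints`.

    ```python
    MAX_BATCH = 100  # delete_datapoints is documented with a max of 100


    def empty_output_ids(datapoints: list) -> list:
        """Return IDs of datapoints whose expected_output is missing or blank."""
        return [
            point["id"]  # "id" is a hypothetical field name
            for point in datapoints
            if not (point.get("expected_output") or "").strip()
        ]


    def delete_batches(ids: list, size: int = MAX_BATCH) -> list:
        """Split IDs into lists small enough for one delete_datapoints call."""
        return [ids[start:start + size] for start in range(0, len(ids), size)]


    sample = [
        {"id": "dp_1", "expected_output": ""},
        {"id": "dp_2", "expected_output": "Reset it from the settings page."},
        {"id": "dp_3"},
    ]
    print(delete_batches(empty_output_ids(sample)))
    ```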
  </Accordion>

  <Accordion title="Evaluators" icon="clipboard-check">
    **Retrieve an evaluator's configuration**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the current configuration for the "tone-scorer" evaluator
    ```

    The assistant will:

    1. Search for the evaluator using `search_entities` to resolve its ID
    2. Use `get_llm_eval` or `get_python_eval` to retrieve the full configuration
    3. Display the prompt, model, output type, and other settings

    ***

    **Create an LLM-as-a-Judge evaluator**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an LLM-as-a-Judge evaluator that scores responses on tone: professional, neutral, or aggressive
    ```

    The assistant will:

    1. Use `create_llm_eval` with a scoring rubric for tone classification
    2. Confirm the evaluator ID and configuration

    ***

    **Create a Python evaluator**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a Python evaluator that checks whether the response contains a valid JSON object
    ```

    The assistant will:

    1. Write a Python snippet that parses the response and validates JSON structure
    2. Use `create_python_eval` to register it in the workspace
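
    The snippet from step 1 might look like the sketch below. The entry-point name and signature that Orq.ai expects from a Python evaluator are not described here, so `contains_json_object` is a hypothetical function that only shows the validation logic.

    ```python
    import json


    def contains_json_object(response: str) -> bool:
        """Return True if the text contains at least one valid JSON object."""
        decoder = json.JSONDecoder()
        for index, char in enumerate(response):
            if char != "{":
                continue
            try:
                # Try to decode a JSON value starting at this opening brace.
                value, _ = decoder.raw_decode(response, index)
            except json.JSONDecodeError:
                continue
            if isinstance(value, dict):
                return True
        return False


    print(contains_json_object('Here is the result: {"status": "ok"}'))
    ```

    Scanning from each opening brace lets the check pass when the model wraps the JSON in surrounding prose; a stricter evaluator could require the whole response to parse instead.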

    ***

    **Create an experiment with evaluators**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment from the "qa-dataset" dataset with the "tone-scorer" evaluator attached
    ```

    The assistant will:

    1. Search for the dataset using `search_entities`
    2. Use `search_entities` to find the evaluator and get its key, or use the key returned by `create_llm_eval` / `create_python_eval` if created in the same session
    3. Use `create_experiment` with the dataset ID and the evaluator identifier resolved in step 2, with `auto_run` enabled

    ***

    **Update an existing evaluator**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Update the "tone-scorer" evaluator to also check for formal language and return a boolean instead of a number
    ```

    The assistant will:

    1. Search for the evaluator using `search_entities`
    2. Use `update_llm_eval` with the evaluator ID, updated `prompt`, and `output_type: "boolean"`
    3. Confirm the new configuration
  </Accordion>

  <Accordion title="Managing Entities" icon="trash">
    **Delete a workspace entity**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Delete the experiment named "GPT-5 Test Run" from my workspace
    ```

    The assistant will:

    1. Search for the experiment using `search_entities`
    2. Use `delete_entity` with `type: "experiment"` and the resolved ID
    3. Confirm deletion

    <Note>
      Supported `type` values: `agent`, `prompt`, `experiment`, `evaluator`, `knowledge`, `memory_store`, `prompt_snippet` (Skills), `sheet`, `tool`. Use `delete_dataset` to delete a dataset along with all its datapoints.
    </Note>
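
    The routing rule in the note can be expressed as a short helper. In this sketch `type` comes from the example above, while the `id` argument name and the `deletion_call` helper are assumptions for illustration.

    ```python
    DELETE_ENTITY_TYPES = {
        "agent", "prompt", "experiment", "evaluator", "knowledge",
        "memory_store", "prompt_snippet", "sheet", "tool",
    }


    def deletion_call(entity_type: str, entity_id: str) -> tuple:
        """Pick the tool name and arguments for deleting an entity."""
        if entity_type == "dataset":
            # Datasets are removed together with their datapoints.
            return "delete_dataset", {"id": entity_id}  # "id" is hypothetical
        if entity_type not in DELETE_ENTITY_TYPES:
            raise ValueError(f"unsupported entity type: {entity_type}")
        return "delete_entity", {"type": entity_type, "id": entity_id}


    print(deletion_call("experiment", "exp_123"))
    ```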
  </Accordion>

  <Accordion title="Documentation Search" icon="book-open">
    **Look up a feature in the Orq.ai docs**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    How does prompt caching work in the AI Gateway?
    ```

    The assistant will:

    1. Use `search_docs` with a relevant query
    2. Return matching documentation sections with guidance and examples
    3. Summarize the answer in context

    ***

    **Get started with a specific product area**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me how to set up the AI Gateway
    ```

    The assistant will:

    1. Use `search_docs` to find AI Gateway onboarding content
    2. Return setup steps, configuration options, and quick-start examples
  </Accordion>

  <Accordion title="Analytics" icon="chart-line">
    **Get a workspace snapshot**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Give me an overview of my workspace metrics for the last 7 days
    ```

    The assistant will:

    1. Use `get_analytics_overview` with a 7-day range
    2. Return total requests, cost, tokens, error rate, latency, and top models

    ***

    **Drill into a specific model's performance**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    How has gpt-5.2 performed this week? Focus on error rate and cost.
    ```

    The assistant will:

    1. Use `query_analytics` with `metric: "errors"`, filtered by model and a 7-day range
    2. Use `query_analytics` with `metric: "cost"`, filtered by model and a 7-day range
    3. Surface error rate trends and cost breakdown side by side

    ***

    **Identify the most expensive models**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Which models are costing the most this month?
    ```

    The assistant will:

    1. Use `query_analytics` with `metric: "cost"`, `group_by: ["model"]`, and a 30-day range
    2. Aggregate cost per model across all time buckets and rank them by total spend
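
    The aggregation in step 2 can be sketched as below, assuming the analytics result is a list of time-bucket rows that each carry a `model` and a `cost` value. That row shape and the helper name are assumptions; the real response format may differ.

    ```python
    from collections import defaultdict


    def rank_models_by_cost(buckets: list) -> list:
        """Sum cost per model across time buckets, highest total first."""
        totals = defaultdict(float)
        for bucket in buckets:
            totals[bucket["model"]] += bucket["cost"]
        return sorted(totals.items(), key=lambda item: item[1], reverse=True)


    buckets = [  # made-up sample rows
        {"model": "model-a", "cost": 1.5},
        {"model": "model-b", "cost": 4.0},
        {"model": "model-a", "cost": 1.0},
    ]
    print(rank_models_by_cost(buckets))
    ```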
  </Accordion>
</AccordionGroup>

## Skills

**Orq Skills** layer pre-built, multi-step workflows on top of these MCP tools, so a single command can build agents, run experiments, analyze trace failures, and more.

<Card title="Orq Skills" icon="wand-magic-sparkles" href="/docs/ai-studio/integrations/code-assistants/orq-skills">
  Pre-built workflows and slash commands for the full Build, Evaluate, Optimize lifecycle
</Card>
