# Orq MCP Server tools and quickstart

> Connect AI code assistants to an Orq.ai workspace via the Model Context Protocol. Reference for all 38 available tools with usage examples.

## What is the Orq MCP?

The Orq Model Context Protocol (MCP) server provides AI code assistants with direct access to the **Orq.ai** workspace. With 38 specialized tools, manage experiments, create datasets, configure evaluators, and analyze traces without leaving the IDE.

## Installation

Point the assistant at the MCP server and authenticate with an [API key](/docs/ai-studio/organization/api-keys):

|                 |                                          |
| --------------- | ---------------------------------------- |
| **Endpoint**    | `https://my.orq.ai/v2/mcp`               |
| **Auth Header** | `Authorization: Bearer YOUR_ORQ_API_KEY` |

### Code Assistants

See detailed documentation for the following code assistants:

* Official Anthropic CLI for Claude with MCP integration
* Use Orq MCP in Claude's desktop application
* AI coding assistant with MCP protocol support
* AI-first code editor with native MCP support
* AI-powered editor with GitHub Copilot and native MCP support
* AI-powered terminal with native MCP support

## Key Capabilities

* Create, update, and configure agents with instructions, tools, models, evaluators, and guardrails
* Run experiments, compare prompts or models side-by-side, and export results
* Create datasets, add or edit datapoints, and generate synthetic test data
* Query usage, cost, latency, and error metrics across the workspace
* Create and update LLM-as-a-Judge and Python evaluators, and attach guardrails to agents
* Search the **Orq.ai** documentation without leaving your IDE

## Available Tools

The Orq MCP provides 38 tools across 11 categories:

| Category | Tool | Description |
| ----------- | ----------------------------- | ----------- |
| Agents | **get\_agent** | Retrieve agent configuration and details |
| Agents | **create\_agent** | Create a new agent with instructions, tools, models, evaluators, and guardrails |
| Agents | **update\_agent** | Update an existing agent's configuration and publish a new semantic version. Requires `versionIncrement` (`major`, `minor`, or `patch`) and `versionDescription` with every update |
| Agents | **invoke\_agent** | Invoke an agent via the Responses API. Supports multi-turn via `previous_response_id`, variables, and background mode |
| Agents | **retrieve\_agent\_response** | Retrieve a previously created agent response by ID |
| Analytics | **get\_analytics\_overview** | Get workspace snapshot (requests, cost, tokens, errors, error rate, latency, top models) |
| Analytics | **query\_analytics** | Flexible drill-down with filtering and grouping |
| Dataset | **create\_dataset** | Create a new dataset |
| Dataset | **list\_datapoints** | List datapoints in a dataset |
| Dataset | **create\_datapoints** | Create datapoints (max 100) |
| Dataset | **update\_datapoint** | Update a datapoint |
| Dataset | **delete\_datapoints** | Delete datapoints (max 100) |
| Dataset | **delete\_dataset** | Delete a dataset and all datapoints |
| Deployments | **create\_deployment** | Create a deployment |
| Deployments | **get\_deployment** | Retrieve a deployment by key |
| Evaluator | **get\_llm\_eval** | Retrieve an LLM-as-a-Judge evaluator configuration |
| Evaluator | **get\_python\_eval** | Retrieve a Python code evaluator configuration |
| Evaluator | **create\_llm\_eval** | Create LLM-as-a-Judge evaluator |
| Evaluator | **create\_python\_eval** | Create Python code evaluator |
| Evaluator | **update\_llm\_eval** | Update an existing LLM-as-a-Judge evaluator (prompt, model, output type) |
| Evaluator | **update\_python\_eval** | Update an existing Python code evaluator (code, output type) |
| Experiment | **list\_experiment\_runs** | List runs with cursor pagination |
| Experiment | **get\_experiment\_run** | Export run (JSON/JSONL/CSV) |
| Experiment | **create\_experiment** | Create experiment from dataset with optional auto-run |
| Models | **list\_models** | List available AI models by type (chat, embedding, image, tts, stt, and more) |
| Models | **invoke\_model** | Invoke any model directly via the Responses API. Supports reasoning effort control and response content inclusion |
| Search | **search\_entities** | Search any entity type: project, dataset, prompt, experiment, agent, evaluator, knowledge, memory store, or deployment (supports cursor pagination) |
| Search | **search\_directories** | List directories within a project |
| Search | **search\_docs** | Query the Orq.ai documentation for feature guidance and API reference |
| Skills | **create\_skill** | Create a reusable skill |
| Skills | **update\_skill** | Update an existing skill |
| Skills | **get\_skill** | Retrieve a skill by key |
| Skills | **list\_skills** | List all skills in the workspace |
| Skills | **delete\_skill** | Delete a skill |
| Traces | **list\_traces** | List traces with filtering by model, type, project, thread ID, time range, and more |
| Traces | **get\_span** | Retrieve a single span (compact or full mode) |
| Traces | **list\_spans** | List all spans in a trace |
| Workspace | **delete\_entity** | Delete any entity by type and ID. Supported types: `agent`, `prompt`, `experiment`, `evaluator`, `knowledge`, `memory_store`, `prompt_snippet` (Skills), `sheet`, `tool`. Use `delete_dataset` to delete a dataset along with all its datapoints |

## Examples

**Create an agent from scratch**

```prompt wrap
Create a customer support agent called "Support Bot" that answers questions about our SaaS product. Use GPT-4.1 and give it a concise and professional tone.
```

The assistant will:

1. Use `create_agent` with the name, instructions, and model (`openai/gpt-4.1`)
2. Return the agent key and configuration summary

***

**Review and update agent instructions**

```prompt wrap
Show me the current instructions for the "Support Bot" agent and update them to always respond in the user's language
```

The assistant will:

1. Use `get_agent` to retrieve the current configuration
2. Display the existing instructions
3. Use `update_agent` with the revised `instructions` field, `versionIncrement`, and `versionDescription`
4. Confirm the update and new version

Use `invoke_model` to call any model directly via the Responses API.

**Parameters**

| Parameter | Type | Description |
| ----------- | ------ | ----------- |
| `model` | string | Model ID in `provider/model` format (e.g. `openai/gpt-5`, `openai/o3`) |
| `reasoning` | object | Reasoning configuration. Supported on OpenAI GPT-5 and o-series models only. `effort`: `none`, `low`, `medium`, `high`, or `xhigh`. `summary`: `auto`, `concise`, or `detailed` |
| `include` | array | Response content to include: `reasoning.encrypted_content`, `message.output_text.logprobs` |

***

**Call an o-series model with reasoning**

```prompt wrap
Use invoke_model to call openai/o3 with medium reasoning effort and return a concise reasoning summary
```

The assistant will:

1. Use `invoke_model` with `model: "openai/o3"` and `reasoning: { effort: "medium", summary: "concise" }`
2. Return the model response along with the reasoning summary

***

**Include encrypted reasoning content**

```prompt wrap
Invoke gpt-5 and include the encrypted reasoning content in the response
```

The assistant will:

1. Use `invoke_model` with `model: "openai/gpt-5"` and `include: ["reasoning.encrypted_content"]`
2. Return the response with the encrypted reasoning block attached

**Find errors from the last 24 hours**

```prompt wrap
Show me all traces with errors from the last 24 hours
```

The assistant will:

1. Calculate the unix timestamp for 24 hours ago
2. Use `list_traces` with filter `status:=ERROR && timestamp:>TIMESTAMP` and sort by `timestamp:desc`
3. Display trace IDs, names, durations, and timestamps
4. Summarize the most common error types and their frequency

***

**Detect regressions after a model switch**

```prompt wrap
After switching models yesterday, has latency increased or stabilized?
```

The assistant will:

1. Use `query_analytics` with `metric: "latency"` and `group_by: ["model"]` for the period before the switch
2. Repeat for the period after the switch
3. Compare average latency per model across both windows and surface any regressions

***

**Find the slowest traces**

```prompt wrap
Find the 5 slowest traces from today and show me their span details
```

The assistant will:

1. Use `list_traces` sorted by `duration_ms:desc`, filtered to today, limit 5
2. Use `list_spans` with each `trace_id` to retrieve the full span tree
3. Surface bottlenecks and latency outliers

***

**Filter traces by thread ID**

```prompt wrap
Show me all traces for thread ID thread_abc123
```

The assistant will:

1. Use `list_traces` with `thread_id: "thread_abc123"`
2. Return all traces associated with that conversation thread
3. Surface turn count, total cost, and any errors across the session

**Compare two models on an existing dataset**

```prompt wrap
Create an experiment comparing GPT-5.2 and Claude Sonnet 4.6 using the "user-queries" dataset
```

The assistant will:

1. Search for the "user-queries" dataset using `search_entities`
2. Use `create_experiment` with two model configurations and `auto_run` enabled
3. Return the experiment ID once both configurations have run

***

**Compare two prompt strategies**

```prompt wrap
Create an experiment using the "customer-feedback" dataset with two prompts: one focused on empathy and one on brevity. Run it and summarize the results.
```

The assistant will:

1. Search for the dataset using `search_entities`
2. Use `create_experiment` with two prompt variants and `auto_run` enabled
3. Use `get_experiment_run` to retrieve evaluation metrics
4. Compare the variants and summarize which performed better

***

**Export experiment results**

```prompt wrap
Export the latest experiment run as CSV
```

The assistant will:

1. Use `list_experiment_runs` to find the most recent run
2. Use `get_experiment_run` with CSV export format
3. Return a signed download URL for the CSV file

**Create a synthetic dataset**

```prompt wrap
Generate 50 realistic customer support questions about a SaaS product and create a dataset called "Support Training Data"
```

The assistant will:

1. Generate 50 synthetic question/answer pairs
2. Use `create_dataset` to create the dataset
3. Use `create_datapoints` to add all entries in bulk, each formatted as `{ inputs: { question: "..." }, expected_output: "..." }`

***

**Import data from code**

```prompt wrap
Create a dataset from the JSON array above and add it to my workspace
```

The assistant will:

1. Parse the JSON from the selection or context
2. Use `create_dataset` with an appropriate name
3. Use `create_datapoints` to add each entry as a datapoint

***

**Update or clean up a dataset**

```prompt wrap
Delete all datapoints in the "staging-tests" dataset that have an empty expected_output field
```

The assistant will:

1. Use `search_entities` to find the "staging-tests" dataset and retrieve its ID
2. Use `list_datapoints` to retrieve all entries
3. Filter for datapoints with empty `expected_output`
4. Use `delete_datapoints` to remove them in batches

**Retrieve an evaluator's configuration**

```prompt wrap
Show me the current configuration for the "tone-scorer" evaluator
```

The assistant will:

1. Search for the evaluator using `search_entities` to resolve its ID
2. Use `get_llm_eval` or `get_python_eval` to retrieve the full configuration
3. Display the prompt, model, output type, and other settings

***

**Create an LLM-as-a-Judge evaluator**

```prompt wrap
Create an LLM-as-a-Judge evaluator that scores responses on tone: professional, neutral, or aggressive
```

The assistant will:

1. Use `create_llm_eval` with a scoring rubric for tone classification
2. Confirm the evaluator ID and configuration

***

**Create a Python evaluator**

```prompt wrap
Create a Python evaluator that checks whether the response contains a valid JSON object
```

The assistant will:

1. Write a Python snippet that parses the response and validates JSON structure
2. Use `create_python_eval` to register it in the workspace

***

**Create an experiment with evaluators**

```prompt wrap
Create an experiment from the "qa-dataset" dataset with the "tone-scorer" evaluator attached
```

The assistant will:

1. Search for the dataset using `search_entities`
2. Use `search_entities` to find the evaluator and get its key, or use the key returned by `create_llm_eval` / `create_python_eval` if created in the same session
3. Use `create_experiment` with both the dataset ID and evaluator ID, with `auto_run` enabled

***

**Update an existing evaluator**

```prompt wrap
Update the "tone-scorer" evaluator to also check for formal language and return a boolean instead of a number
```

The assistant will:

1. Search for the evaluator using `search_entities`
2. Use `update_llm_eval` with the evaluator ID, updated `prompt`, and `output_type: "boolean"`
3. Confirm the new configuration

**Delete a workspace entity**

```prompt wrap
Delete the experiment named "GPT-5 Test Run" from my workspace
```

The assistant will:

1. Search for the experiment using `search_entities`
2. Use `delete_entity` with `type: "experiment"` and the resolved ID
3. Confirm deletion

Supported `type` values: `agent`, `prompt`, `experiment`, `evaluator`, `knowledge`, `memory_store`, `prompt_snippet` (Skills), `sheet`, `tool`. Use `delete_dataset` to delete a dataset along with all its datapoints.
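For a client that talks to the server directly rather than through an assistant, the deletion step above could be sketched as an MCP `tools/call` message. This is a minimal illustration under stated assumptions, not the Orq.ai client: the argument name `id` and the entity ID `exp_123` are hypothetical (only `type` is named in the example above), and a real MCP session starts with an `initialize` handshake before any tool call. The sketch only builds the payload and headers; nothing is sent over the network.

```python
import json

# Endpoint and supported types are taken from the Installation table and the
# delete_entity description in this document.
MCP_ENDPOINT = "https://my.orq.ai/v2/mcp"
SUPPORTED_TYPES = {
    "agent", "prompt", "experiment", "evaluator", "knowledge",
    "memory_store", "prompt_snippet", "sheet", "tool",
}


def build_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the bearer token described under Installation."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def build_delete_entity_call(entity_type: str, entity_id: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` message for the delete_entity tool."""
    if entity_type == "dataset":
        # Datasets are removed with their datapoints through a dedicated tool.
        raise ValueError("use the delete_dataset tool to delete a dataset")
    if entity_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported entity type: {entity_type!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "delete_entity",
            # `type` appears in the example above; `id` is an ASSUMED argument name.
            "arguments": {"type": entity_type, "id": entity_id},
        },
    }


if __name__ == "__main__":
    # "exp_123" is a placeholder; in practice the ID comes from search_entities.
    message = build_delete_entity_call("experiment", "exp_123")
    print("POST", MCP_ENDPOINT)
    print(json.dumps(build_headers("YOUR_ORQ_API_KEY"), indent=2))
    print(json.dumps(message, indent=2))
```

Rejecting `dataset` before building the message mirrors the note above that datasets go through `delete_dataset`, so a mistyped request fails locally instead of reaching the workspace.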
**Look up a feature in the Orq.ai docs**

```prompt wrap
How does prompt caching work in the AI Gateway?
```

The assistant will:

1. Use `search_docs` with a relevant query
2. Return matching documentation sections with guidance and examples
3. Summarize the answer in context

***

**Get started with a specific product area**

```prompt wrap
Show me how to set up the AI Gateway
```

The assistant will:

1. Use `search_docs` to find AI Gateway onboarding content
2. Return setup steps, configuration options, and quick-start examples

**Get a workspace snapshot**

```prompt wrap
Give me an overview of my workspace metrics for the last 7 days
```

The assistant will:

1. Use `get_analytics_overview` with a 7-day range
2. Return total requests, cost, tokens, error rate, latency, and top models

***

**Drill into a specific model's performance**

```prompt wrap
How has gpt-5.2 performed this week? Focus on error rate and cost.
```

The assistant will:

1. Use `query_analytics` with `metric: "errors"`, filtered by model and a 7-day range
2. Use `query_analytics` with `metric: "cost"`, filtered by model and a 7-day range
3. Surface error rate trends and cost breakdown side by side

***

**Identify the most expensive models**

```prompt wrap
Which models are costing the most this month?
```

The assistant will:

1. Use `query_analytics` with `metric: "cost"`, `group_by: ["model"]`, and a 30-day range
2. Aggregate cost per model across all time buckets and rank them by total spend

## Skills

**Orq Skills** layer pre-built multi-step workflows on top of these MCP tools: build agents, run experiments, analyze trace failures, and more with a single command.

Pre-built workflows and slash commands for the full Build, Evaluate, Optimize lifecycle