Orq Skills for code assistants

Overview

Orq Skills are pre-built, reusable workflows from the orq-ai/assistant-plugins repository. They come in two forms:

Skills: multi-step workflows that require reasoning, such as building an agent, running an experiment, or analyzing trace failures.
Commands: quick slash-command actions for immediate results, such as listing traces or showing analytics.

Both are built on the Agent Skills standard format, which means they work with any compatible assistant: Claude Code, Cursor, Gemini CLI, and others. Each skill encodes best practices from prompt engineering, agent design, evaluation methodology, and experimentation into a repeatable, triggered workflow.

Prerequisites

An active orq.ai account
An API key
The Orq MCP server connected to the assistant (see MCP Quickstart)
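The prerequisites can be checked from a terminal before installing. The following is a minimal sketch, not part of the Orq tooling: `ORQ_API_KEY` is the variable name the Codex MCP command in this guide reads, and assuming the other paths use the same variable is a guess. The function name `check_orq_prereqs` and its messages are illustrative.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helper, not shipped with Orq).
# Checks that the API key variable is set and that the CLI needed by the
# chosen install path is on PATH.
check_orq_prereqs() {
  cli="$1"   # CLI to look for, e.g. claude, codex, or npx
  rc=0
  if [ -z "${ORQ_API_KEY:-}" ]; then
    echo "ORQ_API_KEY is not set" >&2
    rc=1
  fi
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "$cli not found on PATH" >&2
    rc=1
  fi
  return "$rc"
}

# Example: check the requirements for the npx-based install paths.
check_orq_prereqs npx || echo "Resolve the items above before installing."
```

The function only reports what is missing; it does not verify that the key is valid or that the MCP server is reachable.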

Installation

Choose the option that matches the assistant used:

Claude Code:

# Installs skills, commands, agents, and the MCP server in one step
claude plugin marketplace add orq-ai/claude-plugins
claude plugin install orq-skills@orq-claude-plugin

Codex:

# Skills (writes to ~/.agents/skills/, which Codex scans by default)
npx skills add orq-ai/assistant-plugins --agent codex -g -y

# orq.ai MCP server (writes [mcp_servers.orq-workspace] to ~/.codex/config.toml)
codex mcp add orq-workspace \
  --url https://my.orq.ai/v2/mcp \
  --bearer-token-env-var ORQ_API_KEY

Cursor, Gemini CLI, and others:

# Installs skills only: works with Cursor, Gemini CLI, and other compatible assistants
npx skills add orq-ai/assistant-plugins

Use one path only. The Claude Code plugin install includes the MCP server. Running the Claude Code plugin path alongside any other path will install the MCP server twice. Commands (/orq:quickstart, /orq:workspace, and others) and agents are only available with the Claude Code plugin.
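One way to keep to a single path is to select it by name and review the commands before running anything. The sketch below is a dry run that only prints the commands listed in this section. The function name `orq_install_commands` and the labels `claude`, `codex`, and `other` are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper): prints the install commands for
# exactly one path. Nothing is executed; the commands are copied from the
# installation options in this guide.
orq_install_commands() {
  case "$1" in
    claude)
      echo "claude plugin marketplace add orq-ai/claude-plugins"
      echo "claude plugin install orq-skills@orq-claude-plugin"
      ;;
    codex)
      echo "npx skills add orq-ai/assistant-plugins --agent codex -g -y"
      echo "codex mcp add orq-workspace --url https://my.orq.ai/v2/mcp --bearer-token-env-var ORQ_API_KEY"
      ;;
    other)
      echo "npx skills add orq-ai/assistant-plugins"
      ;;
    *)
      echo "usage: orq_install_commands claude|codex|other" >&2
      return 2
      ;;
  esac
}

# Example: show what the Codex path would run.
orq_install_commands codex
```

Because the function takes a single label, it cannot emit the commands for two paths in one call, which mirrors the "one path only" rule.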

Verify

Claude Code: Run the interactive onboarding command to confirm everything is working:

/orq:quickstart

Cursor, Gemini CLI, and others: Describe a task (e.g., “list my Orq.ai agents”) and confirm the skill responds correctly.
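For the Codex path, the two locations named in the installation comments can also be inspected directly. This is a minimal sketch, assuming the installer creates `~/.agents/skills/` as a directory and that the `[mcp_servers.orq-workspace]` header appears verbatim in `~/.codex/config.toml`. The function name `verify_codex_install` and its optional home-directory argument are illustrative.

```shell
#!/bin/sh
# Verification sketch (hypothetical helper): checks the two locations the
# Codex install path is documented to write to.
verify_codex_install() {
  home_dir="${1:-$HOME}"   # override for testing; defaults to $HOME
  rc=0
  if [ ! -d "$home_dir/.agents/skills" ]; then
    echo "missing skills directory: $home_dir/.agents/skills" >&2
    rc=1
  fi
  # -F treats the bracketed section header as a fixed string, not a pattern.
  if ! grep -qF '[mcp_servers.orq-workspace]' "$home_dir/.codex/config.toml" 2>/dev/null; then
    echo "no [mcp_servers.orq-workspace] entry in $home_dir/.codex/config.toml" >&2
    rc=1
  fi
  return "$rc"
}

# Example: check the current user's home directory.
verify_codex_install || echo "Codex install looks incomplete."
```

A passing check only confirms that the files exist; describing a task to the assistant, as above, is still the way to confirm the skills respond.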

Commands

Quick-action slash commands available in Claude Code. Use /orq:<command> to trigger them.

Command	Description	Usage
quickstart	Interactive onboarding: credentials, MCP setup, skills tour	`/orq:quickstart`
workspace	Workspace overview: Agents, Deployments, Prompts, Datasets, Experiments	`/orq:workspace [section]`
traces	Query and summarize Traces with filters	`/orq:traces [--deployment name] [--status error] [--last 24h]`
models	List available AI models by provider	`/orq:models [search-term]`
analytics	Usage Analytics: requests, cost, tokens, errors	`/orq:analytics [--last 24h] [--group-by model]`
manage-skills	Manage Orq.ai Skills (platform entities): list, get, create, update, retire, delete	`/orq:orq-manage-skills [list|get|create|update|retire|delete] [name-or-id]`

Skills

Skills are triggered by describing what is needed. The assistant picks the right skill automatically.

Skill	Description	Docs
build-agent	Design, create, and configure an Orq.ai Agent with tools, instructions, Knowledge Bases, and Memory	SKILL.md
build-evaluator	Create validated LLM-as-a-Judge Evaluators following evaluation best practices	SKILL.md
analyze-trace-failures	Read production Traces, identify what is failing, build failure taxonomies, and categorize issues	SKILL.md
run-experiment	Create and run Orq.ai Experiments: compare configurations with specialized agent, conversation, and RAG evaluation	SKILL.md
generate-synthetic-dataset	Generate and curate evaluation Datasets: structured generation, quick generation from a description, expansion, and dataset maintenance	SKILL.md
invoke-deployment	Invoke Orq.ai Deployments, Agents, and models via the Python SDK or HTTP API, with correct variable substitution, streaming, and identity tracking	SKILL.md
optimize-prompt	Analyze and optimize system Prompts using a structured prompting guidelines framework	SKILL.md
setup-observability	Instrument LLM applications with orq.ai tracing. Covers AI Gateway (zero-code traces) and OpenTelemetry/OpenInference, and guides the setup from framework detection through baseline verification to trace enrichment	SKILL.md
compare-agents	Run cross-framework agent comparisons: compare any combination of orq.ai, LangGraph, CrewAI, or OpenAI Agents SDK agents using evaluatorq	SKILL.md
orq-red-team	Run adversarial attacks against deployed agents using the `evaluatorq` red team CLI. Covers OWASP-ASI (agentic: goal hijacking, tool misuse) and OWASP-LLM (model-level: prompt injection, system prompt leakage)	SKILL.md
evaluatorq	Write and run `evaluatorq` evaluation scripts for a single agent or deployment. Supports custom Python/TypeScript scorers and LLM-as-a-Judge Evaluators	SKILL.md
simulate-agent	Set up and run multi-turn conversational simulations with a `UserSimulatorAgent`, agent under test, and `JudgeAgent`. Define personas and scenarios to stress-test agent behavior before production	SKILL.md
manage-skills	List, inspect, create, update, retire, and delete Orq.ai Skills (platform entities). Handles naming rules, template integration (`{{skill.<name>}}`), reference scanning, and safe deletion	SKILL.md

Example workflows

Instrument an existing app

"Add orq.ai tracing to my app"                 → setup-observability
/orq:traces --last 1h                           # Verify traces are flowing
"Analyze these failures"                        → analyze-trace-failures

Build a new agent

"I need a customer support agent"              → build-agent
"Create test cases for it"                     → generate-synthetic-dataset
"Build an evaluator for response accuracy"     → build-evaluator
"Run an experiment to get a baseline"          → run-experiment

Debug production issues

/orq:traces --status error --last 24h          # Find errors
"Analyze these failures"                       → analyze-trace-failures
"Fix the prompt based on the failure analysis" → optimize-prompt
"Re-run the experiment to verify the fix"      → run-experiment

Improve an existing agent

/orq:analytics --group-by deployment           # Spot high error rates
"Analyze traces for the checkout agent"        → analyze-trace-failures
"Build evaluators for the failure modes"       → build-evaluator
"Generate a dataset covering edge cases"       → generate-synthetic-dataset
"Run an experiment and compare"                → run-experiment
"Optimize the prompt based on results"         → optimize-prompt

Improve an existing prompt

"My prompt isn't performing well, help me improve it" → optimize-prompt
"Create test cases to compare before and after"       → generate-synthetic-dataset
"Build an evaluator for a specific dimension"         → build-evaluator
"Run an experiment: current vs optimized prompt"      → run-experiment
"Refine the prompt based on failure cases"            → optimize-prompt

Red team and simulate a new agent

"I need to simulate user conversations with my agent"   → simulate-agent
"Run adversarial tests against it"                      → orq-red-team
"Build evaluators for the discovered failure modes"     → build-evaluator
"Run an experiment to compare patched vs original"      → run-experiment

Evaluate an agent with custom scorers

"Write an evaluatorq script for my support agent"       → evaluatorq
"Simulate edge-case personas against it"                → simulate-agent
"Red team the agent on prompt injection"                → orq-red-team

Resources

orq-ai/assistant-plugins

Source repository for all skills, commands, and agents
