
Overview

Orq Skills are pre-built, reusable workflows from the orq-ai/assistant-plugins repository. They come in two forms:
  • Skills: multi-step workflows that require reasoning, such as building an agent, running an experiment, or analyzing trace failures.
  • Commands: quick slash-command actions for immediate results, such as listing traces or showing analytics.
Both are built on the Agent Skills standard format, which means they work with any compatible assistant: Claude Code, Cursor, Gemini CLI, and others. Each skill encodes best practices from prompt engineering, agent design, evaluation methodology, and experimentation into a repeatable, triggered workflow.

Prerequisites

Installation

Choose the option that matches the assistant used:
# Installs skills, commands, agents, and the MCP server in one step
claude plugin marketplace add orq-ai/assistant-plugins
claude plugin install orq-skills@orq-claude-plugin
Use one path only. The Claude Code plugin install includes the MCP server. Running the Claude Code plugin path alongside any other path will install the MCP server twice. Commands (/orq:quickstart, /orq:workspace, and others) and agents are only available with the Claude Code plugin.

Verify

Claude Code: Run the interactive onboarding command to confirm everything is working:
/orq:quickstart
Cursor, Gemini CLI, and others: Describe a task (e.g., “list my Orq.ai agents”) and confirm the skill responds correctly.

Commands

Quick-action slash commands available in Claude Code. Use /orq:<command> to trigger them.
| Command | Description | Usage |
| --- | --- | --- |
| quickstart | Interactive onboarding: credentials, MCP setup, skills tour | /orq:quickstart |
| workspace | Workspace overview: Agents, Deployments, Prompts, Datasets, Experiments | /orq:workspace [section] |
| traces | Query and summarize Traces with filters | /orq:traces [--deployment name] [--status error] [--last 24h] |
| models | List available AI models by provider | /orq:models [search-term] |
| analytics | Usage Analytics: requests, cost, tokens, errors | /orq:analytics [--last 24h] [--group-by model] |

Skills

Skills are triggered by describing what is needed. The assistant picks the right skill automatically.
| Skill | Description | Docs |
| --- | --- | --- |
| build-agent | Design, create, and configure an Orq.ai Agent with tools, instructions, Knowledge Bases, and Memory | SKILL.md |
| build-evaluator | Create validated LLM-as-a-Judge Evaluators following evaluation best practices | SKILL.md |
| analyze-trace-failures | Read production Traces, identify what is failing, build failure taxonomies, and categorize issues | SKILL.md |
| run-experiment | Create and run Orq.ai Experiments: compare configurations with specialized agent, conversation, and RAG evaluation | SKILL.md |
| generate-synthetic-dataset | Generate and curate evaluation Datasets: structured generation, quick from description, expansion, and dataset maintenance | SKILL.md |
| optimize-prompt | Analyze and optimize system Prompts using a structured prompting guidelines framework | SKILL.md |
| setup-observability | Instrument LLM applications with orq.ai tracing. Covers AI Gateway (zero-code traces) and OpenTelemetry/OpenInference. Guides from framework detection through baseline verification to trace enrichment | SKILL.md |
| compare-agents | Run cross-framework agent comparisons: compare any combination of orq.ai, LangGraph, CrewAI, or OpenAI Agents SDK agents using evaluatorq | SKILL.md |
| orq-red-team | Run adversarial attacks against deployed agents using the evaluatorq red team CLI. Covers OWASP LLM Top 10 categories: prompt injection, goal hijacking, tool misuse, system prompt leakage | SKILL.md |
| evaluatorq | Write and run evaluatorq evaluation scripts for a single agent or deployment. Supports custom Python/TypeScript scorers and LLM-as-a-Judge Evaluators | SKILL.md |
| simulate-agent | Set up and run multi-turn conversational simulations with a UserSimulatorAgent, agent under test, and JudgeAgent. Define personas and scenarios to stress-test agent behavior before production | SKILL.md |
| manage-skills | List, inspect, create, update, and delete Orq.ai Skills (platform entities). Handles naming rules, template integration ({{skill.<name>}}), reference scanning, and safe deletion | SKILL.md |
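
The reference scanning that manage-skills performs before a safe deletion can be pictured with a small local sketch. This is not the skill's actual implementation: the function names, the sample prompts, the allowed characters in a skill name, and the tolerance for whitespace inside the braces are all assumptions made for illustration. Only the {{skill.<name>}} template form comes from the table above.

```python
import re

# Assumption: skill names use letters, digits, underscores, and hyphens.
# The real naming rules are handled by manage-skills and may differ.
SKILL_REF = re.compile(r"\{\{\s*skill\.([A-Za-z0-9_-]+)\s*\}\}")


def find_skill_references(prompts: dict[str, str]) -> dict[str, list[str]]:
    """Map each referenced skill name to the prompt keys that mention it."""
    references: dict[str, list[str]] = {}
    for prompt_key, text in prompts.items():
        # De-duplicate so a prompt is listed once per skill it references.
        for name in sorted(set(SKILL_REF.findall(text))):
            references.setdefault(name, []).append(prompt_key)
    return references


def is_safe_to_delete(skill_name: str, prompts: dict[str, str]) -> bool:
    """Treat a skill as safe to delete only when no prompt references it."""
    return skill_name not in find_skill_references(prompts)


# Hypothetical prompt texts, for illustration only.
example_prompts = {
    "support-agent": "Follow {{skill.tone-guide}} and {{skill.refund_policy}}.",
    "sales-agent": "Follow {{ skill.tone-guide }} when replying.",
}
```

With these sample prompts, tone-guide is referenced by both agents, so a deletion check on it would fail, while a skill name that appears in no prompt would pass.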

Example workflows

Instrument an existing app

"Add orq.ai tracing to my app"                 → setup-observability
/orq:traces --last 1h                           # Verify traces are flowing
"Analyze these failures"                        → analyze-trace-failures

Build a new agent

"I need a customer support agent"              → build-agent
"Create test cases for it"                     → generate-synthetic-dataset
"Build an evaluator for response accuracy"     → build-evaluator
"Run an experiment to get a baseline"          → run-experiment

Debug production issues

/orq:traces --status error --last 24h          # Find errors
"Analyze these failures"                       → analyze-trace-failures
"Fix the prompt based on the failure analysis" → optimize-prompt
"Re-run the experiment to verify the fix"      → run-experiment

Improve an existing agent

/orq:analytics --group-by deployment           # Spot high error rates
"Analyze traces for the checkout agent"        → analyze-trace-failures
"Build evaluators for the failure modes"       → build-evaluator
"Generate a dataset covering edge cases"       → generate-synthetic-dataset
"Run an experiment and compare"                → run-experiment
"Optimize the prompt based on results"         → optimize-prompt

Improve an existing prompt

"My prompt isn't performing well, help me improve it" → optimize-prompt
"Create test cases to compare before and after"       → generate-synthetic-dataset
"Build an evaluator for a specific dimension"         → build-evaluator
"Run an experiment: current vs optimized prompt"      → run-experiment
"Refine the prompt based on failure cases"            → optimize-prompt

Red team and simulate a new agent

"I need to simulate user conversations with my agent"   → simulate-agent
"Run adversarial tests against it"                      → orq-red-team
"Build evaluators for the discovered failure modes"     → build-evaluator
"Run an experiment to compare patched vs original"      → run-experiment
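
The three roles that simulate-agent coordinates (a simulated user, the agent under test, and a judge) can be sketched as a plain turn-taking loop. The code below is a conceptual illustration only, not the simulate-agent or evaluatorq API: `run_simulation`, the scripted stand-in functions, and the `max_turns` parameter are hypothetical names, and a real run would use LLM-backed agents rather than scripted replies.

```python
from typing import Callable

Turn = tuple[str, str]  # (role, message)


def run_simulation(
    simulate_user: Callable[[list[Turn]], str],
    agent_under_test: Callable[[list[Turn]], str],
    judge: Callable[[list[Turn]], bool],
    max_turns: int = 5,
) -> tuple[bool, list[Turn]]:
    """Alternate user and agent messages until the judge passes or turns run out."""
    transcript: list[Turn] = []
    for _ in range(max_turns):
        transcript.append(("user", simulate_user(transcript)))
        transcript.append(("agent", agent_under_test(transcript)))
        if judge(transcript):
            return True, transcript
    return False, transcript


# Scripted stand-ins (hypothetical); they only mimic a refund scenario.
def scripted_user(transcript: list[Turn]) -> str:
    return "I want a refund" if not transcript else "Order 123"


def scripted_agent(transcript: list[Turn]) -> str:
    last_user_message = transcript[-1][1]
    if "123" in last_user_message:
        return "Refund issued for order 123"
    return "What is your order number?"


def scripted_judge(transcript: list[Turn]) -> bool:
    # Passes once the agent's latest reply confirms the refund.
    return "Refund issued" in transcript[-1][1]


passed, transcript = run_simulation(scripted_user, scripted_agent, scripted_judge)
```

Keeping the three roles as separate callables mirrors the persona and scenario split described for the skill: swapping the simulated user changes the persona without touching the agent or the judge.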

Evaluate an agent with custom scorers

"Write an evaluatorq script for my support agent"       → evaluatorq
"Simulate edge-case personas against it"                → simulate-agent
"Red team the agent on prompt injection"                → orq-red-team
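
For orientation, a custom scorer is at its core a function that takes an agent's output and returns a score. The sketch below shows that idea in plain Python and deliberately does not use the evaluatorq API, whose scorer interface may look different; `ScoreResult` and `keyword_coverage_scorer` are hypothetical names, and the evaluatorq skill is what generates the actual script wiring.

```python
from dataclasses import dataclass


@dataclass
class ScoreResult:
    value: float       # 0.0 (worst) to 1.0 (best)
    explanation: str   # short human-readable reason for the score


def keyword_coverage_scorer(output: str, required_keywords: list[str]) -> ScoreResult:
    """Score the fraction of required keywords found in the output (case-insensitive)."""
    if not required_keywords:
        return ScoreResult(1.0, "no keywords required")
    lowered = output.lower()
    missing = [kw for kw in required_keywords if kw.lower() not in lowered]
    value = 1 - len(missing) / len(required_keywords)
    if missing:
        return ScoreResult(value, f"missing: {', '.join(missing)}")
    return ScoreResult(value, "all keywords present")
```

A deterministic scorer like this suits checks with a clear right answer, while fuzzier dimensions such as tone or helpfulness are the kind of judgment the LLM-as-a-Judge Evaluators are meant for.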

Resources

orq-ai/assistant-plugins

Source repository for all skills, commands, and agents