Using AI across your organization usually means managing separate subscriptions to ChatGPT, Claude, and Gemini, each with their own credentials, billing, and access controls. AI Chat gives your entire team a single interface to access every model provider connected to your workspace.
What’s Available:

- All providers in one interface - access OpenAI, Anthropic, Google, Mistral, and any other connected provider from a single chat. No need for separate accounts or API keys per provider.
- Switch providers mid-conversation - compare model outputs by switching between providers during the same conversation to find the best fit for the task.
- Custom models - connect your own fine-tuned or self-hosted models and use them alongside commercial providers in the same interface.
- Expose agents to your organization - deploy custom agents built in Orq directly into AI Chat, so your team can interact with purpose-built AI assistants without any setup.
- Image and PDF support - upload and chat with images, PDFs, and other file types directly in conversation.
- Workspace-scoped configuration - admins control which models and agents are available per workspace, keeping sensitive or expensive models restricted to the right teams.
Learn more in the AI Chat Documentation.
Keeping downstream systems in sync with changes in your AI stack used to require polling the API. Webhooks deliver HTTP callbacks when key events happen in your workspace, so you can trigger workflows, sync data, or alert your team in real time.
What’s Available:

- Agent events - get notified when agents are created, updated, published, or deleted. Trigger CI/CD pipelines, update internal catalogs, or send Slack notifications automatically.
- Deployment events - receive callbacks when deployments change, keeping your monitoring systems and internal dashboards in sync. You can select specific deployments to send webhooks for, giving you fine-grained control over which events reach your endpoints.
- Prompt events - track prompt lifecycle changes to maintain version-controlled prompt registries or trigger evaluation runs when a prompt is updated.
- Signature validation - every webhook delivery includes a cryptographic signature so you can verify the payload originated from Orq and hasn’t been tampered with.
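Verifying a delivery is typically a few lines on the receiving side. The sketch below assumes an HMAC-SHA256 hex digest of the raw request body, which is the most common webhook signing convention; Orq's exact header name and signing scheme are specified in the webhook setup guide.

```python
import hashlib
import hmac

def verify_webhook(raw_body: bytes, received_sig: str, secret: str) -> bool:
    """Return True if the signature matches our own HMAC of the body.

    Assumes an HMAC-SHA256 hex digest; check the webhook setup guide
    for the exact header name and scheme used in deliveries.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison prevents timing attacks on the signature.
    return hmac.compare_digest(expected, received_sig)
```

Always compute the HMAC over the raw, unparsed body: re-serializing parsed JSON can reorder keys and change whitespace, producing a different digest.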
Configure webhooks in the webhook setup guide.
Not every request needs your most expensive model. The Auto Router automatically routes each request to the optimal model based on your optimization strategy, so you can reduce costs without sacrificing the quality that matters.
How It Works:

- Optimize for cost - set a high-quality model as your baseline, and the Auto Router will route simpler requests to cheaper models while escalating complex ones. You save on the requests that don’t need your most powerful model.
- Optimize for quality - start with a cost-efficient model and let the Auto Router escalate to more capable models only when the task demands it. Get the best output for every request without overspending.
- Configure your model pool - pick which models the router can choose from, mixing expensive and affordable options. The router learns which requests need which level of capability.
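Conceptually, cost-optimized routing behaves like the toy sketch below. Everything here is illustrative: the model identifiers, the complexity heuristic, and the threshold are assumptions, and the real Auto Router's scoring policy runs server-side rather than in client code.

```python
# Illustrative only: route simple requests to a cheap model and escalate
# longer or multi-step requests to a stronger one. Model names are
# hypothetical pool entries, not guaranteed identifiers.
CHEAP_MODEL = "gpt-5.4-nano"
STRONG_MODEL = "gpt-5.4-pro"

def pick_model(prompt: str) -> str:
    # Naive complexity proxy: long prompts or explicit multi-step asks
    # escalate; everything else stays on the cheap model.
    complex_request = len(prompt.split()) > 200 or "step by step" in prompt.lower()
    return STRONG_MODEL if complex_request else CHEAP_MODEL
```

The real router replaces this heuristic with learned signals over your configured model pool, but the cost mechanics are the same: most traffic lands on the cheap tier, and only requests that need it pay for the strong one.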
Set up the Auto Router in the AI Router.
API keys in the AI Router previously supported only credit-based limits. Now you can set multiple constraints on a single key to prevent runaway costs and keep consumption predictable.
What’s Available:

- Requests per minute - cap the number of API calls a key can make within a time window to prevent traffic spikes from burning through your budget.
- Token consumption limits - set maximum token usage per cycle to control how much compute each key can consume.
- Cost limits - define maximum spend per key with automatic reset on your configured billing cycle.
- Flexible reset cycles - all limits reset automatically based on your configured time period: hourly, daily, weekly, or monthly.
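The reset-cycle mechanics can be sketched as a simple windowed counter. This is a conceptual model only: the class and field names are illustrative, and actual enforcement happens inside the AI Router, not in your client.

```python
import time

class KeyLimiter:
    """Toy sketch of a per-key cost cap with an automatic reset cycle."""

    def __init__(self, max_cost: float, cycle_seconds: int = 3600):
        self.max_cost = max_cost          # spend ceiling per cycle
        self.cycle_seconds = cycle_seconds  # e.g. 3600 = hourly reset
        self.spent = 0.0
        self.cycle_start = time.time()

    def allow(self, request_cost: float) -> bool:
        now = time.time()
        if now - self.cycle_start >= self.cycle_seconds:
            # Cycle elapsed: the counter resets automatically.
            self.spent = 0.0
            self.cycle_start = now
        if self.spent + request_cost > self.max_cost:
            return False  # over budget for this cycle
        self.spent += request_cost
        return True
```

Request-per-minute and token limits follow the same pattern with a different counter; a single key can carry all three constraints at once.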
Configure limits when creating an API key in the AI Router.
Sending one monolithic prompt for every request means paying for context that isn’t always relevant. Combine Jinja2 and Mustache templating with prompt snippets to build modular prompts that only include the sections relevant to each request.
What’s Available:

- Modular prompt design - break complex prompts into reusable snippets and compose them with Jinja2 conditionals or Mustache sections. Only include the instructions, examples, or context blocks that match the current use case.
- Token cost optimization - use conditional blocks in Jinja2 or Mustache to include only what’s needed per request, cutting token costs on prompts with multiple use cases.
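A minimal Jinja2 example of the conditional pattern, using the third-party `jinja2` package. The template text, variable names, and snippet contents are all hypothetical; the point is that the disclaimer block only costs tokens when the request actually needs it.

```python
from jinja2 import Template  # third-party: pip install jinja2

# Hypothetical composed prompt: a base instruction, an optional legal
# block, and a loop over few-shot examples. Routine requests render
# neither extra section, keeping the prompt short.
tmpl = Template(
    "You are a support assistant.\n"
    "{% if include_legal %}Append the standard legal disclaimer.\n{% endif %}"
    "{% for ex in examples %}Example: {{ ex }}\n{% endfor %}"
)

short = tmpl.render(include_legal=False, examples=[])
full = tmpl.render(include_legal=True, examples=["Refund request"])
```

Mustache achieves the same with sections (`{{#include_legal}}…{{/include_legal}}`); pick whichever syntax your team already uses.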
Learn more in the Prompt Snippets Documentation and Prompt Templating Guide.
evaluatorq now includes a red teaming module that automatically probes your deployments and agents for exploitable weaknesses based on the OWASP Top 10 for LLM Applications, and produces a scored report.
What’s Available:

- Automated attack generation - generate adversarial prompts at runtime based on your target’s system prompt using dynamic, static, or hybrid modes.
- OWASP category targeting - scope runs to specific risk areas including prompt injection, sensitive information disclosure, system prompt leakage, goal hijacking, tool misuse, and more.
- Agent-aware testing - red team orq.ai Agents directly. The pipeline auto-discovers tools and memory stores and generates tailored attacks including tool-misuse and memory-poisoning vectors.
- CI integration - fail builds when vulnerabilities are detected using exit-code gating, so security regressions never reach production.
- Results in orq.ai - attack results are automatically pushed to your workspace as an Experiment run with per-attack category, vulnerability, prompt, response, and verdict.
Learn more in the Red Teaming Tutorial.
Three new variants of OpenAI’s GPT-5.4 family are now available, expanding beyond the base model announced in Release 4.5.
New Models:

- GPT-5.4 Pro - highest capability variant for complex reasoning, multimodal understanding, and extended context tasks.
- GPT-5.4 Mini - balanced performance and cost for general-purpose production workloads.
- GPT-5.4 Nano - cost-efficient variant optimized for high-throughput applications with reduced latency.
Explore GPT-5.4 models in the AI Router or via the AI Router API.
xAI’s latest Grok models are now available with reasoning, non-reasoning, and multi-agent variants.
New Models:

- Grok 4.20 Beta (Reasoning) - analytical reasoning mode for problem-solving and complex tasks.
- Grok 4.20 Beta (Non-Reasoning) - standard mode for general conversation and content generation.
- Grok 4.20 Multi-Agent Beta - variant designed for multi-agent coordination workflows.
Explore Grok models in the Model Garden or via the AI Router.
Expanded model availability on AWS Bedrock.
New Models:

- Meta Llama 3.1 405B Instruct - Meta’s largest open-source model now available through Bedrock.
- Mistral Magistral Small, Mistral Large 3, Pixtral Large - Mistral’s latest models for text and vision tasks.
- NVIDIA Nemotron Nano 3 - compact model optimized for efficient inference.
Explore all available models in the Model Garden or via the AI Router.