Not every request needs the most expensive model. The Auto Router automatically routes each request to the optimal model based on the chosen profile, to reduce costs without sacrificing quality.
Configure a Strong Model and an Economical Model, then choose how aggressively to route between them. The Auto Router evaluates each incoming request and sends it to whichever model best matches the task complexity and the optimization goal.
Two directions are supported:
- Optimize for cost: set a high-quality model as the baseline. The Auto Router routes simpler requests to the cheaper model and escalates only when complexity warrants it. This saves on requests that don’t need the most powerful model.
- Optimize for quality: start with a cost-efficient model and let the Auto Router escalate to the more capable model only when the task demands it. Get the best output for every request without overspending.
Use Cases
| Scenario | Setup | Outcome |
|---|---|---|
| Customer support chatbot | Strong: Claude Opus / Economical: Gemini Flash | Simple FAQs and acknowledgements go to the fast model; nuanced complaints or policy questions escalate automatically |
| Document summarization pipeline | Strong: GPT-4o / Economical: GPT-4o Mini | Short documents with clear structure route to the mini model; long, dense, or ambiguous documents go to the full model |
| Code assistant | Strong: Claude Sonnet / Economical: Gemini Flash | Autocomplete and boilerplate generation stay cheap; debugging, architecture questions, and multi-file reasoning escalate |
| Content generation at scale | Strong: GPT-5.1 / Economical: GPT-4o Mini | High-volume social copy and templated content use the cheaper model; long-form articles and brand-sensitive copy use the stronger one |
| Internal Q&A over documents | Strong: Claude Opus / Economical: Claude Haiku | Retrieval-augmented lookups with clear answers route to Haiku; open-ended synthesis or conflicting sources escalate to Opus |
How It Works
The Auto Router sits between the application and two models: a Strong Model for complex requests and an Economical Model for simpler ones. When a request comes in, it analyzes the task complexity and routes it to the appropriate model based on the configured profile.
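The routing decision described above can be pictured with a toy sketch. This is not the Auto Router's actual algorithm: the complexity score (a number from 0.0 to 1.0), the `RouterSketch` class, the lowercase profile names, and the threshold values are all hypothetical, chosen only to show how a profile could shift the split between the two models.

```python
from dataclasses import dataclass

# Hypothetical thresholds: a request whose complexity score is at or above the
# threshold goes to the Strong Model. The numbers are illustrative only.
PROFILE_THRESHOLDS = {"quality": 0.3, "balanced": 0.5, "cost": 0.7}


@dataclass(frozen=True)
class RouterSketch:
    strong_model: str
    economical_model: str
    profile: str = "balanced"

    def route(self, complexity: float) -> str:
        """Return the model name for a request with the given complexity score."""
        if not 0.0 <= complexity <= 1.0:
            raise ValueError("complexity must be between 0.0 and 1.0")
        threshold = PROFILE_THRESHOLDS[self.profile]
        if complexity >= threshold:
            return self.strong_model
        return self.economical_model


# The same request (score 0.4) lands on different models under different profiles.
for profile in ("quality", "balanced", "cost"):
    router = RouterSketch("strong-model", "economical-model", profile=profile)
    print(profile, "->", router.route(0.4))
```

In this sketch a higher threshold means more requests stay on the Economical Model, which mirrors the idea of choosing how aggressively to route between the two.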
Set Up the Auto Router
- Navigate to the Models page in AI Gateway.
- Click Add Model.
- Select Auto Router from the dropdown.
- Fill in the configuration:
  - Model ID: a unique identifier for this router (lowercase letters, numbers, and hyphens only).
  - Strong Model: the more capable model, used for complex requests.
  - Economical Model: the cheaper model, used for simpler requests.
  - Profile: choose how aggressively to route between the two models.
- Click Add model.
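Before filling in the form, it can help to check the values locally. The sketch below mirrors the four configuration fields in a small Python class. The `AutoRouterConfig` name, the lowercase profile strings, and the placeholder model names are assumptions made for illustration; only the Model ID rule (lowercase letters, numbers, and hyphens) comes from the steps above.

```python
import re
from dataclasses import dataclass

# Model ID rule from the setup steps: lowercase letters, numbers, and hyphens only.
MODEL_ID_PATTERN = re.compile(r"[a-z0-9-]+")
# Assumed lowercase spellings of the three profiles.
PROFILES = ("quality", "balanced", "cost")


@dataclass(frozen=True)
class AutoRouterConfig:
    model_id: str
    strong_model: str
    economical_model: str
    profile: str

    def __post_init__(self) -> None:
        if not MODEL_ID_PATTERN.fullmatch(self.model_id):
            raise ValueError(
                "model_id may contain only lowercase letters, numbers, and hyphens"
            )
        if not self.strong_model or not self.economical_model:
            raise ValueError("both a strong and an economical model are required")
        if self.profile not in PROFILES:
            raise ValueError(f"profile must be one of {PROFILES}")


# Placeholder model names; substitute the models available in your workspace.
config = AutoRouterConfig(
    model_id="my-auto-router",
    strong_model="strong-model",
    economical_model="economical-model",
    profile="balanced",
)
print(config)
```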
Profiles
| Profile | Behavior |
|---|---|
| Quality | Prioritizes the Strong Model for more requests |
| Balanced | Balances cost and quality across simple and complex requests |
| Cost | Prefers the Economical Model more aggressively to save money |
Recommended model pairs
These pairs combine high routing accuracy with a large cost ratio between the two models (over 10x), which makes them effective starting points.
| Strong Model | Economical Model |
|---|---|
| Google Gemini 2.5 Pro | Google Gemini 2.5 Flash |
| OpenAI GPT-5.1 | OpenAI GPT-4o Mini |
| Anthropic Claude Opus 4 | Google Gemini 2.5 Flash |
| OpenAI GPT-4o | OpenAI GPT-4o Mini |
Models from the same family or tier work well together (e.g. Claude Sonnet and Gemini Flash). Very large capability gaps reduce the effectiveness of routing.
Models from different providers can be combined in a single Auto Router configuration.
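To see why a cost ratio above 10x matters, the blended cost per request can be estimated as a weighted average of the two model prices. The sketch below uses made-up numbers: a Strong Model that costs 10 units per request, an Economical Model that costs 1 unit, and an assumed 60% of traffic routed to the cheaper model. Real prices and routing shares will differ.

```python
def blended_cost(strong_cost: float, economical_cost: float, economical_share: float) -> float:
    """Average cost per request when a share of traffic goes to the cheaper model."""
    if not 0.0 <= economical_share <= 1.0:
        raise ValueError("economical_share must be between 0.0 and 1.0")
    return (1.0 - economical_share) * strong_cost + economical_share * economical_cost


# Illustrative numbers only: a 10x price gap and 60% of requests routed cheaply.
strong, economical, share = 10.0, 1.0, 0.6
cost = blended_cost(strong, economical, share)
savings = 1.0 - cost / strong

print(f"blended cost per request: {cost:.2f}")          # 4.60
print(f"savings vs. Strong Model only: {savings:.0%}")  # 54%
```

With a smaller price gap the same routing share saves less, which is why pairs with a wide cost ratio make good starting points.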
Use the Auto Router
Once created, the Auto Router appears in AI Gateway and can be referenced anywhere a model is accepted via the API or SDKs.
Reference in code
When using an Auto Router through the SDKs, API, or Supported Libraries, reference it by the string <workspacename>@orq/<model-id>.
Example: acme@orq/my-auto-router
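A small helper can build this reference string and catch malformed IDs early. The function name `auto_router_reference` is hypothetical; the output format and the Model ID rule (lowercase letters, numbers, and hyphens) are taken from this page. Pass the resulting string wherever your SDK or API call expects a model name.

```python
import re


def auto_router_reference(workspace: str, model_id: str) -> str:
    """Build the <workspacename>@orq/<model-id> string for an Auto Router."""
    if not workspace:
        raise ValueError("workspace name must not be empty")
    # Model IDs allow lowercase letters, numbers, and hyphens only.
    if not re.fullmatch(r"[a-z0-9-]+", model_id):
        raise ValueError(
            "model_id may contain only lowercase letters, numbers, and hyphens"
        )
    return f"{workspace}@orq/{model_id}"


print(auto_router_reference("acme", "my-auto-router"))  # acme@orq/my-auto-router
```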