Fallbacks
This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.
Quick Start
Automatically retry failed requests with different providers or models.
const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  orq: {
    fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
  },
});
Configuration
Parameter | Type | Required | Description |
---|---|---|---|
fallbacks | Array | Yes | List of fallback models in order of preference |
model | string | Yes | Model identifier for each fallback |
Trigger Conditions
Fallbacks activate on the following error codes; errors that do not trigger a fallback must be handled in application code (see the sketch after the table):
Error Code | Description | Auto-retry |
---|---|---|
429 | Rate limit exceeded | ✅ Yes |
500 | Internal server error | ✅ Yes |
502 | Bad gateway | ✅ Yes |
503 | Service unavailable | ✅ Yes |
504 | Gateway timeout | ✅ Yes |
400 | Bad request | ❌ No |
401 | Unauthorized | ❌ No |
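Because 400 and 401 responses never trigger a fallback, it helps to handle them explicitly. The snippet below is a minimal TypeScript sketch, assuming the proxy-configured OpenAI SDK client from the code examples further down; `OpenAI.APIError` and its `status` field come from the OpenAI Node SDK.

// Minimal sketch: 400/401 never trigger a fallback, so catch them in application code.
try {
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Generate a product description" }],
    orq: { fallbacks: [{ model: "openai/gpt-4o" }] },
  });
  console.log(response.choices[0].message.content);
} catch (error) {
  if (error instanceof OpenAI.APIError && (error.status === 400 || error.status === 401)) {
    // Client-side problem: fix the request payload or the API key rather than retrying.
    console.error(`Non-retryable error ${error.status}: ${error.message}`);
  } else {
    throw error; // 5xx/429 errors already went through the fallback chain before reaching here
  }
}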
Best Practices
Fallback Chain Design:
- Use a maximum of three fallback models to keep latency manageable.
- Order fallbacks by preference/cost.
- Use models with similar capabilities.
- Include a fast backup option.
Example Strategies:
// Cost-optimized: cheap → expensive
fallbacks: [{ model: "openai/gpt-3.5-turbo" }, { model: "openai/gpt-4o" }];
// Speed-optimized: fast → comprehensive
fallbacks: [
{ model: "openai/gpt-4o-mini" },
{ model: "anthropic/claude-3-haiku" },
];
// Reliability-optimized: different providers
fallbacks: [
{ model: "openai/gpt-4o" },
{ model: "anthropic/claude-3-sonnet" },
{ model: "azure/gpt-4o" },
];
Code examples
cURL
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Generate a product description for a smart home thermostat"
      }
    ],
    "orq": {
      "fallbacks": [
        {
          "model": "openai/gpt-4o"
        },
        {
          "model": "azure/gpt-4o"
        }
      ]
    }
  }'
Python
from openai import OpenAI
import os

openai = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy"
)

response = openai.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Generate a product description for a smart home thermostat"
        }
    ],
    extra_body={
        "orq": {
            "fallbacks": [
                {
                    "model": "openai/gpt-4o"
                },
                {
                    "model": "azure/gpt-4o"
                }
            ]
        }
    }
)
TypeScript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: "Generate a product description for a smart home thermostat",
    },
  ],
  orq: {
    fallbacks: [
      {
        model: "openai/gpt-4o",
      },
      {
        model: "azure/gpt-4o",
      },
    ],
  },
});
Troubleshooting
Fallbacks not triggering
- Check error codes match trigger conditions
- Verify fallback models are available
- Ensure API keys are configured for all providers
Slow response times
- Reduce number of fallbacks (max 3)
- Set appropriate timeouts (see the sketch after this list)
- Use faster models in fallback chain
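For example, a per-attempt timeout can be combined with a short chain so a slow primary model is cut off quickly. This sketch reuses the `timeout.call_timeout` option shown under Advanced Usage below; treat the exact field name and semantics as configuration to verify for your setup.

// Sketch: cap each attempt at 15 seconds and fall back to a faster model.
const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Generate a product description" }],
  orq: {
    fallbacks: [{ model: "openai/gpt-4o-mini" }],
    timeout: { call_timeout: 15000 },
  },
});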
High costs
- Failed requests may still incur charges
- Monitor fallback usage rates
- Optimize primary model selection
Monitoring
Track these metrics for optimal fallback performance (a client-side tracking sketch follows the list):
- Fallback trigger rate: % of requests using fallbacks
- Success rate by position: Which fallbacks succeed most
- Cost impact: Additional charges from fallback usage
- Latency increase: Time added by fallback attempts
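A minimal client-side sketch for the first two metrics is shown below. It assumes the TypeScript client from the code examples and that the response's `model` field reflects the model that actually served the request; verify that assumption against your proxy's responses before relying on it.

// Illustrative tracking of fallback trigger rate and latency (assumes
// response.model identifies the model that actually answered).
const primaryModel = "openai/gpt-4o-mini";
let totalRequests = 0;
let fallbackRequests = 0;

async function trackedCompletion(content: string) {
  const start = Date.now();
  const response = await openai.chat.completions.create({
    model: primaryModel,
    messages: [{ role: "user", content }],
    orq: { fallbacks: [{ model: "openai/gpt-4o" }] },
  });
  totalRequests += 1;
  // Provider model strings may omit the "openai/" prefix or append a version
  // suffix, so match on the base model name rather than the full identifier.
  if (!response.model.includes("gpt-4o-mini")) {
    fallbackRequests += 1;
  }
  const latencyMs = Date.now() - start;
  console.log(`latency=${latencyMs}ms fallbackRate=${(fallbackRequests / totalRequests).toFixed(2)}`);
  return response;
}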
Limitations
- Response consistency: Different models may return varying output styles
- Parameter support: Not all providers support identical parameters
- Cost implications: Failed requests may still incur charges from primary provider
- Latency impact: Sequential attempts add processing time
- Provider dependencies: Requires API keys for all fallback providers
Advanced Usage
Fallbacks can be combined with other proxy features such as retries, timeouts, and caching:
{
  model: "openai/gpt-4o-mini",
  orq: {
    fallbacks: [{ model: "openai/gpt-4o" }],
    retries: { count: 2, on_codes: [429] },
    timeout: { call_timeout: 15000 },
    cache: { type: "exact_match", ttl: 3600 }
  }
}
Environment-specific fallback chains:
// Production: Conservative fallbacks
const prodFallbacks = [{ model: "openai/gpt-4o" }];
// Development: Aggressive fallbacks
const devFallbacks = [
{ model: "openai/gpt-4o" },
{ model: "anthropic/claude-3" },
{ model: "azure/gpt-4o" },
];
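One way to switch between these chains at runtime is to select the list from the environment; the `NODE_ENV` check here is just an illustrative convention:

// Pick the fallback chain for the current environment (illustrative).
const fallbacks =
  process.env.NODE_ENV === "production" ? prodFallbacks : devFallbacks;

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  orq: { fallbacks },
});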