Fallbacks

📖 This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Automatically retry failed requests with different providers or models.

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  orq: {
    fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
  },
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| fallbacks | Array | Yes | List of fallback models in order of preference |
| model | string | Yes | Model identifier for each fallback |

Trigger Conditions

Fallbacks activate on these errors:

| Error Code | Description | Auto-retry |
| --- | --- | --- |
| 429 | Rate limit exceeded | ✅ Yes |
| 500 | Internal server error | ✅ Yes |
| 502 | Bad gateway | ✅ Yes |
| 503 | Service unavailable | ✅ Yes |
| 504 | Gateway timeout | ✅ Yes |
| 400 | Bad request | ❌ No |
| 401 | Unauthorized | ❌ No |
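Client code does not need to replicate this table, but when deciding whether to surface an error to the user it can help to mirror it locally. A minimal sketch (the function name and set below are illustrative, not part of the SDK):

```typescript
// Status codes the proxy treats as retryable, per the table above.
const RETRYABLE_CODES = new Set([429, 500, 502, 503, 504]);

// Returns true when a failure with this status would trigger a fallback.
function triggersFallback(statusCode: number): boolean {
  return RETRYABLE_CODES.has(statusCode);
}
```

Non-retryable errors such as 400 and 401 should be fixed at the source (request shape, credentials) rather than masked by fallbacks.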

Best Practices

Fallback Chain Design:

  • Use a maximum of three fallback models to keep latency manageable.
  • Order fallbacks by preference and cost.
  • Use models with similar capabilities.
  • Include a fast backup option.

Example Strategies:

// Cost-optimized: cheap → expensive
fallbacks: [{ model: "openai/gpt-3.5-turbo" }, { model: "openai/gpt-4o" }];

// Speed-optimized: fast → comprehensive
fallbacks: [
  { model: "openai/gpt-4o-mini" },
  { model: "anthropic/claude-3-haiku" },
];

// Reliability-optimized: different providers
fallbacks: [
  { model: "openai/gpt-4o" },
  { model: "anthropic/claude-3-sonnet" },
  { model: "azure/gpt-4o" },
];
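These guidelines can also be enforced in code. A hypothetical helper (buildFallbackChain is not part of the SDK) that keeps the preferred order and caps the chain at three models:

```typescript
type Fallback = { model: string };

// Builds a fallback chain from models listed in order of preference,
// dropping anything beyond the recommended cap of three.
function buildFallbackChain(models: string[], cap: number = 3): Fallback[] {
  return models.slice(0, cap).map((model) => ({ model }));
}

// Example: the fourth model is dropped, order is preserved.
const chain = buildFallbackChain([
  "openai/gpt-4o",
  "anthropic/claude-3-sonnet",
  "azure/gpt-4o",
  "openai/gpt-3.5-turbo",
]);
```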

Code examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "GenerateGenerate aa productproduct descriptiondescription forfor aa smartsmart homehome thermostatthermostat"
      }
    ],
    "orq": {
      "fallbacks": [
        {
          "model": "openai/gpt-4o"
        },
        {
          "model": "azure/gpt-4o"
        }
      ]
    }
  }'

Python
from openai import OpenAI
import os

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "GenerateGenerate aa productproduct descriptiondescription forfor aa smartsmart homehome thermostatthermostat"
        }
    ],
    extra_body={
        "orq": {
            "fallbacks": [
                {
                    "model": "openai/gpt-4o"
                },
                {
                    "model": "azure/gpt-4o"
                }
            ]
        }
    }
)

TypeScript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: "Generate a product description for a smart home thermostat",
    },
  ],
  orq: {
    fallbacks: [
      {
        model: "openai/gpt-4o",
      },
      {
        model: "azure/gpt-4o",
      },
    ],
  },
});

Troubleshooting

Fallbacks not triggering
  • Check that the error code matches a trigger condition
  • Verify fallback models are available
  • Ensure API keys are configured for all providers

Slow response times
  • Reduce the number of fallbacks (maximum of three)
  • Set appropriate timeouts
  • Use faster models in the fallback chain

High costs
  • Failed requests may still incur charges
  • Monitor fallback usage rates
  • Optimize primary model selection

Monitoring

Track these metrics for optimal fallback performance:

  • Fallback trigger rate: % of requests using fallbacks
  • Success rate by position: Which fallbacks succeed most
  • Cost impact: Additional charges from fallback usage
  • Latency increase: Time added by fallback attempts
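These metrics are not returned directly by the API; assuming you log per-request outcomes, they can be derived client-side. A sketch with an illustrative log shape (field names are assumptions, not part of the proxy response):

```typescript
// Hypothetical per-request log entry.
interface RequestLog {
  usedFallback: boolean;
  extraLatencyMs: number; // time added by fallback attempts (0 if none)
}

// Fallback trigger rate: fraction of requests that needed a fallback.
function fallbackTriggerRate(logs: RequestLog[]): number {
  if (logs.length === 0) return 0;
  return logs.filter((l) => l.usedFallback).length / logs.length;
}

// Average latency added by fallback attempts across all requests.
function avgExtraLatency(logs: RequestLog[]): number {
  if (logs.length === 0) return 0;
  return logs.reduce((sum, l) => sum + l.extraLatencyMs, 0) / logs.length;
}
```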

Limitations

  • Response consistency: Different models may return varying output styles
  • Parameter support: Not all providers support identical parameters
  • Cost implications: Failed requests may still incur charges from primary provider
  • Latency impact: Sequential attempts add processing time
  • Provider dependencies: Requires API keys for all fallback providers

Advanced Usage

With other features:

{
  model: "openai/gpt-4o-mini",
  orq: {
    fallbacks: [{model: "openai/gpt-4o"}],
    retries: {count: 2, on_codes: [429]},
    timeout: {call_timeout: 15000},
    cache: {type: "exact_match", ttl: 3600}
  }
}

Environment-specific:

// Production: Conservative fallbacks
const prodFallbacks = [{ model: "openai/gpt-4o" }];

// Development: Aggressive fallbacks
const devFallbacks = [
  { model: "openai/gpt-4o" },
  { model: "anthropic/claude-3" },
  { model: "azure/gpt-4o" },
];
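Selecting between these chains at runtime can be done with a small helper, assuming an environment flag like NODE_ENV distinguishes production from development (the function is illustrative, not part of the SDK):

```typescript
type Fallback = { model: string };

// Returns a conservative chain in production and a broader one elsewhere,
// matching the snippets above.
function fallbacksForEnv(env: string): Fallback[] {
  if (env === "production") {
    return [{ model: "openai/gpt-4o" }];
  }
  return [
    { model: "openai/gpt-4o" },
    { model: "anthropic/claude-3" },
    { model: "azure/gpt-4o" },
  ];
}
```

The result can be passed straight into the request, e.g. orq: { fallbacks: fallbacksForEnv(process.env.NODE_ENV ?? "development") }.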