Fallbacks

📖 This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Automatically retry failed requests with different providers or models.

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  orq: {
    fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
  },
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| fallbacks | Array | Yes | List of fallback models in order of preference |
| model | string | Yes | Model identifier for each fallback |

Trigger Conditions

Fallbacks activate on these errors:

| Error Code | Description | Auto-retry |
| --- | --- | --- |
| 429 | Rate limit exceeded | ✅ Yes |
| 500 | Internal server error | ✅ Yes |
| 502 | Bad gateway | ✅ Yes |
| 503 | Service unavailable | ✅ Yes |
| 504 | Gateway timeout | ✅ Yes |
| 400 | Bad request | ❌ No |
| 401 | Unauthorized | ❌ No |
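Client code does not need to replicate this table, but when deciding whether to surface an error to the user it can help to mirror it locally. A minimal sketch (the function name and set below are illustrative, not part of the SDK):

```typescript
// Status codes the proxy treats as retryable, per the table above.
const RETRYABLE_CODES = new Set([429, 500, 502, 503, 504]);

// Returns true when a failure with this status would trigger a fallback.
function triggersFallback(statusCode: number): boolean {
  return RETRYABLE_CODES.has(statusCode);
}
```

Non-retryable errors such as 400 and 401 should be fixed at the source (request shape, credentials) rather than masked by fallbacks.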

Best Practices

Fallback Chain Design:

  • Use a maximum of three fallback models to keep latency manageable.
  • Order fallbacks by preference and cost.
  • Use models with similar capabilities.
  • Include a fast backup option.

Example Strategies:

// Cost-optimized: cheap → expensive
fallbacks: [{ model: "openai/gpt-3.5-turbo" }, { model: "openai/gpt-4o" }];

// Speed-optimized: fast → comprehensive
fallbacks: [
  { model: "openai/gpt-4o-mini" },
  { model: "anthropic/claude-3-haiku" },
];

// Reliability-optimized: different providers
fallbacks: [
  { model: "openai/gpt-4o" },
  { model: "anthropic/claude-3-sonnet" },
  { model: "azure/gpt-4o" },
];
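These guidelines can also be enforced in code. A hypothetical helper (buildFallbackChain is not part of the SDK) that keeps the preferred order and caps the chain at three models:

```typescript
type Fallback = { model: string };

// Builds a fallback chain from models listed in order of preference,
// dropping anything beyond the recommended cap of three.
function buildFallbackChain(models: string[], cap: number = 3): Fallback[] {
  return models.slice(0, cap).map((model) => ({ model }));
}

// Example: the fourth model is dropped, order is preserved.
const chain = buildFallbackChain([
  "openai/gpt-4o",
  "anthropic/claude-3-sonnet",
  "azure/gpt-4o",
  "openai/gpt-3.5-turbo",
]);
```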

Code examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "GenerateGenerate aa productproduct descriptiondescription forfor aa smartsmart homehome thermostatthermostat"
      }
    ],
    "orq": {
      "fallbacks": [
        {
          "model": "openai/gpt-4o"
        },
        {
          "model": "azure/gpt-4o"
        }
      ]
    }
  }'

Python
from openai import OpenAI
import os

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "GenerateGenerate aa productproduct descriptiondescription forfor aa smartsmart homehome thermostatthermostat"
        }
    ],
    extra_body={
        "orq": {
            "fallbacks": [
                {
                    "model": "openai/gpt-4o"
                },
                {
                    "model": "azure/gpt-4o"
                }
            ]
        }
    }
)

TypeScript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: "Generate a product description for a smart home thermostat",
    },
  ],
  orq: {
    fallbacks: [
      {
        model: "openai/gpt-4o",
      },
      {
        model: "azure/gpt-4o",
      },
    ],
  },
});

Troubleshooting

Fallbacks not triggering
  • Check that the error code matches a trigger condition
  • Verify fallback models are available
  • Ensure API keys are configured for all providers

Slow response times
  • Reduce the number of fallbacks (maximum of three)
  • Set appropriate timeouts
  • Use faster models in the fallback chain

High costs
  • Failed requests may still incur charges
  • Monitor fallback usage rates
  • Optimize primary model selection

Monitoring

Track these metrics for optimal fallback performance:

  • Fallback trigger rate: % of requests using fallbacks
  • Success rate by position: Which fallbacks succeed most
  • Cost impact: Additional charges from fallback usage
  • Latency increase: Time added by fallback attempts
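These metrics are not returned directly by the API; assuming you log per-request outcomes, they can be derived client-side. A sketch with an illustrative log shape (field names are assumptions, not part of the proxy response):

```typescript
// Hypothetical per-request log entry.
interface RequestLog {
  usedFallback: boolean;
  extraLatencyMs: number; // time added by fallback attempts (0 if none)
}

// Fallback trigger rate: fraction of requests that needed a fallback.
function fallbackTriggerRate(logs: RequestLog[]): number {
  if (logs.length === 0) return 0;
  return logs.filter((l) => l.usedFallback).length / logs.length;
}

// Average latency added by fallback attempts across all requests.
function avgExtraLatency(logs: RequestLog[]): number {
  if (logs.length === 0) return 0;
  return logs.reduce((sum, l) => sum + l.extraLatencyMs, 0) / logs.length;
}
```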

Limitations

  • Response consistency: Different models may return varying output styles
  • Parameter support: Not all providers support identical parameters
  • Cost implications: Failed requests may still incur charges from primary provider
  • Latency impact: Sequential attempts add processing time
  • Provider dependencies: Requires API keys for all fallback providers

Advanced Usage

With other features:

{
  model: "openai/gpt-4o-mini",
  orq: {
    fallbacks: [{model: "openai/gpt-4o"}],
    retries: {count: 2, on_codes: [429]},
    timeout: {call_timeout: 15000},
    cache: {type: "exact_match", ttl: 3600}
  }
}

Environment-specific:

// Production: Conservative fallbacks
const prodFallbacks = [{ model: "openai/gpt-4o" }];

// Development: Aggressive fallbacks
const devFallbacks = [
  { model: "openai/gpt-4o" },
  { model: "anthropic/claude-3" },
  { model: "azure/gpt-4o" },
];
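Selecting between these chains at runtime can be done with a small helper, assuming an environment flag like NODE_ENV distinguishes production from development (the function is illustrative, not part of the SDK):

```typescript
type Fallback = { model: string };

// Returns a conservative chain in production and a broader one elsewhere,
// matching the snippets above.
function fallbacksForEnv(env: string): Fallback[] {
  if (env === "production") {
    return [{ model: "openai/gpt-4o" }];
  }
  return [
    { model: "openai/gpt-4o" },
    { model: "anthropic/claude-3" },
    { model: "azure/gpt-4o" },
  ];
}
```

The result can be passed straight into the request, e.g. orq: { fallbacks: fallbacksForEnv(process.env.NODE_ENV ?? "development") }.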