Retries and fallbacks in the AI Gateway

Use Cases

Surviving transient provider errors without surfacing failures to end users.
Automatic failover to a backup provider when the primary is degraded or rate-limited.
Absorbing short rate-limit bursts without manual intervention or custom retry logic.
Meeting availability SLAs on production features without adding retry code to every service.

Retries

Retry failed requests automatically with exponential backoff. Configure which HTTP error codes trigger retries and how many attempts to make.

Fallbacks

Route to a different model when the primary fails. Define a fallback chain across providers for high availability.

Retries

Automatically retry failed requests with exponential backoff.

Quick Start

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "input": "Analyze customer feedback",
    "retry": {"count": 3, "on_codes": [429, 500, 502, 503, 504]}
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "openai/gpt-5-mini",
  input: "Analyze customer feedback",
  retry: {
    count: 3,
    on_codes: [429, 500, 502, 503, 504],
  },
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="openai/gpt-5-mini",
    input="Analyze customer feedback",
    extra_body={
        "retry": {"count": 3, "on_codes": [429, 500, 502, 503, 504]}
    },
)

print(response.output_text)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  retry: {
    count: 3,
    on_codes: [429, 500, 502, 503, 504],
  },
});

Configuration

Parameter	Type	Required	Description
`count`	number	Yes	Max retry attempts (1-5)
`on_codes`	number[]	No	HTTP status codes that trigger retries (default: [429])
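
The constraints in the table can be checked client-side before a request is sent. The following is a minimal sketch, not part of the gateway or any SDK: `build_retry_config` is a hypothetical helper name, and it only assumes the documented `count` range (1-5) and the documented `on_codes` default of `[429]`.

```python
from typing import Optional


def build_retry_config(count: int, on_codes: Optional[list[int]] = None) -> dict:
    """Build a `retry` object, enforcing the documented 1-5 range for `count`."""
    if not 1 <= count <= 5:
        raise ValueError("count must be between 1 and 5")
    # Mirror the documented default of [429] when no codes are given;
    # otherwise de-duplicate and sort the caller's list.
    codes = [429] if on_codes is None else sorted(set(on_codes))
    return {"count": count, "on_codes": codes}


print(build_retry_config(3, [503, 429, 429]))  # {'count': 3, 'on_codes': [429, 503]}
```

The returned dictionary can be passed as the `retry` value inside `extra_body` in the Python examples on this page.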

Error Codes

Code	Meaning	Retry?	Common Cause
`429`	Rate limit exceeded	Yes	Too many requests
`500`	Internal server error	Yes	Provider issue
`501`	Not implemented	No	Definitive; retrying will not succeed
`502`	Bad gateway	Yes	Network/Gateway issue
`503`	Service unavailable	Yes	Provider maintenance
`504`	Gateway timeout	Yes	Provider overload
`400`	Bad request	No	Invalid parameters
`401`	Unauthorized	No	Invalid API key
`403`	Forbidden	No	Access denied
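
The table can be encoded as a small lookup, for example when reviewing logged status codes. This is a client-side illustration only: the `RETRYABLE` set and the `split_by_retryability` helper are hypothetical names, and the set simply copies the rows marked "Yes" above.

```python
# Status codes the table above marks as worth retrying (transient failures).
RETRYABLE = frozenset({429, 500, 502, 503, 504})


def split_by_retryability(statuses):
    """Partition observed status codes into (retryable, non_retryable) lists."""
    retryable = [s for s in statuses if s in RETRYABLE]
    non_retryable = [s for s in statuses if s not in RETRYABLE]
    return retryable, non_retryable


retryable, non_retryable = split_by_retryability([429, 400, 503, 501, 401])
print(retryable)      # [429, 503]
print(non_retryable)  # [400, 501, 401]
```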

Retry Strategies

// Conservative
retry: {
  count: 2,
  on_codes: [429, 503]  // Only rate limits and service unavailable
}

// Balanced (recommended)
retry: {
  count: 3,
  on_codes: [429, 500, 502, 503, 504]  // All transient errors
}

// Aggressive
retry: {
  count: 5,
  on_codes: [429, 500, 502, 503, 504]  // Max retries
}

Backoff Algorithm

Exponential backoff with jitter

Attempt 1: 1s (±25%).
Attempt 2: 2s (±25%).
Attempt 3: 4s (±25%).
Attempt 4: 8s (±25%).
Attempt 5: 16s (±25%).

Maximum total delay: ~31 seconds for 5 retries (1 + 2 + 4 + 8 + 16, before jitter).
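
The schedule above can be reproduced with a few lines of arithmetic. The sketch below assumes a 1-second base delay that doubles per attempt and jitter drawn uniformly from ±25%; the uniform distribution and the `backoff_delays` helper are assumptions for illustration, since this page does not specify how the gateway samples its jitter.

```python
import random


def backoff_delays(count, base=1.0, jitter=0.25, rng=random):
    """Return one jittered delay (in seconds) per retry attempt."""
    delays = []
    for attempt in range(1, count + 1):
        nominal = base * 2 ** (attempt - 1)        # 1, 2, 4, 8, 16
        factor = 1 + rng.uniform(-jitter, jitter)  # between 0.75 and 1.25
        delays.append(nominal * factor)
    return delays


nominal = [2 ** (n - 1) for n in range(1, 6)]
print(nominal, sum(nominal))  # [1, 2, 4, 8, 16] 31
print(backoff_delays(5))      # five jittered values, different on each run
```

Under these assumptions the total for five retries falls between 23.25 and 38.75 seconds (31 seconds ±25%).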

Code examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "input": "Analyze customer feedback and provide sentiment analysis",
    "retry": {
      "count": 3,
      "on_codes": [429, 500, 502, 503, 504]
    }
  }'

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "messages": [
      {
        "role": "user",
        "content": "Analyze customer feedback and provide sentiment analysis"
      }
    ],
    "retry": {
      "count": 3,
      "on_codes": [429, 500, 502, 503, 504]
    }
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "openai/gpt-5-mini",
  input: "Analyze customer feedback and provide sentiment analysis",
  retry: {
    count: 3,
    on_codes: [429, 500, 502, 503, 504],
  },
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="openai/gpt-5-mini",
    input="Analyze customer feedback and provide sentiment analysis",
    extra_body={
        "retry": {
            "count": 3,
            "on_codes": [429, 500, 502, 503, 504],
        }
    },
)

print(response.output_text)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5-mini",
  messages: [
    {
      role: "user",
      content: "Analyze customer feedback and provide sentiment analysis",
    },
  ],
  retry: {
    count: 3,
    on_codes: [429, 500, 502, 503, 504],
  },
});

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[
        {
            "role": "user",
            "content": "Analyze customer feedback and provide sentiment analysis",
        }
    ],
    extra_body={
        "retry": {
            "count": 3,
            "on_codes": [429, 500, 502, 503, 504],
        }
    },
)

Best Practices

Production recommendations

Recommended settings for production:

Use a count of 2-3 to balance reliability and speed.
Always include 429 (rate limits) in on_codes.
Monitor retry rates to detect systemic issues.
Implement a circuit breaker for persistent failures.
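
A circuit breaker stops sending requests after repeated failures and allows a trial request once a cooldown has passed. The following is a minimal client-side sketch, not a gateway feature: the `CircuitBreaker` class, its `threshold` and `cooldown` parameters, and the injected `clock` are all hypothetical names chosen for illustration.

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a trial call after `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent right now."""
        if self.opened_at is None:
            return True
        # Half-open: permit a trial request once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, success):
        """Record the outcome of a request (after the gateway's own retries)."""
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(threshold=2, cooldown=10.0, clock=lambda: now[0])
breaker.record(False)
breaker.record(False)
print(breaker.allow())  # False
now[0] = 10.0
print(breaker.allow())  # True
```

Call `allow()` before each request and `record()` with the final outcome, so that persistent failures stop consuming quota instead of triggering more retries.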

Error handling

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

try {
  const response = await client.responses.create({
    model: "openai/gpt-5-mini",
    input: "Hello",
  });
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    if (error.status === 400) {
      // Don't retry client errors - fix the request
      console.error('Bad request:', error.message);
    } else if (error.status >= 500) {
      // Server errors might need manual intervention
      console.error('Server error:', error.message);
    }
  }
}

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

try {
  const response = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    if (error.status === 400) {
      // Don't retry client errors - fix the request
      console.error('Bad request:', error.message);
    } else if (error.status >= 500) {
      // Server errors might need manual intervention
      console.error('Server error:', error.message);
    }
  }
}

Troubleshooting

High retry rates

Check if you’re hitting rate limits frequently.
Verify API keys have sufficient quotas.
Monitor provider status pages for outages.

Slow response times

Reduce retry count for latency-sensitive apps.
Use shorter timeout values with retries.
Consider fallbacks for faster alternatives.

Still getting errors

Check if error codes are in on_codes list.
Verify retry count isn’t exhausted.
Review provider-specific error documentation.

Monitoring

Track these retry metrics:

const retryMetrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
  retriesByCode: { 429: 0, 500: 0 }, // By error code
  avgRetryLatency: 0, // Added latency from retries
  finalFailures: 0, // Requests that failed after all retries
};
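
The same counters can be kept in Python and reduced to a retry rate. This is a sketch under assumptions: the `RetryMetrics` class is hypothetical, and it assumes you already know, per request, which status codes triggered retries (for example from your own logs); this page does not describe how the gateway reports that information.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RetryMetrics:
    total_requests: int = 0
    retried_requests: int = 0
    final_failures: int = 0
    retries_by_code: Counter = field(default_factory=Counter)

    def record(self, retry_codes, succeeded):
        """Record one request: the codes that triggered retries and the final outcome."""
        self.total_requests += 1
        if retry_codes:
            self.retried_requests += 1
            self.retries_by_code.update(retry_codes)
        if not succeeded:
            self.final_failures += 1

    @property
    def retry_rate(self):
        """Fraction of requests that needed at least one retry."""
        if self.total_requests == 0:
            return 0.0
        return self.retried_requests / self.total_requests


metrics = RetryMetrics()
metrics.record([], succeeded=True)
metrics.record([429, 429], succeeded=True)
metrics.record([503, 503, 503], succeeded=False)
metrics.record([], succeeded=True)
print(metrics.retry_rate)            # 0.5
print(metrics.retries_by_code[429])  # 2
```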

Limitations

Increased latency: Retries add delay (about 31s of backoff for 5 retries, before jitter).
Cost implications: Failed requests may still incur charges.
Rate limit consumption: Each retry counts against quotas.
Limited retries: Maximum 5 attempts to prevent excessive delays.
Non-retryable errors: 4xx client errors other than 429 are not retried.

Advanced Usage

Environment-specific configs:

const retryConfig = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};

With other features:

{
  "retry": { "count": 3, "on_codes": [429, 503] },
  "timeout": { "call_timeout": 10000 },
  "fallbacks": [{ "model": "backup-model" }],
  "cache": { "type": "exact_match", "ttl": 300 }
}

Custom retry logic (client-side):

const customRetry = async (requestFn, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      // 429 is a 4xx code but is transient, so it stays retryable
      const isClientError = error.status < 500 && error.status !== 429;
      if (attempt === maxAttempts || isClientError) {
        throw error; // Final attempt or non-retryable error
      }
      // Exponential backoff: 2s after the first failure, then 4s, 8s, ...
      const delayMs = Math.pow(2, attempt) * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
};

Fallbacks

Automatically switch to a different model when the primary fails.

Quick Start

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "openai/gpt-5-mini",
  input: "Generate a product description",
  fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
});

console.log(response.output_text);

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
});

Configuration

Parameter	Type	Required	Description
`fallbacks`	Array	Yes	List of fallback models in order of preference
`model`	string	Yes	Model identifier for each fallback

Trigger Conditions

Fallbacks activate on rate limits and transient server errors. Client errors and `501` do not trigger a fallback:

Error Code	Description	Triggers Fallback
`429`	Rate limit exceeded	Yes
`500`	Internal server error	Yes
`501`	Not implemented	No
`502`	Bad gateway	Yes
`503`	Service unavailable	Yes
`504`	Gateway timeout	Yes
`400`	Bad request	No
`401`	Unauthorized	No
`403`	Forbidden	No
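
The trigger rules can be illustrated by walking a chain client-side. The sketch below only models the behavior in the table and is not the gateway's implementation: `ProviderError`, `call_with_fallbacks`, and the `fake_call` stub are hypothetical names, and no network request is made.

```python
# Codes the table above marks as triggering a fallback.
FALLBACK_CODES = frozenset({429, 500, 502, 503, 504})


class ProviderError(Exception):
    def __init__(self, status):
        super().__init__(f"provider returned {status}")
        self.status = status


def call_with_fallbacks(call, primary, fallbacks):
    """Try `primary`, then each fallback in order, moving on only for trigger codes."""
    models = [primary] + [f["model"] for f in fallbacks]
    for i, model in enumerate(models):
        try:
            return model, call(model)
        except ProviderError as err:
            is_last = i == len(models) - 1
            # Re-raise when the chain is exhausted or the error would not
            # be fixed by switching models (for example 400 or 401).
            if is_last or err.status not in FALLBACK_CODES:
                raise


def fake_call(model):
    # Hypothetical stub: the primary is rate-limited, everything else succeeds.
    if model == "openai/gpt-5-mini":
        raise ProviderError(429)
    return f"ok from {model}"


print(call_with_fallbacks(
    fake_call,
    "openai/gpt-5-mini",
    [{"model": "openai/gpt-5"}, {"model": "azure/gpt-5-mini"}],
))
# ('openai/gpt-5', 'ok from openai/gpt-5')
```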

Best Practices

Use a maximum of 3 fallback models. Order them by preference or cost, and choose models with similar capabilities.

// Cost-optimized: cheap then expensive
fallbacks: [{ model: "openai/gpt-5-mini" }, { model: "openai/gpt-5" }];

// Reliability-optimized: different providers
fallbacks: [
  { model: "openai/gpt-5" },
  { model: "anthropic/claude-sonnet-4-6" },
  { model: "azure/gpt-5-mini" },
];
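
These guidelines can be checked before a chain is deployed. The following sketch assumes model identifiers follow the `provider/model` pattern used in the examples on this page; `review_fallback_chain` is a hypothetical helper, and the limit of 3 is the recommendation above rather than a documented hard limit.

```python
def review_fallback_chain(primary, fallbacks, max_fallbacks=3):
    """Return a list of warnings for a chain of 'provider/model' identifiers."""
    warnings = []
    models = [f["model"] for f in fallbacks]
    if len(models) > max_fallbacks:
        warnings.append(
            f"{len(models)} fallbacks configured; recommended maximum is {max_fallbacks}"
        )
    if primary in models:
        warnings.append("primary model is repeated in the fallback list")
    # The provider is taken to be the text before the first slash.
    providers = {m.split("/", 1)[0] for m in [primary] + models}
    if len(providers) == 1:
        warnings.append(
            "all models share one provider; a provider outage would affect the whole chain"
        )
    return warnings


print(review_fallback_chain("openai/gpt-5-mini", [{"model": "openai/gpt-5"}]))
# ['all models share one provider; a provider outage would affect the whole chain']
```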

Code examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "input": "Generate a product description",
    "fallbacks": [
      { "model": "openai/gpt-5" },
      { "model": "azure/gpt-5-mini" }
    ]
  }'

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "messages": [{ "role": "user", "content": "Generate a product description" }],
    "fallbacks": [
      { "model": "openai/gpt-5" },
      { "model": "azure/gpt-5-mini" }
    ]
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "openai/gpt-5-mini",
  input: "Generate a product description",
  fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="openai/gpt-5-mini",
    input="Generate a product description",
    extra_body={
        "fallbacks": [
            {"model": "openai/gpt-5"},
            {"model": "azure/gpt-5-mini"}
        ]
    }
)

print(response.output_text)

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": "Generate a product description"}],
    extra_body={
        "fallbacks": [
            {"model": "openai/gpt-5"},
            {"model": "azure/gpt-5-mini"}
        ]
    }
)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
});

Limitations

Response consistency: Different models may return varying output styles.
Parameter support: Not all providers support identical parameters.
Cost implications: Failed requests may still incur charges from the primary provider.
Latency impact: Sequential attempts add processing time.
Provider dependencies: Requires API keys for all fallback providers.
