# Retries and fallbacks in the AI Gateway

> Retry failed LLM requests with exponential backoff and configure fallback models in Orq.ai to handle rate limits, server errors, and network failures.

**Use Cases**

* Surviving transient provider errors without surfacing failures to end users.
* Automatic failover to a backup provider when the primary is degraded or rate-limited.
* Absorbing short rate-limit bursts without manual intervention or custom retry logic.
* Meeting availability SLAs on production features without adding retry code to every service.

***

<CardGroup cols={2}>
  <Card title="Retries" icon="rotate-right" href="#retries">
    Retry failed requests automatically with exponential backoff. Configure which HTTP error codes trigger retries and how many attempts to make.
  </Card>

  <Card title="Fallbacks" icon="split" href="#fallbacks">
    Route to a different model when the primary fails. Define a fallback chain across providers for high availability.
  </Card>
</CardGroup>

## Retries

Automatically retry failed requests with exponential backoff.

### Quick Start

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-5-mini",
      "input": "Analyze customer feedback",
      "retry": {"count": 3, "on_codes": [429, 500, 502, 503, 504]}
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "openai/gpt-5-mini",
    input: "Analyze customer feedback",
    retry: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="openai/gpt-5-mini",
      input="Analyze customer feedback",
      extra_body={
          "retry": {"count": 3, "on_codes": [429, 500, 502, 503, 504]}
      },
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [{ role: "user", content: "Analyze customer feedback" }],
    retry: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  });
  ```
</CodeGroup>

### Configuration

| Parameter  | Type      | Required | Description                                              |
| ---------- | --------- | -------- | -------------------------------------------------------- |
| `count`    | number    | Yes      | Max retry attempts (1-5)                                 |
| `on_codes` | number\[] | No       | HTTP status codes that trigger retries (default: \[429]) |
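
The constraints in this table can be checked before a request is sent. The following is a minimal sketch, not part of any Orq.ai SDK: `build_retry` is a hypothetical helper name, and it assumes only the range (1-5) and the default (`[429]`) stated above.

```python
from typing import List, Optional

# Bounds and default taken from the Configuration table.
MIN_COUNT, MAX_COUNT = 1, 5
DEFAULT_ON_CODES = [429]


def build_retry(count: int, on_codes: Optional[List[int]] = None) -> dict:
    """Return a `retry` object, rejecting a count outside the documented range.

    Hypothetical helper for illustration only.
    """
    if not MIN_COUNT <= count <= MAX_COUNT:
        raise ValueError(
            f"count must be between {MIN_COUNT} and {MAX_COUNT}, got {count}"
        )
    # Fall back to the documented default; otherwise sort and de-duplicate.
    codes = list(DEFAULT_ON_CODES) if on_codes is None else sorted(set(on_codes))
    return {"count": count, "on_codes": codes}


print(build_retry(3))                   # {'count': 3, 'on_codes': [429]}
print(build_retry(2, [503, 429, 429]))  # {'count': 2, 'on_codes': [429, 503]}
```

With the OpenAI Python client, the result can be passed as `extra_body={"retry": build_retry(3)}`, as in the Quick Start.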

### Error Codes

| Code  | Meaning               | Retry?                    | Common Cause                          |
| ----- | --------------------- | ------------------------- | ------------------------------------- |
| `429` | Rate limit exceeded   | <Icon icon="check" /> Yes | Too many requests                     |
| `500` | Internal server error | <Icon icon="check" /> Yes | Provider issue                        |
| `501` | Not implemented       | <Icon icon="xmark" /> No  | Definitive; retrying will not succeed |
| `502` | Bad gateway           | <Icon icon="check" /> Yes | Network/Gateway issue                 |
| `503` | Service unavailable   | <Icon icon="check" /> Yes | Provider maintenance                  |
| `504` | Gateway timeout       | <Icon icon="check" /> Yes | Provider overload                     |
| `400` | Bad request           | <Icon icon="xmark" /> No  | Invalid parameters                    |
| `401` | Unauthorized          | <Icon icon="xmark" /> No  | Invalid API key                       |
| `403` | Forbidden             | <Icon icon="xmark" /> No  | Access denied                         |
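
A small lint can flag `on_codes` entries that the table marks as not worth retrying. This is a sketch under the assumption that the table is the full list of retryable codes; `unsafe_codes` is a hypothetical name, and the source does not say whether the gateway itself rejects or ignores such entries.

```python
# Status codes the Error Codes table marks as retryable.
RETRYABLE_CODES = frozenset({429, 500, 502, 503, 504})


def unsafe_codes(on_codes):
    """Return the entries of `on_codes` that the table does not mark as retryable."""
    return sorted(set(on_codes) - RETRYABLE_CODES)


print(unsafe_codes([429, 500, 503]))  # []
print(unsafe_codes([400, 429, 501]))  # [400, 501]
```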

### Retry Strategies

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Conservative
  retry: {
    count: 2,
    on_codes: [429, 503]  // Only rate limits and service unavailable
  }

  // Balanced (recommended)
  retry: {
    count: 3,
    on_codes: [429, 500, 502, 503, 504]  // All transient errors
  }

  // Aggressive
  retry: {
    count: 5,
    on_codes: [429, 500, 502, 503, 504]  // Max retries
  }
  ```
</CodeGroup>

### Backoff Algorithm

#### Exponential backoff with jitter

* Attempt 1: 1s (±25%).
* Attempt 2: 2s (±25%).
* Attempt 3: 4s (±25%).
* Attempt 4: 8s (±25%).
* Attempt 5: 16s (±25%).

**Maximum total delay**: \~31 seconds for 5 retries
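
The schedule above can be reproduced in a few lines. This is a sketch of the arithmetic only, not the gateway's implementation: it assumes a 1 s base delay that doubles per attempt and a jitter factor drawn uniformly from the ±25% range (the source gives the range but not the distribution).

```python
import random

BASE_DELAY_S = 1.0  # delay before the first retry, per the list above
JITTER = 0.25       # +/-25%


def nominal_delay(attempt: int) -> float:
    """Delay before retry `attempt` (1-based), without jitter: 1, 2, 4, 8, 16."""
    return BASE_DELAY_S * 2 ** (attempt - 1)


def jittered_delay(attempt: int, rng: random.Random) -> float:
    """Nominal delay scaled by a factor drawn uniformly from [0.75, 1.25] (assumed)."""
    return nominal_delay(attempt) * rng.uniform(1 - JITTER, 1 + JITTER)


schedule = [nominal_delay(n) for n in range(1, 6)]
print(schedule)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(schedule))  # 31.0
```

Under these assumptions, the total backoff for five retries lies between 23.25 s and 38.75 s once jitter is applied, which is why the 31 s figure is approximate.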

### Code examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-5-mini",
      "input": "Analyze customer feedback and provide sentiment analysis",
      "retry": {
        "count": 3,
        "on_codes": [429, 500, 502, 503, 504]
      }
    }'
  ```

  ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-5-mini",
      "messages": [
        {
          "role": "user",
          "content": "Analyze customer feedback and provide sentiment analysis"
        }
      ],
      "retry": {
        "count": 3,
        "on_codes": [429, 500, 502, 503, 504]
      }
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "openai/gpt-5-mini",
    input: "Analyze customer feedback and provide sentiment analysis",
    retry: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="openai/gpt-5-mini",
      input="Analyze customer feedback and provide sentiment analysis",
      extra_body={
          "retry": {
              "count": 3,
              "on_codes": [429, 500, 502, 503, 504],
          }
      },
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [
      {
        role: "user",
        content: "Analyze customer feedback and provide sentiment analysis",
      },
    ],
    retry: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  });
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="openai/gpt-5-mini",
      messages=[
          {
              "role": "user",
              "content": "Analyze customer feedback and provide sentiment analysis",
          }
      ],
      extra_body={
          "retry": {
              "count": 3,
              "on_codes": [429, 500, 502, 503, 504],
          }
      },
  )
  ```
</CodeGroup>

### Best Practices

#### Production recommendations

For production deployments:

* Use `count: 2` or `count: 3` to balance reliability and latency.
* Always include `429` (rate limits) in `on_codes`.
* Monitor retry rates to detect systemic issues.
* Implement a circuit breaker for persistent failures.
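
The circuit breaker mentioned in the last item lives in your application, not in the request body. A minimal client-side sketch follows; the class name, the threshold, and the cooldown are illustrative assumptions, not Orq.ai settings.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for `cooldown_s` after repeated failures."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = None  # time the breaker opened, or None while closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, let trial requests through (half-open state).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        # Any success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the breaker and restart the cooldown.
            self.opened_at = self.clock()
```

Call `record_failure()` only when a request still fails after the gateway has exhausted its retries, and skip the request (or serve a degraded response) while `allow_request()` returns `False`.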

#### Error handling

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  try {
    const response = await client.responses.create({
      model: "openai/gpt-5-mini",
      input: "Hello",
    });
  } catch (error) {
    if (error instanceof OpenAI.APIError) {
      if (error.status === 400) {
        console.error('Bad request:', error.message);
      } else if (error.status >= 500) {
        console.error('Server error:', error.message);
      }
    }
  }
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  try {
    const response = await client.chat.completions.create({
      model: "openai/gpt-5-mini",
      messages: [{ role: "user", content: "Hello" }],
    });
  } catch (error) {
    if (error instanceof OpenAI.APIError) {
      if (error.status === 400) {
        // Don't retry client errors - fix the request
        console.error('Bad request:', error.message);
      } else if (error.status >= 500) {
        // Server errors might need manual intervention
        console.error('Server error:', error.message);
      }
    }
  }
  ```
</CodeGroup>

### Troubleshooting

**High retry rates**

* Check if you're hitting rate limits frequently.
* Verify API keys have sufficient quotas.
* Monitor provider status pages for outages.

**Slow response times**

* Reduce retry count for latency-sensitive apps.
* Use shorter timeout values with retries.
* Consider fallbacks for faster alternatives.

**Still getting errors**

* Check if error codes are in `on_codes` list.
* Verify retry count isn't exhausted.
* Review provider-specific error documentation.

### Monitoring

Track these retry metrics:

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const retryMetrics = {
    totalRequests: 0,
    retriedRequests: 0,
    retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
    retriesByCode: { 429: 0, 500: 0 }, // By error code
    avgRetryLatency: 0, // Added latency from retries
    finalFailures: 0, // Requests that failed after all retries
  };
  ```
</CodeGroup>
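
A sketch of how counters like these might be updated, in Python. It assumes your application can observe, per request, which status codes triggered a retry; the source does not say how the gateway exposes that information, so `retry_codes` is a hypothetical input. Latency tracking is left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RetryMetrics:
    total_requests: int = 0
    retried_requests: int = 0
    final_failures: int = 0
    retries_by_attempt: Counter = field(default_factory=Counter)
    retries_by_code: Counter = field(default_factory=Counter)

    def record(self, retry_codes, succeeded):
        """Record one request.

        `retry_codes` lists the status code that triggered each retry, in order;
        an empty list means the first attempt was final.
        """
        self.total_requests += 1
        if retry_codes:
            self.retried_requests += 1
        for attempt, code in enumerate(retry_codes, start=1):
            self.retries_by_attempt[attempt] += 1
            self.retries_by_code[code] += 1
        if not succeeded:
            self.final_failures += 1

    @property
    def retry_rate(self) -> float:
        """Share of requests that needed at least one retry."""
        if self.total_requests == 0:
            return 0.0
        return self.retried_requests / self.total_requests


metrics = RetryMetrics()
metrics.record([], succeeded=True)                # no retries
metrics.record([429, 429], succeeded=True)        # two retries, then success
metrics.record([500, 503, 503], succeeded=False)  # three retries, still failed
print(metrics.retry_rate, dict(metrics.retries_by_code))
```

A rising `retry_rate` or a growing `final_failures` count is the kind of signal the troubleshooting steps above are meant to investigate.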

### Limitations

* **Increased latency**: Retries add delay (about 31s of backoff for 5 retries, before jitter, on top of the time each attempt takes).
* **Cost implications**: Failed requests may still incur charges.
* **Rate limit consumption**: Each retry counts against quotas.
* **Limited retries**: Maximum 5 attempts to prevent excessive delays.
* **Non-retryable errors**: 4xx client errors other than `429` (for example `400`, `401`, `403`) are not retried.

### Advanced Usage

**Environment-specific configs:**

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const retryConfig = {
    development: { count: 1, on_codes: [429] }, // Fast feedback
    staging: { count: 2, on_codes: [429, 503] }, // Light retries
    production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
  };
  ```
</CodeGroup>
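
The same map in Python, with a lookup that selects a profile at startup. This is a sketch: `APP_ENV` is a hypothetical variable name, and falling back to the production profile for unknown environments is a design choice, not something the source prescribes.

```python
import os

# Same values as the TypeScript map above.
RETRY_CONFIG = {
    "development": {"count": 1, "on_codes": [429]},
    "staging": {"count": 2, "on_codes": [429, 503]},
    "production": {"count": 3, "on_codes": [429, 500, 502, 503, 504]},
}


def retry_for(env: str) -> dict:
    """Return the retry config for `env`, defaulting to the production profile."""
    return RETRY_CONFIG.get(env, RETRY_CONFIG["production"])


# APP_ENV is a hypothetical name; use whatever your deployment sets.
retry = retry_for(os.environ.get("APP_ENV", "development"))
print(retry)
```

The selected object can then be sent as `extra_body={"retry": retry}`.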

**With other features:**

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "retry": { "count": 3, "on_codes": [429, 503] },
    "timeout": { "call_timeout": 10000 },
    "fallbacks": [{ "model": "backup-model" }],
    "cache": { "type": "exact_match", "ttl": 300 }
  }
  ```
</CodeGroup>

**Custom retry logic (client-side):**

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
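  // Transient codes from the Error Codes table; 429 must be listed explicitly
  const RETRYABLE = new Set([429, 500, 502, 503, 504]);

  const customRetry = async <T>(requestFn: () => Promise<T>, maxAttempts = 3): Promise<T> => {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return await requestFn();
      } catch (error: any) {
        // Errors without a status (e.g. network failures) are treated as retryable
        const retryable = error?.status === undefined || RETRYABLE.has(error.status);
        if (attempt === maxAttempts || !retryable) {
          throw error; // Final attempt or non-retryable error
        }
        await new Promise(
          (resolve) => setTimeout(resolve, Math.pow(2, attempt - 1) * 1000), // 1s, 2s, 4s, ...
        );
      }
    }
    throw new Error("maxAttempts must be at least 1");
  };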
  ```
</CodeGroup>
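
A Python counterpart, as a sketch rather than a reference implementation. It reads the `status_code` attribute that the OpenAI Python library sets on `APIStatusError`, and retries only the transient codes from the Error Codes table. Unlike the TypeScript helper above, it re-raises errors that carry no status code; that is a conservative choice for this sketch. The `sleep` parameter exists so the delay can be stubbed in tests.

```python
import time

RETRYABLE = {429, 500, 502, 503, 504}


def call_with_retry(request_fn, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call `request_fn`, retrying transient HTTP failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception as error:
            # Errors without a status code are not retried in this sketch.
            status = getattr(error, "status_code", None)
            if attempt == max_attempts or status not in RETRYABLE:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

Usage: `call_with_retry(lambda: client.responses.create(model="openai/gpt-5-mini", input="Hello"))`. Combining this with gateway-side `retry` multiplies the number of attempts, so use one or the other for a given request.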

## Fallbacks

Automatically switch to a different model when the primary fails.

### Quick Start

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "openai/gpt-5-mini",
    input: "Generate a product description",
    fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
  });

  console.log(response.output_text);
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [{ role: "user", content: "Generate a product description" }],
    fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
  });
  ```
</CodeGroup>

### Configuration

| Parameter           | Type   | Required | Description                                    |
| ------------------- | ------ | -------- | ---------------------------------------------- |
| `fallbacks`         | Array  | Yes      | List of fallback models in order of preference |
| `fallbacks[].model` | string | Yes      | Model identifier for each fallback entry       |

### Trigger Conditions

Fallbacks activate on these errors:

| Error Code | Description           | Triggers Fallback         |
| ---------- | --------------------- | ------------------------- |
| `429`      | Rate limit exceeded   | <Icon icon="check" /> Yes |
| `500`      | Internal server error | <Icon icon="check" /> Yes |
| `501`      | Not implemented       | <Icon icon="xmark" /> No  |
| `502`      | Bad gateway           | <Icon icon="check" /> Yes |
| `503`      | Service unavailable   | <Icon icon="check" /> Yes |
| `504`      | Gateway timeout       | <Icon icon="check" /> Yes |
| `400`      | Bad request           | <Icon icon="xmark" /> No  |
| `401`      | Unauthorized          | <Icon icon="xmark" /> No  |
| `403`      | Forbidden             | <Icon icon="xmark" /> No  |
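
The table can be read as a walk over the chain: the primary first, then each fallback in order, stopping at the first status that does not trigger a fallback. The following is an illustrative model of that behavior, not the gateway's implementation; `resolve_model` and `status_for` are hypothetical names, and how retries interleave with fallbacks is not covered here.

```python
FALLBACK_TRIGGERS = {429, 500, 502, 503, 504}  # codes marked "Yes" in the table


def resolve_model(primary, fallbacks, status_for):
    """Walk the chain in order and stop at the first non-triggering status.

    `status_for(model)` stands in for a real provider call and returns an
    HTTP status code. Returns (model, status) for the attempt that ended the walk.
    """
    chain = [primary] + [entry["model"] for entry in fallbacks]
    status = None
    for model in chain:
        status = status_for(model)
        if status not in FALLBACK_TRIGGERS:
            return model, status  # success, or an error that does not trigger fallback
    return chain[-1], status      # every model in the chain failed


statuses = {"openai/gpt-5-mini": 429, "openai/gpt-5": 503, "azure/gpt-5-mini": 200}
fallbacks = [{"model": "openai/gpt-5"}, {"model": "azure/gpt-5-mini"}]
print(resolve_model("openai/gpt-5-mini", fallbacks, statuses.get))
# ('azure/gpt-5-mini', 200)
```

Note that a `400` from the primary ends the walk immediately: a malformed request would fail on every model, so trying the next one gains nothing.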

### Best Practices

Use a maximum of 3 fallback models. Order them by preference or cost, and choose models with similar capabilities.

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Cost-optimized: cheap then expensive
  fallbacks: [{ model: "openai/gpt-5-mini" }, { model: "openai/gpt-5" }];

  // Reliability-optimized: different providers
  fallbacks: [
    { model: "openai/gpt-5" },
    { model: "anthropic/claude-sonnet-4-6" },
    { model: "azure/gpt-5-mini" },
  ];
  ```
</CodeGroup>
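
The three-model guideline can be enforced when the list is assembled. A minimal sketch, assuming only the `{"model": ...}` entry shape shown above; `build_fallbacks` is a hypothetical helper, and treating the guideline as a hard limit is a choice made for this example.

```python
MAX_FALLBACKS = 3  # recommended ceiling from the guidance above


def build_fallbacks(primary, candidates):
    """Return a `fallbacks` list: ordered, de-duplicated, without the primary model."""
    seen = {primary}
    models = []
    for model in candidates:
        if model not in seen:
            seen.add(model)
            models.append(model)
    if len(models) > MAX_FALLBACKS:
        raise ValueError(
            f"use at most {MAX_FALLBACKS} fallback models, got {len(models)}"
        )
    return [{"model": model} for model in models]


print(build_fallbacks(
    "openai/gpt-5-mini",
    ["openai/gpt-5", "openai/gpt-5-mini", "azure/gpt-5-mini"],
))
# [{'model': 'openai/gpt-5'}, {'model': 'azure/gpt-5-mini'}]
```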

### Code examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-5-mini",
      "input": "Generate a product description",
      "fallbacks": [
        { "model": "openai/gpt-5" },
        { "model": "azure/gpt-5-mini" }
      ]
    }'
  ```

  ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-5-mini",
      "messages": [{ "role": "user", "content": "Generate a product description" }],
      "fallbacks": [
        { "model": "openai/gpt-5" },
        { "model": "azure/gpt-5-mini" }
      ]
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "openai/gpt-5-mini",
    input: "Generate a product description",
    fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="openai/gpt-5-mini",
      input="Generate a product description",
      extra_body={
          "fallbacks": [
              {"model": "openai/gpt-5"},
              {"model": "azure/gpt-5-mini"}
          ]
      }
  )

  print(response.output_text)
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="openai/gpt-5-mini",
      messages=[{"role": "user", "content": "Generate a product description"}],
      extra_body={
          "fallbacks": [
              {"model": "openai/gpt-5"},
              {"model": "azure/gpt-5-mini"}
          ]
      }
  )
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [{ role: "user", content: "Generate a product description" }],
    fallbacks: [{ model: "openai/gpt-5" }, { model: "azure/gpt-5-mini" }],
  });
  ```
</CodeGroup>

### Limitations

* **Response consistency**: Different models may return varying output styles.
* **Parameter support**: Not all providers support identical parameters.
* **Cost implications**: Failed requests may still incur charges from the primary provider.
* **Latency impact**: Sequential attempts add processing time.
* **Provider dependencies**: Requires API keys for all fallback providers.
