# AI Router | Retries and Fallbacks

> Automatically retry failed LLM requests with exponential backoff, and configure fallback models for high availability. Handle rate limits, server errors, and network issues.

<CardGroup cols={2}>
  <Card title="Retries" icon="rotate-right" href="#retries">
    Retry failed requests automatically with exponential backoff. Configure which HTTP error codes trigger retries and how many attempts to make.
  </Card>

  <Card title="Fallbacks" icon="split" href="#fallbacks">
    Route to a different model when the primary fails. Define a fallback chain across providers for high availability.
  </Card>
</CardGroup>

## Retries

Automatically retry failed requests with exponential backoff.

### Quick Start

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Analyze customer feedback" }],
    retry: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  });
  ```
</CodeGroup>

### Configuration

| Parameter  | Type      | Required | Description                                              |
| ---------- | --------- | -------- | -------------------------------------------------------- |
| `count`    | number    | Yes      | Max retry attempts (1-5)                                 |
| `on_codes` | number\[] | No       | HTTP status codes that trigger retries (default: \[429]) |
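To make the table concrete, here is a minimal client-side sketch that applies the documented default and limit before a request is sent. `RetryConfig`, `normalizeRetryConfig`, and the whole-number check on `count` are illustrative assumptions, not part of the orq.ai API. The router does its own validation; this only mirrors the table (`count` from 1 to 5, `on_codes` defaulting to `[429]`).

<CodeGroup>
```typescript Typescript
// Hypothetical client-side types mirroring the `retry` parameter table
interface RetryConfig {
  count: number; // Max retry attempts, 1-5
  on_codes?: number[]; // Status codes that trigger a retry
}

interface NormalizedRetryConfig {
  count: number;
  on_codes: number[];
}

const DEFAULT_ON_CODES = [429]; // Documented default for `on_codes`

// Validate `count` (assumed to be a whole number) and fill in the default `on_codes`
function normalizeRetryConfig(config: RetryConfig): NormalizedRetryConfig {
  if (!Number.isInteger(config.count) || config.count < 1 || config.count > 5) {
    throw new RangeError(`retry.count must be an integer from 1 to 5, got ${config.count}`);
  }
  return {
    count: config.count,
    on_codes: config.on_codes ?? [...DEFAULT_ON_CODES],
  };
}

// Example: only `count` is set, so `on_codes` falls back to [429]
const retry = normalizeRetryConfig({ count: 3 });
console.log(retry); // { count: 3, on_codes: [ 429 ] }
```
</CodeGroup>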

### Error Codes

| Code  | Meaning               | Retry?                    | Common Cause          |
| ----- | --------------------- | ------------------------- | --------------------- |
| `429` | Rate limit exceeded   | <Icon icon="check" /> Yes | Too many requests     |
| `500` | Internal server error | <Icon icon="check" /> Yes | Provider issue        |
| `501` | Not implemented       | <Icon icon="check" /> Yes | Feature unavailable   |
| `502` | Bad gateway           | <Icon icon="check" /> Yes | Network/Gateway issue |
| `503` | Service unavailable   | <Icon icon="check" /> Yes | Provider maintenance  |
| `504` | Gateway timeout       | <Icon icon="check" /> Yes | Provider overload     |
| `400` | Bad request           | <Icon icon="xmark" /> No  | Invalid parameters    |
| `401` | Unauthorized          | <Icon icon="xmark" /> No  | Invalid API key       |
| `403` | Forbidden             | <Icon icon="xmark" /> No  | Access denied         |

### Retry Strategies

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Conservative (production): only rate limits and service unavailable
  const conservativeRetry = {
    count: 2,
    on_codes: [429, 503],
  };

  // Balanced (recommended): common transient errors
  const balancedRetry = {
    count: 3,
    on_codes: [429, 500, 502, 503, 504],
  };

  // Aggressive (development): maximum retry count
  const aggressiveRetry = {
    count: 5,
    on_codes: [429, 500, 502, 503, 504],
  };
  ```
</CodeGroup>

### Backoff Algorithm

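#### Exponential backoff with jitter

The wait before each retry doubles, starting at 1 second, and is randomized by ±25% so that clients retrying at the same moment don't hit the provider in lockstep:

* Retry 1: 1s (±25%)
* Retry 2: 2s (±25%)
* Retry 3: 4s (±25%)
* Retry 4: 8s (±25%)
* Retry 5: 16s (±25%)

**Maximum total delay**: \~31 seconds for 5 retries (1 + 2 + 4 + 8 + 16) before jitter; at the upper jitter bound this approaches \~39 seconds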
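The schedule above can be reproduced in a few lines. This is a sketch of the documented numbers (1 s base delay, doubling per retry, ±25% jitter), not the router's actual implementation; `backoffDelayMs` is a hypothetical helper, and the random source is injectable so the output can be checked.

<CodeGroup>
```typescript Typescript
const BASE_DELAY_MS = 1000; // First retry waits ~1s
const JITTER = 0.25; // ±25% random jitter

// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), scaled by jitter.
// `random` returns a value in [0, 1); 0.5 yields the exact nominal delay.
function backoffDelayMs(attempt: number, random: () => number = Math.random): number {
  const nominal = BASE_DELAY_MS * 2 ** (attempt - 1);
  const factor = 1 + JITTER * (2 * random() - 1); // In [0.75, 1.25)
  return Math.round(nominal * factor);
}

// Nominal schedule for 5 retries (jitter neutralized with random = 0.5)
const schedule = [1, 2, 3, 4, 5].map((n) => backoffDelayMs(n, () => 0.5));
const totalMs = schedule.reduce((sum, ms) => sum + ms, 0);
console.log(schedule, totalMs); // [ 1000, 2000, 4000, 8000, 16000 ] 31000
```
</CodeGroup>

With every jitter draw near its upper bound, the same five retries add up to roughly 31 × 1.25 ≈ 39 seconds.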

### Code examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o-mini",
      "messages": [
        {
          "role": "user",
          "content": "Analyze customer feedback and provide sentiment analysis"
        }
      ],
      "retry": {
        "count": 3,
        "on_codes": [429, 500, 502, 503, 504]
      }
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  openai = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router"
  )

  response = openai.chat.completions.create(
      model="openai/gpt-4o",
      messages=[
          {
              "role": "user",
              "content": "Analyze customer feedback and provide sentiment analysis"
          }
      ],
      extra_body={
          "retry": {
              "count": 3,
              "on_codes": [429, 500, 502, 503, 504]
          }
      }
  )
  ```

  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const openai = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: "Analyze customer feedback and provide sentiment analysis",
      },
    ],
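    // orq.ai router field, not in the OpenAI SDK's types: TypeScript may flag it
    // as an unknown property, which a cast or `// @ts-expect-error` works around
    retry: {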
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  });
  ```
</CodeGroup>

### Best Practices

#### Production recommendations

For a reliable production setup:

* Use a `count` of 2 or 3 to balance reliability against added latency
* Always include `429` (rate limits) in `on_codes`
* Monitor retry rates to detect systemic issues
* Implement a circuit breaker to pause requests during persistent failures
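A circuit breaker stops sending requests for a cool-down period after repeated failures, so a provider outage doesn't turn into a flood of retried calls. The sketch below is a minimal client-side version; `CircuitBreaker`, `failureThreshold`, and `cooldownMs` are hypothetical names, not part of the orq.ai API, and the current time is passed in so the behavior is deterministic.

<CodeGroup>
```typescript Typescript
type CircuitState = "closed" | "open" | "half-open";

// Minimal circuit breaker: opens after `failureThreshold` consecutive failures,
// rejects calls while open, then lets one trial request through after `cooldownMs`.
class CircuitBreaker {
  private state: CircuitState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 5, cooldownMs = 30_000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Returns true if a request may be sent at time `now` (ms)
  canRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // Allow a single trial request
      return true;
    }
    return this.state === "closed";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.consecutiveFailures = 0;
  }

  recordFailure(now: number): void {
    this.consecutiveFailures++;
    // A failed trial request, or too many failures in a row, (re)opens the circuit
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  get currentState(): CircuitState {
    return this.state;
  }
}

// Usage: two failures open the circuit; after the cool-down one trial request is allowed
const breaker = new CircuitBreaker(2, 10_000);
breaker.recordFailure(0);
breaker.recordFailure(1_000);
console.log(breaker.canRequest(5_000)); // false (still cooling down)
console.log(breaker.canRequest(11_000)); // true (half-open trial)
breaker.recordSuccess();
console.log(breaker.currentState); // closed
```
</CodeGroup>

Wrap each router call in `canRequest` / `recordSuccess` / `recordFailure`, and count a request as failed only after the router's own retries are exhausted.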

#### Error handling

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
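  import OpenAI from "openai";

  // Assumes `openai` is a client configured with the orq.ai router base URL
  try {
    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o-mini",
      messages: [{ role: "user", content: "Analyze customer feedback" }],
    });
  } catch (error) {
    // Rethrow anything that isn't an API error raised by the OpenAI SDK
    if (!(error instanceof OpenAI.APIError)) throw error;
    if (error.status === 400) {
      // Don't retry client errors; fix the request
      console.error("Bad request:", error.message);
    } else if (error.status !== undefined && error.status >= 500) {
      // Server errors that persist after retries might need manual intervention
      console.error("Server error:", error.message);
    }
  }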
  ```
</CodeGroup>

### Troubleshooting

**High retry rates**

* Check if you're hitting rate limits frequently
* Verify API keys have sufficient quotas
* Monitor provider status pages for outages

**Slow response times**

* Reduce retry count for latency-sensitive apps
* Use shorter timeout values with retries
* Consider fallbacks for faster alternatives

**Still getting errors**

* Check that the failing status code is listed in `on_codes`
* Check whether all retry attempts were used up; if so, consider raising `count` (up to 5) or adding fallbacks
* Review the provider's error documentation

### Monitoring

Track these retry metrics:

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const retryMetrics = {
    totalRequests: 0,
    retriedRequests: 0,
    retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
    retriesByCode: { 429: 0, 500: 0 }, // By error code
    avgRetryLatency: 0, // Added latency from retries
    finalFailures: 0, // Requests that failed after all retries
  };
  ```
</CodeGroup>
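Raw counters are easier to act on as rates. The sketch below derives a retry rate and a final failure rate from counters shaped like `retryMetrics` above and flags a possible systemic issue. How you populate the counters (for example, from your own request logs) and the 10% alert threshold are assumptions; `summarizeRetryMetrics` is a hypothetical helper.

<CodeGroup>
```typescript Typescript
// Subset of the retry metrics that the rates are computed from
interface RetryCounters {
  totalRequests: number;
  retriedRequests: number;
  finalFailures: number;
}

interface RetrySummary {
  retryRate: number; // Share of requests that needed at least one retry
  finalFailureRate: number; // Share of requests that failed after all retries
  alert: boolean; // True when the retry rate exceeds the threshold
}

// Hypothetical helper; the default 10% alert threshold is an assumption to tune
function summarizeRetryMetrics(m: RetryCounters, alertThreshold = 0.1): RetrySummary {
  if (m.totalRequests === 0) {
    return { retryRate: 0, finalFailureRate: 0, alert: false };
  }
  const retryRate = m.retriedRequests / m.totalRequests;
  const finalFailureRate = m.finalFailures / m.totalRequests;
  return { retryRate, finalFailureRate, alert: retryRate > alertThreshold };
}

const summary = summarizeRetryMetrics({ totalRequests: 200, retriedRequests: 30, finalFailures: 2 });
console.log(summary); // { retryRate: 0.15, finalFailureRate: 0.01, alert: true }
```
</CodeGroup>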

### Limitations

* **Increased latency**: Retries add delay (about 31s of backoff for 5 retries, before jitter)
* **Cost implications**: Failed requests may still incur charges
* **Rate limit consumption**: Each retry counts against quotas
* **Limited retries**: Maximum of 5 retry attempts to prevent excessive delays
* **Non-retryable errors**: 4xx client errors other than `429` are not retried

### Advanced Usage

**Environment-specific configs:**

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const retryConfig = {
    development: { count: 1, on_codes: [429] }, // Fast feedback
    staging: { count: 2, on_codes: [429, 503] }, // Light retries
    production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
  };
  ```
</CodeGroup>
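To apply these at runtime, pick the entry that matches the current environment. This sketch restates the object above with types and assumes the environment name comes from `process.env.NODE_ENV` (a common Node.js convention, not something the router requires). Unknown or missing values fall back to the development settings; `selectRetryConfig` is a hypothetical helper.

<CodeGroup>
```typescript Typescript
type Environment = "development" | "staging" | "production";

interface RetryConfig {
  count: number;
  on_codes: number[];
}

const retryConfig: Record<Environment, RetryConfig> = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};

// Return the config for `env`, defaulting to development for unknown or missing values
function selectRetryConfig(env: string | undefined): RetryConfig {
  if (env === "development" || env === "staging" || env === "production") {
    return retryConfig[env];
  }
  return retryConfig.development;
}

// Pass the selected config as the `retry` field of the request body
const retry = selectRetryConfig(process.env.NODE_ENV);
```
</CodeGroup>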

**With other features:**

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "retry": { "count": 3, "on_codes": [429, 503] },
    "timeout": { "call_timeout": 10000 },
    "fallbacks": [{ "model": "backup-model" }],
    "cache": { "type": "exact_match", "ttl": 300 }
  }
  ```
</CodeGroup>

**Custom retry logic (client-side):**

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
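  // Client-side retry wrapper for any async request function
  async function customRetry<T>(
    requestFn: () => Promise<T>,
    maxAttempts = 3,
  ): Promise<T> {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return await requestFn();
      } catch (error) {
        const status = (error as { status?: number } | null)?.status;
        // Retry rate limits (429), server errors (5xx), and errors without a status (e.g. network failures)
        const retryable = status === undefined || status === 429 || status >= 500;
        if (attempt === maxAttempts || !retryable) {
          throw error; // Final attempt or non-retryable error
        }
        // Exponential backoff: 1s, 2s, 4s, ...
        await new Promise((resolve) => setTimeout(resolve, 2 ** (attempt - 1) * 1000));
      }
    }
    throw new Error("maxAttempts must be at least 1");
  }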
  ```
</CodeGroup>

## Fallbacks

Automatically switch to a different model when the primary fails.

### Quick Start

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Generate a product description" }],
    fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
  });
  ```
</CodeGroup>

### Configuration

| Parameter           | Type      | Required | Description                                    |
| ------------------- | --------- | -------- | ---------------------------------------------- |
| `fallbacks`         | object\[] | Yes      | List of fallback models in order of preference |
| `fallbacks[].model` | string    | Yes      | Model identifier for each fallback             |

### Trigger Conditions

Fallbacks activate on these errors:

| Error Code | Description           | Triggers Fallback         |
| ---------- | --------------------- | ------------------------- |
| `429`      | Rate limit exceeded   | <Icon icon="check" /> Yes |
| `500`      | Internal server error | <Icon icon="check" /> Yes |
| `501`      | Not implemented       | <Icon icon="check" /> Yes |
| `502`      | Bad gateway           | <Icon icon="check" /> Yes |
| `503`      | Service unavailable   | <Icon icon="check" /> Yes |
| `504`      | Gateway timeout       | <Icon icon="check" /> Yes |
| `400`      | Bad request           | <Icon icon="xmark" /> No  |
| `401`      | Unauthorized          | <Icon icon="xmark" /> No  |
| `403`      | Forbidden             | <Icon icon="xmark" /> No  |
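The sketch below models the trigger rules from this table on the client side, assuming the router tries the primary model first and then each fallback in the listed order. It is an illustration, not orq.ai's implementation: `callWithFallbacks`, `ModelCallError`, and the injected `call` function are hypothetical, and with the router you only pass `fallbacks` in the request.

<CodeGroup>
```typescript Typescript
// Status codes that trigger a fallback, per the table above
const FALLBACK_CODES = new Set([429, 500, 501, 502, 503, 504]);

// Assumption of this sketch: errors without a status (e.g. network failures) don't trigger a fallback
function shouldFallback(status: number | undefined): boolean {
  return status !== undefined && FALLBACK_CODES.has(status);
}

// Hypothetical error type carrying the HTTP status of a failed model call
class ModelCallError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Try the primary model, then each fallback in order. Non-triggering errors
// (e.g. 400, 401, 403) are rethrown immediately, as is a failure on the last model.
async function callWithFallbacks<T>(
  primary: string,
  fallbacks: { model: string }[],
  call: (model: string) => Promise<T>,
): Promise<T> {
  const models = [primary, ...fallbacks.map((f) => f.model)];
  for (const [i, model] of models.entries()) {
    try {
      return await call(model);
    } catch (error) {
      const status = error instanceof ModelCallError ? error.status : undefined;
      if (!shouldFallback(status) || i === models.length - 1) {
        throw error;
      }
    }
  }
  throw new Error("No models to try"); // Unreachable: `models` always contains `primary`
}

// Usage with a fake `call`: the primary returns 503, so the first fallback answers
callWithFallbacks(
  "openai/gpt-4o-mini",
  [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
  async (model) => {
    if (model === "openai/gpt-4o-mini") throw new ModelCallError(503, "Service unavailable");
    return `answered by ${model}`;
  },
).then((result) => console.log(result)); // answered by openai/gpt-4o
```
</CodeGroup>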

### Best Practices

Use a maximum of 3 fallback models. Order them by preference or cost, and choose models with similar capabilities.

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Cost-optimized: cheaper model first, then a more capable one
  const costOptimizedFallbacks = [
    { model: "openai/gpt-3.5-turbo" },
    { model: "openai/gpt-4o" },
  ];

  // Reliability-optimized: spread across different providers
  const reliabilityOptimizedFallbacks = [
    { model: "openai/gpt-4o" },
    { model: "anthropic/claude-sonnet-4-0" },
    { model: "azure/gpt-4o" },
  ];
  ```
</CodeGroup>

### Code examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o-mini",
      "messages": [{ "role": "user", "content": "Generate a product description" }],
      "fallbacks": [
        { "model": "openai/gpt-4o" },
        { "model": "azure/gpt-4o" }
      ]
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  openai = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router"
  )

  response = openai.chat.completions.create(
      model="openai/gpt-4o-mini",
      messages=[{"role": "user", "content": "Generate a product description"}],
      extra_body={
          "fallbacks": [
              {"model": "openai/gpt-4o"},
              {"model": "azure/gpt-4o"}
          ]
      }
  )
  ```

  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const openai = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Generate a product description" }],
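    // orq.ai router field, not in the OpenAI SDK's types: TypeScript may flag it
    // as an unknown property, which a cast or `// @ts-expect-error` works around
    fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],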
  });
  ```
</CodeGroup>

### Limitations

* **Response consistency**: Different models may return varying output styles
* **Parameter support**: Not all providers support identical parameters
* **Cost implications**: Failed requests may still incur charges from the primary provider
* **Latency impact**: Sequential attempts add processing time
* **Provider dependencies**: Requires API keys for all fallback providers
