Retries & Error Handling
This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.
Quick Start
Automatically retry failed requests with exponential backoff.
const response = await openai.chat.completions.create({
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Analyze customer feedback" }],
orq: {
retries: {
count: 3,
on_codes: [429, 500, 502, 503, 504],
},
},
});
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| count | number | Yes | Max retry attempts (1-5) |
| on_codes | number[] | Yes | HTTP status codes that trigger retries |
Error Codes
| Code | Meaning | Retry? | Common Cause |
|---|---|---|---|
| 429 | Rate limit exceeded | ✅ Yes | Too many requests |
| 500 | Internal server error | ✅ Yes | Provider issue |
| 502 | Bad gateway | ✅ Yes | Network/proxy issue |
| 503 | Service unavailable | ✅ Yes | Provider maintenance |
| 504 | Gateway timeout | ✅ Yes | Provider overload |
| 400 | Bad request | ❌ No | Invalid parameters |
| 401 | Unauthorized | ❌ No | Invalid API key |
| 403 | Forbidden | ❌ No | Access denied |
Retry Strategies
// Conservative (production)
retries: {
count: 2,
on_codes: [429, 503] // Only rate limits and service unavailable
}
// Balanced (recommended)
retries: {
count: 3,
on_codes: [429, 500, 502, 503, 504] // All transient errors
}
// Aggressive (development)
retries: {
count: 5,
on_codes: [429, 500, 502, 503, 504] // Max retries
}
Backoff Algorithm
Exponential backoff with jitter
- Attempt 1: 1s (±25%)
- Attempt 2: 2s (±25%)
- Attempt 3: 4s (±25%)
- Attempt 4: 8s (±25%)
- Attempt 5: 16s (±25%)
Maximum total delay: ~31 seconds for 5 retries
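For reference, the schedule above can be reproduced client-side. This is a minimal sketch of the same formula (base delay doubling per attempt with ±25% random jitter), not the proxy's internal implementation:

const backoffDelayMs = (attempt: number): number => {
  const baseMs = Math.pow(2, attempt - 1) * 1000; // 1s, 2s, 4s, 8s, 16s
  const jitter = 0.75 + Math.random() * 0.5; // random factor in [0.75, 1.25]
  return Math.round(baseMs * jitter);
};

// Print the approximate delay before each retry attempt
for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(`Attempt ${attempt}: ~${backoffDelayMs(attempt)}ms`);
}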
Code Examples
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
-H "Authorization: Bearer $ORQ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "AnalyzeAnalyze customercustomer feedbackfeedback and provideprovide sentiment analysissentiment analysis"
}
],
"orq": {
"retries": {
"count": 3,
"on_codes": [429, 500, 502, 503, 504]
}
}
}'
from openai import OpenAI
import os
openai = OpenAI(
api_key=os.environ.get("ORQ_API_KEY"),
base_url="https://api.orq.ai/v2/proxy"
)
response = openai.chat.completions.create(
model="openai/gpt-4o",
messages=[
{
"role": "user",
"content": "AnalyzeAnalyze customercustomer feedbackfeedback and provideprovide sentiment analysissentiment analysis"
}
],
extra_body={
"orq": {
"retries": {
"count": 3,
"on_codes": [429, 500, 502, 503, 504]
}
}
}
)
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.ORQ_API_KEY,
baseURL: "https://api.orq.ai/v2/proxy",
});
const response = await openai.chat.completions.create({
model: "openai/gpt-4o",
messages: [
{
role: "user",
content: "Analyze customer feedback and provide sentiment analysis",
},
],
orq: {
retries: {
count: 3,
on_codes: [429, 500, 502, 503, 504],
},
},
});
Best Practices
Production recommendations
Follow these recommendations for a reliable production setup:
- Use `count: 2-3` for a balance of reliability and speed
- Always include `429` (rate limits) in `on_codes`
- Monitor retry rates to detect systemic issues
- Implement a circuit breaker for persistent failures (see the sketch below)
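A circuit breaker can live entirely in application code. The sketch below is illustrative only (the class name and thresholds are examples, not part of the proxy API): after a run of consecutive failures it rejects requests immediately for a cooldown period instead of hammering a failing provider.

// Illustrative client-side circuit breaker; names and thresholds are examples.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private maxFailures = 5, // trip after this many consecutive failures
    private cooldownMs = 30_000, // stay open for 30s before allowing traffic again
  ) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    const open =
      this.failures >= this.maxFailures &&
      Date.now() - this.openedAt < this.cooldownMs;
    if (open) throw new Error("Circuit open: skipping request");
    try {
      const result = await fn();
      this.failures = 0; // a success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw error;
    }
  }
}

// Usage: const breaker = new CircuitBreaker();
// await breaker.run(() => openai.chat.completions.create({ ... }));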
Error handling
try {
const response = await openai.chat.completions.create({...});
} catch (error) {
if (error.status === 400) {
// Don't retry client errors - fix the request
console.error('Bad request:', error.message);
} else if (error.status >= 500) {
// Server errors might need manual intervention
console.error('Server error:', error.message);
}
}
Troubleshooting
High retry rates
- Check if you're hitting rate limits frequently
- Verify API keys have sufficient quotas
- Monitor provider status pages for outages
Slow response times
- Reduce retry count for latency-sensitive apps
- Use shorter timeout values with retries
- Consider fallbacks to faster alternative models
Still getting errors
- Check that the failing error codes are included in your `on_codes` list
- Verify the retry count isn't exhausted
- Review provider-specific error documentation
Monitoring
Track these retry metrics:
const retryMetrics = {
totalRequests: 0,
retriedRequests: 0,
retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
retriesByCode: { 429: 0, 500: 0 }, // By error code
avgRetryLatency: 0, // Added latency from retries
finalFailures: 0, // Requests that failed after all retries
};
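One way to keep such counters is a small wrapper around each call. The sketch below is hypothetical and only updates the fields that are directly observable from the client (total requests, final failures, and elapsed latency); per-attempt and per-code counters depend on whatever retry metadata your setup exposes.

// Hypothetical wrapper that updates retryMetrics around each proxy call.
const trackedRequest = async <T>(fn: () => Promise<T>): Promise<T> => {
  retryMetrics.totalRequests++;
  const start = Date.now();
  try {
    return await fn();
  } catch (error) {
    retryMetrics.finalFailures++; // failed even after all server-side retries
    throw error;
  } finally {
    // crude running average of request latency, retries included
    const elapsed = Date.now() - start;
    retryMetrics.avgRetryLatency = (retryMetrics.avgRetryLatency + elapsed) / 2;
  }
};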
Limitations
- Increased latency: Retries add delay (up to 31s for 5 attempts)
- Cost implications: Failed requests may still incur charges
- Rate limit consumption: Each retry counts against quotas
- Limited retries: Maximum 5 attempts to prevent excessive delays
- Non-retryable errors: 4xx client errors are not retried
Advanced Usage
Environment-specific configs:
const retryConfig = {
development: { count: 1, on_codes: [429] }, // Fast feedback
staging: { count: 2, on_codes: [429, 503] }, // Light retries
production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
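At request time you can look up the entry for the current environment, for example (assuming NODE_ENV is set; the lookup shown is just one approach):

// Pick the retry config for the current environment (defaults to production).
const env = (process.env.NODE_ENV ?? "production") as keyof typeof retryConfig;

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  orq: { retries: retryConfig[env] },
});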
With other features:
{
orq: {
retries: {count: 3, on_codes: [429, 503]},
timeout: {call_timeout: 10000}, // Timeout before retry
fallbacks: [{model: "backup-model"}], // If all retries fail
cache: {type: "exact_match", ttl: 300} // Cache successful results
}
}
Custom retry logic (client-side):
const customRetry = async (requestFn, maxAttempts = 3) => {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await requestFn();
} catch (error) {
if (attempt === maxAttempts || (error.status !== 429 && error.status < 500)) {
throw error; // Final attempt or a non-retryable client error (429 and 5xx are retried)
}
await new Promise(
(resolve) => setTimeout(resolve, Math.pow(2, attempt) * 1000), // Exponential backoff
);
}
}
};
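You can then wrap any proxy call with it, for example:

// Client-side retries around a proxy request (in addition to, or instead of, orq.retries)
const response = await customRetry(() =>
  openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Analyze customer feedback" }],
  }),
);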