Retries & Error Handling

📖 This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Automatically retry failed requests with exponential backoff.

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  orq: {
    retries: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  },
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| count | number | Yes | Max retry attempts (1-5) |
| on_codes | number[] | Yes | HTTP status codes that trigger retries |

Error Codes

| Code | Meaning | Retry? | Common Cause |
| --- | --- | --- | --- |
| 429 | Rate limit exceeded | ✅ Yes | Too many requests |
| 500 | Internal server error | ✅ Yes | Provider issue |
| 502 | Bad gateway | ✅ Yes | Network/proxy issue |
| 503 | Service unavailable | ✅ Yes | Provider maintenance |
| 504 | Gateway timeout | ✅ Yes | Provider overload |
| 400 | Bad request | ❌ No | Invalid parameters |
| 401 | Unauthorized | ❌ No | Invalid API key |
| 403 | Forbidden | ❌ No | Access denied |
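The retryable rows in the table above can be captured in a small client-side helper. This is an illustrative sketch, not part of the orq API:

```javascript
// Status codes the table above marks as retryable (transient errors).
const RETRYABLE_CODES = new Set([429, 500, 502, 503, 504]);

// 4xx client errors (other than 429) are never retried.
function isRetryable(status) {
  return RETRYABLE_CODES.has(status);
}
```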

Retry Strategies

// Conservative (production)
retries: {
  count: 2,
  on_codes: [429, 503]  // Only rate limits and service unavailable
}

// Balanced (recommended)
retries: {
  count: 3,
  on_codes: [429, 500, 502, 503, 504]  // All transient errors
}

// Aggressive (development)
retries: {
  count: 5,
  on_codes: [429, 500, 502, 503, 504]  // Max retries
}

Backoff Algorithm

Exponential backoff with jitter

  • Attempt 1: 1s (±25%)
  • Attempt 2: 2s (±25%)
  • Attempt 3: 4s (±25%)
  • Attempt 4: 8s (±25%)
  • Attempt 5: 16s (±25%)

Maximum total delay: ~31 seconds for 5 retries
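The schedule above can be sketched as a simple function. This is illustrative only; the proxy applies the backoff server-side:

```javascript
// Exponential backoff with ±25% jitter, matching the schedule above:
// attempt 1 → ~1s, attempt 2 → ~2s, ..., attempt 5 → ~16s.
function backoffDelayMs(attempt) {
  const base = Math.pow(2, attempt - 1) * 1000; // 1s, 2s, 4s, 8s, 16s
  const jitter = (Math.random() * 0.5 - 0.25) * base; // uniform ±25%
  return base + jitter;
}
```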

Code examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "AnalyzeAnalyze customercustomer feedbackfeedback and provideprovide sentiment analysissentiment analysis"
      }
    ],
    "orq": {
      "retries": {
        "count": 3,
        "on_codes": [429, 500, 502, 503, 504]
      }
    }
  }'

Python

from openai import OpenAI
import os

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "AnalyzeAnalyze customercustomer feedbackfeedback and provideprovide sentiment analysissentiment analysis"
        }
    ],
    extra_body={
        "orq": {
            "retries": {
                "count": 3,
                "on_codes": [429, 500, 502, 503, 504]
            }
        }
    }
)

TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: "Analyze customer feedback and provide sentiment analysis",
    },
  ],
  orq: {
    retries: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  },
});

Best Practices

Production recommendations

Follow these recommendations for a reliable production setup:

  • Use count: 2-3 for a balance of reliability and speed
  • Always include 429 (rate limits) in on_codes
  • Monitor retry rates to detect systemic issues
  • Implement circuit breaker for persistent failures
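A circuit breaker for persistent failures could be sketched client-side like this. The class and its parameters are illustrative, not part of the orq API:

```javascript
// Minimal circuit breaker: after `threshold` consecutive failures,
// reject requests immediately for `cooldownMs` instead of calling out.
class CircuitBreaker {
  constructor({ threshold = 5, cooldownMs = 30000 } = {}) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async call(fn) {
    if (this.openedAt && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error("Circuit open: skipping request");
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Wrap your proxy calls in `breaker.call(() => openai.chat.completions.create({...}))` so repeated provider outages stop consuming quota.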

Error handling

try {
  const response = await openai.chat.completions.create({...});
} catch (error) {
  if (error.status === 400) {
    // Don't retry client errors - fix the request
    console.error('Bad request:', error.message);
  } else if (error.status >= 500) {
    // Server errors might need manual intervention
    console.error('Server error:', error.message);
  }
}

Troubleshooting

High retry rates
  • Check if you're hitting rate limits frequently
  • Verify API keys have sufficient quotas
  • Monitor provider status pages for outages
Slow response times
  • Reduce retry count for latency-sensitive apps
  • Use shorter timeout values with retries
  • Consider fallbacks for faster alternatives
Still getting errors
  • Check if error codes are in on_codes list
  • Verify retry count isn't exhausted
  • Review provider-specific error documentation

Monitoring

Track these retry metrics:

const retryMetrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
  retriesByCode: { 429: 0, 500: 0 }, // By error code
  avgRetryLatency: 0, // Added latency from retries
  finalFailures: 0, // Requests that failed after all retries
};
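A hypothetical wrapper that populates counters like these around a request function might look as follows; `instrumentedCall` and `makeMetrics` are illustrative names, not part of the orq API:

```javascript
function makeMetrics() {
  return { totalRequests: 0, retriedRequests: 0, retriesByCode: {}, finalFailures: 0 };
}

// Run requestFn with up to maxAttempts tries, updating the counters above.
async function instrumentedCall(requestFn, metrics, maxAttempts = 3) {
  metrics.totalRequests += 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      if (attempt === maxAttempts) {
        metrics.finalFailures += 1; // exhausted all retries
        throw error;
      }
      if (attempt === 1) metrics.retriedRequests += 1; // count each request once
      const code = error.status ?? "unknown";
      metrics.retriesByCode[code] = (metrics.retriesByCode[code] || 0) + 1;
    }
  }
}
```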

Limitations

  • Increased latency: Retries add delay (up to 31s for 5 attempts)
  • Cost implications: Failed requests may still incur charges
  • Rate limit consumption: Each retry counts against quotas
  • Limited retries: Maximum 5 attempts to prevent excessive delays
  • Non-retryable errors: 4xx client errors are not retried

Advanced Usage

Environment-specific configs:

const retryConfig = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
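Selecting the right block at runtime can be a one-line lookup; this sketch assumes the environment name lives in NODE_ENV:

```javascript
const retryConfig = {
  development: { count: 1, on_codes: [429] },
  staging: { count: 2, on_codes: [429, 503] },
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] },
};

// Unknown environments fall back to the full production profile.
function retriesFor(env = process.env.NODE_ENV) {
  return retryConfig[env] || retryConfig.production;
}
```

Then pass `orq: { retries: retriesFor() }` in the request options.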

With other features:

{
  orq: {
    retries: {count: 3, on_codes: [429, 503]},
    timeout: {call_timeout: 10000},               // Timeout before retry
    fallbacks: [{model: "backup-model"}],         // If all retries fail
    cache: {type: "exact_match", ttl: 300}        // Cache successful results
  }
}

Custom retry logic (client-side):

const customRetry = async (requestFn, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      if (attempt === maxAttempts || (error.status < 500 && error.status !== 429)) {
        throw error; // Final attempt or non-retryable client error (429 is retried)
      }
      await new Promise(
        (resolve) => setTimeout(resolve, Math.pow(2, attempt) * 1000), // Exponential backoff
      );
    }
  }
};