This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.

Quick Start

Automatically retry failed requests with exponential backoff.
const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  orq: {
    retries: {
      count: 3,
      on_codes: [429, 500, 502, 503, 504],
    },
  },
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `count` | number | Yes | Max retry attempts (1-5) |
| `on_codes` | number[] | Yes | HTTP status codes that trigger retries |

Error Codes

| Code | Meaning | Retry? | Common Cause |
| --- | --- | --- | --- |
| 429 | Rate limit exceeded | ✅ Yes | Too many requests |
| 500 | Internal server error | ✅ Yes | Provider issue |
| 502 | Bad gateway | ✅ Yes | Network/Gateway issue |
| 503 | Service unavailable | ✅ Yes | Provider maintenance |
| 504 | Gateway timeout | ✅ Yes | Provider overload |
| 400 | Bad request | ❌ No | Invalid parameters |
| 401 | Unauthorized | ❌ No | Invalid API key |
| 403 | Forbidden | ❌ No | Access denied |
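If you also handle errors client-side, the retryable set from the table above can be captured in a small helper. This is a convenience sketch, not part of the gateway SDK:

```javascript
// Status codes the table above marks as retryable: transient
// server-side errors plus 429 rate limits. All other 4xx client
// errors indicate a request that must be fixed, not retried.
const RETRYABLE_CODES = new Set([429, 500, 502, 503, 504]);

const isRetryable = (status) => RETRYABLE_CODES.has(status);
```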

Retry Strategies

// Conservative (production)
retries: {
  count: 2,
  on_codes: [429, 503]  // Only rate limits and service unavailable
}

// Balanced (recommended)
retries: {
  count: 3,
  on_codes: [429, 500, 502, 503, 504]  // All transient errors
}

// Aggressive (development)
retries: {
  count: 5,
  on_codes: [429, 500, 502, 503, 504]  // Max retries
}

Backoff Algorithm

Exponential backoff with jitter

  • Attempt 1: 1s (±25%)
  • Attempt 2: 2s (±25%)
  • Attempt 3: 4s (±25%)
  • Attempt 4: 8s (±25%)
  • Attempt 5: 16s (±25%)
Maximum total delay: ~31 seconds for 5 retries
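The schedule above can be reproduced in a few lines; this is an illustrative sketch of the formula, not the gateway's exact implementation:

```javascript
// Exponential backoff with ±25% jitter, matching the schedule
// above: attempt 1 → ~1s, attempt 2 → ~2s, attempt 3 → ~4s, ...
function backoffDelayMs(attempt, baseMs = 1000, jitter = 0.25) {
  const exp = baseMs * Math.pow(2, attempt - 1);          // 1s, 2s, 4s, 8s, 16s
  const spread = exp * jitter * (2 * Math.random() - 1);  // random ±25%
  return exp + spread;
}
```

The jitter spreads retries from many clients over time, so a burst of failures does not turn into a synchronized retry storm.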

Code examples

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Analyze customer feedback and provide sentiment analysis"
      }
    ],
    "orq": {
      "retries": {
        "count": 3,
        "on_codes": [429, 500, 502, 503, 504]
      }
    }
  }'

Best Practices

Production recommendations

Follow these recommendations for a robust production setup:
  • Use count: 2-3 for balance of reliability and speed
  • Always include 429 (rate limits) in on_codes
  • Monitor retry rates to detect systemic issues
  • Implement circuit breaker for persistent failures
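The circuit-breaker recommendation above can be sketched as a minimal client-side wrapper. The class below is a hypothetical helper, not a gateway feature: after a run of consecutive failures it fails fast for a cooldown period instead of sending more doomed requests.

```javascript
// Minimal circuit breaker sketch: after `threshold` consecutive
// failures, reject calls immediately for `cooldownMs`, then allow
// one trial call ("half-open") to probe whether the provider recovered.
class CircuitBreaker {
  constructor(threshold = 5, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async exec(fn) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("Circuit open - failing fast");
      }
      this.openedAt = null; // half-open: allow a trial call
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```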

Error handling

try {
  const response = await openai.chat.completions.create({...});
} catch (error) {
  if (error.status === 400) {
    // Don't retry client errors - fix the request
    console.error('Bad request:', error.message);
  } else if (error.status >= 500) {
    // Server errors might need manual intervention
    console.error('Server error:', error.message);
  }
}

Troubleshooting

**High retry rates**
  • Check if you’re hitting rate limits frequently
  • Verify API keys have sufficient quotas
  • Monitor provider status pages for outages
**Slow response times**
  • Reduce retry count for latency-sensitive apps
  • Use shorter timeout values with retries
  • Consider fallbacks for faster alternatives
**Still getting errors**
  • Check if error codes are in on_codes list
  • Verify retry count isn’t exhausted
  • Review provider-specific error documentation

Monitoring

Track these retry metrics:
const retryMetrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
  retriesByCode: { 429: 0, 500: 0 }, // By error code
  avgRetryLatency: 0, // Added latency from retries
  finalFailures: 0, // Requests that failed after all retries
};
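A hypothetical helper for populating these counters from your own client-side retry loop (the gateway does not expose these metrics directly; names mirror the object above):

```javascript
// Record one retry attempt against the metrics object above.
// `attempt` is the retry number (1, 2, ...), `statusCode` the
// HTTP status that triggered the retry.
function recordRetry(metrics, attempt, statusCode) {
  metrics.retriedRequests += 1;
  metrics.retriesByAttempt[attempt] =
    (metrics.retriesByAttempt[attempt] || 0) + 1;
  metrics.retriesByCode[statusCode] =
    (metrics.retriesByCode[statusCode] || 0) + 1;
}
```

A sustained rise in `retriesByCode[429]`, for example, suggests a quota problem rather than a transient provider outage.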

Limitations

  • Increased latency: Retries add delay (up to 31s for 5 attempts)
  • Cost implications: Failed requests may still incur charges
  • Rate limit consumption: Each retry counts against quotas
  • Limited retries: Maximum 5 attempts to prevent excessive delays
  • Non-retryable errors: 4xx client errors are not retried

Advanced Usage

Environment-specific configs:
const retryConfig = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
With other features:
{
  orq: {
    retries: {count: 3, on_codes: [429, 503]},
    timeout: {call_timeout: 10000},               // Timeout before retry
    fallbacks: [{model: "backup-model"}],         // If all retries fail
    cache: {type: "exact_match", ttl: 300}        // Cache successful results
  }
}
Custom retry logic (client-side):
const customRetry = async (requestFn, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      if (attempt === maxAttempts || (error.status < 500 && error.status !== 429)) {
        throw error; // Final attempt or non-retryable client error (429 is retryable)
      }
      }
      await new Promise(
        (resolve) => setTimeout(resolve, Math.pow(2, attempt) * 1000), // Exponential backoff
      );
    }
  }
};