Retries & Error Handling
This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.
Quick Start
Automatically retry failed requests with exponential backoff.
const response = await openai.chat.completions.create({
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Analyze customer feedback" }],
orq: {
retries: {
count: 3,
on_codes: [429, 500, 502, 503, 504],
},
},
});
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| count | number | Yes | Max retry attempts (1-5) |
| on_codes | number[] | Yes | HTTP status codes that trigger retries |
Error Codes
| Code | Meaning | Retry? | Common Cause |
|---|---|---|---|
| 429 | Rate limit exceeded | ✅ Yes | Too many requests |
| 500 | Internal server error | ✅ Yes | Provider issue |
| 502 | Bad gateway | ✅ Yes | Network/proxy issue |
| 503 | Service unavailable | ✅ Yes | Provider maintenance |
| 504 | Gateway timeout | ✅ Yes | Provider overload |
| 400 | Bad request | ❌ No | Invalid parameters |
| 401 | Unauthorized | ❌ No | Invalid API key |
| 403 | Forbidden | ❌ No | Access denied |
Retry Strategies
// Conservative (production)
retries: {
count: 2,
on_codes: [429, 503] // Only rate limits and service unavailable
}
// Balanced (recommended)
retries: {
count: 3,
on_codes: [429, 500, 502, 503, 504] // All transient errors
}
// Aggressive (development)
retries: {
count: 5,
on_codes: [429, 500, 502, 503, 504] // Max retries
}
Backoff Algorithm
Exponential backoff with jitter
- Attempt 1: 1s (±25%)
- Attempt 2: 2s (±25%)
- Attempt 3: 4s (±25%)
- Attempt 4: 8s (±25%)
- Attempt 5: 16s (±25%)
Maximum total delay: ~31 seconds for 5 retries
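For reference, the schedule above can be reproduced client-side. This is a minimal sketch of the same formula (base delay doubling per attempt with ±25% random jitter), not the proxy's internal implementation:

const backoffDelayMs = (attempt: number): number => {
  const baseMs = Math.pow(2, attempt - 1) * 1000; // 1s, 2s, 4s, 8s, 16s
  const jitter = 0.75 + Math.random() * 0.5; // random factor in [0.75, 1.25]
  return Math.round(baseMs * jitter);
};

// Print the approximate delay before each retry attempt
for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(`Attempt ${attempt}: ~${backoffDelayMs(attempt)}ms`);
}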
Code Examples
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
-H "Authorization: Bearer $ORQ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "AnalyzeAnalyze customercustomer feedbackfeedback and provideprovide sentiment analysissentiment analysis"
}
],
"orq": {
"retries": {
"count": 3,
"on_codes": [429, 500, 502, 503, 504]
}
}
}'
from openai import OpenAI
import os
openai = OpenAI(
api_key=os.environ.get("ORQ_API_KEY"),
base_url="https://api.orq.ai/v2/proxy"
)
response = openai.chat.completions.create(
model="openai/gpt-4o",
messages=[
{
"role": "user",
"content": "AnalyzeAnalyze customercustomer feedbackfeedback and provideprovide sentiment analysissentiment analysis"
}
],
extra_body={
"orq": {
"retries": {
"count": 3,
"on_codes": [429, 500, 502, 503, 504]
}
}
}
)
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.ORQ_API_KEY,
baseURL: "https://api.orq.ai/v2/proxy",
});
const response = await openai.chat.completions.create({
model: "openai/gpt-4o",
messages: [
{
role: "user",
content: "Analyze customer feedback and provide sentiment analysis",
},
],
orq: {
retries: {
count: 3,
on_codes: [429, 500, 502, 503, 504],
},
},
});
Best Practices
Production recommendations
Follow these recommendations for a reliable production setup:
- Use `count: 2-3` for a balance of reliability and speed
- Always include `429` (rate limits) in `on_codes`
- Monitor retry rates to detect systemic issues
- Implement a circuit breaker for persistent failures (see the sketch below)
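A circuit breaker can live entirely in application code. The sketch below is illustrative only (the class name and thresholds are examples, not part of the proxy API): after a run of consecutive failures it rejects requests immediately for a cooldown period instead of hammering a failing provider.

// Illustrative client-side circuit breaker; names and thresholds are examples.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private maxFailures = 5, // trip after this many consecutive failures
    private cooldownMs = 30_000, // stay open for 30s before allowing traffic again
  ) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    const open =
      this.failures >= this.maxFailures &&
      Date.now() - this.openedAt < this.cooldownMs;
    if (open) throw new Error("Circuit open: skipping request");
    try {
      const result = await fn();
      this.failures = 0; // a success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw error;
    }
  }
}

// Usage: const breaker = new CircuitBreaker();
// await breaker.run(() => openai.chat.completions.create({ ... }));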
Error handling
try {
const response = await openai.chat.completions.create({...});
} catch (error) {
if (error.status === 400) {
// Don't retry client errors - fix the request
console.error('Bad request:', error.message);
} else if (error.status >= 500) {
// Server errors might need manual intervention
console.error('Server error:', error.message);
}
}
Troubleshooting
High retry rates
- Check if you're hitting rate limits frequently
- Verify API keys have sufficient quotas
- Monitor provider status pages for outages
Slow response times
- Reduce retry count for latency-sensitive apps
- Use shorter timeout values with retries
- Consider fallbacks to faster alternative models
Still getting errors
- Check that the failing error codes are included in your `on_codes` list
- Verify the retry count isn't exhausted
- Review provider-specific error documentation
Monitoring
Track these retry metrics:
const retryMetrics = {
totalRequests: 0,
retriedRequests: 0,
retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
retriesByCode: { 429: 0, 500: 0 }, // By error code
avgRetryLatency: 0, // Added latency from retries
finalFailures: 0, // Requests that failed after all retries
};
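One way to keep such counters is a small wrapper around each call. The sketch below is hypothetical and only updates the fields that are directly observable from the client (total requests, final failures, and elapsed latency); per-attempt and per-code counters depend on whatever retry metadata your setup exposes.

// Hypothetical wrapper that updates retryMetrics around each proxy call.
const trackedRequest = async <T>(fn: () => Promise<T>): Promise<T> => {
  retryMetrics.totalRequests++;
  const start = Date.now();
  try {
    return await fn();
  } catch (error) {
    retryMetrics.finalFailures++; // failed even after all server-side retries
    throw error;
  } finally {
    // crude running average of request latency, retries included
    const elapsed = Date.now() - start;
    retryMetrics.avgRetryLatency = (retryMetrics.avgRetryLatency + elapsed) / 2;
  }
};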
Limitations
- Increased latency: Retries add delay (up to 31s for 5 attempts)
- Cost implications: Failed requests may still incur charges
- Rate limit consumption: Each retry counts against quotas
- Limited retries: Maximum 5 attempts to prevent excessive delays
- Non-retryable errors: 4xx client errors are not retried
Advanced Usage
Environment-specific configs:
const retryConfig = {
development: { count: 1, on_codes: [429] }, // Fast feedback
staging: { count: 2, on_codes: [429, 503] }, // Light retries
production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
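At request time you can look up the entry for the current environment, for example (assuming NODE_ENV is set; the lookup shown is just one approach):

// Pick the retry config for the current environment (defaults to production).
const env = (process.env.NODE_ENV ?? "production") as keyof typeof retryConfig;

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  orq: { retries: retryConfig[env] },
});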
With other features:
{
orq: {
retries: {count: 3, on_codes: [429, 503]},
timeout: {call_timeout: 10000}, // Timeout before retry
fallbacks: [{model: "backup-model"}], // If all retries fail
cache: {type: "exact_match", ttl: 300} // Cache successful results
}
}
Custom retry logic (client-side):
const customRetry = async (requestFn, maxAttempts = 3) => {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await requestFn();
} catch (error) {
if (attempt === maxAttempts || (error.status !== 429 && error.status < 500)) {
throw error; // Final attempt or a non-retryable client error (429 and 5xx are retried)
}
await new Promise(
(resolve) => setTimeout(resolve, Math.pow(2, attempt) * 1000), // Exponential backoff
);
}
}
};
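You can then wrap any proxy call with it, for example:

// Client-side retries around a proxy request (in addition to, or instead of, orq.retries)
const response = await customRetry(() =>
  openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Analyze customer feedback" }],
  }),
);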