This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Automatically retry failed requests with exponential backoff.
Configuration
| Parameter | Type | Required | Description | 
|---|---|---|---|
| count | number | Yes | Max retry attempts (1-5) | 
| on_codes | number[] | Yes | HTTP status codes that trigger retries | 
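As a rough illustration of the two parameters above, the sketch below checks a retry config before it is sent. The helper name `validate_retry_config`, the enclosing `retry` key, and the rule that `on_codes` must be non-empty are assumptions made for this example, not part of the documented API.

```python
from typing import Any


def validate_retry_config(retry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a retry config (empty if it looks valid)."""
    problems: list[str] = []

    count = retry.get("count")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(count, int) or isinstance(count, bool) or not 1 <= count <= 5:
        problems.append("count must be an integer between 1 and 5")

    on_codes = retry.get("on_codes")
    # Assumption: an empty list is treated as a mistake, since the field is required.
    if not isinstance(on_codes, list) or not on_codes:
        problems.append("on_codes must be a non-empty list of HTTP status codes")
    elif not all(
        isinstance(c, int) and not isinstance(c, bool) and 100 <= c <= 599
        for c in on_codes
    ):
        problems.append("on_codes entries must be integers between 100 and 599")

    return problems


# Hypothetical placement: the enclosing "retry" key is an assumption.
config = {"retry": {"count": 3, "on_codes": [429, 500, 502, 503, 504]}}
print(validate_retry_config(config["retry"]))  # prints []
```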
Error Codes
| Code | Meaning | Retry? | Common Cause | 
|---|---|---|---|
| 429 | Rate limit exceeded | ✅ Yes | Too many requests | 
| 500 | Internal server error | ✅ Yes | Provider issue | 
| 502 | Bad gateway | ✅ Yes | Network/Gateway issue | 
| 503 | Service unavailable | ✅ Yes | Provider maintenance | 
| 504 | Gateway timeout | ✅ Yes | Provider overload | 
| 400 | Bad request | ❌ No | Invalid parameters | 
| 401 | Unauthorized | ❌ No | Invalid API key | 
| 403 | Forbidden | ❌ No | Access denied | 
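The table can be read as a simple decision rule. The following is a minimal sketch of that rule, not the gateway's actual implementation; `should_retry` and `risky_codes` are hypothetical helper names, and it assumes `count` means the number of retries allowed after the initial request.

```python
# Codes the table above marks as retryable and non-retryable.
RETRYABLE = frozenset({429, 500, 502, 503, 504})
NON_RETRYABLE = frozenset({400, 401, 403})


def should_retry(status: int, on_codes: list[int], retries_used: int, count: int) -> bool:
    """Retry only if the status is listed in on_codes and retries remain."""
    return status in on_codes and retries_used < count


def risky_codes(on_codes: list[int]) -> list[int]:
    """Return configured codes that the table marks as not worth retrying."""
    return sorted(c for c in on_codes if c in NON_RETRYABLE)


print(should_retry(503, [429, 503], retries_used=0, count=3))  # prints True
print(should_retry(401, [429, 503], retries_used=0, count=3))  # prints False
print(risky_codes([429, 401, 400]))                            # prints [400, 401]
```

Retrying a 400, 401, or 403 usually repeats the same failure, because the request itself has to change before it can succeed.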
Retry Strategies
Backoff Algorithm
Exponential backoff with jitter
- Attempt 1: 1s (±25%)
- Attempt 2: 2s (±25%)
- Attempt 3: 4s (±25%)
- Attempt 4: 8s (±25%)
- Attempt 5: 16s (±25%)
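The schedule above doubles a 1s base delay on each attempt and then perturbs it by up to 25% in either direction. A minimal sketch of that calculation follows; the function name `backoff_delay` is hypothetical, and drawing the jitter from a uniform distribution is an assumption, since the schedule only states the ±25% range.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay in seconds before retry `attempt` (1-based): base * 2**(attempt - 1), +/- jitter."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    source = rng if rng is not None else random
    nominal = base * 2 ** (attempt - 1)
    # Assumption: jitter is uniform over [nominal * (1 - jitter), nominal * (1 + jitter)].
    return nominal * source.uniform(1 - jitter, 1 + jitter)


for attempt in range(1, 6):
    print(f"attempt {attempt}: {backoff_delay(attempt):.2f}s")
```

Jitter spreads retries from many clients over time, so they are less likely to hit a recovering provider all at once.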
Code examples
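The gateway performs retries on your behalf, so the loop below is only a client-side sketch of the equivalent behavior, useful for reasoning about what `count` and `on_codes` do. The `send` callable, the `Response` tuple, and the function name `call_with_retries` are stand-ins invented for this example, and `count` is assumed to mean retries after the initial request.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, body); a stand-in for a real response type


def call_with_retries(
    send: Callable[[], Response],
    count: int,
    on_codes: list[int],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), then retry up to `count` times while the status is in on_codes."""
    status, body = send()
    for attempt in range(1, count + 1):
        if status not in on_codes:
            break
        # Exponential backoff: 1s, 2s, 4s, ... with +/-25% jitter.
        sleep(2 ** (attempt - 1) * random.uniform(0.75, 1.25))
        status, body = send()
    return status, body


# Demo with a fake provider that fails twice, then succeeds. Sleeping is
# replaced by a no-op so the example finishes immediately.
fake_responses = iter([(503, "unavailable"), (429, "slow down"), (200, "ok")])
result = call_with_retries(
    lambda: next(fake_responses), count=3, on_codes=[429, 503], sleep=lambda s: None
)
print(result)  # prints (200, 'ok')
```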
Best Practices
Production recommendations
Follow this advice for a production setup:
- Use count: 2-3 for a balance of reliability and speed
- Always include 429 (rate limits) in on_codes
- Monitor retry rates to detect systemic issues
- Implement circuit breaker for persistent failures
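The last recommendation can be sketched as a small consecutive-failure circuit breaker that sits in front of the gateway call. This is one common variant, not something the gateway provides; the class name, the `threshold` and `cooldown` defaults, and the half-open behavior are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a trial call after `cooldown` seconds."""

    def __init__(
        self,
        threshold: int = 5,
        cooldown: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a request may be sent right now."""
        if self.opened_at is None:
            return True
        # Half-open: permit a trial request once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, success: bool) -> None:
        """Update state with the final outcome of a request (after any retries)."""
        if success:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.threshold:
            # (Re)open the circuit and restart the cooldown.
            self.opened_at = self.clock()
```

Call `allow()` before each request and `record()` with the final outcome. While the breaker is open, the application can fail fast or switch to a fallback instead of spending the full retry budget on a provider that is down.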
Error handling
Troubleshooting
**High retry rates**
- Check if you’re hitting rate limits frequently
- Verify API keys have sufficient quotas
- Monitor provider status pages for outages
**High latency**
- Reduce retry count for latency-sensitive apps
- Use shorter timeout values with retries
- Consider fallbacks for faster alternatives
**Retries not triggering**
- Check if error codes are in the on_codes list
- Verify retry count isn’t exhausted
- Review provider-specific error documentation
Monitoring
Track these retry metrics:
Limitations
- Increased latency: Retries add delay (about 31s of cumulative backoff for 5 attempts, or up to roughly 39s once +25% jitter is included)
- Cost implications: Failed requests may still incur charges
- Rate limit consumption: Each retry counts against quotas
- Limited retries: Maximum 5 attempts to prevent excessive delays
- Non-retryable errors: 4xx client errors other than 429 (for example 400, 401, 403) are not retried
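The latency figure in the first limitation comes from summing the backoff schedule (1 + 2 + 4 + 8 + 16 = 31s). The sketch below works out the same sum for any retry count, together with the bounds implied by ±25% jitter. The function name `backoff_budget` is hypothetical, and the result covers only waiting time between attempts, not the time each request takes.

```python
def backoff_budget(
    count: int, base: float = 1.0, jitter: float = 0.25
) -> tuple[float, float, float]:
    """Return (min, nominal, max) cumulative backoff in seconds for `count` retries."""
    nominal = sum(base * 2 ** i for i in range(count))
    return nominal * (1 - jitter), nominal, nominal * (1 + jitter)


for n in range(1, 6):
    low, nominal, high = backoff_budget(n)
    print(f"count={n}: {low:.2f}s .. {high:.2f}s (nominal {nominal:.0f}s)")
# Last line printed: count=5: 23.25s .. 38.75s (nominal 31s)
```

With count set to 2 or 3, the nominal waiting time drops to 3s or 7s, which is why lower counts suit latency-sensitive applications.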