This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Automatically retry failed requests with exponential backoff.
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| count | number | Yes | Max retry attempts (1-5) |
| on_codes | number[] | Yes | HTTP status codes that trigger retries |
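As a sketch, a client could model and validate these two parameters before attaching them to a request. `RetryConfig` and `to_dict` are hypothetical names, not part of an official SDK. The dictionary only mirrors the table above, and where the settings go in an actual gateway request is not covered here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical client-side model of the retry settings (not an official SDK type)."""

    count: int  # max retry attempts, documented range 1-5
    on_codes: list[int] = field(default_factory=list)  # HTTP statuses that trigger a retry

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(self.count, bool) or not isinstance(self.count, int):
            raise TypeError("count must be an integer")
        if not 1 <= self.count <= 5:
            raise ValueError("count must be between 1 and 5")
        if not self.on_codes:
            raise ValueError("on_codes must list at least one status code")
        for code in self.on_codes:
            if not 100 <= code <= 599:
                raise ValueError(f"{code} is not a valid HTTP status code")

    def to_dict(self) -> dict:
        # Shape mirrors the parameter table: {"count": ..., "on_codes": [...]}
        return {"count": self.count, "on_codes": list(self.on_codes)}


config = RetryConfig(count=3, on_codes=[429, 500, 502, 503, 504])
print(config.to_dict())
```

Validating on the client side catches an out-of-range `count` before the request is sent, rather than relying on the gateway to reject it.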
Error Codes
| Code | Meaning | Retry? | Common Cause |
|---|---|---|---|
| 429 | Rate limit exceeded | ✅ Yes | Too many requests |
| 500 | Internal server error | ✅ Yes | Provider issue |
| 502 | Bad gateway | ✅ Yes | Network/Gateway issue |
| 503 | Service unavailable | ✅ Yes | Provider maintenance |
| 504 | Gateway timeout | ✅ Yes | Provider overload |
| 400 | Bad request | ❌ No | Invalid parameters |
| 401 | Unauthorized | ❌ No | Invalid API key |
| 403 | Forbidden | ❌ No | Access denied |
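The table can be encoded as a lookup to audit an `on_codes` list before deployment. This is a hypothetical helper, not part of the gateway. It sorts codes into those the table marks as retryable, those it marks as not retryable (retrying a 400, 401, or 403 just repeats a request that will fail the same way), and codes the table does not cover. It also flags a missing 429, since rate limits are the most common retryable error.

```python
# Codes the table marks as retryable (transient server or rate-limit errors).
RETRYABLE = {429, 500, 502, 503, 504}
# Codes the table marks as not retryable (client errors that will not fix themselves).
NON_RETRYABLE = {400, 401, 403}


def audit_on_codes(on_codes: list[int]) -> dict[str, list[int]]:
    """Hypothetical check: sort an on_codes list into buckets based on the table."""
    report: dict[str, list[int]] = {"ok": [], "not_retryable": [], "unlisted": []}
    for code in sorted(set(on_codes)):
        if code in RETRYABLE:
            report["ok"].append(code)
        elif code in NON_RETRYABLE:
            report["not_retryable"].append(code)
        else:
            report["unlisted"].append(code)  # not covered by the table
    # Flag a missing 429 so rate-limit errors are not left unretried
    report["missing_429"] = [] if 429 in on_codes else [429]
    return report


print(audit_on_codes([500, 401, 418]))
```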
Retry Strategies
Backoff Algorithm
Exponential backoff with jitter: the delay before each retry doubles, starting at 1s, and varies randomly by ±25%:
- Attempt 1: 1s (±25%)
- Attempt 2: 2s (±25%)
- Attempt 3: 4s (±25%)
- Attempt 4: 8s (±25%)
- Attempt 5: 16s (±25%)
Code examples
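The backoff schedule above can be reproduced in a few lines of Python. This is a sketch of the documented formula (1s doubled per attempt, ±25% jitter), not the gateway's internal implementation. The function name and the use of a uniform random factor for the jitter are assumptions.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry `attempt` (1-based), with ±25% uniform jitter."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    base = 2.0 ** (attempt - 1)  # 1s, 2s, 4s, 8s, 16s
    if rng is None:
        rng = random.Random()
    return base * rng.uniform(0.75, 1.25)  # jitter: between 75% and 125% of base


rng = random.Random(42)  # seeded only so the demo is reproducible
for attempt in range(1, 6):
    print(f"attempt {attempt}: {backoff_delay(attempt, rng):.2f}s")
```

Jitter spreads retries from many clients over time, so they do not all hit a recovering provider at the same instant.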
Best Practices
Production recommendations
For a reliable production setup:
- Use `count: 2-3` to balance reliability and speed
- Always include 429 (rate limits) in `on_codes`
- Monitor retry rates to detect systemic issues
- Implement a circuit breaker for persistent failures
Error handling
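The circuit-breaker recommendation can be sketched as a small consecutive-failure breaker around gateway calls. After `threshold` failures in a row it stops sending requests for `cooldown` seconds. After the cooldown it lets requests through again: one more failure re-opens it, and a success closes it. This is a generic pattern, not a gateway feature; the class name, default thresholds, and injectable clock are illustrative assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal consecutive-failure circuit breaker (illustrative, not a gateway API)."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown  # seconds to stay open before trying again
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True if a request may be sent now."""
        if self.opened_at is None:
            return True  # closed: normal operation
        if self.clock() - self.opened_at >= self.cooldown:
            return True  # half-open: let a trial request through
        return False  # open: fail fast without calling the gateway

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial
```

Call `allow_request()` before each gateway call and report the outcome with `record_success()` or `record_failure()`. Counting a request as failed only after the gateway's own retries are exhausted makes the breaker react to persistent failures rather than transient ones.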
Troubleshooting
**High retry rates**
- Check if you’re hitting rate limits frequently
- Verify API keys have sufficient quotas
- Monitor provider status pages for outages
**High latency**
- Reduce retry count for latency-sensitive apps
- Use shorter timeout values with retries
- Consider fallbacks for faster alternatives
**Retries not working**
- Check if error codes are in the `on_codes` list
- Verify the retry count isn’t exhausted
- Review provider-specific error documentation
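When retries don't seem to happen, it helps to trace the decision a retry loop makes: a failed response is retried only if its status is in `on_codes` and retries remain. The sketch below simulates that decision on the client side and reports which condition ended the loop. It assumes the behavior described on this page; `run_with_retries` and the `send` callable are hypothetical stand-ins, and the gateway's actual retry loop is not shown here.

```python
import random
import time
from typing import Callable


def run_with_retries(send: Callable[[], int], count: int, on_codes: list[int],
                     sleep: Callable[[float], None] = time.sleep) -> tuple[int, int, str]:
    """Simulate the retry decision. Returns (final_status, retries_used, reason)."""
    status = send()  # initial request
    retries = 0
    while status >= 400:
        if status not in on_codes:
            return status, retries, "status not in on_codes"
        if retries >= count:
            return status, retries, "retry count exhausted"
        retries += 1
        base = 2.0 ** (retries - 1)  # 1s, 2s, 4s, ... per the backoff schedule
        sleep(base * random.uniform(0.75, 1.25))  # ±25% jitter
        status = send()
    return status, retries, "success"


responses = iter([503, 503, 200])
print(run_with_retries(lambda: next(responses), count=3, on_codes=[503],
                       sleep=lambda s: None))  # no-op sleep for the demo
```

A result of `"status not in on_codes"` points at the first checklist item, while `"retry count exhausted"` points at the second.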
Monitoring
Track these retry metrics:
Limitations
- Increased latency: Retries add delay (31s of backoff in total for 5 attempts, up to about 39s with jitter)
- Cost implications: Failed requests may still incur charges
- Rate limit consumption: Each retry counts against quotas
- Limited retries: Maximum 5 attempts to prevent excessive delays
- Non-retryable errors: 4xx client errors other than 429 are not retried
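The latency figure follows from summing the backoff schedule. A quick way to check the total wait for a given `count` is sketched below. It counts only backoff waits, not the time each attempt itself takes, and assumes the ±25% jitter applies to every delay.

```python
def backoff_bounds(count: int, jitter: float = 0.25) -> tuple[float, float, float]:
    """Return (min, nominal, max) total backoff in seconds for `count` retries."""
    nominal = sum(2.0 ** (i - 1) for i in range(1, count + 1))  # 1 + 2 + 4 + ...
    return nominal * (1 - jitter), nominal, nominal * (1 + jitter)


for count in range(1, 6):
    low, mid, high = backoff_bounds(count)
    print(f"count={count}: {mid:g}s nominal ({low:g}s to {high:g}s with jitter)")
```

For `count` = 5 this gives 31s nominally and up to 38.75s with +25% jitter. The recommended `count` of 2-3 keeps the nominal wait to 3-7s.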