- Preventing slow models from blocking user-facing requests indefinitely.
- Setting different limits for interactive (short) vs. batch (long) workloads.
- Triggering fallback logic when a provider exceeds an acceptable wait time.
- Enforcing response-time SLAs on latency-sensitive features.
Quick Start
Set maximum request duration to prevent hanging requests.Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
call_timeout | number | Yes | Maximum execution time in milliseconds |
- Request processing time.
- Model generation time.
- Network transfer time.
- All fallback attempts (each gets same timeout).
Recommended Values
| Use Case | Timeout (ms) | Reason |
|---|---|---|
| Chat applications | 15000 (15s) | User expectation for responses |
| Real-time features | 5000 (5s) | Immediate feedback required |
| Batch processing | 60000 (60s) | Complex analysis tasks |
| Streaming responses | 30000 (30s) | Longer generation time |
| Development/testing | 10000 (10s) | Fast iteration cycles |
Code examples
Error Handling
Best Practices
Timeout selection:- Set based on user experience requirements.
- Consider model complexity and prompt length.
- Factor in network latency (add 2-5s buffer).
- Test with realistic prompts and data. Environment-specific timeouts:
Fallback Integration
Timeouts work seamlessly with fallbacks:timeout × (1 + fallback_count)
- Primary + 2 fallbacks with 15s timeout = up to 45s total.
Troubleshooting
Frequent timeouts- Increase timeout value.
- Use faster models (gpt-3.5-turbo vs gpt-4).
- Reduce prompt complexity/length.
- Check provider status for slowdowns. User experience issues
- Set timeout based on user expectations.
- Show loading states for longer operations.
- Implement progressive enhancement.
- Consider async processing for long tasks. Performance optimization
Advanced Patterns
Dynamic timeout adjustment:Limitations
- Fixed timeout: Same timeout applies to all requests.
- No granular control: Cannot set different timeouts for different operations.
- Fallback multiplication: Each fallback gets the same timeout duration.
- Provider variations: Different providers have different baseline response times.
- Streaming considerations: Streaming responses may need longer timeouts.
Monitoring
Key metrics to track:- Timeout rate: % of requests that timeout.
- Average response time: Baseline performance.
- 95th percentile: For setting optimal timeouts.
- Timeout impact: User experience degradation.
- Model performance: Response times by model.