This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Distribute requests across multiple providers using weighted routing.

Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| load_balancer | Array | Yes | List of models with weights |
| model | string | Yes | Model identifier |
| weight | number | Yes | Relative weight (0.0 - 1.0) |
- Weights are normalized: [0.4, 0.8] → [33%, 67%] (see the sketch after this list)
- Higher weight = more traffic
- Minimum weight: 0.1 (10%)
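As a sketch of how normalization behaves, the snippet below builds a config using the field names from the table and prints each model's effective traffic share. The model identifiers and the config-as-a-dict shape are illustrative assumptions, not a documented request format.

```python
# Hypothetical load_balancer config; field names follow the table above,
# but the model identifiers and dict shape are assumptions for illustration.
config = {
    "load_balancer": [
        {"model": "openai/gpt-4", "weight": 0.4},
        {"model": "anthropic/claude-3-sonnet", "weight": 0.8},
    ]
}

# Normalization: each weight is divided by the sum of all weights,
# so 0.4 and 0.8 become 33% and 67% of traffic respectively.
total = sum(entry["weight"] for entry in config["load_balancer"])
for entry in config["load_balancer"]:
    share = entry["weight"] / total
    print(f'{entry["model"]}: {share:.0%}')
```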
Common Patterns
Use Cases
| Scenario | Weight Strategy | Example |
|---|---|---|
| Cost optimization | Heavy on cheaper models | 80% GPT-3.5, 20% GPT-4 |
| Performance testing | Small traffic to new model | 95% current, 5% experimental |
| Provider redundancy | Split across providers | 60% OpenAI, 40% Anthropic |
| Capacity management | Distribute during peaks | Even split across models |
Code Examples
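Below is a minimal client-side sketch of the cost-optimization pattern from the table above (80% GPT-3.5, 20% GPT-4). In practice the gateway makes this selection server-side from the load_balancer config; the selection logic here only illustrates how a weighted choice plays out.

```python
import random
from collections import Counter

# Cost-optimization split from the Use Cases table: 80% cheap, 20% premium.
MODELS = [
    ("gpt-3.5-turbo", 0.8),  # cheaper model handles most traffic
    ("gpt-4", 0.2),          # premium model gets a sampled share
]

def pick_model() -> str:
    """Select a model with probability proportional to its weight."""
    names, weights = zip(*MODELS)
    return random.choices(names, weights=weights, k=1)[0]

# Over many requests the observed split converges to roughly 80/20.
counts = Counter(pick_model() for _ in range(10_000))
print(counts.most_common())
```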
Monitoring
Track these metrics for optimal load balancing:

- Traffic distribution: Actual vs. expected percentages (see the sketch after this list)
- Cost per model: Monitor spending across providers
- Response times: Compare latency by model
- Error rates: Track failures by provider
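A minimal sketch of the traffic-distribution check, assuming you can export one model name per routed request; the log format, counts, and model names are illustrative.

```python
from collections import Counter

# Hypothetical export: one routed model name per request.
request_log = ["gpt-3.5-turbo"] * 812 + ["gpt-4"] * 188

expected = {"gpt-3.5-turbo": 0.8, "gpt-4": 0.2}

counts = Counter(request_log)
total = sum(counts.values())
for model, want in expected.items():
    got = counts[model] / total
    print(f"{model}: expected {want:.0%}, actual {got:.1%}, drift {got - want:+.1%}")
```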
Troubleshooting
**Uneven distribution**

- Check if weights are normalized correctly (see the sketch after this list)
- Verify sufficient request volume (at least 100 requests for accuracy)
- Monitor over longer time periods
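A minimal sketch, assuming you can export per-model request counts, that flags splits falling outside normal sampling noise for the configured weights; the counts and model names are illustrative.

```python
import math

# Illustrative per-model request counts and configured weights.
observed = {"gpt-3.5-turbo": 156, "gpt-4": 44}
weights = {"gpt-3.5-turbo": 0.8, "gpt-4": 0.2}

n = sum(observed.values())
for model, p in weights.items():
    got = observed[model] / n
    # Standard error of a binomial proportion: roughly 95% of windows
    # should land within two standard errors of the configured weight.
    se = math.sqrt(p * (1 - p) / n)
    status = "OK" if abs(got - p) <= 2 * se else "investigate"
    print(f"{model}: expected {p:.0%}, actual {got:.1%} (±{2 * se:.1%}) -> {status}")
```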
**Unexpected costs**

- Track actual vs. expected cost distribution
- Monitor for expensive-model overuse
- Set up cost alerts per provider

**Performance issues**

- Check latency differences between models
- Monitor for provider-specific slowdowns
- Adjust weights based on performance data
Limitations
- Probabilistic routing: Short-term traffic may not match exact weights (see the simulation after this list)
- Minimum volume needed: Requires sufficient requests for statistical accuracy
- Response variations: Different models may return varying output quality
- Cost complexity: Managing billing across multiple providers
- Provider dependencies: Requires API access to all models
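To make the probabilistic-routing limitation concrete, this small simulation stands in for the gateway's weighted choice and shows how far short windows can drift from a configured 60/40 split; the model names are placeholders.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the drift is reproducible
weights = {"model-a": 0.6, "model-b": 0.4}

def observed_split(n_requests: int) -> dict:
    """Route n_requests by weighted random choice and report the split."""
    picks = random.choices(
        list(weights), weights=list(weights.values()), k=n_requests
    )
    counts = Counter(picks)
    return {m: counts[m] / n_requests for m in weights}

# Short windows drift noticeably; large windows converge toward 60/40.
for n in (20, 100, 10_000):
    split = observed_split(n)
    print(n, {m: f"{p:.0%}" for m, p in split.items()})
```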