This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Cache identical requests to reduce latency by ~95% and save costs.
Configuration
| Parameter | Type | Required | Description | Example | 
|---|---|---|---|---|
| type | "exact_match" | Yes | Only supported cache type | "exact_match" | 
| ttl | number | Yes | Cache expiration in seconds | 3600 | 
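Putting the two parameters from the table together, a minimal cache configuration might look like the sketch below. Only the `type` and `ttl` fields come from the table above; the surrounding structure is illustrative.

```python
# Minimal cache configuration using the two documented parameters.
# Only "type" and "ttl" are specified in the table above; anything
# else around them is an assumption.
cache_config = {
    "type": "exact_match",  # required; the only supported cache type
    "ttl": 3600,            # required; cache expiration in seconds (1 hour)
}

# The TTL must fall within the documented bounds (1 second to 24 hours).
assert 1 <= cache_config["ttl"] <= 86400
```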
TTL Recommendations
| Use Case | TTL (seconds) | Reason | 
|---|---|---|
| FAQ responses | 86400 (24h) | Static content | 
| Content generation | 3600 (1h) | Moderate freshness | 
| Development/testing | 300 (5min) | Rapid iteration | 
| Data analysis | 1800 (30min) | Balance speed/freshness | 
Code Examples
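The exact-match cache behaves roughly like the local sketch below: the full request (model, messages, and sampling parameters) is serialized into a key, so any change, including letter case, produces a new key and a cache miss. This is an illustration of the semantics only, not the gateway's implementation.

```python
import json
import time

class ExactMatchCache:
    """Toy illustration of exact-match caching with TTL expiry."""

    def __init__(self, ttl: int):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, response)

    def _key(self, request: dict) -> str:
        # Every parameter participates in the key, so changing
        # temperature, max_tokens, or even letter case misses the cache.
        return json.dumps(request, sort_keys=True)

    def get(self, request: dict):
        entry = self._store.get(self._key(request))
        if entry is None:
            return None
        expires_at, response = entry
        if time.monotonic() > expires_at:
            return None  # entry expired after ttl seconds
        return response

    def put(self, request: dict, response: str) -> None:
        self._store[self._key(request)] = (time.monotonic() + self.ttl, response)

cache = ExactMatchCache(ttl=3600)
req = {"model": "example-model",
       "messages": [{"role": "user", "content": "Hello"}]}
cache.put(req, "cached response")
assert cache.get(req) == "cached response"  # identical request: cache hit
req2 = {**req, "messages": [{"role": "user", "content": "hello"}]}
assert cache.get(req2) is None              # case change: cache miss
```

The same hit/miss behavior applies however you call the gateway: reuse byte-identical request bodies when you want cached responses.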
Troubleshooting
**Low cache hit rate**
- Ensure identical parameters (temperature, max_tokens, etc.)
- Check the TTL isn’t too short for your use case
- Verify requests are truly identical (case-sensitive)

**Cache not working**
- Confirm type: "exact_match" is specified
- Verify TTL is set (required parameter)
- Check response headers for cache status

**Optimizing cache performance**
- Use a shorter TTL for dynamic content
- Consider cache warming for predictable requests
- Monitor cache hit/miss ratios
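Monitoring hit/miss ratios can be as simple as tallying a cache-status indicator from each response. The header name `x-cache-status` below is an assumption; check which header your gateway actually returns for cache status.

```python
from collections import Counter

# Tally cache outcomes from response headers. The "x-cache-status"
# header name is hypothetical; substitute your gateway's real header.
outcomes = Counter()

def record(headers: dict) -> None:
    outcomes[headers.get("x-cache-status", "MISS")] += 1

# Example: two cached responses and one uncached response.
for headers in [{"x-cache-status": "HIT"}, {"x-cache-status": "HIT"}, {}]:
    record(headers)

hit_rate = outcomes["HIT"] / sum(outcomes.values())
assert round(hit_rate, 2) == 0.67  # 2 hits out of 3 requests
```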
Limitations
- Exact match only: Any parameter change creates a new cache key
- Case sensitive: “Hello” and “hello” are different cache keys
- No semantic matching: Similar but not identical requests won’t match
- Storage limits: Very large responses consume more cache space
- TTL constraints: Minimum 1 second, maximum 86400 seconds (24 hours)
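Because keys are case- and whitespace-sensitive, normalizing prompts before sending can raise hit rates for trivially different inputs. A minimal sketch (only apply this where casing and spacing don't change the meaning of the request):

```python
def normalize_prompt(text: str) -> str:
    # Collapse runs of whitespace and lowercase the text so trivially
    # different prompts ("Hello  world" vs "hello world") map to the
    # same exact-match cache key.
    return " ".join(text.split()).lower()

assert normalize_prompt("Hello   World") == normalize_prompt("hello world")
```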
Best Practices
- Set TTL based on content freshness requirements
- Use cache for repeated, deterministic requests
- Monitor cache hit rates to optimize TTL values
- Avoid caching personalized or time-sensitive content
- Test cache behavior in development before production