Cache
This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.
Quick Start
Cache identical requests to reduce latency by ~95% and save costs.
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
  orq: {
    cache: {
      type: "exact_match",
      ttl: 3600, // 1 hour
    },
  },
});
Configuration
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| type | "exact_match" | Yes | Only supported cache type | "exact_match" |
| ttl | number | Yes | Cache expiration in seconds | 3600 |
Cache Key: Generated from model + messages + all parameters. Identical requests share the same key.
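The proxy's actual key derivation is internal, but conceptually it behaves like hashing the canonical request payload. A minimal sketch in Python (the `cache_key` helper and the SHA-256 choice are illustrative assumptions, not the proxy's real algorithm):

```python
import hashlib
import json

def cache_key(model: str, messages: list, **params) -> str:
    # Serialize the request deterministically: sorted keys, no extra whitespace.
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

base = cache_key("openai/gpt-4o", [{"role": "user", "content": "Hello"}], temperature=0.7)
same = cache_key("openai/gpt-4o", [{"role": "user", "content": "Hello"}], temperature=0.7)
diff = cache_key("openai/gpt-4o", [{"role": "user", "content": "hello"}], temperature=0.7)

# Identical requests share a key; any change (even letter case) yields a new one.
assert base == same
assert base != diff
```

This is why changing temperature, max_tokens, or even one character of a message results in a cache miss.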
TTL Recommendations
| Use Case | TTL (seconds) | Reason |
|---|---|---|
| FAQ responses | 86400 (24h) | Static content |
| Content generation | 3600 (1h) | Moderate freshness |
| Development/testing | 300 (5min) | Rapid iteration |
| Data analysis | 1800 (30min) | Balance speed/freshness |
Code Examples
cURL:

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
-H "Authorization: Bearer $ORQ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o",
"messages": [
{
"role": "user",
"content": "Explain the benefits of renewable energy for businesses"
}
],
"orq": {
"cache": {
"type": "exact_match",
"ttl": 3600
}
}
}'
Python:

from openai import OpenAI
import os
openai = OpenAI(
api_key=os.environ.get("ORQ_API_KEY"),
base_url="https://api.orq.ai/v2/proxy"
)
response = openai.chat.completions.create(
model="openai/gpt-4o",
messages=[
{
"role": "user",
"content": "Explain the benefits of renewable energy for businesses"
}
],
extra_body={
"orq": {
"cache": {
"type": "exact_match",
"ttl": 3600
}
}
}
)
TypeScript:

import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.ORQ_API_KEY,
baseURL: "https://api.orq.ai/v2/proxy",
});
const response = await openai.chat.completions.create({
model: "openai/gpt-4o",
messages: [
{
role: "user",
content: "Explain the benefits of renewable energy for businesses",
},
],
orq: {
cache: {
type: "exact_match",
ttl: 3600,
},
},
});
Troubleshooting
Low cache hit rate
- Ensure identical parameters (temperature, max_tokens, etc.)
- Check TTL isn't too short for your use case
- Verify requests are truly identical (case-sensitive)
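Because keys are exact and case-sensitive, cosmetic differences in prompts silently defeat the cache. One mitigation is to normalize prompts client-side before sending, so that byte-identical requests are actually produced. A sketch (the `normalize_prompt` helper is a client-side suggestion, not part of the proxy):

```python
def normalize_prompt(text: str) -> str:
    # Collapse runs of internal whitespace and trim the ends so that
    # "Explain  renewable energy " and "Explain renewable energy"
    # produce the same request bytes, hence the same cache key.
    return " ".join(text.split())

assert normalize_prompt("Explain  renewable energy ") == normalize_prompt("Explain renewable energy")
```

Apply the same normalization everywhere the prompt is constructed, otherwise two call sites can still diverge.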
Cache not working
- Confirm type: "exact_match" is specified
- Verify TTL is set (required parameter)
- Check response headers for cache status
Performance issues
- Use shorter TTL for dynamic content
- Consider cache warming for predictable requests
- Monitor cache hit/miss ratios
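Cache warming can be as simple as replaying your most common prompts before peak traffic, so the first real user already gets a hit. A sketch with the request function injected, so it works with any client (the function names here are illustrative, not an orq.ai API):

```python
from typing import Callable

def warm_cache(prompts: list[str], send: Callable[[str], object]) -> int:
    # Issue one request per predictable prompt; identical requests that
    # arrive within the TTL are then served from cache.
    for prompt in prompts:
        send(prompt)
    return len(prompts)

# Usage with a stub in place of a real proxy call:
calls = []
warmed = warm_cache(
    ["What is your refund policy?", "How do I reset my password?"],
    calls.append,
)
assert warmed == 2
assert calls[0] == "What is your refund policy?"
```

In production, `send` would wrap your actual chat-completion call with the same cache settings users will send.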
Limitations
- Exact match only: Any parameter change creates new cache key
- Case sensitive: "Hello" and "hello" are different cache keys
- No semantic matching: Similar but not identical requests won't match
- Storage limits: Very large responses consume more cache space
- TTL constraints: Minimum 1 second, maximum 86400 seconds (24 hours)
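Given the 1–86400 second bounds above, a small client-side guard can reject out-of-range TTLs before a request is ever sent (a convenience helper, not something the API provides):

```python
def validate_ttl(ttl: int) -> int:
    # Enforce the documented bounds: minimum 1 second, maximum 86400 (24 hours).
    if not isinstance(ttl, int) or not 1 <= ttl <= 86400:
        raise ValueError(f"ttl must be an integer in [1, 86400], got {ttl!r}")
    return ttl

assert validate_ttl(3600) == 3600
```

Call it wherever cache settings are built, so invalid configurations fail fast in your own code.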
Best Practices
- Set TTL based on content freshness requirements
- Use cache for repeated, deterministic requests
- Monitor cache hit rates to optimize TTL values
- Avoid caching personalized or time-sensitive content
- Test cache behavior in development before production
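Monitoring hit rates, as recommended above, only needs a simple counter. How you detect a hit (for example, from response headers) depends on your setup; the `CacheStats` class below is an illustrative sketch, not a library API:

```python
class CacheStats:
    """Track cache hits and misses to guide TTL tuning."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        # bool arithmetic: True adds 1 to hits, False adds 1 to misses.
        self.hits += hit
        self.misses += not hit

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
for hit in [True, True, False, True]:
    stats.record(hit)
assert stats.hit_rate == 0.75
```

A persistently low hit rate suggests the TTL is too short for the workload, or that requests differ in ways the Troubleshooting section describes.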