Quick Start

Cache identical requests to reduce latency by ~95% and save costs.

import OpenAI from "openai";

// Point the OpenAI SDK at the orq.ai router (the same endpoint used in
// the curl example below).
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/router",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
  cache: {
    type: "exact_match",
    ttl: 3600, // 1 hour (optional, default: 1800)
  },
});

Configuration

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| type | "exact_match" | Yes | Only supported cache type | "exact_match" |
| ttl | number | No | Cache expiration in seconds (default: 1800, max: 259200) | 3600 |
Cache Key: Generated from model + messages + all parameters. Identical requests share the same key.

TTL Recommendations

| Use Case | TTL (seconds) | Reason |
|---|---|---|
| FAQ responses | 86400 (24h) | Static content |
| Content generation | 3600 (1h) | Moderate freshness |
| Development/testing | 300 (5min) | Rapid iteration |
| Data analysis | 1800 (30min) | Balance speed/freshness |
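One way to apply these recommendations is a small helper that maps a use case to a cache config. The use-case names below are hypothetical, not part of the API; only the `type` and `ttl` fields are sent to the router:

```javascript
// Hypothetical mapping of the TTL recommendations above (in seconds).
const TTL_BY_USE_CASE = {
  faq: 86400,              // 24h: static content
  contentGeneration: 3600, // 1h: moderate freshness
  development: 300,        // 5min: rapid iteration
  dataAnalysis: 1800,      // 30min: balance speed/freshness
};

// Build the cache block for a request, falling back to the default TTL.
function cacheConfig(useCase) {
  return { type: "exact_match", ttl: TTL_BY_USE_CASE[useCase] ?? 1800 };
}

console.log(cacheConfig("faq")); // { type: "exact_match", ttl: 86400 }
```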

Code Examples

curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of renewable energy for businesses"
      }
    ],
    "cache": {
      "type": "exact_match",
      "ttl": 3600
    }
  }'

Troubleshooting

Low cache hit rate
  • Ensure identical parameters (temperature, max_tokens, etc.)
  • Check TTL isn’t too short for your use case
  • Verify requests are truly identical (case-sensitive)
Cache not working
  • Confirm type: "exact_match" is specified
  • Check response headers for cache status
Performance issues
  • Use shorter TTL for dynamic content
  • Consider cache warming for predictable requests
  • Monitor cache hit/miss ratios
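To monitor hit/miss ratios, a minimal client-side counter is enough. This is a sketch that assumes you can already tell whether a given response was served from cache (for example, from response latency or headers):

```javascript
// Track cache hits and misses observed by the client.
class CacheStats {
  constructor() {
    this.hits = 0;
    this.misses = 0;
  }
  record(wasHit) {
    wasHit ? this.hits++ : this.misses++;
  }
  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const stats = new CacheStats();
[true, true, false, true].forEach((hit) => stats.record(hit));
console.log(stats.hitRate()); // 0.75
```

If the rate stays low, revisit the troubleshooting points above (parameter drift and short TTLs are the usual causes).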

Limitations

  • Exact match only: Any parameter change creates new cache key
  • Case sensitive: “Hello” and “hello” are different cache keys
  • No semantic matching: Similar but not identical requests won’t match
  • Storage limits: Very large responses consume more cache space
  • TTL constraints: Minimum 1 second, maximum 259200 seconds (3 days)
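Because matching is exact and case-sensitive, normalizing prompts before sending can raise hit rates. The helper below is a hypothetical client-side normalizer, not an API feature; only apply it where casing and spacing do not change the desired answer:

```javascript
// Hypothetical normalizer: trim, collapse whitespace, and lowercase
// message content so trivially different prompts map to one cache key.
function normalizeMessages(messages) {
  return messages.map((m) => ({
    ...m,
    content: m.content.trim().replace(/\s+/g, " ").toLowerCase(),
  }));
}

console.log(normalizeMessages([{ role: "user", content: "  Explain   Renewable Energy " }]));
// [ { role: "user", content: "explain renewable energy" } ]
```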

Best Practices

  • Set TTL based on content freshness requirements
  • Use cache for repeated, deterministic requests
  • Monitor cache hit rates to optimize TTL values
  • Avoid caching personalized or time-sensitive content
  • Test cache behavior in development before production