This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.

Quick Start

Cache identical requests to reduce latency by ~95% and save costs.
```javascript
import OpenAI from "openai";

// Point the OpenAI SDK at the orq.ai proxy (the same endpoint used in the
// curl example below) and authenticate with your orq.ai API key.
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
  orq: {
    cache: {
      type: "exact_match",
      ttl: 3600, // 1 hour
    },
  },
});
```

Configuration

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| `type` | `"exact_match"` | Yes | Only supported cache type | `"exact_match"` |
| `ttl` | `number` | Yes | Cache expiration in seconds | `3600` |

**Cache Key**: Generated from the model, messages, and all other parameters. Identical requests share the same key.

TTL Recommendations

| Use Case | TTL (seconds) | Reason |
| --- | --- | --- |
| FAQ responses | 86400 (24h) | Static content |
| Content generation | 3600 (1h) | Moderate freshness |
| Development/testing | 300 (5 min) | Rapid iteration |
| Data analysis | 1800 (30 min) | Balance speed/freshness |
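One way to apply the table above is a small lookup that builds the `orq.cache` options by use case. The `CACHE_TTLS` map and `cacheOptions` helper are hypothetical names for illustration; the TTL values come from the recommendations above.

```javascript
// TTL recommendations from the table, keyed by use case (illustrative helper).
const CACHE_TTLS = {
  faq: 86400,        // 24h: static content
  content: 3600,     // 1h: moderate freshness
  development: 300,  // 5 min: rapid iteration
  analysis: 1800,    // 30 min: balance speed/freshness
};

function cacheOptions(useCase) {
  // Fall back to the moderate 1-hour TTL for unrecognized use cases.
  return { type: "exact_match", ttl: CACHE_TTLS[useCase] ?? 3600 };
}
```

Usage: spread `{ orq: { cache: cacheOptions("faq") } }` into the request body shown in the Quick Start.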

Code Examples

```shell
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of renewable energy for businesses"
      }
    ],
    "orq": {
      "cache": {
        "type": "exact_match",
        "ttl": 3600
      }
    }
  }'
```

Troubleshooting

**Low cache hit rate**
  • Ensure identical parameters (temperature, max_tokens, etc.)
  • Check TTL isn't too short for your use case
  • Verify requests are truly identical (case-sensitive)

**Cache not working**
  • Confirm type: "exact_match" is specified
  • Verify TTL is set (required parameter)
  • Check response headers for cache status

**Performance issues**
  • Use shorter TTL for dynamic content
  • Consider cache warming for predictable requests
  • Monitor cache hit/miss ratios
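To check response headers for cache status, the OpenAI SDK's `withResponse()` exposes the raw `fetch` Response. The header name `x-cache` below is an assumption for illustration — inspect an actual gateway response to find the header it sets.

```javascript
// Reads a cache-status header from a fetch Headers object.
// The default header name "x-cache" is an assumption, not a documented name.
function cacheStatus(headers, headerName = "x-cache") {
  const value = headers.get(headerName);
  return value ? value.toUpperCase() : "UNKNOWN";
}

// Usage with the OpenAI SDK (network call, shown for context):
// const { data, response } = await openai.chat.completions
//   .create({ /* request with orq.cache options */ })
//   .withResponse();
// cacheStatus(response.headers); // e.g. "HIT" or "MISS"
```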

Limitations

  • Exact match only: Any parameter change creates new cache key
  • Case sensitive: “Hello” and “hello” are different cache keys
  • No semantic matching: Similar but not identical requests won’t match
  • Storage limits: Very large responses consume more cache space
  • TTL constraints: Minimum 1 second, maximum 86400 seconds (24 hours)

Best Practices

  • Set TTL based on content freshness requirements
  • Use cache for repeated, deterministic requests
  • Monitor cache hit rates to optimize TTL values
  • Avoid caching personalized or time-sensitive content
  • Test cache behavior in development before production
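The practices above can be enforced with a small wrapper that only attaches cache options when a request is safe to cache. The `withCache` helper and its personalization flag are hypothetical; the deterministic-sampling check (temperature of 0) is one reasonable heuristic, not an orq.ai requirement.

```javascript
// Sketch: attach exact_match cache options only for repeatable, non-personalized
// requests; pass everything else through uncached.
function withCache(params, { personalized = false, ttl = 3600 } = {}) {
  // Non-zero temperature means sampled (non-deterministic) output, and
  // personalized prompts shouldn't serve another user's cached response.
  const deterministic = (params.temperature ?? 1) === 0;
  if (personalized || !deterministic) return params;
  return { ...params, orq: { cache: { type: "exact_match", ttl } } };
}
```

Usage: `withCache({ model: "openai/gpt-4o", temperature: 0, messages }, { ttl: 86400 })` adds the cache block; a request with `temperature: 0.7` is returned unchanged.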