Use Cases
  • Eliminating redundant costs on repeated identical queries (FAQs, product lookups).
  • Speeding up development and test loops by caching fixture requests.
  • Serving the same prompt to many concurrent users without paying per call.
  • Reducing tail latency on frequently-called endpoints.

Quick Start

Cache identical requests to cut latency by roughly 95% on cache hits and avoid paying for repeated calls.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{ "role": "user", "content": "Explain renewable energy" }],
    "cache": { "type": "exact_match", "ttl": 3600 }
  }'

Configuration

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| type | "exact_match" | Yes | Only supported cache type | "exact_match" |
| ttl | number | No | Cache expiration in seconds (default: 1800, max: 259200) | 3600 |
Cache Key: Generated from model + input + all parameters. Identical requests share the same key.
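To make the exact-match behavior concrete, here is a minimal sketch of how such a key could be derived. The actual key derivation is not documented here: the `cache_key` function, the use of SHA-256 over canonical JSON, and the choice to leave the `cache` settings out of the key are all assumptions for illustration only.

```python
import hashlib
import json


def cache_key(request: dict) -> str:
    """Hypothetical key: SHA-256 over canonical JSON of the request.

    Assumption: the "cache" settings themselves are not part of the key.
    """
    keyed = {k: v for k, v in request.items() if k != "cache"}
    # sort_keys makes the key independent of the order fields were written in
    canonical = json.dumps(keyed, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0,
}
same = dict(base)
warmer = {**base, "temperature": 0.7}
lower = {**base, "messages": [{"role": "user", "content": "hello"}]}

print(cache_key(base) == cache_key(same))    # True: identical request, same key
print(cache_key(base) == cache_key(warmer))  # False: a parameter changed
print(cache_key(base) == cache_key(lower))   # False: input differs by case
```

Under this sketch, any difference in model, input, or parameters yields a different key, which matches the behavior described above.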

TTL Recommendations

| Use Case | TTL (seconds) | Reason |
|---|---|---|
| FAQ responses | 86400 (24h) | Static content |
| Content generation | 3600 (1h) | Moderate freshness |
| Development/testing | 300 (5min) | Rapid iteration |
| Data analysis | 1800 (30min) | Balance speed/freshness |
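The recommendations above could be encoded as a small lookup, sketched below. The `cache_config` helper and the use-case names are hypothetical; the TTL values, the 1800-second default, and the 1 to 259200 second bounds come from this page.

```python
MIN_TTL = 1          # seconds, documented minimum
MAX_TTL = 259200     # seconds, documented maximum (3 days)
DEFAULT_TTL = 1800   # seconds, documented default

# Hypothetical use-case names mapped to the recommended TTLs
RECOMMENDED_TTL = {
    "faq": 86400,
    "content_generation": 3600,
    "development": 300,
    "data_analysis": 1800,
}


def cache_config(use_case: str) -> dict:
    """Return a cache block for the request body, clamped to the allowed TTL range."""
    ttl = RECOMMENDED_TTL.get(use_case, DEFAULT_TTL)
    ttl = max(MIN_TTL, min(MAX_TTL, ttl))
    return {"type": "exact_match", "ttl": ttl}


print(cache_config("faq"))      # {'type': 'exact_match', 'ttl': 86400}
print(cache_config("unknown"))  # {'type': 'exact_match', 'ttl': 1800}
```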

Code Examples

The examples below use the Chat Completions endpoint. The same cache parameter applies to the Responses API: replace chat.completions.create(...) with responses.create(...).
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of renewable energy for businesses"
      }
    ],
    "cache": {
      "type": "exact_match",
      "ttl": 3600
    }
  }'
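The same request can be built in Python with only the standard library. This is a sketch that mirrors the curl call above rather than an official SDK example; `build_request` and `send` are hypothetical helper names, and the shape of the response is not assumed beyond it being JSON.

```python
import json
import os
import urllib.request
from typing import Optional

URL = "https://api.orq.ai/v3/router/chat/completions"


def build_request(prompt: str, ttl: int = 3600, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a cached chat completion request."""
    payload = {
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        "cache": {"type": "exact_match", "ttl": ttl},
    }
    key = api_key if api_key is not None else os.environ.get("ORQ_API_KEY", "")
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_request("Explain the benefits of renewable energy for businesses"))` twice within the TTL should serve the second call from the cache, provided both request bodies are identical.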

Troubleshooting

Low cache hit rate
  • Ensure identical parameters (temperature, max_tokens, etc.).
  • Check TTL isn’t too short for your use case.
  • Verify requests are truly identical (case-sensitive).
Cache not working
  • Confirm type: "exact_match" is specified.
  • Check response headers for cache status.
Performance issues
  • Use shorter TTL for dynamic content.
  • Consider cache warming for predictable requests.
  • Monitor cache hit/miss ratios.
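Monitoring hit/miss ratios can be as simple as a client-side counter, sketched below. The `CacheStats` class is hypothetical, and how a hit is detected (for example from a response header or from latency) is not specified on this page, so the caller passes that in as a boolean.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Client-side tally of cache hits and misses."""
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        # The caller decides what counts as a hit
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = CacheStats()
for was_hit in (True, False, True, True):
    stats.record(was_hit)
print(stats.hit_rate)  # 0.75
```

A persistently low rate for a request type that should repeat often usually points to a parameter that differs between calls or a TTL that is too short.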

Limitations

  • Exact match only: Any parameter change creates a new cache key.
  • Case sensitive: “Hello” and “hello” are different cache keys.
  • No semantic matching: Similar but not identical requests won’t match.
  • Storage limits: Very large responses consume more cache space.
  • TTL constraints: Minimum 1 second, maximum 259200 seconds (3 days).
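Because matching is exact and case sensitive, one way to raise the hit rate is to normalize prompts on the client before sending them. The `normalize_prompt` helper below is a hypothetical sketch, not something the router provides, and lowercasing is optional because it can change the meaning of some prompts.

```python
import re


def normalize_prompt(text: str, lowercase: bool = False) -> str:
    """Collapse runs of whitespace and trim the ends; optionally lowercase."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed.lower() if lowercase else collapsed


print(normalize_prompt("  Explain   renewable\nenergy "))  # Explain renewable energy
print(normalize_prompt("Hello", lowercase=True))           # hello
```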

Best Practices

  • Set TTL based on content freshness requirements.
  • Use cache for repeated, deterministic requests.
  • Monitor cache hit rates to optimize TTL values.
  • Avoid caching personalized or time-sensitive content.
  • Test cache behavior in development before production.
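The advice on personalized and time-sensitive content could be enforced with a small gate in application code, sketched below. The `apply_cache` function and its flags are hypothetical; deciding which requests count as personalized or time sensitive is left to the caller.

```python
import copy


def apply_cache(payload: dict, ttl: int, personalized: bool = False, time_sensitive: bool = False) -> dict:
    """Return a copy of the request body with caching enabled only when it is safe."""
    result = copy.deepcopy(payload)
    result.pop("cache", None)  # start from an uncached request
    if not (personalized or time_sensitive):
        result["cache"] = {"type": "exact_match", "ttl": ttl}
    return result


faq = {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "What is your refund policy?"}]}
print("cache" in apply_cache(faq, ttl=86400))                     # True
print("cache" in apply_cache(faq, ttl=86400, personalized=True))  # False
```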