Quick Start

Cache identical requests to reduce latency by ~95% and save costs.

import OpenAI from "openai";

// Point the OpenAI SDK at the orq.ai router (the same endpoint used in
// the curl example below).
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/router",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
  cache: {
    type: "exact_match",
    ttl: 3600, // 1 hour (optional, default: 1800)
  },
});

Configuration

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| type | "exact_match" | Yes | Only supported cache type | "exact_match" |
| ttl | number | No | Cache expiration in seconds (default: 1800, max: 259200) | 3600 |
Cache Key: Generated from model + messages + all parameters. Identical requests share the same key.

TTL Recommendations

| Use Case | TTL (seconds) | Reason |
|---|---|---|
| FAQ responses | 86400 (24h) | Static content |
| Content generation | 3600 (1h) | Moderate freshness |
| Development/testing | 300 (5min) | Rapid iteration |
| Data analysis | 1800 (30min) | Balance speed/freshness |
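One way to apply these recommendations is a small helper that maps a use case to a cache config. The use-case names below are hypothetical, not part of the API; only the `type` and `ttl` fields are sent to the router:

```javascript
// Hypothetical mapping of the TTL recommendations above (in seconds).
const TTL_BY_USE_CASE = {
  faq: 86400,              // 24h: static content
  contentGeneration: 3600, // 1h: moderate freshness
  development: 300,        // 5min: rapid iteration
  dataAnalysis: 1800,      // 30min: balance speed/freshness
};

// Build the cache block for a request, falling back to the default TTL.
function cacheConfig(useCase) {
  return { type: "exact_match", ttl: TTL_BY_USE_CASE[useCase] ?? 1800 };
}

console.log(cacheConfig("faq")); // { type: "exact_match", ttl: 86400 }
```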

Code Examples

curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of renewable energy for businesses"
      }
    ],
    "cache": {
      "type": "exact_match",
      "ttl": 3600
    }
  }'

Troubleshooting

Low cache hit rate
  • Ensure identical parameters (temperature, max_tokens, etc.)
  • Check TTL isn’t too short for your use case
  • Verify requests are truly identical (case-sensitive)
Cache not working
  • Confirm type: "exact_match" is specified
  • Check response headers for cache status
Performance issues
  • Use shorter TTL for dynamic content
  • Consider cache warming for predictable requests
  • Monitor cache hit/miss ratios
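To monitor hit/miss ratios, a minimal client-side counter is enough. This is a sketch that assumes you can already tell whether a given response was served from cache (for example, from response latency or headers):

```javascript
// Track cache hits and misses observed by the client.
class CacheStats {
  constructor() {
    this.hits = 0;
    this.misses = 0;
  }
  record(wasHit) {
    wasHit ? this.hits++ : this.misses++;
  }
  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const stats = new CacheStats();
[true, true, false, true].forEach((hit) => stats.record(hit));
console.log(stats.hitRate()); // 0.75
```

If the rate stays low, revisit the troubleshooting points above (parameter drift and short TTLs are the usual causes).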

Limitations

  • Exact match only: Any parameter change creates new cache key
  • Case sensitive: “Hello” and “hello” are different cache keys
  • No semantic matching: Similar but not identical requests won’t match
  • Storage limits: Very large responses consume more cache space
  • TTL constraints: Minimum 1 second, maximum 259200 seconds (3 days)
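Because matching is exact and case-sensitive, normalizing prompts before sending can raise hit rates. The helper below is a hypothetical client-side normalizer, not an API feature; only apply it where casing and spacing do not change the desired answer:

```javascript
// Hypothetical normalizer: trim, collapse whitespace, and lowercase
// message content so trivially different prompts map to one cache key.
function normalizeMessages(messages) {
  return messages.map((m) => ({
    ...m,
    content: m.content.trim().replace(/\s+/g, " ").toLowerCase(),
  }));
}

console.log(normalizeMessages([{ role: "user", content: "  Explain   Renewable Energy " }]));
// [ { role: "user", content: "explain renewable energy" } ]
```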

Best Practices

  • Set TTL based on content freshness requirements
  • Use cache for repeated, deterministic requests
  • Monitor cache hit rates to optimize TTL values
  • Avoid caching personalized or time-sensitive content
  • Test cache behavior in development before production