This page describes features that extend the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Cache identical requests to reduce latency by ~95% and save costs.

Configuration
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| type | "exact_match" | Yes | The only supported cache type | "exact_match" |
| ttl | number | Yes | Cache expiration time in seconds | 3600 |
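Assembled into a request body, the two parameters might look like the following. This is a hedged sketch: the exact field names and nesting depend on your gateway's API, and the model and message values are placeholders.

```python
# Hypothetical request body with caching enabled. The "cache" object's
# fields follow the parameter table above; everything else is placeholder.
request_body = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "What is your refund policy?"}],
    "cache": {
        "type": "exact_match",  # required: the only supported cache type
        "ttl": 3600,            # required: expiration in seconds (1 hour)
    },
}
```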
TTL Recommendations
| Use Case | TTL (seconds) | Reason |
|---|---|---|
| FAQ responses | 86400 (24h) | Static content |
| Content generation | 3600 (1h) | Moderate freshness |
| Development/testing | 300 (5min) | Rapid iteration |
| Data analysis | 1800 (30min) | Balance speed/freshness |
Code Examples
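The gateway performs caching server-side, but the mechanics can be sketched with a small in-memory model. This is illustrative only: ExactMatchCache and its methods are hypothetical, not a gateway API, and the key derivation is an assumption about how exact matching behaves.

```python
import hashlib
import json
import time

class ExactMatchCache:
    """Minimal in-memory sketch of exact-match response caching with TTL.

    Illustrative only: the real gateway caches server-side.
    """

    def __init__(self):
        self._store = {}

    def _key(self, request_body: dict) -> str:
        # Exact match: the key covers every parameter byte-for-byte, so any
        # change at all (even "Hello" vs "hello") produces a new key.
        canonical = json.dumps(request_body, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def get(self, request_body: dict):
        entry = self._store.get(self._key(request_body))
        if entry is None:
            return None  # miss: never cached
        response, expires_at = entry
        if time.time() >= expires_at:
            return None  # miss: TTL expired
        return response

    def put(self, request_body: dict, response, ttl: int):
        # Store the response alongside its absolute expiration time.
        self._store[self._key(request_body)] = (response, time.time() + ttl)

cache = ExactMatchCache()
body = {"messages": [{"role": "user", "content": "Hi"}], "temperature": 0}
cache.put(body, response="Hello!", ttl=3600)
hit = cache.get(body)                        # identical request: hit
miss = cache.get({**body, "temperature": 1})  # changed parameter: miss
```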
Troubleshooting
**Low cache hit rate**

- Ensure identical parameters (temperature, max_tokens, etc.)
- Check the TTL isn't too short for your use case
- Verify requests are truly identical (matching is case-sensitive)

**Cache not working**

- Confirm type: "exact_match" is specified
- Verify TTL is set (a required parameter)
- Check response headers for cache status

**Optimizing cache performance**

- Use a shorter TTL for dynamic content
- Consider cache warming for predictable requests
- Monitor cache hit/miss ratios
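To act on the last point, a simple counter can track hit/miss ratios over time. This is a sketch with a hypothetical CacheHitMonitor class; in practice you would record a hit or miss based on the cache-status response headers mentioned above.

```python
class CacheHitMonitor:
    """Tracks cache hits and misses so the hit rate can guide TTL tuning."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, was_hit: bool) -> None:
        if was_hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        # Fraction of requests served from cache (0.0 when nothing recorded).
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

monitor = CacheHitMonitor()
for was_hit in (True, True, True, False):
    monitor.record(was_hit)
rate = monitor.hit_rate  # 3 hits out of 4 requests -> 0.75
```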
Limitations
- Exact match only: Any parameter change creates new cache key
- Case sensitive: “Hello” and “hello” are different cache keys
- No semantic matching: Similar but not identical requests won’t match
- Storage limits: Very large responses consume more cache space
- TTL constraints: Minimum 1 second, maximum 86400 seconds (24 hours)
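The first two limitations follow directly from how an exact-match key can be derived. Sketching the key as a hash of the canonicalized request (an assumption about the mechanism, not the gateway's actual implementation) makes them concrete:

```python
import hashlib
import json

def cache_key(body: dict) -> str:
    # Hypothetical exact-match key: a hash over every byte of the request,
    # so any change at all yields a different key.
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

key_a = cache_key({"messages": [{"role": "user", "content": "Hello"}]})
key_b = cache_key({"messages": [{"role": "user", "content": "hello"}]})
same = key_a == key_b  # False: "Hello" and "hello" are different cache keys
```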
Best Practices
- Set TTL based on content freshness requirements
- Use cache for repeated, deterministic requests
- Monitor cache hit rates to optimize TTL values
- Avoid caching personalized or time-sensitive content
- Test cache behavior in development before production
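For repeated, deterministic requests such as FAQ responses, cache warming can be sketched as pre-issuing the exact bodies production traffic will send. Here warm_cache and send_gateway_request are hypothetical stand-ins for whatever client you use to call the gateway.

```python
# Predictable prompts worth warming; placeholders for illustration.
FAQ_PROMPTS = [
    "What is your refund policy?",
    "How do I reset my password?",
]

def warm_cache(send_gateway_request) -> int:
    """Pre-issue predictable requests so later user traffic hits the cache.

    `send_gateway_request` is a stand-in for a real gateway client call.
    Returns the number of requests issued.
    """
    for prompt in FAQ_PROMPTS:
        # The body must be byte-identical to production traffic; any
        # difference (case, parameters) would miss the exact-match cache.
        send_gateway_request({
            "messages": [{"role": "user", "content": prompt}],
            "cache": {"type": "exact_match", "ttl": 86400},
        })
    return len(FAQ_PROMPTS)

sent = []
issued = warm_cache(sent.append)  # collect bodies instead of calling a gateway
```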