Cache

📖 This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Cache identical requests to cut latency by roughly 95% on cache hits and avoid paying twice for the same completion.

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
  orq: {
    cache: {
      type: "exact_match",
      ttl: 3600, // 1 hour
    },
  },
});

Configuration

Parameter | Type          | Required | Description                  | Example
type      | "exact_match" | Yes      | Only supported cache type    | "exact_match"
ttl       | number        | Yes      | Cache expiration in seconds  | 3600

Cache Key: Generated from the model, the messages, and all other request parameters. Identical requests share the same key; any difference produces a new one.
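
The exact key derivation is internal to the proxy. As a mental model only (this sketch is an illustrative assumption, not the actual implementation), you can think of the key as a hash over the serialized request, so changing any field produces a new key:

import { createHash } from "node:crypto";

// Illustration only: the real key derivation is internal to the proxy.
// The point is that every field participates, so changing temperature,
// max_tokens, or a single character in a message yields a different key.
function illustrativeCacheKey(request: Record<string, unknown>): string {
  return createHash("sha256").update(JSON.stringify(request)).digest("hex");
}

const base = {
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
};

console.log(
  illustrativeCacheKey({ ...base, temperature: 0 }) ===
    illustrativeCacheKey({ ...base, temperature: 0.7 }),
); // false: different parameters, different keys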

TTL Recommendations

Use Case            | TTL (seconds) | Reason
FAQ responses       | 86400 (24h)   | Static content
Content generation  | 3600 (1h)     | Moderate freshness
Development/testing | 300 (5 min)   | Rapid iteration
Data analysis       | 1800 (30 min) | Balance of speed and freshness
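
To keep these values consistent across a codebase, it can help to centralize them as named constants. The preset names below are our own invention, not part of the orq API:

// Hypothetical presets encoding the table above; centralizing TTLs keeps
// cache behavior consistent across call sites.
const TTL_PRESETS = {
  faq: 86_400,     // 24h, static content
  content: 3_600,  // 1h, moderate freshness
  dev: 300,        // 5 min, rapid iteration
  analysis: 1_800, // 30 min, balance of speed and freshness
} as const;

// Used in a request body exactly as in the examples below, e.g.:
const cacheConfig = { type: "exact_match", ttl: TTL_PRESETS.faq } as const;
console.log(cacheConfig); // { type: "exact_match", ttl: 86400 }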

Code Examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of renewable energy for businesses"
      }
    ],
    "orq": {
      "cache": {
          "type": "exact_match",
          "ttl": 3600
      }
    }
  }'

Python

from openai import OpenAI
import os

openai = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy",
)

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Explain the benefits of renewable energy for businesses"
        }
    ],
    extra_body={
        "orq": {
            "cache": {
                "type": "exact_match",
                "ttl": 3600
            }
        }
    }
)

TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: "Explain the benefits of renewable energy for businesses",
    },
  ],
  orq: {
    cache: {
      type: "exact_match",
      ttl: 3600,
    },
  },
});

Troubleshooting

Low cache hit rate
  • Ensure identical parameters (temperature, max_tokens, etc.)
  • Check TTL isn't too short for your use case
  • Verify requests are truly identical (case-sensitive)
Cache not working
  • Confirm type: "exact_match" is specified
  • Verify TTL is set (required parameter)
  • Check response headers for cache status (see the sketch after this list)
Performance issues
  • Use shorter TTL for dynamic content
  • Consider cache warming for predictable requests
  • Monitor cache hit/miss ratios
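
To inspect cache status, look at the raw HTTP response. A sketch using the openai SDK's .withResponse() follows; since this page doesn't document a specific cache header name, the sketch logs all headers rather than assuming one:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Grab both the parsed body and the raw fetch Response so headers are visible.
const { data, response } = await openai.chat.completions
  .create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Explain renewable energy" }],
    orq: { cache: { type: "exact_match", ttl: 3600 } },
  })
  .withResponse();

// Log every header and look for the cache-status field your deployment
// returns; we deliberately don't assume a specific header name here.
response.headers.forEach((value, name) => console.log(`${name}: ${value}`));
console.log(data.choices[0].message.content);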

Limitations

  • Exact match only: Any parameter change creates new cache key
  • Case sensitive: "Hello" and "hello" are different cache keys (a normalization sketch follows this list)
  • No semantic matching: Similar but not identical requests won't match
  • Storage limits: Very large responses consume more cache space
  • TTL constraints: Minimum 1 second, maximum 86400 seconds (24 hours)
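
One hypothetical way to soften the first two limitations is to normalize user input before sending, so trivially different prompts collapse onto one cache key. Whether this is safe depends on your domain, since it changes the prompt the model sees:

// Hypothetical client-side normalization: collapse whitespace and casing so
// trivially different inputs share one exact-match cache key. Lowercasing
// changes the prompt the model sees, so only use it where casing carries no
// meaning for your use case.
function normalizePrompt(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

console.log(normalizePrompt("  Explain Renewable   Energy "));
// "explain renewable energy", the same key as an already-normalized prompt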

Best Practices

  • Set TTL based on content freshness requirements
  • Use cache for repeated, deterministic requests, and warm it ahead of predictable traffic (see the sketch after this list)
  • Monitor cache hit rates to optimize TTL values
  • Avoid caching personalized or time-sensitive content
  • Test cache behavior in development before production
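
A minimal cache-warming sketch, assuming you know your recurring prompts ahead of time (the prompt list and TTL choice here are illustrative):

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Hypothetical warm-up list: requests you know will recur. Firing them once
// ahead of peak load means real users hit the cache instead of the model.
const predictablePrompts = [
  "What is your refund policy?",
  "How do I reset my password?",
];

await Promise.all(
  predictablePrompts.map((content) =>
    openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content }],
      orq: { cache: { type: "exact_match", ttl: 86_400 } },
    }),
  ),
);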