Cache

📖 This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Cache identical requests to cut latency by roughly 95% on cache hits and avoid paying twice for the same completion.

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
  orq: {
    cache: {
      type: "exact_match",
      ttl: 3600, // 1 hour
    },
  },
});

Configuration

Parameter | Type          | Required | Description                  | Example
type      | "exact_match" | Yes      | Only supported cache type    | "exact_match"
ttl       | number        | Yes      | Cache expiration in seconds  | 3600

Cache Key: Generated from the model, the messages, and all other request parameters. Identical requests share the same key; any difference produces a new one.
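
The exact key derivation is internal to the proxy. As a mental model only (this sketch is an illustrative assumption, not the actual implementation), you can think of the key as a hash over the serialized request, so changing any field produces a new key:

import { createHash } from "node:crypto";

// Illustration only: the real key derivation is internal to the proxy.
// The point is that every field participates, so changing temperature,
// max_tokens, or a single character in a message yields a different key.
function illustrativeCacheKey(request: Record<string, unknown>): string {
  return createHash("sha256").update(JSON.stringify(request)).digest("hex");
}

const base = {
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain renewable energy" }],
};

console.log(
  illustrativeCacheKey({ ...base, temperature: 0 }) ===
    illustrativeCacheKey({ ...base, temperature: 0.7 }),
); // false: different parameters, different keys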

TTL Recommendations

Use Case            | TTL (seconds) | Reason
FAQ responses       | 86400 (24h)   | Static content
Content generation  | 3600 (1h)     | Moderate freshness
Development/testing | 300 (5 min)   | Rapid iteration
Data analysis       | 1800 (30 min) | Balance of speed and freshness
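
To keep these values consistent across a codebase, it can help to centralize them as named constants. The preset names below are our own invention, not part of the orq API:

// Hypothetical presets encoding the table above; centralizing TTLs keeps
// cache behavior consistent across call sites.
const TTL_PRESETS = {
  faq: 86_400,     // 24h, static content
  content: 3_600,  // 1h, moderate freshness
  dev: 300,        // 5 min, rapid iteration
  analysis: 1_800, // 30 min, balance of speed and freshness
} as const;

// Used in a request body exactly as in the examples below, e.g.:
const cacheConfig = { type: "exact_match", ttl: TTL_PRESETS.faq } as const;
console.log(cacheConfig); // { type: "exact_match", ttl: 86400 }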

Code Examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of renewable energy for businesses"
      }
    ],
    "orq": {
      "cache": {
          "type": "exact_match",
          "ttl": 3600
      }
    }
  }'

Python

from openai import OpenAI
import os

openai = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy",
)

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Explain the benefits of renewable energy for businesses"
        }
    ],
    extra_body={
        "orq": {
            "cache": {
                "type": "exact_match",
                "ttl": 3600
            }
        }
    }
)

TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: "Explain the benefits of renewable energy for businesses",
    },
  ],
  orq: {
    cache: {
      type: "exact_match",
      ttl: 3600,
    },
  },
});

Troubleshooting

Low cache hit rate
  • Ensure identical parameters (temperature, max_tokens, etc.)
  • Check TTL isn't too short for your use case
  • Verify requests are truly identical (case-sensitive)
Cache not working
  • Confirm type: "exact_match" is specified
  • Verify TTL is set (required parameter)
  • Check response headers for cache status (see the sketch after this list)
Performance issues
  • Use shorter TTL for dynamic content
  • Consider cache warming for predictable requests
  • Monitor cache hit/miss ratios
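
To inspect cache status, look at the raw HTTP response. A sketch using the openai SDK's .withResponse() follows; since this page doesn't document a specific cache header name, the sketch logs all headers rather than assuming one:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Grab both the parsed body and the raw fetch Response so headers are visible.
const { data, response } = await openai.chat.completions
  .create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Explain renewable energy" }],
    orq: { cache: { type: "exact_match", ttl: 3600 } },
  })
  .withResponse();

// Log every header and look for the cache-status field your deployment
// returns; we deliberately don't assume a specific header name here.
response.headers.forEach((value, name) => console.log(`${name}: ${value}`));
console.log(data.choices[0].message.content);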

Limitations

  • Exact match only: Any parameter change creates new cache key
  • Case sensitive: "Hello" and "hello" are different cache keys (a normalization sketch follows this list)
  • No semantic matching: Similar but not identical requests won't match
  • Storage limits: Very large responses consume more cache space
  • TTL constraints: Minimum 1 second, maximum 86400 seconds (24 hours)
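
One hypothetical way to soften the first two limitations is to normalize user input before sending, so trivially different prompts collapse onto one cache key. Whether this is safe depends on your domain, since it changes the prompt the model sees:

// Hypothetical client-side normalization: collapse whitespace and casing so
// trivially different inputs share one exact-match cache key. Lowercasing
// changes the prompt the model sees, so only use it where casing carries no
// meaning for your use case.
function normalizePrompt(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

console.log(normalizePrompt("  Explain Renewable   Energy "));
// "explain renewable energy", the same key as an already-normalized prompt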

Best Practices

  • Set TTL based on content freshness requirements
  • Use cache for repeated, deterministic requests, and warm it ahead of predictable traffic (see the sketch after this list)
  • Monitor cache hit rates to optimize TTL values
  • Avoid caching personalized or time-sensitive content
  • Test cache behavior in development before production
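
A minimal cache-warming sketch, assuming you know your recurring prompts ahead of time (the prompt list and TTL choice here are illustrative):

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Hypothetical warm-up list: requests you know will recur. Firing them once
// ahead of peak load means real users hit the cache instead of the model.
const predictablePrompts = [
  "What is your refund policy?",
  "How do I reset my password?",
];

await Promise.all(
  predictablePrompts.map((content) =>
    openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content }],
      orq: { cache: { type: "exact_match", ttl: 86_400 } },
    }),
  ),
);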