# LLM response caching

> Cache identical LLM requests to cut latency by roughly 95% and reduce API costs. Configure exact-match caching and a TTL to speed up repeated queries.

## Quick Start

Add a `cache` object to a request to enable caching. Identical requests are then served from the cache, which reduces latency by \~95% and saves costs.

<CodeGroup>
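  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Assumes `openai` is an OpenAI client pointed at the orq.ai router,
  // as in the full TypeScript example later on this page.
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Explain renewable energy" }],
    // `cache` is an orq.ai router parameter, not part of the OpenAI SDK types;
    // TypeScript may require a cast or `@ts-expect-error` here.
    cache: {
      type: "exact_match",
      ttl: 3600, // 1 hour (optional, default: 1800)
    },
  });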
  ```
</CodeGroup>

## Configuration

| Parameter | Type            | Required | Description                                              | Example         |
| --------- | --------------- | -------- | -------------------------------------------------------- | --------------- |
| `type`    | `"exact_match"` | Yes      | Only supported cache type                                | `"exact_match"` |
| `ttl`     | number          | No       | Cache expiration in seconds (default: 1800, max: 259200) | `3600`          |

**Cache Key**: Generated from the model, the messages, and all other request parameters. Two requests share a key, and therefore a cached response, only when every one of these values is identical.
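
To make the exact-match behaviour concrete, here is a minimal sketch of how such a key could be derived. The actual key derivation used by the router is not documented on this page; the `cache_key` helper, the SHA-256 hash, and the canonical JSON encoding are all assumptions chosen for illustration.

```python
import hashlib
import json


def cache_key(request: dict) -> str:
    """Hash a request body into a deterministic key (illustrative only)."""
    # Sort keys and drop optional whitespace so dict ordering does not change the hash.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Explain renewable energy"}],
    "temperature": 0,
}
same = dict(base)
different_case = {
    **base,
    "messages": [{"role": "user", "content": "explain renewable energy"}],
}
different_param = {**base, "temperature": 0.2}

print(cache_key(base) == cache_key(same))             # True
print(cache_key(base) == cache_key(different_case))   # False
print(cache_key(base) == cache_key(different_param))  # False
```

Whatever the real implementation is, the practical consequence is the same: a change in casing or in any parameter yields a different key, so the request is treated as new.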

## TTL Recommendations

| Use Case            | TTL (seconds)  | Reason                  |
| ------------------- | -------------- | ----------------------- |
| FAQ responses       | `86400` (24h)  | Static content          |
| Content generation  | `3600` (1h)    | Moderate freshness      |
| Development/testing | `300` (5min)   | Rapid iteration         |
| Data analysis       | `1800` (30min) | Balance speed/freshness |
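
The recommendations above can be kept in one place on the client side. The following is a sketch under stated assumptions: the preset labels and the `cache_config` helper are hypothetical names, while the TTL values, the default, and the bounds are taken from this page.

```python
from typing import Optional

DEFAULT_TTL = 1800  # documented default, in seconds
MIN_TTL = 1         # documented minimum
MAX_TTL = 259200    # documented maximum (3 days)

# TTL values copied from the table above; the keys are hypothetical labels.
TTL_PRESETS = {
    "faq": 86400,
    "content_generation": 3600,
    "development": 300,
    "data_analysis": 1800,
}


def cache_config(use_case: Optional[str] = None, ttl: Optional[int] = None) -> dict:
    """Build a `cache` object, rejecting TTLs outside the documented range."""
    if ttl is None:
        # Unknown or missing use cases fall back to the documented default.
        ttl = TTL_PRESETS.get(use_case, DEFAULT_TTL)
    if not MIN_TTL <= ttl <= MAX_TTL:
        raise ValueError(f"ttl must be between {MIN_TTL} and {MAX_TTL} seconds, got {ttl}")
    return {"type": "exact_match", "ttl": ttl}


print(cache_config("faq"))  # {'type': 'exact_match', 'ttl': 86400}
```

The returned dictionary can be passed as the `cache` value in the request body, for example through `extra_body` in the Python client.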

## Code Examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o",
      "messages": [
        {
          "role": "user",
          "content": "Explain the benefits of renewable energy for businesses"
        }
      ],
      "cache": {
        "type": "exact_match",
        "ttl": 3600
      }
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  openai = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = openai.chat.completions.create(
      model="openai/gpt-4o",
      messages=[
          {
              "role": "user",
              "content": "Explain the benefits of renewable energy for businesses"
          }
      ],
      extra_body={
          "cache": {
              "type": "exact_match",
              "ttl": 3600
          }
      }
  )
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const openai = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: "Explain the benefits of renewable energy for businesses",
      },
    ],
    cache: {
      type: "exact_match",
      ttl: 3600,
    },
  });
  ```
</CodeGroup>

## Troubleshooting

**Low cache hit rate**

* Ensure every parameter is identical across requests (temperature, max\_tokens, etc.)
* Check that the TTL isn't too short for your use case
* Verify that the message text is identical character for character (matching is case-sensitive)
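
One common cause of near-miss requests is stray whitespace in user input. The sketch below normalizes message text on the client before sending it, so that trivially different inputs produce identical requests. `normalize_messages` is a hypothetical helper, not part of any SDK, and it deliberately leaves casing untouched because lowercasing a prompt can change its meaning.

```python
def normalize_messages(messages: list) -> list:
    """Trim and collapse whitespace in string message content."""
    normalized = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # " ".join(s.split()) strips the ends and collapses internal runs.
            message = {**message, "content": " ".join(content.split())}
        normalized.append(message)
    return normalized


raw = [{"role": "user", "content": "  Explain   renewable energy\n"}]
print(normalize_messages(raw))
# [{'role': 'user', 'content': 'Explain renewable energy'}]
```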

**Cache not working**

* Confirm `type: "exact_match"` is specified
* Check response headers for cache status

**Performance issues**

* Use a shorter TTL for dynamic content
* Consider cache warming for predictable requests
* Monitor cache hit/miss ratios
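
Hit and miss counts can be tracked with a small client-side counter. This is only a sketch: `CacheStats` is a hypothetical class, and how you decide whether a given response was a hit (for example from the response headers, whose names are not listed on this page) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        """Count one response as a cache hit or miss."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        """Fraction of recorded responses that were hits (0.0 if none recorded)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = CacheStats()
for was_hit in [False, True, True, True]:
    stats.record(was_hit)
print(f"{stats.hit_rate:.0%}")  # 75%
```

A persistently low rate suggests revisiting the TTL or checking that requests really are identical.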

## Limitations

* **Exact match only**: Any parameter change creates a new cache key
* **Case sensitive**: "Hello" and "hello" produce different cache keys
* **No semantic matching**: Similar but not identical requests won't match
* **Storage limits**: Very large responses consume more cache space
* **TTL constraints**: Minimum 1 second, maximum 259200 seconds (3 days)

## Best Practices

* Set TTL based on content freshness requirements
* Use cache for repeated, deterministic requests
* Monitor cache hit rates to optimize TTL values
* Avoid caching personalized or time-sensitive content
* Test cache behavior in development before production
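
One way to apply these guidelines is to attach the `cache` object only when a request looks safe to cache. The sketch below is one possible policy, not a rule from this page: `with_cache` is a hypothetical helper, and treating `temperature == 0` as the signal for a deterministic request is an assumption you may want to relax.

```python
def with_cache(params: dict, ttl: int = 1800, personalized: bool = False) -> dict:
    """Return request kwargs, adding `cache` via `extra_body` only when appropriate."""
    # Assumption: a request counts as deterministic only when temperature is 0.
    deterministic = params.get("temperature", 1) == 0
    if personalized or not deterministic:
        return dict(params)
    extra_body = {
        **params.get("extra_body", {}),
        "cache": {"type": "exact_match", "ttl": ttl},
    }
    return {**params, "extra_body": extra_body}


request = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Explain renewable energy"}],
    "temperature": 0,
}
print("extra_body" in with_cache(request))                     # True
print("extra_body" in with_cache(request, personalized=True))  # False
```

With the Python client shown earlier, the result can be unpacked directly, as in `openai.chat.completions.create(**with_cache(request))`.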
