# Prompt caching for reduced token costs

> Cache repeated prompt prefixes at the provider level to reduce input token costs and latency. Supported on Anthropic Claude models.

## Overview

Prompt Caching is a provider-level feature that caches a prompt prefix so that **repeated requests** reusing that prefix are charged a reduced rate for the cached input tokens.

This is most effective when your requests share a **large, stable prefix**:

* a long system prompt
* a reference document
* a tool definition list

Unlike [Response Caching](/docs/proxy/cache), which serves a stored response for identical requests, Prompt Caching still calls the model on every request, at a reduced cost. Both can be used together.

## Supported Models

Prompt Caching is available on **[Anthropic](/docs/integrations/providers/anthropic)** models: Claude Haiku, Sonnet, and Opus.

## How to Enable Prompt Caching

Add a `cache_control` object to any message part you want to mark as cacheable:

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "cache_control": { "type": "ephemeral" }
}
```

`"ephemeral"` is the only supported type. You can place it on:

* System message text parts
* User message text parts
* User message images, documents, and files (including PDFs)
* Tool result content
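
As a rough illustration of the placement rule above, the sketch below attaches the marker to a single content part. `mark_cacheable` is a hypothetical helper name, not part of any SDK, and the prompt text is a placeholder; the code only builds the request payload and does not call the API.

```python
from copy import deepcopy


def mark_cacheable(part: dict) -> dict:
    """Return a copy of a content part with the ephemeral cache marker added.

    Hypothetical helper for illustration; not part of any SDK.
    """
    marked = deepcopy(part)  # leave the caller's dict untouched
    marked["cache_control"] = {"type": "ephemeral"}
    return marked


# A system message whose single text part is marked as cacheable.
system_message = {
    "role": "system",
    "content": [
        mark_cacheable({"type": "text", "text": "<long, stable system prompt>"}),
    ],
}
```

The same helper could be applied to any of the part types listed above, since the marker sits on the part itself rather than on the message.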

## Minimum Token Thresholds

Caching only activates once the marked content reaches a minimum token count, which depends on the model. Requests below the threshold are processed normally at full cost.

| Model                                                     | Minimum tokens |
| --------------------------------------------------------- | -------------- |
| Claude Opus 4.6, Opus 4.5                                 | 4,096          |
| Claude Sonnet 4.6                                         | 2,048          |
| Claude Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4, Sonnet 3.7 | 1,024          |
| Claude Haiku 4.5                                          | 4,096          |
| Claude Haiku 3.5, Haiku 3                                 | 2,048          |
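
A pre-flight check against the table above might look like the following sketch. The dictionary keys are illustrative labels taken from the table, not API model identifiers, and `meets_cache_threshold` is a hypothetical helper. The token count has to come from a tokenizer or estimate of your own, and the threshold is treated as inclusive, which is an assumption.

```python
# Minimum token thresholds copied from the table above.
# Keys are illustrative labels, not API model identifiers.
MIN_CACHE_TOKENS = {
    "Claude Opus 4.6": 4096,
    "Claude Opus 4.5": 4096,
    "Claude Sonnet 4.6": 2048,
    "Claude Sonnet 4.5": 1024,
    "Claude Opus 4.1": 1024,
    "Claude Opus 4": 1024,
    "Claude Sonnet 4": 1024,
    "Claude Sonnet 3.7": 1024,
    "Claude Haiku 4.5": 4096,
    "Claude Haiku 3.5": 2048,
    "Claude Haiku 3": 2048,
}


def meets_cache_threshold(model_label: str, token_count: int) -> bool:
    """Return True if token_count is at least the model's minimum.

    Assumption: the minimum is inclusive. Raises KeyError for unknown labels.
    """
    return token_count >= MIN_CACHE_TOKENS[model_label]


# A 1,500-token prefix clears the Sonnet 4.5 minimum but not the Sonnet 4.6 one.
print(meets_cache_threshold("Claude Sonnet 4.5", 1500))
print(meets_cache_threshold("Claude Sonnet 4.6", 1500))
```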

## Cache TTL

The optional `ttl` field inside `cache_control` controls how long cached content persists before expiring.

| Value            | Duration                |
| ---------------- | ----------------------- |
| `"5m"` (default) | 5 minutes from last use |
| `"1h"`           | 1 hour                  |

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}
```
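
To avoid typos in the TTL string, the object can be built through a small validating function. This is a minimal sketch: `build_cache_control` and `ALLOWED_TTLS` are hypothetical names, and the only values accepted are the two listed in the table above.

```python
ALLOWED_TTLS = {"5m", "1h"}  # the two values listed in the table above


def build_cache_control(ttl: str = "5m") -> dict:
    """Build a cache_control object, rejecting unknown TTL values.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if ttl not in ALLOWED_TTLS:
        raise ValueError(f"unsupported ttl {ttl!r}; expected one of {sorted(ALLOWED_TTLS)}")
    control = {"type": "ephemeral"}
    if ttl != "5m":
        # "5m" is the default, so the field is only written when it differs.
        control["ttl"] = ttl
    return control


print(build_cache_control())      # {'type': 'ephemeral'}
print(build_cache_control("1h"))  # {'type': 'ephemeral', 'ttl': '1h'}
```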

## Example

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "anthropic/claude-sonnet-4-6",
      "messages": [
        {
          "role": "system",
          "content": [
            {
              "type": "text",
              "text": "You are a senior legal assistant. The following is our complete contract template library...",
              "cache_control": { "type": "ephemeral" }
            }
          ]
        },
        {
          "role": "user",
          "content": "Summarize clause 7 of the NDA template."
        }
      ]
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="anthropic/claude-sonnet-4-6",
      messages=[
          {
              "role": "system",
              "content": [
                  {
                      "type": "text",
                      "text": "You are a senior legal assistant. The following is our complete contract template library...",
                      "cache_control": {"type": "ephemeral"},
                  }
              ],
          },
          {"role": "user", "content": "Summarize clause 7 of the NDA template."},
      ],
  )
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-sonnet-4-6",
    messages: [
      {
        role: "system",
        content: [
          {
            type: "text",
            text: "You are a senior legal assistant. The following is our complete contract template library...",
            // @ts-ignore - cache_control is not part of the OpenAI SDK types
            cache_control: { type: "ephemeral" },
          },
        ],
      },
      { role: "user", content: "Summarize clause 7 of the NDA template." },
    ],
  });
  ```
</CodeGroup>

## Usage in the Response

When a cache hit occurs, the response `usage` object reports the number of prompt tokens served from the cache in `prompt_tokens_details.cached_tokens`:

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 180,
    "total_tokens": 1380,
    "prompt_tokens_details": {
      "cached_tokens": 1024
    }
  }
}
```
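
One way to monitor cache effectiveness is to compute the share of prompt tokens that were served from the cache. The sketch below works on a plain dictionary shaped like the `usage` object above; `cached_fraction` is a hypothetical helper, and it assumes that `prompt_tokens` includes the cached tokens, as the sample numbers suggest.

```python
def cached_fraction(usage: dict) -> float:
    """Return cached_tokens / prompt_tokens, or 0.0 when nothing can be computed.

    Assumption: prompt_tokens already includes the cached tokens.
    """
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}  # may be missing or null
    cached_tokens = details.get("cached_tokens", 0)
    if prompt_tokens == 0:
        return 0.0
    return cached_tokens / prompt_tokens


# The sample usage object from above.
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 180,
    "total_tokens": 1380,
    "prompt_tokens_details": {"cached_tokens": 1024},
}

print(f"{cached_fraction(usage):.1%}")  # 85.3%
```

A fraction of 0.0 on a request that was expected to hit the cache usually means the prefix changed, the entry expired, or the marked content fell below the minimum token threshold.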
