This page describes Anthropic’s Prompt Caching feature. To learn more about Anthropic models, see Anthropic Overview.

Quick Start

Cache frequently used context (system prompts, large documents, code bases) to reduce costs by up to 90% and latency by up to 85%.
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "Write a function to parse JSON"
      }
    ],
    "max_tokens": 1024
  }'

How It Works

Prompt caching stores frequently used content blocks on Anthropic’s servers for reuse across requests:
  1. Mark content for caching: Add cache_control: { type: "ephemeral" } to text blocks
  2. First request: Content is processed normally and cached (cache write)
  3. Subsequent requests: Cached content is reused (cache read); you can confirm this from the response usage, as shown below
  4. Cache lifetime: 5 minutes from last use (automatically managed)
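
You can verify cache behavior by inspecting the usage object in the response. On Anthropic's native API, cache activity is reported in the cache_creation_input_tokens and cache_read_input_tokens fields; whether the proxy passes these fields through unchanged is an assumption to verify against your own responses. A minimal sketch using jq:

# Re-send the Quick Start request and print only the usage object.
# The cache_* field names assume Anthropic-style usage reporting is
# passed through by the proxy; check an actual response to confirm.
curl -s -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      { "role": "user", "content": "Write a function to parse JSON" }
    ],
    "max_tokens": 1024
  }' | jq '.usage'

On the first request expect a cache write; on an identical request within the cache lifetime expect a cache read instead. Note that this short sample prompt is below the minimum cacheable size (see Configuration), so in practice you will only see cache activity with a longer prefix.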

Configuration

Mark content blocks for caching by adding the cache_control parameter:
Parameter   Type          Required   Description
type        "ephemeral"   Yes        Only supported cache type
Cache placement rules:
  • Add cache_control to the last message or content block you want cached
  • Everything up to and including that block is included in the cache
  • Minimum cacheable content: 1024 tokens (~800 words) for most models; Haiku models require 2048 tokens
  • Maximum: 4 cache breakpoints per request (see the example below)
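
For example, the two-breakpoint request below (a sketch with placeholder text) caches a stable system prompt and a large document as separate prefixes, so refreshing the document does not invalidate the cached system prompt:

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "[Stable system instructions here...]",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "[Large document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "Summarize the document above."
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'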

Supported Models

Prompt caching is available on all current Claude Opus, Sonnet, and Haiku models. For the complete list of supported models, see Anthropic's official documentation. Provider availability: all models that support prompt caching are available through the anthropic, aws (Amazon Bedrock), and google (Google Vertex AI) providers.

Use Cases

Static System Prompts

Cache role definitions and instructions that don’t change:
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "How do I read a CSV file?"
      }
    ],
    "max_tokens": 1024
  }'

Large Document Context

Cache documents, codebases, or knowledge bases:
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "How do I authenticate with the API?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
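
A cache read requires the prefix up to the breakpoint to match the earlier request exactly, so a follow-up question resends the document block unchanged, with the same cache_control marker, and only the trailing question differs:

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "What endpoints are available?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'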

Multi-turn Conversations

Cache conversation history for long interactions:
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "What is Python?"
      },
      {
        "role": "assistant",
        "content": "Python is a high-level programming language..."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are its main features?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "assistant",
        "content": "Pythons main features include..."
      },
      {
        "role": "user",
        "content": "Can you give me a code example?"
      }
    ],
    "max_tokens": 1024
  }'
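
As the conversation grows, one pattern (sketched below with illustrative message text) is to move the cache_control marker to the most recent user message on each turn; everything before the marker becomes the cached prefix, so each new turn only processes the new tokens. Earlier markers can be removed or kept, up to the four-breakpoint limit:

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "What is Python?"
      },
      {
        "role": "assistant",
        "content": "Python is a high-level programming language..."
      },
      {
        "role": "user",
        "content": "What are its main features?"
      },
      {
        "role": "assistant",
        "content": "The main features of Python include..."
      },
      {
        "role": "user",
        "content": "Can you give me a code example?"
      },
      {
        "role": "assistant",
        "content": "[Previous assistant reply here...]"
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Can you explain that example step by step?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'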

RAG with Document Collections

Cache retrieved documents for multiple queries:
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers based on provided context."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Context:\n[Retrieved document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "Question: What is the main topic of these documents?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'