This page describes Anthropic’s Prompt Caching feature. To learn more about Anthropic models, see the Anthropic Overview.
Quick Start
Cache frequently used context (system prompts, large documents, codebases) to reduce costs by up to 90% and latency by up to 85%.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "Write a function to parse JSON"
      }
    ],
    "max_tokens": 1024
  }'
How It Works
Prompt caching stores frequently used content blocks on Anthropic’s servers for reuse across requests:
- Mark content for caching: add cache_control: { "type": "ephemeral" } to text blocks
- First request: content is processed normally and cached (cache write)
- Subsequent requests: cached content is reused (cache read)
- Cache lifetime: 5 minutes from last use (automatically managed)
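You can confirm whether a request wrote to or read from the cache by inspecting the token usage in the response. The field names below follow Anthropic’s usage reporting (cache_creation_input_tokens and cache_read_input_tokens), assuming the router passes them through unchanged; the token counts are illustrative only.
First request (cache write):
{
  "usage": {
    "input_tokens": 21,
    "cache_creation_input_tokens": 1050,
    "cache_read_input_tokens": 0,
    "output_tokens": 310
  }
}
Subsequent request within the cache lifetime (cache read):
{
  "usage": {
    "input_tokens": 21,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 1050,
    "output_tokens": 295
  }
}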
Configuration
Mark content blocks for caching by adding the cache_control parameter:
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | "ephemeral" | Yes | Only supported cache type |
| ttl | "5m" or "1h" | No | Cache duration (default: "5m") |
Cache TTL Options
The ttl parameter controls how long cached content persists:
"5m" (5 minutes) - Default cache duration
"1h" (1 hour) - Extended cache duration for longer-running workflows
{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}
The ttl option is only available for Anthropic Claude models.
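For example, a system prompt that should stay warm across a longer-running workflow can be cached for an hour. This mirrors the Quick Start request; only the cache_control block changes:
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral", "ttl": "1h" }
          }
        ]
      },
      {
        "role": "user",
        "content": "Write a function to parse JSON"
      }
    ],
    "max_tokens": 1024
  }'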
Cache placement rules:
- Add cache_control to the last message or content block you want cached
- Everything up to that point is included in the cache
- Minimum cacheable content: 1024 tokens (~800 words)
- Maximum: 4 cache breakpoints per request (see the example below)
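A sketch of multiple breakpoints (the prompt text here is illustrative): the system prompt and a large document each get their own breakpoint, so updating the document invalidates only the second cached prefix while the system prompt stays cached.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are a support assistant for the product described below.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Product manual:\n\n[Manual content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "How do I reset the device?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'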
Supported Models
Prompt caching is available on all current Claude Opus, Sonnet, and Haiku models.
For the complete list of supported models, see Anthropic’s official documentation.
Provider availability: All models supporting prompt caching are available through anthropic, aws, and google providers.
Use Cases
Static System Prompts
Cache role definitions and instructions that don’t change:
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "How do I read a CSV file?"
      }
    ],
    "max_tokens": 1024
  }'
Large Document Context
Cache documents, codebases, or knowledge bases:
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "How do I authenticate with the API?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
Multi-turn Conversations
Cache conversation history for long interactions:
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "What is Python?"
      },
      {
        "role": "assistant",
        "content": "Python is a high-level programming language..."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are its main features?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "assistant",
        "content": "The main features of Python include..."
      },
      {
        "role": "user",
        "content": "Can you give me a code example?"
      }
    ],
    "max_tokens": 1024
  }'
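On each follow-up request, move the cache_control marker to the newest user turn; because everything up to the marked block is included in the cache, the conversation history above it is then served as a cache read. For the next turn of the example above, the final user message would be re-marked like this (hypothetical snippet):
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Can you give me a code example?",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}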
RAG with Document Collections
Cache retrieved documents for multiple queries:
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers based on provided context."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Context:\n[Retrieved document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "Question: What is the main topic of these documents?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'