Overview
Prompt Caching is a provider-level feature that caches prompt prefixes so that repeated requests are billed at a reduced rate. It is most effective when your requests share a large, stable prefix, such as:
- a long system prompt
- a reference document
- a tool definition list
Supported Models
Prompt Caching is available on Anthropic models: Claude Haiku, Sonnet, and Opus.
How to Enable Prompt Caching
Add a cache_control object to any message part you want to mark as cacheable:
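As a sketch, the cache_control object sits alongside the part's content. The surrounding message-part shape shown here is an assumption about the request format; the cache_control object itself is what this feature specifies:

```python
# A system message text part marked as cacheable.
# The fields around cache_control (type, text) are illustrative;
# cache_control is the part this feature adds.
system_part = {
    "type": "text",
    "text": "You are a support agent. Follow the policy manual below...",
    "cache_control": {"type": "ephemeral"},
}
```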
"ephemeral" is the only supported type. You can place it on:
- System message text parts
- User message text parts
- User message images, documents, and files (including PDFs)
- Tool result content
Minimum Token Thresholds
Caching only activates once the marked content exceeds a minimum token count. Requests below the threshold are processed normally at full cost.

| Model | Minimum tokens |
|---|---|
| Claude Opus 4.6, Opus 4.5 | 4,096 |
| Claude Sonnet 4.6 | 2,048 |
| Claude Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4, Sonnet 3.7 | 1,024 |
| Claude Haiku 4.5 | 4,096 |
| Claude Haiku 3.5, Haiku 3 | 2,048 |
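The table above can be encoded as a lookup if you want to check client-side whether marked content is large enough to cache. This is a hypothetical helper, not part of any API, and the model identifier strings are assumptions:

```python
# Minimum cacheable token counts per model (from the table above).
# Model ID strings are illustrative; match them to your provider's naming.
MIN_CACHE_TOKENS = {
    "claude-opus-4.6": 4096,
    "claude-opus-4.5": 4096,
    "claude-sonnet-4.6": 2048,
    "claude-sonnet-4.5": 1024,
    "claude-opus-4.1": 1024,
    "claude-opus-4": 1024,
    "claude-sonnet-4": 1024,
    "claude-sonnet-3.7": 1024,
    "claude-haiku-4.5": 4096,
    "claude-haiku-3.5": 2048,
    "claude-haiku-3": 2048,
}

def meets_cache_threshold(model: str, token_count: int) -> bool:
    """True if the marked content is large enough for caching to activate."""
    # Fall back to the largest threshold for unknown models (conservative).
    return token_count >= MIN_CACHE_TOKENS.get(model, 4096)
```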
Cache TTL
The ttl parameter controls how long cached content persists before expiring.
| Value | Duration |
|---|---|
"5m" (default) | 5 minutes from last use |
"1h" | 1 hour |
Example
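A minimal sketch of a request that caches a long system prompt. The model identifier and the chat-completions message shape are assumptions about the request format; the cache_control placement on the system text part is what this document describes:

```python
# A large, stable prefix worth caching (repeated filler stands in for
# real instructions; real content must exceed the model's token minimum).
long_system_prompt = "You are a meticulous legal assistant. " * 200

payload = {
    "model": "claude-sonnet-4.5",  # assumed model identifier
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": long_system_prompt,
                    # Mark the stable prefix as cacheable; TTL defaults to "5m".
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize clause 4 of the attached contract."},
    ],
}
# Send `payload` with your usual HTTP client or SDK. Subsequent requests that
# reuse the same prefix within the TTL are billed at the reduced cached rate.
```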
Usage in the response
When a cache hit occurs, the response usage object reflects it under prompt_tokens_details.cached_tokens:
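For instance, given a parsed JSON response shaped like the fragment below (the token counts are illustrative), the cached portion can be read off directly:

```python
# Illustrative response fragment; numeric values are made up.
response = {
    "usage": {
        "prompt_tokens": 5210,
        "completion_tokens": 180,
        "prompt_tokens_details": {"cached_tokens": 4800},
    }
}

cached = response["usage"]["prompt_tokens_details"]["cached_tokens"]
print(f"{cached} of {response['usage']['prompt_tokens']} prompt tokens were served from cache")
```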