This page describes Anthropic’s Prompt Caching feature. To learn more about Anthropic models, see Anthropic Overview.
Quick Start
Cache frequently used context (system prompts, large documents, code bases) to reduce costs by up to 90% and latency by up to 85%.
How It Works
Prompt caching stores frequently used content blocks on Anthropic’s servers for reuse across requests:
- Mark content for caching: Add `cache_control: { type: "ephemeral" }` to text blocks
- First request: Content is processed normally and cached (cache write)
- Subsequent requests: Cached content is reused (cache read)
- Cache lifetime: 5 minutes from last use (automatically managed)
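The flow above can be sketched as a request body. This is a minimal sketch assuming the Anthropic Messages API shape; `buildCachedRequest` is a hypothetical helper and the model id is a placeholder, not a real identifier.

```typescript
// Sketch: shaping a request body so a large system prompt is cached.
// buildCachedRequest is a hypothetical helper, not part of any SDK.
type CacheControl = { type: "ephemeral" };
type TextBlock = { type: "text"; text: string; cache_control?: CacheControl };

function buildCachedRequest(systemPrompt: string, userMessage: string) {
  return {
    model: "claude-example-model", // placeholder: use any cache-capable model
    max_tokens: 1024,
    // The cache_control marker caches this block (and everything before it).
    system: [
      {
        type: "text",
        text: systemPrompt,
        cache_control: { type: "ephemeral" },
      },
    ] as TextBlock[],
    messages: [{ role: "user", content: userMessage }],
  };
}

// The first call with this body is a cache write; later calls that reuse
// the same prefix within the cache lifetime are served as cache reads.
const body = buildCachedRequest("Long reusable system prompt...", "Hello");
console.log(JSON.stringify(body.system[0].cache_control)); // {"type":"ephemeral"}
```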
Configuration
Mark content blocks for caching by adding the `cache_control` parameter:
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | "ephemeral" | Yes | Only supported cache type |
- Add `cache_control` to the last message or content block you want cached
- Everything up to that point is included in the cache
- Minimum cacheable content: 1024 tokens (~800 words)
- Maximum: 4 cache breakpoints per request
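The two limits above can be checked before sending a request. This is a hypothetical pre-flight helper, not an SDK function, and the ~4-characters-per-token estimate is a rough stand-in for the model's real tokenizer.

```typescript
// Sketch: warn when a request is likely to violate the caching limits.
// checkCacheLimits is a hypothetical helper; token counts are estimated.
type Block = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

function checkCacheLimits(blocks: Block[]): string[] {
  const warnings: string[] = [];
  const breakpoints = blocks.filter((b) => b.cache_control).length;
  if (breakpoints > 4) {
    warnings.push(`${breakpoints} cache breakpoints exceeds the maximum of 4`);
  }
  // Everything up to a breakpoint is cached together, so measure the prefix.
  let chars = 0;
  for (const b of blocks) {
    chars += b.text.length;
    if (b.cache_control && chars / 4 < 1024) {
      warnings.push("cached prefix likely below the 1024-token minimum");
    }
  }
  return warnings;
}

const small: Block[] = [
  { type: "text", text: "short prompt", cache_control: { type: "ephemeral" } },
];
console.log(checkCacheLimits(small).length); // 1
```

Content below the minimum is simply processed without caching, so a check like this is a cost-visibility aid rather than a correctness requirement.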
Supported Models
Prompt caching is available on all current Claude Opus, Sonnet, and Haiku models. For the complete list of supported models, see Anthropic’s official documentation. Provider availability: All models supporting prompt caching are available through the anthropic, aws, and google providers.