Access Claude models (Opus 4.5, Sonnet 4.5, and Haiku 4.5) through the AI Router with advanced message APIs, tool use capabilities, and intelligent model routing. All Claude models are available with consistent request formatting and pricing across multiple providers.
Claude models use the provider slug format: anthropic/model-name. For example: anthropic/claude-opus-4-5-20251101
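The slug can be passed directly as the model identifier in an OpenAI-compatible request body. A minimal sketch, assuming the router endpoint and header shapes shown in the curl examples on this page:

```javascript
// Minimal request body using the provider slug format "anthropic/model-name".
const body = {
  model: "anthropic/claude-opus-4-5-20251101", // provider slug
  messages: [{ role: "user", content: "Hello, Claude!" }],
  max_tokens: 1024, // required for Anthropic models
};

// To send it (requires a valid ORQ_API_KEY):
// const res = await fetch("https://api.orq.ai/v2/router/chat/completions", {
//   method: "POST",
//   headers: {
//     Authorization: `Bearer ${process.env.ORQ_API_KEY}`,
//     "Content-Type": "application/json",
//   },
//   body: JSON.stringify(body),
// });
```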
Supported Models

Prompt caching is available on all current Claude Opus, Sonnet, and Haiku models. For the complete list of supported models, see Anthropic's official documentation.

Provider availability: all models supporting prompt caching are available through the anthropic, aws, and google providers.

Use Cases
Static System Prompts
Cache role definitions and instructions that don’t change.
```shell
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      { "role": "user", "content": "How do I read a CSV file?" }
    ],
    "max_tokens": 1024
  }'
```
Large Document Context
Cache documents, codebases, or knowledge bases for reuse across multiple queries.
```shell
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          { "type": "text", "text": "How do I authenticate with the API?" }
        ]
      }
    ],
    "max_tokens": 1024
  }'
```
Multi-turn Conversations
Cache conversation history for long interactions to reduce processing time and costs on subsequent messages.
```shell
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      { "role": "user", "content": "What is Python?" },
      { "role": "assistant", "content": "Python is a high-level programming language..." },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are its main features?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      { "role": "assistant", "content": "The main features of Python include..." },
      { "role": "user", "content": "Can you give me a code example?" }
    ],
    "max_tokens": 1024
  }'
```
RAG with Document Collections
Cache retrieved documents for multiple queries in retrieval-augmented generation scenarios.
```shell
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers based on provided context."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Context:\n[Retrieved document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          { "type": "text", "text": "Question: What is the main topic of these documents?" }
        ]
      }
    ],
    "max_tokens": 1024
  }'
```
Important: Always include the signature field when passing reasoning content back to the API. The signature cryptographically verifies the reasoning was generated by the model and is required for multi-turn conversations.
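A sketch of what echoing reasoning content back looks like in a follow-up request. The thinking-block shape (a `thinking` content block carrying a `signature` field) follows Anthropic's format; the exact field names accepted by the router are an assumption, and the signature placeholder stands in for the value returned by the previous response:

```javascript
// Prior assistant turn, replayed verbatim with its reasoning and signature.
const previousTurn = {
  role: "assistant",
  content: [
    {
      type: "thinking",
      thinking: "First, identify the delivery channels and failure modes...",
      // Echo the signature exactly as returned by the model; it verifies
      // the reasoning is authentic and must not be edited or regenerated.
      signature: "<signature-from-previous-response>",
    },
    { type: "text", text: "Here is the design I recommend." },
  ],
};

const followUp = {
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [
    { role: "user", content: "Design a notification system" },
    previousTurn,
    { role: "user", content: "Now add rate limiting" },
  ],
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 8000 },
};
```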
Combine with prompt caching for repeated contexts
Cache system prompts and context to reduce costs and latency when using extended thinking:
```javascript
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are a system architect...",
          // Cache this block
          cache_control: { type: "ephemeral" },
        },
      ],
    },
    { role: "user", content: "Design a notification system" },
  ],
  max_tokens: 16000, // required; must exceed the thinking budget
  thinking: { type: "enabled", budget_tokens: 8000 },
});
```
Configuration & Best Practices
| Aspect | Guidance | Details |
| --- | --- | --- |
| `thinking.type` | Set to `"enabled"` | Enables extended thinking |
| `thinking.budget_tokens` | Set based on complexity | Min: 1024; must be less than `max_tokens`. Billed as output tokens. |
Supported Models: Extended thinking is available on Claude Opus 4.5 (recommended), Sonnet 4.5, and newer models. Available through anthropic/, aws/, and google/ providers. For the complete list, see Anthropic’s documentation.
Note: max_tokens is required for Anthropic models. Typical values: 1024 for responses, 4096+ for long content.
Do not use temperature and top_p together on newer Anthropic models. Using both parameters simultaneously will result in an API error. Choose one or the other.
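A short sketch of the two valid shapes, using one example model slug from this page. Each request sets exactly one sampling parameter; sending both in the same request is the error case described above:

```javascript
// Valid: temperature alone controls sampling randomness.
const withTemperature = {
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Write a haiku about Python" }],
  max_tokens: 1024,
  temperature: 0.7,
};

// Also valid: top_p (nucleus sampling) alone.
const withTopP = {
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Write a haiku about Python" }],
  max_tokens: 1024,
  top_p: 0.9,
};

// Invalid on newer Anthropic models: a request body containing both
// temperature and top_p will be rejected with an API error.
```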