This page describes Anthropic’s Extended Thinking feature. To learn more about Anthropic models, see Anthropic Overview.

Quick Start

Enable deep reasoning for complex problems by allocating a token budget for internal analysis before the model generates its response.
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-5-20251101",
    "messages": [
      {
        "role": "user",
        "content": "Design a distributed rate limiting system for 1M requests/second"
      }
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "max_tokens": 2048
  }'
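
The response follows the OpenAI chat completion shape, with the reasoning exposed as flat fields on the assistant message. A sketch of the fields this guide relies on (based on the mapping example in the next section; treat it as an illustration, not a full schema):
// Assumed shape of the assistant message when thinking is enabled
interface ThinkingMessage {
  role: "assistant";
  content: string | null;        // the visible answer text
  reasoning?: string;            // the model's internal thinking
  reasoning_signature?: string;  // verifies the reasoning's origin
  redacted_reasoning?: string;   // encrypted thinking, if safety-flagged
}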

Multi-turn Conversations

When building chat applications with extended thinking, map the reasoning content, along with its signature, back into the conversation history:
import OpenAI from "openai";

// Point the SDK at the orq.ai proxy from the Quick Start example
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy"
});

const messages = [
  { role: "user", content: "Design a rate limiting system" }
];

const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});

// Map response to assistant message
const msg = response.choices[0].message;
const contentParts = [];

// Add reasoning if present (flat field in response)
if (msg.reasoning) {
  contentParts.push({
    type: "reasoning",
    reasoning: msg.reasoning,
    signature: msg.reasoning_signature
  });
}

// Add redacted reasoning if present (flat field in response)
if (msg.redacted_reasoning) {
  contentParts.push({
    type: "redacted_reasoning",
    data: msg.redacted_reasoning
  });
}

// Add main text content
if (msg.content) {
  contentParts.push({
    type: "text",
    text: msg.content
  });
}

const assistantMessage = {
  role: "assistant",
  content: contentParts
};

// Continue conversation
messages.push(assistantMessage);
messages.push({ role: "user", content: "How would you handle 10M req/s?" });

const followUp = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});
Important: Always include the signature field when passing reasoning content back to the API. The signature cryptographically verifies the reasoning was generated by the model and is required for multi-turn conversations.
If the model’s reasoning is flagged by safety systems, it will be returned as redacted_reasoning (encrypted). Pass this back with type: "redacted_reasoning" so the model can decrypt and continue without losing context.

Configuration

Parameter              | Type      | Required | Description
---------------------- | --------- | -------- | ------------------------------------------
thinking.type          | "enabled" | Yes      | Enables extended thinking
thinking.budget_tokens | number    | Yes      | Tokens allocated for thinking (1024-64000)
Budget recommendations:
  • 2000-5000: Code reviews, moderate analysis
  • 5000-10000: System design, strategic planning
  • 10000+: Research, complex multi-step reasoning
Important: Budget tokens are billed as output tokens and count toward total usage.
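Because thinking tokens land in the output count, you can monitor spend through the standard usage object. A quick sketch, assuming the proxy returns OpenAI-style usage fields:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [{ role: "user", content: "Review this schema design" }],
  thinking: { type: "enabled", budget_tokens: 4000 },
  max_tokens: 1024
});

// completion_tokens covers both the thinking and the visible answer
console.log(response.usage?.completion_tokens);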

Supported Models

Extended thinking is available on Claude Opus 4.5 (recommended), Sonnet 4.5, and newer models. For the complete list, see Anthropic’s documentation. Provider availability: anthropic/, aws/, google/

When to Use

Best for:
  • System architecture and design decisions
  • Code optimization and performance analysis
  • Strategic planning with constraints
  • Complex mathematical or logical problems
  • Multi-step analysis requiring deep reasoning
Skip for:
  • Simple queries or factual questions
  • Time-sensitive operations (adds 2-4s per 1000 tokens)
  • High-volume, low-complexity tasks
  • Budget-constrained scenarios
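One way to apply these guidelines in code is to gate extended thinking on task complexity. A minimal sketch (the isComplex heuristic is hypothetical; substitute your own routing logic):
// Hypothetical heuristic: only enable thinking for complex tasks
const isComplex = (task: string) =>
  /design|architect|optimi[sz]e|plan/i.test(task);

const buildRequest = (task: string) => ({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [{ role: "user", content: task }],
  max_tokens: 2048,
  // Skip the thinking cost and latency for simple queries
  ...(isComplex(task)
    ? { thinking: { type: "enabled", budget_tokens: 8000 } }
    : {})
});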

Best Practices

Choose budget based on complexity:
const budgets: Record<string, number> = {
  code_review: 4000,
  system_design: 8000,
  research: 12000,
};

// Default to a mid-range budget for unlisted task types
const getBudget = (taskType: string): number => budgets[taskType] ?? 5000;
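The helper can then feed the request directly:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: getBudget("system_design") },
  max_tokens: 2048
});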
Use clear, structured prompts:
// Good: Encourages systematic thinking
"Analyze this architecture design, considering scalability, cost, and maintainability"

// Less effective: Vague request
"Look at this design"
Combine with prompt caching for repeated contexts:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [
    {
      role: "system",
      content: [{
        type: "text",
        text: "You are a system architect...", // Cache this
        cache_control: { type: "ephemeral" }
      }]
    },
    { role: "user", content: "Design a notification system" }
  ],
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});
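On later calls that reuse the same system prompt, the cached prefix is typically read back at a reduced input rate, which helps offset the extra output cost of thinking tokens in repeated workflows.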