This page describes Anthropic’s Extended Thinking feature. To learn more about Anthropic models, see Anthropic Overview.

Quick Start

Enable deep reasoning for complex problems by allocating a token budget for internal analysis before the model generates its response.
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-5-20251101",
    "messages": [
      {
        "role": "user",
        "content": "Design a distributed rate limiting system for 1M requests/second"
      }
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "max_tokens": 2048
  }'
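
The response follows the OpenAI chat completion shape, with the reasoning exposed as flat fields on the assistant message. A sketch of the fields this guide relies on (based on the mapping example in the next section; treat it as an illustration, not a full schema):
// Assumed shape of the assistant message when thinking is enabled
interface ThinkingMessage {
  role: "assistant";
  content: string | null;        // the visible answer text
  reasoning?: string;            // the model's internal thinking
  reasoning_signature?: string;  // verifies the reasoning's origin
  redacted_reasoning?: string;   // encrypted thinking, if safety-flagged
}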

Multi-turn Conversations

When building chat applications with extended thinking, map the reasoning content, along with its signature, back into the conversation history:
import OpenAI from "openai";

// Point the SDK at the orq.ai proxy from the Quick Start example
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy"
});

const messages = [
  { role: "user", content: "Design a rate limiting system" }
];

const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});

// Map response to assistant message
const msg = response.choices[0].message;
const contentParts = [];

// Add reasoning if present (flat field in response)
if (msg.reasoning) {
  contentParts.push({
    type: "reasoning",
    reasoning: msg.reasoning,
    signature: msg.reasoning_signature
  });
}

// Add redacted reasoning if present (flat field in response)
if (msg.redacted_reasoning) {
  contentParts.push({
    type: "redacted_reasoning",
    data: msg.redacted_reasoning
  });
}

// Add main text content
if (msg.content) {
  contentParts.push({
    type: "text",
    text: msg.content
  });
}

const assistantMessage = {
  role: "assistant",
  content: contentParts
};

// Continue conversation
messages.push(assistantMessage);
messages.push({ role: "user", content: "How would you handle 10M req/s?" });

const followUp = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});
Important: Always include the signature field when passing reasoning content back to the API. The signature cryptographically verifies the reasoning was generated by the model and is required for multi-turn conversations.
If the model’s reasoning is flagged by safety systems, it will be returned as redacted_reasoning (encrypted). Pass this back with type: "redacted_reasoning" so the model can decrypt and continue without losing context.

Configuration

Parameter              | Type      | Required | Description
---------------------- | --------- | -------- | ------------------------------------------
thinking.type          | "enabled" | Yes      | Enables extended thinking
thinking.budget_tokens | number    | Yes      | Tokens allocated for thinking (1024-64000)
Budget recommendations:
  • 2000-5000: Code reviews, moderate analysis
  • 5000-10000: System design, strategic planning
  • 10000+: Research, complex multi-step reasoning
Important: Budget tokens are billed as output tokens and count toward total usage.
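Because thinking tokens land in the output count, you can monitor spend through the standard usage object. A quick sketch, assuming the proxy returns OpenAI-style usage fields:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [{ role: "user", content: "Review this schema design" }],
  thinking: { type: "enabled", budget_tokens: 4000 },
  max_tokens: 1024
});

// completion_tokens covers both the thinking and the visible answer
console.log(response.usage?.completion_tokens);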

Supported Models

Extended thinking is available on Claude Opus 4.5 (recommended), Sonnet 4.5, and newer models. For the complete list, see Anthropic’s documentation. Provider availability: anthropic/, aws/, google/

When to Use

Best for:
  • System architecture and design decisions
  • Code optimization and performance analysis
  • Strategic planning with constraints
  • Complex mathematical or logical problems
  • Multi-step analysis requiring deep reasoning
Skip for:
  • Simple queries or factual questions
  • Time-sensitive operations (adds 2-4s per 1000 tokens)
  • High-volume, low-complexity tasks
  • Budget-constrained scenarios
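One way to apply these guidelines in code is to gate extended thinking on task complexity. A minimal sketch (the isComplex heuristic is hypothetical; substitute your own routing logic):
// Hypothetical heuristic: only enable thinking for complex tasks
const isComplex = (task: string) =>
  /design|architect|optimi[sz]e|plan/i.test(task);

const buildRequest = (task: string) => ({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [{ role: "user", content: task }],
  max_tokens: 2048,
  // Skip the thinking cost and latency for simple queries
  ...(isComplex(task)
    ? { thinking: { type: "enabled", budget_tokens: 8000 } }
    : {})
});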

Best Practices

Choose budget based on complexity:
const budgets: Record<string, number> = {
  code_review: 4000,
  system_design: 8000,
  research: 12000,
};

// Default to a mid-range budget for unlisted task types
const getBudget = (taskType: string): number => budgets[taskType] ?? 5000;
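The helper can then feed the request directly:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: getBudget("system_design") },
  max_tokens: 2048
});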
Use clear, structured prompts:
// Good: Encourages systematic thinking
"Analyze this architecture design, considering scalability, cost, and maintainability"

// Less effective: Vague request
"Look at this design"
Combine with prompt caching for repeated contexts:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [
    {
      role: "system",
      content: [{
        type: "text",
        text: "You are a system architect...", // Cache this
        cache_control: { type: "ephemeral" }
      }]
    },
    { role: "user", content: "Design a notification system" }
  ],
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});
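On later calls that reuse the same system prompt, the cached prefix is typically read back at a reduced input rate, which helps offset the extra output cost of thinking tokens in repeated workflows.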