Anthropic Claude integration - Orq.ai Documentation

Setup API Key

To use Anthropic with Orq.ai, follow these steps:

Navigate to AI Gateway > BYOK
Find Anthropic in the list
Click the Configure button next to Anthropic
In the modal that opens, select Setup your own API Key
Enter a name for this configuration (e.g., “Anthropic Production”)
Paste your Anthropic API Key into the provided field
Click Save to complete the setup

Your Anthropic API key is now configured and ready to use through the AI Gateway.

Quick Start

Access Anthropic’s Claude models through the AI Gateway.

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "input": "Explain quantum computing in simple terms"
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "Explain quantum computing in simple terms",
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="anthropic/claude-sonnet-4-6",
    input="Explain quantum computing in simple terms",
)

print(response.output_text)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const stream = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "Tell me a story" }],
  max_tokens: 2048,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms",
        }
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)

Available Models

Orq.ai supports the Anthropic Claude models below through multiple providers, giving you options for availability and pricing:

Latest Models

Model	Context window	Strengths	Best for
`claude-opus-4-8`	1M	Latest Opus, highest intelligence	Coding, agentic tasks, complex reasoning
`claude-opus-4-7`	1M	Highest intelligence, extra-high reasoning effort	Coding, agentic tasks, complex reasoning
`claude-opus-4-6`	1M	High intelligence	Complex reasoning, research
`claude-sonnet-4-6`	1M	Best balance	Most tasks, coding
`claude-haiku-4-5-20251001`	200K	Fast responses	Simple tasks, chat

Provider Options

Anthropic models are available through multiple providers:

anthropic/: Direct Anthropic API
aws/: AWS Bedrock (enterprise features)
google/: Google Vertex AI (GCP integration)

// Use these model strings inside your responses.create() or chat.completions.create() call

// Direct Anthropic
model: "anthropic/claude-sonnet-4-6"

// AWS Bedrock
model: "aws/anthropic/claude-sonnet-4-6"

// Google Vertex AI
model: "google/anthropic/claude-opus-4-6"

For a complete list of supported models, see Supported Models.
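You can keep the provider prefixes above in a single lookup, so application code chooses a provider in one place. This is a minimal sketch: `MODEL_PREFIXES` and `model_for` are hypothetical helpers. The prefix patterns are copied from the examples above and are not an exhaustive list of providers.

```python
# Hypothetical helper: build Orq.ai model strings for a Claude model on each
# provider, using only the prefix patterns shown in the examples above.
MODEL_PREFIXES = {
    "anthropic": "anthropic/",      # Direct Anthropic API
    "aws": "aws/anthropic/",        # AWS Bedrock
    "google": "google/anthropic/",  # Google Vertex AI
}


def model_for(provider: str, model_name: str) -> str:
    """Return the gateway model string for a provider and a bare model name."""
    try:
        prefix = MODEL_PREFIXES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return prefix + model_name


print(model_for("aws", "claude-sonnet-4-6"))  # aws/anthropic/claude-sonnet-4-6
```

With this in place, switching providers means changing the first argument, not editing model strings throughout your code.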

Using the AI Gateway

Access Claude models (Opus, Sonnet, and Haiku) through the AI Gateway using the Responses, Chat Completions, or Anthropic Messages APIs, with tool use and intelligent model routing. All Claude models are available with consistent formatting and pricing across multiple providers.

Claude model IDs take the form provider/model-name, for example anthropic/claude-sonnet-4-6 for the direct Anthropic API.

Prerequisites

Before making requests to the AI Gateway, configure the environment and install the SDKs if you choose to use them.

Endpoint

POST https://api.orq.ai/v3/router/responses

Required Headers

Include the following headers in all requests:

Authorization: Bearer $ORQ_API_KEY
Content-Type: application/json

Getting an API Key:

Go to API Keys
Click Create API Key and copy it
Store it in your environment as ORQ_API_KEY
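If you call the endpoint without an SDK, every request needs the two headers listed above. The sketch below uses only the Python standard library. It reads the key from ORQ_API_KEY and builds the same request as the cURL examples, but does not send it. `require_api_key` and `build_request` are hypothetical helper names.

```python
import json
import os
import urllib.request

ROUTER_URL = "https://api.orq.ai/v3/router/responses"


def require_api_key() -> str:
    """Read ORQ_API_KEY from the environment and fail early if it is missing."""
    key = os.environ.get("ORQ_API_KEY")
    if not key:
        raise RuntimeError("ORQ_API_KEY is not set")
    return key


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the required headers."""
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        ROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {require_api_key()}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it: urllib.request.urlopen(build_request(...)).read()
```

Checking the key up front means a missing environment variable produces a clear error instead of a 401 response.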

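SDK Installation

Install the OpenAI SDK:

npm install openai
# or
yarn add openai

pip install openai

The Anthropic SDK examples also require the Anthropic SDK:

npm install @anthropic-ai/sdk

pip install anthropic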

Basic Usage

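If you already have working OpenAI SDK code, you only need to change three things: set base_url (baseURL in TypeScript) to https://api.orq.ai/v3/router, set api_key to your ORQ_API_KEY, and set model to a Claude model ID such as anthropic/claude-sonnet-4-6.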

Chat Completion

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "input": "Explain quantum computing in simple terms"
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "Explain quantum computing in simple terms",
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="anthropic/claude-sonnet-4-6",
    input="Explain quantum computing in simple terms",
)

print(response.output_text)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are an expert Python developer with deep knowledge of best practices.",
          cache_control: { type: "ephemeral" },
        },
      ],
    },
    {
      role: "user",
      content: "Write a function to parse JSON",
    },
  ],
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
    max_tokens=1024,
)

print(response.choices[0].message.content)

from anthropic import Anthropic
import os

client = Anthropic(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
)

print(message.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const message = await client.messages.create({
  model: "anthropic/claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain quantum computing in simple terms" }],
});

console.log(message.content[0].text);

Streaming

Stream responses for real-time output instead of waiting for the complete response:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const stream = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "Tell me a story",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

stream = client.responses.create(
    model="anthropic/claude-sonnet-4-6",
    input="Tell me a story",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const stream = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "Tell me a story" }],
  max_tokens: 2048,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Tell me a story"}],
    max_tokens=2048,
    stream=True,
)

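for chunk in stream:
    # Some chunks (for example, a final usage chunk) may have no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)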
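Sometimes you need both live output and the complete text, for example to log or store the reply. In that case, accumulate the deltas as they arrive. This minimal sketch handles a Responses API stream and assumes only the `response.output_text.delta` event type used above. `collect_output_text` is a hypothetical helper, and the check below uses stand-in `SimpleNamespace` events instead of a live stream.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_output_text(events: Iterable) -> str:
    """Concatenate text deltas from a Responses API stream into the final text."""
    parts = []
    for event in events:
        # Only text-delta events carry output text; other event types are skipped
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Offline check with stand-in events instead of a real stream
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once upon "),
    SimpleNamespace(type="response.output_text.delta", delta="a time"),
    SimpleNamespace(type="response.completed"),
]
print(collect_output_text(fake_stream))  # Once upon a time
```

With a real stream, pass the object returned by `client.responses.create(..., stream=True)` in place of `fake_stream`.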

Advanced Usage

Prompt Caching

Prompt caching is supported on the Chat Completions endpoint (/v3/router/chat/completions), so the examples below use Chat Completions.

Cache frequently used context (system prompts, large documents, codebases) to reduce costs by up to 90% and latency by up to 85%. For a full guide, see Prompt Caching.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "Write a function to parse JSON"
      }
    ],
    "max_tokens": 1024
  }'

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v3/router'
)

response = client.chat.completions.create(
    model='anthropic/claude-sonnet-4-6',
    messages=[
        {
            'role': 'system',
            'content': [
                {
                    'type': 'text',
                    'text': 'You are an expert Python developer with deep knowledge of best practices.',
                    'cache_control': {'type': 'ephemeral'}
                }
            ]
        },
        {
            'role': 'user',
            'content': 'Write a function to parse JSON'
        }
    ],
    max_tokens=1024
)

How It Works

Prompt caching stores frequently used content blocks on Anthropic’s servers for reuse across requests:

Mark content for caching: Add cache_control: { type: "ephemeral" } to text blocks
First request: Content is processed normally and cached (cache write)
Subsequent requests: Cached content is reused (cache read)
Cache lifetime: 5 minutes from last use by default, configurable with the ttl parameter (automatically managed)

Configuration

Mark content blocks for caching by adding the cache_control parameter:

Parameter	Type	Required	Description
`type`	`"ephemeral"`	Yes	Only supported cache type
`ttl`	`"5m"` or `"1h"`	No	Cache duration (default: `"5m"`)

Cache TTL Options

The ttl parameter controls how long cached content persists:

"5m" (5 minutes): Default cache duration
"1h" (1 hour): Extended cache duration for longer-running workflows

{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}
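Writing these blocks by hand gets repetitive. This small sketch produces a cacheable text block and rejects TTL values other than the two listed above. `cached_text_block` is a hypothetical name, and the resulting dict is meant to go into the `messages` list of a Chat Completions call.

```python
from typing import Optional

VALID_TTLS = {"5m", "1h"}  # the two values listed in the cache_control table


def cached_text_block(text: str, ttl: Optional[str] = None) -> dict:
    """Return a text content block marked for caching.

    Leaving ttl as None omits the field, so the default 5-minute lifetime applies.
    """
    cache_control = {"type": "ephemeral"}
    if ttl is not None:
        if ttl not in VALID_TTLS:
            raise ValueError(f"ttl must be one of {sorted(VALID_TTLS)}, got {ttl!r}")
        cache_control["ttl"] = ttl
    return {"type": "text", "text": text, "cache_control": cache_control}


# A system message whose prompt is cached for one hour
system_message = {
    "role": "system",
    "content": [cached_text_block("You are a system architect...", ttl="1h")],
}
```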

Cache placement rules

Add cache_control to the last message or content block you want cached
Everything up to that point is included in the cache
Maximum: 4 cache breakpoints per request
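A request with too many breakpoints can be caught before it is sent. This minimal sketch counts `cache_control` markers in message content blocks and enforces the limit of 4. The helper names are hypothetical. The sketch only inspects message content; Anthropic's own API also accepts `cache_control` on tool definitions, and those markers are not counted here.

```python
MAX_CACHE_BREAKPOINTS = 4  # per-request limit stated above


def count_cache_breakpoints(messages: list) -> int:
    """Count content blocks that carry a cache_control marker."""
    count = 0
    for message in messages:
        content = message.get("content")
        # Plain-string content cannot carry cache_control; only block lists can
        if isinstance(content, list):
            count += sum(1 for block in content if "cache_control" in block)
    return count


def check_cache_breakpoints(messages: list) -> None:
    """Raise ValueError if the request would exceed the breakpoint limit."""
    n = count_cache_breakpoints(messages)
    if n > MAX_CACHE_BREAKPOINTS:
        raise ValueError(f"{n} cache breakpoints; at most {MAX_CACHE_BREAKPOINTS} allowed")
```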

Minimum token thresholds

Caching only activates once the cached prefix (everything up to the breakpoint) reaches the model’s minimum length. Requests below the threshold are processed normally at full cost.

Model	Minimum tokens
Claude Opus 4.6, Opus 4.5	4,096
Claude Sonnet 4.6	2,048
Claude Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4, Sonnet 3.7	1,024
Claude Haiku 4.5	4,096
Claude Haiku 3.5, Haiku 3	2,048
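A request below the minimum falls back to normal pricing without an error, so it can help to check the threshold before relying on caching. This sketch transcribes the table above into a lookup. The model ID keys, including the handling of dated snapshot suffixes, are assumptions about model naming. The token count must come from your own estimate, because the snippet does not count tokens.

```python
import re
from typing import Optional

# Minimums transcribed from the table above; the ID keys are assumed names
CACHE_MINIMUM_TOKENS = {
    "claude-opus-4-6": 4096,
    "claude-opus-4-5": 4096,
    "claude-sonnet-4-6": 2048,
    "claude-sonnet-4-5": 1024,
    "claude-opus-4-1": 1024,
    "claude-opus-4": 1024,
    "claude-sonnet-4": 1024,
    "claude-3-7-sonnet": 1024,
    "claude-haiku-4-5": 4096,
    "claude-3-5-haiku": 2048,
    "claude-3-haiku": 2048,
}


def cache_minimum(model: str) -> Optional[int]:
    """Return the minimum cacheable length, or None for a model not in the table."""
    name = model.rsplit("/", 1)[-1]      # drop "anthropic/", "aws/anthropic/", ...
    name = re.sub(r"-\d{8}$", "", name)  # drop a dated suffix such as -20251001
    return CACHE_MINIMUM_TOKENS.get(name)


def will_cache(model: str, prefix_tokens: int) -> bool:
    """True if a prefix of this many tokens meets the model's caching minimum."""
    minimum = cache_minimum(model)
    return minimum is not None and prefix_tokens >= minimum
```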

Use Cases

Static System Prompts

Cache role definitions and instructions that don’t change.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "How do I read a CSV file?"
      }
    ],
    "max_tokens": 1024
  }'

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v3/router'
)

system_prompt = """You are an expert software engineer specializing in Python.
Your responses should be:
- Clear and concise
- Include code examples
- Follow PEP 8 style guidelines
- Include error handling"""

response = client.chat.completions.create(
    model='anthropic/claude-sonnet-4-6',
    messages=[
        {
            'role': 'system',
            'content': [
                {
                    'type': 'text',
                    'text': system_prompt,
                    'cache_control': {'type': 'ephemeral'}
                }
            ]
        },
        {
            'role': 'user',
            'content': 'How do I read a CSV file?'
        }
    ],
    max_tokens=1024
)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-6',
  messages: [
    {
      role: 'system',
      content: [
        {
          type: 'text',
          text: `You are an expert software engineer specializing in Python.
Your responses should be:
- Clear and concise
- Include code examples
- Follow PEP 8 style guidelines
- Include error handling`,
          cache_control: { type: 'ephemeral' },
        },
      ],
    },
    {
      role: 'user',
      content: 'How do I read a CSV file?',
    },
  ],
  max_tokens: 1024,
});

Large Document Context

Cache documents, codebases, or knowledge bases for reuse across multiple queries.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "How do I authenticate with the API?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const apiDocs = "Your API documentation content goes here...";

const response = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-6',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Here is our API documentation:\n\n' + apiDocs,
          cache_control: { type: 'ephemeral' },
        },
        {
          type: 'text',
          text: 'How do I authenticate with the API?',
        },
      ],
    },
  ],
  max_tokens: 1024,
});

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v3/router'
)

api_docs = "Your large API documentation content goes here..."

response = client.chat.completions.create(
    model='anthropic/claude-sonnet-4-6',
    messages=[
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': f'Here is our API documentation:\n\n{api_docs}',
                    'cache_control': {'type': 'ephemeral'}
                },
                {
                    'type': 'text',
                    'text': 'How do I authenticate with the API?'
                }
            ]
        }
    ],
    max_tokens=1024
)

Multi-turn Conversations

Cache conversation history for long interactions to reduce processing time and costs on subsequent messages.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": "What is Python?"
      },
      {
        "role": "assistant",
        "content": "Python is a high-level programming language..."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are its main features?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "assistant",
        "content": "Python's main features include..."
      },
      {
        "role": "user",
        "content": "Can you give me a code example?"
      }
    ],
    "max_tokens": 1024
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const conversationHistory = [
  { role: 'user', content: 'What is Python?' },
  { role: 'assistant', content: 'Python is a high-level...' },
  { role: 'user', content: 'What are its main features?' },
  { role: 'assistant', content: "Python's main features include..." },
];

const lastHistoryMessage = conversationHistory[conversationHistory.length - 1];

const response = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-6',
  messages: [
    ...conversationHistory.slice(0, -1),
    {
      ...lastHistoryMessage,
      content: [
        {
          type: 'text',
          text: lastHistoryMessage.content,
          cache_control: { type: 'ephemeral' },
        },
      ],
    },
    { role: 'user', content: 'Can you give me a code example?' },
  ],
  max_tokens: 1024,
});

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v3/router'
)

conversation_history = [
    {'role': 'user', 'content': 'What is Python?'},
    {'role': 'assistant', 'content': 'Python is a high-level...'},
    {'role': 'user', 'content': 'What are its main features?'},
    {'role': 'assistant', 'content': 'Python\'s main features include...'},
]

# Mark last history message for caching
last_message = conversation_history[-1]
messages = conversation_history[:-1] + [
    {
        'role': last_message['role'],
        'content': [
            {
                'type': 'text',
                'text': last_message['content'],
                'cache_control': {'type': 'ephemeral'}
            }
        ]
    },
    {
        'role': 'user',
        'content': 'Can you give me a code example?'
    }
]

response = client.chat.completions.create(
    model='anthropic/claude-sonnet-4-6',
    messages=messages,
    max_tokens=1024
)
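The examples above place a single breakpoint on one earlier message. Over a long conversation, one option is to move that breakpoint forward on each turn instead of adding new ones, which also keeps requests under the four-breakpoint limit. Below is a sketch of that approach. `with_cache_breakpoint` is a hypothetical helper that works on the same message dicts as the Python example. It keeps any marker on the system prompt.

```python
import copy


def with_cache_breakpoint(history: list) -> list:
    """Return a copy of history with one cache breakpoint on its last message.

    Markers on earlier non-system messages are removed, so the breakpoint moves
    forward each turn instead of piling up. System-prompt markers are kept.
    """
    messages = copy.deepcopy(history)  # never mutate the caller's history
    for message in messages:
        if message["role"] != "system" and isinstance(message["content"], list):
            for block in message["content"]:
                block.pop("cache_control", None)

    last = messages[-1]
    if isinstance(last["content"], str):
        # Plain-string content must become a text block to carry cache_control
        last["content"] = [{"type": "text", "text": last["content"]}]
    last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return messages
```

Usage with the history above: `messages = with_cache_breakpoint(conversation_history) + [{"role": "user", "content": "Can you give me a code example?"}]`.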

RAG with Document Collections

Cache retrieved documents for multiple queries in retrieval-augmented generation scenarios.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers based on provided context."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Context:\n[Retrieved document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "Question: What is the main topic of these documents?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v3/router'
)

user_question = "What is the main topic of these documents?"
context_text = "Retrieved document content goes here..."

response = client.chat.completions.create(
    model='anthropic/claude-sonnet-4-6',
    messages=[
        {
            'role': 'system',
            'content': 'You are a helpful assistant that answers based on provided context.'
        },
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': f'Context:\n{context_text}',
                    'cache_control': {'type': 'ephemeral'}
                },
                {
                    'type': 'text',
                    'text': f'Question: {user_question}'
                }
            ]
        }
    ],
    max_tokens=1024
)

Extended Thinking

Enable deep reasoning for complex problems by allocating a token budget for internal analysis before the model generates its response.

Extended thinking uses the thinking parameter, which is only supported on the Chat Completions endpoint (POST /v3/router/chat/completions), so the examples below use Chat Completions. With the OpenAI Python SDK, pass thinking through extra_body.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [
      {
        "role": "user",
        "content": "Design a distributed rate limiting system for 1M requests/second"
      }
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "max_tokens": 16000
  }'

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v3/router'
)

response = client.chat.completions.create(
    model='anthropic/claude-opus-4-6',
    messages=[
        {
            'role': 'user',
            'content': 'Design a distributed rate limiting system for 1M requests/second'
        }
    ],
    extra_body={
        'thinking': {
            'type': 'enabled',
            'budget_tokens': 8000
        }
    },
    max_tokens=16000
)

Multi-turn Extended Thinking

Include reasoning content with its signature when continuing conversations:

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [
      {"role": "user", "content": "Design a rate limiting system"},
      {
        "role": "assistant",
        "content": [
          {
            "type": "reasoning",
            "reasoning": "...",
            "signature": "..."
          },
          {
            "type": "text",
            "text": "Here'\''s a distributed rate limiting design..."
          }
        ]
      },
      {"role": "user", "content": "How would you handle 10M req/s?"}
    ],
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "max_tokens": 16000
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [
    {
      role: "user",
      content: "Design a distributed rate limiting system for 1M requests/second",
    },
  ],
  thinking: {
    type: "adaptive",
  },
  max_tokens: 16000,
});

console.log(response.choices[0].message.content);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

messages = [
    {"role": "user", "content": "Design a rate limiting system"}
]

response = client.chat.completions.create(
    model='anthropic/claude-opus-4-6',
    messages=messages,
    extra_body={
        'thinking': {
            'type': 'enabled',
            'budget_tokens': 8000
        }
    },
    max_tokens=16000
)

# Rebuild the assistant turn, keeping reasoning blocks and their signature
msg = response.choices[0].message
content_parts = []

if getattr(msg, 'reasoning', None):
    content_parts.append({
        "type": "reasoning",
        "reasoning": msg.reasoning,
        "signature": getattr(msg, 'reasoning_signature', None)
    })

if getattr(msg, 'redacted_reasoning', None):
    content_parts.append({
        "type": "redacted_reasoning",
        "data": msg.redacted_reasoning
    })

if msg.content:
    content_parts.append({
        "type": "text",
        "text": msg.content
    })

assistant_message = {
    "role": "assistant",
    "content": content_parts
}

messages.append(assistant_message)
messages.append({"role": "user", "content": "How would you handle 10M req/s?"})

follow_up = client.chat.completions.create(
    model='anthropic/claude-opus-4-6',
    messages=messages,
    extra_body={
        'thinking': {
            'type': 'enabled',
            'budget_tokens': 8000
        }
    },
    max_tokens=16000
)

print(follow_up.choices[0].message.content)

Important: Always include the signature field when passing reasoning content back to the API. The signature cryptographically verifies the reasoning was generated by the model and is required for multi-turn conversations.
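The Python example above builds the assistant message inline and passes `signature` through even when it is `None`. This stricter variant fails fast when reasoning arrives without a signature. `assistant_turn` is a hypothetical helper whose field names (`reasoning`, `reasoning_signature`, `redacted_reasoning`) are taken from that example. The check is a local guard, not gateway behavior.

```python
def assistant_turn(msg) -> dict:
    """Rebuild an assistant message so its reasoning can be sent back next turn."""
    parts = []
    reasoning = getattr(msg, "reasoning", None)
    if reasoning:
        signature = getattr(msg, "reasoning_signature", None)
        if not signature:
            # The signature is required for multi-turn requests, so stop here
            # rather than sending a history the API is expected to reject
            raise ValueError("reasoning returned without a signature")
        parts.append({"type": "reasoning", "reasoning": reasoning, "signature": signature})

    redacted = getattr(msg, "redacted_reasoning", None)
    if redacted:
        parts.append({"type": "redacted_reasoning", "data": redacted})

    content = getattr(msg, "content", None)
    if content:
        parts.append({"type": "text", "text": content})
    return {"role": "assistant", "content": parts}
```

Usage: `messages.append(assistant_turn(response.choices[0].message))` before adding the next user message.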

Combine with prompt caching for repeated contexts

Cache system prompts and context to reduce costs and latency when using extended thinking:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [
    {
      role: "system",
      content: [{
        type: "text",
        text: "You are a system architect...", // Cache this
        cache_control: { type: "ephemeral" }
      }]
    },
    { role: "user", content: "Design a notification system" }
  ],
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 8000 }
});

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4-6",
    messages=[
        {
            "role": "system",
            "content": [{
                "type": "text",
                "text": "You are a system architect...",
                "cache_control": {"type": "ephemeral"}
            }]
        },
        {"role": "user", "content": "Design a notification system"}
    ],
    max_tokens=16000,
    extra_body={"thinking": {"type": "enabled", "budget_tokens": 8000}}
)

Configuration & Best Practices

Aspect	Guidance	Details
`thinking.type`	Set to `"enabled"`	Enables extended thinking with manual budget
`thinking.budget_tokens`	Set based on complexity	Min: 1024, must be < `max_tokens`. Billed as output tokens.

Supported Models: Extended thinking with budget_tokens is available on Claude Opus 4.5, Sonnet 4.5, and newer models. For Claude Opus 4.6 and Sonnet 4.6, consider using adaptive thinking instead (see below). Available through anthropic/, aws/, and google/ providers.
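The two budget rules in the table can be checked before a request is sent. In this minimal sketch, `manual_thinking` is a hypothetical helper. It returns keyword arguments for the OpenAI Python SDK call used in the examples above, passing `thinking` through `extra_body`.

```python
MIN_BUDGET_TOKENS = 1024  # minimum from the table above


def manual_thinking(budget_tokens: int, max_tokens: int) -> dict:
    """Return create() keyword arguments for manual extended thinking."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "max_tokens": max_tokens,
        "extra_body": {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}},
    }


# Usage: client.chat.completions.create(model=..., messages=..., **manual_thinking(8000, 16000))
```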

For thinking.budget_tokens and other extended thinking controls for Claude through the AI Gateway, see Reasoning models.

Adaptive Thinking

Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.6 and Sonnet 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically determine when and how much to think based on the complexity of each request.

Adaptive thinking uses the thinking parameter, which is only supported on the Chat Completions endpoint (POST /v3/router/chat/completions), so the examples below use Chat Completions.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [
      {
        "role": "user",
        "content": "Design a distributed rate limiting system for 1M requests/second"
      }
    ],
    "thinking": {
      "type": "adaptive"
    },
    "max_tokens": 16000
  }'

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v3/router'
)

response = client.chat.completions.create(
    model='anthropic/claude-opus-4-6',
    messages=[
        {
            'role': 'user',
            'content': 'Design a distributed rate limiting system for 1M requests/second'
        }
    ],
    extra_body={
        'thinking': {
            'type': 'adaptive'
        }
    },
    max_tokens=16000
)

print(response.choices[0].message.content)

Adaptive vs Manual Thinking

Mode	Config	When to use
Adaptive	`thinking: { type: "adaptive" }`	Recommended for Claude 4.6 models. Claude determines thinking depth automatically.
Manual	`thinking: { type: "enabled", budget_tokens: N }`	When you need precise control over thinking token spend. Supported on all thinking-capable models.
Disabled	Omit `thinking` parameter	When you don’t need extended thinking and want the lowest latency.

Supported Models: Adaptive thinking is available on Claude Opus 4.6 and Claude Sonnet 4.6 only. Older models (Opus 4.5, Sonnet 4.5, etc.) require type: "enabled" with budget_tokens.
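If your code targets several Claude models, you can pick the thinking mode per model. In this sketch, `thinking_config` is a hypothetical helper. `ADAPTIVE_MODELS` contains only the two models named in the note above; extend the set only after confirming adaptive support for other models.

```python
ADAPTIVE_MODELS = {"claude-opus-4-6", "claude-sonnet-4-6"}  # per the note above


def thinking_config(model: str, budget_tokens: int = 8000) -> dict:
    """Use adaptive thinking where supported, otherwise a manual budget."""
    name = model.rsplit("/", 1)[-1]  # strip "anthropic/", "aws/anthropic/", ...
    if name in ADAPTIVE_MODELS:
        return {"type": "adaptive"}
    return {"type": "enabled", "budget_tokens": budget_tokens}


# Usage: extra_body={"thinking": thinking_config("anthropic/claude-opus-4-6")}
```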

Vision Capabilities

All Claude 3+ models support image input. Pass images either by URL or as base64-encoded data:

Image from URL

Use images from URLs for remote files:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: { url: "https://example.com/image.jpg" }
        },
      ],
    },
  ],
});

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
)

Image from Base64

Embed images directly as base64-encoded strings:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const imageBase64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==";

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: { url: `data:image/png;base64,${imageBase64}` }
        },
      ],
    },
  ],
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

# 1x1 PNG placeholder: replace with your own base64-encoded image data
image_base64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                },
            ],
        }
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
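In practice the base64 data usually comes from a local file rather than a hard-coded string. The sketch below uses hypothetical helper names `image_data_url` and `image_message`: it reads a file, infers the media type from the file extension with Python's `mimetypes` module, and builds the same Chat Completions message shape used above.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(path: str) -> str:
    """Read a local image and return a `data:` URL for `image_url.url`."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image media type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(prompt: str, path: str) -> dict:
    """Build a user message with a text part and one inline image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_data_url(path)}},
        ],
    }
```

Pass the result as an entry of `messages`, for example `messages=[image_message("What's in this image?", "photo.jpg")]`, where `photo.jpg` is a placeholder path.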

PDF Input

The example in this section uses the Chat Completions endpoint. For the Responses API, call `client.responses.create()` (`POST /v3/router/responses`) and adapt the message structure to the Responses API `input` format.

Claude Opus 4.6 can analyze PDF documents directly. Pass the PDF as a `document` content part that points to its URL:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize this document" },
        {
          type: "document",
          document: {
            type: "pdf",
            url: "https://example.com/document.pdf"
          }
        },
      ],
    },
  ],
  max_tokens: 2048,
});

console.log(response.choices[0].message.content);
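Adapting the image examples from Chat Completions to the Responses API mostly means renaming content parts. The sketch below assumes the gateway accepts the standard OpenAI Responses part types (`input_text` and `input_image`); `to_responses_input` is a hypothetical helper, and it deliberately rejects `document` parts because this page does not show their Responses form.

```python
from typing import List


def to_responses_input(messages: List[dict]) -> List[dict]:
    """Convert Chat Completions messages into a Responses API `input` list."""
    converted = []
    for msg in messages:
        content = msg["content"]
        # Plain string content is accepted as-is by the Responses API.
        if isinstance(content, str):
            converted.append({"role": msg["role"], "content": content})
            continue
        parts = []
        for part in content:
            if part["type"] == "text":
                parts.append({"type": "input_text", "text": part["text"]})
            elif part["type"] == "image_url":
                # Responses takes the URL (or data URL) as a plain string.
                parts.append(
                    {
                        "type": "input_image",
                        "image_url": part["image_url"]["url"],
                        "detail": "auto",
                    }
                )
            else:
                raise ValueError(f"No Responses mapping for part type {part['type']!r}")
        converted.append({"role": msg["role"], "content": parts})
    return converted
```

The converted list goes into `client.responses.create(model=..., input=to_responses_input(messages))`.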

For the full reference on image input, PDF input, image generation, and audio through the AI Gateway, see Multimodal.

Tool Use (Function Calling)

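Claude supports tool use (function calling). Describe each function with a name, a description, and JSON Schema `parameters`; the model then returns structured calls that your application executes. The Responses API uses a flat tool definition, while Chat Completions nests it under `function`.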

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" },
        },
        required: ["location"],
      },
    },
  ],
});

console.log(response.output);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="anthropic/claude-sonnet-4-6",
    input="What's the weather in Tokyo?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                },
                "required": ["location"],
            },
        }
    ],
)

print(response.output)
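The response above only contains the model's request to call `get_weather`; your code must run the function and send the result back. A minimal sketch for the Responses API, assuming the standard OpenAI output item shape (`type: "function_call"` with `call_id`, `name`, and JSON-encoded `arguments`). The body of `get_weather` and the `function_call_outputs` helper are hypothetical.

```python
import json
from typing import Callable, Dict, List


def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "forecast": "sunny", "temperature_c": 22}


# Map tool names (as declared in `tools`) to local Python functions.
TOOLS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def function_call_outputs(output_items: List[dict]) -> List[dict]:
    """Run every function_call item and build matching function_call_output items."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip messages, reasoning items, etc.
        args = json.loads(item.get("arguments") or "{}")
        result = TOOLS[item["name"]](**args)
        results.append(
            {
                "type": "function_call_output",
                "call_id": item["call_id"],  # links the result to its call
                "output": json.dumps(result),
            }
        )
    return results


# Usage sketch with `client` and `response` from the example above:
# items = [item.model_dump(exclude_none=True) for item in response.output]
# follow_up = client.responses.create(
#     model="anthropic/claude-sonnet-4-6",
#     input=[{"role": "user", "content": "What's the weather in Tokyo?"},
#            *items, *function_call_outputs(items)],
#     tools=tools,  # the same tool definitions as the first request
# )
```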

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    },
  ],
  max_tokens: 1024,
});

console.log(response.choices[0].message);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    max_tokens=1024,
)

print(response.choices[0].message.tool_calls)
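With Chat Completions, the follow-up request must include the assistant message that holds `tool_calls`, followed by one `role: "tool"` message per call, matched by `tool_call_id`. A minimal sketch; the body of `get_weather` and the `tool_result_messages` helper are hypothetical.

```python
import json
from typing import Callable, Dict, List


def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "forecast": "sunny", "temperature_c": 22}


# Map tool names (as declared in `tools`) to local Python functions.
TOOLS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def tool_result_messages(assistant_message: dict) -> List[dict]:
    """Return the assistant turn followed by one `tool` message per tool call."""
    messages = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # links the result to its call
                "content": json.dumps(fn(**args)),
            }
        )
    return messages


# Usage sketch with `client` and `response` from the example above:
# history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
# history += tool_result_messages(
#     response.choices[0].message.model_dump(exclude_none=True)
# )
# follow_up = client.chat.completions.create(
#     model="anthropic/claude-sonnet-4-6",
#     messages=history,
#     tools=tools,  # the same tool definitions as the first request
#     max_tokens=1024,
# )
```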

For the full reference on function tools, `tool_choice`, and streaming with tool calls through the AI Gateway, see Tool Calling.

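Multi-provider strategy

Pass a `fallbacks` list to name alternative models for the gateway to use if the primary model fails, for example the same Claude model served through AWS or a different Claude model. With the Python SDK, send `fallbacks` through `extra_body`, because it is not a standard OpenAI parameter.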

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "...",
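  // `fallbacks` is an Orq.ai extension, not an OpenAI SDK field;
  // TypeScript type-checking may flag it as an unknown property.
  fallbacks: [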
    { model: "aws/anthropic/claude-sonnet-4-6" },
    { model: "anthropic/claude-opus-4-6" },
  ],
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="anthropic/claude-sonnet-4-6",
    input="...",
    extra_body={
        "fallbacks": [
            {"model": "aws/anthropic/claude-sonnet-4-6"},
            {"model": "anthropic/claude-opus-4-6"},
        ]
    },
)

print(response.output_text)

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "..."}],
    max_tokens=1024,
    extra_body={
        "fallbacks": [
            {"model": "aws/anthropic/claude-sonnet-4-6"},
            {"model": "anthropic/claude-opus-4-6"},
        ]
    },
)

print(response.choices[0].message.content)

Configuration

Model Parameters

Parameter	Type	Description	Default
`max_tokens`	number	Maximum tokens to generate (required)	-
`temperature`	number	Randomness (0-1)	1
`top_p`	number	Nucleus sampling (0-1)	-
`top_k`	number	Top-K sampling	-
`stop_sequences`	string[]	Custom stop sequences	-

Note: `max_tokens` is required for Anthropic models. Typical values are 1024 for short responses and 4096 or more for long-form content.

On newer Anthropic models, do not set `temperature` and `top_p` in the same request: sending both returns an API error. Choose one or the other.
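A pre-flight check can catch these mistakes before a request reaches the gateway. The sketch below (`check_anthropic_params` is a hypothetical helper) validates Chat Completions-style keyword arguments against the rules on this page. It rejects `temperature` plus `top_p` for every model, which is stricter than necessary for older models.

```python
def check_anthropic_params(params: dict) -> dict:
    """Validate request kwargs against the Anthropic rules on this page.

    Hypothetical helper; returns `params` unchanged when they pass.
    """
    # max_tokens is required for Anthropic models.
    if "max_tokens" not in params:
        raise ValueError("max_tokens is required for Anthropic models")
    # Newer models reject temperature and top_p in the same request.
    if params.get("temperature") is not None and params.get("top_p") is not None:
        raise ValueError("set temperature or top_p, not both")
    # Anthropic's temperature range is 0 to 1.
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    return params
```

Because it returns its input, the check can wrap the call directly: `client.chat.completions.create(**check_anthropic_params({...}))`.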

Token Management

// Set appropriate max_tokens based on task
const getMaxTokens = (taskType: string) => {
  const limits = {
    chat: 1024,
    summary: 500,
    generation: 4096,
    analysis: 2048,
  };
  return limits[taskType as keyof typeof limits] ?? 1024;
};

Troubleshooting

Issue	Problem	Solution
Missing `max_tokens`	Anthropic models require `max_tokens` parameter	Add `max_tokens: 1024` (or appropriate value) to your request
High costs	Token usage accumulates quickly on large requests	Enable prompt caching for repeated context, use smaller models (Haiku) for simple tasks, monitor and optimize token usage
Rate limits	Anthropic has tiered rate limits based on usage	Use Orq’s automatic retries and fallbacks, or consider AWS/Google providers for higher limits
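If you also retry on the client side, back off between attempts so retries do not add to rate-limit pressure. The following is a generic exponential-backoff-with-jitter sketch; `with_backoff` is a hypothetical helper, and the exception types to retry on are yours to choose (with the OpenAI Python SDK, `openai.RateLimitError` is the natural candidate). The SDK client also accepts a `max_retries` option if its built-in retries are enough.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retry_on` exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time in [0, base_delay * 2**attempt].
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

For example: `with_backoff(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,))`. The `sleep` parameter exists so the delay can be replaced in tests.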

Limitations

max_tokens required: Unlike OpenAI, Anthropic requires you to specify the maximum output length on every request
Rate limits: Vary by tier and provider
Context window: 200K tokens (may vary by provider)
System prompts: Anthropic handles system prompts differently from OpenAI; Orq converts them automatically
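To illustrate the system-prompt difference: Anthropic's Messages API takes the system prompt as a separate top-level `system` field rather than as a message with `role: "system"`. Orq performs this conversion for you; the sketch below is not Orq's implementation, only a hypothetical `split_system` helper showing the idea, and it assumes every system message has plain string content.

```python
from typing import List, Optional, Tuple


def split_system(messages: List[dict]) -> Tuple[Optional[str], List[dict]]:
    """Separate system messages from the conversation (illustration only)."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    conversation = [m for m in messages if m["role"] != "system"]
    # Multiple system messages are joined into one system prompt.
    system = "\n\n".join(system_parts) if system_parts else None
    return system, conversation
```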

Reference

Claude Cowork

The Orq.ai AI Gateway is compatible with Claude Cowork’s third-party inference mode. Route Cowork traffic through Orq.ai to get EU data residency, provider fallbacks, and cost control without changing the Cowork interface.

See the Claude Cowork guide to set up Orq.ai as a Cowork third-party inference gateway.
