
Setup API Key

To use Anthropic with Orq.ai, follow these steps:
  1. Navigate to AI Gateway > BYOK
  2. Find Anthropic in the list
  3. Click the Configure button next to Anthropic
  4. In the modal that opens, select Setup your own API Key
  5. Enter a name for this configuration (e.g., “Anthropic Production”)
  6. Paste your Anthropic API Key into the provided field
  7. Click Save to complete the setup
Your Anthropic API key is now configured and ready to use through the AI Gateway.

Quick Start

Access Anthropic’s Claude models through the AI Gateway.
curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "input": "Explain quantum computing in simple terms"
  }'

Available Models

Orq supports all Anthropic Claude models across multiple providers for optimal availability and pricing:

Latest Models

| Model | Context | Strengths | Best For |
|---|---|---|---|
| claude-opus-4-8 | 1M | Latest Opus, highest intelligence | Coding, agentic tasks, complex reasoning |
| claude-opus-4-7 | 1M | Highest intelligence, extra-high reasoning effort | Coding, agentic tasks, complex reasoning |
| claude-opus-4-6 | 1M | High intelligence | Complex reasoning, research |
| claude-sonnet-4-6 | 1M | Best balance | Most tasks, coding |
| claude-haiku-4-5-20251001 | 200K | Fast responses | Simple tasks, chat |

Provider Options

Anthropic models are available through multiple providers:
  • anthropic/: Direct Anthropic API
  • aws/: AWS Bedrock (enterprise features)
  • google/: Google Vertex AI (GCP integration)
// Use these model strings inside your responses.create() or chat.completions.create() call

// Direct Anthropic
model: "anthropic/claude-sonnet-4-6"

// AWS Bedrock
model: "aws/anthropic/claude-sonnet-4-6"

// Google Vertex AI
model: "google/anthropic/claude-opus-4-6"
For a complete list of supported models, see Supported Models.
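The provider prefixes above follow a fixed pattern. The sketch below assembles these model strings so you do not have to write them by hand. It assumes the pattern holds for every Claude model, and the helper names claudeModelId and routesFor are hypothetical:

```typescript
// Provider prefixes from the list above: "anthropic" calls Anthropic directly,
// "aws" routes through AWS Bedrock, and "google" through Google Vertex AI.
type ClaudeProvider = "anthropic" | "aws" | "google";

// Builds a gateway model string following the pattern in the examples above:
// "anthropic/<model>" for direct access, "<provider>/anthropic/<model>" otherwise.
function claudeModelId(provider: ClaudeProvider, model: string): string {
  return provider === "anthropic"
    ? `anthropic/${model}`
    : `${provider}/anthropic/${model}`;
}

// Lists the same model on every provider, e.g. as candidates for a fallback list.
// Not every model is necessarily offered by every provider; check Supported Models.
function routesFor(model: string): string[] {
  const providers: ClaudeProvider[] = ["anthropic", "aws", "google"];
  return providers.map((provider) => claudeModelId(provider, model));
}
```

For example, routesFor("claude-sonnet-4-6") returns one string per provider. Pass one of them as model and the others as fallbacks.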

Using the AI Gateway

Access Claude models (such as Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5) through the AI Gateway with message APIs, tool use, and intelligent model routing. All Claude models are available with consistent formatting and pricing across multiple providers.
Claude models use the provider slug format anthropic/model-name, for example anthropic/claude-sonnet-4-6.

Prerequisites

Before making requests to the AI Gateway, configure the environment and install the SDKs if you choose to use them.
Endpoint
POST https://api.orq.ai/v3/router/responses
Required Headers
Include the following headers in all requests:
Authorization: Bearer $ORQ_API_KEY
Content-Type: application/json
Getting an API Key:
  1. Go to API Keys
  2. Click Create API Key and copy it
  3. Store it in your environment as ORQ_API_KEY
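It can help to fail early when the key is missing, before any request is sent. The minimal sketch below builds both required headers. gatewayHeaders is a hypothetical helper, and it takes the environment as a plain object (pass process.env in Node.js) so it does not depend on Node type definitions:

```typescript
// Builds the two required headers from an environment map such as process.env.
function gatewayHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const key = env["ORQ_API_KEY"];
  if (!key) {
    // Fail before the request instead of receiving an authentication error later
    throw new Error("ORQ_API_KEY is not set; create a key under API Keys first");
  }
  return {
    Authorization: `Bearer ${key}`,
    "Content-Type": "application/json",
  };
}
```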
SDK Installation
Install the OpenAI SDK:
npm install openai
# or
yarn add openai

Basic Usage

If you already have working OpenAI SDK code, change only the base URL (base_url in Python, baseURL in Node.js) and the API key: point the client at https://api.orq.ai/v3/router and use your ORQ_API_KEY.

Chat Completion

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "input": "Explain quantum computing in simple terms"
  }'
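You can make the same request without an SDK. The sketch below only assembles the URL, headers, and JSON body from the curl example. buildResponsesRequest is a hypothetical helper, and sending the request with the global fetch (available in Node.js 18 and later) is left to the caller:

```typescript
const RESPONSES_URL = "https://api.orq.ai/v3/router/responses";

interface GatewayRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Mirrors the curl call above: same endpoint, same headers, same JSON body.
function buildResponsesRequest(apiKey: string, model: string, input: string): GatewayRequest {
  return {
    url: RESPONSES_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, input }),
    },
  };
}
```

Usage: const req = buildResponsesRequest(key, "anthropic/claude-sonnet-4-6", "Explain quantum computing in simple terms"); then const res = await fetch(req.url, req.init).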

Streaming

Stream responses for real-time output instead of waiting for the complete response:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const stream = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "Tell me a story",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
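If you also need the full text after the stream ends, you can collect the deltas while printing them. This minimal sketch assumes only the two event fields used above (type and delta). collectOutputText is a hypothetical helper, and depending on your SDK version you may need to cast the stream to the simplified event type:

```typescript
// Simplified event shape: only the fields read by the loop above are declared.
interface StreamEvent {
  type: string;
  delta?: string;
}

// Concatenates all text deltas and forwards each chunk as it arrives.
async function collectOutputText(
  stream: AsyncIterable<StreamEvent>,
  onDelta: (chunk: string) => void = () => {},
): Promise<string> {
  let text = "";
  for await (const event of stream) {
    if (event.type === "response.output_text.delta" && event.delta !== undefined) {
      text += event.delta;
      onDelta(event.delta);
    }
  }
  return text;
}
```

Usage: const story = await collectOutputText(stream, (chunk) => process.stdout.write(chunk));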

Advanced Usage

Prompt Caching

Prompt caching is supported on the Chat Completions endpoint (/v3/router/chat/completions), so the examples below use Chat Completions.
Cache frequently used context (system prompts, large documents, codebases) to reduce costs by up to 90% and latency by up to 85%. For a full guide, see Prompt Caching.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "Write a function to parse JSON"
      }
    ],
    "max_tokens": 1024
  }'
How It Works
Prompt caching stores frequently used content blocks on Anthropic's servers for reuse across requests:
  1. Mark content for caching: Add cache_control: { type: "ephemeral" } to text blocks
  2. First request: Content is processed normally and cached (cache write)
  3. Subsequent requests: Cached content is reused (cache read)
  4. Cache lifetime: 5 minutes from last use by default, or 1 hour with ttl: "1h" (automatically managed)
Configuration
Mark content blocks for caching by adding the cache_control parameter:
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | "ephemeral" | Yes | Only supported cache type |
| ttl | "5m" or "1h" | No | Cache duration (default: "5m") |
Cache TTL Options
The ttl parameter controls how long cached content persists:
  • "5m" (5 minutes): Default cache duration
  • "1h" (1 hour): Extended cache duration for longer-running workflows
{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}
Cache placement rules
  • Add cache_control to the last message or content block you want cached
  • Everything up to that point is included in the cache
  • Maximum: 4 cache breakpoints per request
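You can check these placement rules on the client before a request is sent. The sketch below is hypothetical: cachedText, countCacheBreakpoints, and assertCacheLimit are not part of any SDK. It only counts cache_control markers on message content blocks, so breakpoints placed elsewhere, such as on tool definitions, are not included:

```typescript
type CacheTtl = "5m" | "1h";

interface CacheTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: CacheTtl };
}

interface CacheChatMessage {
  role: "system" | "user" | "assistant";
  content: string | CacheTextBlock[];
}

// Marks a text block as a cache breakpoint; omitting ttl keeps the 5-minute default.
function cachedText(text: string, ttl?: CacheTtl): CacheTextBlock {
  return {
    type: "text",
    text,
    cache_control: ttl ? { type: "ephemeral", ttl } : { type: "ephemeral" },
  };
}

// Counts content blocks that carry cache_control across all messages.
function countCacheBreakpoints(messages: CacheChatMessage[]): number {
  let count = 0;
  for (const message of messages) {
    const content = message.content;
    if (typeof content === "string") continue; // plain strings cannot carry cache_control
    for (const block of content) {
      if (block.cache_control) count += 1;
    }
  }
  return count;
}

// Throws before sending a request that exceeds the breakpoint limit.
function assertCacheLimit(messages: CacheChatMessage[], max = 4): void {
  const n = countCacheBreakpoints(messages);
  if (n > max) {
    throw new Error(`${n} cache breakpoints found; at most ${max} are allowed per request`);
  }
}
```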
Minimum token thresholds
Caching only activates once the marked content meets the model's minimum. Requests below the threshold are processed normally at full cost.
| Model | Minimum tokens |
|---|---|
| Claude Opus 4.6, Opus 4.5 | 4,096 |
| Claude Sonnet 4.6 | 2,048 |
| Claude Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4, Sonnet 3.7 | 1,024 |
| Claude Haiku 4.5 | 4,096 |
| Claude Haiku 3.5, Haiku 3 | 2,048 |
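To decide whether marking content for caching is worthwhile, you can compare its token count with the table. This sketch copies the thresholds into a lookup that matches on prefixes of the model name. The model ID prefixes are assumptions, especially the dated prefixes used for Opus 4 and Sonnet 4. You must supply the token count yourself. Models that are not in the table return undefined:

```typescript
// Thresholds from the table above, checked in order; more specific prefixes come first.
const MIN_CACHE_TOKENS: Array<[string, number]> = [
  ["claude-opus-4-6", 4096],
  ["claude-opus-4-5", 4096],
  ["claude-sonnet-4-6", 2048],
  ["claude-sonnet-4-5", 1024],
  ["claude-opus-4-1", 1024],
  ["claude-opus-4-2025", 1024], // dated Opus 4 IDs (assumed prefix)
  ["claude-sonnet-4-2025", 1024], // dated Sonnet 4 IDs (assumed prefix)
  ["claude-3-7-sonnet", 1024],
  ["claude-haiku-4-5", 4096],
  ["claude-3-5-haiku", 2048],
  ["claude-3-haiku", 2048],
];

// Strips the provider prefix (e.g. "aws/anthropic/") and looks up the minimum.
function minCacheTokens(model: string): number | undefined {
  const name = model.split("/").pop() ?? model;
  for (const [prefix, min] of MIN_CACHE_TOKENS) {
    if (name.startsWith(prefix)) return min;
  }
  return undefined;
}

// True when a prefix of promptTokens tokens is long enough to be cached.
function isCacheable(model: string, promptTokens: number): boolean {
  const min = minCacheTokens(model);
  return min !== undefined && promptTokens >= min;
}
```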
Use Cases
Cache role definitions and instructions that don’t change.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "How do I read a CSV file?"
      }
    ],
    "max_tokens": 1024
  }'
Cache documents, codebases, or knowledge bases for reuse across multiple queries.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "How do I authenticate with the API?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
Cache conversation history for long interactions to reduce processing time and costs on subsequent messages.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": "What is Python?"
      },
      {
        "role": "assistant",
        "content": "Python is a high-level programming language..."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are its main features?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "assistant",
        "content": "Python's main features include..."
      },
      {
        "role": "user",
        "content": "Can you give me a code example?"
      }
    ],
    "max_tokens": 1024
  }'
Cache retrieved documents for multiple queries in retrieval-augmented generation scenarios.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers based on provided context."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Context:\n[Retrieved document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "Question: What is the main topic of these documents?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'

Extended Thinking

Enable deep reasoning for complex problems by allocating a token budget for internal analysis before the model generates its response.
Extended thinking uses the thinking parameter, which is only supported via the Chat Completions endpoint (POST /v3/router/chat/completions), so the examples below use Chat Completions.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [
      {
        "role": "user",
        "content": "Design a distributed rate limiting system for 1M requests/second"
      }
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "max_tokens": 16000
  }'
Include reasoning content with its signature when continuing conversations:
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [
      {"role": "user", "content": "Design a rate limiting system"},
      {
        "role": "assistant",
        "content": [
          {
            "type": "reasoning",
            "reasoning": "...",
            "signature": "..."
          },
          {
            "type": "text",
            "text": "Here'\''s a distributed rate limiting design..."
          }
        ]
      },
      {"role": "user", "content": "How would you handle 10M req/s?"}
    ],
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "max_tokens": 16000
  }'
Important: Always include the signature field when passing reasoning content back to the API. The signature cryptographically verifies the reasoning was generated by the model and is required for multi-turn conversations.
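One way to avoid dropping the signature is to validate the assistant turn when you build the next request. In this minimal sketch, the block shapes are copied from the request example above and continueWithReasoning is a hypothetical helper. The sketch does not cover how the gateway returns reasoning blocks in its responses:

```typescript
interface ReasoningBlock { type: "reasoning"; reasoning: string; signature?: string; }
interface AnswerTextBlock { type: "text"; text: string; }
type AssistantBlock = ReasoningBlock | AnswerTextBlock;

interface ReasoningMessage {
  role: "user" | "assistant";
  content: string | AssistantBlock[];
}

// Appends the previous assistant turn and the next user question,
// refusing reasoning blocks whose signature was lost along the way.
function continueWithReasoning(
  history: ReasoningMessage[],
  assistantBlocks: AssistantBlock[],
  nextQuestion: string,
): ReasoningMessage[] {
  for (const block of assistantBlocks) {
    if (block.type === "reasoning" && !block.signature) {
      throw new Error("reasoning block is missing its signature; pass it back unchanged");
    }
  }
  // Returns a new array; the caller's history is not mutated
  return [
    ...history,
    { role: "assistant", content: assistantBlocks },
    { role: "user", content: nextQuestion },
  ];
}
```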
Cache system prompts and context to reduce costs and latency when using extended thinking:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [
    {
      role: "system",
      content: [{
        type: "text",
        text: "You are a system architect...", // Cache this
        cache_control: { type: "ephemeral" }
      }]
    },
    { role: "user", content: "Design a notification system" }
  ],
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 8000 }
});
Configuration & Best Practices
| Aspect | Guidance | Details |
|---|---|---|
| thinking.type | Set to "enabled" | Enables extended thinking with manual budget |
| thinking.budget_tokens | Set based on complexity | Min: 1024, must be < max_tokens. Billed as output tokens. |
Supported Models: Extended thinking with budget_tokens is available on Claude Opus 4.5, Sonnet 4.5, and newer models. For Claude Opus 4.6 and Sonnet 4.6, consider using adaptive thinking instead (see below). Available through anthropic/, aws/, and google/ providers.
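You can check the budget constraints from the table before sending a request. In this minimal sketch, validateThinking is a hypothetical helper that returns a list of problems, which is empty when the configuration is valid:

```typescript
interface ManualThinking {
  type: "enabled";
  budget_tokens: number;
}

// Applies the two rules from the table: budget_tokens >= 1024 and budget_tokens < max_tokens.
function validateThinking(thinking: ManualThinking, maxTokens: number): string[] {
  const problems: string[] = [];
  if (thinking.budget_tokens < 1024) {
    problems.push(`budget_tokens must be at least 1024 (got ${thinking.budget_tokens})`);
  }
  if (thinking.budget_tokens >= maxTokens) {
    problems.push(`budget_tokens (${thinking.budget_tokens}) must be lower than max_tokens (${maxTokens})`);
  }
  return problems;
}
```

The configuration used in the examples above, budget_tokens: 8000 with max_tokens: 16000, passes both checks.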

Reasoning models

Configure thinking.budget_tokens and other extended thinking controls for Claude through the AI Gateway.

Adaptive Thinking

Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.6 and Sonnet 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically determine when and how much to think based on the complexity of each request.
Adaptive thinking uses the thinking parameter, which is only supported via the Chat Completions endpoint (POST /v3/router/chat/completions), so the examples below use Chat Completions.
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [
      {
        "role": "user",
        "content": "Design a distributed rate limiting system for 1M requests/second"
      }
    ],
    "thinking": {
      "type": "adaptive"
    },
    "max_tokens": 16000
  }'
Adaptive vs Manual Thinking
| Mode | Config | When to use |
|---|---|---|
| Adaptive | thinking: { type: "adaptive" } | Recommended for Claude 4.6 models. Claude determines thinking depth automatically. |
| Manual | thinking: { type: "enabled", budget_tokens: N } | When you need precise control over thinking token spend. Supported on all thinking-capable models. |
| Disabled | Omit thinking parameter | When you don't need extended thinking and want the lowest latency. |
Supported Models: Adaptive thinking is available on Claude Opus 4.6 and Claude Sonnet 4.6 only. Older models (Opus 4.5, Sonnet 4.5, etc.) require type: "enabled" with budget_tokens.
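If your code targets several Claude models, you can choose the thinking mode per model. This sketch follows the supported-models note above literally, so only Opus 4.6 and Sonnet 4.6 get adaptive thinking. thinkingFor is a hypothetical helper, its default budget of 8000 tokens is taken from the earlier examples, and you would extend ADAPTIVE_MODELS if other models add support:

```typescript
type ThinkingConfig =
  | { type: "adaptive" }
  | { type: "enabled"; budget_tokens: number };

// Models that the note above lists as supporting adaptive thinking.
const ADAPTIVE_MODELS = ["claude-opus-4-6", "claude-sonnet-4-6"];

// Returns adaptive thinking where supported, otherwise a manual budget.
function thinkingFor(model: string, fallbackBudget = 8000): ThinkingConfig {
  const name = model.split("/").pop() ?? model; // drop "anthropic/", "aws/anthropic/", ...
  if (ADAPTIVE_MODELS.some((m) => name.startsWith(m))) {
    return { type: "adaptive" };
  }
  return { type: "enabled", budget_tokens: fallbackBudget };
}
```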

Vision Capabilities

All Claude 3 and later models support image analysis. You can provide images either as URLs or as base64-encoded data:
Use images from URLs for remote files:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: { url: "https://example.com/image.jpg" }
        },
      ],
    },
  ],
});
Embed images directly as base64-encoded strings:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const imageBase64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==";

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: { url: `data:image/png;base64,${imageBase64}` }
        },
      ],
    },
  ],
});
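In practice the base64 string usually comes from a file. This sketch builds the data URL content part shown above and restricts the media type to JPEG, PNG, GIF, and WebP, the image formats Anthropic accepts. base64ImagePart and mediaTypeFromFileName are hypothetical helpers:

```typescript
type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

interface ImageUrlPart {
  type: "image_url";
  image_url: { url: string };
}

// Wraps a raw base64 payload in the data URL format used in the example above.
function base64ImagePart(base64: string, mediaType: ImageMediaType): ImageUrlPart {
  if (base64.startsWith("data:")) {
    throw new Error("pass the raw base64 payload, not a complete data URL");
  }
  return { type: "image_url", image_url: { url: `data:${mediaType};base64,${base64}` } };
}

// Guesses the media type from the file extension; undefined means unsupported.
function mediaTypeFromFileName(fileName: string): ImageMediaType | undefined {
  const ext = fileName.toLowerCase().split(".").pop();
  switch (ext) {
    case "jpg":
    case "jpeg":
      return "image/jpeg";
    case "png":
      return "image/png";
    case "gif":
      return "image/gif";
    case "webp":
      return "image/webp";
    default:
      return undefined;
  }
}
```

In Node.js, readFileSync("photo.png").toString("base64") from node:fs produces the payload to pass as base64.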

PDF Input

The examples in this section use the Chat Completions endpoint. For the Responses API equivalent, call client.responses.create() (POST /v3/router/responses) and adapt the message structure to the Responses API input format.
Claude Opus 4.6 supports direct PDF analysis:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize this document" },
        {
          type: "document",
          document: {
            type: "pdf",
            url: "https://example.com/document.pdf"
          }
        },
      ],
    },
  ],
  max_tokens: 2048,
});

Multimodal

Full reference for image input, PDF input, image generation, and audio through the AI Gateway.

Tool Use (Function Calling)

Claude supports tool use (function calling). Describe each tool with a name, a description, and a JSON Schema for its parameters, and Claude decides when to call it:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" },
        },
        required: ["location"],
      },
    },
  ],
});
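When Claude decides to call get_weather, your code runs the function and returns the result. The sketch below assumes the gateway returns tool calls in the OpenAI Responses API shape: output items with type "function_call", a call_id, a name, and a JSON-encoded arguments string. It produces "function_call_output" items in return. runToolCalls and the canned get_weather stub are hypothetical:

```typescript
interface OutputItem {
  type: string;
  call_id?: string;
  name?: string;
  arguments?: string; // JSON-encoded string in the Responses API shape
}

interface FunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  output: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Runs every function_call item through a local handler and returns the
// items to send back on the next request.
function runToolCalls(output: OutputItem[], handlers: Record<string, ToolHandler>): FunctionCallOutput[] {
  const results: FunctionCallOutput[] = [];
  for (const item of output) {
    if (item.type !== "function_call" || !item.name || !item.call_id) continue;
    const handler = handlers[item.name];
    if (!handler) throw new Error(`no handler registered for tool "${item.name}"`);
    const args = JSON.parse(item.arguments ?? "{}") as Record<string, unknown>;
    results.push({
      type: "function_call_output",
      call_id: item.call_id,
      output: JSON.stringify(handler(args)),
    });
  }
  return results;
}

// Hypothetical stub for the get_weather tool; a real one would call a weather service.
const weatherHandlers: Record<string, ToolHandler> = {
  get_weather: (args) => ({ location: args.location, forecast: "sunny", celsius: 21 }),
};
```

Send the returned items back as input on the next request together with the same tools. See the Tool Calling reference for how the gateway expects the follow-up turn.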

Tool Calling

Full reference for function tools, tool_choice, and streaming with tool calls through the AI Gateway.

Multi-provider strategy

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "...",
  fallbacks: [
    { model: "aws/anthropic/claude-sonnet-4-6" },
    { model: "anthropic/claude-opus-4-6" },
  ],
});

console.log(response.output_text);

Configuration

Model Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| max_tokens | number | Maximum tokens to generate (required) | - |
| temperature | number | Randomness (0-1) | 1 |
| top_p | number | Nucleus sampling (0-1) | - |
| top_k | number | Top-K sampling | - |
| stop_sequences | string[] | Custom stop sequences | - |
Note: max_tokens is required for Anthropic models. Typical values: 1024 for responses, 4096+ for long content.
Do not use temperature and top_p together on newer Anthropic models. Using both parameters simultaneously will result in an API error. Choose one or the other.
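You can enforce both rules above before a request leaves your code. In this minimal sketch, checkSamplingParams is a hypothetical helper, and the 0 to 1 ranges come from the parameter table:

```typescript
interface SamplingParams {
  max_tokens: number;
  temperature?: number;
  top_p?: number;
  top_k?: number;
  stop_sequences?: string[];
}

// Returns human-readable problems; an empty array means the parameters look valid.
function checkSamplingParams(p: Partial<SamplingParams>): string[] {
  const errors: string[] = [];
  if (p.max_tokens === undefined) {
    errors.push("max_tokens is required for Anthropic models");
  }
  if (p.temperature !== undefined && p.top_p !== undefined) {
    errors.push("set either temperature or top_p, not both");
  }
  if (p.temperature !== undefined && (p.temperature < 0 || p.temperature > 1)) {
    errors.push("temperature must be between 0 and 1");
  }
  if (p.top_p !== undefined && (p.top_p < 0 || p.top_p > 1)) {
    errors.push("top_p must be between 0 and 1");
  }
  return errors;
}
```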

Token Management

// Set appropriate max_tokens based on task
const getMaxTokens = (taskType: string) => {
  const limits = {
    chat: 1024,
    summary: 500,
    generation: 4096,
    analysis: 2048,
  };
  return limits[taskType as keyof typeof limits] ?? 1024;
};

Troubleshooting

| Issue | Problem | Solution |
|---|---|---|
| Missing max_tokens | Anthropic models require the max_tokens parameter | Add max_tokens: 1024 (or an appropriate value) to your request |
| High costs | Token usage accumulates quickly on large requests | Enable prompt caching for repeated context, use smaller models (Haiku) for simple tasks, and monitor and optimize token usage |
| Rate limits | Anthropic has tiered rate limits based on usage | Use Orq's automatic retries and fallbacks, or consider the AWS/Google providers for higher limits |
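Orq's retries and fallbacks run on the gateway. If you also want to retry on the client, a common technique is exponential backoff with jitter, sketched below. withRetry is a hypothetical helper. Treating HTTP 429 and 529 as retryable assumes the error object exposes a numeric status field, as errors thrown by the OpenAI Node.js SDK do. 529 is Anthropic's "overloaded" status, and whether the gateway passes it through unchanged is an assumption:

```typescript
// Retries a call with exponential backoff plus random jitter.
async function withRetry<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // 500 ms, 1 s, 2 s, ... plus up to 100 ms of jitter to avoid synchronized retries
      const delay = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Rate limited (429) or overloaded (529) responses are worth retrying.
function isRateLimitError(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return status === 429 || status === 529;
}
```

Usage: const response = await withRetry(() => client.responses.create({ model: "anthropic/claude-sonnet-4-6", input: "..." }), isRateLimitError);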

Limitations

  • max_tokens required: unlike with OpenAI models, you must specify the maximum output length
  • Rate limits: Vary by tier and provider
  • Context window: 200K to 1M tokens depending on the model (see Available Models); may also vary by provider
  • System prompts: Handled differently than OpenAI (automatically converted by Orq)

Reference

Claude Cowork

The Orq.ai AI Gateway is compatible with Claude Cowork’s third-party inference mode. Route Cowork traffic through Orq.ai to get EU data residency, provider fallbacks, and cost control without changing the Cowork interface.

Claude Cowork

Set up Orq.ai as a Cowork third-party inference gateway.