This page describes how to use Anthropic models through the AI Gateway. To learn more about the AI Gateway, see AI Gateway.

Quick Start

Access Anthropic’s Claude models through Orq’s unified API with automatic fallbacks, caching, and observability.
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 1024,
});
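The snippet above assumes an OpenAI SDK client already pointed at Orq's gateway. A minimal configuration sketch, assuming the proxy base URL from the Code Examples section and an ORQ_API_KEY environment variable (the client itself would be created with `new OpenAI(orqClientConfig)`):

```typescript
// Sketch: configuration for pointing the OpenAI SDK at Orq's AI Gateway.
// The base URL matches Orq's proxy endpoint; ORQ_API_KEY is assumed to
// hold your Orq workspace key.
const orqClientConfig = {
  baseURL: "https://api.orq.ai/v2/proxy",
  apiKey: process.env.ORQ_API_KEY ?? "",
};
// const openai = new OpenAI(orqClientConfig);
```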

Available Models

Orq supports all Anthropic Claude models across multiple providers for optimal availability and pricing:

Latest Models

Model                       Context  Strengths             Best For
claude-opus-4-5-20251101    200K     Highest intelligence  Complex reasoning, research
claude-3-5-sonnet-20241022  200K     Best balance          Most tasks, coding
claude-3-5-haiku-20241022   200K     Fast responses        Simple tasks, chat

Provider Options

Anthropic models are available through multiple providers:
  • anthropic/ - Direct Anthropic API
  • aws/ - AWS Bedrock (enterprise features)
  • google/ - Google Vertex AI (GCP integration)
// Direct Anthropic
model: "anthropic/claude-sonnet-4-5-20250929"

// AWS Bedrock
model: "aws/anthropic/claude-sonnet-4-5-20250929"

// Google Vertex AI
model: "google/anthropic/claude-opus-4-5-20251101"

Key Features

Prompt Caching

Cache frequently used context (system prompts, documents) to reduce costs by up to 90% and latency by up to 85%. Learn more about Prompt Caching
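As a back-of-the-envelope illustration of those savings, assume cached input tokens are billed at roughly a tenth of the normal input rate (the "up to 90%" figure above); the per-token rate below is a placeholder, not a real price:

```typescript
// Sketch: estimated input cost with and without prompt caching.
// `inputRate` is a placeholder price per token; cached tokens are
// assumed to cost 10% of that rate (the "up to 90%" reduction).
function estimateInputCost(
  totalTokens: number,
  cachedTokens: number,
  inputRate: number,
): number {
  const uncached = (totalTokens - cachedTokens) * inputRate;
  const cached = cachedTokens * inputRate * 0.1;
  return uncached + cached;
}

// A 10,000-token prompt where 9,000 tokens hit the cache:
const withCache = estimateInputCost(10_000, 9_000, 0.000003);
const withoutCache = estimateInputCost(10_000, 0, 0.000003);
```

Here the cached request costs roughly a fifth of the uncached one; the real ratio depends on the provider's actual cache pricing.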

Extended Thinking

Enable deep reasoning for complex problems with budget-based token allocation for internal analysis. Learn more about Extended Thinking

Vision Capabilities

All Claude 3+ models support image analysis with high accuracy.
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: { url: "https://example.com/image.jpg" }
        },
      ],
    },
  ],
});

Tool Use (Function Calling)

Claude excels at tool use with sophisticated planning and execution.
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    },
  ],
});
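When the model responds with a tool call, your code executes the function and returns the result. A sketch of the dispatch step, assuming the OpenAI-compatible tool_calls shape (the get_weather handler here is a stand-in, not a real weather lookup):

```typescript
// Sketch: dispatching a tool call from an OpenAI-compatible response.
type ToolCall = {
  id: string;
  function: { name: string; arguments: string }; // arguments is a JSON string
};

// Stand-in implementations keyed by tool name.
const toolHandlers: Record<string, (args: any) => string> = {
  get_weather: (args) => `Weather in ${args.location}: 22°C, clear`,
};

function dispatchToolCall(call: ToolCall): string {
  const handler = toolHandlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  return handler(JSON.parse(call.function.arguments));
}
```

The returned string would then be sent back as a role: "tool" message (with the matching tool_call_id) so the model can produce its final answer.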

Code Examples

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to calculate Fibonacci numbers"
      }
    ],
    "max_tokens": 1024
  }'

Model Parameters

Parameter       Type      Description                            Default
max_tokens      number    Maximum tokens to generate (required)  -
temperature     number    Randomness (0-1)                       1
top_p           number    Nucleus sampling (0-1)                 -
top_k           number    Top-K sampling                         -
stop_sequences  string[]  Custom stop sequences                  -
Note: max_tokens is required for Anthropic models. Typical values: 1024 for responses, 4096+ for long content.
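Putting the table together, a request body exercising these parameters might look like the sketch below; the values are illustrative, and the parameter names follow the table above:

```typescript
// Sketch: a request body using the parameters above (values illustrative).
const requestBody = {
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "List three haiku themes" }],
  max_tokens: 1024,         // required for Anthropic models
  temperature: 0.7,         // 0-1; lower = more deterministic
  top_p: 0.9,               // nucleus sampling
  stop_sequences: ["\n\n"], // stop generating at a blank line
};
```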

Best Practices

Model selection:
  • Opus 4.5: Complex analysis, research, advanced reasoning
  • Sonnet 3.5: Most tasks, coding, general use (best price/performance)
  • Haiku 3.5: Simple queries, fast responses, high-volume tasks
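The selection guidance above can be sketched as a small picker, using the model IDs from the Latest Models table (the complexity tiers are an illustrative mapping, not an Orq feature):

```typescript
// Sketch: pick a model ID by task complexity, using the IDs from the
// Latest Models table above.
type Complexity = "simple" | "standard" | "complex";

function pickModel(complexity: Complexity): string {
  switch (complexity) {
    case "complex":
      return "anthropic/claude-opus-4-5-20251101";
    case "standard":
      return "anthropic/claude-3-5-sonnet-20241022";
    case "simple":
      return "anthropic/claude-3-5-haiku-20241022";
  }
}
```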
Token management:
// Set appropriate max_tokens based on task
const getMaxTokens = (taskType: string): number => {
  const limits: Record<string, number> = {
    chat: 1024,
    summary: 500,
    generation: 4096,
    analysis: 2048,
  };
  return limits[taskType] ?? 1024;
};
Multi-provider strategy:
// Use Orq's fallback system for reliability
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "..." }],
  orq: {
    fallbacks: [
      { model: "aws/anthropic/claude-sonnet-4-5-20250929" },
      { model: "anthropic/claude-opus-4-5-20251101" },
    ],
  },
});

Response Structure

{
  "id": "msg_01ABC123",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "anthropic/claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response content here"
      },
      "finish_reason": "end_turn"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 250,
    "total_tokens": 350
  }
}
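The fields you'll typically read from this structure, sketched against the sample above:

```typescript
// Sketch: pulling the assistant text and token usage out of a response
// shaped like the sample above.
const sample = {
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "Response content here" },
      finish_reason: "end_turn",
    },
  ],
  usage: { prompt_tokens: 100, completion_tokens: 250, total_tokens: 350 },
};

const text = sample.choices[0]?.message?.content ?? "";
const tokensUsed = sample.usage.total_tokens;
```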

Troubleshooting

Missing max_tokens error
  • Anthropic models require max_tokens parameter
  • Add to request: max_tokens: 1024 (or appropriate value)
High costs
  • Enable prompt caching for repeated context
  • Use smaller models (Haiku) for simple tasks
  • Monitor token usage and optimize prompts
Rate limits
  • Anthropic has tiered rate limits based on usage
  • Use Orq’s automatic retries and fallbacks
  • Consider AWS/Google providers for higher limits
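If you layer your own retries on top of Orq's, exponential backoff with jitter is the usual pattern. A sketch, where the base delay and cap are assumptions rather than Orq defaults:

```typescript
// Sketch: exponential backoff delay for rate-limited requests.
// `baseMs` and `capMs` are illustrative values, not Orq defaults.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2); // "equal jitter"
}
```

The caller would sleep for `backoffDelayMs(attempt)` milliseconds after each 429 before retrying.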

Limitations

  • max_tokens required: Unlike OpenAI models, Anthropic models require an explicit maximum output length
  • Rate limits: Vary by tier and provider
  • Context window: 200K tokens (may vary by provider)
  • System prompts: Handled differently than OpenAI (automatically converted by Orq)

Advanced Features

Streaming

const stream = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Tell me a story" }],
  max_tokens: 2048,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

PDF Input

Claude Opus 4.5 supports direct PDF analysis:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize this document" },
        {
          type: "document",
          document: {
            type: "pdf",
            url: "https://example.com/document.pdf"
          }
        },
      ],
    },
  ],
  max_tokens: 2048,
});

Reference