Access Claude models (Claude 4.6 Opus, Sonnet, and Claude 4.5 Haiku) through the AI Router with advanced message APIs, tool use capabilities, and intelligent model routing. All Claude models are available with consistent formatting and pricing across multiple providers.
Claude models use the provider slug format `anthropic/model-name`. For example: `anthropic/claude-sonnet-4-6`
Ensure that prompt caching is compatible with the chosen model. For a full guide, see Prompt Caching.

Cache frequently used context (system prompts, large documents, codebases) to reduce costs by up to 90% and latency by up to 85%.
```bash
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      { "role": "user", "content": "Write a function to parse JSON" }
    ],
    "max_tokens": 1024
  }'
```
How It Works

Prompt caching stores frequently used content blocks on Anthropic’s servers for reuse across requests:
- Mark content for caching: Add `cache_control: { type: "ephemeral" }` to text blocks
- First request: Content is processed normally and cached (cache write)
- Subsequent requests: Cached content is reused (cache read)
- Cache lifetime: 5 minutes from last use (automatically managed)
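The write/read lifecycle above can be sketched client-side. The `markForCaching` helper below is hypothetical (it is not part of any Orq or Anthropic SDK); it simply attaches the `cache_control` flag to a text content block before the request is sent:

```typescript
// Hypothetical helper, not an SDK function: attaches the ephemeral
// cache_control flag to a text content block.
type CacheableTextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
};

function markForCaching(text: string, ttl?: "5m" | "1h"): CacheableTextBlock {
  const block: CacheableTextBlock = {
    type: "text",
    text,
    cache_control: { type: "ephemeral" },
  };
  if (ttl !== undefined) block.cache_control!.ttl = ttl;
  return block;
}

// The first request containing this block writes it to the cache;
// requests within the TTL that repeat the same prefix read from it.
const systemBlock = markForCaching(
  "You are an expert Python developer with deep knowledge of best practices."
);
```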
Configuration

Mark content blocks for caching by adding the `cache_control` parameter:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `type` | `"ephemeral"` | Yes | Only supported cache type |
| `ttl` | `"5m"` \| `"1h"` | No | Cache duration (default: `"5m"`) |
Cache TTL Options

The `ttl` parameter controls how long cached content persists:

- `"5m"` (5 minutes) - Default cache duration
- `"1h"` (1 hour) - Extended cache duration for longer-running workflows
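As a small illustration of the default behavior, here is a sketch (a hypothetical helper written for this page, not an SDK call) that resolves the effective TTL:

```typescript
// Resolves the effective cache TTL; per the options above,
// an omitted ttl falls back to the "5m" default.
function effectiveTtl(
  cacheControl: { type: "ephemeral"; ttl?: "5m" | "1h" }
): "5m" | "1h" {
  return cacheControl.ttl ?? "5m";
}
```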
- Add `cache_control` to the last message or content block you want cached
- Everything up to that point is included in the cache
- Maximum: 4 cache breakpoints per request
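These rules can be checked client-side before sending a request. The helpers below are illustrative only (not SDK functions) and assume the message shape used in the examples on this page:

```typescript
type ContentBlock = { type: string; text: string; cache_control?: { type: string } };
type ChatMessage = { role: string; content: string | ContentBlock[] };

// Counts how many content blocks carry a cache_control breakpoint.
function countBreakpoints(messages: ChatMessage[]): number {
  let count = 0;
  for (const message of messages) {
    if (Array.isArray(message.content)) {
      for (const block of message.content) {
        if (block.cache_control !== undefined) count++;
      }
    }
  }
  return count;
}

// Enforces the documented maximum of 4 cache breakpoints per request.
function assertBreakpointLimit(messages: ChatMessage[]): void {
  if (countBreakpoints(messages) > 4) {
    throw new Error("At most 4 cache breakpoints are allowed per request");
  }
}
```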
Minimum Token Thresholds

Caching only activates once the marked content meets the model’s minimum. Requests below the threshold are processed normally at full cost.
| Model | Minimum tokens |
| --- | --- |
| Claude Opus 4.6, Opus 4.5 | 4,096 |
| Claude Sonnet 4.6 | 2,048 |
| Claude Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4, Sonnet 3.7 | 1,024 |
| Claude Haiku 4.5 | 4,096 |
| Claude Haiku 3.5, Haiku 3 | 2,048 |
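A rough pre-flight estimate can flag content that is unlikely to reach the threshold. This sketch uses the common ~4-characters-per-token heuristic (the real tokenizer decides, so treat the result as an estimate); the slugs other than `anthropic/claude-sonnet-4-6` are assumed for illustration:

```typescript
// Minimum cacheable tokens per model, copied from the table above.
// Slugs other than claude-sonnet-4-6 are assumed for illustration.
const minCacheTokens: Record<string, number> = {
  "anthropic/claude-sonnet-4-6": 2048,
  "anthropic/claude-opus-4-6": 4096,
  "anthropic/claude-haiku-4-5": 4096,
};

// Rough estimate only: ~4 characters per token. Content below the model's
// minimum is processed normally at full cost, so caching it gains nothing.
function likelyMeetsCacheMinimum(model: string, text: string): boolean {
  const minimum = minCacheTokens[model];
  if (minimum === undefined) return false;
  return text.length / 4 >= minimum;
}
```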
Use Cases
Static System Prompts
Cache role definitions and instructions that don’t change.
```bash
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      { "role": "user", "content": "How do I read a CSV file?" }
    ],
    "max_tokens": 1024
  }'
```
Large Document Context
Cache documents, codebases, or knowledge bases for reuse across multiple queries.
```bash
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          { "type": "text", "text": "How do I authenticate with the API?" }
        ]
      }
    ],
    "max_tokens": 1024
  }'
```
Multi-turn Conversations
Cache conversation history for long interactions to reduce processing time and costs on subsequent messages.
```bash
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "user", "content": "What is Python?" },
      { "role": "assistant", "content": "Python is a high-level programming language..." },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are its main features?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      { "role": "assistant", "content": "Python’s main features include..." },
      { "role": "user", "content": "Can you give me a code example?" }
    ],
    "max_tokens": 1024
  }'
```
RAG with Document Collections
Cache retrieved documents for multiple queries in retrieval-augmented generation scenarios.
```bash
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers based on provided context."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Context:\n[Retrieved document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "Question: What is the main topic of these documents?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
```
Important: Always include the `signature` field when passing reasoning content back to the API. The signature cryptographically verifies the reasoning was generated by the model and is required for multi-turn conversations.
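A hedged sketch of what passing reasoning content back looks like. The block shape mirrors the thinking blocks returned by the model, but treat the exact wire format as an assumption here, and note that the `signature` value below is a placeholder, not a real signature:

```typescript
// Assistant turn being replayed into a follow-up request. The thinking block
// is passed back verbatim, including its signature.
const assistantTurn = {
  role: "assistant",
  content: [
    {
      type: "thinking",
      thinking: "[model reasoning from the previous response...]",
      signature: "sig_placeholder", // opaque value returned by the model; never modify it
    },
    { type: "text", text: "Here is the proposed design..." },
  ],
};

// Illustrative guard (not an SDK function): verifies a replayed
// assistant turn still carries a non-empty signature.
function hasThinkingSignature(turn: { role: string; content: any[] }): boolean {
  return turn.content.some(
    (block) =>
      block.type === "thinking" &&
      typeof block.signature === "string" &&
      block.signature.length > 0
  );
}
```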
Combine with prompt caching for repeated contexts
Cache system prompts and context to reduce costs and latency when using extended thinking:
```typescript
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are a system architect...",
          cache_control: { type: "ephemeral" }, // cache this block
        },
      ],
    },
    { role: "user", content: "Design a notification system" },
  ],
  thinking: { type: "enabled", budget_tokens: 8000 },
});
```
Configuration & Best Practices
| Aspect | Guidance | Details |
| --- | --- | --- |
| `thinking.type` | Set to `"enabled"` | Enables extended thinking with manual budget |
| `thinking.budget_tokens` | Set based on complexity | Min: 1024, must be < `max_tokens`. Billed as output tokens. |
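The two budget constraints can be validated before sending a request. This guard is a sketch written for this page, not an SDK call:

```typescript
// Validates the documented constraints on thinking.budget_tokens:
// at least 1024, and strictly less than max_tokens.
function validateThinkingBudget(budgetTokens: number, maxTokens: number): void {
  if (budgetTokens < 1024) {
    throw new Error("thinking.budget_tokens must be at least 1024");
  }
  if (budgetTokens >= maxTokens) {
    throw new Error("thinking.budget_tokens must be less than max_tokens");
  }
}
```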
Supported Models: Extended thinking with `budget_tokens` is available on Claude Opus 4.5, Sonnet 4.5, and newer models. For Claude Opus 4.6 and Sonnet 4.6, consider using adaptive thinking instead (see below). Available through the `anthropic/`, `aws/`, and `google/` providers.
Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.6 and Sonnet 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically determine when and how much to think based on the complexity of each request.
| Mode | Syntax | When to use |
| --- | --- | --- |
| Adaptive | | Recommended for Claude 4.6 models. Claude determines thinking depth automatically. |
| Manual | `thinking: { type: "enabled", budget_tokens: N }` | When you need precise control over thinking token spend. Supported on all thinking-capable models. |
| Disabled | Omit `thinking` parameter | When you don’t need extended thinking and want the lowest latency. |
Supported Models: Adaptive thinking is available on Claude Opus 4.6 and Claude Sonnet 4.6 only. Older models (Opus 4.5, Sonnet 4.5, etc.) require type: "enabled" with budget_tokens.
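One way to encode the model-support rule above in client code. The slug list is an assumption for illustration, derived from the model names on this page:

```typescript
// Models documented above as supporting adaptive thinking (slugs assumed).
const adaptiveThinkingModels = [
  "anthropic/claude-opus-4-6",
  "anthropic/claude-sonnet-4-6",
];

// Picks the thinking mode this page recommends for a given model slug;
// older models fall back to manual budgets.
function recommendedThinkingMode(model: string): "adaptive" | "manual" {
  return adaptiveThinkingModels.includes(model) ? "adaptive" : "manual";
}
```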
Note: `max_tokens` is required for Anthropic models. Typical values: 1024 for responses, 4096+ for long content.
Do not use `temperature` and `top_p` together on newer Anthropic models. Using both parameters simultaneously will result in an API error. Choose one or the other.
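A minimal client-side guard for the sampling-parameter rule (a sketch, not part of any SDK):

```typescript
// Rejects request bodies that set temperature and top_p together,
// which newer Anthropic models return an API error for.
function checkSamplingParams(params: { temperature?: number; top_p?: number }): void {
  if (params.temperature !== undefined && params.top_p !== undefined) {
    throw new Error("Set either temperature or top_p, not both");
  }
}
```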