Set Up Your API Key

To use Anthropic with Orq.ai, follow these steps:
  1. Navigate to Providers (in AI Studio: Model Garden > Providers, in AI Router: Providers)
  2. Find Anthropic in the list
  3. Click the Configure button next to Anthropic
  4. In the modal that opens, select Setup your own API Key
  5. Enter a name for this configuration (e.g., “Anthropic Production”)
  6. Paste your Anthropic API Key into the provided field
  7. Click Save to complete the setup
Your Anthropic API key is now configured and ready to use with Orq.ai in AI Studio or through the AI Router.

Quick Start

Access Anthropic’s Claude models through the AI Router.
// openai is an OpenAI SDK client pointed at the Orq AI Router
// (see Prerequisites below for configuration)
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 1024,
});

Available Models

Orq supports all Anthropic Claude models across multiple providers for optimal availability and pricing:

Latest Models

Model | Context | Strengths | Best For
claude-opus-4-5-20251101 | 200K | Highest intelligence | Complex reasoning, research
claude-sonnet-4-5-20250929 | 200K | Best balance | Most tasks, coding
claude-haiku-4-5-20251001 | 200K | Fast responses | Simple tasks, chat

Provider Options

Anthropic models are available through multiple providers:
  • anthropic/ - Direct Anthropic API
  • aws/ - AWS Bedrock (enterprise features)
  • google/ - Google Vertex AI (GCP integration)
// Direct Anthropic
model: "anthropic/claude-sonnet-4-5-20250929"

// AWS Bedrock
model: "aws/anthropic/claude-sonnet-4-5-20250929"

// Google Vertex AI
model: "google/anthropic/claude-opus-4-5-20251101"
For a complete list of supported models, see Supported Models.

Using the AI Router

Access Claude models (Claude 4.5 Opus, Sonnet, Haiku) through the AI Router with advanced message APIs, tool use capabilities, and intelligent model routing. All Claude models are available with consistent formatting and pricing across multiple providers.
Claude models use the provider slug format: anthropic/model-name. For example: anthropic/claude-opus-4-5-20251101

Prerequisites

Before making requests to the AI Router, you need to configure your environment and install the SDKs if you choose to use them.
Endpoint
POST https://api.orq.ai/v2/router/chat/completions
Required Headers
Include the following headers in all requests:
Authorization: Bearer $ORQ_API_KEY
Content-Type: application/json
Getting your API Key:
  1. Go to API Keys
  2. Click Create API Key and copy it
  3. Store it in your environment as ORQ_API_KEY
SDK Installation
Install the OpenAI SDK for your language:
npm install openai
# or
yarn add openai

Basic Usage

If your OpenAI code is already functioning, you only need to change the base_url and api_key so that requests go to the router endpoint with your ORQ_API_KEY.
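For example, with the OpenAI Node SDK (a minimal sketch, assuming ORQ_API_KEY is set in your environment):
import OpenAI from "openai";

// Point the OpenAI client at the Orq AI Router instead of api.openai.com;
// the SDK appends /chat/completions to this base URL.
const openai = new OpenAI({
  baseURL: "https://api.orq.ai/v2/router",
  apiKey: process.env.ORQ_API_KEY,
});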

Chat Completion

curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "max_tokens": 1024
  }'

Streaming

Stream responses for real-time output instead of waiting for the complete response:
const stream = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Tell me a story" }],
  max_tokens: 2048,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Advanced Usage

Prompt Caching

Make sure prompt caching is supported by the model you have chosen.
Cache frequently used context (system prompts, large documents, codebases) to reduce costs by up to 90% and latency by up to 85%.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert Python developer with deep knowledge of best practices.",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "Write a function to parse JSON"
      }
    ],
    "max_tokens": 1024
  }'
How It Works
Prompt caching stores frequently used content blocks on Anthropic’s servers for reuse across requests:
  1. Mark content for caching: Add cache_control: { type: "ephemeral" } to text blocks
  2. First request: Content is processed normally and cached (cache write)
  3. Subsequent requests: Cached content is reused (cache read)
  4. Cache lifetime: 5 minutes from last use (automatically managed)
Configuration
Mark content blocks for caching by adding the cache_control parameter:
Parameter | Type | Required | Description
type | "ephemeral" | Yes | Only supported cache type
ttl | "5m" or "1h" | No | Cache duration (default: "5m")
Cache TTL Options
The ttl parameter controls how long cached content persists:
  • "5m" (5 minutes) - Default cache duration
  • "1h" (1 hour) - Extended cache duration for longer-running workflows
{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}
The ttl option is only available for Anthropic Claude models.
Cache placement rules
  • Add cache_control to the last message or content block you want cached
  • Everything up to that point is included in the cache
  • Minimum cacheable content: 1024 tokens (~800 words)
  • Maximum: 4 cache breakpoints per request (see the sketch below)
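A minimal sketch marking two separate breakpoints (the document text is a placeholder):
// Two cache breakpoints in one request (up to 4 are allowed).
// Everything up to each marked block is included in that cache entry.
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are a support assistant...", // breakpoint 1: stable instructions
          cache_control: { type: "ephemeral" },
        },
      ],
    },
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Reference manual:\n[large document content here...]", // breakpoint 2: large context
          cache_control: { type: "ephemeral" },
        },
        { type: "text", text: "How do I reset my password?" },
      ],
    },
  ],
  max_tokens: 1024,
});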
Supported Models
Prompt caching is available on all current Claude Opus, Sonnet, and Haiku models. For the complete list of supported models, see Anthropic’s official documentation.
Provider availability
All models supporting prompt caching are available through anthropic, aws, and google providers.
Use Cases
Cache role definitions and instructions that don’t change.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "user",
        "content": "How do I read a CSV file?"
      }
    ],
    "max_tokens": 1024
  }'
Cache documents, codebases, or knowledge bases for reuse across multiple queries.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Here is our API documentation:\n\n[Large documentation content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "How do I authenticate with the API?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
Cache conversation history for long interactions to reduce processing time and costs on subsequent messages.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "What is Python?"
      },
      {
        "role": "assistant",
        "content": "Python is a high-level programming language..."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are its main features?",
            "cache_control": { "type": "ephemeral" }
          }
        ]
      },
      {
        "role": "assistant",
        "content": "Pythons main features include..."
      },
      {
        "role": "user",
        "content": "Can you give me a code example?"
      }
    ],
    "max_tokens": 1024
  }'
Cache retrieved documents for multiple queries in retrieval-augmented generation scenarios.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that answers based on provided context."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Context:\n[Retrieved document content here...]",
            "cache_control": { "type": "ephemeral" }
          },
          {
            "type": "text",
            "text": "Question: What is the main topic of these documents?"
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'

Extended Thinking

Enable deep reasoning for complex problems by allocating token budget for internal analysis before generating responses.
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-5-20251101",
    "messages": [
      {
        "role": "user",
        "content": "Design a distributed rate limiting system for 1M requests/second"
      }
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "max_tokens": 2048
  }'
Include reasoning content with its signature when continuing conversations:
const messages = [
  { role: "user", content: "Design a rate limiting system" }
];

const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});

// Map response to assistant message
const msg = response.choices[0].message;
const contentParts = [];

if (msg.reasoning) {
  contentParts.push({
    type: "reasoning",
    reasoning: msg.reasoning,
    signature: msg.reasoning_signature
  });
}

if (msg.redacted_reasoning) {
  contentParts.push({
    type: "redacted_reasoning",
    data: msg.redacted_reasoning
  });
}

if (msg.content) {
  contentParts.push({
    type: "text",
    text: msg.content
  });
}

const assistantMessage = {
  role: "assistant",
  content: contentParts
};

messages.push(assistantMessage);
messages.push({ role: "user", content: "How would you handle 10M req/s?" });

const followUp = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages,
  thinking: { type: "enabled", budget_tokens: 8000 },
  max_tokens: 2048
});
Important: Always include the signature field when passing reasoning content back to the API. The signature cryptographically verifies the reasoning was generated by the model and is required for multi-turn conversations.
Cache system prompts and context to reduce costs and latency when using extended thinking:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [
    {
      role: "system",
      content: [{
        type: "text",
        text: "You are a system architect...", // Cache this
        cache_control: { type: "ephemeral" }
      }]
    },
    { role: "user", content: "Design a notification system" }
  ],
  thinking: { type: "enabled", budget_tokens: 8000 }
});
Configuration & Best Practices
Aspect | Guidance | Details
thinking.type | Set to "enabled" | Enables extended thinking
thinking.budget_tokens | Set based on complexity | Min: 1024, must be < max_tokens. Billed as output tokens.
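A hypothetical helper (not part of the router API) illustrating these bounds:
// Clamp a thinking budget to the documented constraints:
// at least 1024 tokens, and strictly below max_tokens.
// Assumes maxTokens is greater than 1025.
function clampThinkingBudget(requested: number, maxTokens: number): number {
  const MIN_BUDGET = 1024;
  return Math.min(Math.max(requested, MIN_BUDGET), maxTokens - 1);
}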
Supported Models: Extended thinking is available on Claude Opus 4.5 (recommended), Sonnet 4.5, and newer models. Available through anthropic/, aws/, and google/ providers. For the complete list, see Anthropic’s documentation.

Vision Capabilities

All Claude 3+ models support image analysis with high accuracy. Choose between URL-based and base64-encoded images:
Use images from URLs for remote files:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: { url: "https://example.com/image.jpg" }
        },
      ],
    },
  ],
});
Embed images directly as base64-encoded strings:
const imageBase64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==";

const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: { url: `data:image/jpeg;base64,${imageBase64}` }
        },
      ],
    },
  ],
});
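To send a local file instead, you can produce the base64 string with Node’s fs module (a sketch; the path is a placeholder):
import { readFileSync } from "node:fs";

// Read a local image and encode it as base64 for the data URL above.
const imageBase64 = readFileSync("./photo.jpg").toString("base64");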

PDF Input

Claude Opus 4.5 supports direct PDF analysis:
const response = await openai.chat.completions.create({
  model: "anthropic/claude-opus-4-5-20251101",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize this document" },
        {
          type: "document",
          document: {
            type: "pdf",
            url: "https://example.com/document.pdf"
          }
        },
      ],
    },
  ],
  max_tokens: 2048,
});

Tool Use (Function Calling)

Claude excels at tool use with sophisticated planning and execution.
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    },
  ],
});
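The response may come back with tool calls rather than text. A minimal sketch of the round trip, assuming a hypothetical local getWeather function: execute each call, append the results as tool messages, and request a final answer.
// Handle tool calls from the response above and send the results back.
const message = response.choices[0].message;

if (message.tool_calls?.length) {
  const followUpMessages = [
    { role: "user", content: "What's the weather in Tokyo?" },
    message, // the assistant message containing the tool calls
  ];

  for (const toolCall of message.tool_calls) {
    const args = JSON.parse(toolCall.function.arguments);
    const result = await getWeather(args.location); // hypothetical local implementation

    followUpMessages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    });
  }

  // Pass the same tools array again if the model may call tools in follow-ups.
  const finalResponse = await openai.chat.completions.create({
    model: "anthropic/claude-sonnet-4-5-20250929",
    messages: followUpMessages,
    max_tokens: 1024,
  });
}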

Multi-Provider Strategy

// Use Orq's fallback system for reliability
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "..." }],
  orq: {
    fallbacks: [
      { model: "aws/anthropic/claude-sonnet-4-5-20250929" },
      { model: "anthropic/claude-opus-4-5-20251101" },
    ],
  },
});

Configuration

Model Parameters

Parameter | Type | Description | Default
max_tokens | number | Maximum tokens to generate (required) | -
temperature | number | Randomness (0-1) | 1
top_p | number | Nucleus sampling (0-1) | -
top_k | number | Top-K sampling | -
stop_sequences | string[] | Custom stop sequences | -
Note: max_tokens is required for Anthropic models. Typical values: 1024 for responses, 4096+ for long content.
Do not use temperature and top_p together on newer Anthropic models. Using both parameters simultaneously will result in an API error. Choose one or the other.
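For example, set only temperature (a sketch):
const response = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "Write a haiku about the sea" }],
  max_tokens: 1024,
  temperature: 0.7, // choose temperature OR top_p, never both
});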

Token Management

// Choose an appropriate max_tokens value for the task type
const getMaxTokens = (taskType: string): number => {
  // Typed as Record so the string index is valid under strict TypeScript
  const limits: Record<string, number> = {
    chat: 1024,
    summary: 500,
    generation: 4096,
    analysis: 2048,
  };
  return limits[taskType] ?? 1024; // fall back to a sensible default
};
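Usage (a sketch, pairing a smaller model with a short summary budget):
const response = await openai.chat.completions.create({
  model: "anthropic/claude-haiku-4-5-20251001",
  messages: [{ role: "user", content: "Summarize the following article: ..." }],
  max_tokens: getMaxTokens("summary"),
});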

Troubleshooting

Issue | Problem | Solution
Missing max_tokens | Anthropic models require the max_tokens parameter | Add max_tokens: 1024 (or an appropriate value) to your request
High costs | Token usage accumulates quickly on large requests | Enable prompt caching for repeated context, use smaller models (Haiku) for simple tasks, and monitor and optimize token usage
Rate limits | Anthropic has tiered rate limits based on usage | Use Orq’s automatic retries and fallbacks, or consider the AWS/Google providers for higher limits

Limitations

  • max_tokens required: Unlike with OpenAI models, you must specify the maximum output length
  • Rate limits: Vary by tier and provider
  • Context window: 200K tokens (may vary by provider)
  • System prompts: Handled differently from OpenAI (automatically converted by Orq)