# Anthropic Claude integration

> Access Claude models through Orq.ai. Use Claude Opus 4.6, Claude Sonnet 4.6, and Claude Haiku 4.5 with enhanced routing, caching, and prompt management capabilities.

## Set Up Your API Key

To use Anthropic with Orq.ai, follow these steps:

1. Navigate to **AI Gateway** > **BYOK**
2. Find **Anthropic** in the list
3. Click the **Configure** button next to Anthropic
4. In the modal that opens, select <kbd className="key">Setup your own API Key</kbd>
5. Enter a name for this configuration (e.g., "Anthropic Production")
6. Paste your Anthropic API Key into the provided field
7. Click **Save** to complete the setup

Your Anthropic API key is now configured and ready to use with Orq.ai in **AI Studio** or through the **AI Gateway**.

## Quick Start

Access Anthropic's Claude models through the **AI Gateway**.

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "anthropic/claude-sonnet-4-6",
      "input": "Explain quantum computing in simple terms"
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "anthropic/claude-sonnet-4-6",
    input: "Explain quantum computing in simple terms",
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="anthropic/claude-sonnet-4-6",
      input="Explain quantum computing in simple terms",
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-sonnet-4-6",
    messages: [
      {
        role: "user",
        content: "Explain quantum computing in simple terms",
      },
    ],
    max_tokens: 1024,
  });

  console.log(response.choices[0].message.content);
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="anthropic/claude-sonnet-4-6",
      messages=[
          {
              "role": "user",
              "content": "Explain quantum computing in simple terms",
          }
      ],
      max_tokens=1024,
  )

  print(response.choices[0].message.content)
  ```
</CodeGroup>

## Available Models

Orq.ai supports all Anthropic Claude models across multiple providers, so you can choose a route based on availability and pricing:

### Latest Models

| Model                       | Context | Strengths                                         | Best For                                 |
| --------------------------- | ------- | ------------------------------------------------- | ---------------------------------------- |
| `claude-opus-4-7`           | 1M      | Highest intelligence, extra-high reasoning effort | Coding, agentic tasks, complex reasoning |
| `claude-opus-4-6`           | 200K    | High intelligence                                 | Complex reasoning, research              |
| `claude-sonnet-4-6`         | 200K    | Best balance                                      | Most tasks, coding                       |
| `claude-haiku-4-5-20251001` | 200K    | Fast responses                                    | Simple tasks, chat                       |

### Provider Options

Anthropic models are available through multiple providers:

* **`anthropic/`**: Direct Anthropic API
* **`aws/`**: AWS Bedrock (enterprise features)
* **`google/`**: Google Vertex AI (GCP integration)

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Use these model strings inside your responses.create() or chat.completions.create() call

  // Direct Anthropic
  model: "anthropic/claude-sonnet-4-6"

  // AWS Bedrock
  model: "aws/anthropic/claude-sonnet-4-6"

  // Google Vertex AI
  model: "google/anthropic/claude-opus-4-6"
  ```
</CodeGroup>

For a complete list of supported models, see [Supported Models](/docs/ai-studio/ai-gateway/supported-models).
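
The provider prefixes listed above can be captured in a small lookup so the routing choice lives in one place. This is a minimal sketch: `PROVIDER_PREFIXES` and `qualified_model` are hypothetical names introduced for illustration, and the prefixes are copied from the model strings shown in this section.

```python
# Prefixes copied from the Provider Options examples above.
PROVIDER_PREFIXES = {
    "anthropic": "anthropic/",       # Direct Anthropic API
    "aws": "aws/anthropic/",         # AWS Bedrock
    "google": "google/anthropic/",   # Google Vertex AI
}


def qualified_model(provider: str, model: str) -> str:
    """Return the model string for a provider (hypothetical helper)."""
    try:
        prefix = PROVIDER_PREFIXES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return prefix + model


print(qualified_model("aws", "claude-sonnet-4-6"))  # aws/anthropic/claude-sonnet-4-6
```

Pass the returned string as the `model` value in `responses.create()` or `chat.completions.create()`.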

## Using the AI Gateway

Access Claude models (Claude Opus 4.6, Claude Sonnet 4.6, and Claude Haiku 4.5) through the **AI Gateway** with advanced message APIs, tool use capabilities, and intelligent model routing. All Claude models are available with consistent formatting and pricing across multiple providers.

<Info>
  Claude models use the provider slug format `anthropic/model-name`, for example `anthropic/claude-sonnet-4-6`.
</Info>

### Prerequisites

Before making requests to the **AI Gateway**, you need to configure your environment and install the SDKs if you choose to use them.

**Endpoint**

```
POST https://api.orq.ai/v3/router/responses
```

**Required Headers**

Include the following headers in all requests:

```
Authorization: Bearer $ORQ_API_KEY
Content-Type: application/json
```

**Getting your API Key:**

1. Go to [API Keys](/docs/ai-studio/organization/api-keys)
2. Click <kbd className="key">Create API Key</kbd> and copy it
3. Store it in your environment as `ORQ_API_KEY`
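
If you call the endpoint without an SDK, it can help to build the two required headers in one place and fail early when the key is missing. The sketch below uses only the standard library; `orq_headers` is a hypothetical helper name, and the optional `env` argument exists only so the function can be exercised without touching the real environment.

```python
import os


def orq_headers(env=None) -> dict:
    """Build the required request headers, failing early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("ORQ_API_KEY")
    if not api_key:
        raise RuntimeError("ORQ_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Placeholder key for illustration only; in real code call orq_headers() with no arguments.
print(orq_headers({"ORQ_API_KEY": "example-key"}))
```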

**SDK Installation**

Install the OpenAI SDK for your language:

<CodeGroup>
  ```bash Node.js/TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  npm install openai
  # or
  yarn add openai
  ```

  ```bash Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  pip install openai
  ```
</CodeGroup>

### Basic Usage

<Tip>
  If you already have working OpenAI SDK code, you only need to point `base_url` at the router endpoint and set `api_key` to your `ORQ_API_KEY`.
</Tip>

#### Chat Completion

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "anthropic/claude-sonnet-4-6",
      "input": "Explain quantum computing in simple terms"
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "anthropic/claude-sonnet-4-6",
    input: "Explain quantum computing in simple terms",
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="anthropic/claude-sonnet-4-6",
      input="Explain quantum computing in simple terms",
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-sonnet-4-6",
    messages: [
      {
        role: "user",
        content: "Explain quantum computing in simple terms",
      },
    ],
    max_tokens: 1024,
  });

  console.log(response.choices[0].message.content);
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="anthropic/claude-sonnet-4-6",
      messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
      max_tokens=1024,
  )

  print(response.choices[0].message.content)
  ```

  ```python Python (Anthropic SDK) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from anthropic import Anthropic
  import os

  client = Anthropic(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  message = client.messages.create(
      model="anthropic/claude-sonnet-4-6",
      max_tokens=1024,
      messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
  )

  print(message.content[0].text)
  ```

  ```typescript TypeScript (Anthropic SDK) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import Anthropic from "@anthropic-ai/sdk";

  const client = new Anthropic({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const message = await client.messages.create({
    model: "anthropic/claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Explain quantum computing in simple terms" }],
  });

  console.log(message.content[0].text);
  ```
</CodeGroup>

#### Streaming

Stream responses for real-time output instead of waiting for the complete response:

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const stream = await client.responses.create({
    model: "anthropic/claude-sonnet-4-6",
    input: "Tell me a story",
    stream: true,
  });

  for await (const event of stream) {
    if (event.type === "response.output_text.delta") {
      process.stdout.write(event.delta);
    }
  }
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  stream = client.responses.create(
      model="anthropic/claude-sonnet-4-6",
      input="Tell me a story",
      stream=True,
  )

  for event in stream:
      if event.type == "response.output_text.delta":
          print(event.delta, end="", flush=True)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const stream = await client.chat.completions.create({
    model: "anthropic/claude-sonnet-4-6",
    messages: [{ role: "user", content: "Tell me a story" }],
    max_tokens: 2048,
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  stream = client.chat.completions.create(
      model="anthropic/claude-sonnet-4-6",
      messages=[{"role": "user", "content": "Tell me a story"}],
      max_tokens=2048,
      stream=True,
  )

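  for chunk in stream:
      # Skip chunks without choices, mirroring the optional chaining in the TypeScript tab
      if chunk.choices:
          print(chunk.choices[0].delta.content or "", end="", flush=True)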
  ```
</CodeGroup>
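
When you need the full text after streaming finishes (for logging or storage), the delta events can be accumulated instead of printed. The sketch below is one way to do that under stated assumptions: `collect_output_text` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the event objects that `client.responses.create(..., stream=True)` yields.

```python
from types import SimpleNamespace


def collect_output_text(events) -> str:
    """Join the text deltas from a Responses API event stream."""
    parts = []
    for event in events:
        # Only text delta events carry output text; other event types are skipped.
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events for illustration; a real stream comes from the SDK.
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once "),
    SimpleNamespace(type="response.output_text.delta", delta="upon a time"),
    SimpleNamespace(type="response.completed"),
]

print(collect_output_text(fake_stream))  # Once upon a time
```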

### Advanced Usage

#### Prompt Caching

<Note>
  Prompt caching is supported on the Chat Completions endpoint (`/v3/router/chat/completions`). The examples below use Chat Completions tabs.
</Note>

For a full guide, see [Prompt Caching](/docs/ai-studio/ai-gateway/prompt-caching).

Cache frequently used context (system prompts, large documents, code bases) to reduce costs by up to 90% and latency by up to 85%.

<CodeGroup>
  ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "anthropic/claude-sonnet-4-6",
      "messages": [
        {
          "role": "system",
          "content": [
            {
              "type": "text",
              "text": "You are an expert Python developer with deep knowledge of best practices.",
              "cache_control": { "type": "ephemeral" }
            }
          ]
        },
        {
          "role": "user",
          "content": "Write a function to parse JSON"
        }
      ],
      "max_tokens": 1024
    }'
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get('ORQ_API_KEY'),
      base_url='https://api.orq.ai/v3/router'
  )

  response = client.chat.completions.create(
      model='anthropic/claude-sonnet-4-6',
      messages=[
          {
              'role': 'system',
              'content': [
                  {
                      'type': 'text',
                      'text': 'You are an expert Python developer with deep knowledge of best practices.',
                      'cache_control': {'type': 'ephemeral'}
                  }
              ]
          },
          {
              'role': 'user',
              'content': 'Write a function to parse JSON'
          }
      ],
      max_tokens=1024
  )
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-sonnet-4-6",
    messages: [
      {
        role: "system",
        content: [
          {
            type: "text",
            text: "You are an expert Python developer with deep knowledge of best practices.",
            cache_control: { type: "ephemeral" },
          },
        ],
      },
      {
        role: "user",
        content: "Write a function to parse JSON",
      },
    ],
    max_tokens: 1024,
  });
  ```
</CodeGroup>

**How It Works**

Prompt caching stores frequently used content blocks on Anthropic's servers for reuse across requests:

1. **Mark content for caching**: Add `cache_control: { type: "ephemeral" }` to text blocks
2. **First request**: Content is processed normally and cached (cache write)
3. **Subsequent requests**: Cached content is reused (cache read)
4. **Cache lifetime**: 5 minutes from last use by default; the lifetime is refreshed automatically each time the cached content is read
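
The lifetime rule in step 4 can be easier to reason about with a toy model. The class below is not how the cache is implemented on Anthropic's servers; it is a simplified sketch, with the hypothetical name `CacheEntryModel`, that assumes a single cached prefix whose lifetime restarts on every request that touches it.

```python
class CacheEntryModel:
    """Toy model of one cache entry whose lifetime restarts on every use."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl_seconds = ttl_seconds  # 300 s matches the 5-minute default
        self.expires_at = None          # None means nothing is cached yet

    def request(self, now: float) -> str:
        """Return 'write' or 'read' for a request arriving at time `now` (seconds)."""
        hit = self.expires_at is not None and now < self.expires_at
        # Either way, the entry is valid for a full lifetime from this request.
        self.expires_at = now + self.ttl_seconds
        return "read" if hit else "write"


entry = CacheEntryModel()
timeline = [0, 120, 400, 800]  # seconds since the first request
print([entry.request(t) for t in timeline])  # ['write', 'read', 'read', 'write']
```

The request at 400 seconds is still a read because the read at 120 seconds restarted the lifetime; the request at 800 seconds arrives after the entry has expired and triggers a new cache write.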

**Configuration**

Mark content blocks for caching by adding the `cache_control` parameter:

| Parameter | Type             | Required | Description                      |
| --------- | ---------------- | -------- | -------------------------------- |
| `type`    | `"ephemeral"`    | Yes      | Only supported cache type        |
| `ttl`     | `"5m"` \| `"1h"` | No       | Cache duration (default: `"5m"`) |

**Cache TTL Options**

The `ttl` parameter controls how long cached content persists:

* `"5m"` (5 minutes): Default cache duration
* `"1h"` (1 hour): Extended cache duration for longer-running workflows

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}
```
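
To avoid repeating the `cache_control` object by hand, a small builder can attach it to a text block and reject unsupported `ttl` values before the request is sent. This is a minimal sketch: `cached_text_block` and `ALLOWED_TTLS` are hypothetical names, and the allowed values are the two listed in the table above.

```python
from typing import Optional

ALLOWED_TTLS = {"5m", "1h"}  # values from the Configuration table


def cached_text_block(text: str, ttl: Optional[str] = None) -> dict:
    """Return a text content block marked for caching (hypothetical helper)."""
    cache_control = {"type": "ephemeral"}
    if ttl is not None:
        if ttl not in ALLOWED_TTLS:
            raise ValueError(f"ttl must be one of {sorted(ALLOWED_TTLS)}, got {ttl!r}")
        cache_control["ttl"] = ttl
    return {"type": "text", "text": text, "cache_control": cache_control}


print(cached_text_block("You are an expert Python developer.", ttl="1h"))
```

The returned dictionary can be placed directly in the `content` list of a message, as in the Chat Completions examples above.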

**Cache placement rules**

* Add `cache_control` to the **last** message or content block you want cached
* Everything up to that point is included in the cache
* Maximum: 4 cache breakpoints per request
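
A request that assembles messages from several sources can exceed the breakpoint limit without anyone noticing. The sketch below counts `cache_control` markers in a Chat Completions style `messages` list and raises if there are more than four; the function names are hypothetical, and it assumes breakpoints appear only on content blocks inside messages, as in the examples on this page.

```python
MAX_CACHE_BREAKPOINTS = 4  # limit stated in the placement rules above


def count_cache_breakpoints(messages: list) -> int:
    """Count content blocks that carry a cache_control marker."""
    count = 0
    for message in messages:
        content = message.get("content")
        # String content cannot carry cache_control; only block lists can.
        if isinstance(content, list):
            count += sum(
                1 for block in content
                if isinstance(block, dict) and "cache_control" in block
            )
    return count


def check_cache_breakpoints(messages: list) -> int:
    """Return the breakpoint count, raising if it exceeds the limit."""
    found = count_cache_breakpoints(messages)
    if found > MAX_CACHE_BREAKPOINTS:
        raise ValueError(
            f"{found} cache breakpoints found, maximum is {MAX_CACHE_BREAKPOINTS}"
        )
    return found


messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "You are an expert Python developer.",
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Write a function to parse JSON"},
]
print(check_cache_breakpoints(messages))  # 1
```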

**Minimum token thresholds**

Caching only activates once the marked content meets the model's minimum. Requests below the threshold are processed normally at full cost.

| Model                                                     | Minimum tokens |
| --------------------------------------------------------- | -------------- |
| Claude Opus 4.6, Opus 4.5                                 | 4,096          |
| Claude Sonnet 4.6                                         | 2,048          |
| Claude Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4, Sonnet 3.7 | 1,024          |
| Claude Haiku 4.5                                          | 4,096          |
| Claude Haiku 3.5, Haiku 3                                 | 2,048          |
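
The table above can be turned into a quick pre-flight check. This sketch assumes you already know the token count of the content you intend to cache (for example from a previous response's usage data); it does not count tokens itself. `MIN_CACHE_TOKENS` and `meets_cache_minimum` are hypothetical names, keyed by the display names used in the table.

```python
# Minimum cacheable tokens per model, copied from the table above.
MIN_CACHE_TOKENS = {
    "Claude Opus 4.6": 4096,
    "Claude Opus 4.5": 4096,
    "Claude Sonnet 4.6": 2048,
    "Claude Sonnet 4.5": 1024,
    "Claude Opus 4.1": 1024,
    "Claude Opus 4": 1024,
    "Claude Sonnet 4": 1024,
    "Claude Sonnet 3.7": 1024,
    "Claude Haiku 4.5": 4096,
    "Claude Haiku 3.5": 2048,
    "Claude Haiku 3": 2048,
}


def meets_cache_minimum(model_name: str, token_count: int) -> bool:
    """Return True if the marked content is long enough to be cached."""
    return token_count >= MIN_CACHE_TOKENS[model_name]


print(meets_cache_minimum("Claude Sonnet 4.6", 1500))  # False
print(meets_cache_minimum("Claude Sonnet 4.5", 1500))  # True
```

The same 1,500-token prompt is cached on Claude Sonnet 4.5 but processed at full cost on Claude Sonnet 4.6, which is worth checking when switching models.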

**Use Cases**

<AccordionGroup>
  <Accordion title="Static System Prompts">
    Cache role definitions and instructions that don't change.

    <CodeGroup>
      ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl -X POST https://api.orq.ai/v3/router/chat/completions \
        -H "Authorization: Bearer $ORQ_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "anthropic/claude-sonnet-4-6",
          "messages": [
            {
              "role": "system",
              "content": [
                {
                  "type": "text",
                  "text": "You are an expert software engineer specializing in Python.\nYour responses should be:\n- Clear and concise\n- Include code examples\n- Follow PEP 8 style guidelines\n- Include error handling",
                  "cache_control": { "type": "ephemeral" }
                }
              ]
            },
            {
              "role": "user",
              "content": "How do I read a CSV file?"
            }
          ],
          "max_tokens": 1024
        }'
      ```

      ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      from openai import OpenAI
      import os

      client = OpenAI(
          api_key=os.environ.get('ORQ_API_KEY'),
          base_url='https://api.orq.ai/v3/router'
      )

      system_prompt = """You are an expert software engineer specializing in Python.
      Your responses should be:
      - Clear and concise
      - Include code examples
      - Follow PEP 8 style guidelines
      - Include error handling"""

      response = client.chat.completions.create(
          model='anthropic/claude-sonnet-4-6',
          messages=[
              {
                  'role': 'system',
                  'content': [
                      {
                          'type': 'text',
                          'text': system_prompt,
                          'cache_control': {'type': 'ephemeral'}
                      }
                  ]
              },
              {
                  'role': 'user',
                  'content': 'How do I read a CSV file?'
              }
          ],
          max_tokens=1024
      )
      ```

      ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import OpenAI from "openai";

      const client = new OpenAI({
        apiKey: process.env.ORQ_API_KEY,
        baseURL: "https://api.orq.ai/v3/router",
      });

      const response = await client.chat.completions.create({
        model: 'anthropic/claude-sonnet-4-6',
        messages: [
          {
            role: 'system',
            content: [
              {
                type: 'text',
                text: `You are an expert software engineer specializing in Python.
      Your responses should be:
      - Clear and concise
      - Include code examples
      - Follow PEP 8 style guidelines
      - Include error handling`,
                cache_control: { type: 'ephemeral' },
              },
            ],
          },
          {
            role: 'user',
            content: 'How do I read a CSV file?',
          },
        ],
        max_tokens: 1024,
      });
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="Large Document Context">
    Cache documents, codebases, or knowledge bases for reuse across multiple queries.

    <CodeGroup>
      ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl -X POST https://api.orq.ai/v3/router/chat/completions \
        -H "Authorization: Bearer $ORQ_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "anthropic/claude-sonnet-4-6",
          "messages": [
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "Here is our API documentation:\n\n[Large documentation content here...]",
                  "cache_control": { "type": "ephemeral" }
                },
                {
                  "type": "text",
                  "text": "How do I authenticate with the API?"
                }
              ]
            }
          ],
          "max_tokens": 1024
        }'
      ```

      ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      from openai import OpenAI
      import os

      client = OpenAI(
          api_key=os.environ.get('ORQ_API_KEY'),
          base_url='https://api.orq.ai/v3/router'
      )

      api_docs = "Your large API documentation content goes here..."

      response = client.chat.completions.create(
          model='anthropic/claude-sonnet-4-6',
          messages=[
              {
                  'role': 'user',
                  'content': [
                      {
                          'type': 'text',
                          'text': f'Here is our API documentation:\n\n{api_docs}',
                          'cache_control': {'type': 'ephemeral'}
                      },
                      {
                          'type': 'text',
                          'text': 'How do I authenticate with the API?'
                      }
                  ]
              }
          ],
          max_tokens=1024
      )
      ```

      ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import OpenAI from "openai";

      const client = new OpenAI({
        apiKey: process.env.ORQ_API_KEY,
        baseURL: "https://api.orq.ai/v3/router",
      });

      const apiDocs = "Your API documentation content goes here...";

      const response = await client.chat.completions.create({
        model: 'anthropic/claude-sonnet-4-6',
        messages: [
          {
            role: 'user',
            content: [
              {
                type: 'text',
                text: 'Here is our API documentation:\n\n' + apiDocs,
                cache_control: { type: 'ephemeral' },
              },
              {
                type: 'text',
                text: 'How do I authenticate with the API?',
              },
            ],
          },
        ],
        max_tokens: 1024,
      });
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="Multi-turn Conversations">
    Cache conversation history for long interactions to reduce processing time and costs on subsequent messages.

    <CodeGroup>
      ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl -X POST https://api.orq.ai/v3/router/chat/completions \
        -H "Authorization: Bearer $ORQ_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "anthropic/claude-sonnet-4-6",
          "messages": [
            {
              "role": "user",
              "content": "What is Python?"
            },
            {
              "role": "assistant",
              "content": "Python is a high-level programming language..."
            },
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "What are its main features?",
                  "cache_control": { "type": "ephemeral" }
                }
              ]
            },
            {
              "role": "assistant",
              "content": "Python's main features include..."
            },
            {
              "role": "user",
              "content": "Can you give me a code example?"
            }
          ],
          "max_tokens": 1024
        }'
      ```

      ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      from openai import OpenAI
      import os

      client = OpenAI(
          api_key=os.environ.get('ORQ_API_KEY'),
          base_url='https://api.orq.ai/v3/router'
      )

      conversation_history = [
          {'role': 'user', 'content': 'What is Python?'},
          {'role': 'assistant', 'content': 'Python is a high-level...'},
          {'role': 'user', 'content': 'What are its main features?'},
          {'role': 'assistant', 'content': 'Python\'s main features include...'},
      ]

      # Mark last history message for caching
      last_message = conversation_history[-1]
      messages = conversation_history[:-1] + [
          {
              'role': last_message['role'],
              'content': [
                  {
                      'type': 'text',
                      'text': last_message['content'],
                      'cache_control': {'type': 'ephemeral'}
                  }
              ]
          },
          {
              'role': 'user',
              'content': 'Can you give me a code example?'
          }
      ]

      response = client.chat.completions.create(
          model='anthropic/claude-sonnet-4-6',
          messages=messages,
          max_tokens=1024
      )
      ```

      ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import OpenAI from "openai";

      const client = new OpenAI({
        apiKey: process.env.ORQ_API_KEY,
        baseURL: "https://api.orq.ai/v3/router",
      });

      const conversationHistory = [
        { role: 'user', content: 'What is Python?' },
        { role: 'assistant', content: 'Python is a high-level...' },
        { role: 'user', content: 'What are its main features?' },
        { role: 'assistant', content: "Python's main features include..." },
        // ... more history
      ];

      // Mark last history message for caching
      const lastHistoryMessage = conversationHistory[conversationHistory.length - 1];

      const response = await client.chat.completions.create({
        model: 'anthropic/claude-sonnet-4-6',
        messages: [
          ...conversationHistory.slice(0, -1),
          {
            ...lastHistoryMessage,
            content: [
              {
                type: 'text',
                text: lastHistoryMessage.content,
                cache_control: { type: 'ephemeral' },
              },
            ],
          },
          {
            role: 'user',
            content: 'Can you give me a code example?',
          },
        ],
        max_tokens: 1024,
      });
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="RAG with Document Collections">
    Cache retrieved documents for multiple queries in retrieval-augmented generation scenarios.

    <CodeGroup>
      ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl -X POST https://api.orq.ai/v3/router/chat/completions \
        -H "Authorization: Bearer $ORQ_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "anthropic/claude-sonnet-4-6",
          "messages": [
            {
              "role": "system",
              "content": "You are a helpful assistant that answers based on provided context."
            },
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "Context:\n[Retrieved document content here...]",
                  "cache_control": { "type": "ephemeral" }
                },
                {
                  "type": "text",
                  "text": "Question: What is the main topic of these documents?"
                }
              ]
            }
          ],
          "max_tokens": 1024
        }'
      ```

      ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      from openai import OpenAI
      import os

      client = OpenAI(
          api_key=os.environ.get('ORQ_API_KEY'),
          base_url='https://api.orq.ai/v3/router'
      )

      user_question = "What is the main topic of these documents?"
      context_text = "Retrieved document content goes here..."

      response = client.chat.completions.create(
          model='anthropic/claude-sonnet-4-6',
          messages=[
              {
                  'role': 'system',
                  'content': 'You are a helpful assistant that answers based on provided context.'
              },
              {
                  'role': 'user',
                  'content': [
                      {
                          'type': 'text',
                          'text': f'Context:\n{context_text}',
                          'cache_control': {'type': 'ephemeral'}
                      },
                      {
                          'type': 'text',
                          'text': f'Question: {user_question}'
                      }
                  ]
              }
          ],
          max_tokens=1024
      )
      ```

      ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import OpenAI from "openai";

      const client = new OpenAI({
        apiKey: process.env.ORQ_API_KEY,
        baseURL: "https://api.orq.ai/v3/router",
      });

      const userQuestion = "What is the main topic of these documents?";
      const contextText = "Retrieved document content goes here...";

      const response = await client.chat.completions.create({
        model: 'anthropic/claude-sonnet-4-6',
        messages: [
          {
            role: 'system',
            content:
              'You are a helpful assistant that answers based on provided context.',
          },
          {
            role: 'user',
            content: [
              {
                type: 'text',
                text: `Context:\n${contextText}`,
                cache_control: { type: 'ephemeral' },
              },
              {
                type: 'text',
                text: `Question: ${userQuestion}`,
              },
            ],
          },
        ],
        max_tokens: 1024,
      });
      ```
    </CodeGroup>
  </Accordion>
</AccordionGroup>

#### Extended Thinking

Enable deep reasoning for complex problems by allocating a token budget for internal analysis before generating responses.

<Note>
  Extended thinking uses the `thinking` parameter, which is only supported via the Chat Completions endpoint (`POST /v3/router/chat/completions`). Use the Chat Completions tabs below.
</Note>

<CodeGroup>
  ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "anthropic/claude-opus-4-6",
      "messages": [
        {
          "role": "user",
          "content": "Design a distributed rate limiting system for 1M requests/second"
        }
      ],
      "thinking": {
        "type": "enabled",
        "budget_tokens": 8000
      },
      "max_tokens": 16000
    }'
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get('ORQ_API_KEY'),
      base_url='https://api.orq.ai/v3/router'
  )

  response = client.chat.completions.create(
      model='anthropic/claude-opus-4-6',
      messages=[
          {
              'role': 'user',
              'content': 'Design a distributed rate limiting system for 1M requests/second'
          }
      ],
      extra_body={
          'thinking': {
              'type': 'enabled',
              'budget_tokens': 8000
          }
      },
      max_tokens=16000
  )
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-opus-4-6",
    messages: [
      {
        role: "user",
        content: "Design a distributed rate limiting system for 1M requests/second",
      },
    ],
    thinking: {
      type: "enabled",
      budget_tokens: 8000,
    },
    max_tokens: 16000,
  });
  ```
</CodeGroup>

<Accordion title="Multi-turn Extended Thinking">
  Include reasoning content with its signature when continuing conversations:

  <CodeGroup>
    ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    curl -X POST https://api.orq.ai/v3/router/chat/completions \
      -H "Authorization: Bearer $ORQ_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "anthropic/claude-opus-4-6",
        "messages": [
          {"role": "user", "content": "Design a rate limiting system"},
          {
            "role": "assistant",
            "content": [
              {
                "type": "reasoning",
                "reasoning": "...",
                "signature": "..."
              },
              {
                "type": "text",
                "text": "Here'\''s a distributed rate limiting design..."
              }
            ]
          },
          {"role": "user", "content": "How would you handle 10M req/s?"}
        ],
        "thinking": {"type": "enabled", "budget_tokens": 8000},
        "max_tokens": 16000
      }'
    ```

    ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY,
      baseURL: "https://api.orq.ai/v3/router",
    });

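    // Typed loosely: the assistant turn added below uses content parts that are not in the SDK's message types
    const messages: any[] = [
      { role: "user", content: "Design a rate limiting system" }
    ];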

    const response = await client.chat.completions.create({
      model: "anthropic/claude-opus-4-6",
      messages,
      thinking: { type: "enabled", budget_tokens: 8000 },
      max_tokens: 16000
    });

    // Map response to assistant message
    const msg = response.choices[0].message as any;
    const contentParts = [];

    if (msg.reasoning) {
      contentParts.push({
        type: "reasoning",
        reasoning: msg.reasoning,
        signature: msg.reasoning_signature
      });
    }

    if (msg.redacted_reasoning) {
      contentParts.push({
        type: "redacted_reasoning",
        data: msg.redacted_reasoning
      });
    }

    if (msg.content) {
      contentParts.push({
        type: "text",
        text: msg.content
      });
    }

    const assistantMessage = {
      role: "assistant",
      content: contentParts
    };

    messages.push(assistantMessage);
    messages.push({ role: "user", content: "How would you handle 10M req/s?" });

    const followUp = await client.chat.completions.create({
      model: "anthropic/claude-opus-4-6",
      messages,
      thinking: { type: "enabled", budget_tokens: 8000 },
      max_tokens: 16000
    });
    ```

    ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from openai import OpenAI
    import os

    client = OpenAI(
        api_key=os.environ.get("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v3/router",
    )

    messages = [
        {"role": "user", "content": "Design a rate limiting system"}
    ]

    response = client.chat.completions.create(
        model="anthropic/claude-opus-4-6",
        messages=messages,
        extra_body={
            "thinking": {
                "type": "enabled",
                "budget_tokens": 8000
            }
        },
        max_tokens=16000
    )

    msg = response.choices[0].message
    content_parts = []

    if getattr(msg, "reasoning", None):
        content_parts.append({
            "type": "reasoning",
            "reasoning": msg.reasoning,
            "signature": getattr(msg, "reasoning_signature", None)
        })

    if getattr(msg, "redacted_reasoning", None):
        content_parts.append({
            "type": "redacted_reasoning",
            "data": msg.redacted_reasoning
        })

    if msg.content:
        content_parts.append({
            "type": "text",
            "text": msg.content
        })

    assistant_message = {
        "role": "assistant",
        "content": content_parts
    }

    messages.append(assistant_message)
    messages.append({"role": "user", "content": "How would you handle 10M req/s?"})

    follow_up = client.chat.completions.create(
        model="anthropic/claude-opus-4-6",
        messages=messages,
        extra_body={
            "thinking": {
                "type": "enabled",
                "budget_tokens": 8000
            }
        },
        max_tokens=16000
    )
    ```
  </CodeGroup>

  <Warning>
    Always include the `signature` field when passing reasoning content back to the API. The signature cryptographically verifies that the reasoning was generated by the model, and it is required for multi-turn conversations.
  </Warning>
</Accordion>
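
A pre-flight check can catch a missing signature before the follow-up request is sent. The sketch below assumes assistant messages use the content-part shape shown in the multi-turn examples; `find_unsigned_reasoning` is a hypothetical helper, not part of the OpenAI SDK or Orq.ai.

```python
from typing import Any


def find_unsigned_reasoning(messages: list[dict[str, Any]]) -> list[int]:
    """Return indexes of assistant messages whose reasoning part lacks a signature."""
    unsigned = []
    for index, message in enumerate(messages):
        content = message.get("content")
        # Plain-string content carries no reasoning parts, so skip it.
        if message.get("role") != "assistant" or not isinstance(content, list):
            continue
        for part in content:
            if part.get("type") == "reasoning" and not part.get("signature"):
                unsigned.append(index)
                break
    return unsigned
```

Calling `find_unsigned_reasoning(messages)` before each follow-up and raising an error on a non-empty result turns a silently dropped signature into a local failure rather than a rejected request.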

<Accordion title="Combine with prompt caching for repeated contexts">
  Cache system prompts and context to reduce costs and latency when using extended thinking:

  <CodeGroup>
    ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY,
      baseURL: "https://api.orq.ai/v3/router",
    });

    const response = await client.chat.completions.create({
      model: "anthropic/claude-opus-4-6",
      messages: [
        {
          role: "system",
          content: [{
            type: "text",
            text: "You are a system architect...", // Cache this
            cache_control: { type: "ephemeral" }
          }]
        },
        { role: "user", content: "Design a notification system" }
      ],
      max_tokens: 16000,
      thinking: { type: "enabled", budget_tokens: 8000 }
    });
    ```

    ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from openai import OpenAI
    import os

    client = OpenAI(
        api_key=os.environ.get("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v3/router",
    )

    response = client.chat.completions.create(
        model="anthropic/claude-opus-4-6",
        messages=[
            {
                "role": "system",
                "content": [{
                    "type": "text",
                    "text": "You are a system architect...",
                    "cache_control": {"type": "ephemeral"}
                }]
            },
            {"role": "user", "content": "Design a notification system"}
        ],
        max_tokens=16000,
        extra_body={"thinking": {"type": "enabled", "budget_tokens": 8000}}
    )
    ```
  </CodeGroup>
</Accordion>

**Configuration & Best Practices**

| Aspect                   | Guidance                | Details                                                      |
| ------------------------ | ----------------------- | ------------------------------------------------------------ |
| `thinking.type`          | Set to `"enabled"`      | Enables extended thinking with manual budget                 |
| `thinking.budget_tokens` | Set based on complexity | Min: 1024, must be \< `max_tokens`. Billed as output tokens. |

<Note>
  **Supported Models:** Extended thinking with `budget_tokens` is available on Claude Opus 4.5, Sonnet 4.5, and newer models. For Claude Opus 4.6 and Sonnet 4.6, consider using **adaptive thinking** instead (see below). Available through `anthropic/`, `aws/`, and `google/` providers.
</Note>
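
The two `budget_tokens` constraints in the table (at least 1024 and strictly less than `max_tokens`) can be checked locally before a request is sent. This is a minimal sketch; `manual_thinking` and `MIN_BUDGET_TOKENS` are hypothetical names introduced for illustration.

```python
MIN_BUDGET_TOKENS = 1024  # minimum budget listed in the table above


def manual_thinking(budget_tokens: int, max_tokens: int) -> dict:
    """Build keyword arguments for a request with a manual thinking budget."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "max_tokens": max_tokens,
        # The OpenAI Python SDK forwards extra_body fields in the request body.
        "extra_body": {
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens}
        },
    }
```

The result can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **manual_thinking(8000, 16000))`.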

<Card title="Reasoning models" icon="brain" href="/docs/ai-studio/ai-gateway/reasoning" horizontal>
  Configure `thinking.budget_tokens` and other extended thinking controls for Claude through the **AI Gateway**.
</Card>

#### Adaptive Thinking

Adaptive thinking is the recommended way to use extended thinking with **Claude Opus 4.6** and **Sonnet 4.6**. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically determine when and how much to think based on the complexity of each request.

<Note>
  Adaptive thinking uses the `thinking` parameter, which is only supported via the Chat Completions endpoint (`POST /v3/router/chat/completions`). Use the Chat Completions tabs below.
</Note>

<CodeGroup>
  ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "anthropic/claude-opus-4-6",
      "messages": [
        {
          "role": "user",
          "content": "Design a distributed rate limiting system for 1M requests/second"
        }
      ],
      "thinking": {
        "type": "adaptive"
      },
      "max_tokens": 16000
    }'
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get('ORQ_API_KEY'),
      base_url='https://api.orq.ai/v3/router'
  )

  response = client.chat.completions.create(
      model='anthropic/claude-opus-4-6',
      messages=[
          {
              'role': 'user',
              'content': 'Design a distributed rate limiting system for 1M requests/second'
          }
      ],
      extra_body={
          'thinking': {
              'type': 'adaptive'
          }
      },
      max_tokens=16000
  )

  print(response.choices[0].message.content)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-opus-4-6",
    messages: [
      {
        role: "user",
        content: "Design a distributed rate limiting system for 1M requests/second",
      },
    ],
    thinking: {
      type: "adaptive",
    },
    max_tokens: 16000,
  });

  console.log(response.choices[0].message.content);
  ```
</CodeGroup>

**Adaptive vs Manual Thinking**

| Mode         | Config                                            | When to use                                                                                        |
| ------------ | ------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| **Adaptive** | `thinking: { type: "adaptive" }`                  | Recommended for Claude 4.6 models. Claude determines thinking depth automatically.                 |
| **Manual**   | `thinking: { type: "enabled", budget_tokens: N }` | When you need precise control over thinking token spend. Supported on all thinking-capable models. |
| **Disabled** | Omit `thinking` parameter                         | When you don't need extended thinking and want the lowest latency.                                 |

<Note>
  **Supported Models:** Adaptive thinking is available on **Claude Opus 4.6** and **Claude Sonnet 4.6** only. Older models (Opus 4.5, Sonnet 4.5, etc.) require `type: "enabled"` with `budget_tokens`.
</Note>
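
When one code path serves several models, the mode can be chosen from the model ID. The sketch below assumes that matching on the model name suffix is sufficient, so provider-prefixed IDs are handled the same way; `thinking_config` and `ADAPTIVE_MODELS` are hypothetical names.

```python
from typing import Optional

# Model name suffixes that support adaptive thinking, per the note above.
ADAPTIVE_MODELS = ("claude-opus-4-6", "claude-sonnet-4-6")


def thinking_config(model: str, budget_tokens: Optional[int] = None) -> Optional[dict]:
    """Pick a `thinking` value for a model ID such as "anthropic/claude-opus-4-6"."""
    if budget_tokens is not None:
        # An explicit budget always selects manual mode.
        return {"type": "enabled", "budget_tokens": budget_tokens}
    if model.endswith(ADAPTIVE_MODELS):
        return {"type": "adaptive"}
    # No budget and no adaptive support: leave thinking disabled.
    return None
```

When the function returns `None`, omit the `thinking` field from the request entirely rather than sending a null value.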

#### Vision Capabilities

All Claude 3 and later models accept image input. Pass an image either as a URL or as a base64-encoded data URL:

<Accordion title="Image from URL">
  Reference a remotely hosted image by its URL:

  <CodeGroup>
    ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY,
      baseURL: "https://api.orq.ai/v3/router",
    });

    const response = await client.chat.completions.create({
      model: "anthropic/claude-sonnet-4-6",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "What's in this image?" },
            {
              type: "image_url",
              image_url: { url: "https://example.com/image.jpg" }
            },
          ],
        },
      ],
    });
    ```

    ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from openai import OpenAI
    import os

    client = OpenAI(
        api_key=os.environ.get("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v3/router",
    )

    response = client.chat.completions.create(
        model="anthropic/claude-sonnet-4-6",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/image.jpg"},
                    },
                ],
            }
        ],
    )
    ```
  </CodeGroup>
</Accordion>

<Accordion title="Image from Base64">
  Embed images directly as base64-encoded strings:

  <CodeGroup>
    ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY,
      baseURL: "https://api.orq.ai/v3/router",
    });

    const imageBase64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==";

    const response = await client.chat.completions.create({
      model: "anthropic/claude-sonnet-4-6",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "What's in this image?" },
            {
              type: "image_url",
              image_url: { url: `data:image/png;base64,${imageBase64}` }
            },
          ],
        },
      ],
    });
    ```

    ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from openai import OpenAI
    import os

    client = OpenAI(
        api_key=os.environ.get("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v3/router",
    )

    image_base64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="

    response = client.chat.completions.create(
        model="anthropic/claude-sonnet-4-6",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                    },
                ],
            }
        ],
    )
    ```
  </CodeGroup>
</Accordion>
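
The base64 examples hard-code an encoded string. For a local file, the data URL can be built with the standard library, as in this sketch; `image_to_data_url` is a hypothetical helper, and it assumes the MIME type can be inferred from the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read a local image and return it as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string goes into the `url` field of an `image_url` content part, in place of the inline `data:image/png;base64,...` value used above.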

#### PDF Input

<Note>
  The examples in this section use the Chat Completions endpoint. For the Responses API equivalent, call `client.responses.create()` (`POST /v3/router/responses`) and adapt the message structure to the Responses API input format.
</Note>

Claude Opus 4.6 supports direct PDF analysis:

<CodeGroup>
  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-opus-4-6",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Summarize this document" },
          {
            type: "document",
            document: {
              type: "pdf",
              url: "https://example.com/document.pdf"
            }
          },
        ],
      },
    ],
    max_tokens: 2048,
  });
  ```
</CodeGroup>

<Card title="Multimodal" icon="image" href="/docs/ai-studio/ai-gateway/multimodal" horizontal>
  Full reference for image input, PDF input, image generation, and audio through the **AI Gateway**.
</Card>

#### Tool Use (Function Calling)

Claude supports tool use (function calling): describe each tool with a JSON Schema, and the model returns a structured call for your application to execute.

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "anthropic/claude-sonnet-4-6",
    input: "What's the weather in Tokyo?",
    tools: [
      {
        type: "function",
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    ],
  });
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="anthropic/claude-sonnet-4-6",
      input="What's the weather in Tokyo?",
      tools=[
          {
              "type": "function",
              "name": "get_weather",
              "description": "Get current weather for a location",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "location": {"type": "string"},
                  },
                  "required": ["location"],
              },
          }
      ],
  )
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-sonnet-4-6",
    messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather",
          description: "Get current weather for a location",
          parameters: {
            type: "object",
            properties: {
              location: { type: "string" },
            },
            required: ["location"],
          },
        },
      },
    ],
  });
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="anthropic/claude-sonnet-4-6",
      messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
      tools=[
          {
              "type": "function",
              "function": {
                  "name": "get_weather",
                  "description": "Get current weather for a location",
                  "parameters": {
                      "type": "object",
                      "properties": {
                          "location": {"type": "string"},
                      },
                      "required": ["location"],
                  },
              },
          }
      ],
  )
  ```
</CodeGroup>

<Card title="Tool Calling" icon="wrench" href="/docs/ai-studio/ai-gateway/tool-calling" horizontal>
  Full reference for function tools, `tool_choice`, and streaming with tool calls through the **AI Gateway**.
</Card>
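
The requests above only declare the tool; your application still has to run it and return the result. The sketch below assumes the gateway returns tool calls in the OpenAI Chat Completions shape (`message.tool_calls`, each with an `id`, a `function.name`, and JSON-encoded `function.arguments`). `TOOL_REGISTRY`, `run_tool_calls`, and the placeholder `get_weather` body are hypothetical.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> dict:
    """Placeholder implementation; a real one would query a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps tool names declared in the request to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each tool call and return `tool` messages for the next request."""
    results = []
    for call in tool_calls or []:
        handler = TOOL_REGISTRY[call.function.name]
        # Arguments arrive as a JSON string, not a parsed object.
        arguments = json.loads(call.function.arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(handler(**arguments)),
        })
    return results
```

Append the assistant message and the returned `tool` messages to the conversation, then call `client.chat.completions.create` again so the model can compose its final answer.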

#### Multi-provider strategy

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "anthropic/claude-sonnet-4-6",
    input: "...",
    // @ts-ignore -- unknown top-level fields are forwarded as-is by the TypeScript SDK
    fallbacks: [
      { model: "aws/anthropic/claude-sonnet-4-6" },
      { model: "anthropic/claude-opus-4-6" },
    ],
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="anthropic/claude-sonnet-4-6",
      input="...",
      extra_body={
          "fallbacks": [
              {"model": "aws/anthropic/claude-sonnet-4-6"},
              {"model": "anthropic/claude-opus-4-6"},
          ]
      },
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "anthropic/claude-sonnet-4-6",
    messages: [{ role: "user", content: "..." }],
    // @ts-ignore -- unknown top-level fields are forwarded as-is by the TypeScript SDK
    fallbacks: [
      { model: "aws/anthropic/claude-sonnet-4-6" },
      { model: "anthropic/claude-opus-4-6" },
    ],
  });

  console.log(response.choices[0].message.content);
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="anthropic/claude-sonnet-4-6",
      messages=[{"role": "user", "content": "..."}],
      extra_body={
          "fallbacks": [
              {"model": "aws/anthropic/claude-sonnet-4-6"},
              {"model": "anthropic/claude-opus-4-6"},
          ]
      },
  )

  print(response.choices[0].message.content)
  ```
</CodeGroup>

### Configuration

#### Model Parameters

| Parameter        | Type      | Description                           | Default |
| ---------------- | --------- | ------------------------------------- | ------- |
| `max_tokens`     | number    | Maximum tokens to generate (required) | -       |
| `temperature`    | number    | Randomness (0-1)                      | 1       |
| `top_p`          | number    | Nucleus sampling (0-1)                | -       |
| `top_k`          | number    | Top-K sampling                        | -       |
| `stop_sequences` | string\[] | Custom stop sequences                 | -       |

**Note**: `max_tokens` is required for Anthropic models. Typical values are 1024 for short responses and 4096 or more for long-form content.

<Warning>
  Do not use `temperature` and `top_p` together on newer Anthropic models. Using both parameters simultaneously will result in an API error. Choose one or the other.
</Warning>
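
A small guard can enforce this rule before the request leaves your application. This is a minimal sketch; `sampling_params` is a hypothetical helper name.

```python
from typing import Optional


def sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
) -> dict:
    """Return sampling keyword arguments, rejecting temperature and top_p together."""
    if temperature is not None and top_p is not None:
        raise ValueError("Set either temperature or top_p, not both")
    params = {}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    return params
```

The result can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., max_tokens=1024, **sampling_params(temperature=0.2))`.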

#### Token Management

<CodeGroup>
  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Set appropriate max_tokens based on task
  const getMaxTokens = (taskType: string) => {
    const limits = {
      chat: 1024,
      summary: 500,
      generation: 4096,
      analysis: 2048,
    };
    return limits[taskType as keyof typeof limits] ?? 1024;
  };
  ```
</CodeGroup>

### Troubleshooting

| Issue                | Problem                                           | Solution                                                                                                                  |
| -------------------- | ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| Missing `max_tokens` | Anthropic models require `max_tokens` parameter   | Add `max_tokens: 1024` (or appropriate value) to your request                                                             |
| High costs           | Token usage accumulates quickly on large requests | Enable prompt caching for repeated context, use smaller models (Haiku) for simple tasks, monitor and optimize token usage |
| Rate limits          | Anthropic has tiered rate limits based on usage   | Use Orq's automatic retries and fallbacks, or consider AWS/Google providers for higher limits                             |

#### Limitations

* **max\_tokens required**: Unlike with OpenAI models, you must specify a maximum output length
* **Rate limits**: Vary by tier and provider
* **Context window**: 200K tokens (may vary by provider)
* **System prompts**: Handled differently than in the OpenAI API (automatically converted by Orq)

### Reference

* [Anthropic Documentation](https://docs.anthropic.com/)
* [Model Pricing](https://www.anthropic.com/pricing)
* [API Reference](https://docs.anthropic.com/en/api/messages)
* [Prompt Engineering Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)

## Claude Cowork

The **Orq.ai** **AI Gateway** is compatible with Claude Cowork's third-party inference mode. Route Cowork traffic through **Orq.ai** to get EU data residency, provider fallbacks, and cost control without changing the Cowork interface.

<Card title="Claude Cowork" icon="server" href="/docs/ai-studio/integrations/code-assistants/claude-desktop#third-party-inference-cowork" horizontal>
  Set up **Orq.ai** as a Cowork third-party inference gateway.
</Card>
