
# Reasoning models

> Use reasoning and thinking-capable models like o1, o3, and Claude through the AI Router. Configure reasoning effort and token budgets per request.

## Quick Start

The router supports two reasoning controls on `POST /chat/completions`:

* `reasoning_effort` for OpenAI reasoning models
* `thinking` for Google Gemini and Anthropic extended thinking

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
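  import OpenAI from 'openai';

  // Point the OpenAI SDK at the router
  const openai = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: 'https://api.orq.ai/v3/router',
  });

  // OpenAI reasoning models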
  const response = await openai.chat.completions.create({
    model: 'openai/o3-mini',
    messages: [
      {
        role: 'user',
        content: 'Analyze the logical flaw in this argument.',
      },
    ],
    reasoning_effort: 'medium',
  });

  // Gemini 3 preview models use thinking_level
  const geminiLevel = await openai.chat.completions.create({
    model: 'google/gemini-3-flash-preview',
    messages: [
      {
        role: 'user',
        content: 'Plan a 3-day Tokyo itinerary under $500.',
      },
    ],
    thinking: {
      type: 'enabled',
      thinking_level: 'high',
    },
  });

  // Anthropic and budget-based Gemini models use budget_tokens
  const anthropicThinking = await openai.chat.completions.create({
    model: 'anthropic/claude-sonnet-4-20250514',
    messages: [
      {
        role: 'user',
        content: 'Design a rate limiting strategy for a global API.',
      },
    ],
    thinking: {
      type: 'enabled',
      budget_tokens: 4096,
    },
    // Anthropic expects max_tokens to be larger than budget_tokens
    max_tokens: 8192,
  });
  ```
</CodeGroup>

<Note>
  The Go gateway mirrors the same contract internally through
  `models.ModelParameters.ReasoningEffort` and
  `models.ModelParameters.Thinking`.
</Note>

## Request Fields

| Field                     | Type   | Values                                              | Notes                                              |
| ------------------------- | ------ | --------------------------------------------------- | -------------------------------------------------- |
| `reasoning_effort`        | string | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` | OpenAI-style reasoning control                     |
| `thinking.type`           | string | `enabled`, `disabled`                               | Used by Google Gemini and Anthropic thinking paths |
| `thinking.budget_tokens`  | number | integer                                             | Budget-based thinking                              |
| `thinking.thinking_level` | string | `low`, `high`                                       | Level-based thinking for Gemini 3 preview models   |

<Warning>
  Treat `thinking.budget_tokens` and `thinking.thinking_level` as mutually
  exclusive. On the Google path, if `thinking_level` is present it takes
  precedence over `budget_tokens`.
</Warning>
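
One way to respect the warning above is to build the `thinking` object through a small helper that refuses ambiguous input. The following is a minimal sketch, not part of any SDK: `build_thinking` is a hypothetical name, and the allowed `thinking_level` values are taken from the table above.

```python
from typing import Optional


def build_thinking(
    enabled: bool = True,
    budget_tokens: Optional[int] = None,
    thinking_level: Optional[str] = None,
) -> dict:
    """Build a `thinking` object, rejecting ambiguous combinations client-side.

    Hypothetical helper: the router does not require it.
    """
    if not enabled:
        return {"type": "disabled"}
    if budget_tokens is not None and thinking_level is not None:
        # The docs treat these two fields as mutually exclusive.
        raise ValueError("Set budget_tokens or thinking_level, not both")
    thinking: dict = {"type": "enabled"}
    if thinking_level is not None:
        if thinking_level not in ("low", "high"):
            raise ValueError(f"Unsupported thinking_level: {thinking_level!r}")
        thinking["thinking_level"] = thinking_level
    elif budget_tokens is not None:
        thinking["budget_tokens"] = budget_tokens
    return thinking
```

Failing early on the client avoids depending on the Google-path precedence rule, which may not apply to other providers.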

## Provider Behavior

### OpenAI reasoning models

Use `reasoning_effort` on `POST /chat/completions`.

Current registry examples:

* `openai/o1`
* `openai/o1-pro`
* `openai/o3-mini`
* `openai/o3`
* `openai/o3-pro`

The router schema accepts all six enum values, but the current `o1` and `o3` entries in the model registry advertise only `low`, `medium`, and `high`. Which values a request can actually use depends on the target model.
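
The gap between what the schema accepts and what a model advertises can be checked before sending a request. The sketch below assumes a hand-maintained `ADVERTISED_EFFORTS` table (hypothetical, and seeded only with the two families mentioned above); it is not read from the live registry.

```python
# The six values the router schema accepts for `reasoning_effort`.
ROUTER_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")

# Hypothetical local copy of what each model advertises; keep it in sync
# with the model catalog yourself.
ADVERTISED_EFFORTS = {
    "openai/o1": {"low", "medium", "high"},
    "openai/o3": {"low", "medium", "high"},
}


def is_effort_advertised(model: str, effort: str) -> bool:
    """Return True if `effort` is advertised for `model`.

    Models missing from the local table are treated as unknown and pass.
    """
    if effort not in ROUTER_EFFORTS:
        raise ValueError(f"Not a router-level reasoning_effort: {effort!r}")
    advertised = ADVERTISED_EFFORTS.get(model)
    return advertised is None or effort in advertised
```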

<Note>
  When `reasoning_effort` is set, the router automatically drops `temperature`
  and `top_p` before forwarding the request. These parameters are incompatible
  with OpenAI reasoning models and will cause an error if sent directly.
</Note>
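
The router applies this rule for you. If you also send the same payload directly to OpenAI, or want local logs to match what is forwarded, the behavior can be approximated as follows. This is an illustrative sketch of the rule described in the note, not the router's implementation, and `without_sampling_params` is a hypothetical name.

```python
def without_sampling_params(payload: dict) -> dict:
    """Return a copy of `payload` with `temperature` and `top_p` removed
    when `reasoning_effort` is set; otherwise return an unchanged copy."""
    if payload.get("reasoning_effort") is None:
        return dict(payload)
    return {
        key: value
        for key, value in payload.items()
        if key not in ("temperature", "top_p")
    }
```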

<Card title="OpenAI" icon="openai" href="/docs/integrations/providers/openai" horizontal>
  Set up your OpenAI API key and explore all supported models including the o1 and o3 families.
</Card>

### Google Gemini

Use the `thinking` object.

Level-based examples:

* `google/gemini-3-flash-preview`
* `google/gemini-3-pro-preview`

Budget-based examples:

* `google/gemini-2.5-flash`
* `google/gemini-2.5-flash-lite`
* `google/gemini-2.5-pro`

Router behavior:

* `thinking: { "type": "disabled" }` is valid
* On `thinking_enforced` models such as `google/gemini-2.5-pro`, disabling thinking is coerced to a minimum budget of `128`
* On non-enforced Gemini models, disabling thinking becomes a budget of `0`
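
The rules above can be summarized as a small function. This is a sketch of the documented behavior only: `resolve_gemini_budget` is a hypothetical name, and the `thinking_enforced` argument stands in for the registry flag, which you would need to look up yourself.

```python
from typing import Optional

MIN_ENFORCED_BUDGET = 128  # minimum budget on thinking_enforced models


def resolve_gemini_budget(thinking: dict, thinking_enforced: bool) -> Optional[int]:
    """Return the thinking budget implied by the documented rules.

    Returns None when the request is level-based or sets no budget,
    since the rules above do not define a budget for those cases.
    """
    if thinking.get("type") == "disabled":
        # Disabling is coerced on enforced models, and becomes 0 otherwise.
        return MIN_ENFORCED_BUDGET if thinking_enforced else 0
    if thinking.get("thinking_level") is not None:
        # On the Google path, thinking_level takes precedence over budget_tokens.
        return None
    return thinking.get("budget_tokens")
```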

<Card title="Google AI" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/google_ai_studio.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=eac05c3f32c81d329e7645eed547f5c0" href="/docs/integrations/providers/google-ai" horizontal width="48" height="48" data-path="images/logos/google_ai_studio.svg">
  Set up your Google AI API key and explore Gemini 2.5 and Gemini 3 thinking models.
</Card>

### Anthropic Claude

On `POST /chat/completions`, Anthropic uses `thinking: { type, budget_tokens }`.

Current registry examples:

* `anthropic/claude-sonnet-4-20250514`
* `anthropic/claude-sonnet-4-5-20250929`
* `anthropic/claude-opus-4-5-20251101`

Router behavior:

* The router forwards `thinking` on Anthropic chat completions only when `type` is `enabled`
* `budget_tokens` must be greater than `0` to be forwarded
* `thinking_level` is not used for Anthropic chat completions
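
As a rough illustration of these rules, the following sketch returns what would be forwarded for a given `thinking` object. It assumes the whole object is dropped when either condition fails, which the list above does not state explicitly, and `anthropic_thinking_to_forward` is a hypothetical name.

```python
from typing import Optional


def anthropic_thinking_to_forward(thinking: Optional[dict]) -> Optional[dict]:
    """Return the `thinking` object to forward, or None if nothing is forwarded."""
    if not thinking or thinking.get("type") != "enabled":
        return None
    budget = thinking.get("budget_tokens")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(budget, bool) or not isinstance(budget, int) or budget <= 0:
        return None
    # thinking_level is not used on this path, so it is left out.
    return {"type": "enabled", "budget_tokens": budget}
```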

<Card title="Anthropic" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/anthropic.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=b097662b141ad6f89f0c8d039ae241dc" href="/docs/integrations/providers/anthropic" horizontal width="61" height="43" data-path="images/logos/anthropic.svg">
  Set up your Anthropic API key and explore Claude extended thinking capabilities.
</Card>

## Responses API

If you call `POST /responses`, use the OpenAI-style `reasoning` object instead of `reasoning_effort`.

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "model": "openai/o3-mini",
    "input": "Solve this step by step.",
    "reasoning": {
      "effort": "medium"
    }
  }
  ```
</CodeGroup>
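
If the same application calls both endpoints, a small builder keeps the field shapes straight. This is a minimal sketch that only assembles the JSON body shown above; `build_responses_payload` is a hypothetical helper, and sending the request is left to your HTTP client.

```python
from typing import Optional


def build_responses_payload(
    model: str, input_text: str, effort: Optional[str] = None
) -> dict:
    """Assemble a `POST /responses` body with the nested `reasoning` object."""
    payload: dict = {"model": model, "input": input_text}
    if effort is not None:
        # /responses nests the value; /chat/completions uses reasoning_effort.
        payload["reasoning"] = {"effort": effort}
    return payload
```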

## Usage and Output

Reasoning token usage is returned under `usage.completion_tokens_details.reasoning_tokens`.

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "usage": {
      "prompt_tokens": 120,
      "completion_tokens": 980,
      "total_tokens": 1100,
      "completion_tokens_details": {
        "reasoning_tokens": 640
      }
    }
  }
  ```
</CodeGroup>

<Warning>
  Do not rely on visible chain-of-thought text being present in every response.
  The stable contract is the request fields above plus token usage.
  Provider-specific fields such as `reasoning`, `reasoning_signature`, or
  `redacted_reasoning` may appear, but they are optional.
</Warning>
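
Because only the usage fields are stable, it helps to read them defensively. The sketch below works on a response already parsed into a `dict`. It assumes reasoning tokens are counted inside `completion_tokens`, as the sample above suggests (120 + 980 = 1100); `reasoning_usage` is a hypothetical name.

```python
def reasoning_usage(response: dict) -> dict:
    """Extract reasoning token counts, tolerating missing fields."""
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    completion = usage.get("completion_tokens") or 0
    return {
        "reasoning_tokens": reasoning,
        # Assumption: completion_tokens includes reasoning_tokens.
        "visible_tokens": max(completion - reasoning, 0),
    }
```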

## Code Examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/o3-mini",
      "messages": [
        {
          "role": "user",
          "content": "Solve this step by step: What is 15% of 250?"
        }
      ],
      "reasoning_effort": "medium"
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="google/gemini-3-flash-preview",
      messages=[
          {
              "role": "user",
              "content": "Compare remote and hybrid work for a 50-person company.",
          }
      ],
      extra_body={
          "thinking": {
              "type": "enabled",
              "thinking_level": "high",
          }
      },
  )
  ```

  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: 'https://api.orq.ai/v3/router',
  });

  const response = await client.chat.completions.create({
    model: 'anthropic/claude-opus-4-5-20251101',
    messages: [
      {
        role: 'user',
        content:
          'Break down the tradeoffs of event-driven vs request-response systems.',
      },
    ],
    thinking: {
      type: 'enabled',
      budget_tokens: 8192,
    },
    // Anthropic expects max_tokens to be larger than budget_tokens
    max_tokens: 16384,
  });
  ```
</CodeGroup>

## Choosing a Setting

Use `reasoning_effort` when the model is in the OpenAI `o1` or `o3` family. Use `thinking_level` for Gemini 3 preview models. Use `budget_tokens` for Anthropic and budget-based Gemini models.
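
The guidance above can be encoded as a lookup on the model ID. This is a sketch under loud assumptions: the prefix checks are heuristics based only on the model IDs listed on this page, the default values are arbitrary, and `reasoning_params` is a hypothetical helper rather than a router feature.

```python
def reasoning_params(
    model: str,
    effort: str = "medium",
    level: str = "high",
    budget: int = 4096,
) -> dict:
    """Pick the reasoning control for `model` using ID-prefix heuristics."""
    if model.startswith(("openai/o1", "openai/o3")):
        return {"reasoning_effort": effort}
    if model.startswith("google/gemini-3"):
        return {"thinking": {"type": "enabled", "thinking_level": level}}
    if model.startswith(("anthropic/", "google/gemini-2.5")):
        return {"thinking": {"type": "enabled", "budget_tokens": budget}}
    # Unknown model: send no reasoning control rather than guess.
    return {}
```

The returned dictionary can be merged into the request body, for example through `extra_body` in the OpenAI Python SDK.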

For the current model catalog, see [Supported Models](/docs/proxy/supported-models).
