# Reasoning models

> Use reasoning and thinking-capable models like o1, o3, and Claude through the AI Gateway. Configure reasoning effort and token budgets per request.

Reasoning and thinking models perform internal deliberation before generating a response. Each provider exposes this differently: OpenAI uses `reasoning_effort`, while Google Gemini and Anthropic use a `thinking` object that carries either a token budget or a thinking level. The **AI Gateway** accepts all of these controls and normalizes values to what each model actually supports before forwarding the request.

| Provider                | Control                   | Values                                              |
| ----------------------- | ------------------------- | --------------------------------------------------- |
| OpenAI o-series         | `reasoning_effort`        | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` |
| Google Gemini 3 preview | `thinking.thinking_level` | `low`, `high`                                       |
| Google Gemini 2.5       | `thinking.budget_tokens`  | integer                                             |
| Anthropic Claude        | `thinking.budget_tokens`  | integer                                             |

**Use Cases**

* Problems requiring multi-step logical deduction (math proofs, code debugging, planning).
* Complex analysis where a standard model produces shallow or incorrect results.
* Research tasks where depth of reasoning matters more than response speed.
* Benchmarking reasoning quality across providers on identical prompts.

***

## Quick Start

The router supports three reasoning controls:

* `reasoning` object on `POST /responses` for OpenAI reasoning models.
* `reasoning_effort` on `POST /chat/completions` for OpenAI reasoning models.
* `thinking` on `POST /chat/completions` for Google Gemini and Anthropic extended thinking.

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/o3-mini",
      "input": "Analyze the logical flaw in this argument.",
      "reasoning": {"effort": "medium"}
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: 'https://api.orq.ai/v3/router',
  });

  const response = await client.responses.create({
    model: 'openai/o3-mini',
    input: 'Analyze the logical flaw in this argument.',
    reasoning: { effort: 'medium' },
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="openai/o3-mini",
      input="Analyze the logical flaw in this argument.",
      extra_body={"reasoning": {"effort": "medium"}},
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: 'https://api.orq.ai/v3/router',
  });

  const response = await client.chat.completions.create({
    model: 'openai/o3-mini',
    messages: [{ role: 'user', content: 'Analyze the logical flaw in this argument.' }],
    reasoning_effort: 'medium',
  });
  console.log(response.choices[0].message.content ?? "");

  // `thinking` is a gateway extension; the OpenAI SDK types may not declare it.
  const geminiBudget = await client.chat.completions.create({
    model: 'google/gemini-2.5-pro',
    messages: [{ role: 'user', content: 'Plan a 3-day Tokyo itinerary under $500.' }],
    thinking: { type: 'enabled', budget_tokens: 4096 },
  });
  console.log(geminiBudget.choices[0].message.content ?? "");

  const anthropicThinking = await client.chat.completions.create({
    model: 'anthropic/claude-sonnet-4-6',
    messages: [{ role: 'user', content: 'Design a rate limiting strategy for a global API.' }],
    thinking: { type: 'enabled', budget_tokens: 4096 },
    max_tokens: 8000,
  });
  console.log(anthropicThinking.choices[0].message.content ?? "");
  ```
</CodeGroup>

<Note>
  The Go gateway mirrors the same contract internally through
  `models.ModelParameters.ReasoningEffort` and
  `models.ModelParameters.Thinking`.
</Note>

## Request Fields

| Field                     | Type   | Values                                              | Notes                                              |
| ------------------------- | ------ | --------------------------------------------------- | -------------------------------------------------- |
| `reasoning_effort`        | string | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` | OpenAI-style reasoning control                     |
| `thinking.type`           | string | `enabled`, `disabled`                               | Used by Google Gemini and Anthropic thinking paths |
| `thinking.budget_tokens`  | number | integer                                             | Budget-based thinking                              |
| `thinking.thinking_level` | string | `low`, `high`                                       | Level-based thinking for Gemini 3 preview models   |

<Warning>
  Treat `thinking.budget_tokens` and `thinking.thinking_level` as mutually
  exclusive. On the Google path, if `thinking_level` is present it takes
  precedence over `budget_tokens`.
</Warning>
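
As a client-side guard for the rule above, a request can be checked before it is sent. This is a minimal sketch: `validate_thinking` is a hypothetical helper, not part of any SDK or of the gateway.

```python
def validate_thinking(thinking: dict) -> dict:
    """Reject a `thinking` object that sets both a budget and a level.

    Hypothetical client-side check; the gateway itself resolves the
    conflict on the Google path by preferring `thinking_level`.
    """
    has_budget = "budget_tokens" in thinking
    has_level = "thinking_level" in thinking
    if has_budget and has_level:
        raise ValueError("Set either budget_tokens or thinking_level, not both")
    return thinking
```

Failing early on the client keeps the request unambiguous instead of relying on precedence rules that differ by provider path.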

## Provider Behavior

### OpenAI reasoning models

Use `reasoning_effort` on `POST /chat/completions`.

Current registry examples:

* `openai/o1`
* `openai/o1-pro`
* `openai/o3-mini`
* `openai/o3`
* `openai/o3-pro`

The router schema accepts all six enum values, but which values take effect depends on the model.

<Note>
  The router normalizes `reasoning_effort` to the nearest value a model supports before forwarding the request. For example, `openai/gpt-5.4` does not support `xhigh`: the router maps it to `high`. Models that do support `xhigh` receive the value as-is.
</Note>
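
One way to picture this normalization is as choosing the closest supported value on an ordered scale. The sketch below is an illustration only: `EFFORT_ORDER`, `normalize_effort`, the tie-breaking rule, and the supported set passed in are assumptions, not the gateway's actual implementation.

```python
# Ordered from least to most effort, matching the documented enum values.
EFFORT_ORDER = ["none", "minimal", "low", "medium", "high", "xhigh"]


def normalize_effort(requested: str, supported: set) -> str:
    """Return the supported effort closest to `requested` in EFFORT_ORDER.

    Assumption of this sketch: ties resolve to the lower effort.
    """
    if requested not in EFFORT_ORDER:
        raise ValueError(f"Unknown reasoning_effort: {requested!r}")
    candidates = [e for e in EFFORT_ORDER if e in supported]
    if not candidates:
        raise ValueError("Model supports no known reasoning_effort values")
    target = EFFORT_ORDER.index(requested)
    # Sort key: distance from the requested value, then the lower index on ties.
    return min(
        candidates,
        key=lambda e: (abs(EFFORT_ORDER.index(e) - target), EFFORT_ORDER.index(e)),
    )
```

With a hypothetical model that supports only `low`, `medium`, and `high`, `normalize_effort("xhigh", {"low", "medium", "high"})` yields `high`, which matches the mapping described in the note.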

<Note>
  When `reasoning_effort` is set, the router automatically drops `temperature`
  and `top_p` before forwarding the request. These parameters are incompatible
  with OpenAI reasoning models and will cause an error if sent directly.
</Note>
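
The effect on a request body can be sketched as a simple filter. `strip_sampling_params` is a hypothetical name used for illustration. The gateway performs this step itself, so client code does not need it.

```python
def strip_sampling_params(body: dict) -> dict:
    """Return a copy of `body` without temperature/top_p when reasoning_effort is set.

    Illustrative only: mirrors the documented gateway behavior on a plain dict.
    """
    if body.get("reasoning_effort") is None:
        return dict(body)
    return {k: v for k, v in body.items() if k not in ("temperature", "top_p")}
```

The input dict is left untouched, which makes the before and after easy to compare when debugging a request.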

<Card title="OpenAI" icon="openai" href="/docs/ai-studio/integrations/providers/openai" horizontal>
  Set up your OpenAI API key and explore all supported models including the o1 and o3 families.
</Card>

### Google Gemini

Use the `thinking` object.

Level-based (`thinking_level`) examples:

* `google/gemini-3-flash-preview`
* `google/gemini-3-pro-preview`

Budget-based (`budget_tokens`) examples:

* `google/gemini-2.5-flash`
* `google/gemini-2.5-flash-lite`
* `google/gemini-2.5-pro`

Router behavior:

* `thinking: { "type": "disabled" }` is valid.
* On `thinking_enforced` models such as `google/gemini-2.5-pro`, disabling thinking is coerced to a minimum budget of `128`.
* On non-enforced Gemini models, disabling thinking becomes a budget of `0`.
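
The Gemini rules above, together with the precedence of `thinking_level` over `budget_tokens`, can be sketched as a small resolver. `resolve_gemini_thinking` and its return shape are hypothetical. The `thinking_enforced` flag is passed in by the caller because the sketch makes no assumption about how the gateway looks it up.

```python
def resolve_gemini_thinking(thinking: dict, thinking_enforced: bool) -> dict:
    """Sketch of how a `thinking` object resolves on the Google path.

    The return shape is an assumption for illustration only.
    """
    if thinking.get("type") == "disabled":
        # Enforced models cannot fully disable thinking: minimum budget of 128.
        return {"budget_tokens": 128 if thinking_enforced else 0}
    if "thinking_level" in thinking:
        # thinking_level takes precedence over budget_tokens.
        return {"thinking_level": thinking["thinking_level"]}
    if "budget_tokens" in thinking:
        return {"budget_tokens": thinking["budget_tokens"]}
    return {}
```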

<Card title="Google AI" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/google_ai_studio.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=eac05c3f32c81d329e7645eed547f5c0" href="/docs/ai-studio/integrations/providers/google-ai" horizontal width="48" height="48" data-path="images/logos/google_ai_studio.svg">
  Set up your Google AI API key and explore Gemini 2.5 and Gemini 3 thinking models.
</Card>

### Anthropic Claude

On `POST /chat/completions`, Anthropic uses `thinking: { type, budget_tokens }`.

Current registry examples:

* `anthropic/claude-sonnet-4-20250514`
* `anthropic/claude-sonnet-4-5-20250929`
* `anthropic/claude-opus-4-5-20251101`

Router behavior:

* Anthropic chat completions only forward thinking when `type` is `enabled`.
* `budget_tokens` must be greater than `0` to be forwarded.
* `thinking_level` is not used for Anthropic chat completions.
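
The forwarding rules above can be sketched as a predicate on the `thinking` object. `anthropic_thinking_to_forward` is a hypothetical helper. The sketch reads a non-positive `budget_tokens` as dropping thinking entirely, which is an assumption about the exact behavior.

```python
from typing import Optional


def anthropic_thinking_to_forward(thinking: Optional[dict]) -> Optional[dict]:
    """Return the thinking object to forward to Anthropic, or None if dropped.

    Illustrative only; the gateway's actual output shape may differ.
    """
    if not thinking or thinking.get("type") != "enabled":
        return None
    budget = thinking.get("budget_tokens")
    # Require a real positive integer (bool is excluded on purpose).
    if isinstance(budget, bool) or not isinstance(budget, int) or budget <= 0:
        return None
    # thinking_level is ignored on this path, so it is not copied over.
    return {"type": "enabled", "budget_tokens": budget}
```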

<Card title="Anthropic" icon="https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/anthropic.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=b097662b141ad6f89f0c8d039ae241dc" href="/docs/ai-studio/integrations/providers/anthropic" horizontal width="61" height="43" data-path="images/logos/anthropic.svg">
  Set up your Anthropic API key and explore Claude extended thinking capabilities.
</Card>

## Responses API

`POST /responses` supports reasoning for **OpenAI models only**. Use the OpenAI-style `reasoning` object with `effort` instead of `reasoning_effort`.

`thinking` (Anthropic and Google Gemini) is not supported on the `/responses` endpoint. Use `POST /chat/completions` for Anthropic and Google reasoning models.
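
The endpoint rule above can be sketched as a lookup keyed on the provider prefix of the model ID. `reasoning_endpoints` is a hypothetical helper, and matching on the `openai/` prefix is an assumption based on the model IDs shown on this page.

```python
def reasoning_endpoints(model: str) -> list:
    """Return the endpoints on which reasoning controls apply for `model`."""
    provider = model.split("/", 1)[0]
    if provider == "openai":
        # OpenAI models: `reasoning` on /responses, `reasoning_effort` on /chat/completions.
        return ["/responses", "/chat/completions"]
    # Anthropic and Google: `thinking` is only supported on /chat/completions.
    return ["/chat/completions"]
```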

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "model": "openai/o3-mini",
    "input": "Solve this step by step.",
    "reasoning": {
      "effort": "medium"
    }
  }
  ```
</CodeGroup>

## Usage and Output

Reasoning token usage is returned under `usage.completion_tokens_details.reasoning_tokens`.

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "usage": {
      "prompt_tokens": 120,
      "completion_tokens": 980,
      "total_tokens": 1100,
      "completion_tokens_details": {
        "reasoning_tokens": 640
      }
    }
  }
  ```
</CodeGroup>
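
To see how much of a completion went to reasoning, the usage block can be reduced to a ratio. This is a minimal sketch that assumes `completion_tokens` includes the reasoning tokens, as in OpenAI's usage accounting. `reasoning_share` is a hypothetical helper.

```python
def reasoning_share(usage: dict) -> float:
    """Fraction of completion tokens spent on reasoning (0.0 if unavailable)."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    completion = usage.get("completion_tokens") or 0
    return reasoning / completion if completion else 0.0


usage = {
    "prompt_tokens": 120,
    "completion_tokens": 980,
    "total_tokens": 1100,
    "completion_tokens_details": {"reasoning_tokens": 640},
}
print(f"{reasoning_share(usage):.1%}")  # 640 / 980
```

Missing or null detail fields fall back to `0.0`, so the helper does not raise on responses that omit `completion_tokens_details`.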

<Warning>
  Do not rely on visible chain-of-thought text being present in every response.
  The stable contract is the request fields above plus token usage.
  Provider-specific fields such as `reasoning`, `reasoning_signature`, or
  `redacted_reasoning` may appear, but they are optional.
</Warning>
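
In line with that warning, client code can treat these fields as optional. The sketch below assumes they appear as keys on the response message once it is converted to a dict. That location and the helper name `extract_reasoning` are assumptions for illustration.

```python
OPTIONAL_REASONING_FIELDS = ("reasoning", "reasoning_signature", "redacted_reasoning")


def extract_reasoning(message: dict) -> dict:
    """Collect whichever optional reasoning fields are present and non-null."""
    return {
        key: message[key]
        for key in OPTIONAL_REASONING_FIELDS
        if message.get(key) is not None
    }
```

An empty result is a normal outcome, so callers should branch on it rather than treat it as an error.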

## Code Examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/o3-mini",
      "input": "Solve this step by step: What is 15% of 250?",
      "reasoning": {"effort": "medium"}
    }'
  ```

  ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "anthropic/claude-opus-4-5-20251101",
      "messages": [
        {
          "role": "user",
          "content": "Break down the tradeoffs of event-driven vs request-response systems."
        }
      ],
      "thinking": { "type": "enabled", "budget_tokens": 8192 },
      "max_tokens": 16000
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: 'https://api.orq.ai/v3/router',
  });

  const response = await client.responses.create({
    model: 'openai/o3-mini',
    input: 'Solve this step by step: What is 15% of 250?',
    reasoning: { effort: 'medium' },
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="openai/o3-mini",
      input="Solve this step by step: What is 15% of 250?",
      extra_body={"reasoning": {"effort": "medium"}},
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions: Anthropic thinking) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from 'openai';

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: 'https://api.orq.ai/v3/router',
  });

  const response = await client.chat.completions.create({
    model: 'anthropic/claude-opus-4-5-20251101',
    messages: [
      {
        role: 'user',
        content: 'Break down the tradeoffs of event-driven vs request-response systems.',
      },
    ],
    thinking: { type: 'enabled', budget_tokens: 8192 },
    max_tokens: 16000,
  });

  console.log(response.choices[0].message.content ?? "");
  ```

  ```python Python (Chat Completions: Anthropic thinking) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="anthropic/claude-opus-4-5-20251101",
      messages=[
          {
              "role": "user",
              "content": "Break down the tradeoffs of event-driven vs request-response systems.",
          }
      ],
      max_tokens=16000,
      extra_body={
          "thinking": {
              "type": "enabled",
              "budget_tokens": 8192,
          }
      },
  )

  print(response.choices[0].message.content or "")
  ```
</CodeGroup>

## Choosing a Setting

Use `reasoning_effort` when the model is in the OpenAI `o1` or `o3` family. Use `thinking_level` for Gemini 3 preview models. Use `budget_tokens` for Anthropic and budget-based Gemini models.
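
The guidance above can be sketched as a helper that returns the request fragment for a given model ID on `POST /chat/completions`. `reasoning_control` is hypothetical. Matching on ID prefixes and sending `type: "enabled"` alongside `thinking_level` are assumptions, so check them against the model catalog.

```python
def reasoning_control(
    model: str,
    effort: str = "medium",
    level: str = "high",
    budget_tokens: int = 4096,
) -> dict:
    """Pick the reasoning field for `model` based on its provider prefix."""
    provider, _, name = model.partition("/")
    if provider == "openai":
        return {"reasoning_effort": effort}
    if provider == "google" and name.startswith("gemini-3"):
        # Level-based thinking for Gemini 3 preview models.
        return {"thinking": {"type": "enabled", "thinking_level": level}}
    if provider in ("google", "anthropic"):
        # Budget-based thinking for Anthropic and Gemini 2.5 models.
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    return {}
```

The returned dict can be merged into a chat completions body, for example `{"model": model, "messages": messages, **reasoning_control(model)}`.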

If you need the current model catalog, use [Supported Models](/docs/ai-studio/ai-gateway/supported-models).
