Reasoning models - Orq.ai Documentation

Reasoning and thinking models perform internal deliberation before generating a response. Each provider exposes this differently: OpenAI uses reasoning_effort, while Google Gemini and Anthropic use a thinking object. The AI Gateway accepts all three controls and normalizes values to what each model actually supports before forwarding the request.

Provider	Control	Values
OpenAI o-series	`reasoning_effort`	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`
Google Gemini 3 preview	`thinking.thinking_level`	`low`, `high`
Google Gemini 2.5	`thinking.budget_tokens`	integer
Anthropic Claude	`thinking.budget_tokens`	integer

Use Cases

Problems requiring multi-step logical deduction (math proofs, code debugging, planning).
Complex analysis where a standard model produces shallow or incorrect results.
Research tasks where depth of reasoning matters more than response speed.
Benchmarking reasoning quality across providers on identical prompts.

Quick Start

The AI Gateway supports three reasoning controls:

reasoning object on POST /responses for OpenAI reasoning models.
reasoning_effort on POST /chat/completions for OpenAI reasoning models.
thinking on POST /chat/completions for Google Gemini and Anthropic extended thinking.

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o3-mini",
    "input": "Analyze the logical flaw in this argument.",
    "reasoning": {"effort": "medium"}
  }'

Request Fields

Field	Type	Values	Notes
`reasoning_effort`	string	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`	OpenAI-style reasoning control
`thinking.type`	string	`enabled`, `disabled`	Used by Google Gemini and Anthropic thinking paths
`thinking.budget_tokens`	number	integer	Budget-based thinking
`thinking.thinking_level`	string	`low`, `high`	Level-based thinking for Gemini 3 preview models

Treat thinking.budget_tokens and thinking.thinking_level as mutually exclusive. On the Google path, if thinking_level is present it takes precedence over budget_tokens.

Provider Behavior

OpenAI reasoning models

Use reasoning_effort on POST /chat/completions. Current registry examples:

openai/o1.
openai/o1-pro.
openai/o3-mini.
openai/o3.
openai/o3-pro. The AI Gateway schema accepts all six enum values, but model support is ultimately model-specific.

The router normalizes reasoning_effort to the nearest value a model supports before forwarding the request. For example, openai/gpt-5.4 does not support xhigh: it maps to high. Models that do support xhigh receive the value as-is.

When reasoning_effort is set, the AI Gateway automatically drops temperature and top_p before forwarding the request. These parameters are incompatible with OpenAI reasoning models and will cause an error if sent directly.

OpenAI

Set up your OpenAI API key and explore all supported models including the o1 and o3 families.

Google Gemini

Use the thinking object. Level-based (thinking_level) examples:

google/gemini-3-flash-preview.
google/gemini-3-pro-preview. Budget-based (budget_tokens) examples:
google/gemini-2.5-flash.
google/gemini-2.5-flash-lite.
google/gemini-2.5-pro. Router behavior:
thinking: { "type": "disabled" } is valid
On thinking_enforced models such as google/gemini-2.5-pro, disabling thinking is coerced to a minimum budget of 128
On non-enforced Gemini models, disabling thinking becomes a budget of 0

Google AI

Set up your Google AI API key and explore Gemini 2.5 and Gemini 3 thinking models.

Anthropic Claude

On POST /chat/completions, Anthropic uses thinking: { type, budget_tokens }. Current registry examples:

anthropic/claude-sonnet-4-20250514.
anthropic/claude-sonnet-4-5-20250929.
anthropic/claude-opus-4-5-20251101. Router behavior:
Anthropic chat completions only forward thinking when type is enabled.
budget_tokens must be greater than 0 to be forwarded.
thinking_level is not used for Anthropic chat completions.

Anthropic

Set up your Anthropic API key and explore Claude extended thinking capabilities.

Responses API

POST /responses supports reasoning for OpenAI models only. Use the OpenAI-style reasoning object with effort instead of reasoning_effort. thinking (Anthropic and Google Gemini) is not supported on the /responses endpoint. Use POST /chat/completions for Anthropic and Google reasoning models.

{
  "model": "openai/o3-mini",
  "input": "Solve this step by step.",
  "reasoning": {
    "effort": "medium"
  }
}

Usage and Output

Reasoning token usage is returned under usage.completion_tokens_details.reasoning_tokens.

{
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 980,
    "total_tokens": 1100,
    "completion_tokens_details": {
      "reasoning_tokens": 640
    }
  }
}

Do not rely on visible chain-of-thought text being present in every response. The stable contract is the request fields above plus token usage. Provider-specific fields such as reasoning, reasoning_signature, or redacted_reasoning may appear, but they are optional.

Code Examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o3-mini",
    "input": "Solve this step by step: What is 15% of 250?",
    "reasoning": {"effort": "medium"}
  }'

Choosing a Setting

Use reasoning_effort when the model is in the OpenAI o1 or o3 family. Use thinking_level for Gemini 3 preview models. Use budget_tokens for Anthropic and budget-based Gemini models. If you need the current model catalog, use Supported Models.

​Quick Start

​Request Fields

​Provider Behavior

​OpenAI reasoning models

OpenAI

​Google Gemini

Google AI

​Anthropic Claude

Anthropic

​Responses API

​Usage and Output

​Code Examples

​Choosing a Setting

Quick Start

Request Fields

Provider Behavior

OpenAI reasoning models

Google Gemini

Anthropic Claude

Responses API

Usage and Output

Code Examples

Choosing a Setting