Skip to main content
Reasoning and thinking models perform internal deliberation before generating a response. Each provider exposes this differently: OpenAI uses reasoning_effort, while Google Gemini and Anthropic use a thinking object. The AI Gateway accepts all three controls and normalizes values to what each model actually supports before forwarding the request.
ProviderControlValues
OpenAI o-seriesreasoning_effortnone, minimal, low, medium, high, xhigh
Google Gemini 3 previewthinking.thinking_levellow, high
Google Gemini 2.5thinking.budget_tokensinteger
Anthropic Claudethinking.budget_tokensinteger
Use Cases
  • Problems requiring multi-step logical deduction (math proofs, code debugging, planning).
  • Complex analysis where a standard model produces shallow or incorrect results.
  • Research tasks where depth of reasoning matters more than response speed.
  • Benchmarking reasoning quality across providers on identical prompts.

Quick Start

The AI Gateway supports three reasoning controls:
  • reasoning object on POST /responses for OpenAI reasoning models.
  • reasoning_effort on POST /chat/completions for OpenAI reasoning models.
  • thinking on POST /chat/completions for Google Gemini and Anthropic extended thinking.
curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o3-mini",
    "input": "Analyze the logical flaw in this argument.",
    "reasoning": {"effort": "medium"}
  }'

Request Fields

FieldTypeValuesNotes
reasoning_effortstringnone, minimal, low, medium, high, xhighOpenAI-style reasoning control
thinking.typestringenabled, disabledUsed by Google Gemini and Anthropic thinking paths
thinking.budget_tokensnumberintegerBudget-based thinking
thinking.thinking_levelstringlow, highLevel-based thinking for Gemini 3 preview models
Treat thinking.budget_tokens and thinking.thinking_level as mutually exclusive. On the Google path, if thinking_level is present it takes precedence over budget_tokens.

Provider Behavior

OpenAI reasoning models

Use reasoning_effort on POST /chat/completions. Current registry examples:
  • openai/o1.
  • openai/o1-pro.
  • openai/o3-mini.
  • openai/o3.
  • openai/o3-pro. The AI Gateway schema accepts all six enum values, but model support is ultimately model-specific.
The router normalizes reasoning_effort to the nearest value a model supports before forwarding the request. For example, openai/gpt-5.4 does not support xhigh: it maps to high. Models that do support xhigh receive the value as-is.
When reasoning_effort is set, the AI Gateway automatically drops temperature and top_p before forwarding the request. These parameters are incompatible with OpenAI reasoning models and will cause an error if sent directly.

OpenAI

Set up your OpenAI API key and explore all supported models including the o1 and o3 families.

Google Gemini

Use the thinking object. Level-based (thinking_level) examples:
  • google/gemini-3-flash-preview.
  • google/gemini-3-pro-preview. Budget-based (budget_tokens) examples:
  • google/gemini-2.5-flash.
  • google/gemini-2.5-flash-lite.
  • google/gemini-2.5-pro. Router behavior:
  • thinking: { "type": "disabled" } is valid
  • On thinking_enforced models such as google/gemini-2.5-pro, disabling thinking is coerced to a minimum budget of 128
  • On non-enforced Gemini models, disabling thinking becomes a budget of 0
https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/google_ai_studio.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=eac05c3f32c81d329e7645eed547f5c0

Google AI

Set up your Google AI API key and explore Gemini 2.5 and Gemini 3 thinking models.

Anthropic Claude

On POST /chat/completions, Anthropic uses thinking: { type, budget_tokens }. Current registry examples:
  • anthropic/claude-sonnet-4-20250514.
  • anthropic/claude-sonnet-4-5-20250929.
  • anthropic/claude-opus-4-5-20251101. Router behavior:
  • Anthropic chat completions only forward thinking when type is enabled.
  • budget_tokens must be greater than 0 to be forwarded.
  • thinking_level is not used for Anthropic chat completions.
https://mintcdn.com/orqai/d-t0Z04KwFlGVsS1/images/logos/anthropic.svg?fit=max&auto=format&n=d-t0Z04KwFlGVsS1&q=85&s=b097662b141ad6f89f0c8d039ae241dc

Anthropic

Set up your Anthropic API key and explore Claude extended thinking capabilities.

Responses API

POST /responses supports reasoning for OpenAI models only. Use the OpenAI-style reasoning object with effort instead of reasoning_effort. thinking (Anthropic and Google Gemini) is not supported on the /responses endpoint. Use POST /chat/completions for Anthropic and Google reasoning models.
{
  "model": "openai/o3-mini",
  "input": "Solve this step by step.",
  "reasoning": {
    "effort": "medium"
  }
}

Usage and Output

Reasoning token usage is returned under usage.completion_tokens_details.reasoning_tokens.
{
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 980,
    "total_tokens": 1100,
    "completion_tokens_details": {
      "reasoning_tokens": 640
    }
  }
}
Do not rely on visible chain-of-thought text being present in every response. The stable contract is the request fields above plus token usage. Provider-specific fields such as reasoning, reasoning_signature, or redacted_reasoning may appear, but they are optional.

Code Examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o3-mini",
    "input": "Solve this step by step: What is 15% of 250?",
    "reasoning": {"effort": "medium"}
  }'

Choosing a Setting

Use reasoning_effort when the model is in the OpenAI o1 or o3 family. Use thinking_level for Gemini 3 preview models. Use budget_tokens for Anthropic and budget-based Gemini models. If you need the current model catalog, use Supported Models.