Quick Start

The router supports two reasoning controls on POST /chat/completions:
  • reasoning_effort for OpenAI reasoning models
  • thinking for Google Gemini and Anthropic extended thinking
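import OpenAI from 'openai';

// Client pointed at the router; the base URL matches the curl example under Code Examples.
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: 'https://api.orq.ai/v2/router',
});

// OpenAI reasoning models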
const response = await openai.chat.completions.create({
  model: 'openai/o3-mini',
  messages: [
    {
      role: 'user',
      content: 'Analyze the logical flaw in this argument.',
    },
  ],
  reasoning_effort: 'medium',
});

// Gemini 3 preview models use thinking_level
const geminiLevel = await openai.chat.completions.create({
  model: 'google/gemini-3-flash-preview',
  messages: [
    {
      role: 'user',
      content: 'Plan a 3-day Tokyo itinerary under $500.',
    },
  ],
  thinking: {
    type: 'enabled',
    thinking_level: 'high',
  },
});

// Anthropic and budget-based Gemini models use budget_tokens
const anthropicThinking = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4-20250514',
  messages: [
    {
      role: 'user',
      content: 'Design a rate limiting strategy for a global API.',
    },
  ],
  thinking: {
    type: 'enabled',
    budget_tokens: 4096,
  },
  // Anthropic's own API expects max_tokens to be larger than thinking.budget_tokens.
  max_tokens: 8192,
});
The Go gateway mirrors the same contract internally through models.ModelParameters.ReasoningEffort and models.ModelParameters.Thinking.

Request Fields

Field                   | Type   | Values                                  | Notes
------------------------|--------|-----------------------------------------|------------------------------------------------
reasoning_effort        | string | none, minimal, low, medium, high, xhigh | OpenAI-style reasoning control
thinking.type           | string | enabled, disabled                       | Used by Google Gemini and Anthropic thinking paths
thinking.budget_tokens  | number | integer                                 | Budget-based thinking
thinking.thinking_level | string | low, high                               | Level-based thinking for Gemini 3 preview models
Treat thinking.budget_tokens and thinking.thinking_level as mutually exclusive. On the Google path, if thinking_level is present it takes precedence over budget_tokens.
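The precedence rule above can be applied on the client before a request is sent, so that only one of the two fields ever reaches the router. The following is a minimal sketch, not the router's implementation: `ThinkingConfig` and `normalizeThinking` are hypothetical names, and only the field names come from the table above.

```typescript
// Hypothetical client-side types; field names follow the Request Fields table.
type ThinkingLevel = 'low' | 'high';

interface ThinkingConfig {
  type: 'enabled' | 'disabled';
  budget_tokens?: number;
  thinking_level?: ThinkingLevel;
}

// Returns a copy with at most one of thinking_level / budget_tokens set.
// thinking_level wins when both are present, mirroring the Google path.
function normalizeThinking(input: ThinkingConfig): ThinkingConfig {
  if (input.thinking_level !== undefined) {
    return { type: input.type, thinking_level: input.thinking_level };
  }
  if (input.budget_tokens !== undefined) {
    return { type: input.type, budget_tokens: input.budget_tokens };
  }
  return { type: input.type };
}

const normalized = normalizeThinking({
  type: 'enabled',
  budget_tokens: 4096,
  thinking_level: 'high',
});
console.log(normalized); // budget_tokens has been dropped
```

Normalizing up front keeps the request unambiguous on every provider path instead of relying on Google-specific precedence.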

Provider Behavior

OpenAI reasoning models

Use reasoning_effort on POST /chat/completions. Current registry examples:
  • openai/o1
  • openai/o1-pro
  • openai/o3-mini
  • openai/o3
  • openai/o3-pro
The router schema accepts all six enum values, but the current o1 and o3 entries in the model registry only advertise low, medium, and high. Which values a given model actually honors depends on that model, so check the registry entry before sending none, minimal, or xhigh.
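One way to catch unsupported values early is a local check against what the registry advertises. This is a rough sketch under assumptions: the `ADVERTISED` map is a hand-written illustration (not fetched from the registry), and `checkEffort` is a hypothetical helper name.

```typescript
// The six values accepted by the router schema.
const SCHEMA_EFFORTS = ['none', 'minimal', 'low', 'medium', 'high', 'xhigh'] as const;
type ReasoningEffort = (typeof SCHEMA_EFFORTS)[number];

// Hand-written illustration of advertised values; the real source is the model registry.
const ADVERTISED: Record<string, readonly ReasoningEffort[]> = {
  'openai/o1': ['low', 'medium', 'high'],
  'openai/o3': ['low', 'medium', 'high'],
};

type EffortCheck = 'advertised' | 'not-advertised' | 'unknown-model';

// Reports whether a schema-valid effort is also advertised for the given model.
function checkEffort(model: string, effort: ReasoningEffort): EffortCheck {
  const advertised = ADVERTISED[model];
  if (!advertised) return 'unknown-model';
  return advertised.includes(effort) ? 'advertised' : 'not-advertised';
}

console.log(checkEffort('openai/o3', 'medium')); // advertised
console.log(checkEffort('openai/o3', 'xhigh'));  // not-advertised
```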

Google Gemini

Use the thinking object. Level-based examples:
  • google/gemini-3-flash-preview
  • google/gemini-3-pro-preview
Budget-based examples:
  • google/gemini-2.5-flash
  • google/gemini-2.5-flash-lite
  • google/gemini-2.5-pro
Router behavior:
  • thinking: { "type": "disabled" } is valid
  • On thinking_enforced models such as google/gemini-2.5-pro, disabling thinking is coerced to a minimum budget of 128
  • On non-enforced Gemini models, disabling thinking becomes a budget of 0
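The two disable rules above reduce to a single branch. The sketch below illustrates the described behavior only; `disabledGeminiBudget` is a hypothetical helper, and the `thinkingEnforced` flag stands in for however the registry marks thinking_enforced models.

```typescript
// Minimum budget the text above gives for thinking_enforced models.
const MIN_ENFORCED_BUDGET = 128;

// Hypothetical helper: the budget that results from thinking: { type: 'disabled' }.
function disabledGeminiBudget(thinkingEnforced: boolean): number {
  return thinkingEnforced ? MIN_ENFORCED_BUDGET : 0;
}

const enforcedBudget = disabledGeminiBudget(true);     // e.g. google/gemini-2.5-pro
const nonEnforcedBudget = disabledGeminiBudget(false); // any non-enforced Gemini model
console.log(enforcedBudget, nonEnforcedBudget); // 128 0
```

In practice this means a disabled request to an enforced model can still report a small number of reasoning tokens.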

Anthropic Claude

On POST /chat/completions, Anthropic uses thinking: { type, budget_tokens }. Current registry examples:
  • anthropic/claude-sonnet-4-20250514
  • anthropic/claude-sonnet-4-5-20250929
  • anthropic/claude-opus-4-5-20251101
Router behavior:
  • thinking is forwarded to Anthropic only when type is enabled
  • budget_tokens must be greater than 0 to be forwarded
  • thinking_level is not used for Anthropic chat completions
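The three rules above can be read as a filter on the incoming thinking object. The following sketch is an interpretation, not the gateway's code: `toAnthropicThinking` is a hypothetical name, and it assumes the whole thinking object is omitted when budget_tokens is missing or not positive.

```typescript
interface ThinkingRequest {
  type: 'enabled' | 'disabled';
  budget_tokens?: number;
  thinking_level?: 'low' | 'high';
}

interface AnthropicThinking {
  type: 'enabled';
  budget_tokens: number;
}

// Returns the object to forward to Anthropic, or undefined when nothing is forwarded.
function toAnthropicThinking(t?: ThinkingRequest): AnthropicThinking | undefined {
  // Only enabled thinking is forwarded.
  if (!t || t.type !== 'enabled') return undefined;
  // Assumption: a missing or non-positive budget means thinking is omitted entirely.
  if (typeof t.budget_tokens !== 'number' || t.budget_tokens <= 0) return undefined;
  // thinking_level is intentionally not copied; Anthropic chat completions do not use it.
  return { type: 'enabled', budget_tokens: t.budget_tokens };
}

const forwarded = toAnthropicThinking({ type: 'enabled', budget_tokens: 4096, thinking_level: 'high' });
console.log(forwarded);
```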

Responses API

If you call POST /responses, use the OpenAI-style reasoning object instead of reasoning_effort.
{
  "model": "openai/o3-mini",
  "input": "Solve this step by step.",
  "reasoning": {
    "effort": "medium"
  }
}
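If the same client code targets both endpoints, a small builder can keep the field shapes apart. This is a sketch with hypothetical names (`ResponsesBody`, `buildResponsesBody`); it covers only the three fields shown in the JSON above.

```typescript
type ReasoningEffort = 'none' | 'minimal' | 'low' | 'medium' | 'high' | 'xhigh';

// Hypothetical shape covering only the fields from the example request.
interface ResponsesBody {
  model: string;
  input: string;
  reasoning?: { effort: ReasoningEffort };
}

// Builds a POST /responses body, nesting effort under `reasoning`
// rather than using the flat reasoning_effort field of /chat/completions.
function buildResponsesBody(model: string, input: string, effort?: ReasoningEffort): ResponsesBody {
  const body: ResponsesBody = { model, input };
  if (effort !== undefined) {
    body.reasoning = { effort };
  }
  return body;
}

const responsesBody = buildResponsesBody('openai/o3-mini', 'Solve this step by step.', 'medium');
console.log(JSON.stringify(responsesBody, null, 2));
```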

Usage and Output

Reasoning token usage is returned under usage.completion_tokens_details.reasoning_tokens.
{
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 980,
    "total_tokens": 1100,
    "completion_tokens_details": {
      "reasoning_tokens": 640
    }
  }
}
Do not rely on visible chain-of-thought text being present in every response. The stable contract is the request fields above plus token usage. Provider-specific fields such as reasoning, reasoning_signature, or redacted_reasoning may appear, but they are optional.
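Because the details object is not guaranteed on every response, it is safer to read the counter defensively. A minimal sketch, assuming a `Usage` type that mirrors the JSON above; `reasoningTokens` is a hypothetical helper name.

```typescript
// Mirrors the usage JSON above; the details object is treated as optional.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: {
    reasoning_tokens?: number;
  };
}

// Returns the reported reasoning tokens, or 0 when the provider did not report any.
function reasoningTokens(usage?: Usage): number {
  return usage?.completion_tokens_details?.reasoning_tokens ?? 0;
}

const sampleUsage: Usage = {
  prompt_tokens: 120,
  completion_tokens: 980,
  total_tokens: 1100,
  completion_tokens_details: { reasoning_tokens: 640 },
};

console.log(reasoningTokens(sampleUsage)); // 640
console.log(reasoningTokens(undefined));   // 0
```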

Code Examples

curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o3-mini",
    "messages": [
      {
        "role": "user",
        "content": "Solve this step by step: What is 15% of 250?"
      }
    ],
    "reasoning_effort": "medium"
  }'

Choosing a Setting

Use reasoning_effort when the model is in the OpenAI o1 or o3 family. Use thinking.thinking_level for Gemini 3 preview models. Use thinking.budget_tokens for Anthropic and budget-based Gemini models. For the current model catalog, see Supported Models.
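The selection rule can be approximated from the model ID alone. The sketch below is a heuristic, not a substitute for the registry: `chooseControl` is a hypothetical helper, and the prefix patterns are assumptions drawn from the example model IDs on this page.

```typescript
type Control = 'reasoning_effort' | 'thinking_level' | 'budget_tokens' | 'unknown';

// Guesses which reasoning control applies, based on ID patterns seen in the examples.
function chooseControl(model: string): Control {
  // OpenAI o1 / o3 family, e.g. openai/o1, openai/o3-mini, openai/o3-pro.
  if (/^openai\/o[13](-|$)/.test(model)) return 'reasoning_effort';
  // Gemini 3 preview models, e.g. google/gemini-3-flash-preview.
  if (/^google\/gemini-3-.*-preview$/.test(model)) return 'thinking_level';
  // Anthropic models and budget-based Gemini 2.5 models.
  if (model.startsWith('anthropic/') || model.startsWith('google/gemini-2.5-')) {
    return 'budget_tokens';
  }
  // Anything else: consult the model catalog instead of guessing.
  return 'unknown';
}

console.log(chooseControl('openai/o3-mini'));                     // reasoning_effort
console.log(chooseControl('google/gemini-3-flash-preview'));      // thinking_level
console.log(chooseControl('anthropic/claude-sonnet-4-20250514')); // budget_tokens
```

Returning 'unknown' rather than a default keeps new or renamed models from silently getting the wrong field.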