Quick Start

Enable step-by-step reasoning for complex problems and analysis.
// OpenAI reasoning models (o1/o3 series)
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content:
        "Analyze the logical flaws in this argument: 'All birds can fly. Penguins are birds. Therefore, penguins can fly.'",
    },
  ],
  reasoning_effort: "medium", // none, minimal, low, medium, high, xhigh
});

// Gemini 3 models with thinking_level
const response2 = await openai.chat.completions.create({
  model: "google/gemini-3-flash",
  messages: [
    {
      role: "user",
      content: "Plan a 3-day itinerary for Tokyo with a $500 budget",
    },
  ],
  thinking: {
    type: "enabled",
    thinking_level: "high", // low or high (Gemini 3 only)
  },
});

// Other models with budget_tokens
const response3 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Analyze the pros and cons of renewable energy",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});

Configuration by Provider

OpenAI Models (o1 and o3 series)

| Parameter | Type | Values | Description |
|---|---|---|---|
| `reasoning_effort` | string | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` | Depth of the reasoning process |
The reasoning_effort parameter primarily affects OpenAI reasoning models. Not all providers support every value; check your specific model's documentation for supported effort levels.
Models supporting reasoning_effort:
  • openai/o1-preview
  • openai/o1-mini
  • openai/o3-mini
  • openai/o3

Gemini 3 Models

| Parameter | Type | Values | Description |
|---|---|---|---|
| `thinking.type` | `"enabled"` | - | Enable reasoning capability |
| `thinking.thinking_level` | string | `low`, `high` | Depth of thinking for Gemini 3 models |
The thinking_level parameter is only supported by Gemini 3 models. Gemini 3 uses qualitative thinking levels instead of explicit token budgets.
Models supporting thinking_level:
  • google/gemini-3-flash
  • google/gemini-3-pro

Gemini 2.5 Models

| Parameter | Type | Description |
|---|---|---|
| `thinking.type` | `"enabled"` | Enable reasoning capability |
| `thinking.budget_tokens` | number | Max tokens for reasoning (varies by model, see below) |
Budget token ranges by model:
| Model | Min | Max | Default |
|---|---|---|---|
| `gemini-2.5-pro` | 128 | 32,768 | 8,192 |
| `gemini-2.5-flash` | 0 | 24,576 | Dynamic |
For Gemini 2.5 Flash, setting budget_tokens to 0 disables thinking, and -1 enables dynamic allocation.
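The Flash special values can be wrapped in a small helper. This is a sketch; `flashThinkingConfig` is a hypothetical function, not part of any SDK, and it assumes the `0` (off) and `-1` (dynamic) semantics described above.

```javascript
// Build a Gemini 2.5 Flash thinking config from a desired mode:
// "off" disables thinking, "dynamic" lets the model allocate tokens,
// and a number sets an explicit budget.
const flashThinkingConfig = (mode) => {
  if (mode === "off") {
    return { thinking: { type: "enabled", budget_tokens: 0 } };
  }
  if (mode === "dynamic") {
    return { thinking: { type: "enabled", budget_tokens: -1 } };
  }
  return { thinking: { type: "enabled", budget_tokens: mode } };
};
```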

Anthropic Claude Models

| Parameter | Type | Description |
|---|---|---|
| `thinking.type` | `"enabled"` | Enable extended thinking capability |
| `thinking.budget_tokens` | number | Min: 1024; must be less than `max_tokens` |
Models supporting extended thinking:
  • anthropic/claude-opus-4-5-20251101 (recommended)
  • anthropic/claude-sonnet-4-5-20250929
  • anthropic/claude-sonnet-4-20250514
  • Other Claude 4+ models
For Anthropic models, budget_tokens must be at least 1024 and less than the max_tokens parameter. When using interleaved thinking with tools, the budget can exceed max_tokens up to the full context window (200k tokens).
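The non-interleaved constraint (at least 1024, strictly less than `max_tokens`) can be enforced before sending a request. A minimal sketch; `claudeThinkingBudget` is a hypothetical helper:

```javascript
// Clamp a requested Anthropic thinking budget to the documented bounds:
// no lower than 1024 tokens, and strictly below max_tokens.
const claudeThinkingBudget = (requested, maxTokens) => {
  const MIN_BUDGET = 1024;
  if (maxTokens <= MIN_BUDGET) {
    throw new Error("max_tokens must exceed the 1024-token minimum budget");
  }
  return Math.min(Math.max(requested, MIN_BUDGET), maxTokens - 1);
};
```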

Reasoning Effort Levels

| Level | Use Case | Processing Time | Accuracy |
|---|---|---|---|
| `none` | Disable reasoning entirely | Fastest | Standard |
| `minimal` | Very light reasoning, quick answers | ~5s | Basic |
| `low` | Simple calculations, basic logic | ~10s | Good |
| `medium` | Multi-step problems, analysis | ~30s | Better |
| `high` | Complex reasoning, research tasks | ~60s+ | Best |
| `xhigh` | Maximum-depth reasoning, exhaustive analysis | ~120s+ | Highest |
Not all models support all effort levels. For example, some models may only support low, medium, and high. Check your model’s documentation for supported values.
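One way to handle partial support is to fall back to the nearest level a model accepts, preferring the cheaper side. This is a sketch under the effort ordering above; `nearestSupportedEffort` and the `supported` list are hypothetical, so pull the real supported values from your model's documentation.

```javascript
// Effort levels ordered from cheapest to most expensive.
const EFFORT_ORDER = ["none", "minimal", "low", "medium", "high", "xhigh"];

// Return the requested level if supported; otherwise walk toward
// cheaper levels first, then toward more expensive ones.
const nearestSupportedEffort = (requested, supported) => {
  if (supported.includes(requested)) return requested;
  const idx = EFFORT_ORDER.indexOf(requested);
  for (let i = idx - 1; i >= 0; i--) {
    if (supported.includes(EFFORT_ORDER[i])) return EFFORT_ORDER[i];
  }
  for (let i = idx + 1; i < EFFORT_ORDER.length; i++) {
    if (supported.includes(EFFORT_ORDER[i])) return EFFORT_ORDER[i];
  }
  return supported[0];
};
```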

Use Cases

| Problem Type | Recommended Settings | Example |
|---|---|---|
| Math problems | `medium` effort | "Calculate compound interest over 10 years" |
| Logic puzzles | `high` effort | "Solve this Sudoku puzzle" |
| Code debugging | `medium` effort | "Find the bug in this Python function" |
| Strategic planning | `high` effort | "Create a business plan for a SaaS startup" |
| Data analysis | `medium`-`high` effort | "Analyze trends in this sales data" |

Code Examples

curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1-mini",
    "messages": [
      {
        "role": "user",
        "content": "Solve this step-by-step: What is 15% of 250?"
      }
    ],
    "reasoning_effort": "medium"
  }'
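The same request can be sent from JavaScript with fetch. A sketch assuming the router endpoint and ORQ_API_KEY from the curl example; `buildRequest` and `askWithReasoning` are hypothetical helpers:

```javascript
// Build the same payload as the curl example above.
const buildRequest = (problem, effort = "medium") => ({
  model: "openai/o1-mini",
  messages: [{ role: "user", content: problem }],
  reasoning_effort: effort,
});

// Send it with fetch (Node 18+ or a browser).
const askWithReasoning = async (problem) => {
  const res = await fetch("https://api.orq.ai/v2/router/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.ORQ_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildRequest(problem)),
  });
  return res.json();
};
```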

Response Structure

OpenAI reasoning models return:
{
  "choices": [
    {
      "message": {
        "content": "Final answer after reasoning",
        "reasoning": "Step-by-step thought process (when available)"
      }
    }
  ],
  "usage": {
    "reasoning_tokens": 1500, // Tokens used for reasoning
    "completion_tokens": 200 // Tokens for final answer
  }
}
Other models with thinking:
{
  "choices": [
    {
      "message": {
        "content": "Final answer with reasoning embedded or separate"
      }
    }
  ],
  "usage": {
    "completion_tokens": 1200 // Includes thinking tokens
  }
}
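Since only some providers report reasoning tokens separately, a small normalizer can smooth over the two shapes above. A sketch; `summarizeUsage` is a hypothetical helper assuming the fields shown:

```javascript
// Summarize token usage across both response shapes.
// reasoning_tokens is only present when the provider reports
// thinking separately; otherwise it is folded into completion_tokens.
const summarizeUsage = (response) => {
  const usage = response.usage ?? {};
  return {
    reasoningTokens: usage.reasoning_tokens ?? 0,
    completionTokens: usage.completion_tokens ?? 0,
    reportsReasoningSeparately: "reasoning_tokens" in usage,
  };
};
```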

Best Practices

Effort/budget selection:
const getReasoningConfig = (problemComplexity) => {
  const effortMap = {
    trivial: "none",
    very_simple: "minimal",
    simple: "low",
    moderate: "medium",
    complex: "high",
    very_complex: "xhigh",
  };
  return { reasoning_effort: effortMap[problemComplexity] || "medium" };
};

// For Gemini 3 models with thinking_level
const getGemini3ThinkingConfig = (problemComplexity) => {
  return {
    thinking: {
      type: "enabled",
      thinking_level: problemComplexity === "complex" ? "high" : "low",
    },
  };
};

// For other models with budget_tokens
const getThinkingBudget = (problemComplexity) => {
  const budgets = {
    simple: 2000,
    moderate: 5000,
    complex: 10000,
  };
  return {
    thinking: {
      type: "enabled",
      budget_tokens: budgets[problemComplexity],
    },
  };
};
Prompt engineering for reasoning:
// Effective prompts for reasoning
const reasoningPrompts = {
  math: "Solve step-by-step, showing all work:",
  logic: "Think through this logically, considering all possibilities:",
  analysis: "Analyze systematically, breaking down into components:",
  planning: "Create a detailed plan, considering constraints and requirements:",
};

Performance Considerations

Response times:
  • none effort: Near instant (no reasoning)
  • minimal effort: 2-8 seconds
  • low effort: 5-15 seconds
  • medium effort: 15-45 seconds
  • high effort: 45-120 seconds
  • xhigh effort: 90-180+ seconds
Token usage:
// Monitor reasoning token consumption
const trackReasoningUsage = (response) => {
  const reasoningTokens = response.usage.reasoning_tokens ?? 0;
  const totalTokens = response.usage.total_tokens ?? 0;
  // Guard against division by zero when total_tokens is missing
  const reasoningRatio = totalTokens > 0 ? reasoningTokens / totalTokens : 0;

  console.log(`Reasoning tokens: ${reasoningTokens}`);
  console.log(`Reasoning ratio: ${(reasoningRatio * 100).toFixed(1)}%`);
};

Troubleshooting

Slow responses:
  • Use lower reasoning effort for time-sensitive applications
  • Consider async processing for complex reasoning tasks
  • Implement timeouts appropriate for reasoning models
High token usage:
  • Monitor reasoning token consumption
  • Adjust budget_tokens for non-OpenAI models
  • Use lower effort levels when appropriate
Poor reasoning quality:
  • Increase reasoning effort/budget for complex problems
  • Improve prompt specificity and clarity
  • Try different reasoning-capable models
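A timeout appropriate for reasoning models can be sketched with AbortController. `withTimeout` is a hypothetical wrapper, not part of any SDK; size the deadline to your effort level using the response-time ranges above.

```javascript
// Run an async operation with a deadline. The callback receives an
// AbortSignal it can pass to fetch or an SDK that supports signals.
const withTimeout = async (fn, ms) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fn(controller.signal);
  } finally {
    clearTimeout(timer);
  }
};

// Usage: allow up to 120s for a high-effort request, e.g.
// await withTimeout(
//   (signal) => openai.chat.completions.create({ ... }, { signal }),
//   120_000
// );
```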

Advanced Patterns

Conditional reasoning

const solveWithReasoning = async (problem, complexity) => {
  const isComplex = complexity === "high" || problem.length > 500;

  if (isComplex) {
    return await openai.chat.completions.create({
      model: "openai/o1-preview",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "high",
    });
  } else {
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content: problem }],
    });
  }
};

Progressive reasoning

// Start with low effort, escalate if needed
const progressiveReasoning = async (problem) => {
  const efforts = ["low", "medium", "high"];

  for (const effort of efforts) {
    const response = await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    });

    // Check if solution is satisfactory
    if (await validateSolution(response.choices[0].message.content)) {
      return response;
    }
  }

  throw new Error("Could not solve with available reasoning levels");
};

Reasoning with fallbacks

const reasoningWithFallback = async (problem) => {
  try {
    // Try reasoning model first
    return await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "medium",
    });
  } catch (error) {
    // Fallback to regular model with detailed prompt
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: `Think step-by-step and solve: ${problem}`,
        },
      ],
    });
  }
};

Limitations

  • Response time: Reasoning adds significant latency to generation
  • Cost: Reasoning tokens are charged at higher rates
  • Model availability: Limited to specific reasoning-capable models
  • Token limits: Reasoning may hit context limits faster
  • Determinism: Reasoning output may vary between requests

Monitoring

Key metrics to track:
const reasoningMetrics = {
  avgReasoningTokens: 0,
  avgResponseTime: 0,
  successRate: 0,
  costPerReasoning: 0,
  effortDistribution: { none: 0, minimal: 0, low: 0, medium: 0, high: 0, xhigh: 0 },
};