Quick Start
Enable step-by-step reasoning for complex problems and analysis.
```javascript
// OpenAI reasoning models (o1/o3 series)
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content:
        "Analyze the logical flaws in this argument: 'All birds can fly. Penguins are birds. Therefore, penguins can fly.'",
    },
  ],
  reasoning_effort: "medium", // none, minimal, low, medium, high, xhigh
});

// Gemini 3 models with thinking_level
const response2 = await openai.chat.completions.create({
  model: "google/gemini-3-flash",
  messages: [
    {
      role: "user",
      content: "Plan a 3-day itinerary for Tokyo with a $500 budget",
    },
  ],
  thinking: {
    type: "enabled",
    thinking_level: "high", // low or high (Gemini 3 only)
  },
});

// Other models with budget_tokens
const response3 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Analyze the pros and cons of renewable energy",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});
```
Configuration by Provider
OpenAI Models (o1 and o3 series)
| Parameter | Type | Values | Description |
|---|---|---|---|
| reasoning_effort | string | none, minimal, low, medium, high, xhigh | Depth of reasoning process |
The reasoning_effort parameter primarily affects OpenAI reasoning models. Not all providers support all values; check your specific model's documentation for supported effort levels.
Models supporting reasoning_effort:
- openai/o1-preview
- openai/o1-mini
- openai/o3-mini
- openai/o3
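Since supported effort values differ by model, it can help to validate the requested level before sending a request. The following sketch assumes illustrative per-model support sets (the `SUPPORTED_EFFORTS` map is hypothetical, not part of any SDK); check each model's documentation for the values it actually accepts.

```javascript
// Hypothetical map of which effort values each model accepts.
const SUPPORTED_EFFORTS = {
  "openai/o1-mini": ["low", "medium", "high"],
  "openai/o3": ["low", "medium", "high", "xhigh"],
};

// Return the requested effort if the model supports it, else a safe fallback.
function resolveEffort(model, requested, fallback = "medium") {
  const supported = SUPPORTED_EFFORTS[model];
  if (!supported) return fallback; // unknown model: don't guess, use fallback
  return supported.includes(requested) ? requested : fallback;
}
```

The resolved value can then be passed as `reasoning_effort` in the request body, as in the Quick Start example.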
Gemini 3 Models
| Parameter | Type | Values | Description |
|---|---|---|---|
| thinking.type | "enabled" | - | Enable reasoning capability |
| thinking.thinking_level | string | low, high | Depth of thinking for Gemini 3 models |
The thinking_level parameter is only supported by Gemini 3 models. Gemini 3 uses qualitative thinking levels instead of explicit token budgets.
Models supporting thinking_level:
- google/gemini-3-flash
- google/gemini-3-pro
Gemini 2.5 Models
| Parameter | Type | Description |
|---|---|---|
| thinking.type | "enabled" | Enable reasoning capability |
| thinking.budget_tokens | number | Max tokens for reasoning (varies by model, see below) |
Budget token ranges by model:
| Model | Min | Max | Default |
|---|---|---|---|
| gemini-2.5-pro | 128 | 32,768 | 8,192 |
| gemini-2.5-flash | 0 | 24,576 | Dynamic |
For Gemini 2.5 Flash, setting budget_tokens to 0 disables thinking, and -1 enables dynamic allocation.
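The table and the Flash-only sentinel values can be combined into a small helper that clamps a requested budget to the documented range. This is a sketch; `resolveBudget` and the `BUDGET_RANGES` map are illustrative, with ranges copied from the table above.

```javascript
// Documented budget_tokens ranges, mirroring the table above.
const BUDGET_RANGES = {
  "google/gemini-2.5-pro": { min: 128, max: 32768 },
  "google/gemini-2.5-flash": { min: 0, max: 24576 },
};

function resolveBudget(model, requested) {
  const range = BUDGET_RANGES[model];
  if (!range) return requested; // unknown model: pass through unchanged

  // Flash-only sentinels: 0 disables thinking, -1 requests dynamic allocation.
  if (model === "google/gemini-2.5-flash" && (requested === 0 || requested === -1)) {
    return requested;
  }

  // Otherwise clamp into the documented [min, max] range.
  return Math.min(Math.max(requested, range.min), range.max);
}
```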
Anthropic Claude Models
| Parameter | Type | Description |
|---|---|---|
| thinking.type | "enabled" | Enable extended thinking capability |
| thinking.budget_tokens | number | Min: 1024, must be less than max_tokens |
Models supporting extended thinking:
- anthropic/claude-opus-4-5-20251101 (recommended)
- anthropic/claude-sonnet-4-5-20250929
- anthropic/claude-sonnet-4-20250514
- Other Claude 4+ models
For Anthropic models, budget_tokens must be at least 1024 and less than the max_tokens parameter. When using interleaved thinking with tools, the budget can exceed max_tokens up to the full context window (200k tokens).
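These constraints can be enforced before the request leaves your code. The sketch below is a hypothetical validator (`buildAnthropicThinking` is not an SDK function); it encodes the rules just described, relaxing the `max_tokens` check when you flag interleaved thinking with tools.

```javascript
// Build a thinking config for Anthropic models, failing fast on invalid budgets.
function buildAnthropicThinking(budgetTokens, maxTokens, { interleaved = false } = {}) {
  if (budgetTokens < 1024) {
    throw new RangeError("budget_tokens must be at least 1024");
  }
  // Outside interleaved thinking with tools, the budget must stay below max_tokens.
  if (!interleaved && budgetTokens >= maxTokens) {
    throw new RangeError("budget_tokens must be less than max_tokens");
  }
  return { type: "enabled", budget_tokens: budgetTokens };
}
```

The returned object can be passed as the `thinking` field of a request, as in the Quick Start examples.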
Reasoning Effort Levels
| Level | Use Case | Processing Time | Accuracy |
|---|---|---|---|
| none | Disable reasoning entirely | Fastest | Standard |
| minimal | Very light reasoning, quick answers | ~5s | Basic |
| low | Simple calculations, basic logic | ~10s | Good |
| medium | Multi-step problems, analysis | ~30s | Better |
| high | Complex reasoning, research tasks | ~60s+ | Best |
| xhigh | Maximum depth reasoning, exhaustive analysis | ~120s+ | Highest |
Not all models support all effort levels. For example, some models may only support low, medium, and high. Check your model’s documentation for supported values.
Use Cases
| Problem Type | Recommended Settings | Example |
|---|---|---|
| Math problems | medium effort | "Calculate compound interest over 10 years" |
| Logic puzzles | high effort | "Solve this Sudoku puzzle" |
| Code debugging | medium effort | "Find the bug in this Python function" |
| Strategic planning | high effort | "Create a business plan for a SaaS startup" |
| Data analysis | medium-high effort | "Analyze trends in this sales data" |
Code examples
```shell
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1-mini",
    "messages": [
      {
        "role": "user",
        "content": "Solve this step-by-step: What is 15% of 250?"
      }
    ],
    "reasoning_effort": "medium"
  }'
```
Response Structure
OpenAI reasoning models return:
```json
{
  "choices": [
    {
      "message": {
        "content": "Final answer after reasoning",
        "reasoning": "Step-by-step thought process (when available)"
      }
    }
  ],
  "usage": {
    "reasoning_tokens": 1500, // Tokens used for reasoning
    "completion_tokens": 200  // Tokens for final answer
  }
}
```
Other models with thinking:
```json
{
  "choices": [
    {
      "message": {
        "content": "Final answer with reasoning embedded or separate"
      }
    }
  ],
  "usage": {
    "completion_tokens": 1200 // Includes thinking tokens
  }
}
```
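Since these two shapes differ, a thin normalizer can keep downstream code shape-agnostic. This is a sketch based on the example payloads above; verify the field names against what your models actually return.

```javascript
// Normalize both response shapes into one object.
function extractReasoning(response) {
  const message = response.choices?.[0]?.message ?? {};
  const usage = response.usage ?? {};
  return {
    answer: message.content ?? "",
    reasoning: message.reasoning ?? null,         // only some models expose this
    reasoningTokens: usage.reasoning_tokens ?? 0, // 0 when thinking is folded into completion_tokens
  };
}
```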
Best Practices
Effort/budget selection:
```javascript
const getReasoningConfig = (problemComplexity) => {
  const effortMap = {
    trivial: "none",
    very_simple: "minimal",
    simple: "low",
    moderate: "medium",
    complex: "high",
    very_complex: "xhigh",
  };
  return { reasoning_effort: effortMap[problemComplexity] || "medium" };
};

// For Gemini 3 models with thinking_level
const getGemini3ThinkingConfig = (problemComplexity) => {
  return {
    thinking: {
      type: "enabled",
      thinking_level: problemComplexity === "complex" ? "high" : "low",
    },
  };
};

// For other models with budget_tokens
const getThinkingBudget = (problemComplexity) => {
  const budgets = {
    simple: 2000,
    moderate: 5000,
    complex: 10000,
  };
  return {
    thinking: {
      type: "enabled",
      budget_tokens: budgets[problemComplexity],
    },
  };
};
```
Prompt engineering for reasoning:
```javascript
// Effective prompts for reasoning
const reasoningPrompts = {
  math: "Solve step-by-step, showing all work:",
  logic: "Think through this logically, considering all possibilities:",
  analysis: "Analyze systematically, breaking down into components:",
  planning: "Create a detailed plan, considering constraints and requirements:",
};
```
Response times:
- none effort: Near instant (no reasoning)
- minimal effort: 2-8 seconds
- low effort: 5-15 seconds
- medium effort: 15-45 seconds
- high effort: 45-120 seconds
- xhigh effort: 90-180+ seconds
Token usage:
```javascript
// Monitor reasoning token consumption
const trackReasoningUsage = (response) => {
  const reasoningTokens = response.usage.reasoning_tokens || 0;
  const totalTokens = response.usage.total_tokens;
  const reasoningRatio = reasoningTokens / totalTokens;

  console.log(`Reasoning tokens: ${reasoningTokens}`);
  console.log(`Reasoning ratio: ${(reasoningRatio * 100).toFixed(1)}%`);
};
```
Troubleshooting
**Slow responses**
- Use lower reasoning effort for time-sensitive applications
- Consider async processing for complex reasoning tasks
- Implement timeouts appropriate for reasoning models
**High token usage**
- Monitor reasoning token consumption
- Adjust budget_tokens for non-OpenAI models
- Use lower effort levels when appropriate
**Poor reasoning quality**
- Increase reasoning effort/budget for complex problems
- Improve prompt specificity and clarity
- Try different reasoning-capable models
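The timeout advice above can be sketched with `Promise.race`. Note that `withTimeout` is a hypothetical helper: it stops you from waiting on a slow call, but it does not cancel the underlying request.

```javascript
// Reject if a promise does not settle within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage: `await withTimeout(openai.chat.completions.create({ ... }), 120_000)` gives a high-effort call up to two minutes before failing over.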
Advanced Patterns
Conditional reasoning
```javascript
const solveWithReasoning = async (problem, complexity) => {
  const isComplex = complexity === "high" || problem.length > 500;

  if (isComplex) {
    return await openai.chat.completions.create({
      model: "openai/o1-preview",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "high",
    });
  } else {
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content: problem }],
    });
  }
};
```
Progressive reasoning
```javascript
// Start with low effort, escalate if needed
const progressiveReasoning = async (problem) => {
  const efforts = ["low", "medium", "high"];

  for (const effort of efforts) {
    const response = await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    });

    // Check if solution is satisfactory
    if (await validateSolution(response.choices[0].message.content)) {
      return response;
    }
  }

  throw new Error("Could not solve with available reasoning levels");
};
```
Reasoning with fallbacks
```javascript
const reasoningWithFallback = async (problem) => {
  try {
    // Try reasoning model first
    return await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "medium",
    });
  } catch (error) {
    // Fallback to regular model with detailed prompt
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: `Think step-by-step and solve: ${problem}`,
        },
      ],
    });
  }
};
```
Limitations
- Response time: Reasoning adds significant latency to generation
- Cost: Reasoning tokens are charged at higher rates
- Model availability: Limited to specific reasoning-capable models
- Token limits: Reasoning may hit context limits faster
- Determinism: Reasoning output may vary between requests
Monitoring
Key metrics to track:
```javascript
const reasoningMetrics = {
  avgReasoningTokens: 0,
  avgResponseTime: 0,
  successRate: 0,
  costPerReasoning: 0,
  effortDistribution: { none: 0, minimal: 0, low: 0, medium: 0, high: 0, xhigh: 0 },
};
```
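These tallies can be kept up to date with an incremental running average after each request. The sketch below is illustrative (`recordRequest` and its argument names are not part of any SDK), and it leaves cost tracking out since pricing varies by provider.

```javascript
// Fold one request's stats into the metrics object; returns the new count.
function recordRequest(metrics, count, { reasoningTokens, responseTimeMs, effort, ok }) {
  const n = count + 1;
  // Incremental mean: avg += (x - avg) / n
  metrics.avgReasoningTokens += (reasoningTokens - metrics.avgReasoningTokens) / n;
  metrics.avgResponseTime += (responseTimeMs - metrics.avgResponseTime) / n;
  metrics.successRate += ((ok ? 1 : 0) - metrics.successRate) / n;
  if (effort in metrics.effortDistribution) {
    metrics.effortDistribution[effort] += 1;
  }
  return n;
}
```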