Reasoning
This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.
Quick Start
Enable step-by-step reasoning for complex problems and analysis.
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// OpenAI reasoning models (o-series)
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content:
        "Analyze the logical flaws in this argument: 'All birds can fly. Penguins are birds. Therefore, penguins can fly.'",
    },
  ],
  reasoning_effort: "medium",
});

// Other models with thinking
const response2 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Plan a 3-day itinerary for Tokyo with a $500 budget",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});
Configuration by Provider
OpenAI Models (o-series)
| Parameter | Type | Values | Description |
| --- | --- | --- | --- |
| `reasoning_effort` | string | `low`, `medium`, `high` | Depth of the reasoning process |

Models supporting `reasoning_effort`:

- `openai/o1-preview`
- `openai/o1-mini`
- `openai/o3-mini`
Other Models
| Parameter | Type | Description |
| --- | --- | --- |
| `thinking.type` | string | Set to `"enabled"` to turn on reasoning |
| `thinking.budget_tokens` | number | Max tokens for reasoning (1000-10000) |

Models supporting `thinking`:

- `google/gemini-2.5-pro`
- `anthropic/claude-3-5-sonnet`
- Other compatible models
Reasoning Effort Levels
| Level | Use Case | Processing Time | Accuracy |
| --- | --- | --- | --- |
| `low` | Simple calculations, basic logic | ~10s | Good |
| `medium` | Multi-step problems, analysis | ~30s | Better |
| `high` | Complex reasoning, research tasks | ~60s+ | Best |
Use Cases
| Problem Type | Recommended Settings | Example |
| --- | --- | --- |
| Math problems | `medium` effort | "Calculate compound interest over 10 years" |
| Logic puzzles | `high` effort | "Solve this Sudoku puzzle" |
| Code debugging | `medium` effort | "Find the bug in this Python function" |
| Strategic planning | `high` effort | "Create a business plan for a SaaS startup" |
| Data analysis | `medium` to `high` effort | "Analyze trends in this sales data" |
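If you route many different problem types through the proxy, the table above can be encoded as a simple lookup. The `reasoningOptionsFor` helper and its category names below are illustrative, not part of the proxy API:

// Sketch: map a problem type to request options (category names are hypothetical)
const reasoningOptionsFor = (problemType) => {
  const effortByType = {
    math: "medium",
    logic: "high",
    debugging: "medium",
    planning: "high",
    analysis: "high",
  };
  return { reasoning_effort: effortByType[problemType] ?? "medium" };
};

// Usage: spread the result into a chat.completions.create call for an o-series model
const options = reasoningOptionsFor("logic"); // { reasoning_effort: "high" }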
Code Examples
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1-mini",
    "messages": [
      {
        "role": "user",
        "content": "Solve this step-by-step: What is 15% of 250?"
      }
    ],
    "reasoning_effort": "medium"
  }'
from openai import OpenAI
import os

openai = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v2/proxy'
)

# OpenAI reasoning models
response = openai.chat.completions.create(
    model='openai/o1-mini',
    messages=[
        {
            'role': 'user',
            'content': 'Solve this step-by-step: What is 15% of 250?'
        }
    ],
    reasoning_effort='medium'  # low, medium, or high
)

# Other models with thinking
response2 = openai.chat.completions.create(
    model='google/gemini-2.5-pro',
    messages=[
        {
            'role': 'user',
            'content': 'Analyze the pros and cons of remote work'
        }
    ],
    extra_body={
        'thinking': {
            'type': 'enabled',
            'budget_tokens': 5000
        }
    }
)
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// OpenAI reasoning models
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content: "Solve this step-by-step: What is 15% of 250?",
    },
  ],
  reasoning_effort: "medium", // low, medium, or high
});

// Other models with thinking
const response2 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Create a marketing strategy for a new product launch",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});
Response Structure
OpenAI reasoning models return:
{
  "choices": [
    {
      "message": {
        "content": "Final answer after reasoning",
        "reasoning": "Step-by-step thought process (when available)"
      }
    }
  ],
  "usage": {
    "reasoning_tokens": 1500, // Tokens used for reasoning
    "completion_tokens": 200  // Tokens for final answer
  }
}
Other models with thinking:
{
  "choices": [
    {
      "message": {
        "content": "Final answer with reasoning embedded or separate"
      }
    }
  ],
  "usage": {
    "completion_tokens": 1200 // Includes thinking tokens
  }
}
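Because the exact fields differ by provider, read the response defensively. The snippet below is a minimal sketch assuming `response` holds an OpenAI-style result like the one shown above, falling back to zero when a field is absent:

// Defensive extraction of the answer and reasoning usage (fields may be absent)
const answer = response.choices?.[0]?.message?.content ?? "";
const reasoningTokens = response.usage?.reasoning_tokens ?? 0;
const completionTokens = response.usage?.completion_tokens ?? 0;

console.log(`Answer: ${answer}`);
console.log(`Reasoning tokens: ${reasoningTokens}, completion tokens: ${completionTokens}`);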
Best Practices
Effort/budget selection:
const getReasoningConfig = (problemComplexity) => {
  if (problemComplexity === "simple") {
    return { reasoning_effort: "low" };
  } else if (problemComplexity === "moderate") {
    return { reasoning_effort: "medium" };
  } else {
    return { reasoning_effort: "high" };
  }
};

// For non-OpenAI models
const getThinkingBudget = (problemComplexity) => {
  const budgets = {
    simple: 2000,
    moderate: 5000,
    complex: 10000,
  };
  return {
    thinking: {
      type: "enabled",
      budget_tokens: budgets[problemComplexity],
    },
  };
};
Prompt engineering for reasoning:
// Effective prompts for reasoning
const reasoningPrompts = {
  math: "Solve step-by-step, showing all work:",
  logic: "Think through this logically, considering all possibilities:",
  analysis: "Analyze systematically, breaking down into components:",
  planning: "Create a detailed plan, considering constraints and requirements:",
};
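One way to use this map is to prepend the matching prefix to the user's problem before sending the request. The wrapper below is a sketch that reuses the client from the earlier examples; `askWithReasoningPrompt` is an illustrative name, not part of the SDK:

// Sketch: prepend a reasoning-oriented prefix to the user's problem
const askWithReasoningPrompt = async (category, problem) => {
  const prefix = reasoningPrompts[category] ?? "Think step-by-step:";
  return openai.chat.completions.create({
    model: "openai/o1-mini",
    messages: [{ role: "user", content: `${prefix}\n\n${problem}` }],
    reasoning_effort: "medium",
  });
};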
Performance Considerations
Response times:
- `low` effort: 5-15 seconds
- `medium` effort: 15-45 seconds
- `high` effort: 45-120 seconds
Token usage:
// Monitor reasoning token consumption
const trackReasoningUsage = (response) => {
  const reasoningTokens = response.usage.reasoning_tokens || 0;
  const totalTokens =
    response.usage.total_tokens ||
    reasoningTokens + (response.usage.completion_tokens || 0);
  const reasoningRatio = totalTokens ? reasoningTokens / totalTokens : 0;
  console.log(`Reasoning tokens: ${reasoningTokens}`);
  console.log(`Reasoning ratio: ${(reasoningRatio * 100).toFixed(1)}%`);
};
Troubleshooting
Slow responses
- Use lower reasoning effort for time-sensitive applications
- Consider async processing for complex reasoning tasks
- Implement timeouts appropriate for reasoning models (see the sketch below)
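A minimal way to bound waiting time is to race the request against a timer. The `withTimeout` helper below is an illustrative sketch, not part of the SDK; it stops your code from waiting but does not abort the underlying HTTP request.

// Sketch: race a reasoning request against a timeout (values are illustrative)
const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms)
    ),
  ]);

const reply = await withTimeout(
  openai.chat.completions.create({
    model: "openai/o1-mini",
    messages: [{ role: "user", content: "Solve this step-by-step: What is 15% of 250?" }],
    reasoning_effort: "high",
  }),
  120000 // allow up to ~2 minutes for high-effort reasoning
);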
High token usage
- Monitor reasoning token consumption
- Adjust budget_tokens for non-OpenAI models
- Use lower effort levels when appropriate
Poor reasoning quality
- Increase reasoning effort/budget for complex problems
- Improve prompt specificity and clarity
- Try different reasoning-capable models
Advanced Patterns
Conditional reasoning
const solveWithReasoning = async (problem, complexity) => {
  const isComplex = complexity === "high" || problem.length > 500;
  if (isComplex) {
    return await openai.chat.completions.create({
      model: "openai/o1-preview",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "high",
    });
  } else {
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content: problem }],
    });
  }
};
Progressive reasoning
// Start with low effort, escalate if needed
const progressiveReasoning = async (problem) => {
  const efforts = ["low", "medium", "high"];
  for (const effort of efforts) {
    const response = await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    });
    // Check if the solution is satisfactory (validateSolution is application-specific)
    if (await validateSolution(response.choices[0].message.content)) {
      return response;
    }
  }
  throw new Error("Could not solve with available reasoning levels");
};
Reasoning with fallbacks
const reasoningWithFallback = async (problem) => {
  try {
    // Try reasoning model first
    return await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "medium",
    });
  } catch (error) {
    // Fallback to regular model with detailed prompt
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: `Think step-by-step and solve: ${problem}`,
        },
      ],
    });
  }
};
Limitations
- Response time: Reasoning adds significant latency to generation
- Cost: Reasoning tokens are charged at higher rates
- Model availability: Limited to specific reasoning-capable models
- Token limits: Reasoning may hit context limits faster
- Determinism: Reasoning output may vary between requests
Monitoring
Key metrics to track:
const reasoningMetrics = {
  avgReasoningTokens: 0,
  avgResponseTime: 0,
  successRate: 0,
  costPerReasoning: 0,
  effortDistribution: { low: 0, medium: 0, high: 0 },
};
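A minimal sketch for populating these counters is shown below; `recordReasoningCall` is an illustrative helper, and `costPerToken` is a placeholder you would replace with your provider's actual pricing.

// Sketch: fold one response into the running metrics (names and pricing are illustrative)
let callCount = 0;

const recordReasoningCall = (response, { effort, elapsedMs, succeeded, costPerToken = 0 }) => {
  callCount += 1;
  const reasoningTokens = response.usage?.reasoning_tokens ?? 0;

  // Incremental running averages over all recorded calls
  reasoningMetrics.avgReasoningTokens +=
    (reasoningTokens - reasoningMetrics.avgReasoningTokens) / callCount;
  reasoningMetrics.avgResponseTime +=
    (elapsedMs - reasoningMetrics.avgResponseTime) / callCount;
  reasoningMetrics.successRate +=
    ((succeeded ? 1 : 0) - reasoningMetrics.successRate) / callCount;

  reasoningMetrics.costPerReasoning = reasoningMetrics.avgReasoningTokens * costPerToken;
  reasoningMetrics.effortDistribution[effort] += 1;
};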