Reasoning

📖 This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Enable step-by-step reasoning for complex problems and analysis.

// OpenAI reasoning models (o1 series)
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content:
        "Analyze the logical flaws in this argument: 'All birds can fly. Penguins are birds. Therefore, penguins can fly.'",
    },
  ],
  reasoning_effort: "medium",
});

// Other models with thinking
const response2 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Plan a 3-day itinerary for Tokyo with a $500 budget",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});

Configuration by Provider

OpenAI Models (o1 series)

Parameter        | Type   | Values            | Description
reasoning_effort | string | low, medium, high | Depth of reasoning process

Models supporting reasoning_effort:

  • openai/o1-preview
  • openai/o1-mini
  • openai/o3-mini

Other Models

Parameter              | Type      | Description
thinking.type          | "enabled" | Enable reasoning capability
thinking.budget_tokens | number    | Max tokens for reasoning (1000-10000)

Models supporting thinking:

  • google/gemini-2.5-pro
  • anthropic/claude-3-5-sonnet
  • Other compatible models
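
When the budget is computed dynamically, it is easy to end up outside the documented 1000-10000 budget_tokens range. A minimal sketch of a clamp helper (the clampThinkingBudget name is illustrative, not part of the proxy API):

const clampThinkingBudget = (requested) => {
  // Keep the budget inside the documented 1000-10000 token range
  const budget_tokens = Math.min(10000, Math.max(1000, requested));
  return { type: "enabled", budget_tokens };
};

// Example: a request for 500 tokens is raised to the 1000-token minimum
const thinking = clampThinkingBudget(500); // { type: "enabled", budget_tokens: 1000 }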

Reasoning Effort Levels

Level  | Use Case                          | Processing Time | Accuracy
low    | Simple calculations, basic logic  | ~10s            | Good
medium | Multi-step problems, analysis     | ~30s            | Better
high   | Complex reasoning, research tasks | ~60s+           | Best

Use Cases

Problem Type       | Recommended Settings | Example
Math problems      | medium effort        | "Calculate compound interest over 10 years"
Logic puzzles      | high effort          | "Solve this Sudoku puzzle"
Code debugging     | medium effort        | "Find the bug in this Python function"
Strategic planning | high effort          | "Create a business plan for a SaaS startup"
Data analysis      | medium-high effort   | "Analyze trends in this sales data"
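
If requests are routed by problem type, the table above can be encoded as a simple lookup. A sketch, assuming these category keys (they are illustrative, not part of the API):

// Recommended settings per problem type, taken from the table above
const useCaseSettings = {
  math: { reasoning_effort: "medium" },
  logic: { reasoning_effort: "high" },
  debugging: { reasoning_effort: "medium" },
  planning: { reasoning_effort: "high" },
  analysis: { reasoning_effort: "high" }, // "medium-high": err toward accuracy
};

const debugResponse = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [{ role: "user", content: "Find the bug in this Python function: ..." }],
  ...useCaseSettings.debugging,
});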

Code examples

cURL:

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1-mini",
    "messages": [
      {
        "role": "user",
        "content": "Solve this step-by-step: What is 15% of 250?"
      }
    ],
    "reasoning_effort": "medium"
  }'

Python:

from openai import OpenAI
import os

openai = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v2/proxy'
)

# OpenAI reasoning models
response = openai.chat.completions.create(
    model='openai/o1-mini',
    messages=[
        {
            'role': 'user',
            'content': 'Solve this step-by-step: What is 15% of 250?'
        }
    ],
    reasoning_effort='medium'  # low, medium, or high
)

# Other models with thinking
response2 = openai.chat.completions.create(
    model='google/gemini-2.5-pro',
    messages=[
        {
            'role': 'user',
            'content': 'Analyze the pros and cons of remote work'
        }
    ],
    extra_body={
        'thinking': {
            'type': 'enabled',
            'budget_tokens': 5000
        }
    }
)

Node.js:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// OpenAI reasoning models
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content: "Solve this step-by-step: What is 15% of 250?",
    },
  ],
  reasoning_effort: "medium", // low, medium, or high
});

// Other models with thinking
const response2 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Create a marketing strategy for a new product launch",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});

Response Structure

OpenAI reasoning models return:

{
  "choices": [
    {
      "message": {
        "content": "Final answer after reasoning",
        "reasoning": "Step-by-step thought process (when available)"
      }
    }
  ],
  "usage": {
    "reasoning_tokens": 1500, // Tokens used for reasoning
    "completion_tokens": 200 // Tokens for final answer
  }
}

Other models with thinking return:

{
  "choices": [
    {
      "message": {
        "content": "Final answer with reasoning embedded or separate"
      }
    }
  ],
  "usage": {
    "completion_tokens": 1200 // Includes thinking tokens
  }
}
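
Because the reasoning field and reasoning_tokens are only present for some models, read them defensively. A sketch based on the shapes shown above:

const message = response.choices[0].message;
const answer = message.content;
const reasoningTrace = message.reasoning ?? null; // only exposed by some models

const usage = response.usage ?? {};
// Falls back to 0 for models that fold thinking into completion_tokens
const reasoningTokens = usage.reasoning_tokens ?? 0;

console.log({ answer, reasoningTokens, hasReasoningTrace: reasoningTrace !== null });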

Best Practices

Effort/budget selection:

const getReasoningConfig = (problemComplexity) => {
  if (problemComplexity === "simple") {
    return { reasoning_effort: "low" };
  } else if (problemComplexity === "moderate") {
    return { reasoning_effort: "medium" };
  } else {
    return { reasoning_effort: "high" };
  }
};

// For non-OpenAI models
const getThinkingBudget = (problemComplexity) => {
  const budgets = {
    simple: 2000,
    moderate: 5000,
    complex: 10000,
  };
  return {
    thinking: {
      type: "enabled",
      budget_tokens: budgets[problemComplexity],
    },
  };
};
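
Both helpers return plain objects, so they can be spread directly into the request parameters. A usage sketch:

const problem = "Compare the long-term costs of renting versus buying a home";

// OpenAI o1-series model: pass reasoning_effort
const o1Response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [{ role: "user", content: problem }],
  ...getReasoningConfig("moderate"),
});

// Non-OpenAI model: pass the thinking block
const geminiResponse = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [{ role: "user", content: problem }],
  ...getThinkingBudget("moderate"),
});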

Prompt engineering for reasoning:

// Effective prompts for reasoning
const reasoningPrompts = {
  math: "Solve step-by-step, showing all work:",
  logic: "Think through this logically, considering all possibilities:",
  analysis: "Analyze systematically, breaking down into components:",
  planning: "Create a detailed plan, considering constraints and requirements:",
};
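
A short sketch of combining one of these prefixes with the user's problem before sending the request (buildReasoningPrompt is an illustrative helper):

const buildReasoningPrompt = (category, problem) =>
  `${reasoningPrompts[category]}\n\n${problem}`;

const mathResponse = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    { role: "user", content: buildReasoningPrompt("math", "What is 15% of 250?") },
  ],
  reasoning_effort: "low",
});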

Performance Considerations

Response times:

  • low effort: 5-15 seconds
  • medium effort: 15-45 seconds
  • high effort: 45-120 seconds
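
Given these latencies, it helps to set a per-request timeout that scales with the chosen effort level. A sketch using the OpenAI SDK's per-request options; the millisecond ceilings are assumptions derived from the ranges above:

// Rough timeout ceilings per effort level (assumed values, tune for your workload)
const timeoutsMs = { low: 30_000, medium: 90_000, high: 180_000 };

const solveWithTimeout = (problem, effort) =>
  openai.chat.completions.create(
    {
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    },
    { timeout: timeoutsMs[effort] } // per-request override in the OpenAI Node SDK
  );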

Token usage:

// Monitor reasoning token consumption
const trackReasoningUsage = (response) => {
  const reasoningTokens = response.usage.reasoning_tokens || 0;
  const totalTokens = response.usage.total_tokens;
  const reasoningRatio = reasoningTokens / totalTokens;

  console.log(`Reasoning tokens: ${reasoningTokens}`);
  console.log(`Reasoning ratio: ${(reasoningRatio * 100).toFixed(1)}%`);
};

Troubleshooting

Slow responses
  • Use lower reasoning effort for time-sensitive applications
  • Consider async processing for complex reasoning tasks
  • Implement timeouts appropriate for reasoning models

High token usage
  • Monitor reasoning token consumption
  • Adjust budget_tokens for non-OpenAI models
  • Use lower effort levels when appropriate

Poor reasoning quality
  • Increase reasoning effort/budget for complex problems
  • Improve prompt specificity and clarity
  • Try different reasoning-capable models

Advanced Patterns

Conditional reasoning

const solveWithReasoning = async (problem, complexity) => {
  const isComplex = complexity === "high" || problem.length > 500;

  if (isComplex) {
    return await openai.chat.completions.create({
      model: "openai/o1-preview",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "high",
    });
  } else {
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content: problem }],
    });
  }
};

Progressive reasoning

// Start with low effort, escalate if needed
const progressiveReasoning = async (problem) => {
  const efforts = ["low", "medium", "high"];

  for (const effort of efforts) {
    const response = await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    });

    // Check if solution is satisfactory
    if (await validateSolution(response.choices[0].message.content)) {
      return response;
    }
  }

  throw new Error("Could not solve with available reasoning levels");
};
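
validateSolution is left undefined above because what counts as "satisfactory" is application-specific. One hedged possibility is a cheap check pass with a non-reasoning model:

// Hypothetical validator: ask a non-reasoning model whether the answer looks complete
const validateSolution = async (answer) => {
  const check = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: `Reply with only "yes" or "no": is the following answer complete and internally consistent?\n\n${answer}`,
      },
    ],
  });
  return check.choices[0].message.content.trim().toLowerCase().startsWith("yes");
};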

Reasoning with fallbacks

const reasoningWithFallback = async (problem) => {
  try {
    // Try reasoning model first
    return await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "medium",
    });
  } catch (error) {
    // Fallback to regular model with detailed prompt
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: `Think step-by-step and solve: ${problem}`,
        },
      ],
    });
  }
};

Limitations

  • Response time: Reasoning adds significant latency to generation
  • Cost: Reasoning tokens are charged at higher rates
  • Model availability: Limited to specific reasoning-capable models
  • Token limits: Reasoning may hit context limits faster
  • Determinism: Reasoning output may vary between requests

Monitoring

Key metrics to track:

const reasoningMetrics = {
  avgReasoningTokens: 0,
  avgResponseTime: 0,
  successRate: 0,
  costPerReasoning: 0,
  effortDistribution: { low: 0, medium: 0, high: 0 },
};
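
A sketch of folding a completed request into these counters with incremental running averages (recordReasoningCall and its arguments are illustrative):

let requestCount = 0;

const recordReasoningCall = (response, effort, elapsedMs, succeeded) => {
  requestCount += 1;
  const reasoningTokens = response.usage?.reasoning_tokens ?? 0;

  // Incremental running averages avoid storing every observation
  reasoningMetrics.avgReasoningTokens +=
    (reasoningTokens - reasoningMetrics.avgReasoningTokens) / requestCount;
  reasoningMetrics.avgResponseTime +=
    (elapsedMs - reasoningMetrics.avgResponseTime) / requestCount;
  reasoningMetrics.successRate +=
    ((succeeded ? 1 : 0) - reasoningMetrics.successRate) / requestCount;
  reasoningMetrics.effortDistribution[effort] += 1;
  // costPerReasoning would additionally need your provider's per-token pricing
};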