Reasoning

📖 This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Enable step-by-step reasoning for complex problems and analysis.

// OpenAI reasoning models (o1 series)
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content:
        "Analyze the logical flaws in this argument: 'All birds can fly. Penguins are birds. Therefore, penguins can fly.'",
    },
  ],
  reasoning_effort: "medium",
});

// Other models with thinking
const response2 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Plan a 3-day itinerary for Tokyo with a $500 budget",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});

Configuration by Provider

OpenAI Models (o1 series)

Parameter        | Type   | Values            | Description
reasoning_effort | string | low, medium, high | Depth of reasoning process

Models supporting reasoning_effort:

  • openai/o1-preview
  • openai/o1-mini
  • openai/o3-mini

Other Models

Parameter              | Type      | Description
thinking.type          | "enabled" | Enable reasoning capability
thinking.budget_tokens | number    | Max tokens for reasoning (1000-10000)

Models supporting thinking:

  • google/gemini-2.5-pro
  • anthropic/claude-3-5-sonnet
  • Other compatible models
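
When the budget is computed dynamically, it is easy to end up outside the documented 1000-10000 budget_tokens range. A minimal sketch of a clamp helper (the clampThinkingBudget name is illustrative, not part of the proxy API):

const clampThinkingBudget = (requested) => {
  // Keep the budget inside the documented 1000-10000 token range
  const budget_tokens = Math.min(10000, Math.max(1000, requested));
  return { type: "enabled", budget_tokens };
};

// Example: a request for 500 tokens is raised to the 1000-token minimum
const thinking = clampThinkingBudget(500); // { type: "enabled", budget_tokens: 1000 }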

Reasoning Effort Levels

Level  | Use Case                          | Processing Time | Accuracy
low    | Simple calculations, basic logic  | ~10s            | Good
medium | Multi-step problems, analysis     | ~30s            | Better
high   | Complex reasoning, research tasks | ~60s+           | Best

Use Cases

Problem Type       | Recommended Settings | Example
Math problems      | medium effort        | "Calculate compound interest over 10 years"
Logic puzzles      | high effort          | "Solve this Sudoku puzzle"
Code debugging     | medium effort        | "Find the bug in this Python function"
Strategic planning | high effort          | "Create a business plan for a SaaS startup"
Data analysis      | medium-high effort   | "Analyze trends in this sales data"
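
If requests are routed by problem type, the table above can be encoded as a simple lookup. A sketch, assuming these category keys (they are illustrative, not part of the API):

// Recommended settings per problem type, taken from the table above
const useCaseSettings = {
  math: { reasoning_effort: "medium" },
  logic: { reasoning_effort: "high" },
  debugging: { reasoning_effort: "medium" },
  planning: { reasoning_effort: "high" },
  analysis: { reasoning_effort: "high" }, // "medium-high": err toward accuracy
};

const debugResponse = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [{ role: "user", content: "Find the bug in this Python function: ..." }],
  ...useCaseSettings.debugging,
});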

Code examples

cURL:

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1-mini",
    "messages": [
      {
        "role": "user",
        "content": "Solve this step-by-step: What is 15% of 250?"
      }
    ],
    "reasoning_effort": "medium"
  }'

Python:

from openai import OpenAI
import os

openai = OpenAI(
    api_key=os.environ.get('ORQ_API_KEY'),
    base_url='https://api.orq.ai/v2/proxy'
)

# OpenAI reasoning models
response = openai.chat.completions.create(
    model='openai/o1-mini',
    messages=[
        {
            'role': 'user',
            'content': 'Solve this step-by-step: What is 15% of 250?'
        }
    ],
    reasoning_effort='medium'  # low, medium, or high
)

# Other models with thinking
response2 = openai.chat.completions.create(
    model='google/gemini-2.5-pro',
    messages=[
        {
            'role': 'user',
            'content': 'Analyze the pros and cons of remote work'
        }
    ],
    extra_body={
        'thinking': {
            'type': 'enabled',
            'budget_tokens': 5000
        }
    }
)

Node.js:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// OpenAI reasoning models
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content: "Solve this step-by-step: What is 15% of 250?",
    },
  ],
  reasoning_effort: "medium", // low, medium, or high
});

// Other models with thinking
const response2 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Create a marketing strategy for a new product launch",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});

Response Structure

OpenAI reasoning models return:

{
  "choices": [
    {
      "message": {
        "content": "Final answer after reasoning",
        "reasoning": "Step-by-step thought process (when available)"
      }
    }
  ],
  "usage": {
    "reasoning_tokens": 1500, // Tokens used for reasoning
    "completion_tokens": 200 // Tokens for final answer
  }
}

Other models with thinking return:

{
  "choices": [
    {
      "message": {
        "content": "Final answer with reasoning embedded or separate"
      }
    }
  ],
  "usage": {
    "completion_tokens": 1200 // Includes thinking tokens
  }
}
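
Because the reasoning field and reasoning_tokens are only present for some models, read them defensively. A sketch based on the shapes shown above:

const message = response.choices[0].message;
const answer = message.content;
const reasoningTrace = message.reasoning ?? null; // only exposed by some models

const usage = response.usage ?? {};
// Falls back to 0 for models that fold thinking into completion_tokens
const reasoningTokens = usage.reasoning_tokens ?? 0;

console.log({ answer, reasoningTokens, hasReasoningTrace: reasoningTrace !== null });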

Best Practices

Effort/budget selection:

const getReasoningConfig = (problemComplexity) => {
  if (problemComplexity === "simple") {
    return { reasoning_effort: "low" };
  } else if (problemComplexity === "moderate") {
    return { reasoning_effort: "medium" };
  } else {
    return { reasoning_effort: "high" };
  }
};

// For non-OpenAI models
const getThinkingBudget = (problemComplexity) => {
  const budgets = {
    simple: 2000,
    moderate: 5000,
    complex: 10000,
  };
  return {
    thinking: {
      type: "enabled",
      budget_tokens: budgets[problemComplexity],
    },
  };
};
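
Both helpers return plain objects, so they can be spread directly into the request parameters. A usage sketch:

const problem = "Compare the long-term costs of renting versus buying a home";

// OpenAI o1-series model: pass reasoning_effort
const o1Response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [{ role: "user", content: problem }],
  ...getReasoningConfig("moderate"),
});

// Non-OpenAI model: pass the thinking block
const geminiResponse = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [{ role: "user", content: problem }],
  ...getThinkingBudget("moderate"),
});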

Prompt engineering for reasoning:

// Effective prompts for reasoning
const reasoningPrompts = {
  math: "Solve step-by-step, showing all work:",
  logic: "Think through this logically, considering all possibilities:",
  analysis: "Analyze systematically, breaking down into components:",
  planning: "Create a detailed plan, considering constraints and requirements:",
};
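
A short sketch of combining one of these prefixes with the user's problem before sending the request (buildReasoningPrompt is an illustrative helper):

const buildReasoningPrompt = (category, problem) =>
  `${reasoningPrompts[category]}\n\n${problem}`;

const mathResponse = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    { role: "user", content: buildReasoningPrompt("math", "What is 15% of 250?") },
  ],
  reasoning_effort: "low",
});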

Performance Considerations

Response times:

  • low effort: 5-15 seconds
  • medium effort: 15-45 seconds
  • high effort: 45-120 seconds
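
Given these latencies, it helps to set a per-request timeout that scales with the chosen effort level. A sketch using the OpenAI SDK's per-request options; the millisecond ceilings are assumptions derived from the ranges above:

// Rough timeout ceilings per effort level (assumed values, tune for your workload)
const timeoutsMs = { low: 30_000, medium: 90_000, high: 180_000 };

const solveWithTimeout = (problem, effort) =>
  openai.chat.completions.create(
    {
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    },
    { timeout: timeoutsMs[effort] } // per-request override in the OpenAI Node SDK
  );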

Token usage:

// Monitor reasoning token consumption
const trackReasoningUsage = (response) => {
  const reasoningTokens = response.usage.reasoning_tokens || 0;
  const totalTokens = response.usage.total_tokens;
  const reasoningRatio = reasoningTokens / totalTokens;

  console.log(`Reasoning tokens: ${reasoningTokens}`);
  console.log(`Reasoning ratio: ${(reasoningRatio * 100).toFixed(1)}%`);
};

Troubleshooting

Slow responses
  • Use lower reasoning effort for time-sensitive applications
  • Consider async processing for complex reasoning tasks
  • Implement timeouts appropriate for reasoning models

High token usage
  • Monitor reasoning token consumption
  • Adjust budget_tokens for non-OpenAI models
  • Use lower effort levels when appropriate

Poor reasoning quality
  • Increase reasoning effort/budget for complex problems
  • Improve prompt specificity and clarity
  • Try different reasoning-capable models

Advanced Patterns

Conditional reasoning

const solveWithReasoning = async (problem, complexity) => {
  const isComplex = complexity === "high" || problem.length > 500;

  if (isComplex) {
    return await openai.chat.completions.create({
      model: "openai/o1-preview",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "high",
    });
  } else {
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content: problem }],
    });
  }
};

Progressive reasoning

// Start with low effort, escalate if needed
const progressiveReasoning = async (problem) => {
  const efforts = ["low", "medium", "high"];

  for (const effort of efforts) {
    const response = await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    });

    // Check if solution is satisfactory
    if (await validateSolution(response.choices[0].message.content)) {
      return response;
    }
  }

  throw new Error("Could not solve with available reasoning levels");
};
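
validateSolution is left undefined above because what counts as "satisfactory" is application-specific. One hedged possibility is a cheap check pass with a non-reasoning model:

// Hypothetical validator: ask a non-reasoning model whether the answer looks complete
const validateSolution = async (answer) => {
  const check = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: `Reply with only "yes" or "no": is the following answer complete and internally consistent?\n\n${answer}`,
      },
    ],
  });
  return check.choices[0].message.content.trim().toLowerCase().startsWith("yes");
};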

Reasoning with fallbacks

const reasoningWithFallback = async (problem) => {
  try {
    // Try reasoning model first
    return await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "medium",
    });
  } catch (error) {
    // Fallback to regular model with detailed prompt
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: `Think step-by-step and solve: ${problem}`,
        },
      ],
    });
  }
};

Limitations

  • Response time: Reasoning adds significant latency to generation
  • Cost: Reasoning tokens are charged at higher rates
  • Model availability: Limited to specific reasoning-capable models
  • Token limits: Reasoning may hit context limits faster
  • Determinism: Reasoning output may vary between requests

Monitoring

Key metrics to track:

const reasoningMetrics = {
  avgReasoningTokens: 0,
  avgResponseTime: 0,
  successRate: 0,
  costPerReasoning: 0,
  effortDistribution: { low: 0, medium: 0, high: 0 },
};
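
A sketch of folding a completed request into these counters with incremental running averages (recordReasoningCall and its arguments are illustrative):

let requestCount = 0;

const recordReasoningCall = (response, effort, elapsedMs, succeeded) => {
  requestCount += 1;
  const reasoningTokens = response.usage?.reasoning_tokens ?? 0;

  // Incremental running averages avoid storing every observation
  reasoningMetrics.avgReasoningTokens +=
    (reasoningTokens - reasoningMetrics.avgReasoningTokens) / requestCount;
  reasoningMetrics.avgResponseTime +=
    (elapsedMs - reasoningMetrics.avgResponseTime) / requestCount;
  reasoningMetrics.successRate +=
    ((succeeded ? 1 : 0) - reasoningMetrics.successRate) / requestCount;
  reasoningMetrics.effortDistribution[effort] += 1;
  // costPerReasoning would additionally need your provider's per-token pricing
};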