Quick Start
Enable step-by-step reasoning for complex problems and analysis.
```javascript
// OpenAI reasoning models (o1/o3 series)
const response = await openai.chat.completions.create({
  model: "openai/o1-mini",
  messages: [
    {
      role: "user",
      content:
        "Analyze the logical flaws in this argument: 'All birds can fly. Penguins are birds. Therefore, penguins can fly.'",
    },
  ],
  reasoning_effort: "medium", // none, minimal, low, medium, high, xhigh
});

// Gemini 3 models with thinking_level
const response2 = await openai.chat.completions.create({
  model: "google/gemini-3-flash",
  messages: [
    {
      role: "user",
      content: "Plan a 3-day itinerary for Tokyo with a $500 budget",
    },
  ],
  thinking: {
    type: "enabled",
    thinking_level: "high", // low or high (Gemini 3 only)
  },
});

// Other models with budget_tokens
const response3 = await openai.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Analyze the pros and cons of renewable energy",
    },
  ],
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
});
```
Configuration by Provider
OpenAI Models (o1 and o3 series)
| Parameter | Type | Values | Description |
|---|---|---|---|
| reasoning_effort | string | none, minimal, low, medium, high, xhigh | Depth of reasoning process |
The reasoning_effort parameter primarily affects OpenAI reasoning models. Not all providers support all values; check your specific model's documentation for supported effort levels.
Models supporting reasoning_effort:
- openai/o1-preview
- openai/o1-mini
- openai/o3-mini
- openai/o3
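Since supported effort values differ by model, it can help to validate the requested level before sending a request. The following sketch assumes illustrative per-model support sets (the `SUPPORTED_EFFORTS` map is hypothetical, not part of any SDK); check each model's documentation for the values it actually accepts.

```javascript
// Hypothetical map of which effort values each model accepts.
const SUPPORTED_EFFORTS = {
  "openai/o1-mini": ["low", "medium", "high"],
  "openai/o3": ["low", "medium", "high", "xhigh"],
};

// Return the requested effort if the model supports it, else a safe fallback.
function resolveEffort(model, requested, fallback = "medium") {
  const supported = SUPPORTED_EFFORTS[model];
  if (!supported) return fallback; // unknown model: don't guess, use fallback
  return supported.includes(requested) ? requested : fallback;
}
```

The resolved value can then be passed as `reasoning_effort` in the request body, as in the Quick Start example.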
Gemini 3 Models
| Parameter | Type | Values | Description |
|---|---|---|---|
| thinking.type | "enabled" | - | Enable reasoning capability |
| thinking.thinking_level | string | low, high | Depth of thinking for Gemini 3 models |
The thinking_level parameter is only supported by Gemini 3 models. Gemini 3 uses qualitative thinking levels instead of explicit token budgets.
Models supporting thinking_level:
- google/gemini-3-flash
- google/gemini-3-pro
Gemini 2.5 Models
| Parameter | Type | Description |
|---|---|---|
| thinking.type | "enabled" | Enable reasoning capability |
| thinking.budget_tokens | number | Max tokens for reasoning (varies by model, see below) |
Budget token ranges by model:
| Model | Min | Max | Default |
|---|---|---|---|
| gemini-2.5-pro | 128 | 32,768 | 8,192 |
| gemini-2.5-flash | 0 | 24,576 | Dynamic |
For Gemini 2.5 Flash, setting budget_tokens to 0 disables thinking, and -1 enables dynamic allocation.
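The table and the Flash-only sentinel values can be combined into a small helper that clamps a requested budget to the documented range. This is a sketch; `resolveBudget` and the `BUDGET_RANGES` map are illustrative, with ranges copied from the table above.

```javascript
// Documented budget_tokens ranges, mirroring the table above.
const BUDGET_RANGES = {
  "google/gemini-2.5-pro": { min: 128, max: 32768 },
  "google/gemini-2.5-flash": { min: 0, max: 24576 },
};

function resolveBudget(model, requested) {
  const range = BUDGET_RANGES[model];
  if (!range) return requested; // unknown model: pass through unchanged

  // Flash-only sentinels: 0 disables thinking, -1 requests dynamic allocation.
  if (model === "google/gemini-2.5-flash" && (requested === 0 || requested === -1)) {
    return requested;
  }

  // Otherwise clamp into the documented [min, max] range.
  return Math.min(Math.max(requested, range.min), range.max);
}
```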
Anthropic Claude Models
| Parameter | Type | Description |
|---|---|---|
| thinking.type | "enabled" | Enable extended thinking capability |
| thinking.budget_tokens | number | Min: 1024, must be less than max_tokens |
Models supporting extended thinking:
- anthropic/claude-opus-4-5-20251101 (recommended)
- anthropic/claude-sonnet-4-5-20250929
- anthropic/claude-sonnet-4-20250514
- Other Claude 4+ models
For Anthropic models, budget_tokens must be at least 1024 and less than the max_tokens parameter. When using interleaved thinking with tools, the budget can exceed max_tokens up to the full context window (200k tokens).
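These constraints can be enforced before the request leaves your code. The sketch below is a hypothetical validator (`buildAnthropicThinking` is not an SDK function); it encodes the rules just described, relaxing the `max_tokens` check when you flag interleaved thinking with tools.

```javascript
// Build a thinking config for Anthropic models, failing fast on invalid budgets.
function buildAnthropicThinking(budgetTokens, maxTokens, { interleaved = false } = {}) {
  if (budgetTokens < 1024) {
    throw new RangeError("budget_tokens must be at least 1024");
  }
  // Outside interleaved thinking with tools, the budget must stay below max_tokens.
  if (!interleaved && budgetTokens >= maxTokens) {
    throw new RangeError("budget_tokens must be less than max_tokens");
  }
  return { type: "enabled", budget_tokens: budgetTokens };
}
```

The returned object can be passed as the `thinking` field of a request, as in the Quick Start examples.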
Reasoning Effort Levels
| Level | Use Case | Processing Time | Accuracy |
|---|---|---|---|
| none | Disable reasoning entirely | Fastest | Standard |
| minimal | Very light reasoning, quick answers | ~5s | Basic |
| low | Simple calculations, basic logic | ~10s | Good |
| medium | Multi-step problems, analysis | ~30s | Better |
| high | Complex reasoning, research tasks | ~60s+ | Best |
| xhigh | Maximum depth reasoning, exhaustive analysis | ~120s+ | Highest |
Not all models support all effort levels. For example, some models may only support low, medium, and high. Check your model’s documentation for supported values.
Use Cases
| Problem Type | Recommended Settings | Example |
|---|---|---|
| Math problems | medium effort | "Calculate compound interest over 10 years" |
| Logic puzzles | high effort | "Solve this Sudoku puzzle" |
| Code debugging | medium effort | "Find the bug in this Python function" |
| Strategic planning | high effort | "Create a business plan for a SaaS startup" |
| Data analysis | medium-high effort | "Analyze trends in this sales data" |
Code examples
```shell
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o1-mini",
    "messages": [
      {
        "role": "user",
        "content": "Solve this step-by-step: What is 15% of 250?"
      }
    ],
    "reasoning_effort": "medium"
  }'
```
Response Structure
OpenAI reasoning models return:
```json
{
  "choices": [
    {
      "message": {
        "content": "Final answer after reasoning",
        "reasoning": "Step-by-step thought process (when available)"
      }
    }
  ],
  "usage": {
    "reasoning_tokens": 1500, // Tokens used for reasoning
    "completion_tokens": 200  // Tokens for final answer
  }
}
```
Other models with thinking:
```json
{
  "choices": [
    {
      "message": {
        "content": "Final answer with reasoning embedded or separate"
      }
    }
  ],
  "usage": {
    "completion_tokens": 1200 // Includes thinking tokens
  }
}
```
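Since these two shapes differ, a thin normalizer can keep downstream code shape-agnostic. This is a sketch based on the example payloads above; verify the field names against what your models actually return.

```javascript
// Normalize both response shapes into one object.
function extractReasoning(response) {
  const message = response.choices?.[0]?.message ?? {};
  const usage = response.usage ?? {};
  return {
    answer: message.content ?? "",
    reasoning: message.reasoning ?? null,         // only some models expose this
    reasoningTokens: usage.reasoning_tokens ?? 0, // 0 when thinking is folded into completion_tokens
  };
}
```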
Best Practices
Effort/budget selection:
```javascript
const getReasoningConfig = (problemComplexity) => {
  const effortMap = {
    trivial: "none",
    very_simple: "minimal",
    simple: "low",
    moderate: "medium",
    complex: "high",
    very_complex: "xhigh",
  };
  return { reasoning_effort: effortMap[problemComplexity] || "medium" };
};

// For Gemini 3 models with thinking_level
const getGemini3ThinkingConfig = (problemComplexity) => {
  return {
    thinking: {
      type: "enabled",
      thinking_level: problemComplexity === "complex" ? "high" : "low",
    },
  };
};

// For other models with budget_tokens
const getThinkingBudget = (problemComplexity) => {
  const budgets = {
    simple: 2000,
    moderate: 5000,
    complex: 10000,
  };
  return {
    thinking: {
      type: "enabled",
      budget_tokens: budgets[problemComplexity],
    },
  };
};
```
Prompt engineering for reasoning:
```javascript
// Effective prompts for reasoning
const reasoningPrompts = {
  math: "Solve step-by-step, showing all work:",
  logic: "Think through this logically, considering all possibilities:",
  analysis: "Analyze systematically, breaking down into components:",
  planning: "Create a detailed plan, considering constraints and requirements:",
};
```
Response times:
- none effort: Near instant (no reasoning)
- minimal effort: 2-8 seconds
- low effort: 5-15 seconds
- medium effort: 15-45 seconds
- high effort: 45-120 seconds
- xhigh effort: 90-180+ seconds
Token usage:
```javascript
// Monitor reasoning token consumption
const trackReasoningUsage = (response) => {
  const reasoningTokens = response.usage.reasoning_tokens || 0;
  const totalTokens = response.usage.total_tokens;
  const reasoningRatio = reasoningTokens / totalTokens;

  console.log(`Reasoning tokens: ${reasoningTokens}`);
  console.log(`Reasoning ratio: ${(reasoningRatio * 100).toFixed(1)}%`);
};
```
Troubleshooting
**Slow responses**
- Use lower reasoning effort for time-sensitive applications
- Consider async processing for complex reasoning tasks
- Implement timeouts appropriate for reasoning models
**High token usage**
- Monitor reasoning token consumption
- Adjust budget_tokens for non-OpenAI models
- Use lower effort levels when appropriate
**Poor reasoning quality**
- Increase reasoning effort/budget for complex problems
- Improve prompt specificity and clarity
- Try different reasoning-capable models
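The timeout advice above can be sketched with `Promise.race`. Note that `withTimeout` is a hypothetical helper: it stops you from waiting on a slow call, but it does not cancel the underlying request.

```javascript
// Reject if a promise does not settle within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage: `await withTimeout(openai.chat.completions.create({ ... }), 120_000)` gives a high-effort call up to two minutes before failing over.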
Advanced Patterns
Conditional reasoning
```javascript
const solveWithReasoning = async (problem, complexity) => {
  const isComplex = complexity === "high" || problem.length > 500;

  if (isComplex) {
    return await openai.chat.completions.create({
      model: "openai/o1-preview",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "high",
    });
  } else {
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content: problem }],
    });
  }
};
```
Progressive reasoning
```javascript
// Start with low effort, escalate if needed
const progressiveReasoning = async (problem) => {
  const efforts = ["low", "medium", "high"];

  for (const effort of efforts) {
    const response = await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: effort,
    });

    // Check if solution is satisfactory
    if (await validateSolution(response.choices[0].message.content)) {
      return response;
    }
  }

  throw new Error("Could not solve with available reasoning levels");
};
```
Reasoning with fallbacks
```javascript
const reasoningWithFallback = async (problem) => {
  try {
    // Try reasoning model first
    return await openai.chat.completions.create({
      model: "openai/o1-mini",
      messages: [{ role: "user", content: problem }],
      reasoning_effort: "medium",
    });
  } catch (error) {
    // Fallback to regular model with detailed prompt
    return await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: `Think step-by-step and solve: ${problem}`,
        },
      ],
    });
  }
};
```
Limitations
- Response time: Reasoning adds significant latency to generation
- Cost: Reasoning tokens are charged at higher rates
- Model availability: Limited to specific reasoning-capable models
- Token limits: Reasoning may hit context limits faster
- Determinism: Reasoning output may vary between requests
Monitoring
Key metrics to track:
```javascript
const reasoningMetrics = {
  avgReasoningTokens: 0,
  avgResponseTime: 0,
  successRate: 0,
  costPerReasoning: 0,
  effortDistribution: { none: 0, minimal: 0, low: 0, medium: 0, high: 0, xhigh: 0 },
};
```
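These tallies can be kept up to date with an incremental running average after each request. The sketch below is illustrative (`recordRequest` and its argument names are not part of any SDK), and it leaves cost tracking out since pricing varies by provider.

```javascript
// Fold one request's stats into the metrics object; returns the new count.
function recordRequest(metrics, count, { reasoningTokens, responseTimeMs, effort, ok }) {
  const n = count + 1;
  // Incremental mean: avg += (x - avg) / n
  metrics.avgReasoningTokens += (reasoningTokens - metrics.avgReasoningTokens) / n;
  metrics.avgResponseTime += (responseTimeMs - metrics.avgResponseTime) / n;
  metrics.successRate += ((ok ? 1 : 0) - metrics.successRate) / n;
  if (effort in metrics.effortDistribution) {
    metrics.effortDistribution[effort] += 1;
  }
  return n;
}
```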