Use Cases
  • Preventing slow models from blocking user-facing requests indefinitely.
  • Setting different limits for interactive (short) vs. batch (long) workloads.
  • Triggering fallback logic when a provider exceeds an acceptable wait time.
  • Enforcing response-time SLAs on latency-sensitive features.

Quick Start

Set a maximum request duration to prevent hanging requests:
curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "input": "Summarize AI trends for 2024",
    "timeout": {"call_timeout": 30000}
  }'

Configuration

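| Parameter    | Type   | Required | Description                            |
|--------------|--------|----------|----------------------------------------|
| call_timeout | number | Yes      | Maximum execution time in milliseconds |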
Timeout applies to:
  • Request processing time.
  • Model generation time.
  • Network transfer time.
  • All fallback attempts (each attempt gets the same timeout).
| Use Case            | Timeout (ms) | Reason                         |
|---------------------|--------------|--------------------------------|
| Chat applications   | 15000 (15s)  | User expectation for responses |
| Real-time features  | 5000 (5s)    | Immediate feedback required    |
| Batch processing    | 60000 (60s)  | Complex analysis tasks         |
| Streaming responses | 30000 (30s)  | Longer generation time         |
| Development/testing | 10000 (10s)  | Fast iteration cycles          |
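
The recommended values listed above could be kept in one place and applied when a request body is built. The following is only a sketch: `UseCase`, `RECOMMENDED_TIMEOUT_MS`, `RouterRequestBody`, and `buildRequestBody` are hypothetical names introduced for illustration, while the `timeout.call_timeout` field and the millisecond values come from this page.

```typescript
// Hypothetical helper names; only `timeout.call_timeout` and the values come from this page.
type UseCase = "chat" | "realtime" | "batch" | "streaming" | "development";

const RECOMMENDED_TIMEOUT_MS: Record<UseCase, number> = {
  chat: 15000,
  realtime: 5000,
  batch: 60000,
  streaming: 30000,
  development: 10000,
};

// Minimal shape of the request body used in the curl examples on this page.
interface RouterRequestBody {
  model: string;
  input: string;
  timeout: { call_timeout: number };
}

// Build a request body whose timeout matches the given use case.
function buildRequestBody(model: string, input: string, useCase: UseCase): RouterRequestBody {
  return { model, input, timeout: { call_timeout: RECOMMENDED_TIMEOUT_MS[useCase] } };
}

const body = buildRequestBody("openai/gpt-4o-mini", "Summarize AI trends for 2024", "chat");
console.log(JSON.stringify(body));
```

Keeping the mapping in a single constant makes it easier to adjust values after testing with realistic prompts.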

Code Examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "input": "Summarize the latest trends in artificial intelligence for 2024",
    "timeout": {"call_timeout": 30000}
  }'

Error Handling

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

try {
  const response = await client.responses.create({
    model: "openai/gpt-4o",
    input: "Explain quantum computing",
    timeout: { call_timeout: 15000 },
  });
  console.log(response.output_text);
} catch (error) {
  if (error instanceof OpenAI.APIConnectionTimeoutError) {
    console.log("Request timed out - try increasing the timeout or using a faster model");
  } else {
    throw error; // Do not swallow errors unrelated to timeouts
  }
}

Best Practices

Timeout selection:
  • Set based on user experience requirements.
  • Consider model complexity and prompt length.
  • Factor in network latency (add 2-5s buffer).
  • Test with realistic prompts and data.
Environment-specific timeouts:
const timeouts = {
  development: 10000, // Fast feedback during dev
  staging: 20000, // Realistic testing
  production: 30000, // Conservative for reliability
};
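
One way to pick a value from such a map at runtime is sketched below. The helper names `isEnvironment` and `timeoutForEnv` are hypothetical, and falling back to the production value for unknown environment names is an assumption made here, not documented behavior.

```typescript
const timeouts = { development: 10000, staging: 20000, production: 30000 } as const;
type Environment = keyof typeof timeouts;

// Type guard: true only for keys defined directly on the map.
function isEnvironment(name: string): name is Environment {
  return Object.prototype.hasOwnProperty.call(timeouts, name);
}

// Assumption: unknown or missing environment names use the conservative production value.
function timeoutForEnv(name: string | undefined): number {
  return name !== undefined && isEnvironment(name) ? timeouts[name] : timeouts.production;
}

console.log(timeoutForEnv("staging"));
```

In a Node.js service this might be called as `timeoutForEnv(process.env.NODE_ENV)`.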
Progressive timeouts:
// Start with short timeout, increase for retries
const attempts = [
  { timeout: 10000, model: "fast-model" },
  { timeout: 20000, model: "standard-model" },
  { timeout: 30000, model: "comprehensive-model" },
];
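
A list like this could drive a simple retry loop. The sketch below assumes a hypothetical `send` function that issues one request with the given model and timeout and rejects when the attempt fails or times out; `runWithProgressiveTimeouts` is likewise an illustrative name, not part of any SDK.

```typescript
interface Attempt {
  timeout: number;
  model: string;
}

// Placeholder model names taken from the list above.
const attempts: Attempt[] = [
  { timeout: 10000, model: "fast-model" },
  { timeout: 20000, model: "standard-model" },
  { timeout: 30000, model: "comprehensive-model" },
];

// Hypothetical: issues one request and rejects on failure or timeout.
type Send<T> = (attempt: Attempt) => Promise<T>;

// Try each attempt in order and return the first successful result.
async function runWithProgressiveTimeouts<T>(list: Attempt[], send: Send<T>): Promise<T> {
  let lastError: unknown = new Error("No attempts configured");
  for (const attempt of list) {
    try {
      return await send(attempt);
    } catch (error) {
      lastError = error; // remember the failure and move on to the next, longer attempt
    }
  }
  throw lastError; // every attempt failed
}
```

Because each failed attempt consumes its full timeout before the next one starts, the worst case for this list is the sum of all three timeouts.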

Fallback Integration

When fallbacks are configured, the timeout applies to each attempt separately:
{
  "timeout": { "call_timeout": 15000 },
  "fallbacks": [
    { "model": "openai/gpt-4o" },
    { "model": "openai/gpt-3.5-turbo" }
  ]
}
Total possible time: timeout × (1 + fallback_count)
  • Primary + 2 fallbacks with 15s timeout = up to 45s total.
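
The arithmetic above can be expressed as a small helper, and inverted to derive a per-attempt timeout from an overall latency budget. Both function names are hypothetical, and the budget-splitting idea is an illustration rather than a documented feature.

```typescript
// Worst-case total time when every attempt runs to its full timeout.
function worstCaseTotalMs(callTimeoutMs: number, fallbackCount: number): number {
  return callTimeoutMs * (1 + fallbackCount);
}

// Inverse: the largest per-attempt timeout that keeps the worst case within a total budget.
function perAttemptTimeoutMs(totalBudgetMs: number, fallbackCount: number): number {
  return Math.floor(totalBudgetMs / (1 + fallbackCount));
}

console.log(worstCaseTotalMs(15000, 2)); // 45000
console.log(perAttemptTimeoutMs(30000, 2)); // 10000
```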

Troubleshooting

Frequent timeouts
  • Increase timeout value.
  • Use faster models (gpt-3.5-turbo vs gpt-4).
  • Reduce prompt complexity/length.
  • Check provider status for slowdowns.
User experience issues
  • Set timeout based on user expectations.
  • Show loading states for longer operations.
  • Implement progressive enhancement.
  • Consider async processing for long tasks.
Performance optimization
// Monitor timeout patterns
const timeoutMetrics = {
  averageResponseTime: 0,
  timeoutRate: 0,
  responseTimesByModel: {},
  optimalTimeout: 0, // 95th percentile + buffer
};

Advanced Patterns

Dynamic timeout adjustment:
const getDynamicTimeout = (promptLength, modelComplexity) => {
  const baseTimeout = 10000;
  // Clamp to 1x-3x so short prompts never drop below the base timeout
  const promptFactor = Math.min(Math.max(promptLength / 1000, 1), 3);
  const modelFactor = modelComplexity === "simple" ? 1 : 2;

  return baseTimeout * promptFactor * modelFactor;
};
Timeout with streaming:
{
  "stream": true,
  "timeout": {
    "call_timeout": 30000
  }
}
Circuit breaker pattern:
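class CircuitBreaker {
  timeout: number; // Assumed here to be the cool-down (ms) before a trial request is allowed
  failureCount: number;
  failureThreshold: number;
  state: "CLOSED" | "OPEN" | "HALF_OPEN";
  openedAt: number; // Timestamp (ms) of the moment the circuit last opened

  constructor(timeout: number, failureThreshold = 5) {
    this.timeout = timeout;
    this.failureCount = 0;
    this.failureThreshold = failureThreshold;
    this.state = "CLOSED";
    this.openedAt = 0;
  }

  async call<T>(requestFn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (Date.now() - this.openedAt < this.timeout) {
        throw new Error("Circuit breaker is OPEN");
      }
      this.state = "HALF_OPEN"; // Cool-down elapsed: let one trial request through
    }

    try {
      const result = await requestFn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  // A successful call closes the circuit and resets the failure count
  onSuccess() {
    this.failureCount = 0;
    this.state = "CLOSED";
  }

  // Open the circuit after a failed trial request or once the threshold is reached
  onFailure() {
    this.failureCount++;
    if (this.state === "HALF_OPEN" || this.failureCount >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = Date.now();
    }
  }
}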

Limitations

  • Fixed timeout: Same timeout applies to all requests.
  • No granular control: Cannot set different timeouts for different operations.
  • Fallback multiplication: Each fallback gets the same timeout duration.
  • Provider variations: Different providers have different baseline response times.
  • Streaming considerations: Streaming responses may need longer timeouts.

Monitoring

Key metrics to track:
  • Timeout rate: Percentage of requests that time out.
  • Average response time: Baseline performance.
  • 95th percentile: For setting optimal timeouts.
  • Timeout impact: User experience degradation.
  • Model performance: Response times by model.
// Example monitoring
const metrics = {
  totalRequests: 0,
  timeouts: 0,
  responseTimes: [] as number[],
  recommendedTimeout: 0,
};

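const calculatePercentile = (arr: number[], p: number): number => {
  const sorted = [...arr].sort((a, b) => a - b);
  // Clamp the index so p = 100 returns the maximum instead of falling off the end
  const index = Math.min(Math.floor((p / 100) * sorted.length), sorted.length - 1);
  return sorted[index] ?? 0; // Empty input yields 0
};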

const monitorTimeouts = (responseTime: number, wasTimeout: boolean) => {
  metrics.totalRequests++;
  if (wasTimeout) {
    metrics.timeouts++;
  } else {
    metrics.responseTimes.push(responseTime);
  }

  // Calculate optimal timeout (95th percentile + 5s buffer)
  const p95 = calculatePercentile(metrics.responseTimes, 95);
  metrics.recommendedTimeout = p95 + 5000;
};
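
The timeout rate listed among the key metrics can be derived from the same two counters. The sketch below is illustrative only: `timeoutRatePercent` and `shouldAlert` are hypothetical helpers, and the 5% alert threshold is an arbitrary assumption rather than a recommendation from this page.

```typescript
// Percentage of requests that timed out; 0 when no requests have been recorded.
function timeoutRatePercent(totalRequests: number, timeouts: number): number {
  if (totalRequests === 0) return 0;
  return (timeouts * 100) / totalRequests;
}

// Assumption: alert once more than 5% of requests time out.
function shouldAlert(totalRequests: number, timeouts: number, thresholdPercent = 5): boolean {
  return timeoutRatePercent(totalRequests, timeouts) > thresholdPercent;
}

console.log(timeoutRatePercent(200, 7)); // 3.5
console.log(shouldAlert(200, 7)); // false
```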