Request timeouts - Orq.ai Documentation

Use Cases

Preventing slow models from blocking user-facing requests indefinitely.
Setting different limits for interactive (short) vs. batch (long) workloads.
Triggering fallback logic when a provider exceeds an acceptable wait time.
Enforcing response-time SLAs on latency-sensitive features.

Quick Start

Set a maximum request duration with `timeout.call_timeout` (in milliseconds) to prevent hanging requests.

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "input": "Summarize AI trends for 2024",
    "timeout": {"call_timeout": 30000}
  }'

Configuration

Parameter	Type	Required	Description
`call_timeout`	number	Yes	Maximum execution time in milliseconds
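
As a minimal sketch, the parameter could be attached to a request body in TypeScript as shown below. The `model`, `input`, and `timeout.call_timeout` fields come from the Quick Start request; the `buildRequestBody` helper and its validation rule (a positive, finite number of milliseconds) are assumptions made for illustration, not part of the documented API.

```typescript
// Shape of the request body used in the Quick Start example.
interface RouterRequestBody {
  model: string;
  input: string;
  timeout: { call_timeout: number };
}

// Hypothetical helper (not part of any SDK): rejects values that cannot be
// a valid duration before the request is sent.
function buildRequestBody(
  model: string,
  input: string,
  callTimeoutMs: number,
): RouterRequestBody {
  if (!Number.isFinite(callTimeoutMs) || callTimeoutMs <= 0) {
    throw new RangeError("call_timeout must be a positive number of milliseconds");
  }
  return { model, input, timeout: { call_timeout: callTimeoutMs } };
}

const quickStartBody = buildRequestBody(
  "openai/gpt-4o-mini",
  "Summarize AI trends for 2024",
  30000,
);
console.log(JSON.stringify(quickStartBody));
```

Validating on the client keeps obviously wrong values (zero, negative, `NaN`) from reaching the API, where the resulting behavior is not described here.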

Timeout applies to:

Request processing time.
Model generation time.
Network transfer time.
All fallback attempts (each gets the same timeout).

Recommended Values

Use Case	Timeout (ms)	Reason
Chat applications	`15000` (15s)	User expectation for responses
Real-time features	`5000` (5s)	Immediate feedback required
Batch processing	`60000` (60s)	Complex analysis tasks
Streaming responses	`30000` (30s)	Longer generation time
Development/testing	`10000` (10s)	Fast iteration cycles
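
One way to keep these values in a single place is a small lookup, sketched here in TypeScript. The numbers are copied from the table above; the key names and the `timeoutFor` helper are hypothetical and chosen only for this example.

```typescript
// Recommended values from the table above, in milliseconds.
const RECOMMENDED_TIMEOUTS_MS = {
  chat: 15000,
  realtime: 5000,
  batch: 60000,
  streaming: 30000,
  development: 10000,
} as const;

type UseCase = keyof typeof RECOMMENDED_TIMEOUTS_MS;

// Hypothetical helper: wraps the recommended value in the `timeout` object
// expected by the request body.
function timeoutFor(useCase: UseCase): { call_timeout: number } {
  return { call_timeout: RECOMMENDED_TIMEOUTS_MS[useCase] };
}

console.log(timeoutFor("chat")); // { call_timeout: 15000 }
```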

Code examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "input": "Summarize the latest trends in artificial intelligence for 2024",
    "timeout": {"call_timeout": 30000}
  }'

Error Handling

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

try {
  const response = await client.responses.create({
    model: "openai/gpt-4o",
    input: "Explain quantum computing",
    timeout: { call_timeout: 15000 },
  });
  console.log(response.output_text);
} catch (error) {
  if (error instanceof OpenAI.APIConnectionTimeoutError) {
    console.log("Request timed out - try increasing the timeout or using a faster model");
  } else {
    throw error; // Re-throw non-timeout errors instead of silently swallowing them
  }
}

Best Practices

Timeout selection:

Set based on user experience requirements.
Consider model complexity and prompt length.
Factor in network latency (add 2-5s buffer).
Test with realistic prompts and data.

Environment-specific timeouts:

const timeouts = {
  development: 10000, // Fast feedback during dev
  staging: 20000, // Realistic testing
  production: 30000, // Conservative for reliability
};
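
A sketch of how such a map might be consulted at startup follows. The helper name `timeoutForEnv` and the choice to fall back to the production value for unknown environment names are assumptions, not documented behavior.

```typescript
// Same values as the map above, in milliseconds.
const envTimeouts = {
  development: 10000,
  staging: 20000,
  production: 30000,
} as const;

type EnvName = keyof typeof envTimeouts;

// Type guard: true only for the three keys defined above
// (inherited properties such as "toString" are rejected).
function isEnvName(value: string): value is EnvName {
  return Object.prototype.hasOwnProperty.call(envTimeouts, value);
}

// Hypothetical helper: unknown or missing names get the production value.
function timeoutForEnv(env: string | undefined): number {
  return env !== undefined && isEnvName(env)
    ? envTimeouts[env]
    : envTimeouts.production;
}

console.log(timeoutForEnv("staging")); // 20000
```

In a Node.js service this would typically be called as `timeoutForEnv(process.env.NODE_ENV)`. Defaulting to the most conservative value avoids a misnamed environment silently running with a short timeout.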

Progressive timeouts:

// Start with short timeout, increase for retries
const attempts = [
  { timeout: 10000, model: "fast-model" },
  { timeout: 20000, model: "standard-model" },
  { timeout: 30000, model: "comprehensive-model" },
];
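
The list above only declares the attempts. A minimal sketch of a loop that walks through them could look like the following. The `send` callback is a hypothetical stand-in for whatever function issues the request and rejects on timeout. Note that this sketch moves to the next attempt on any error, not only on timeouts.

```typescript
interface Attempt {
  timeout: number; // call_timeout for this attempt, in milliseconds
  model: string;
}

// Hypothetical transport function: sends one request and rejects on
// timeout or any other failure.
type SendFn<T> = (attempt: Attempt) => Promise<T>;

// Tries each attempt in order and returns the first successful result.
// If every attempt fails, the last error is rethrown.
async function runProgressive<T>(
  attempts: Attempt[],
  send: SendFn<T>,
): Promise<T> {
  let lastError: unknown = new Error("No attempts configured");
  for (const attempt of attempts) {
    try {
      return await send(attempt);
    } catch (error) {
      lastError = error; // Remember the failure and try the next attempt
    }
  }
  throw lastError;
}
```

Because each attempt carries its own timeout, the worst-case wait is the sum of all configured timeouts (60 s for the three attempts listed above).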

Fallback Integration

Timeouts combine with fallbacks. Each attempt, primary or fallback, gets the same `call_timeout`:

{
  "timeout": { "call_timeout": 15000 },
  "fallbacks": [
    { "model": "openai/gpt-4o" },
    { "model": "openai/gpt-5-mini" }
  ]
}

Total possible time: timeout × (1 + fallback_count)

Primary + 2 fallbacks with 15s timeout = up to 45s total.
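
The same arithmetic as a small TypeScript helper, useful when checking a configuration against an end-to-end latency budget. The function name `worstCaseTotalMs` is hypothetical.

```typescript
// Worst case: the primary attempt and every fallback each run for the
// full timeout before giving up.
function worstCaseTotalMs(callTimeoutMs: number, fallbackCount: number): number {
  return callTimeoutMs * (1 + fallbackCount);
}

console.log(worstCaseTotalMs(15000, 2)); // 45000
```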

Troubleshooting

Frequent timeouts

Increase timeout value.
Use faster models (gpt-5-mini vs gpt-5).
Reduce prompt complexity/length.
Check provider status for slowdowns.

User experience issues

Set timeout based on user expectations.
Show loading states for longer operations.
Implement progressive enhancement.
Consider async processing for long tasks.

Performance optimization

// Monitor timeout patterns
const timeoutMetrics = {
  averageResponseTime: 0,
  timeoutRate: 0,
  responseTimesByModel: {},
  optimalTimeout: 0, // 95th percentile + buffer
};

Advanced Patterns

Dynamic timeout adjustment:

const getDynamicTimeout = (promptLength, modelComplexity) => {
  const baseTimeout = 10000; // Floor for every request, in ms
  // Clamp to [1, 3]: short prompts keep the base timeout, long prompts get at most 3x
  const promptFactor = Math.min(Math.max(promptLength / 1000, 1), 3);
  const modelFactor = modelComplexity === "simple" ? 1 : 2;

  return baseTimeout * promptFactor * modelFactor;
};

Timeout with streaming:

{
  "stream": true,
  "timeout": {
    "call_timeout": 30000
  }
}

Circuit breaker pattern:

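class CircuitBreaker {
  timeout: number; // Assumed here to be the cool-down (ms) before a trial request is allowed
  failureCount: number;
  failureThreshold: number;
  state: "CLOSED" | "OPEN" | "HALF_OPEN";
  openedAt = 0; // Timestamp (ms) of the most recent trip to OPEN

  constructor(timeout: number, failureThreshold = 5) {
    this.timeout = timeout;
    this.failureCount = 0;
    this.failureThreshold = failureThreshold;
    this.state = "CLOSED";
  }

  async call<T>(requestFn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (Date.now() - this.openedAt < this.timeout) {
        throw new Error("Circuit breaker is OPEN");
      }
      this.state = "HALF_OPEN"; // Cool-down elapsed: let a trial request through
    }

    try {
      const result = await requestFn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  // A successful call closes the circuit and clears the failure count
  onSuccess() {
    this.failureCount = 0;
    this.state = "CLOSED";
  }

  // Trip the circuit after a failed trial request or too many failures
  onFailure() {
    this.failureCount++;
    if (this.state === "HALF_OPEN" || this.failureCount >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = Date.now();
    }
  }
}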

Limitations

Fixed timeout: Same timeout applies to all requests.
No granular control: Cannot set different timeouts for different operations.
Fallback multiplication: Each fallback gets the same timeout duration.
Provider variations: Different providers have different baseline response times.
Streaming considerations: Streaming responses may need longer timeouts.

Monitoring

Key metrics to track:

Timeout rate: % of requests that time out.
Average response time: Baseline performance.
95th percentile: For setting optimal timeouts.
Timeout impact: User experience degradation.
Model performance: Response times by model.

// Example monitoring
const metrics = {
  totalRequests: 0,
  timeouts: 0,
  responseTimes: [] as number[],
  recommendedTimeout: 0,
};

const calculatePercentile = (arr: number[], p: number): number => {
  const sorted = [...arr].sort((a, b) => a - b);
  return sorted[Math.floor((p / 100) * sorted.length)] ?? 0;
};

const monitorTimeouts = (responseTime: number, wasTimeout: boolean) => {
  metrics.totalRequests++;
  if (wasTimeout) {
    metrics.timeouts++;
  } else {
    metrics.responseTimes.push(responseTime);
  }

  // Calculate optimal timeout (95th percentile + 5s buffer)
  const p95 = calculatePercentile(metrics.responseTimes, 95);
  metrics.recommendedTimeout = p95 + 5000;
};