Timeouts

📖

This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Set a maximum request duration to prevent requests from hanging indefinitely.

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize AI trends for 2024" }],
  orq: {
    timeout: {
      call_timeout: 30000, // 30 seconds
    },
  },
});

Configuration

Parameter      Type    Required  Description
call_timeout   number  Yes       Maximum execution time in milliseconds

The timeout applies to:

  • Request processing time
  • Model generation time
  • Network transfer time
  • All fallback attempts (each gets same timeout)

Recommended Values

Use Case              Timeout (ms)   Reason
Chat applications     15000 (15s)    User expectation for responses
Real-time features    5000 (5s)      Immediate feedback required
Batch processing      60000 (60s)    Complex analysis tasks
Streaming responses   30000 (30s)    Longer generation time
Development/testing   10000 (10s)    Fast iteration cycles

Code Examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the latest trends in artificial intelligence for 2024"
      }
    ],
    "orq": {
      "timeout": {
        "call_timeout": 30000
      }
    }
  }'
Python

from openai import OpenAI
import os

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Summarize the latest trends in artificial intelligence for 2024"
        }
    ],
    extra_body={
        "orq": {
            "timeout": {
                "call_timeout": 30000
            }
        }
    }
)
Node.js

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content:
        "Summarize the latest trends in artificial intelligence for 2024",
    },
  ],
  orq: {
    timeout: {
      call_timeout: 30000,
    },
  },
});

Error Handling

try {
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [...],
    orq: {
      timeout: {call_timeout: 15000}
    }
  });
} catch (error) {
  if (error.code === 'TIMEOUT') {
    console.log('Request timed out - try increasing timeout or using faster model');
    // Implement fallback behavior
  }
}
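A common fallback is to retry a timed-out call once with a larger budget before giving up. A minimal sketch, assuming the error surfaces a `TIMEOUT` code as above; `callProxy` is a hypothetical wrapper around the completion request that takes the timeout as an argument, not part of the SDK:

```javascript
// Sketch: retry timed-out calls with progressively larger budgets.
// `callProxy(callTimeout)` is a hypothetical wrapper around
// openai.chat.completions.create that sets orq.timeout.call_timeout.
async function withTimeoutRetry(callProxy, timeouts = [15000, 30000]) {
  let lastError;
  for (const callTimeout of timeouts) {
    try {
      return await callProxy(callTimeout);
    } catch (error) {
      lastError = error;
      if (error.code !== "TIMEOUT") throw error; // only retry timeouts
    }
  }
  throw lastError; // every budget was exhausted
}
```

Non-timeout errors are rethrown immediately so that, for example, authentication failures are not retried with a longer budget.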

Best Practices

Timeout selection:

  • Set based on user experience requirements
  • Consider model complexity and prompt length
  • Factor in network latency (add 2-5s buffer)
  • Test with realistic prompts and data

Environment-specific timeouts:

const timeouts = {
  development: 10000, // Fast feedback during dev
  staging: 20000, // Realistic testing
  production: 30000, // Conservative for reliability
};
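Resolving the value from the current environment at startup keeps the choice in one place. A sketch, restating the map above for self-containment and using `NODE_ENV` as an illustrative source of the environment name:

```javascript
// Pick the call timeout for the current environment,
// defaulting to the conservative production value.
const timeouts = {
  development: 10000,
  staging: 20000,
  production: 30000,
};

function resolveTimeout(env) {
  return timeouts[env] ?? timeouts.production;
}

// Pass the result as orq.timeout.call_timeout:
const callTimeout = resolveTimeout(process.env.NODE_ENV);
```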

Progressive timeouts:

// Start with short timeout, increase for retries
const attempts = [
  { timeout: 10000, model: "fast-model" },
  { timeout: 20000, model: "standard-model" },
  { timeout: 30000, model: "comprehensive-model" },
];
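A small loop can walk that attempt list, escalating model and budget together. A sketch, where `makeRequest(model, timeout)` is a hypothetical function that issues one proxy call with the given `call_timeout`:

```javascript
// Sketch: try each attempt in order; a failure falls through
// to the next (slower but more capable) model and larger budget.
async function tryProgressively(attempts, makeRequest) {
  let lastError;
  for (const { timeout, model } of attempts) {
    try {
      return await makeRequest(model, timeout);
    } catch (error) {
      lastError = error; // escalate to the next attempt
    }
  }
  throw lastError; // all attempts failed
}
```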

Fallback Integration

Timeouts work seamlessly with fallbacks:

{
  orq: {
    timeout: {call_timeout: 15000},  // Applied to each attempt
    fallbacks: [
      {model: "openai/gpt-4o"},      // Gets 15s timeout
      {model: "openai/gpt-3.5-turbo"} // Also gets 15s timeout
    ]
  }
}

Total possible time: timeout × (1 + fallback_count)

  • Primary + 2 fallbacks with 15s timeout = up to 45s total
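That worst case is worth computing up front when budgeting end-to-end latency:

```javascript
// Worst-case wall-clock time when every attempt runs to its timeout.
function maxTotalTime(callTimeout, fallbackCount) {
  return callTimeout * (1 + fallbackCount);
}

// Primary + 2 fallbacks at 15s each:
// maxTotalTime(15000, 2) === 45000
```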

Troubleshooting

Frequent timeouts

  • Increase timeout value
  • Use faster models (gpt-3.5-turbo vs gpt-4)
  • Reduce prompt complexity/length
  • Check provider status for slowdowns

User experience issues

  • Set timeout based on user expectations
  • Show loading states for longer operations
  • Implement progressive enhancement
  • Consider async processing for long tasks

Performance optimization

// Monitor timeout patterns
const timeoutMetrics = {
  averageResponseTime: 0,
  timeoutRate: 0,
  responseTimesByModel: {},
  optimalTimeout: 0, // 95th percentile + buffer
};

Advanced Patterns

Dynamic timeout adjustment:

const getDynamicTimeout = (promptLength, modelComplexity) => {
  const baseTimeout = 10000;
  const promptFactor = Math.min(Math.max(promptLength / 1000, 1), 3); // Clamp between 1x and 3x
  const modelFactor = modelComplexity === "simple" ? 1 : 2;

  return baseTimeout * promptFactor * modelFactor;
};

Timeout with streaming:

{
  stream: true,
  orq: {
    timeout: {
      call_timeout: 30000  // Longer timeout for streaming
    }
  }
}
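The proxy enforces `call_timeout` server-side; a client-side deadline on the consuming loop can complement it, since a stream can stall mid-generation. A sketch, assuming the SDK returns an async iterable of chunks when `stream: true` is set (the chunk shape here is simplified to plain strings):

```javascript
// Sketch: consume a stream but abandon it once an overall
// deadline has passed, even if chunks are still trickling in.
async function collectWithDeadline(stream, deadlineMs) {
  const deadline = Date.now() + deadlineMs;
  const parts = [];
  for await (const chunk of stream) {
    if (Date.now() > deadline) {
      throw new Error("stream exceeded client-side deadline");
    }
    parts.push(chunk);
  }
  return parts.join("");
}
```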

Circuit breaker pattern:

class CircuitBreaker {
  constructor(timeout, failureThreshold = 5) {
    this.timeout = timeout;
    this.failureCount = 0;
    this.failureThreshold = failureThreshold;
    this.state = "CLOSED"; // CLOSED, OPEN, HALF_OPEN
  }

  async call(requestFn) {
    if (this.state === "OPEN") {
      throw new Error("Circuit breaker is OPEN");
    }

    try {
      const result = await requestFn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    // Any success clears the failure count and closes the circuit
    this.failureCount = 0;
    this.state = "CLOSED";
  }

  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.failureThreshold) {
      this.state = "OPEN"; // stop sending requests until reset
    }
  }
}

Limitations

  • Fixed timeout: Same timeout applies to all requests
  • No granular control: Cannot set different timeouts for different operations
  • Fallback multiplication: Each fallback gets the same timeout duration
  • Provider variations: Different providers have different baseline response times
  • Streaming considerations: Streaming responses may need longer timeouts

Monitoring

Key metrics to track:

  • Timeout rate: % of requests that timeout
  • Average response time: Baseline performance
  • 95th percentile: For setting optimal timeouts
  • Timeout impact: User experience degradation
  • Model performance: Response times by model
// Example monitoring
const metrics = {
  totalRequests: 0,
  timeouts: 0,
  responseTimes: [],
  recommendedTimeout: 0,
};

const calculatePercentile = (values, p) => {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil((p / 100) * sorted.length) - 1] ?? 0;
};

const monitorTimeouts = (responseTime, wasTimeout) => {
  metrics.totalRequests++;
  if (wasTimeout) {
    metrics.timeouts++;
  } else {
    metrics.responseTimes.push(responseTime);
  }

  // Calculate optimal timeout (95th percentile + 5s buffer)
  const p95 = calculatePercentile(metrics.responseTimes, 95);
  metrics.recommendedTimeout = p95 + 5000;
};