Timeouts

📖

This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Set a maximum request duration to prevent requests from hanging indefinitely.

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize AI trends for 2024" }],
  orq: {
    timeout: {
      call_timeout: 30000, // 30 seconds
    },
  },
});

Configuration

Parameter      Type    Required  Description
call_timeout   number  Yes       Maximum execution time in milliseconds

The timeout applies to:

  • Request processing time
  • Model generation time
  • Network transfer time
  • All fallback attempts (each gets same timeout)

Recommended Values

Use Case              Timeout (ms)   Reason
Chat applications     15000 (15s)    User expectation for responses
Real-time features    5000 (5s)      Immediate feedback required
Batch processing      60000 (60s)    Complex analysis tasks
Streaming responses   30000 (30s)    Longer generation time
Development/testing   10000 (10s)    Fast iteration cycles

Code Examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the latest trends in artificial intelligence for 2024"
      }
    ],
    "orq": {
      "timeout": {
        "call_timeout": 30000
      }
    }
  }'
Python

from openai import OpenAI
import os

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Summarize the latest trends in artificial intelligence for 2024"
        }
    ],
    extra_body={
        "orq": {
            "timeout": {
                "call_timeout": 30000
            }
        }
    }
)
Node.js

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content:
        "Summarize the latest trends in artificial intelligence for 2024",
    },
  ],
  orq: {
    timeout: {
      call_timeout: 30000,
    },
  },
});

Error Handling

try {
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [...],
    orq: {
      timeout: {call_timeout: 15000}
    }
  });
} catch (error) {
  if (error.code === 'TIMEOUT') {
    console.log('Request timed out - try increasing timeout or using faster model');
    // Implement fallback behavior
  }
}
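A common fallback is to retry a timed-out call once with a larger budget before giving up. A minimal sketch, assuming the error surfaces a `TIMEOUT` code as above; `callProxy` is a hypothetical wrapper around the completion request that takes the timeout as an argument, not part of the SDK:

```javascript
// Sketch: retry timed-out calls with progressively larger budgets.
// `callProxy(callTimeout)` is a hypothetical wrapper around
// openai.chat.completions.create that sets orq.timeout.call_timeout.
async function withTimeoutRetry(callProxy, timeouts = [15000, 30000]) {
  let lastError;
  for (const callTimeout of timeouts) {
    try {
      return await callProxy(callTimeout);
    } catch (error) {
      lastError = error;
      if (error.code !== "TIMEOUT") throw error; // only retry timeouts
    }
  }
  throw lastError; // every budget was exhausted
}
```

Non-timeout errors are rethrown immediately so that, for example, authentication failures are not retried with a longer budget.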

Best Practices

Timeout selection:

  • Set based on user experience requirements
  • Consider model complexity and prompt length
  • Factor in network latency (add 2-5s buffer)
  • Test with realistic prompts and data

Environment-specific timeouts:

const timeouts = {
  development: 10000, // Fast feedback during dev
  staging: 20000, // Realistic testing
  production: 30000, // Conservative for reliability
};
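Resolving the value from the current environment at startup keeps the choice in one place. A sketch, restating the map above for self-containment and using `NODE_ENV` as an illustrative source of the environment name:

```javascript
// Pick the call timeout for the current environment,
// defaulting to the conservative production value.
const timeouts = {
  development: 10000,
  staging: 20000,
  production: 30000,
};

function resolveTimeout(env) {
  return timeouts[env] ?? timeouts.production;
}

// Pass the result as orq.timeout.call_timeout:
const callTimeout = resolveTimeout(process.env.NODE_ENV);
```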

Progressive timeouts:

// Start with short timeout, increase for retries
const attempts = [
  { timeout: 10000, model: "fast-model" },
  { timeout: 20000, model: "standard-model" },
  { timeout: 30000, model: "comprehensive-model" },
];
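A small loop can walk that attempt list, escalating model and budget together. A sketch, where `makeRequest(model, timeout)` is a hypothetical function that issues one proxy call with the given `call_timeout`:

```javascript
// Sketch: try each attempt in order; a failure falls through
// to the next (slower but more capable) model and larger budget.
async function tryProgressively(attempts, makeRequest) {
  let lastError;
  for (const { timeout, model } of attempts) {
    try {
      return await makeRequest(model, timeout);
    } catch (error) {
      lastError = error; // escalate to the next attempt
    }
  }
  throw lastError; // all attempts failed
}
```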

Fallback Integration

Timeouts work seamlessly with fallbacks:

{
  orq: {
    timeout: {call_timeout: 15000},  // Applied to each attempt
    fallbacks: [
      {model: "openai/gpt-4o"},      // Gets 15s timeout
      {model: "openai/gpt-3.5-turbo"} // Also gets 15s timeout
    ]
  }
}

Total possible time: timeout × (1 + fallback_count)

  • Primary + 2 fallbacks with 15s timeout = up to 45s total
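That worst case is worth computing up front when budgeting end-to-end latency:

```javascript
// Worst-case wall-clock time when every attempt runs to its timeout.
function maxTotalTime(callTimeout, fallbackCount) {
  return callTimeout * (1 + fallbackCount);
}

// Primary + 2 fallbacks at 15s each:
// maxTotalTime(15000, 2) === 45000
```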

Troubleshooting

Frequent timeouts

  • Increase timeout value
  • Use faster models (gpt-3.5-turbo vs gpt-4)
  • Reduce prompt complexity/length
  • Check provider status for slowdowns

User experience issues

  • Set timeout based on user expectations
  • Show loading states for longer operations
  • Implement progressive enhancement
  • Consider async processing for long tasks

Performance optimization

// Monitor timeout patterns
const timeoutMetrics = {
  averageResponseTime: 0,
  timeoutRate: 0,
  responseTimesByModel: {},
  optimalTimeout: 0, // 95th percentile + buffer
};

Advanced Patterns

Dynamic timeout adjustment:

const getDynamicTimeout = (promptLength, modelComplexity) => {
  const baseTimeout = 10000;
  const promptFactor = Math.min(Math.max(promptLength / 1000, 1), 3); // Clamp between 1x and 3x
  const modelFactor = modelComplexity === "simple" ? 1 : 2;

  return baseTimeout * promptFactor * modelFactor;
};

Timeout with streaming:

{
  stream: true,
  orq: {
    timeout: {
      call_timeout: 30000  // Longer timeout for streaming
    }
  }
}
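The proxy enforces `call_timeout` server-side; a client-side deadline on the consuming loop can complement it, since a stream can stall mid-generation. A sketch, assuming the SDK returns an async iterable of chunks when `stream: true` is set (the chunk shape here is simplified to plain strings):

```javascript
// Sketch: consume a stream but abandon it once an overall
// deadline has passed, even if chunks are still trickling in.
async function collectWithDeadline(stream, deadlineMs) {
  const deadline = Date.now() + deadlineMs;
  const parts = [];
  for await (const chunk of stream) {
    if (Date.now() > deadline) {
      throw new Error("stream exceeded client-side deadline");
    }
    parts.push(chunk);
  }
  return parts.join("");
}
```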

Circuit breaker pattern:

class CircuitBreaker {
  constructor(timeout, failureThreshold = 5) {
    this.timeout = timeout;
    this.failureCount = 0;
    this.failureThreshold = failureThreshold;
    this.state = "CLOSED"; // CLOSED, OPEN, HALF_OPEN
  }

  async call(requestFn) {
    if (this.state === "OPEN") {
      throw new Error("Circuit breaker is OPEN");
    }

    try {
      const result = await requestFn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    // Any success clears the failure count and closes the circuit
    this.failureCount = 0;
    this.state = "CLOSED";
  }

  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.failureThreshold) {
      this.state = "OPEN"; // stop sending requests until reset
    }
  }
}

Limitations

  • Fixed timeout: Same timeout applies to all requests
  • No granular control: Cannot set different timeouts for different operations
  • Fallback multiplication: Each fallback gets the same timeout duration
  • Provider variations: Different providers have different baseline response times
  • Streaming considerations: Streaming responses may need longer timeouts

Monitoring

Key metrics to track:

  • Timeout rate: % of requests that timeout
  • Average response time: Baseline performance
  • 95th percentile: For setting optimal timeouts
  • Timeout impact: User experience degradation
  • Model performance: Response times by model
// Example monitoring
const metrics = {
  totalRequests: 0,
  timeouts: 0,
  responseTimes: [],
  recommendedTimeout: 0,
};

const calculatePercentile = (values, p) => {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil((p / 100) * sorted.length) - 1] ?? 0;
};

const monitorTimeouts = (responseTime, wasTimeout) => {
  metrics.totalRequests++;
  if (wasTimeout) {
    metrics.timeouts++;
  } else {
    metrics.responseTimes.push(responseTime);
  }

  // Calculate optimal timeout (95th percentile + 5s buffer)
  const p95 = calculatePercentile(metrics.responseTimes, 95);
  metrics.recommendedTimeout = p95 + 5000;
};