Orq.ai Documentation - AI Gateway & LLM Collaboration Platform

LLM Retries & Error Handling | Auto Recovery

Quick Start

Automatically retry failed requests with exponential backoff.

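import OpenAI from "openai";

// Assumption: the OpenAI SDK pointed at the Orq.ai router (base URL taken from the curl example)
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/router",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  retry: {
    count: 3, // Max retry attempts (1-5)
    on_codes: [429, 500, 502, 503, 504], // Status codes that trigger a retry
  },
});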

Configuration

Parameter	Type	Required	Description
`count`	number	Yes	Max retry attempts (1-5)
`on_codes`	number[]	No	HTTP status codes that trigger retries (default: [429])
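
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Orq.ai API: `normalizeRetryConfig` is a hypothetical helper, and restricting `on_codes` to the 400-599 range is an assumption made here for illustration.

```javascript
// Hypothetical helper: validates a retry config against the documented limits.
function normalizeRetryConfig(retry) {
  // `count` is required and documented as 1-5.
  if (!Number.isInteger(retry.count) || retry.count < 1 || retry.count > 5) {
    throw new RangeError("retry.count must be an integer between 1 and 5");
  }
  // `on_codes` is optional; the documented default is [429].
  const onCodes = retry.on_codes ?? [429];
  // Assumption: only HTTP error statuses (400-599) make sense as retry triggers.
  const invalid = onCodes.filter(
    (code) => !Number.isInteger(code) || code < 400 || code > 599,
  );
  if (invalid.length > 0) {
    throw new RangeError(`on_codes contains invalid status codes: ${invalid.join(", ")}`);
  }
  // Remove duplicates while preserving order.
  return { count: retry.count, on_codes: [...new Set(onCodes)] };
}

console.log(normalizeRetryConfig({ count: 3 }));
```

Validating early turns a misconfigured `count` into a clear local error instead of a rejected request.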

Error Codes

Code	Meaning	Retry?	Common Cause
`429`	Rate limit exceeded	✅ Yes	Too many requests
`500`	Internal server error	✅ Yes	Provider issue
`501`	Not implemented	❌ No	Feature unavailable
`502`	Bad gateway	✅ Yes	Network/Gateway issue
`503`	Service unavailable	✅ Yes	Provider maintenance
`504`	Gateway timeout	✅ Yes	Provider overload
`400`	Bad request	❌ No	Invalid parameters
`401`	Unauthorized	❌ No	Invalid API key
`403`	Forbidden	❌ No	Access denied

Retry Strategies

// Conservative (production)
retry: {
  count: 2,
  on_codes: [429, 503]  // Only rate limits and service unavailable
}

// Balanced (recommended)
retry: {
  count: 3,
  on_codes: [429, 500, 502, 503, 504]  // All transient errors
}

// Aggressive (development)
retry: {
  count: 5,
  on_codes: [429, 500, 502, 503, 504]  // Max retries
}

Backoff Algorithm

Exponential backoff with jitter

Attempt 1: 1s (±25%)
Attempt 2: 2s (±25%)
Attempt 3: 4s (±25%)
Attempt 4: 8s (±25%)
Attempt 5: 16s (±25%)

Nominal total delay: ~31 seconds for 5 retries (1 + 2 + 4 + 8 + 16). With ±25% jitter, the total falls between roughly 23 and 39 seconds.
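
The schedule above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the source lists only the delays, so the uniform jitter formula, the function names, and the injectable `random` parameter are illustrative rather than the gateway's actual implementation.

```javascript
// Delay before retry `attempt` (1-based): 1s, 2s, 4s, 8s, 16s, each with ±25% jitter.
// Assumption: jitter is drawn uniformly; `random` is injectable so the result can be tested.
function backoffDelayMs(attempt, random = Math.random) {
  const baseMs = 1000 * 2 ** (attempt - 1); // 1000, 2000, 4000, ...
  const jitterFactor = 0.75 + random() * 0.5; // uniform in [0.75, 1.25)
  return Math.round(baseMs * jitterFactor);
}

// Jitter-free total for n retries: 1 + 2 + ... + 2^(n-1) = 2^n - 1 seconds.
function nominalTotalMs(retries) {
  return 1000 * (2 ** retries - 1);
}

for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(`retry ${attempt}: ${backoffDelayMs(attempt)} ms`);
}
console.log(`nominal total: ${nominalTotalMs(5)} ms`); // 31000 ms
```

Jitter spreads retries from many clients over time, so they do not all hit the provider at the same instant after an outage.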

Code examples

curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Analyze customer feedback and provide sentiment analysis"
      }
    ],
    "retry": {
      "count": 3,
      "on_codes": [429, 500, 502, 503, 504]
    }
  }'

Best Practices

Production recommendations

For a reliable production setup:

Use a count of 2-3 to balance reliability and speed
Always include 429 (rate limits) in on_codes
Monitor retry rates to detect systemic issues
Implement a circuit breaker for persistent failures
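
The circuit breaker recommendation can be sketched on the client side as follows. The class name, the default threshold and cooldown, and the injectable clock are all hypothetical choices for illustration; none of them come from the Orq.ai API.

```javascript
// Minimal circuit breaker sketch (hypothetical, client-side only).
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long the circuit stays open
    this.now = now; // injectable clock, useful in tests
    this.failures = 0;
    this.openedAt = null; // timestamp when the circuit opened, or null when closed
  }

  async call(requestFn) {
    // While open and still cooling down, fail fast without sending a request.
    if (this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs) {
      throw new Error("Circuit open: request skipped");
    }
    try {
      // Either closed, or cooldown elapsed and this is a trial request (half-open).
      const result = await requestFn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now(); // open (or re-open) the circuit
      }
      throw error;
    }
  }
}
```

Wrapping each request as `breaker.call(() => openai.chat.completions.create(...))` would stop traffic to a failing provider once retries keep being exhausted, instead of paying the full backoff delay on every call.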

Error handling

try {
  const response = await openai.chat.completions.create({ /* ... */ });
} catch (error) {
  // Errors that reach this point were either not retryable or exhausted the configured retries
  if (error.status === 400) {
    // Don't retry client errors - fix the request
    console.error('Bad request:', error.message);
  } else if (error.status >= 500) {
    // Server errors might need manual intervention
    console.error('Server error:', error.message);
  }
}

Troubleshooting

High retry rates

Check if you’re hitting rate limits frequently
Verify API keys have sufficient quotas
Monitor provider status pages for outages

Slow response times

Reduce retry count for latency-sensitive apps
Use shorter timeout values with retries
Consider fallbacks for faster alternatives
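
To see how retry count and timeout interact, a rough worst-case latency budget can be computed. This sketch assumes that `count` is the number of retries after the initial request, that the timeout applies to each attempt separately, and that backoff follows the documented 1s, 2s, 4s schedule with up to +25% jitter. The function name is hypothetical.

```javascript
// Rough upper bound on end-to-end latency for one request, in milliseconds.
function worstCaseLatencyMs(retryCount, callTimeoutMs) {
  const attempts = retryCount + 1; // initial request plus retries
  const nominalBackoffMs = 1000 * (2 ** retryCount - 1); // 1s + 2s + ... + 2^(n-1) s
  const maxBackoffMs = nominalBackoffMs * 1.25; // every delay at the top of its jitter range
  return attempts * callTimeoutMs + maxBackoffMs;
}

console.log(worstCaseLatencyMs(3, 10000)); // 4 * 10000 + 7000 * 1.25 = 48750
console.log(worstCaseLatencyMs(1, 5000)); // 2 * 5000 + 1000 * 1.25 = 11250
```

Under these assumptions, dropping from 3 retries with a 10s timeout to 1 retry with a 5s timeout cuts the worst case from about 49 seconds to about 11 seconds.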

Still getting errors

Check that the failing status code is in the on_codes list
Verify retry count isn’t exhausted
Review provider-specific error documentation

Monitoring

Track these retry metrics:

const retryMetrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
  retriesByCode: { 429: 0, 500: 0 }, // By error code
  avgRetryLatency: 0, // Added latency from retries
  finalFailures: 0, // Requests that failed after all retries
};
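
One way to populate these counters is sketched below. It assumes your application can observe each retry, for example from a client-side retry loop. The `outcome` shape and the function names are hypothetical, and the per-attempt and per-code maps are filled dynamically rather than pre-seeded.

```javascript
function createRetryMetrics() {
  return {
    totalRequests: 0,
    retriedRequests: 0,
    retriesByAttempt: {},
    retriesByCode: {},
    avgRetryLatency: 0,
    finalFailures: 0,
  };
}

// outcome: { retryCodes: number[], retryLatencyMs: number, failed: boolean }
// retryCodes holds the status code that triggered each retry, in order.
function recordRequest(metrics, outcome) {
  metrics.totalRequests += 1;
  if (outcome.retryCodes.length > 0) {
    metrics.retriedRequests += 1;
    // Running mean of the latency added by retries, over retried requests only.
    metrics.avgRetryLatency +=
      (outcome.retryLatencyMs - metrics.avgRetryLatency) / metrics.retriedRequests;
  }
  outcome.retryCodes.forEach((code, index) => {
    const attempt = index + 1;
    metrics.retriesByAttempt[attempt] = (metrics.retriesByAttempt[attempt] ?? 0) + 1;
    metrics.retriesByCode[code] = (metrics.retriesByCode[code] ?? 0) + 1;
  });
  if (outcome.failed) metrics.finalFailures += 1;
}

// Share of requests that needed at least one retry.
function retryRate(metrics) {
  return metrics.totalRequests === 0 ? 0 : metrics.retriedRequests / metrics.totalRequests;
}
```

A retry rate that climbs steadily, or a growing `finalFailures` count, points to a systemic problem that more retries will not fix.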

Limitations

Increased latency: Retries add delay (about 31s of nominal backoff for 5 retries, before jitter)
Cost implications: Failed requests may still incur charges
Rate limit consumption: Each retry counts against quotas
Limited retries: Maximum 5 attempts to prevent excessive delays
Non-retryable errors: 4xx client errors other than 429 are not retried

Advanced Usage

Environment-specific configs:

const retryConfig = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
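
A small lookup can select the right entry at runtime. This is a sketch: the `retryConfigFor` helper, the use of `NODE_ENV`, and the fallback to the production config for unknown environments are assumptions, not documented behavior.

```javascript
const retryConfig = {
  development: { count: 1, on_codes: [429] },
  staging: { count: 2, on_codes: [429, 503] },
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] },
};

// Hypothetical helper: unknown or missing environments get the production config.
function retryConfigFor(env) {
  return Object.hasOwn(retryConfig, env) ? retryConfig[env] : retryConfig.production;
}

// Assumption: the environment name is supplied through NODE_ENV.
const retry = retryConfigFor(process.env.NODE_ENV);
console.log(retry);
```

Defaulting to the most protective config means a misnamed environment still gets full retry coverage.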

With other features:

{
  "retry": { "count": 3, "on_codes": [429, 503] },
  "timeout": { "call_timeout": 10000 },
  "fallbacks": [{ "model": "backup-model" }],
  "cache": { "type": "exact_match", "ttl": 300 }
}

Custom retry logic (client-side):

const customRetry = async (requestFn, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      // 4xx client errors are not retryable, except 429 (rate limit)
      const nonRetryable = error.status < 500 && error.status !== 429;
      if (attempt === maxAttempts || nonRetryable) {
        throw error; // Final attempt or non-retryable error
      }
      // Exponential backoff matching the schedule above: 1s, 2s, 4s, ...
      const delayMs = 2 ** (attempt - 1) * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
};
