Use Cases
  • Surviving transient provider errors without surfacing failures to end users.
  • Automatic failover to a backup provider when the primary is degraded or rate-limited.
  • Absorbing short rate-limit bursts without manual intervention or custom retry logic.
  • Meeting availability SLAs on production features without adding retry code to every service.

Retries

Retry failed requests automatically with exponential backoff. Configure which HTTP error codes trigger retries and how many attempts to make.

Fallbacks

Route to a different model when the primary fails. Define a fallback chain across providers for high availability.

Retries

Automatically retry failed requests with exponential backoff.

Quick Start

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "input": "Analyze customer feedback",
    "retry": {"count": 3, "on_codes": [429, 500, 502, 503, 504]}
  }'

Configuration

Parameter | Type | Required | Description
count | number | Yes | Max retry attempts (1-5)
on_codes | number[] | No | HTTP status codes that trigger retries (default: [429])
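
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch: `normalizeRetryConfig` is a hypothetical helper, and treating `count` as an integer is an assumption (the table only says "number"). The gateway's own validation may differ.

```javascript
// Hypothetical helper: checks a retry object against the documented
// constraints before it is sent. Not part of any SDK.
function normalizeRetryConfig(retry) {
  const { count, on_codes = [429] } = retry; // on_codes defaults to [429]
  // Assumption: count must be a whole number in the documented 1-5 range.
  if (!Number.isInteger(count) || count < 1 || count > 5) {
    throw new RangeError("retry.count must be an integer from 1 to 5");
  }
  if (!Array.isArray(on_codes) || !on_codes.every((c) => Number.isInteger(c))) {
    throw new TypeError("retry.on_codes must be an array of HTTP status codes");
  }
  return { count, on_codes };
}

console.log(normalizeRetryConfig({ count: 3 })); // on_codes filled in with the default
```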

Error Codes

Code | Meaning | Retry? | Common Cause
429 | Rate limit exceeded | Yes | Too many requests
500 | Internal server error | Yes | Provider issue
501 | Not implemented | No | Definitive; retrying will not succeed
502 | Bad gateway | Yes | Network/Gateway issue
503 | Service unavailable | Yes | Provider maintenance
504 | Gateway timeout | Yes | Provider overload
400 | Bad request | No | Invalid parameters
401 | Unauthorized | No | Invalid API key
403 | Forbidden | No | Access denied

Retry Strategies

// Conservative
retry: {
  count: 2,
  on_codes: [429, 503]  // Only rate limits and service unavailable
}

// Balanced (recommended)
retry: {
  count: 3,
  on_codes: [429, 500, 502, 503, 504]  // All transient errors
}

// Aggressive
retry: {
  count: 5,
  on_codes: [429, 500, 502, 503, 504]  // Max retries
}

Backoff Algorithm

Exponential backoff with jitter

  • Attempt 1: 1s (±25%).
  • Attempt 2: 2s (±25%).
  • Attempt 3: 4s (±25%).
  • Attempt 4: 8s (±25%).
  • Attempt 5: 16s (±25%).

Total backoff delay for 5 retries: ~31 seconds before jitter (1 + 2 + 4 + 8 + 16).
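
The schedule above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are illustrative, and the uniform jitter distribution is a guess, since the source only gives the ±25% range.

```javascript
// Delay before a given retry attempt: 1s base, doubling each attempt,
// scaled by a jitter factor in [0.75, 1.25).
// Assumption: jitter is drawn uniformly; the source only states the range.
function backoffDelayMs(attempt, random = Math.random) {
  const base = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, 8s, 16s
  const jitter = 1 + (random() * 2 - 1) * 0.25;
  return Math.round(base * jitter);
}

// Sum of the un-jittered delays across `count` retries.
function nominalTotalDelayMs(count) {
  let total = 0;
  for (let attempt = 1; attempt <= count; attempt++) {
    total += 1000 * 2 ** (attempt - 1);
  }
  return total;
}

console.log(nominalTotalDelayMs(5)); // 31000
```

Passing a fixed `random` function makes the delay deterministic, which is convenient in tests.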

Code examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "input": "Analyze customer feedback and provide sentiment analysis",
    "retry": {
      "count": 3,
      "on_codes": [429, 500, 502, 503, 504]
    }
  }'

Best Practices

Production recommendations

For a production setup, follow these recommendations:
  • Use a count of 2-3 to balance reliability and speed.
  • Always include 429 (rate limits) in on_codes.
  • Monitor retry rates to detect systemic issues.
  • Implement a circuit breaker for persistent failures.
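
The circuit breaker recommendation could look roughly like the client-side sketch below. The `CircuitBreaker` class, its thresholds, and its cooldown are illustrative assumptions, not a feature of the gateway.

```javascript
// Minimal client-side circuit breaker sketch (hypothetical, not a gateway API).
// After `failureThreshold` consecutive failures it rejects calls immediately
// until `cooldownMs` has passed, then lets requests through again as a trial.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  async call(requestFn) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs) {
      throw new Error("Circuit open: request skipped");
    }
    try {
      const result = await requestFn();
      this.failures = 0; // a success closes the circuit
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now(); // open (or re-open) the circuit
      }
      throw error;
    }
  }
}
```

Wrapping each request as `breaker.call(() => client.responses.create({...}))` keeps persistent failures from piling further retries onto a degraded provider.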

Error handling

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

try {
  const response = await client.responses.create({
    model: "openai/gpt-4o-mini",
    input: "Hello",
  });
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    if (error.status === 400) {
      console.error('Bad request:', error.message);
    } else if (error.status >= 500) {
      console.error('Server error:', error.message);
    }
  }
}

Troubleshooting

High retry rates
  • Check if you’re hitting rate limits frequently.
  • Verify API keys have sufficient quotas.
  • Monitor provider status pages for outages.

Slow response times
  • Reduce retry count for latency-sensitive apps.
  • Use shorter timeout values with retries.
  • Consider fallbacks for faster alternatives.

Still getting errors
  • Check if error codes are in on_codes list.
  • Verify retry count isn’t exhausted.
  • Review provider-specific error documentation.

Monitoring

Track these retry metrics:
const retryMetrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
  retriesByCode: { 429: 0, 500: 0 }, // By error code
  avgRetryLatency: 0, // Added latency from retries
  finalFailures: 0, // Requests that failed after all retries
};
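
One way to populate an object of that shape is to fold each finished request into it. The sketch below is hypothetical: `recordOutcome` and the fields of `outcome` are assumptions about what your own client or logs can report, not data the gateway is documented to return.

```javascript
// Hypothetical helper: folds one finished request into a metrics object.
// outcome.retryCodes lists the status code that caused each retry, in order.
function recordOutcome(metrics, outcome) {
  const { retryCodes = [], addedLatencyMs = 0, succeeded } = outcome;
  metrics.totalRequests += 1;
  if (retryCodes.length > 0) {
    // Running mean of retry-added latency, over retried requests only.
    metrics.avgRetryLatency =
      (metrics.avgRetryLatency * metrics.retriedRequests + addedLatencyMs) /
      (metrics.retriedRequests + 1);
    metrics.retriedRequests += 1;
  }
  retryCodes.forEach((code, index) => {
    const attempt = index + 1;
    metrics.retriesByAttempt[attempt] = (metrics.retriesByAttempt[attempt] ?? 0) + 1;
    metrics.retriesByCode[code] = (metrics.retriesByCode[code] ?? 0) + 1;
  });
  if (!succeeded) metrics.finalFailures += 1;
  return metrics;
}

const metrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: {},
  retriesByCode: {},
  avgRetryLatency: 0,
  finalFailures: 0,
};
recordOutcome(metrics, { retryCodes: [429, 503], addedLatencyMs: 3000, succeeded: true });
recordOutcome(metrics, { succeeded: true }); // no retries needed
console.log(metrics.retriedRequests / metrics.totalRequests); // retry rate
```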

Limitations

  • Increased latency: Retries add delay (up to 31s for 5 attempts).
  • Cost implications: Failed requests may still incur charges.
  • Rate limit consumption: Each retry counts against quotas.
  • Limited retries: Maximum 5 attempts to prevent excessive delays.
  • Non-retryable errors: 4xx client errors other than 429 (such as 400, 401, and 403) are not retried.

Advanced Usage

Environment-specific configs:
const retryConfig = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
With other features:
{
  "retry": { "count": 3, "on_codes": [429, 503] },
  "timeout": { "call_timeout": 10000 },
  "fallbacks": [{ "model": "backup-model" }],
  "cache": { "type": "exact_match", "ttl": 300 }
}
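Custom retry logic (client-side):
// Status codes treated as transient, matching the Error Codes table
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

const customRetry = async (requestFn, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      // Errors without a status (e.g. network failures) are also retried
      const retryable = error.status === undefined || RETRYABLE.has(error.status);
      if (attempt === maxAttempts || !retryable) {
        throw error; // Final attempt or non-retryable error
      }
      // Exponential backoff: 1s, 2s, 4s, ...
      const delayMs = 2 ** (attempt - 1) * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
};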

Fallbacks

Automatically switch to a different model when the primary fails.

Quick Start

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "openai/gpt-4o-mini",
  input: "Generate a product description",
  fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
});

console.log(response.output_text);

Configuration

Parameter | Type | Required | Description
fallbacks | Array | Yes | List of fallback models in order of preference
model | string | Yes | Model identifier for each fallback

Trigger Conditions

Fallbacks activate on these errors:
Error Code | Description | Triggers Fallback
429 | Rate limit exceeded | Yes
500 | Internal server error | Yes
501 | Not implemented | No
502 | Bad gateway | Yes
503 | Service unavailable | Yes
504 | Gateway timeout | Yes
400 | Bad request | No
401 | Unauthorized | No
403 | Forbidden | No
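
To make the trigger rules concrete, the sketch below models a fallback chain on the client side. It illustrates the behavior described in the table and is not the gateway's implementation. `callWithFallbacks` and `callModel` are hypothetical names, and `callModel` is assumed to throw an error carrying an HTTP `status`.

```javascript
// Status codes that trigger a fallback, per the table above.
const FALLBACK_CODES = new Set([429, 500, 502, 503, 504]);

// Tries the primary model, then each fallback in order.
// callModel is a hypothetical function: it resolves with a response or
// throws an error that has an HTTP `status` property.
async function callWithFallbacks(callModel, primary, fallbacks = []) {
  const chain = [primary, ...fallbacks.map((f) => f.model)];
  let lastError;
  for (const model of chain) {
    try {
      return { model, response: await callModel(model) };
    } catch (error) {
      // Non-triggering errors (e.g. 400, 401, 403, 501) surface immediately.
      if (!FALLBACK_CODES.has(error.status)) throw error;
      lastError = error; // transient failure: move on to the next model
    }
  }
  throw lastError; // every model in the chain failed
}
```

Returning the model alongside the response shows which entry in the chain actually served the request, which helps when output style varies between models.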

Best Practices

Use a maximum of 3 fallback models. Order them by preference or cost, and choose models with similar capabilities.
// Cost-optimized: cheap then expensive
fallbacks: [{ model: "openai/gpt-3.5-turbo" }, { model: "openai/gpt-4o" }];

// Reliability-optimized: different providers
fallbacks: [
  { model: "openai/gpt-4o" },
  { model: "anthropic/claude-sonnet-4-6" },
  { model: "azure/gpt-4o" },
];

Code examples

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "input": "Generate a product description",
    "fallbacks": [
      { "model": "openai/gpt-4o" },
      { "model": "azure/gpt-4o" }
    ]
  }'

Limitations

  • Response consistency: Different models may return varying output styles.
  • Parameter support: Not all providers support identical parameters.
  • Cost implications: Failed requests may still incur charges from the primary provider.
  • Latency impact: Sequential attempts add processing time.
  • Provider dependencies: Requires API keys for all fallback providers.