
# Load balancing across providers

> Distribute LLM requests across multiple providers with weighted routing. Optimize costs, run A/B tests, and ensure redundancy with load balancing.

**Use Cases**

* Distributing traffic across multiple provider accounts to stay within per-key rate limits.
* A/B testing providers by routing a configurable percentage of traffic to each.
* Reducing blast radius from a single provider outage without code changes.
* Maximizing throughput when one provider's capacity is a bottleneck.

***

## Quick Start

Distribute requests across multiple providers using weighted routing.

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o-mini",
      "input": "Write a marketing slogan",
      "load_balancer": {
        "type": "weight_based",
        "models": [
          {"model": "openai/gpt-5-mini", "weight": 0.7},
          {"model": "anthropic/claude-haiku-4-5-20251001", "weight": 0.3}
        ]
      }
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "openai/gpt-4o-mini",
    input: "Write a marketing slogan",
    load_balancer: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-5-mini", weight: 0.7 },
        { model: "anthropic/claude-haiku-4-5-20251001", weight: 0.3 },
      ],
    },
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="openai/gpt-4o-mini",
      input="Write a marketing slogan",
      extra_body={
          "load_balancer": {
              "type": "weight_based",
              "models": [
                  {"model": "openai/gpt-5-mini", "weight": 0.7},
                  {"model": "anthropic/claude-haiku-4-5-20251001", "weight": 0.3},
              ],
          }
      },
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Write a marketing slogan" }],
    load_balancer: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-5-mini", weight: 0.7 },
        { model: "anthropic/claude-haiku-4-5-20251001", weight: 0.3 },
      ],
    },
  });
  ```
</CodeGroup>

## Configuration

| Parameter                       | Type   | Required | Description                                     |
| ------------------------------- | ------ | -------- | ----------------------------------------------- |
| `load_balancer`                 | object | Yes      | Load balancer configuration (top-level)         |
| `load_balancer.type`            | string | Yes      | Strategy type (`weight_based` or `round_robin`) |
| `load_balancer.models`          | array  | Yes      | List of models with weights                     |
| `load_balancer.models[].model`  | string | Yes      | Model identifier                                |
| `load_balancer.models[].weight` | number | No       | Relative weight (0.001 to 1.0, default 0.5)     |
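
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `validate_load_balancer` is a hypothetical helper, and the rule that `models` must be non-empty is an assumption (the table only marks it as required). The gateway's own validation may differ.

```python
from typing import Any

ALLOWED_TYPES = {"weight_based", "round_robin"}
MIN_WEIGHT, MAX_WEIGHT = 0.001, 1.0  # range documented for models[].weight


def validate_load_balancer(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a load_balancer config (empty if none)."""
    problems: list[str] = []

    if config.get("type") not in ALLOWED_TYPES:
        problems.append(f"type must be one of {sorted(ALLOWED_TYPES)}")

    models = config.get("models")
    # Assumption: an empty list is treated as invalid.
    if not isinstance(models, list) or not models:
        problems.append("models must be a non-empty list")
        return problems

    for i, entry in enumerate(models):
        if not isinstance(entry.get("model"), str) or not entry["model"]:
            problems.append(f"models[{i}].model must be a non-empty string")
        weight = entry.get("weight")  # optional; documented default is 0.5
        if weight is not None and not (MIN_WEIGHT <= weight <= MAX_WEIGHT):
            problems.append(
                f"models[{i}].weight must be between {MIN_WEIGHT} and {MAX_WEIGHT}"
            )

    return problems
```

An empty return value means the config satisfies the documented constraints; anything else can be logged or raised before calling the API.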

**Weight Calculation:**

* Weights are normalized: `[0.4, 0.8]` → `[33%, 67%]`.
* Higher weight = more traffic.
* Minimum weight: `0.001`.
* Default weight: `0.5`.
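
To preview the traffic split a set of weights produces, you can reproduce the normalization locally. This sketch assumes normalization divides each weight by the sum of all weights, which matches the `[0.4, 0.8]` → `[33%, 67%]` example above; `normalized_shares` is a hypothetical helper, and model names are assumed to be unique within the list.

```python
DEFAULT_WEIGHT = 0.5  # documented default when "weight" is omitted


def normalized_shares(models: list[dict]) -> dict[str, float]:
    """Map each model to its share of traffic (shares sum to 1.0)."""
    weights = {m["model"]: m.get("weight", DEFAULT_WEIGHT) for m in models}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = normalized_shares(
    [
        {"model": "model-a", "weight": 0.4},
        {"model": "model-b", "weight": 0.8},
    ]
)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
# model-a: 33%
# model-b: 67%
```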

## Common Patterns

<CodeGroup>
  ```typescript Weight-based config patterns theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Equal distribution
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "openai/gpt-4o", weight: 1.0 },
      { model: "anthropic/claude-sonnet-4-6", weight: 1.0 },
    ],
  }

  // Cost optimization (cheap model primary)
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "openai/gpt-5-mini", weight: 0.8 },
      { model: "openai/gpt-4o", weight: 0.2 },
    ],
  }

  // A/B testing
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "current-model", weight: 0.9 },
      { model: "experimental-model", weight: 0.1 },
    ],
  }

  // Multi-provider redundancy
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "openai/gpt-4o", weight: 0.5 },
      { model: "anthropic/claude-sonnet-4-6", weight: 0.3 },
      { model: "azure/gpt-4o", weight: 0.2 },
    ],
  }
  ```
</CodeGroup>

## Use Cases

| Scenario                | Weight Strategy            | Example                      |
| ----------------------- | -------------------------- | ---------------------------- |
| **Cost optimization**   | Heavy on cheaper models    | 80% GPT-5 mini, 20% GPT-4o   |
| **Performance testing** | Small traffic to new model | 95% current, 5% experimental |
| **Provider redundancy** | Split across providers     | 60% OpenAI, 40% Anthropic    |
| **Capacity management** | Distribute during peaks    | Even split across models     |

<Card title="See also: Organization-level load balancing" icon="sliders" href="/docs/ai-studio/ai-gateway/routing-rules#providers-and-traffic-weight" horizontal>
  To apply load balancing across your organization without changing request code, use **Routing Rules** to configure Fallback, Weighted, and Round Robin strategies at the workspace level.
</Card>

## Code Examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/responses \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o-mini",
      "input": "Write a creative marketing slogan for an eco-friendly coffee brand",
      "load_balancer": {
        "type": "weight_based",
        "models": [
          {"model": "openai/gpt-5-mini", "weight": 0.4},
          {"model": "anthropic/claude-haiku-4-5-20251001", "weight": 0.6}
        ]
      }
    }'
  ```

  ```bash cURL (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o-mini",
      "messages": [
        {
          "role": "user",
          "content": "Write a creative marketing slogan for an eco-friendly coffee brand"
        }
      ],
      "load_balancer": {
        "type": "weight_based",
        "models": [
          {"model": "openai/gpt-5-mini", "weight": 0.4},
          {"model": "anthropic/claude-haiku-4-5-20251001", "weight": 0.6}
        ]
      }
    }'
  ```

  ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.responses.create({
    model: "openai/gpt-4o-mini",
    input: "Write a creative marketing slogan for an eco-friendly coffee brand",
    load_balancer: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-5-mini", weight: 0.4 },
        { model: "anthropic/claude-haiku-4-5-20251001", weight: 0.6 },
      ],
    },
  });

  console.log(response.output_text);
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.responses.create(
      model="openai/gpt-4o-mini",
      input="Write a creative marketing slogan for an eco-friendly coffee brand",
      extra_body={
          "load_balancer": {
              "type": "weight_based",
              "models": [
                  {"model": "openai/gpt-5-mini", "weight": 0.4},
                  {"model": "anthropic/claude-haiku-4-5-20251001", "weight": 0.6},
              ],
          }
      },
  )

  print(response.output_text)
  ```

  ```typescript TypeScript (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await client.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: "Write a creative marketing slogan for an eco-friendly coffee brand",
      },
    ],
    load_balancer: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-5-mini", weight: 0.4 },
        { model: "anthropic/claude-haiku-4-5-20251001", weight: 0.6 },
      ],
    },
  });
  ```

  ```python Python (Chat Completions) theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  client = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router",
  )

  response = client.chat.completions.create(
      model="openai/gpt-4o-mini",
      messages=[
          {
              "role": "user",
              "content": "Write a creative marketing slogan for an eco-friendly coffee brand",
          }
      ],
      extra_body={
          "load_balancer": {
              "type": "weight_based",
              "models": [
                  {"model": "openai/gpt-5-mini", "weight": 0.4},
                  {"model": "anthropic/claude-haiku-4-5-20251001", "weight": 0.6},
              ],
          }
      },
  )
  ```
</CodeGroup>

## Monitoring

Track these metrics for optimal load balancing:

<CodeGroup>
  ```typescript Metrics tracking example theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Example monitoring setup
  const metrics = {
    requestsByModel: {}, // Count per model
    costsByModel: {}, // Cost per model
    latencyByModel: {}, // Response time per model
    errorsByModel: {}, // Error rate per model
  };
  ```
</CodeGroup>

**Key Metrics:**

* **Traffic distribution**: Actual vs expected percentages.
* **Cost per model**: Monitor spending across providers.
* **Response times**: Compare latency by model.
* **Error rates**: Track failures by provider.
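
The traffic distribution check can be sketched as a comparison between expected shares and the models that actually served requests. This example assumes you record the serving model for each request in your own logs; `distribution_drift` and the `served_models` list are hypothetical names, not part of the Orq API.

```python
from collections import Counter


def distribution_drift(
    expected_shares: dict[str, float], served_models: list[str]
) -> dict[str, float]:
    """Return observed share minus expected share for each configured model."""
    if not served_models:
        raise ValueError("no requests recorded")
    counts = Counter(served_models)
    total = len(served_models)
    return {
        name: counts[name] / total - share
        for name, share in expected_shares.items()
    }


# Hypothetical log: 8 requests served by one model, 2 by the other.
served = ["openai/gpt-5-mini"] * 8 + ["anthropic/claude-haiku-4-5-20251001"] * 2
expected = {"openai/gpt-5-mini": 0.7, "anthropic/claude-haiku-4-5-20251001": 0.3}
drift = distribution_drift(expected, served)
```

A positive value means the model received more traffic than its weight implies, a negative value means less. Small deviations are expected at low request volumes.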

## Troubleshooting

**Uneven distribution**

* Check that the normalized weights produce the split you intended.
* Verify sufficient request volume (at least 100 requests for a meaningful comparison).
* Monitor over longer time periods.

**Unexpected costs**

* Track actual vs expected cost distribution.
* Monitor for overuse of expensive models.
* Set up cost alerts per provider.

**Performance issues**

* Check latency differences between models.
* Monitor for provider-specific slowdowns.
* Adjust weights based on performance data.

## Limitations

* **Probabilistic routing**: Short-term traffic may not match exact weights.
* **Minimum volume needed**: Requires sufficient requests for statistical accuracy.
* **Response variations**: Different models may return varying output quality.
* **Cost complexity**: Spend is spread across multiple providers, which makes billing harder to track.
* **Provider dependencies**: You need API access to every model in the list.
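
To get a feel for how much short-term deviation probabilistic routing can produce, you can estimate the margin around an expected share. This sketch assumes each request is routed independently at random according to the weights, which may not match the gateway's exact behavior, and uses the normal approximation to the binomial distribution; `share_margin` is a hypothetical helper.

```python
import math


def share_margin(expected_share: float, request_count: int, z: float = 1.96) -> float:
    """Approximate 95% margin around an expected traffic share (as a fraction)."""
    if request_count <= 0:
        raise ValueError("request_count must be positive")
    return z * math.sqrt(expected_share * (1 - expected_share) / request_count)


for n in (100, 10_000):
    print(f"{n} requests: 70% ± {share_margin(0.7, n):.1%}")
# 100 requests: 70% ± 9.0%
# 10000 requests: 70% ± 0.9%
```

Under these assumptions, a model weighted at 70% can plausibly land anywhere between roughly 61% and 79% over 100 requests, so judge the observed split against a margin that fits your request volume.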

## Advanced Usage

**Environment-specific weights:**

<CodeGroup>
  ```typescript Environment weight config theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const weights = {
    development: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-5-mini", weight: 1.0 }, // Cheap for dev
      ],
    },
    production: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-4o", weight: 0.7 }, // Quality primary
        { model: "anthropic/claude-sonnet-4-6", weight: 0.3 }, // Backup
      ],
    },
  };
  ```
</CodeGroup>
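
The same idea in Python, as a minimal sketch: the config is selected from an environment variable and can then be passed through `extra_body` as in the Python examples above. `APP_ENV` and `pick_load_balancer` are hypothetical names chosen for illustration.

```python
import os
from typing import Optional

LOAD_BALANCERS = {
    "development": {
        "type": "weight_based",
        "models": [{"model": "openai/gpt-5-mini", "weight": 1.0}],  # cheap for dev
    },
    "production": {
        "type": "weight_based",
        "models": [
            {"model": "openai/gpt-4o", "weight": 0.7},  # quality primary
            {"model": "anthropic/claude-sonnet-4-6", "weight": 0.3},  # backup
        ],
    },
}


def pick_load_balancer(environment: Optional[str] = None) -> dict:
    """Return the load_balancer config for the given or current environment."""
    # APP_ENV is an assumed variable name; falls back to "development".
    env = environment or os.environ.get("APP_ENV", "development")
    if env not in LOAD_BALANCERS:
        raise ValueError(f"no load balancer config for environment {env!r}")
    return LOAD_BALANCERS[env]
```

The result can be sent with a request as `extra_body={"load_balancer": pick_load_balancer()}`.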

**Dynamic weight adjustment:**

<CodeGroup>
  ```typescript Dynamic weight calculation theme={"theme":{"light":"github-light","dark":"github-dark"}}
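  // Adjust weights based on performance.
  // Assumes latency, cost, and quality are all positive numbers.
  type ModelStats = { model: string; latency: number; cost: number; quality: number };

  const calculateWeight = (latency: number, cost: number, quality: number) =>
    quality / (latency * cost);

  const adjustWeights = (models: ModelStats[]) => {
    const scores = models.map((m) => calculateWeight(m.latency, m.cost, m.quality));
    const maxScore = Math.max(...scores);
    return {
      type: "weight_based",
      models: models.map((m, i) => ({
        model: m.model,
        // Scale so the best model gets 1.0, then floor at the minimum weight (0.001)
        // to stay inside the documented 0.001 - 1.0 range.
        weight: Math.max(0.001, scores[i] / maxScore),
      })),
    };
  };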
  ```
</CodeGroup>

**With other features:**

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "model": "openai/gpt-4o",
    "load_balancer": {
      "type": "weight_based",
      "models": [
        { "model": "openai/gpt-4o", "weight": 0.6 },
        { "model": "anthropic/claude-sonnet-4-6", "weight": 0.4 }
      ]
    },
    "retry": { "count": 2, "on_codes": [429] },
    "timeout": { "call_timeout": 15000 }
  }
  ```
</CodeGroup>
