# Load balancing across providers

> Distribute LLM requests across multiple providers with weighted routing. Optimize costs, run A/B tests, and ensure redundancy with load balancing.

## Quick Start

Add a `load_balancer` object to a chat completion request to split traffic across models by weight.

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
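  import OpenAI from "openai";

  // Point the OpenAI SDK at the orq.ai router
  const openai = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o-mini", // Primary model (ignored when load balancing)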
    messages: [{ role: "user", content: "Write a marketing slogan" }],
    load_balancer: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-3.5-turbo", weight: 0.7 }, // 70% of requests
        { model: "anthropic/claude-3-haiku", weight: 0.3 }, // 30% of requests
      ],
    },
  });
  ```
</CodeGroup>

## Configuration

| Parameter              | Type   | Required | Description                                |
| ---------------------- | ------ | -------- | ------------------------------------------ |
| `load_balancer`        | Object | Yes      | Load balancer configuration (top-level)    |
| `load_balancer.type`   | string | Yes      | Strategy type (`weight_based`)             |
| `load_balancer.models` | Array  | Yes      | List of models with weights                |
| `models[].model`       | string | Yes      | Model identifier                           |
| `models[].weight`      | number | No       | Relative weight (0.001 - 1.0, default 0.5) |
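
The parameters in the table can be mirrored as types with a small client-side check. This is a minimal sketch: `WeightedModel`, `LoadBalancerConfig`, and `validateLoadBalancer` are hypothetical helper names, not part of any orq.ai SDK, and the router may validate requests differently. Rejecting an empty `models` list is also an assumption.

```typescript
// Hypothetical client-side types mirroring the parameter table.
interface WeightedModel {
  model: string;
  weight?: number; // optional; the table lists a default of 0.5
}

interface LoadBalancerConfig {
  type: "weight_based";
  models: WeightedModel[];
}

const MIN_WEIGHT = 0.001;
const MAX_WEIGHT = 1.0;
const DEFAULT_WEIGHT = 0.5;

// Returns a list of problems; an empty list means the config fits the documented ranges.
function validateLoadBalancer(config: LoadBalancerConfig): string[] {
  const errors: string[] = [];
  // Assumption: an empty model list is treated as invalid.
  if (config.models.length === 0) {
    errors.push("models must contain at least one entry");
  }
  config.models.forEach((entry, i) => {
    const weight = entry.weight ?? DEFAULT_WEIGHT;
    if (!entry.model) {
      errors.push(`models[${i}].model is required`);
    }
    if (!(weight >= MIN_WEIGHT && weight <= MAX_WEIGHT)) {
      errors.push(`models[${i}].weight must be between ${MIN_WEIGHT} and ${MAX_WEIGHT}`);
    }
  });
  return errors;
}
```

Running a check like this before sending a request can surface an out-of-range weight locally instead of as an API error.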

**Weight Calculation:**

* Weights are relative and are normalized to percentages: `[0.4, 0.8]` → `[33%, 67%]`
* A higher weight sends proportionally more traffic to that model
* Minimum weight: `0.001`
* Default weight when omitted: `0.5`
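
The normalization arithmetic can be reproduced locally to preview the split a set of weights should produce. This is an illustrative sketch only: `normalizeWeights` is a hypothetical helper, and it assumes the default of `0.5` is applied to omitted weights before normalization.

```typescript
// Illustrative only: reproduces the normalization arithmetic, not the router's code.
const DEFAULT_WEIGHT = 0.5;

function normalizeWeights(weights: Array<number | undefined>): number[] {
  // Assumption: omitted weights take the default before normalizing.
  const filled = weights.map((w) => w ?? DEFAULT_WEIGHT);
  const total = filled.reduce((sum, w) => sum + w, 0);
  // Each share is the weight divided by the sum of all weights.
  return filled.map((w) => w / total);
}

const shares = normalizeWeights([0.4, 0.8]);
// shares[0] is about 0.333 and shares[1] is about 0.667
```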

## Common Patterns

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Equal distribution
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "openai/gpt-4o", weight: 1.0 },
      { model: "anthropic/claude-3", weight: 1.0 },
    ],
  }

  // Cost optimization (cheap model primary)
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "openai/gpt-3.5-turbo", weight: 0.8 },
      { model: "openai/gpt-4o", weight: 0.2 },
    ],
  }

  // A/B testing
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "current-model", weight: 0.9 },
      { model: "experimental-model", weight: 0.1 },
    ],
  }

  // Multi-provider redundancy
  load_balancer: {
    type: "weight_based",
    models: [
      { model: "openai/gpt-4o", weight: 0.5 },
      { model: "anthropic/claude-3", weight: 0.3 },
      { model: "azure/gpt-4o", weight: 0.2 },
    ],
  }
  ```
</CodeGroup>

## Use Cases

| Scenario                | Weight Strategy            | Example                      |
| ----------------------- | -------------------------- | ---------------------------- |
| **Cost optimization**   | Heavy on cheaper models    | 80% GPT-3.5, 20% GPT-4       |
| **Performance testing** | Small traffic to new model | 95% current, 5% experimental |
| **Provider redundancy** | Split across providers     | 60% OpenAI, 40% Anthropic    |
| **Capacity management** | Distribute during peaks    | Even split across models     |

<Card title="See also: Organization-level load balancing" icon="sliders" href="/docs/router/routing-rules#providers-and-traffic-weight" horizontal>
  To apply load balancing across your organization without changing request code, use **Routing Rules** to configure Fallback, Weighted, and Round Robin strategies at the workspace level.
</Card>

## Code Examples

<CodeGroup>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST https://api.orq.ai/v3/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o-mini",
      "messages": [
        {
          "role": "user",
          "content": "Write a creative marketing slogan for an eco-friendly coffee brand"
        }
      ],
      "load_balancer": {
        "type": "weight_based",
        "models": [
          {
            "model": "openai/gpt-3.5-turbo",
            "weight": 0.4
          },
          {
            "model": "anthropic/claude-3-haiku-20240307",
            "weight": 0.6
          }
        ]
      }
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import os

  openai = OpenAI(
      api_key=os.environ.get("ORQ_API_KEY"),
      base_url="https://api.orq.ai/v3/router"
  )

  response = openai.chat.completions.create(
      model="openai/gpt-4o",
      messages=[
          {
              "role": "user",
              "content": "Write a creative marketing slogan for an eco-friendly coffee brand"
          }
      ],
      extra_body={
          "load_balancer": {
              "type": "weight_based",
              "models": [
                  {
                      "model": "openai/gpt-3.5-turbo",
                      "weight": 0.4
                  },
                  {
                      "model": "anthropic/claude-3-haiku-20240307",
                      "weight": 0.6
                  }
              ]
          }
      }
  )
  ```

  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import OpenAI from "openai";

  const openai = new OpenAI({
    apiKey: process.env.ORQ_API_KEY,
    baseURL: "https://api.orq.ai/v3/router",
  });

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content:
          "Write a creative marketing slogan for an eco-friendly coffee brand",
      },
    ],
    load_balancer: {
      type: "weight_based",
      models: [
        {
          model: "openai/gpt-3.5-turbo",
          weight: 0.4,
        },
        {
          model: "anthropic/claude-3-haiku-20240307",
          weight: 0.6,
        },
      ],
    },
  });
  ```
</CodeGroup>

## Monitoring

Track these metrics for optimal load balancing:

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  // Example monitoring setup: each map is keyed by model identifier
  const metrics: Record<string, Record<string, number>> = {
    requestsByModel: {}, // Count per model
    costsByModel: {}, // Cost per model
    latencyByModel: {}, // Response time per model
    errorsByModel: {}, // Error rate per model
  };
  ```
</CodeGroup>

**Key Metrics:**

* **Traffic distribution**: Actual vs expected percentages
* **Cost per model**: Monitor spending across providers
* **Response times**: Compare latency by model
* **Error rates**: Track failures by provider

## Troubleshooting

**Uneven distribution**

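* Check that the weights you send produce the split you expect after normalization
* Verify sufficient request volume (min 100 requests for accuracy); with fewer requests, random variation can dominate the observed split
* Monitor over longer time periods before concluding that the distribution is off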
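
To judge whether an observed split is within normal variation, a rough margin of error can be computed. This sketch assumes each request is routed independently at random, which the source does not confirm, and uses the normal approximation to the binomial distribution; `trafficShareMargin` is a hypothetical helper.

```typescript
// Approximate 95% margin of error for an observed traffic share,
// assuming independent random routing per request.
function trafficShareMargin(expectedShare: number, requests: number): number {
  if (requests <= 0) return 1; // no data: the share could be anything
  const standardError = Math.sqrt((expectedShare * (1 - expectedShare)) / requests);
  return Math.min(1, 1.96 * standardError);
}

const marginAt100 = trafficShareMargin(0.7, 100); // roughly 0.09
const marginAt10000 = trafficShareMargin(0.7, 10000); // roughly 0.009
```

Under these assumptions, a model weighted at 70% could plausibly receive anywhere from about 61% to 79% of the first 100 requests, which is why longer observation windows give a more reliable picture.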

**Unexpected costs**

* Track actual vs expected cost distribution
* Monitor for expensive model overuse
* Set up cost alerts per provider

**Performance issues**

* Check latency differences between models
* Monitor for provider-specific slowdowns
* Adjust weights based on performance data

## Limitations

* **Probabilistic routing**: Short-term traffic may not match the configured weights exactly
* **Minimum volume needed**: The observed split only approaches the weights once enough requests have been sent
* **Response variations**: Output quality and style can differ between models
* **Cost complexity**: Billing has to be tracked across multiple providers
* **Provider dependencies**: API access is required for every model in the list
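
Probabilistic routing can be illustrated with a weighted random pick. This is a conceptual sketch, not the router's actual implementation, which the source does not describe; `pickModel` is a hypothetical function, and the `random` argument is injectable only so the behavior can be checked deterministically.

```typescript
interface WeightedModel {
  model: string;
  weight: number;
}

// Pick a model with probability proportional to its weight.
// `random` must be in [0, 1); it defaults to Math.random().
function pickModel(models: WeightedModel[], random: number = Math.random()): string {
  if (models.length === 0) throw new Error("models must not be empty");
  const total = models.reduce((sum, m) => sum + m.weight, 0);
  let threshold = random * total;
  let chosen = "";
  for (const m of models) {
    chosen = m.model;
    threshold -= m.weight;
    if (threshold < 0) break; // the random point fell inside this model's slice
  }
  // If rounding left a tiny remainder, `chosen` is the last model in the list.
  return chosen;
}
```

Because each call is an independent draw, ten consecutive requests can all land on the same model even with a 70/30 split; only the long-run proportions follow the weights.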

## Advanced Usage

**Environment-specific weights:**

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
  const weights = {
    development: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-3.5-turbo", weight: 1.0 }, // Cheap for dev
      ],
    },
    production: {
      type: "weight_based",
      models: [
        { model: "openai/gpt-4o", weight: 0.7 }, // Quality primary
        { model: "anthropic/claude-3", weight: 0.3 }, // Backup
      ],
    },
  };
  ```
</CodeGroup>
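
A per-environment map like the one above still needs a lookup at request time. The sketch below shows one way to do that; `loadBalancerFor` is a hypothetical helper, and falling back to the development configuration for unknown environment names is an assumption rather than documented behavior.

```typescript
type Environment = "development" | "production";

interface LoadBalancerConfig {
  type: "weight_based";
  models: { model: string; weight: number }[];
}

const configs: Record<Environment, LoadBalancerConfig> = {
  development: {
    type: "weight_based",
    models: [{ model: "openai/gpt-3.5-turbo", weight: 1.0 }],
  },
  production: {
    type: "weight_based",
    models: [
      { model: "openai/gpt-4o", weight: 0.7 },
      { model: "anthropic/claude-3", weight: 0.3 },
    ],
  },
};

// Assumption: anything other than "production" uses the cheaper development config.
function loadBalancerFor(env: string | undefined): LoadBalancerConfig {
  return env === "production" ? configs.production : configs.development;
}
```

In a Node.js service, the result of `loadBalancerFor(process.env.NODE_ENV)` could then be passed as the `load_balancer` field of the request.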

**Dynamic weight adjustment:**

<CodeGroup>
  ```typescript Typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
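  // Adjust weights based on performance.
  // ModelMetrics and calculateWeight are illustrative placeholders, not part of the orq.ai API.
  type ModelMetrics = { name: string; latency: number; cost: number; quality: number };

  // Placeholder scoring: favor quality, penalize latency and cost, clamp to 0.001 - 1.0
  const calculateWeight = (latency: number, cost: number, quality: number): number =>
    Math.min(1.0, Math.max(0.001, quality / (1 + latency * cost)));

  const adjustWeights = (metrics: ModelMetrics[]) => ({
    type: "weight_based",
    models: metrics.map((model) => ({
      model: model.name,
      weight: calculateWeight(model.latency, model.cost, model.quality),
    })),
  });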
  ```
</CodeGroup>

**Combined with retries and timeouts:**

<CodeGroup>
  ```json JSON theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "model": "openai/gpt-4o",
    "load_balancer": {
      "type": "weight_based",
      "models": [
        { "model": "openai/gpt-4o", "weight": 0.6 },
        { "model": "anthropic/claude-3", "weight": 0.4 }
      ]
    },
    "orq": {
      "retry": { "count": 2, "on_codes": [429] },
      "timeout": { "call_timeout": 15000 }
    }
  }
  ```
</CodeGroup>
