> ## Documentation Index
> Fetch the complete documentation index at: https://docs.orq.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# LiteLLM

> Send LiteLLM traces to Orq.ai using OpenTelemetry instrumentation. Monitor LLM calls, costs, and latency across all providers LiteLLM supports.

<CardGroup cols={1}>
  <Card title="Observability" icon="chart-line" href="#observability">
    Instrument your code with OpenTelemetry to capture traces, logs, and metrics for every LLM call, agent step, and tool use.
  </Card>
</CardGroup>

## Observability

### Getting Started

LiteLLM provides a unified interface for multiple LLM providers, so you can switch between OpenAI, Anthropic, Cohere, and 100+ other providers without changing your calling code. Tracing LiteLLM with Orq.ai shows you provider performance, cost, routing decisions, and API reliability across your multi-provider setup.

### Prerequisites

Before you begin, ensure you have:

* An Orq.ai account and [API Key](/docs/administer/api-keys)
* LiteLLM installed in your project
* Python 3.8+
* API keys for your LLM providers (OpenAI, Anthropic, Cohere, etc.)

### Install Dependencies

<CodeGroup>
  ```bash Bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  # Core LiteLLM and OpenTelemetry packages
  pip install litellm opentelemetry-sdk opentelemetry-exporter-otlp
  pip install 'litellm[proxy]'
  ```
</CodeGroup>

### Configure Orq.ai

Set up your environment variables to connect to Orq.ai's OpenTelemetry collector:

**Unix/Linux/macOS:**

<CodeGroup>
  ```bash Bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel/v1/traces"
  export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
  export OTEL_RESOURCE_ATTRIBUTES="service.name=litellm-app,service.version=1.0.0"
  export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
  ```
</CodeGroup>

**Windows (PowerShell):**

<CodeGroup>
  ```powershell PowerShell theme={"theme":{"light":"github-light","dark":"github-dark"}}
  $env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel/v1/traces"
  $env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
  $env:OTEL_RESOURCE_ATTRIBUTES = "service.name=litellm-app,service.version=1.0.0"
  $env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
  ```
</CodeGroup>

**Using .env file:**

<CodeGroup>
  ```bash Bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel/v1/traces
  OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
  OTEL_RESOURCE_ATTRIBUTES=service.name=litellm-app,service.version=1.0.0
  OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
  ```
</CodeGroup>
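
Before starting your application, it can help to confirm that these variables are set and that the placeholder key has been replaced. The following is a minimal sketch, not part of Orq.ai or LiteLLM: the helper names `parse_otlp_headers` and `check_orq_env` are hypothetical, and the check only covers the two exporter variables shown above.

```python
from typing import Dict, List, Mapping

# Exporter variables used in the configuration above.
REQUIRED_VARS = ("OTEL_EXPORTER_OTLP_ENDPOINT", "OTEL_EXPORTER_OTLP_HEADERS")


def parse_otlp_headers(raw: str) -> Dict[str, str]:
    """Split a 'key=value,key2=value2' header string into a dict."""
    headers: Dict[str, str] = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # Skip empty or malformed entries
        key, value = pair.split("=", 1)
        headers[key.strip()] = value.strip()
    return headers


def check_orq_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    raw_headers = env.get("OTEL_EXPORTER_OTLP_HEADERS", "")
    auth = parse_otlp_headers(raw_headers).get("Authorization", "")
    if raw_headers and not auth.startswith("Bearer "):
        problems.append("OTEL_EXPORTER_OTLP_HEADERS has no 'Authorization=Bearer ...' entry")
    elif "<ORQ_API_KEY>" in auth:
        problems.append("Replace the <ORQ_API_KEY> placeholder with your real key")
    return problems
```

Calling `check_orq_env(os.environ)` at startup and printing any returned problems is one way to catch a missing or placeholder key before the first LLM call.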

### Integrations

LiteLLM includes a built-in OpenTelemetry callback, so you can collect traces without a separate instrumentation library:

### LiteLLM

Auto-instrumentation with minimal setup:

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import litellm

  # One line enables OpenTelemetry logging of LLM responses across all providers:
  litellm.callbacks = ["otel"]

  # Your LiteLLM code is automatically traced
  response = litellm.completion(
      model="gpt-4o",
      messages=[{"role": "user", "content": "Hello, how are you?"}]
  )
  print(response.choices[0].message.content)
  ```
</CodeGroup>

### Examples

**Basic Multi-Provider Usage**

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import litellm

  # One line enables OpenTelemetry logging of LLM responses across all providers:
  litellm.callbacks = ["otel"]

  def basic_multi_provider_example():
      # Define different models from various providers
      models = [
          "gpt-4",                          # OpenAI
          "claude-3-opus-20240229",         # Anthropic
          "command-nightly",                # Cohere
          "gemini-pro",                     # Google
          "llama-2-70b-chat",              # Meta (via Replicate)
      ]

      prompt = "Explain the benefits of microservices architecture in 2 sentences."

      results = []
      for model in models:
          try:
              print(f"Testing {model}...")
              response = litellm.completion(
                  model=model,
                  messages=[{"role": "user", "content": prompt}],
                  max_tokens=150,
                  temperature=0.7
              )

              results.append({
                  "model": model,
                  "content": response.choices[0].message.content,
                  "tokens": response.usage.total_tokens,
                  "cost": (response.usage.total_tokens / 1000) * 0.002  # Rough estimate at a flat $0.002 per 1K tokens
              })

          except Exception as e:
              print(f"Error with {model}: {e}")
              results.append({
                  "model": model,
                  "error": str(e)
              })

      return results

  results = basic_multi_provider_example()
  for result in results:
      if "error" not in result:
          print(f"{result['model']}: {result['tokens']} tokens, ~${result['cost']:.4f}")
  ```
</CodeGroup>
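
Token-based cost estimates are usually quoted per 1,000 tokens. As a rough sketch of that arithmetic, assuming a single hypothetical rate of $0.002 per 1,000 tokens for every model, the helpers below turn a token count into a cost and aggregate a `results` list shaped like the one in the example above. The names `estimate_cost` and `summarize_results` are illustrative and not part of LiteLLM, and the sample data is made up.

```python
from typing import Any, Dict, List

# Hypothetical flat rate in USD per 1,000 tokens; real prices differ per model.
ASSUMED_RATE_PER_1K = 0.002


def estimate_cost(total_tokens: int, rate_per_1k: float = ASSUMED_RATE_PER_1K) -> float:
    """Convert a token count into an approximate cost in USD."""
    return (total_tokens / 1000) * rate_per_1k


def summarize_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-model result dicts; entries with an 'error' key count as failures."""
    succeeded = [r for r in results if "error" not in r]
    total_tokens = sum(r["tokens"] for r in succeeded)
    return {
        "succeeded": len(succeeded),
        "failed": len(results) - len(succeeded),
        "total_tokens": total_tokens,
        "estimated_cost": estimate_cost(total_tokens),
    }


# Made-up sample data, not real model output
sample = [
    {"model": "gpt-4", "content": "...", "tokens": 120},
    {"model": "command-nightly", "error": "missing API key"},
    {"model": "gemini-pro", "content": "...", "tokens": 80},
]
summary = summarize_results(sample)
print(summary["succeeded"], summary["failed"], summary["total_tokens"])
```

A flat rate keeps the example short. Per-model rates, such as a lookup table keyed by model name, give a closer estimate when providers differ widely in price.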

**Cost Optimization with Provider Fallback**

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  import litellm
  from typing import Any, Dict, List, Optional

  # One line enables OpenTelemetry logging of LLM responses across all providers:
  litellm.callbacks = ["otel"]

  def cost_optimized_completion(
      messages: List[Dict[str, str]],
      fallback_models: Optional[List[str]] = None,
      max_tokens: int = 100
  ) -> Dict[str, Any]:
      """
      Try models in priority order, falling back to the next model on failure
      """
      if fallback_models is None:
          fallback_models = [
              "gpt-3.5-turbo",      # Low-cost OpenAI option
              "claude-3-haiku-20240307",  # Anthropic's fastest/cheapest
              "command",             # Cohere
              "gpt-4o-mini",        # OpenAI's smaller model
              "gpt-4",              # Fallback to premium if needed
          ]

      for i, model in enumerate(fallback_models):
          try:
              print(f"Attempting {model} (priority {i+1})...")

              response = litellm.completion(
                  model=model,
                  messages=messages,
                  max_tokens=max_tokens,
                  temperature=0.7
              )

              # Calculate approximate cost (rough estimates)
              cost_per_1k_tokens = {
                  "gpt-3.5-turbo": 0.002,
                  "gpt-4o-mini": 0.0015,
                  "gpt-4": 0.03,
                  "claude-3-haiku-20240307": 0.00025,
                  "claude-3-sonnet-20240229": 0.003,
                  "command": 0.015
              }

              estimated_cost = (response.usage.total_tokens / 1000) * cost_per_1k_tokens.get(model, 0.002)

              return {
                  "success": True,
                  "model_used": model,
                  "content": response.choices[0].message.content,
                  "tokens": response.usage.total_tokens,
                  "estimated_cost": estimated_cost,
                  "attempt_number": i + 1
              }

          except Exception as e:
              print(f"Failed with {model}: {e}")
              if i == len(fallback_models) - 1:  # Last attempt
                  return {
                      "success": False,
                      "error": f"All models failed. Last error: {e}",
                      "attempts": len(fallback_models)
                  }
              continue

      return {"success": False, "error": "No models available"}

  # Test cost optimization
  result = cost_optimized_completion([
      {"role": "user", "content": "Summarize the key benefits of using Docker containers for development"}
  ])

  if result["success"]:
      print(f"Success with {result['model_used']} on attempt {result['attempt_number']}")
      print(f"Cost: ~${result['estimated_cost']:.4f}, Tokens: {result['tokens']}")
      print(f"Response: {result['content'][:100]}...")
  else:
      # Surface the failure instead of exiting silently
      print(f"Request failed: {result['error']}")
  ```
</CodeGroup>
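
The fallback control flow can be exercised without calling any provider by passing the completion function in as an argument. This sketch is a simplification under stated assumptions: `first_successful` and the stub `fake_complete` are hypothetical names, and the stub only simulates a failure for one model so that the fallback path runs.

```python
from typing import Callable, List, Optional, Tuple


def first_successful(
    models: List[str],
    complete: Callable[[str], str],
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Try each model in order; return (model, output, errors) for the first success."""
    errors: List[str] = []
    for model in models:
        try:
            return model, complete(model), errors
        except Exception as exc:  # Broad on purpose: any failure triggers the fallback
            errors.append(f"{model}: {exc}")
    # Every model failed, or the list was empty
    return None, None, errors


def fake_complete(model: str) -> str:
    """Stand-in for a real completion call; fails for one model to force a fallback."""
    if model == "gpt-3.5-turbo":
        raise RuntimeError("simulated rate limit")
    return f"response from {model}"


model, output, errors = first_successful(
    ["gpt-3.5-turbo", "claude-3-haiku-20240307"], fake_complete
)
print(model, output, errors)
```

In real use, `complete` could wrap a `litellm.completion(model=model, messages=...)` call. Keeping the provider call injectable makes the ordering and error handling easy to test offline.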

### View Traces

Head to the [Traces](/docs/observability/traces) tab to view LiteLLM traces in the AI Studio.

<img src="https://mintcdn.com/orqai/8ublVIDMeb653NWy/images/docs/4f393acb185bb7a48d77798bb71ed3d5e53e2d5a64caa019158ac2a949a0a291-Screenshot_2025-10-06_at_15.56.09.png?fit=max&auto=format&n=8ublVIDMeb653NWy&q=85&s=e4aff60e61ea52aac10d410b51cad7e2" alt="View Traces" width="1101" height="639" data-path="images/docs/4f393acb185bb7a48d77798bb71ed3d5e53e2d5a64caa019158ac2a949a0a291-Screenshot_2025-10-06_at_15.56.09.png" />

## Evaluations & Experiments

Once your agents are running, use **Evaluatorq** to score outputs across a dataset and **Experiments** to compare configurations side-by-side.

<CardGroup cols={2}>
  <Card title="Run Evaluations with Evaluatorq" icon="flask" href="/docs/evaluators/build#evaluatorq">
    Run parallel evaluations across your agents and compare results.
  </Card>

  <Card title="Run Experiments via the API" icon="flask-vial" href="/docs/experiments/api">
    Compare agent configurations and view results in the AI Studio.
  </Card>
</CardGroup>
