LiteLLM

Integrate Orq.ai with LiteLLM using OpenTelemetry

Getting Started

LiteLLM provides a unified interface for multiple LLM providers, enabling seamless switching between OpenAI, Anthropic, Cohere, and 100+ other providers. Tracing LiteLLM with Orq.ai gives you comprehensive insights into provider performance, cost optimization, routing decisions, and API reliability across your multi-provider setup.

Prerequisites

Before you begin, ensure you have:

  • An Orq.ai account and API key
  • LiteLLM installed in your project
  • Python 3.8+
  • API keys for your LLM providers (OpenAI, Anthropic, Cohere, etc.)

Install Dependencies

# Core LiteLLM and OpenTelemetry packages
pip install litellm opentelemetry-sdk opentelemetry-exporter-otlp

# Additional instrumentation packages (install the one that matches your chosen framework)
pip install openlit traceloop-sdk logfire

# Optional: Specific provider packages
pip install openai anthropic cohere
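
To confirm the core packages are importable before wiring up tracing, an optional quick check:

python -c "import litellm, openlit; print('litellm and openlit import OK')"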

Configure Orq.ai

Set up your environment variables to connect to Orq.ai's OpenTelemetry collector:

Unix/Linux/macOS:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=litellm-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
export ANTHROPIC_API_KEY="<YOUR_ANTHROPIC_API_KEY>"
export COHERE_API_KEY="<YOUR_COHERE_API_KEY>"

Windows (PowerShell):

$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=litellm-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
$env:ANTHROPIC_API_KEY = "<YOUR_ANTHROPIC_API_KEY>"
$env:COHERE_API_KEY = "<YOUR_COHERE_API_KEY>"

Using .env file:

OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=litellm-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
ANTHROPIC_API_KEY=<YOUR_ANTHROPIC_API_KEY>
COHERE_API_KEY=<YOUR_COHERE_API_KEY>
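
If you use the .env approach, load the file into the process environment before initializing tracing. A minimal sketch using python-dotenv (an extra package, not part of the install step above):

from dotenv import load_dotenv

# Populate os.environ from .env so the OTLP exporter and provider SDKs
# can pick up the endpoint, headers, and API keys defined above
load_dotenv()

import openlit
import litellm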

Integrations

Choose your preferred OpenTelemetry framework for collecting traces:

OpenLit

Auto-instrumentation with minimal setup:

import openlit
import litellm

# Initialize OpenLit
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

# Your LiteLLM code is automatically traced
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)

OpenLLMetry

Non-intrusive tracing with decorators:

from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
import litellm

Traceloop.init()

@workflow(name="litellm-multi-provider-workflow")
def multi_provider_comparison():
    providers = ["gpt-4", "claude-3-sonnet-20240229", "command-nightly"]
    results = []

    for provider in providers:
        try:
            response = litellm.completion(
                model=provider,
                messages=[{"role": "user", "content": "Explain quantum computing in one sentence"}],
                max_tokens=100
            )
            results.append({
                "provider": provider,
                "response": response.choices[0].message.content,
                "tokens": response.usage.total_tokens
            })
        except Exception as e:
            results.append({
                "provider": provider,
                "error": str(e)
            })

    return results

comparison = multi_provider_comparison()

Logfire

Pydantic-based observability:

import logfire
import litellm

logfire.configure()

def litellm_with_logfire():
    with logfire.span("litellm-completion") as span:
        response = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "What is the weather like today?"}]
        )

        span.set_attribute("model", "gpt-3.5-turbo")
        span.set_attribute("tokens_used", response.usage.total_tokens)
        span.set_attribute("cost", response.usage.total_tokens * 0.001)  # Rough estimate

        return response

result = litellm_with_logfire()

OpenLLMetry (Async)

OpenLLMetry also instruments async LiteLLM calls automatically:

from traceloop.sdk import Traceloop
import litellm
import asyncio

# Initialize tracing
Traceloop.init(
    app_name="litellm-app",
    disable_batch=True
)

async def async_litellm_example():
    # Async LiteLLM calls are automatically traced
    tasks = []
    models = ["gpt-3.5-turbo", "claude-3-haiku-20240307", "gemini-pro"]

    for model in models:
        task = litellm.acompletion(
            model=model,
            messages=[{"role": "user", "content": "Write a haiku about programming"}],
            max_tokens=50
        )
        tasks.append(task)

    responses = await asyncio.gather(*tasks, return_exceptions=True)

    results = []
    for i, response in enumerate(responses):
        if isinstance(response, Exception):
            results.append({"model": models[i], "error": str(response)})
        else:
            results.append({
                "model": models[i],
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens
            })

    return results

# Run async example
async_results = asyncio.run(async_litellm_example())

Examples

Basic Multi-Provider Usage

import openlit
import litellm

# Initialize tracing
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

def basic_multi_provider_example():
    # Define different models from various providers
    models = [
        "gpt-4",                          # OpenAI
        "claude-3-opus-20240229",         # Anthropic
        "command-nightly",                # Cohere
        "gemini-pro",                     # Google
        "llama-2-70b-chat",              # Meta (via Replicate)
    ]

    prompt = "Explain the benefits of microservices architecture in 2 sentences."

    results = []
    for model in models:
        try:
            print(f"Testing {model}...")
            response = litellm.completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=150,
                temperature=0.7
            )

            results.append({
                "model": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "cost": response.usage.total_tokens * 0.002  # Estimated cost
            })

        except Exception as e:
            print(f"Error with {model}: {e}")
            results.append({
                "model": model,
                "error": str(e)
            })

    return results

results = basic_multi_provider_example()
for result in results:
    if "error" not in result:
        print(f"{result['model']}: {result['tokens']} tokens, ~${result['cost']:.4f}")

Cost Optimization with Provider Fallback

import openlit
import litellm
from typing import List, Dict, Any, Optional

openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

def cost_optimized_completion(
    messages: List[Dict[str, str]],
    fallback_models: Optional[List[str]] = None,
    max_tokens: int = 100
) -> Dict[str, Any]:
    """
    Try models in order of cost efficiency with fallback options
    """
    if fallback_models is None:
        fallback_models = [
            "gpt-3.5-turbo",      # Cheapest OpenAI option
            "claude-3-haiku-20240307",  # Anthropic's fastest/cheapest
            "command",             # Cohere
            "gpt-4o-mini",        # OpenAI's smaller model
            "gpt-4",              # Fallback to premium if needed
        ]

    for i, model in enumerate(fallback_models):
        try:
            print(f"Attempting {model} (priority {i+1})...")

            response = litellm.completion(
                model=model,
                messages=messages,
                max_tokens=max_tokens,
                temperature=0.7
            )

            # Calculate approximate cost (rough estimates)
            cost_per_1k_tokens = {
                "gpt-3.5-turbo": 0.002,
                "gpt-4o-mini": 0.0015,
                "gpt-4": 0.03,
                "claude-3-haiku-20240307": 0.00025,
                "claude-3-sonnet-20240229": 0.003,
                "command": 0.015
            }

            estimated_cost = (response.usage.total_tokens / 1000) * cost_per_1k_tokens.get(model, 0.002)

            return {
                "success": True,
                "model_used": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "estimated_cost": estimated_cost,
                "attempt_number": i + 1
            }

        except Exception as e:
            print(f"Failed with {model}: {e}")
            if i == len(fallback_models) - 1:  # Last attempt
                return {
                    "success": False,
                    "error": f"All models failed. Last error: {e}",
                    "attempts": len(fallback_models)
                }
            continue

    return {"success": False, "error": "No models available"}

# Test cost optimization
result = cost_optimized_completion([
    {"role": "user", "content": "Summarize the key benefits of using Docker containers for development"}
])

if result["success"]:
    print(f"Success with {result['model_used']} on attempt {result['attempt_number']}")
    print(f"Cost: ~${result['estimated_cost']:.4f}, Tokens: {result['tokens']}")
    print(f"Response: {result['content'][:100]}...")

Custom Spans for Provider Performance Monitoring

from opentelemetry import trace
import openlit
import litellm
import time

openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

tracer = trace.get_tracer("litellm")

def provider_performance_benchmark():
    with tracer.start_as_current_span("provider-benchmark") as benchmark_span:
        providers = [
            {"model": "gpt-3.5-turbo", "provider": "openai"},
            {"model": "claude-3-haiku-20240307", "provider": "anthropic"},
            {"model": "command", "provider": "cohere"},
        ]

        benchmark_results = []

        for provider_info in providers:
            with tracer.start_as_current_span(f"test-{provider_info['provider']}") as provider_span:
                provider_span.set_attribute("provider.name", provider_info["provider"])
                provider_span.set_attribute("model.name", provider_info["model"])

                start_time = time.time()

                try:
                    response = litellm.completion(
                        model=provider_info["model"],
                        messages=[{
                            "role": "user",
                            "content": "Write a brief explanation of machine learning in exactly 50 words."
                        }],
                        max_tokens=75,
                        temperature=0.5
                    )

                    end_time = time.time()
                    response_time = end_time - start_time

                    provider_span.set_attribute("response.success", True)
                    provider_span.set_attribute("response.time_seconds", response_time)
                    provider_span.set_attribute("tokens.total", response.usage.total_tokens)
                    provider_span.set_attribute("tokens.prompt", response.usage.prompt_tokens)
                    provider_span.set_attribute("tokens.completion", response.usage.completion_tokens)

                    benchmark_results.append({
                        "provider": provider_info["provider"],
                        "model": provider_info["model"],
                        "success": True,
                        "response_time": response_time,
                        "tokens": response.usage.total_tokens,
                        "content": response.choices[0].message.content
                    })

                except Exception as e:
                    end_time = time.time()
                    response_time = end_time - start_time

                    provider_span.record_exception(e)
                    provider_span.set_attribute("response.success", False)
                    provider_span.set_attribute("response.time_seconds", response_time)
                    provider_span.set_attribute("error.message", str(e))

                    benchmark_results.append({
                        "provider": provider_info["provider"],
                        "model": provider_info["model"],
                        "success": False,
                        "response_time": response_time,
                        "error": str(e)
                    })

        # Calculate performance metrics
        successful_tests = [r for r in benchmark_results if r["success"]]
        if successful_tests:
            avg_response_time = sum(r["response_time"] for r in successful_tests) / len(successful_tests)
            fastest_provider = min(successful_tests, key=lambda x: x["response_time"])

            benchmark_span.set_attribute("benchmark.total_providers", len(providers))
            benchmark_span.set_attribute("benchmark.successful_providers", len(successful_tests))
            benchmark_span.set_attribute("benchmark.avg_response_time", avg_response_time)
            benchmark_span.set_attribute("benchmark.fastest_provider", fastest_provider["provider"])

        return benchmark_results

benchmark_results = provider_performance_benchmark()

# Print results
print("\n=== Provider Performance Benchmark ===")
for result in benchmark_results:
    if result["success"]:
        print(f"{result['provider']:12} | {result['response_time']:.2f}s | {result['tokens']} tokens")
    else:
        print(f"{result['provider']:12} | FAILED | {result['error'][:50]}...")

Streaming Responses with Multiple Providers

import openlit
import litellm

openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

def streaming_comparison():
    """Compare streaming capabilities across providers"""

    providers = ["gpt-3.5-turbo", "claude-3-haiku-20240307"]
    prompt = "Write a short story about a robot learning to paint."

    for provider in providers:
        print(f"\n--- Streaming from {provider} ---")
        try:
            response = litellm.completion(
                model=provider,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200,
                stream=True,
                temperature=0.8
            )

            full_response = ""
            chunk_count = 0

            for chunk in response:
                if chunk.choices[0].delta.content:
                    content = chunk.choices[0].delta.content
                    full_response += content
                    print(content, end='', flush=True)
                    chunk_count += 1

            print(f"\n\n[Completed: {chunk_count} chunks, {len(full_response)} characters]")

        except Exception as e:
            print(f"Streaming failed for {provider}: {e}")

streaming_comparison()

Router Configuration with Load Balancing

import openlit
import litellm

openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

def setup_litellm_router():
    """Configure LiteLLM router for load balancing and fallbacks"""

    # Define model configurations
    model_list = [
        {
            "model_name": "gpt-4-turbo",  # Model alias
            "litellm_params": {
                "model": "gpt-4-1106-preview",
                "api_key": "sk-...",  # Your OpenAI API key
            },
        },
        {
            "model_name": "gpt-4-turbo",  # Same alias for load balancing
            "litellm_params": {
                "model": "gpt-4-1106-preview",
                "api_key": "sk-...",  # Different API key for load balancing
                "api_base": "https://api.openai.com/v1"
            },
        },
        {
            "model_name": "claude-opus",
            "litellm_params": {
                "model": "claude-3-opus-20240229",
                "api_key": "sk-ant-...",
            },
        }
    ]

    # Initialize router
    router = litellm.Router(model_list=model_list)

    return router

def router_example():
    router = setup_litellm_router()

    # Use router for load-balanced requests
    response = router.completion(
        model="gpt-4-turbo",  # Uses alias, automatically load balances
        messages=[{"role": "user", "content": "Explain the concept of distributed systems"}],
        max_tokens=150
    )

    print("Router response:", response.choices[0].message.content)

    # Router automatically handles failover
    try:
        response = router.completion(
            model="claude-opus",
            messages=[{"role": "user", "content": "What are the benefits of functional programming?"}],
            max_tokens=150
        )
        print("Claude response:", response.choices[0].message.content)
    except Exception as e:
        print(f"Router handled error: {e}")

router_example()
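
The Router also supports retries and alias-level fallbacks; the exact parameters can vary between LiteLLM versions, so treat the following as a sketch and check the LiteLLM Router documentation for your installed version:

import litellm

# Sketch: retry transient errors on "gpt-4-turbo" deployments and fall back
# to the "claude-opus" alias if they all fail (parameter names assumed from
# LiteLLM's Router docs; verify against your installed version)
router = litellm.Router(
    model_list=model_list,  # same model_list as in setup_litellm_router()
    num_retries=2,
    fallbacks=[{"gpt-4-turbo": ["claude-opus"]}]
)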

Next Steps

  • Verify traces: Check your Orq.ai dashboard to see incoming traces
  • Add custom attributes: Enhance traces with business-specific metadata
  • Set up alerts: Configure monitoring for performance degradation
  • Explore metrics: Use trace data for performance optimization
