
Observability

Instrument your code with OpenTelemetry to capture traces, logs, and metrics for every LLM call, agent step, and tool use.

Getting Started

LiteLLM provides a unified interface to OpenAI, Anthropic, Cohere, and 100+ other LLM providers, so you can switch providers without changing your code. Tracing LiteLLM with Orq.ai gives you insight into provider performance, cost, routing decisions, and API reliability across your multi-provider setup.

Prerequisites

Before you begin, ensure you have:
  • An Orq.ai account and API Key
  • LiteLLM installed in your project
  • Python 3.8+
  • API keys for your LLM providers (OpenAI, Anthropic, Cohere, etc.)

Install Dependencies

# Core LiteLLM and OpenTelemetry packages
pip install litellm opentelemetry-sdk opentelemetry-exporter-otlp

# Optional: the LiteLLM proxy server
pip install 'litellm[proxy]'

Configure Orq.ai

Set up your environment variables to connect to Orq.ai’s OpenTelemetry collector.

Unix/Linux/macOS:
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel/v1/traces"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=litellm-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Windows (PowerShell):
$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel/v1/traces"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=litellm-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
Using .env file:
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel/v1/traces
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=litellm-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
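
If you prefer configuring the exporter in code rather than through environment variables, the standard OpenTelemetry SDK can be wired up directly. A minimal sketch (the endpoint, header, and resource values mirror the variables above; adjust them to your setup):

# Programmatic alternative to the environment variables above
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(
    resource=Resource.create({"service.name": "litellm-app", "service.version": "1.0.0"})
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://api.orq.ai/v2/otel/v1/traces",
            headers={"Authorization": "Bearer <ORQ_API_KEY>"},
        )
    )
)
trace.set_tracer_provider(provider)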

Integrations

Choose your preferred OpenTelemetry framework for collecting traces:

LiteLLM

Auto-instrumentation with minimal setup:
import litellm

# One line of code enables OpenTelemetry logging across all providers:
litellm.callbacks = ["otel"]

# Your LiteLLM code is automatically traced
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
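
The "otel" callback also covers LiteLLM's async client. A short sketch, assuming the same callback registration as above:

import asyncio
import litellm

litellm.callbacks = ["otel"]

async def main():
    # acompletion is the async counterpart of completion and is traced the same way
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello, how are you?"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())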

Examples

Basic Multi-Provider Usage
import litellm

# One line of code enables OpenTelemetry logging across all providers:
litellm.callbacks = ["otel"]

def basic_multi_provider_example():
    # Define models from different providers; LiteLLM routes by model string,
    # and some providers take a prefix such as "gemini/" or "replicate/"
    models = [
        "gpt-4",                       # OpenAI
        "claude-3-opus-20240229",      # Anthropic
        "command-nightly",             # Cohere
        "gemini/gemini-pro",           # Google
        "replicate/llama-2-70b-chat",  # Meta (via Replicate)
    ]

    prompt = "Explain the benefits of microservices architecture in 2 sentences."

    results = []
    for model in models:
        try:
            print(f"Testing {model}...")
            response = litellm.completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=150,
                temperature=0.7
            )

            results.append({
                "model": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "cost": response.usage.total_tokens * 0.002  # Estimated cost
            })

        except Exception as e:
            print(f"Error with {model}: {e}")
            results.append({
                "model": model,
                "error": str(e)
            })

    return results

results = basic_multi_provider_example()
for result in results:
    if "error" not in result:
        print(f"{result['model']}: {result['tokens']} tokens, ~${result['cost']:.4f}")
Cost Optimization with Provider Fallback
import litellm
from typing import Any, Dict, List, Optional

# One line of code enables OpenTelemetry logging across all providers:
litellm.callbacks = ["otel"]

def cost_optimized_completion(
    messages: List[Dict[str, str]],
    fallback_models: Optional[List[str]] = None,
    max_tokens: int = 100
) -> Dict[str, Any]:
    """
    Try models in order of cost efficiency with fallback options
    """
    if fallback_models is None:
        # Ordered cheapest first, matching the rough per-1K-token estimates below
        fallback_models = [
            "claude-3-haiku-20240307",  # Anthropic's fastest/cheapest
            "gpt-4o-mini",              # OpenAI's smaller, cheaper model
            "gpt-3.5-turbo",            # Cheap OpenAI option
            "command",                  # Cohere
            "gpt-4",                    # Premium fallback if all else fails
        ]

    for i, model in enumerate(fallback_models):
        try:
            print(f"Attempting {model} (priority {i+1})...")

            response = litellm.completion(
                model=model,
                messages=messages,
                max_tokens=max_tokens,
                temperature=0.7
            )

            # Calculate approximate cost (rough estimates)
            cost_per_1k_tokens = {
                "gpt-3.5-turbo": 0.002,
                "gpt-4o-mini": 0.0015,
                "gpt-4": 0.03,
                "claude-3-haiku-20240307": 0.00025,
                "claude-3-sonnet-20240229": 0.003,
                "command": 0.015
            }

            estimated_cost = (response.usage.total_tokens / 1000) * cost_per_1k_tokens.get(model, 0.002)

            return {
                "success": True,
                "model_used": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "estimated_cost": estimated_cost,
                "attempt_number": i + 1
            }

        except Exception as e:
            print(f"Failed with {model}: {e}")
            if i == len(fallback_models) - 1:  # Last attempt
                return {
                    "success": False,
                    "error": f"All models failed. Last error: {e}",
                    "attempts": len(fallback_models)
                }
            continue

    return {"success": False, "error": "No models available"}

# Test cost optimization
result = cost_optimized_completion([
    {"role": "user", "content": "Summarize the key benefits of using Docker containers for development"}
])

if result["success"]:
    print(f"Success with {result['model_used']} on attempt {result['attempt_number']}")
    print(f"Cost: ~${result['estimated_cost']:.4f}, Tokens: {result['tokens']}")
    print(f"Response: {result['content'][:100]}...")

View Traces

Head to the Traces tab to view LiteLLM traces in the AI Studio.

Evaluations & Experiments

Once your agents are running, use Evaluatorq to score outputs across a dataset and Experiments to compare configurations side-by-side.

Run Evaluations with Evaluatorq

Run parallel evaluations across your agents and compare results.

Run Experiments via the API

Compare agent configurations and view results in the AI Studio.