import openlit
import litellm
from typing import List, Dict, Any, Optional
# Use just one line of code to instantly log your LLM responses across all providers with OpenTelemetry:
litellm.callbacks = ["otel"]
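# Optionally initialize the OpenLIT SDK so the spans emitted by the callback are exported.
# Minimal sketch: assumes an OTLP collector is reachable at OpenLIT's default endpoint
# (or one configured via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable).
openlit.init()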
def cost_optimized_completion(
    messages: List[Dict[str, str]],
    fallback_models: Optional[List[str]] = None,
    max_tokens: int = 100
) -> Dict[str, Any]:
    """
    Try models in order of cost efficiency with fallback options
    """
    if fallback_models is None:
        fallback_models = [
            "claude-3-haiku-20240307",  # Anthropic's fastest/cheapest option
            "gpt-4o-mini",              # OpenAI's cheapest small model
            "gpt-3.5-turbo",            # Low-cost legacy OpenAI option
            "command",                  # Cohere
            "gpt-4",                    # Premium fallback if everything else fails
        ]
    for i, model in enumerate(fallback_models):
        try:
            print(f"Attempting {model} (priority {i+1})...")
            response = litellm.completion(
                model=model,
                messages=messages,
                max_tokens=max_tokens,
                temperature=0.7
            )
            # Approximate blended cost per 1K tokens (rough estimates, not exact provider pricing)
            cost_per_1k_tokens = {
                "gpt-3.5-turbo": 0.002,
                "gpt-4o-mini": 0.0015,
                "gpt-4": 0.03,
                "claude-3-haiku-20240307": 0.00025,
                "claude-3-sonnet-20240229": 0.003,
                "command": 0.015
            }
            estimated_cost = (response.usage.total_tokens / 1000) * cost_per_1k_tokens.get(model, 0.002)
            return {
                "success": True,
                "model_used": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "estimated_cost": estimated_cost,
                "attempt_number": i + 1
            }
        except Exception as e:
            print(f"Failed with {model}: {e}")
            if i == len(fallback_models) - 1:  # Last attempt
                return {
                    "success": False,
                    "error": f"All models failed. Last error: {e}",
                    "attempts": len(fallback_models)
                }
            continue
    return {"success": False, "error": "No models available"}
# Test cost optimization
result = cost_optimized_completion([
    {"role": "user", "content": "Summarize the key benefits of using Docker containers for development"}
])
if result["success"]:
    print(f"Success with {result['model_used']} on attempt {result['attempt_number']}")
    print(f"Cost: ~${result['estimated_cost']:.4f}, Tokens: {result['tokens']}")
    print(f"Response: {result['content'][:100]}...")