Orq.ai Documentation - AI Gateway & LLM Collaboration Platform

Observability

Instrument your code with OpenTelemetry to capture traces, logs, and metrics for every LLM call, agent step, and tool use.

Observability

Getting Started

LiteLLM provides a unified interface for multiple LLM providers, enabling seamless switching between OpenAI, Anthropic, Cohere, and 100+ other providers. Tracing LiteLLM with Orq.ai gives you comprehensive insights into provider performance, cost optimization, routing decisions, and API reliability across your multi-provider setup.

Prerequisites

Before you begin, ensure you have:

An Orq.ai account and API Key
LiteLLM installed in your project
Python 3.8+
API keys for your LLM providers (OpenAI, Anthropic, Cohere, etc.)

Install Dependencies

# Core LiteLLM and OpenTelemetry packages
pip install litellm opentelemetry-sdk opentelemetry-exporter-otlp
pip install 'litellm[proxy]'

Configure Orq.ai

Set up your environment variables to connect to Orq.ai’s OpenTelemetry collector: Unix/Linux/macOS:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel/v1/traces"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=litellm-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Windows (PowerShell):

$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel/v1/traces"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=litellm-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"

Using .env file:

OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel/v1/traces
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=litellm-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>

Integrations

Choose your preferred OpenTelemetry framework for collecting traces:

LiteLLM

Auto-instrumentation with minimal setup:

import litellm

# Use just 1 line of code, to instantly log your LLM responses across all providers with OpenTelemetry:
litellm.callbacks = ["otel"]

# Your LiteLLM code is automatically traced
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)

Examples

Basic Multi-Provider Usage

import litellm

# Use just 1 line of code, to instantly log your LLM responses across all providers with OpenTelemetry:
litellm.callbacks = ["otel"]

def basic_multi_provider_example():
    # Define different models from various providers
    models = [
        "gpt-4",                          # OpenAI
        "claude-3-opus-20240229",         # Anthropic
        "command-nightly",                # Cohere
        "gemini-pro",                     # Google
        "llama-2-70b-chat",              # Meta (via Replicate)
    ]

    prompt = "Explain the benefits of microservices architecture in 2 sentences."

    results = []
    for model in models:
        try:
            print(f"Testing {model}...")
            response = litellm.completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=150,
                temperature=0.7
            )

            results.append({
                "model": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "cost": response.usage.total_tokens * 0.002  # Estimated cost
            })

        except Exception as e:
            print(f"Error with {model}: {e}")
            results.append({
                "model": model,
                "error": str(e)
            })

    return results

results = basic_multi_provider_example()
for result in results:
    if "error" not in result:
        print(f"{result['model']}: {result['tokens']} tokens, ~${result['cost']:.4f}")

Cost Optimization with Provider Fallback

import openlit
import litellm
from typing import List, Dict, Any

# Use just 1 line of code, to instantly log your LLM responses across all providers with OpenTelemetry:
litellm.callbacks = ["otel"]

def cost_optimized_completion(
    messages: List[Dict[str, str]],
    fallback_models: List[str] = None,
    max_tokens: int = 100
) -> Dict[str, Any]:
    """
    Try models in order of cost efficiency with fallback options
    """
    if fallback_models is None:
        fallback_models = [
            "gpt-3.5-turbo",      # Cheapest OpenAI option
            "claude-3-haiku-20240307",  # Anthropic's fastest/cheapest
            "command",             # Cohere
            "gpt-4o-mini",        # OpenAI's smaller model
            "gpt-4",              # Fallback to premium if needed
        ]

    for i, model in enumerate(fallback_models):
        try:
            print(f"Attempting {model} (priority {i+1})...")

            response = litellm.completion(
                model=model,
                messages=messages,
                max_tokens=max_tokens,
                temperature=0.7
            )

            # Calculate approximate cost (rough estimates)
            cost_per_1k_tokens = {
                "gpt-3.5-turbo": 0.002,
                "gpt-4o-mini": 0.0015,
                "gpt-4": 0.03,
                "claude-3-haiku-20240307": 0.00025,
                "claude-3-sonnet-20240229": 0.003,
                "command": 0.015
            }

            estimated_cost = (response.usage.total_tokens / 1000) * cost_per_1k_tokens.get(model, 0.002)

            return {
                "success": True,
                "model_used": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "estimated_cost": estimated_cost,
                "attempt_number": i + 1
            }

        except Exception as e:
            print(f"Failed with {model}: {e}")
            if i == len(fallback_models) - 1:  # Last attempt
                return {
                    "success": False,
                    "error": f"All models failed. Last error: {e}",
                    "attempts": len(fallback_models)
                }
            continue

    return {"success": False, "error": "No models available"}

# Test cost optimization
result = cost_optimized_completion([
    {"role": "user", "content": "Summarize the key benefits of using Docker containers for development"}
])

if result["success"]:
    print(f"Success with {result['model_used']} on attempt {result['attempt_number']}")
    print(f"Cost: ~${result['estimated_cost']:.4f}, Tokens: {result['tokens']}")
    print(f"Response: {result['content'][:100]}...")

View Traces

Head to the Traces tab to view LiteLLM traces in the AI Studio.

Evaluations & Experiments

Once your agents are running, use Evaluatorq to score outputs across a dataset and Experiments to compare configurations side-by-side.

Run Evaluations with Evaluatorq

Run parallel evaluations across your agents and compare results.

Run Experiments via the API

Compare agent configurations and view results in the AI Studio.

AI Providers

Coding Assistants

Frameworks

Automations

LiteLLM

Observability

Observability

Getting Started

Prerequisites

Install Dependencies

Configure Orq.ai

Integrations

LiteLLM

Examples

View Traces

Evaluations & Experiments

Run Evaluations with Evaluatorq

Run Experiments via the API

AI Providers

Coding Assistants

Frameworks

Automations

Documentation Index

Observability

​Observability

​Getting Started

​Prerequisites

​Install Dependencies

​Configure Orq.ai

​Integrations

​LiteLLM

​Examples

​View Traces

​Evaluations & Experiments

Run Evaluations with Evaluatorq

Run Experiments via the API

Observability

Getting Started

Prerequisites

Install Dependencies

Configure Orq.ai

Integrations

LiteLLM

Examples

View Traces

Evaluations & Experiments