LiteLLM
Integrate Orq.ai with LiteLLM using OpenTelemetry
Getting Started
LiteLLM provides a unified interface to multiple LLM providers, letting you switch between OpenAI, Anthropic, Cohere, and 100+ other providers without changing your calling code. Tracing LiteLLM with Orq.ai gives you insight into provider performance, cost, routing decisions, and API reliability across your multi-provider setup.
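For example, switching providers is usually just a change of the model string in the same completion call; a minimal sketch (model names are illustrative, and API keys are read from the environment variables set below):
import litellm

# The call shape stays the same across providers; only the model string changes
openai_reply = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
anthropic_reply = litellm.completion(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(openai_reply.choices[0].message.content)
print(anthropic_reply.choices[0].message.content)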
Prerequisites
Before you begin, ensure you have:
- An Orq.ai account and API key
- LiteLLM installed in your project
- Python 3.8+
- API keys for your LLM providers (OpenAI, Anthropic, Cohere, etc.)
Install Dependencies
# Core LiteLLM and OpenTelemetry packages
pip install litellm opentelemetry-sdk opentelemetry-exporter-otlp
# Additional instrumentation packages
pip install openlit traceloop-sdk
# Optional: Specific provider packages
pip install openai anthropic cohere
Configure Orq.ai
Set up your environment variables to connect to Orq.ai's OpenTelemetry collector:
Unix/Linux/macOS:
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=litellm-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
export ANTHROPIC_API_KEY="<YOUR_ANTHROPIC_API_KEY>"
export COHERE_API_KEY="<YOUR_COHERE_API_KEY>"
Windows (PowerShell):
$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=litellm-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
$env:ANTHROPIC_API_KEY = "<YOUR_ANTHROPIC_API_KEY>"
$env:COHERE_API_KEY = "<YOUR_COHERE_API_KEY>"
Using .env file:
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=litellm-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
ANTHROPIC_API_KEY=<YOUR_ANTHROPIC_API_KEY>
COHERE_API_KEY=<YOUR_COHERE_API_KEY>
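If you go the .env route, load it before initializing any tracing SDK, for example with the python-dotenv package (an assumed extra dependency, not installed above):
# pip install python-dotenv
from dotenv import load_dotenv

# Populate os.environ from .env before openlit/Traceloop/logfire are initialized
load_dotenv()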
Integrations
Choose your preferred OpenTelemetry framework for collecting traces:
OpenLit
Auto-instrumentation with minimal setup:
import openlit
import litellm
# Initialize OpenLit
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

# Your LiteLLM code is automatically traced
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
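If the OTEL_EXPORTER_OTLP_* variables from the previous section are already exported, recent OpenLit releases can typically pick them up, so the explicit arguments become optional (verify against the OpenLit version you use):
import openlit
import litellm

# Endpoint and headers are resolved from OTEL_EXPORTER_OTLP_* environment variables
openlit.init()

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)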
OpenLLMetry
Non-intrusive tracing with decorators:
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
import litellm
Traceloop.init()
@workflow(name="litellm-multi-provider-workflow")
def multi_provider_comparison():
    providers = ["gpt-4", "claude-3-sonnet-20240229", "command-nightly"]
    results = []
    for provider in providers:
        try:
            response = litellm.completion(
                model=provider,
                messages=[{"role": "user", "content": "Explain quantum computing in one sentence"}],
                max_tokens=100
            )
            results.append({
                "provider": provider,
                "response": response.choices[0].message.content,
                "tokens": response.usage.total_tokens
            })
        except Exception as e:
            results.append({
                "provider": provider,
                "error": str(e)
            })
    return results
comparison = multi_provider_comparison()
Logfire
Pydantic-based observability:
import logfire
import litellm
logfire.configure()
def litellm_with_logfire():
    with logfire.span("litellm-completion") as span:
        response = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "What is the weather like today?"}]
        )
        span.set_attribute("model", "gpt-3.5-turbo")
        span.set_attribute("tokens_used", response.usage.total_tokens)
        span.set_attribute("cost", response.usage.total_tokens * 0.001)  # Rough estimate
        return response
result = litellm_with_logfire()
OpenLLMetry (Async)
Comprehensive instrumentation of async LiteLLM calls:
from traceloop.sdk import Traceloop
import litellm
import asyncio
# Initialize tracing
Traceloop.init(
    app_name="litellm-app",
    disable_batch=True
)

async def async_litellm_example():
    # Async LiteLLM calls are automatically traced
    tasks = []
    models = ["gpt-3.5-turbo", "claude-3-haiku-20240307", "gemini-pro"]
    for model in models:
        task = litellm.acompletion(
            model=model,
            messages=[{"role": "user", "content": "Write a haiku about programming"}],
            max_tokens=50
        )
        tasks.append(task)
    responses = await asyncio.gather(*tasks, return_exceptions=True)
    results = []
    for i, response in enumerate(responses):
        if isinstance(response, Exception):
            results.append({"model": models[i], "error": str(response)})
        else:
            results.append({
                "model": models[i],
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens
            })
    return results
# Run async example
async_results = asyncio.run(async_litellm_example())
Examples
Basic Multi-Provider Usage
import openlit
import litellm
# Initialize tracing
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

def basic_multi_provider_example():
    # Define different models from various providers
    models = [
        "gpt-4",                    # OpenAI
        "claude-3-opus-20240229",   # Anthropic
        "command-nightly",          # Cohere
        "gemini-pro",               # Google
        "llama-2-70b-chat",         # Meta (via Replicate)
    ]
    prompt = "Explain the benefits of microservices architecture in 2 sentences."
    results = []
    for model in models:
        try:
            print(f"Testing {model}...")
            response = litellm.completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=150,
                temperature=0.7
            )
            results.append({
                "model": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "cost": response.usage.total_tokens * 0.002  # Estimated cost
            })
        except Exception as e:
            print(f"Error with {model}: {e}")
            results.append({
                "model": model,
                "error": str(e)
            })
    return results

results = basic_multi_provider_example()
for result in results:
    if "error" not in result:
        print(f"{result['model']}: {result['tokens']} tokens, ~${result['cost']:.4f}")
Cost Optimization with Provider Fallback
import openlit
import litellm
from typing import List, Dict, Any
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
def cost_optimized_completion(
    messages: List[Dict[str, str]],
    fallback_models: List[str] = None,
    max_tokens: int = 100
) -> Dict[str, Any]:
    """
    Try models in order of cost efficiency with fallback options
    """
    if fallback_models is None:
        fallback_models = [
            "gpt-3.5-turbo",             # Cheapest OpenAI option
            "claude-3-haiku-20240307",   # Anthropic's fastest/cheapest
            "command",                   # Cohere
            "gpt-4o-mini",               # OpenAI's smaller model
            "gpt-4",                     # Fallback to premium if needed
        ]
    for i, model in enumerate(fallback_models):
        try:
            print(f"Attempting {model} (priority {i+1})...")
            response = litellm.completion(
                model=model,
                messages=messages,
                max_tokens=max_tokens,
                temperature=0.7
            )
            # Calculate approximate cost (rough estimates)
            cost_per_1k_tokens = {
                "gpt-3.5-turbo": 0.002,
                "gpt-4o-mini": 0.0015,
                "gpt-4": 0.03,
                "claude-3-haiku-20240307": 0.00025,
                "claude-3-sonnet-20240229": 0.003,
                "command": 0.015
            }
            estimated_cost = (response.usage.total_tokens / 1000) * cost_per_1k_tokens.get(model, 0.002)
            return {
                "success": True,
                "model_used": model,
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "estimated_cost": estimated_cost,
                "attempt_number": i + 1
            }
        except Exception as e:
            print(f"Failed with {model}: {e}")
            if i == len(fallback_models) - 1:  # Last attempt
                return {
                    "success": False,
                    "error": f"All models failed. Last error: {e}",
                    "attempts": len(fallback_models)
                }
            continue
    return {"success": False, "error": "No models available"}
# Test cost optimization
result = cost_optimized_completion([
    {"role": "user", "content": "Summarize the key benefits of using Docker containers for development"}
])

if result["success"]:
    print(f"Success with {result['model_used']} on attempt {result['attempt_number']}")
    print(f"Cost: ~${result['estimated_cost']:.4f}, Tokens: {result['tokens']}")
    print(f"Response: {result['content'][:100]}...")
Custom Spans for Provider Performance Monitoring
from opentelemetry import trace
import openlit
import litellm
import time
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
tracer = trace.get_tracer("litellm")
def provider_performance_benchmark():
    with tracer.start_as_current_span("provider-benchmark") as benchmark_span:
        providers = [
            {"model": "gpt-3.5-turbo", "provider": "openai"},
            {"model": "claude-3-haiku-20240307", "provider": "anthropic"},
            {"model": "command", "provider": "cohere"},
        ]
        benchmark_results = []
        for provider_info in providers:
            with tracer.start_as_current_span(f"test-{provider_info['provider']}") as provider_span:
                provider_span.set_attribute("provider.name", provider_info["provider"])
                provider_span.set_attribute("model.name", provider_info["model"])
                start_time = time.time()
                try:
                    response = litellm.completion(
                        model=provider_info["model"],
                        messages=[{
                            "role": "user",
                            "content": "Write a brief explanation of machine learning in exactly 50 words."
                        }],
                        max_tokens=75,
                        temperature=0.5
                    )
                    end_time = time.time()
                    response_time = end_time - start_time
                    provider_span.set_attribute("response.success", True)
                    provider_span.set_attribute("response.time_seconds", response_time)
                    provider_span.set_attribute("tokens.total", response.usage.total_tokens)
                    provider_span.set_attribute("tokens.prompt", response.usage.prompt_tokens)
                    provider_span.set_attribute("tokens.completion", response.usage.completion_tokens)
                    benchmark_results.append({
                        "provider": provider_info["provider"],
                        "model": provider_info["model"],
                        "success": True,
                        "response_time": response_time,
                        "tokens": response.usage.total_tokens,
                        "content": response.choices[0].message.content
                    })
                except Exception as e:
                    end_time = time.time()
                    response_time = end_time - start_time
                    provider_span.record_exception(e)
                    provider_span.set_attribute("response.success", False)
                    provider_span.set_attribute("response.time_seconds", response_time)
                    provider_span.set_attribute("error.message", str(e))
                    benchmark_results.append({
                        "provider": provider_info["provider"],
                        "model": provider_info["model"],
                        "success": False,
                        "response_time": response_time,
                        "error": str(e)
                    })
        # Calculate performance metrics
        successful_tests = [r for r in benchmark_results if r["success"]]
        if successful_tests:
            avg_response_time = sum(r["response_time"] for r in successful_tests) / len(successful_tests)
            fastest_provider = min(successful_tests, key=lambda x: x["response_time"])
            benchmark_span.set_attribute("benchmark.total_providers", len(providers))
            benchmark_span.set_attribute("benchmark.successful_providers", len(successful_tests))
            benchmark_span.set_attribute("benchmark.avg_response_time", avg_response_time)
            benchmark_span.set_attribute("benchmark.fastest_provider", fastest_provider["provider"])
        return benchmark_results
benchmark_results = provider_performance_benchmark()
# Print results
print("\n=== Provider Performance Benchmark ===")
for result in benchmark_results:
if result["success"]:
print(f"{result['provider']:12} | {result['response_time']:.2f}s | {result['tokens']} tokens")
else:
print(f"{result['provider']:12} | FAILED | {result['error'][:50]}...")
Streaming Responses with Multiple Providers
import openlit
import litellm
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
def streaming_comparison():
"""Compare streaming capabilities across providers"""
providers = ["gpt-3.5-turbo", "claude-3-haiku-20240307"]
prompt = "Write a short story about a robot learning to paint."
for provider in providers:
print(f"\n--- Streaming from {provider} ---")
try:
response = litellm.completion(
model=provider,
messages=[{"role": "user", "content": prompt}],
max_tokens=200,
stream=True,
temperature=0.8
)
full_response = ""
chunk_count = 0
for chunk in response:
if chunk.choices[0].delta.content:
content = chunk.choices[0].delta.content
full_response += content
print(content, end='', flush=True)
chunk_count += 1
print(f"\n\n[Completed: {chunk_count} chunks, {len(full_response)} characters]")
except Exception as e:
print(f"Streaming failed for {provider}: {e}")
streaming_comparison()
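Streaming also works with the async client; a minimal sketch combining litellm.acompletion with stream=True (model name is illustrative):
import asyncio
import litellm

async def stream_async(model: str = "gpt-3.5-turbo"):
    # With stream=True, the awaited acompletion call returns an async iterator of chunks
    response = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": "Write a short story about a robot learning to paint."}],
        max_tokens=200,
        stream=True
    )
    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_async())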
Router Configuration with Load Balancing
import openlit
import litellm
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
def setup_litellm_router():
"""Configure LiteLLM router for load balancing and fallbacks"""
# Define model configurations
model_list = [
{
"model_name": "gpt-4-turbo", # Model alias
"litellm_params": {
"model": "gpt-4-1106-preview",
"api_key": "sk-...", # Your OpenAI API key
},
},
{
"model_name": "gpt-4-turbo", # Same alias for load balancing
"litellm_params": {
"model": "gpt-4-1106-preview",
"api_key": "sk-...", # Different API key for load balancing
"api_base": "https://api.openai.com/v1"
},
},
{
"model_name": "claude-opus",
"litellm_params": {
"model": "claude-3-opus-20240229",
"api_key": "sk-ant-...",
},
}
]
# Initialize router
router = litellm.Router(model_list=model_list)
return router
def router_example():
    router = setup_litellm_router()

    # Use router for load-balanced requests
    response = router.completion(
        model="gpt-4-turbo",  # Uses alias, automatically load balances
        messages=[{"role": "user", "content": "Explain the concept of distributed systems"}],
        max_tokens=150
    )
    print("Router response:", response.choices[0].message.content)

    # Router automatically handles failover
    try:
        response = router.completion(
            model="claude-opus",
            messages=[{"role": "user", "content": "What are the benefits of functional programming?"}],
            max_tokens=150
        )
        print("Claude response:", response.choices[0].message.content)
    except Exception as e:
        print(f"Router handled error: {e}")
router_example()
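For explicit failover between aliases, the Router also accepts fallbacks and retry settings; a minimal sketch assuming the aliases above (check the option names against your LiteLLM version; api_key entries are omitted here so keys come from the environment):
import litellm

# Minimal aliases for illustration; reuse your full model_list in practice
model_list = [
    {"model_name": "gpt-4-turbo", "litellm_params": {"model": "gpt-4-1106-preview"}},
    {"model_name": "claude-opus", "litellm_params": {"model": "claude-3-opus-20240229"}},
]

# If a "gpt-4-turbo" call keeps failing, retry, then fall back to "claude-opus"
router = litellm.Router(
    model_list=model_list,
    fallbacks=[{"gpt-4-turbo": ["claude-opus"]}],
    num_retries=2
)

response = router.completion(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Explain eventual consistency briefly"}]
)
print(response.choices[0].message.content)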
Next Steps
✅ Verify traces: Check your Orq.ai dashboard to see incoming traces
✅ Add custom attributes: Enhance traces with business-specific metadata (see the sketch below)
✅ Set up alerts: Configure monitoring for performance degradation
✅ Explore metrics: Use trace data for performance optimization
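For the custom-attributes step, a minimal sketch using the standard OpenTelemetry API (span and attribute names here are illustrative):
from opentelemetry import trace

tracer = trace.get_tracer("litellm")

with tracer.start_as_current_span("checkout-flow") as span:
    # Attach business-specific metadata to the active span
    span.set_attribute("customer.tier", "enterprise")
    span.set_attribute("feature.flag", "multi_provider_routing")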