AI Router

Overview

The OpenAI SDK provides powerful tools for building AI applications with GPT models. By connecting the SDK to Orq.ai’s AI Router, you transform your OpenAI integration into a production-ready system with enterprise-grade capabilities, complete observability, and access to 300+ models across 20+ providers.

Key Benefits

Orq.ai’s AI Router enhances your OpenAI SDK with:

Complete Observability

Track every API call, token usage, and model interaction with detailed traces and analytics

Built-in Reliability

Automatic fallbacks, retries, and load balancing for production resilience

Cost Optimization

Real-time cost tracking and spend management across all your AI operations

Multi-Provider Access

Access 300+ LLMs from 20+ providers through a single, unified integration

Prerequisites

Before integrating OpenAI SDK with Orq.ai, ensure you have:
  • An Orq.ai account and API Key
  • Python 3.8+ or Node.js 18+ with TypeScript support
  • OpenAI SDK installed
To set up your API key, see API keys & Endpoints.
To use libraries with private models, see Onboarding Private Models.

Installation

pip install openai

Configuration

When using the OpenAI SDK, point the client's base URL at the AI Router to route calls through our API without changing any other part of your code. Through the Orq.ai AI Router you get Platform Traces and Cost and Usage Monitoring, while keeping full compatibility and a unified API across all models.
base_url: https://api.orq.ai/v2/router

Text Generation

Basic text generation with the OpenAI SDK through Orq.ai:
from openai import OpenAI
import os

client = OpenAI(
  base_url="https://api.orq.ai/v2/router",
  api_key=os.getenv("ORQ_API_KEY"),
)

completion = client.chat.completions.create(
  model="openai/gpt-4o",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message.content)

Streaming Responses

Stream responses for real-time output:
from openai import OpenAI
import os

client = OpenAI(
  base_url="https://api.orq.ai/v2/router",
  api_key=os.getenv("ORQ_API_KEY"),
)

stream = client.chat.completions.create(
  model="openai/gpt-4o",
  messages=[
    {"role": "user", "content": "Write a short story about robots."}
  ],
  stream=True
)

for chunk in stream:
  if chunk.choices and chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="", flush=True)

Model Selection

With Orq.ai, you can use any supported model from 20+ providers:
from openai import OpenAI
import os

client = OpenAI(
  base_url="https://api.orq.ai/v2/router",
  api_key=os.getenv("ORQ_API_KEY"),
)

# Use Claude
claude_response = client.chat.completions.create(
  model="anthropic/claude-sonnet-4-5-20250929",
  messages=[{"role": "user", "content": "Explain machine learning"}]
)

# Use Gemini
gemini_response = client.chat.completions.create(
  model="google/gemini-2.5-flash",
  messages=[{"role": "user", "content": "Explain machine learning"}]
)

# Use Groq
groq_response = client.chat.completions.create(
  model="groq/llama-3.3-70b-versatile",
  messages=[{"role": "user", "content": "Explain machine learning"}]
)
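Because every provider is served through the same OpenAI-compatible interface, a client-side fallback across models takes only a few lines. The AI Router already provides automatic fallbacks server-side, so treat this as an illustrative sketch; `complete_with_fallback` is a hypothetical helper, not part of either SDK:

```python
def complete_with_fallback(create_fn, models, messages):
    """Try each model in order, returning the first successful response.

    create_fn is expected to have the signature of
    client.chat.completions.create (model=..., messages=...).
    """
    last_error = None
    for model in models:
        try:
            return create_fn(model=model, messages=messages)
        except Exception as exc:  # narrow this to API errors in real code
            last_error = exc
    raise last_error

# Example wiring, with client configured as above:
# response = complete_with_fallback(
#     client.chat.completions.create,
#     ["openai/gpt-4o", "anthropic/claude-sonnet-4-5-20250929"],
#     [{"role": "user", "content": "Explain machine learning"}],
# )
```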

Observability

Getting Started

Integrate OpenAI SDK with Orq.ai’s observability to gain complete insights into model performance, token usage, API latency, and conversation flows using OpenTelemetry.

Prerequisites

Before you begin, ensure you have:
  • An Orq.ai account and API Key
  • Python 3.8+ or Node.js 18+
  • OpenAI SDK installed

Install Dependencies

# Core OpenTelemetry packages
pip install opentelemetry-sdk opentelemetry-instrumentation opentelemetry-exporter-otlp

# OpenAI SDK
pip install openai

# OpenAI Auto-Instrumentation
pip install opentelemetry-instrumentation-openai

Configure Orq.ai

Set up your environment variables to connect to Orq.ai’s OpenTelemetry collector.
Unix/Linux/macOS:
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer $ORQ_API_KEY"
export OTEL_RESOURCE_ATTRIBUTES="service.name=openai-app,service.version=1.0.0"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/json"
Windows (PowerShell):
$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=openai-app,service.version=1.0.0"
$env:OTEL_EXPORTER_OTLP_TRACES_PROTOCOL = "http/json"
Using a .env file:
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=openai-app,service.version=1.0.0
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/json
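Applications typically load a `.env` file with a package such as python-dotenv (`pip install python-dotenv`, then call `load_dotenv()` at startup). As a dependency-free sketch of what that does, a minimal loader looks like this; `load_env_file` is an illustrative helper, not part of any SDK:

```python
import os

def load_env_file(path: str) -> None:
    """Minimal KEY=VALUE loader for a .env file (stdlib only).

    Existing environment variables win, mirroring python-dotenv's default.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```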

Setup

Configure OpenTelemetry once at application startup:
from openai import OpenAI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
import os

# Configure OpenTelemetry
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# Instrument OpenAI
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Create OpenAI client
client = OpenAI(
    base_url="https://api.orq.ai/v2/router",
    api_key=os.getenv("ORQ_API_KEY")
)

Examples

Basic Example

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing in one sentence?"}
    ]
)

print(response.choices[0].message.content)

Streaming Example

from opentelemetry import trace

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("streaming-completion") as span:
    stream = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Write a haiku about code."}],
        stream=True
    )

    full_response = ""
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)

    span.set_attribute("response.length", len(full_response))

Custom Spans Example

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def analyze_document(document: str):
    with tracer.start_as_current_span("document-analysis") as span:
        span.set_attribute("document.length", len(document))

        # Prepare prompt
        prompt = f"Analyze this text: {document}"
        with tracer.start_as_current_span("prepare-prompt") as prep_span:
            prep_span.set_attribute("prompt.length", len(prompt))

        # Model inference
        with tracer.start_as_current_span("model-inference") as inf_span:
            inf_span.set_attribute("model", "openai/gpt-4o")

            response = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": prompt}]
            )

            inf_span.set_attribute("tokens.total", response.usage.total_tokens)

        # Process result
        with tracer.start_as_current_span("process-result") as proc_span:
            result = response.choices[0].message.content
            proc_span.set_attribute("result.length", len(result))

        return result

analysis = analyze_document("Machine learning is a subset of AI.")
print(analysis)

Advanced Workflows Example

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def content_generation_pipeline(topic: str):
    with tracer.start_as_current_span("content-pipeline") as pipeline_span:
        pipeline_span.set_attribute("pipeline.topic", topic)
        pipeline_span.set_attribute("pipeline.stages", 3)

        # Stage 1: Research
        with tracer.start_as_current_span("stage-research") as research_span:
            research_span.set_attribute("stage.name", "research")

            research = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": f"List 3 key facts about {topic}."}]
            )

            facts = research.choices[0].message.content
            research_span.set_attribute("facts.count", 3)
            research_span.set_attribute("tokens.used", research.usage.total_tokens)

        # Stage 2: Writing
        with tracer.start_as_current_span("stage-writing") as writing_span:
            writing_span.set_attribute("stage.name", "writing")

            writing = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": f"Write a brief introduction using these facts: {facts}"}]
            )

            content = writing.choices[0].message.content
            writing_span.set_attribute("content.length", len(content))

        # Stage 3: Review
        with tracer.start_as_current_span("stage-review") as review_span:
            review_span.set_attribute("stage.name", "review")

            review = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": f"Rate this content quality 1-10: {content}"}]
            )

            rating = review.choices[0].message.content
            review_span.set_attribute("quality.rating", rating)

        pipeline_span.set_attribute("pipeline.success", True)
        return {"content": content, "rating": rating}

result = content_generation_pipeline("neural networks")
print(f"Content: {result['content']}")
print(f"Rating: {result['rating']}")
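The review stage above stores the model's free-form reply directly as the `quality.rating` attribute. Numeric attributes are easier to filter and aggregate on, so you may want to parse the rating first; `parse_rating` is a hypothetical helper, and model phrasing varies, so validate its output before relying on it:

```python
import re

def parse_rating(text: str):
    """Extract the first integer from 1-10 in a free-form model reply.

    Returns None when no rating can be found, so the caller can fall
    back to storing the raw text instead.
    """
    match = re.search(r"\b(10|[1-9])\b", text)
    return int(match.group(1)) if match else None
```

With this in place, the review span could record `parse_rating(rating)` when it is not None, and the raw string otherwise.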

View Traces

View your traces in the Traces tab of your AI Studio, where real-time analytics are also available.