AI Gateway

Route your LLM calls through the AI Gateway with a single base URL change. Zero vendor lock-in: always run on the best model at the lowest cost for your use case.

Observability

Instrument your code with OpenTelemetry to capture traces, logs, and metrics for every LLM call, agent step, and tool use.

AI Gateway

Overview

The OpenAI SDK provides powerful tools for building AI applications with GPT models. By connecting the SDK to Orq.ai’s AI Gateway, you transform your OpenAI integration into a production-ready system with enterprise-grade capabilities, complete observability, and access to 300+ models across 20+ providers.

Key Benefits

Orq.ai’s AI Gateway enhances your OpenAI SDK with:

Complete Observability

Track every API call, token usage, and model interaction with detailed traces and analytics

Built-in Reliability

Automatic fallbacks, retries, and load balancing for production resilience

Cost Optimization

Real-time cost tracking and spend management across all your AI operations

Multi-Provider Access

Access 300+ LLMs and 20+ providers through a single, unified integration

Prerequisites

Before integrating the OpenAI SDK with Orq.ai, ensure you have:
  • An Orq.ai account and API Key
  • Python 3.8+ or Node.js 18+ with TypeScript support
  • OpenAI SDK installed
To set up your API key, see API keys & Endpoints.
To use libraries with private models, see Onboarding Private Models.

Installation

# Python
pip install openai

# Node.js / TypeScript
npm install openai

Configuration

Set the OpenAI SDK's base URL to the AI Gateway to route calls through Orq.ai without changing any other part of your code. You keep full OpenAI SDK compatibility and a single, unified API across all models, and you gain Platform Traces and Cost and Usage Monitoring.
base_url: https://api.orq.ai/v3/router
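For Python projects, the same configuration could be sketched as below. The helper name `gateway_client_kwargs` is hypothetical; only the base URL and the `ORQ_API_KEY` variable come from this page. The sketch fails fast when the key is missing rather than waiting for the first request to be rejected.

```python
import os

# Base URL from the Configuration section.
ORQ_GATEWAY_BASE_URL = "https://api.orq.ai/v3/router"


def gateway_client_kwargs(env=None):
    """Build keyword arguments for openai.OpenAI(...) (hypothetical helper).

    Raises RuntimeError if ORQ_API_KEY is missing or blank.
    """
    env = os.environ if env is None else env
    api_key = (env.get("ORQ_API_KEY") or "").strip()
    if not api_key:
        raise RuntimeError("ORQ_API_KEY is not set")
    return {"base_url": ORQ_GATEWAY_BASE_URL, "api_key": api_key}


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**gateway_client_kwargs())
```

Passing the environment in as an argument keeps the helper easy to test without touching real credentials.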

Text Generation

Basic text generation with the OpenAI SDK through Orq.ai:
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.orq.ai/v3/router",
  apiKey: process.env.ORQ_API_KEY,
});

const response = await openai.responses.create({
  model: "openai/gpt-4o",
  instructions: "You are a helpful assistant.",
  input: "Hello!",
});

console.log(response.output_text);
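A rough Python counterpart of the call above, assuming the Python SDK's `client.responses.create(...)` and the `output_text` convenience property. The function name `generate_text` and the stand-in client are illustrative only; the stand-in exists so the sketch runs without network access.

```python
from types import SimpleNamespace


def generate_text(client, prompt, model="openai/gpt-4o"):
    """Send one prompt through the Responses API and return the text output."""
    response = client.responses.create(
        model=model,
        instructions="You are a helpful assistant.",
        input=prompt,
    )
    return response.output_text


# Offline stand-in client (hypothetical). In real code, pass
# OpenAI(base_url="https://api.orq.ai/v3/router", api_key=...) instead.
fake_client = SimpleNamespace(
    responses=SimpleNamespace(
        create=lambda **kwargs: SimpleNamespace(output_text=f"echo: {kwargs['input']}")
    )
)

print(generate_text(fake_client, "Hello!"))
```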

Streaming Responses

Stream responses for real-time output:
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.orq.ai/v3/router",
  apiKey: process.env.ORQ_API_KEY,
});

const stream = await openai.responses.create({
  model: "openai/gpt-4o",
  input: "Write a short story about robots.",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
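The same event loop can be sketched in Python. The function `collect_stream_text` is a hypothetical helper, and the event list is a stand-in for the iterator that `client.responses.create(..., stream=True)` would return; it filters on the same `response.output_text.delta` event type as the example above.

```python
from types import SimpleNamespace


def collect_stream_text(stream, on_delta=None):
    """Consume Responses API stream events and return the concatenated text."""
    parts = []
    for event in stream:
        # Only text deltas carry output text; other event types are skipped.
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
            if on_delta is not None:
                on_delta(event.delta)
    return "".join(parts)


# Offline stand-in for a real event stream (illustrative values).
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once "),
    SimpleNamespace(type="response.output_text.delta", delta="upon a time"),
    SimpleNamespace(type="response.completed"),
]

story = collect_stream_text(
    fake_stream, on_delta=lambda text: print(text, end="", flush=True)
)
```

Returning the joined text as well as invoking the callback lets a caller both display output as it arrives and keep the full result for later use.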

Model Selection

With Orq.ai, you can use any supported model from 20+ providers:
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.orq.ai/v3/router",
  apiKey: process.env.ORQ_API_KEY,
});

const claudeResponse = await openai.responses.create({
  model: "anthropic/claude-sonnet-4-6",
  input: "Explain machine learning",
});

const geminiResponse = await openai.responses.create({
  model: "google/gemini-2.5-flash",
  input: "Explain machine learning",
});

const groqResponse = await openai.responses.create({
  model: "groq/llama-3.3-70b-versatile",
  input: "Explain machine learning",
});
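To compare providers on the same prompt, the calls above can be folded into a loop. This is a sketch: `ask_each_model` and `provider_of` are hypothetical helpers, the `provider/model` split is inferred from the IDs shown on this page, and the stand-in client replaces a real gateway client so the code runs offline.

```python
from types import SimpleNamespace

# Model IDs taken from the example above.
MODELS = [
    "anthropic/claude-sonnet-4-6",
    "google/gemini-2.5-flash",
    "groq/llama-3.3-70b-versatile",
]


def ask_each_model(client, prompt, models=MODELS):
    """Send the same prompt to every model and return {model_id: output_text}."""
    results = {}
    for model in models:
        response = client.responses.create(model=model, input=prompt)
        results[model] = response.output_text
    return results


def provider_of(model_id):
    """Return the provider prefix of a 'provider/model' ID."""
    provider, _, _ = model_id.partition("/")
    return provider


# Offline stand-in client (hypothetical); swap in a real OpenAI client.
fake_client = SimpleNamespace(
    responses=SimpleNamespace(
        create=lambda **kwargs: SimpleNamespace(output_text=f"[{kwargs['model']}] ok")
    )
)

answers = ask_each_model(fake_client, "Explain machine learning")
for model_id, text in answers.items():
    print(provider_of(model_id), "->", text)
```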

Observability

Getting Started

Integrate the OpenAI SDK with Orq.ai’s observability to gain complete insights into model performance, token usage, API latency, and conversation flows using OpenTelemetry.

Prerequisites

Before you begin, ensure you have:
  • An Orq.ai account and API Key
  • Python 3.8+ or Node.js 18+
  • OpenAI SDK installed

Install Dependencies

# Core OpenTelemetry packages
pip install opentelemetry-sdk opentelemetry-instrumentation opentelemetry-exporter-otlp

# OpenAI SDK
pip install openai

# OpenAI Auto-Instrumentation
pip install opentelemetry-instrumentation-openai

Configure Orq.ai

Set up your environment variables to connect to Orq.ai’s OpenTelemetry collector:
Unix/Linux/macOS:
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer $ORQ_API_KEY"
export OTEL_RESOURCE_ATTRIBUTES="service.name=openai-app,service.version=1.0.0"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/json"
Windows (PowerShell):
$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=openai-app,service.version=1.0.0"
$env:OTEL_EXPORTER_OTLP_TRACES_PROTOCOL = "http/json"
Using .env file:
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=openai-app,service.version=1.0.0
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/json
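If you prefer to set these values from Python, for example in a notebook, a sketch along these lines may help. The helper names `orq_otel_env` and `apply_otel_env` are hypothetical; the variable names and values are the ones listed above. It assumes the variables are applied before the exporter and tracer provider are created, since that is when the OpenTelemetry SDK is expected to read them.

```python
import os


def orq_otel_env(api_key, service_name="openai-app", service_version="1.0.0"):
    """Return the four OTEL_* settings from this section as a dict."""
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "https://api.orq.ai/v2/otel",
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Bearer {api_key}",
        "OTEL_RESOURCE_ATTRIBUTES": (
            f"service.name={service_name},service.version={service_version}"
        ),
        "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/json",
    }


def apply_otel_env(api_key, environ=os.environ):
    """Set the variables, keeping any value that is already defined."""
    for name, value in orq_otel_env(api_key).items():
        environ.setdefault(name, value)


# Demonstration against a plain dict instead of the real environment.
demo_env = {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"}
apply_otel_env("example-key", demo_env)
```

Using `setdefault` means a value exported in the shell or a `.env` file still takes precedence over the defaults in code.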

Setup

Configure OpenTelemetry once at application startup:
from openai import OpenAI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
import os

# Configure OpenTelemetry
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# Instrument OpenAI
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Create OpenAI client
client = OpenAI(
    base_url="https://api.orq.ai/v3/router",
    api_key=os.environ.get("ORQ_API_KEY")
)

Examples

The examples below use the Chat Completions endpoint. OpenTelemetry instrumentation works identically with the Responses API: replace client.chat.completions.create(...) with client.responses.create(...) and pass input instead of messages.

Basic Example

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.orq.ai/v3/router",
    api_key=os.environ.get("ORQ_API_KEY"),
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing in one sentence?"}
    ]
)

print(response.choices[0].message.content)
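Token usage is also available on the response object itself, which can be handy for local logging alongside the traces. The sketch below assumes the standard Chat Completions `usage` fields (`prompt_tokens`, `completion_tokens`, `total_tokens`); `usage_summary` is a hypothetical helper and the response object is a stand-in.

```python
from types import SimpleNamespace


def usage_summary(response):
    """Extract token counts from a Chat Completions response as a plain dict."""
    usage = getattr(response, "usage", None)
    if usage is None:
        # Some responses may omit usage; report zeros instead of failing.
        return {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Offline stand-in for a real response (illustrative numbers).
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)
)
print(usage_summary(fake_response))
```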

Streaming Example

from openai import OpenAI
from opentelemetry import trace
import os

client = OpenAI(
    base_url="https://api.orq.ai/v3/router",
    api_key=os.environ.get("ORQ_API_KEY"),
)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("streaming-completion") as span:
    stream = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Write a haiku about code."}],
        stream=True
    )

    full_response = ""
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)

    span.set_attribute("response.length", len(full_response))

Custom Spans Example

from openai import OpenAI
from opentelemetry import trace
import os

client = OpenAI(
    base_url="https://api.orq.ai/v3/router",
    api_key=os.environ.get("ORQ_API_KEY"),
)

tracer = trace.get_tracer(__name__)

def analyze_document(document: str):
    with tracer.start_as_current_span("document-analysis") as span:
        span.set_attribute("document.length", len(document))

        # Prepare prompt
        prompt = f"Analyze this text: {document}"
        with tracer.start_as_current_span("prepare-prompt") as prep_span:
            prep_span.set_attribute("prompt.length", len(prompt))

        # Model inference
        with tracer.start_as_current_span("model-inference") as inf_span:
            inf_span.set_attribute("model", "openai/gpt-4o")

            response = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": prompt}]
            )

            inf_span.set_attribute("tokens.total", response.usage.total_tokens)

        # Process result
        with tracer.start_as_current_span("process-result") as proc_span:
            result = response.choices[0].message.content
            proc_span.set_attribute("result.length", len(result))

        return result

analysis = analyze_document("Machine learning is a subset of AI.")
print(analysis)
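When many functions need their own span, the nested `with` blocks can be factored into a decorator. The following is a sketch, not part of the Orq.ai or OpenTelemetry API: `traced`, `RecordingTracer`, `RecordingSpan`, and the `function.name` attribute are hypothetical. The recording classes stand in for a real tracer so the example runs without the OpenTelemetry packages; in an application, pass the tracer from `trace.get_tracer(__name__)`.

```python
import functools
from contextlib import contextmanager


def traced(tracer, span_name):
    """Decorator: run the wrapped function inside a span named span_name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with tracer.start_as_current_span(span_name) as span:
                span.set_attribute("function.name", func.__name__)
                return func(*args, **kwargs)
        return wrapper
    return decorator


class RecordingSpan:
    """Offline stand-in that stores attributes in a dict."""
    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


class RecordingTracer:
    """Offline stand-in exposing the one tracer method the decorator uses."""
    def __init__(self):
        self.spans = []

    @contextmanager
    def start_as_current_span(self, name):
        span = RecordingSpan(name)
        self.spans.append(span)
        yield span


tracer = RecordingTracer()


@traced(tracer, "prepare-prompt")
def prepare_prompt(document):
    return f"Analyze this text: {document}"


prompt = prepare_prompt("Machine learning is a subset of AI.")
```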

Advanced Workflows Example

from openai import OpenAI
from opentelemetry import trace
import os

client = OpenAI(
    base_url="https://api.orq.ai/v3/router",
    api_key=os.environ.get("ORQ_API_KEY"),
)

tracer = trace.get_tracer(__name__)

def content_generation_pipeline(topic: str):
    with tracer.start_as_current_span("content-pipeline") as pipeline_span:
        pipeline_span.set_attribute("pipeline.topic", topic)
        pipeline_span.set_attribute("pipeline.stages", 3)

        # Stage 1: Research
        with tracer.start_as_current_span("stage-research") as research_span:
            research_span.set_attribute("stage.name", "research")

            research = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": f"List 3 key facts about {topic}."}]
            )

            facts = research.choices[0].message.content
            research_span.set_attribute("facts.count", 3)
            research_span.set_attribute("tokens.used", research.usage.total_tokens)

        # Stage 2: Writing
        with tracer.start_as_current_span("stage-writing") as writing_span:
            writing_span.set_attribute("stage.name", "writing")

            writing = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": f"Write a brief introduction using these facts: {facts}"}]
            )

            content = writing.choices[0].message.content
            writing_span.set_attribute("content.length", len(content))

        # Stage 3: Review
        with tracer.start_as_current_span("stage-review") as review_span:
            review_span.set_attribute("stage.name", "review")

            review = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": f"Rate this content quality 1-10: {content}"}]
            )

            rating = review.choices[0].message.content
            review_span.set_attribute("quality.rating", rating)

        pipeline_span.set_attribute("pipeline.success", True)
        return {"content": content, "rating": rating}

result = content_generation_pipeline("neural networks")
print(f"Content: {result['content']}")
print(f"Rating: {result['rating']}")
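In the pipeline above, `rating` is free-form model text, so `quality.rating` is recorded as a string. If a numeric attribute is more useful for filtering, one option is to parse the score first. This is a heuristic sketch; `parse_rating` is a hypothetical helper and it simply takes the first integer within range.

```python
import re


def parse_rating(text, low=1, high=10):
    """Return the first integer in [low, high] found in text, else None."""
    for match in re.finditer(r"\d+", text or ""):
        value = int(match.group())
        if low <= value <= high:
            return value
    return None


score = parse_rating("I would rate this an 8 out of 10.")
if score is not None:
    # In the pipeline: review_span.set_attribute("quality.rating.score", score)
    print(score)
```

The attribute name in the comment is illustrative. A reply that restates the scale before the score (for example "on a 1-10 scale, 9") would be misread by this heuristic, so a stricter prompt or structured output may be preferable in practice.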

View Traces

View your traces in the Traces tab of the AI Studio.
Each trace shows detailed information about your OpenAI SDK calls, alongside real-time analytics.

Evaluations & Experiments

Once your agents are running, use Evaluatorq to score outputs across a dataset and Experiments to compare configurations side-by-side.

Run Evaluations with Evaluatorq

Run parallel evaluations across your agents and compare results.

Run Experiments via the API

Compare agent configurations and view results in the AI Studio.