AI Gateway

Overview

Use the Azure AI Inference SDK to route all model calls through Orq.ai’s AI Gateway. Point ChatCompletionsClient at Orq.ai’s endpoint to access 300+ models from 20+ providers (OpenAI, Anthropic, Google, and more) without changing any agent logic.

Key Benefits

Complete Observability

Track every agent step, tool use, and LLM call with detailed traces and analytics

Built-in Reliability

Automatic fallbacks, retries, and load balancing for production resilience

Cost Optimization

Real-time cost tracking and spend management across all AI operations

Multi-Provider Access

Access 300+ LLMs and 20+ providers through a single, unified integration

Prerequisites

  • An Orq.ai account and API Key
  • Python 3.9 or higher
To set up an API key, see API keys & Endpoints.
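
The examples in this guide read the key from the ORQ_API_KEY environment variable. A minimal sketch of a fail-fast check follows, assuming that variable name; the helper `require_api_key` is hypothetical and not part of the Orq.ai or Azure SDKs.

```python
import os


def require_api_key(var_name: str = "ORQ_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of any SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it before running the examples, "
            f"e.g. `export {var_name}=<your key>`."
        )
    return key
```

Calling `require_api_key()` once at startup gives a readable error message instead of a bare `KeyError` from `os.environ["ORQ_API_KEY"]`.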

Installation

pip install azure-ai-inference azure-core

Configuration

Configure ChatCompletionsClient to point at Orq.ai’s AI Gateway:
Python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://api.orq.ai/v3/router",
    credential=AzureKeyCredential(os.environ["ORQ_API_KEY"]),
)

The gateway endpoint is https://api.orq.ai/v3/router.

Basic Example

Python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://api.orq.ai/v3/router",
    credential=AzureKeyCredential(os.environ["ORQ_API_KEY"]),
)

response = client.complete(
    model="openai/gpt-4o",
    messages=[
        SystemMessage(content="You are a helpful research assistant. Answer questions concisely and accurately."),
        UserMessage(content="What are the three most important factors when evaluating an LLM for production use?"),
    ],
)

print(response.choices[0].message.content)
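
To tie a single call back to cost tracking, it can help to log token counts alongside the reply. The sketch below assumes the gateway returns an OpenAI-style usage block, which the Azure SDK exposes as `response.usage` with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The helper `format_usage` is hypothetical, and a stand-in object replaces the real response so the snippet runs without a network call.

```python
from types import SimpleNamespace


def format_usage(response) -> str:
    """Summarize token counts from a chat completion response.

    Hypothetical helper. Assumes `response.usage` carries `prompt_tokens`,
    `completion_tokens`, and `total_tokens`; returns a placeholder string
    when no usage data is attached to the response.
    """
    usage = getattr(response, "usage", None)
    if usage is None:
        return "usage: not reported"
    return (
        f"prompt={usage.prompt_tokens} "
        f"completion={usage.completion_tokens} "
        f"total={usage.total_tokens}"
    )


# Stand-in for a real response object, used only to exercise the helper.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=42, completion_tokens=58, total_tokens=100)
)
print(format_usage(fake_response))
```

With a real client, pass the `response` returned by `client.complete(...)` to `format_usage`.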

Agent with Function Tools

ChatCompletionsClient supports multi-turn tool calling. The agent loop runs until no more tool calls are returned:
Python
import os
import json
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    SystemMessage,
    UserMessage,
    AssistantMessage,
    ToolMessage,
    ChatCompletionsToolDefinition,
    FunctionDefinition,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://api.orq.ai/v3/router",
    credential=AzureKeyCredential(os.environ["ORQ_API_KEY"]),
)

tools = [
    ChatCompletionsToolDefinition(
        function=FunctionDefinition(
            name="get_weather",
            description="Get the current weather for a given location.",
            parameters={
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and country, e.g. Amsterdam, NL",
                    }
                },
                "required": ["location"],
            },
        )
    )
]


def get_weather(location: str) -> str:
    data = {
        "amsterdam, nl": "Cloudy, 14°C",
        "london, uk": "Rainy, 11°C",
        "san francisco, us": "Sunny, 18°C",
    }
    return data.get(location.lower(), f"No weather data for {location}")


messages = [
    SystemMessage(content="You are a weather assistant. Always use get_weather to look up weather."),
    UserMessage(content="What's the weather in Amsterdam and London?"),
]

# Agent loop: run until no tool calls remain
while True:
    response = client.complete(
        model="openai/gpt-4o",
        messages=messages,
        tools=tools,
    )

    choice = response.choices[0]

    if choice.finish_reason == "tool_calls":
        messages.append(AssistantMessage(tool_calls=choice.message.tool_calls))
        for tool_call in choice.message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = get_weather(args["location"])
            messages.append(ToolMessage(tool_call_id=tool_call.id, content=result))
    else:
        print(choice.message.content)
        break
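
The loop above calls get_weather directly and trusts the model's arguments. Once an agent has several tools, one possible approach is to dispatch by tool name and return errors as strings, so the model can see the failure and correct itself. This is a sketch under stated assumptions: `run_tool`, `echo_location`, and the registry layout are hypothetical names, not part of the Azure SDK or Orq.ai.

```python
import json
from typing import Callable, Dict


def run_tool(registry: Dict[str, Callable[..., str]], name: str, arguments: str) -> str:
    """Look up a tool by name and call it with JSON-encoded arguments.

    Hypothetical helper. Errors are returned as strings so they can be sent
    back to the model in a ToolMessage instead of crashing the loop.
    """
    func = registry.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'"
    try:
        kwargs = json.loads(arguments or "{}")
    except json.JSONDecodeError as exc:
        return f"Error: arguments are not valid JSON ({exc.msg})"
    if not isinstance(kwargs, dict):
        return "Error: arguments must be a JSON object"
    try:
        return str(func(**kwargs))
    except TypeError as exc:
        # Typically a missing or unexpected parameter; a TypeError raised
        # inside the tool itself is reported the same way.
        return f"Error: bad arguments for '{name}': {exc}"


def echo_location(location: str) -> str:
    """Stand-in tool used only to exercise run_tool."""
    return f"Weather lookup for {location}"


registry = {"echo_location": echo_location}
print(run_tool(registry, "echo_location", '{"location": "Amsterdam, NL"}'))
print(run_tool(registry, "missing_tool", "{}"))
```

In the agent loop, the tool result could then come from `run_tool({"get_weather": get_weather}, tool_call.function.name, tool_call.function.arguments)`. Replacing `while True` with a bounded `for _ in range(max_turns)` (where `max_turns` is a limit you choose) guards against a model that never stops requesting tools.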

Model Selection

Switch models by changing the model parameter. All 300+ models are available through the same client:
Python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://api.orq.ai/v3/router",
    credential=AzureKeyCredential(os.environ["ORQ_API_KEY"]),
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="Explain transformer architectures briefly."),
]

# Use Claude
response = client.complete(model="anthropic/claude-sonnet-4-6", messages=messages)
print(response.choices[0].message.content)

# Use Gemini
response = client.complete(model="google-ai/gemini-2.5-flash", messages=messages)
print(response.choices[0].message.content)

# Use GPT-4o
response = client.complete(model="openai/gpt-4o", messages=messages)
print(response.choices[0].message.content)