AI Gateway

Route your LLM calls through the AI Gateway with a single base URL change. Zero vendor lock-in: always run on the best model at the lowest cost for your use case.

Overview

Microsoft Semantic Kernel is an SDK that integrates Large Language Models (LLMs) with conventional programming languages. Routing its model calls through Orq.ai’s AI Gateway adds tracing, fallbacks, cost tracking, and multi-provider access to a Semantic Kernel application. The only required change is pointing the OpenAI client at the gateway’s base URL.

Key Benefits

Orq.ai’s AI Gateway enhances Semantic Kernel applications with:

  • Complete Observability: track every agent step, tool use, and interaction with detailed traces and analytics.
  • Built-in Reliability: automatic fallbacks, retries, and load balancing for production resilience.
  • Cost Optimization: real-time cost tracking and spend management across all AI operations.
  • Multi-Provider Access: access 300+ LLMs and 20+ providers through a single, unified integration.

Prerequisites

Before integrating Semantic Kernel with Orq.ai, ensure you have:
  • An Orq.ai account and API key
  • Python 3.10 or higher
  • The Semantic Kernel SDK and the OpenAI SDK installed (see Installation)

To set up an API key, see API keys & Endpoints.
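
The prerequisites above can be verified before running any example. The following is a minimal sketch using only the standard library; the function name check_prerequisites and its parameters are hypothetical, and it assumes the API key is stored in the ORQ_API_KEY environment variable, as the examples on this page do.

```python
import importlib.util
import os
import sys


def check_prerequisites(env=None, version=None, required=("semantic_kernel", "openai")):
    """Return a list of human-readable problems; an empty list means ready to go."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []

    # Python 3.10 or higher is required.
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")

    # The examples read the key from ORQ_API_KEY.
    if not env.get("ORQ_API_KEY"):
        problems.append("ORQ_API_KEY is not set")

    # find_spec returns None when a top-level package cannot be imported.
    for package in required:
        if importlib.util.find_spec(package) is None:
            problems.append(f"package '{package}' is not installed")

    return problems
```

Calling check_prerequisites() with no arguments inspects the current interpreter and environment. The keyword arguments exist only to make the check easy to test.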

Installation

Install Semantic Kernel and the OpenAI SDK:
pip install semantic-kernel openai

Configuration

Configure Semantic Kernel to use Orq.ai’s AI Gateway by creating an OpenAI client with a custom base URL:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
import os

# Configure OpenAI client with Orq.ai AI Gateway
client = AsyncOpenAI(
    api_key=os.getenv("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router"
)

# Create kernel
kernel = Kernel()

# Add chat completion service
chat_service = OpenAIChatCompletion(
    ai_model_id="gpt-4o",
    async_client=client
)

kernel.add_service(chat_service)

The gateway-specific values are the Orq.ai API key and the base URL: https://api.orq.ai/v3/router
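
Because the same base URL and key appear in every example, it can help to keep them in one place. The sketch below is one possible way to do that. GatewaySettings and client_kwargs are hypothetical names and are not part of Semantic Kernel or the OpenAI SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    """Connection settings shared by all examples (hypothetical helper)."""

    api_key: str
    base_url: str = "https://api.orq.ai/v3/router"  # gateway URL from this page
    model_id: str = "gpt-4o"                        # default model used in the examples

    def client_kwargs(self) -> dict:
        """Keyword arguments intended for AsyncOpenAI(**kwargs)."""
        return {"api_key": self.api_key, "base_url": self.base_url}
```

With this in place, the client could be created as AsyncOpenAI(**settings.client_kwargs()) and the service as OpenAIChatCompletion(ai_model_id=settings.model_id, async_client=client).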

Basic Example

Here’s a complete example of using Semantic Kernel with Orq.ai:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings
)
from semantic_kernel.contents import ChatHistory
import asyncio
import os

async def main():
    # Configure client with Orq.ai
    client = AsyncOpenAI(
        api_key=os.getenv("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v3/router"
    )

    # Create kernel
    kernel = Kernel()

    # Add chat completion service
    chat_service = OpenAIChatCompletion(
        ai_model_id="gpt-4o",
        async_client=client
    )
    kernel.add_service(chat_service)

    # Create execution settings
    settings = OpenAIChatPromptExecutionSettings(
        max_tokens=2000,
        temperature=0.7
    )

    # Create chat history
    history = ChatHistory()
    history.add_user_message("What is quantum computing?")

    # Get response
    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=settings,
        kernel=kernel
    )

    print(response.content)

if __name__ == "__main__":
    asyncio.run(main())
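
A call like the one above waits for as long as the request takes. If a client-side deadline is wanted, one option is to wrap the awaitable in asyncio.wait_for. This is a minimal sketch: ask_with_timeout and fake_ask are hypothetical names, and fake_ask stands in for a coroutine that calls chat_service.get_chat_message_content.

```python
import asyncio


async def ask_with_timeout(ask, prompt, seconds=30.0, fallback="(timed out)"):
    """Await ask(prompt), returning `fallback` if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(ask(prompt), timeout=seconds)
    except asyncio.TimeoutError:
        return fallback


async def fake_ask(prompt):
    """Stand-in for a real model call; replace with a coroutine that queries the chat service."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"
```

The deadline here is purely local. It does not replace the gateway's own fallbacks and retries; it only bounds how long the caller waits.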

Using Plugins (Functions)

Plugins expose ordinary Python functions that the model can call. The example below registers a WeatherPlugin on the kernel and enables automatic function calling through Orq.ai:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings
)
from semantic_kernel.functions import kernel_function
from semantic_kernel.contents import ChatHistory
import asyncio
import os

# Define a plugin
class WeatherPlugin:
    @kernel_function(
        name="get_weather",
        description="Get the weather for a location"
    )
    def get_weather(self, location: str) -> str:
        """Get weather for a location."""
        return f"The weather in {location} is sunny and 72°F"

async def main():
    # Configure client
    client = AsyncOpenAI(
        api_key=os.getenv("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v3/router"
    )

    # Create kernel
    kernel = Kernel()

    # Add chat completion service
    chat_service = OpenAIChatCompletion(
        ai_model_id="gpt-4o",
        async_client=client
    )
    kernel.add_service(chat_service)

    # Add plugin
    kernel.add_plugin(
        WeatherPlugin(),
        plugin_name="WeatherPlugin"
    )

    # Create chat history
    history = ChatHistory()
    history.add_user_message("What's the weather in San Francisco?")

    # Enable automatic function calling
    execution_settings = OpenAIChatPromptExecutionSettings(
        function_choice_behavior=FunctionChoiceBehavior.Auto()
    )

    # Get response with function calling
    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=execution_settings,
        kernel=kernel
    )

    print(response.content)

if __name__ == "__main__":
    asyncio.run(main())
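
To make the idea of automatic function calling concrete, the sketch below shows the general pattern: the model names a tool and supplies JSON arguments, the runtime looks the tool up and invokes it, and the result is sent back to the model. This is a conceptual illustration only, not Semantic Kernel's implementation. TOOLS and dispatch_tool_call are hypothetical, and the "WeatherPlugin-get_weather" key format is an assumption.

```python
import json


def get_weather(location: str) -> str:
    """Same canned answer as the WeatherPlugin above."""
    return f"The weather in {location} is sunny and 72°F"


# Registry mapping tool names to callables (key format is an assumption).
TOOLS = {"WeatherPlugin-get_weather": get_weather}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Look up the requested tool and call it with the decoded JSON arguments."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    arguments = json.loads(arguments_json)
    return TOOLS[name](**arguments)
```

With FunctionChoiceBehavior.Auto(), this kind of lookup-and-invoke loop is handled for you, so application code only registers plugins and reads the final response.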

Model Selection

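With Orq.ai, you can use any supported model from 20+ providers. Register one service per model and give each a distinct service_id so the kernel can tell them apart: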
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
import os

# Configure client
client = AsyncOpenAI(
    api_key=os.getenv("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router"
)

kernel = Kernel()

# Use Claude
claude_service = OpenAIChatCompletion(
    ai_model_id="anthropic/claude-sonnet-4-6",
    async_client=client,
    service_id="claude"
)
kernel.add_service(claude_service)

# Use Gemini
gemini_service = OpenAIChatCompletion(
    ai_model_id="google-ai/gemini-2.5-flash",
    async_client=client,
    service_id="gemini"
)
kernel.add_service(gemini_service)

# Use Llama 3.3 70B on Groq (any other supported model ID works the same way)
groq_service = OpenAIChatCompletion(
    ai_model_id="groq/llama-3.3-70b-versatile",
    async_client=client,
    service_id="groq"
)
kernel.add_service(groq_service)
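
When the list of models grows, the three near-identical blocks above can be driven from a table. The following is a sketch under stated assumptions: MODEL_IDS and build_services are hypothetical names, and make_service is whatever callable you supply to construct a service.

```python
# service_id -> model ID, taken from the example above
MODEL_IDS = {
    "claude": "anthropic/claude-sonnet-4-6",
    "gemini": "google-ai/gemini-2.5-flash",
    "groq": "groq/llama-3.3-70b-versatile",
}


def build_services(make_service, model_ids=MODEL_IDS):
    """Call make_service(service_id, model_id) per entry; return results keyed by service_id."""
    return {
        service_id: make_service(service_id, model_id)
        for service_id, model_id in model_ids.items()
    }
```

With Semantic Kernel, make_service could be a lambda that returns OpenAIChatCompletion(ai_model_id=model_id, async_client=client, service_id=service_id), after which each returned service is passed to kernel.add_service.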

Streaming Responses

Streaming works through the gateway as well. Use get_streaming_chat_message_content to receive chunks as the model produces them:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings
)
from semantic_kernel.contents import ChatHistory
import asyncio
import os

async def main():
    client = AsyncOpenAI(
        api_key=os.getenv("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v3/router"
    )

    kernel = Kernel()
    chat_service = OpenAIChatCompletion(
        ai_model_id="gpt-4o",
        async_client=client
    )
    kernel.add_service(chat_service)

    settings = OpenAIChatPromptExecutionSettings(
        max_tokens=2000,
        temperature=0.7
    )

    history = ChatHistory()
    history.add_user_message("Write a short story about AI")

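    # Stream the response chunk by chunk; skip empty chunks so "None" is never printed
    async for message in chat_service.get_streaming_chat_message_content(
        chat_history=history,
        settings=settings,
        kernel=kernel
    ):
        if message is not None and message.content:
            print(message.content, end="", flush=True)
    print()  # end the streamed output with a newline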

if __name__ == "__main__":
    asyncio.run(main())
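
Streaming prints text as it arrives, but the full reply is often needed afterwards, for example to append it to the chat history. A minimal sketch of collecting chunks follows. collect_stream and fake_stream are hypothetical, and fake_stream stands in for an async iterator over the text of each streamed chunk.

```python
import asyncio


async def collect_stream(chunks):
    """Join the text pieces from an async iterator, skipping None and empty pieces."""
    parts = []
    async for chunk in chunks:
        if chunk:
            parts.append(chunk)
    return "".join(parts)


async def fake_stream():
    """Stand-in for a model stream; yields text pieces plus some empty ones."""
    for piece in ["Once ", None, "upon ", "", "a time"]:
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield piece
```

In a real application the iterator would yield message.content for each chunk returned by get_streaming_chat_message_content.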