AI Router

Overview

Microsoft Semantic Kernel is an SDK that integrates Large Language Models (LLMs) with conventional programming languages. By connecting Semantic Kernel to Orq.ai’s AI Router, you transform experimental AI agents into production-ready systems with enterprise-grade capabilities.

Key Benefits

Orq.ai’s AI Router enhances your Semantic Kernel applications with:

  • Complete Observability: Track every agent step, tool use, and interaction with detailed traces and analytics
  • Built-in Reliability: Automatic fallbacks, retries, and load balancing for production resilience
  • Cost Optimization: Real-time cost tracking and spend management across all your AI operations
  • Multi-Provider Access: Access 300+ LLMs from 20+ providers through a single, unified integration

Prerequisites

Before integrating Semantic Kernel with Orq.ai, ensure you have:
  • An Orq.ai account and API Key
  • Python 3.10 or higher (required by current Semantic Kernel releases)
  • Semantic Kernel SDK installed
To set up your API key, see API keys & Endpoints.
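
As a quick sanity check, you can confirm the key is visible to Python before wiring anything up (a minimal sketch; it assumes the key is exported as the ORQ_API_KEY environment variable used throughout this guide):
import os

# Fail fast if the API key is not exported in the environment
if not os.getenv("ORQ_API_KEY"):
    raise RuntimeError("ORQ_API_KEY is not set; see API keys & Endpoints")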

Installation

Install Semantic Kernel and the OpenAI SDK:
pip install semantic-kernel openai
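
To verify the installation, you can print the installed package versions (an optional check using the standard-library importlib.metadata):
from importlib.metadata import version

# Confirm both packages resolved and report their versions
print("semantic-kernel:", version("semantic-kernel"))
print("openai:", version("openai"))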

Configuration

Configure Semantic Kernel to use Orq.ai’s AI Router by creating an OpenAI client with a custom base URL:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
import os

# Configure OpenAI client with Orq.ai AI Router
client = AsyncOpenAI(
    api_key=os.getenv("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/router"
)

# Create kernel
kernel = Kernel()

# Add chat completion service
chat_service = OpenAIChatCompletion(
    ai_model_id="gpt-4o",
    async_client=client
)

kernel.add_service(chat_service)

Basic Example

Here’s a complete example of using Semantic Kernel with Orq.ai:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings
)
from semantic_kernel.contents import ChatHistory
import asyncio
import os

async def main():
    # Configure client with Orq.ai
    client = AsyncOpenAI(
        api_key=os.getenv("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v2/router"
    )

    # Create kernel
    kernel = Kernel()

    # Add chat completion service
    chat_service = OpenAIChatCompletion(
        ai_model_id="gpt-4o",
        async_client=client
    )
    kernel.add_service(chat_service)

    # Create execution settings
    settings = OpenAIChatPromptExecutionSettings(
        max_tokens=2000,
        temperature=0.7
    )

    # Create chat history
    history = ChatHistory()
    history.add_user_message("What is quantum computing?")

    # Get response
    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=settings,
        kernel=kernel
    )

    print(response.content)

if __name__ == "__main__":
    asyncio.run(main())

Using Plugins (Functions)

Semantic Kernel’s power comes from combining LLMs with plugins. Here’s how to use them with Orq.ai:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings
)
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.functions import kernel_function
from semantic_kernel.contents import ChatHistory
import asyncio
import os

# Define a plugin
class WeatherPlugin:
    @kernel_function(
        name="get_weather",
        description="Get the weather for a location"
    )
    def get_weather(self, location: str) -> str:
        """Get weather for a location."""
        return f"The weather in {location} is sunny and 72°F"

async def main():
    # Configure client
    client = AsyncOpenAI(
        api_key=os.getenv("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v2/router"
    )

    # Create kernel
    kernel = Kernel()

    # Add chat completion service
    chat_service = OpenAIChatCompletion(
        ai_model_id="gpt-4o",
        async_client=client
    )
    kernel.add_service(chat_service)

    # Add plugin
    kernel.add_plugin(
        WeatherPlugin(),
        plugin_name="WeatherPlugin"
    )

    # Create chat history
    history = ChatHistory()
    history.add_user_message("What's the weather in San Francisco?")

    # Enable automatic function calling
    execution_settings = OpenAIChatPromptExecutionSettings(
        function_choice_behavior=FunctionChoiceBehavior.Auto()
    )

    # Get response with function calling
    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=execution_settings,
        kernel=kernel
    )

    print(response.content)

if __name__ == "__main__":
    asyncio.run(main())
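
You can also call a plugin function directly, with no model round-trip, which is useful for testing a plugin in isolation. A minimal sketch, assuming the kernel and plugin registered in the example above (run inside an async context such as main()):
# Invoke the plugin function directly; no LLM call is made
result = await kernel.invoke(
    plugin_name="WeatherPlugin",
    function_name="get_weather",
    location="San Francisco"
)
print(result)  # The weather in San Francisco is sunny and 72°F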

Model Selection

With Orq.ai, you can use any supported model from 20+ providers:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
import os

# Configure client
client = AsyncOpenAI(
    api_key=os.getenv("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/router"
)

kernel = Kernel()

# Use Claude
claude_service = OpenAIChatCompletion(
    ai_model_id="claude-sonnet-4-5-20250929",
    async_client=client,
    service_id="claude"
)
kernel.add_service(claude_service)

# Use Gemini
gemini_service = OpenAIChatCompletion(
    ai_model_id="gemini-2.5-flash",
    async_client=client,
    service_id="gemini"
)
kernel.add_service(gemini_service)

# Use any other model
groq_service = OpenAIChatCompletion(
    ai_model_id="llama-3.3-70b-versatile",
    async_client=client,
    service_id="groq"
)
kernel.add_service(groq_service)
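
Once multiple services are registered, you can select one at call time by its service_id. A brief sketch, assuming the kernel and services defined above (run inside an async function):
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings
from semantic_kernel.contents import ChatHistory

# Look up the Claude-backed service registered above by its service_id
claude = kernel.get_service("claude")

history = ChatHistory()
history.add_user_message("Summarize quantum computing in one sentence.")

# Inside an async function, after the services above are registered
response = await claude.get_chat_message_content(
    chat_history=history,
    settings=OpenAIChatPromptExecutionSettings(),
    kernel=kernel
)
print(response.content)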

Streaming Responses

Semantic Kernel supports streaming with Orq.ai:
from openai import AsyncOpenAI
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings
)
from semantic_kernel.contents import ChatHistory
import asyncio
import os

async def main():
    client = AsyncOpenAI(
        api_key=os.getenv("ORQ_API_KEY"),
        base_url="https://api.orq.ai/v2/router"
    )

    kernel = Kernel()
    chat_service = OpenAIChatCompletion(
        ai_model_id="gpt-4o",
        async_client=client
    )
    kernel.add_service(chat_service)

    settings = OpenAIChatPromptExecutionSettings(
        max_tokens=2000,
        temperature=0.7
    )

    history = ChatHistory()
    history.add_user_message("Write a short story about AI")

    # Stream response
    async for message in chat_service.get_streaming_chat_message_content(
        chat_history=history,
        settings=settings,
        kernel=kernel
    ):
        print(message.content, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())

Observability & Monitoring

All Semantic Kernel interactions routed through Orq.ai are automatically tracked and available in the AI Studio:
  • Request Traces: View complete conversation flows and function calls
  • Plugin Usage: Monitor which plugins are being invoked and their success rates
  • Performance Metrics: Track latency, token usage, and completion rates
  • Cost Analysis: Understand spending patterns across models and providers
Visit your AI Studio to view real-time analytics and traces.