TL;DR:
- Connect Orq.ai to LLM providers
- Use cURL streaming for real-time model responses.
- Add a knowledge base to enhance contextual understanding.
- Build a simple customer support agent powered by connected models and your data.
AI Router is a single unified API endpoint that lets you seamlessly route and manage requests across multiple AI model providers (e.g., OpenAI, Anthropic, Google, AWS). This functionality comes in handy when you want to avoid depending on a single provider and automatically switch between providers during an outage. The AI Router frees you from vendor lock-in and ensures that you can scale reliably when usage surges.
Getting started with the AI Router
To get started, you need to connect a model provider. This example uses OpenAI:
- Navigate to the AI Router
- Open the Providers tab
- Choose OpenAI and click Connect
- In the pop-up window, select Setup your own API key
- Log in to OpenAI's API platform, copy your secret key, and paste it into this window:
Next, grab your Orq API key from the API Keys page:
Copy the cURL command below into your terminal and replace $ORQ_API_KEY with your API key:
curl -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
A successful cURL request returns output like this:
{"id":"01K7M0YTJ6X90VHPRDMM5GEC4R","object":"chat.completion","created":1760534948,"model":"gpt-4o-2024-08-06","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! This is a response. How can I assist you today?","refusal":null,"annotations":[],"tool_calls":[]},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":14,"total_tokens":28,"prompt_tokens_details":{"cached_tokens":0,"audio_tokens":0},"completion_tokens_details":{"reasoning_tokens":0,"audio_tokens":0,"accepted_prediction_tokens":0,"rejected_prediction_tokens":0}},"service_tier":"default","system_fingerprint":"fp_f33640a400"}%
Notice the assistant's reply, Hello! This is a response. How can I assist you today?, returned in the message content of your API call.
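To pull just the assistant's text out of that payload, you can pipe the response through jq (assuming jq is installed); .choices[0].message.content is the path to the reply in the JSON above:

# Extract only the assistant's message text from the response.
curl -s -X POST https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }' | jq -r '.choices[0].message.content'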
Troubleshooting common errors
{"code":401,"error":"API key for openai is not configured in your workspace.
You can configure it in the providers page.","source":"provider"}
If the provider rejects the request because you have exhausted your quota, the router passes through a 429 error:
{
  "code": 429,
  "error": "429 You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
  "source": "provider"
}
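When debugging, it also helps to see the HTTP status code next to the body. A small sketch using curl's built-in -w formatting:

# Print the HTTP status after the response body; 200 means success,
# 401 points to a missing provider key, 429 to quota or rate limits.
curl -s -w '\nHTTP status: %{http_code}\n' https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "ping"}]}'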
Streaming
Streaming: sending a response incrementally as small chunks of data over a persistent connection, rather than waiting to deliver the complete response all at once.
With a normal POST request, the connection closes once the full response is ready. With "stream": true, the API keeps the HTTP connection open and uses Server-Sent Events (SSE) to continuously send small packets of data as they are generated.
curl -N -X POST https://api.orq.ai/v2/router/chat/completions \
-H "Authorization: Bearer $ORQ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o",
"stream": true,
"messages": [{"role": "user", "content": "Explain quantum computing simply"}]
}'
Each line of the output is an SSE chunk containing JSON:
data: {"id":"01K7M30E5QP6GCQM3YKX1NRY8Q","object":"chat.completion.chunk","created":1760537098,"model":"gpt-4o-2024-08-06","service_tier":"default","system_fingerprint":"fp_eb3c3cb84d","choices":[{"index":0,"delta":{"content":","},"logprobs":null,"finish_reason":null}],"obfuscation":"sxHSdcRBR5Tk"}
data: {"id":"01K7M30E5QP6GCQM3YKX1NRY8Q","object":"chat.completion.chunk","created":1760537098,"model":"gpt-4o-2024-08-06","service_tier":"default","system_fingerprint":"fp_eb3c3cb84d","choices":[{"index":0,"delta":{"content":" opening"},"logprobs":null,"finish_reason":null}],"obfuscation":"R8up2"}
data: [DONE]
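To consume the stream in a shell pipeline, you can strip the data: prefixes and concatenate the delta.content fragments. A minimal sketch, assuming jq is installed (sed -u is GNU sed's unbuffered mode; on macOS use sed -l instead):

# -N disables curl's output buffering so chunks print as they arrive;
# sed strips the "data: " prefix, grep drops the final [DONE] marker,
# and jq joins the content deltas into a continuous stream of text.
curl -N -s https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "stream": true,
    "messages": [{"role": "user", "content": "Explain quantum computing simply"}]
  }' \
  | sed -u 's/^data: //' \
  | grep --line-buffered -v '^\[DONE\]$' \
  | jq --unbuffered -rj '.choices[0].delta.content // empty'

The text prints incrementally as chunks arrive, which is exactly what you would wire into a chat UI.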
Retries & fallbacks
Retries: automatically re-attempting a failed API call on specific error codes
Fallbacks: switching to an alternative model if the primary one fails or hits rate limits
For example, if gpt-4o hits a rate limit or downtime, the request automatically retries and may fall back to Anthropic or another model. This reduces downtime and ensures your agents remain responsive.
curl --location 'https://api.orq.ai/v2/router/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $ORQ_API_KEY" \
--data-raw '{
"model": "openai/gpt-4o",
"messages": [
{ "role": "user", "content": "Explain Orq AI retries and fallbacks." }
],
"orq": {
"retry": { "count": 3, "on_codes": [429, 500, 502, 503, 504] },
"fallbacks": [
{ "model": "anthropic/claude-3-5-sonnet-20241022" },
{ "model": "openai/gpt-4o-mini" }
]
}
}'
count: number of automatic retries on failing status codes.
on_codes: the HTTP status codes that trigger a retry.
fallbacks: list of alternative models Orq.ai tries, in order, if the primary fails.
Caching
Caching: storing and reusing previous API responses for identical requests to reduce latency and costs
For example, a repeated FAQ query returns instantly from the cache instead of hitting the model.
"orq": {
"cache": {
"type": "exact_match",
"ttl": 1800
}
}
exact_match: caches identical requests and reuses responses.
ttl: 1800: cache entries expire after 30 minutes (1800 seconds).
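A rough way to see the cache working is to send the same request twice and compare timings; the second call should return noticeably faster once it is served from the cache:

# Run the identical request twice; the second pass should hit the cache.
for i in 1 2; do
  time curl -s https://api.orq.ai/v2/router/chat/completions \
    -H "Authorization: Bearer $ORQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai/gpt-4o",
      "messages": [{"role": "user", "content": "What are your support hours?"}],
      "orq": { "cache": { "type": "exact_match", "ttl": 1800 } }
    }' > /dev/null
done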
Adding a knowledge base
You can ground the conversation in domain-specific knowledge by linking knowledge_bases. For each knowledge base, top_k sets how many of the most relevant chunks are retrieved, and threshold sets the minimum similarity score a chunk must reach to be included.
"orq": {
"knowledge_bases": [
{ "knowledge_id": "api-documentation", "top_k": 5, "threshold": 0.75 },
{ "knowledge_id": "integration-examples", "top_k": 3, "threshold": 0.8 }
]
}
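As a sketch, here is how that fragment sits inside a full request; the knowledge_id values are placeholders for knowledge bases created in your own workspace:

# The retrieved chunks are injected into the prompt to ground the answer.
curl -s https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "How do I authenticate against the API?"}],
    "orq": {
      "knowledge_bases": [
        { "knowledge_id": "api-documentation", "top_k": 5, "threshold": 0.75 }
      ]
    }
  }'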
Identity and thread tracking
Identity and thread tracking: associating API requests with specific users and conversation sessions to enable analytics, maintain context, and organize interactions for auditing and reporting.
For example, with identity and thread tracking you can cluster messages into threads for analytics and observability, track ongoing customer sessions, and maintain conversation context. Another use case is auditing and reporting on customer support interactions.
"orq": {
"identity": {
"id": "enterprise_customer_001",
"display_name": "Enterprise User",
"email": "[email protected]"
},
"thread": {
"id": "support_session_001",
"tags": ["api-integration", "enterprise", "technical-support"]
}
}
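As a sketch, a follow-up message that reuses the same thread.id groups both requests into one conversation for analytics (the IDs here are illustrative):

# Reusing thread.id links this request to the ongoing support session.
curl -s https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Thanks! One more question about rate limits."}],
    "orq": {
      "identity": { "id": "enterprise_customer_001" },
      "thread": { "id": "support_session_001" }
    }
  }'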
Dynamic inputs
Dynamic inputs: variable parameters passed at runtime that customize prompt templates or model behavior for specific contexts, users, or use cases.
For example, the orq object's inputs field provides the variable values that get injected into prompt templates using {{variable_name}} syntax. Prompts are personalized for each user or session without rewriting messages manually.
"orq": {
"inputs": {
"company_name": "Orq AI",
"customer_tier": "Enterprise",
"use_case": "e-commerce platform"
}
}
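Putting it together, a minimal request might look like this; the {{...}} placeholders in the messages are filled from orq.inputs at runtime:

# The template variables are resolved server-side before the model is called.
curl -s https://api.orq.ai/v2/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      { "role": "system", "content": "You are a support agent for {{company_name}}." },
      { "role": "user", "content": "Help me integrate the API into my {{use_case}}." }
    ],
    "orq": {
      "inputs": {
        "company_name": "Orq AI",
        "use_case": "e-commerce platform"
      }
    }
  }'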
Building a Reliable Customer Support Agent
Here we’re building a production-ready support agent with the following capabilities:
- Reliable — automatically retries or falls back if a model fails
- Contextually aware — grounded in internal documentation and examples
- Fast and cost-efficient — caches repeated queries
- Traceable — tracks conversations per user and session
- Personalized — builds dynamic prompts based on user type and project
Here’s an example implementing all these features:
curl --location 'https://api.orq.ai/v2/router/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $ORQ_API_KEY" \
--data-raw '{
"model": "openai/gpt-4o",
"messages": [
{ "role": "system", "content": "You are a helpful customer support agent for {{company_name}}. Use available knowledge to assist {{customer_tier}} customers." },
{ "role": "user", "content": "I need help with API integration for my {{use_case}} project" }
],
"orq": {
"retry": { "count": 3, "on_codes": [429, 500, 502, 503, 504] },
"fallbacks": [
{ "model": "anthropic/claude-3-5-sonnet-20241022" },
{ "model": "openai/gpt-4o-mini" }
],
"cache": { "type": "exact_match", "ttl": 1800 },
"knowledge_bases": [
{ "knowledge_id": "api-documentation", "top_k": 5, "threshold": 0.75 },
{ "knowledge_id": "integration-examples", "top_k": 3, "threshold": 0.8 }
],
"identity": {
"id": "enterprise_customer_001",
"display_name": "Enterprise User",
"email": "[email protected]"
},
"thread": {
"id": "support_session_001",
"tags": ["api-integration", "enterprise", "technical-support"]
},
"inputs": {
"company_name": "Orq AI",
"customer_tier": "Enterprise",
"use_case": "e-commerce platform"
}
}
}'
- Dynamic Inputs — placeholders like {{company_name}} and {{use_case}} are replaced from orq.inputs, enabling personalized responses without rewriting prompts.
- Retries & Fallbacks — failed requests automatically retry up to 3 times, then fall back to Claude or GPT-4o-mini so the agent stays responsive.
- Caching — identical queries return instantly from the cache (1800-second TTL), reducing latency and API costs.
- Knowledge Bases — relevant internal documentation is automatically injected into the prompt for accurate, grounded responses.
- Identity & Thread Tracking — each request links to a user identity and conversation thread for observability and analytics.