TL;DR:
- Connect Orq.ai to LLM providers.
- Use cURL streaming for real-time model responses.
- Add a knowledge base to enhance contextual understanding.
- Build a simple customer support agent powered by connected models and your data.
AI Gateway is a single unified API endpoint that lets you seamlessly route and manage requests across multiple AI model providers (e.g., OpenAI, Anthropic, Google, AWS). This functionality comes in handy when you want to avoid depending on a single provider and automatically switch between providers in case of an outage. The AI Gateway frees you from vendor lock-in and ensures that you can scale reliably when usage surges.
Getting started with AI Gateway
To get started, decide which provider you want to connect to. Here is a sample OpenAI integration:
- Navigate to Integrations
- Select OpenAI
- Click on View integration
Next, retrieve your API key:
- Go to Workspace settings
- Open API Keys
- Copy your key

Now send your first request, replacing $ORQ_API_KEY with your API key:
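Here is a minimal sketch of that first request, assuming an OpenAI-compatible chat completions route on the gateway. The endpoint path and the provider-prefixed model identifier below are assumptions; verify both against the Orq.ai API reference for your workspace.

```bash
# Minimal first request. The endpoint path and model name are
# assumptions; check the Orq.ai API reference for the exact values.
curl https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```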
If everything is set up correctly, you will get a reply back from your API call, such as: "Hello! This is a response. How can I assist you today?"
Troubleshooting common errors
Streaming
Streaming sends the model's response back incrementally as it is generated, instead of making you wait for the full completion.
"stream": true, the API uses a Server-Sent Events (SSE) connection, an open HTTP connection that continuously sends small packets of data.
Retries & fallbacks
Fallbacks are backup models the gateway switches to automatically when the primary model keeps failing or is rate-limited; combined with retries, they keep requests flowing even through a provider outage.
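As an illustration, a request that retries up to 3 times and then falls back to other models might look like the sketch below. The retry and fallback field names under "orq" are assumptions about the payload shape; only the behavior (3 retries on 429/5xx, then fallback models) comes from this guide.

```bash
# Retry/fallback sketch. The "orq" option names are assumptions;
# the retry counts and status codes are the ones described in this guide.
curl https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Why is my webhook failing?"}],
    "orq": {
      "retries": {"count": 3, "on_status": [429, 500, 502, 503]},
      "fallbacks": ["anthropic/claude-3-5-sonnet", "openai/gpt-4o-mini"]
    }
  }'
```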
Caching
Orq.ai supports response caching to reduce latency and API usage (a request sketch follows the list):
- type: "exact_match" → caches identical requests and reuses responses.
- ttl: 1800 → cache entries expire after 30 minutes (1800 seconds).
The benefits:
- Faster responses for repeated questions.
- Reduced API calls → lower cost.
- Example: A repeated FAQ query will return instantly from the cache instead of hitting the model.
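The cache settings above would be passed along with the request. The type and ttl values come straight from this guide; the placement under "orq" is an assumption.

```bash
# Caching sketch: identical requests within 30 minutes (ttl: 1800)
# are served from the cache instead of hitting the model.
curl https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What are your support hours?"}],
    "orq": {
      "cache": {"type": "exact_match", "ttl": 1800}
    }
  }'
```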
Adding a knowledge base
You can ground the conversation in domain-specific knowledge by linking knowledge_bases, as in the sketch below.
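A sketch of linking knowledge bases to a request, using the KB names that appear later in this guide; the placement of the knowledge_bases field is an assumption.

```bash
# Knowledge base sketch: the gateway retrieves relevant content from the
# listed KBs and grounds the answer in it. Field placement is assumed.
curl https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "How do I authenticate API calls?"}],
    "orq": {
      "knowledge_bases": ["api-documentation", "integration-examples"]
    }
  }'
```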
Contact & Thread Tracking
Orq.ai allows tracking users and support sessions using contact and thread objects; a sketch follows.
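Here is how attaching a contact and a thread to a request might look, so the session shows up in your analytics. The identifiers below are hypothetical and the field shapes are assumptions.

```bash
# Contact/thread sketch: "contact" identifies the user, "thread" clusters
# the messages of one support session. IDs below are hypothetical.
curl https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "My integration test fails."}],
    "orq": {
      "contact": {"id": "user_1234"},
      "thread": {"id": "thread_5678"}
    }
  }'
```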
Dynamic inputs
Placeholders like {{company_name}}, {{customer_tier}}, and {{use_case}} are automatically replaced at runtime.
Prompts are personalized for each user/session without rewriting messages manually.
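For example, a templated system prompt plus orq.inputs might look like the sketch below; the placeholders are resolved at runtime from the inputs object. The payload shape is assumed and the input values are illustrative.

```bash
# Dynamic inputs sketch: {{...}} placeholders in the prompt are filled
# from orq.inputs at runtime. Values below are illustrative.
curl https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system",
       "content": "You support {{company_name}} ({{customer_tier}} tier) with {{use_case}}."},
      {"role": "user", "content": "How do I rotate my API key?"}
    ],
    "orq": {
      "inputs": {
        "company_name": "Acme Corp",
        "customer_tier": "enterprise",
        "use_case": "API integration"
      }
    }
  }'
```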
Building a Reliable Customer Support Agent
Imagine you’re creating a customer support agent for your company, Orq AI, which helps enterprise customers integrate APIs. You want it to be:
- Reliable: automatically retry or fall back if a model fails.
- Contextually aware: grounded in internal documentation and examples.
- Fast and cost-efficient: using caching for repeated queries.
- Traceable: track conversations per user and session.
- Personalized: dynamic prompts based on user type and project.
Here is how each feature contributes (a combined request sketch follows the list):
- Dynamic Inputs / Prompt Templating
  - Placeholders like {{company_name}}, {{customer_tier}}, and {{use_case}} are automatically replaced using the orq.inputs values.
  - Effect: Each customer gets a personalized, context-aware response without rewriting prompts.
- Retries & Fallbacks
  - If gpt-4o fails or is rate-limited (429 or 5xx errors), Orq.ai retries up to 3 times.
  - If retries fail, it automatically falls back to Anthropic Claude or GPT-4o-mini.
  - Effect: The agent remains highly reliable and doesn’t leave customers waiting.
- Caching
  - Repeated queries with the same input return instantly from the cache (ttl: 1800s).
  - Effect: Reduces latency and API usage, saving costs and improving responsiveness.
- Knowledge Bases
  - The agent pulls relevant documents from internal KBs like api-documentation or integration-examples.
  - Effect: Responses are grounded in your company’s content, making them accurate and trustworthy.
- Contact & Thread Tracking
  - Each session is linked to a contact (user) and a thread (conversation cluster).
  - Effect: Enables session observability, analytics, and organized support tracking for enterprise customers.
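Putting it all together, one request sketch that combines retries, fallbacks, caching, knowledge bases, contact/thread tracking, and dynamic inputs. As before, the endpoint path and the field names under "orq" are assumptions; the individual settings are the ones covered in this guide.

```bash
# Full customer support agent sketch, combining every feature above.
# Endpoint, "orq" field names, and IDs are assumptions/hypothetical.
curl https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system",
       "content": "You support {{company_name}} ({{customer_tier}} tier) with {{use_case}}."},
      {"role": "user", "content": "Our webhook retries keep timing out."}
    ],
    "orq": {
      "inputs": {
        "company_name": "Acme Corp",
        "customer_tier": "enterprise",
        "use_case": "API integration"
      },
      "retries": {"count": 3, "on_status": [429, 500, 502, 503]},
      "fallbacks": ["anthropic/claude-3-5-sonnet", "openai/gpt-4o-mini"],
      "cache": {"type": "exact_match", "ttl": 1800},
      "knowledge_bases": ["api-documentation", "integration-examples"],
      "contact": {"id": "user_1234"},
      "thread": {"id": "thread_5678"}
    }
  }'
```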