Orq.ai Documentation - AI Gateway & LLM Collaboration Platform

TL;DR

Learn how to use Orq AI Gateway
Connect primary & fallback AI providers to avoid vendor lock-in
Enable streaming for real-time responses and better UX
Add a knowledge base with your docs for contextual answers
Set up caching for recurring requests
Build a production-ready customer support agent in minutes

What are we going to build?

You will build a customer support application in Node.js using AI Gateway, where support queries have access to relevant business context from a knowledge base. The system will include a primary model (GPT-4o) and a fallback model (Claude Sonnet) that automatically activates during rate limits or outages. You’ll also learn to implement caching for user queries, contact tracking to monitor per-user LLM request volumes, and thread tracking to visualize complete conversation flows between users and the assistant.

What is an AI Gateway?

AI Gateway is a single unified API endpoint that lets you route and manage requests across multiple AI model providers (e.g., OpenAI, Anthropic, Google, AWS). This is useful when you want to:

Avoid dependency on a single provider (vendor lock-in)
Automatically switch between providers in case of an outage
Scale reliably when the usage surges

Build the customer support chat

Set up the Node.js project

Set up the Node.js project in your IDE of choice. This tutorial uses the npm package manager; feel free to use an alternative such as pnpm.

npm init -y

Install Orq.ai SDK

npm add @orq-ai/node

Install the OpenAI SDK

npm install openai

Install TypeScript dependencies

npm install -D typescript @types/node tsx

Set up your API keys

npm install dotenv

First, create a project in the Orq dashboard to assign API keys to by clicking the + button next to the Project menu:

Create a new project named CustomerSupport

To find your Orq API key, navigate to the Orq.ai dashboard and follow these steps:

Workspace settings
API Keys
Copy your key

From the drop-down you can select the CustomerSupport project to assign the API key to:

Create a .env file. This is where you will paste the Orq API key from the step above:

echo "ORQ_API_KEY=your-orq-api-key-here" > .env

Add .env to your .gitignore

echo ".env" >> .gitignore
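The snippets in this tutorial read ORQ_API_KEY from process.env and fall back to an empty string or a non-null assertion when it is missing. A minimal sketch of a guard that fails fast with a readable message instead; `requireEnv` is a hypothetical helper introduced here for illustration, not part of the Orq.ai or OpenAI SDKs:

```typescript
// Hypothetical helper: returns the variable's value or throws a clear error.
// `env` is passed in explicitly so the function is easy to test.
function requireEnv(env: Record<string, string | undefined>, key: string): string {
  const value = env[key]?.trim();
  if (!value) {
    throw new Error(`${key} is not set. Add it to your .env file.`);
  }
  return value;
}

// Example with an in-memory environment. In the tutorial code you would
// pass process.env after importing 'dotenv/config'.
const exampleEnv = { ORQ_API_KEY: 'your-orq-api-key-here' };
console.log(requireEnv(exampleEnv, 'ORQ_API_KEY').length > 0); // prints true
```

Passing the result to the client constructor would surface a missing key at startup rather than as an authentication error on the first request.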

Create the customer-support.ts file with a Hello World example:

customer-support.ts

import 'dotenv/config';
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: 'https://api.orq.ai/v2/proxy'
});

async function main() {
  const response = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [
      {
        role: 'user',
        content: 'Hello, world!'
      }
    ]
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);

To execute the file from the terminal run:

npx tsx customer-support.ts

Streaming data in real time

In this step we will use the OpenAI gpt-4o model to generate the responses. To connect any other model, such as claude-3-5-sonnet, follow the same steps. To enable models in the Orq AI Gateway:

Navigate to Integrations
Select OpenAI
Click on View integration

Click on Setup your own API key

Navigate back to the Orq.ai dashboard and paste your API key into the pop-up window that appears after you click the Setup your own API key button.

By default, when you make a POST request, the connection remains open until the entire response is ready, and then it closes. With streaming, the API switches to a Server-Sent Events (SSE) connection. This keeps the HTTP connection open and sends the response in small chunks as the data becomes available, which is essential for real-time customer chat interactions.

customer-support.ts

import 'dotenv/config';
import { OpenAI } from 'openai';

// Use OpenAI SDK with Orq AI Gateway proxy
const client = new OpenAI({
  baseURL: "https://api.orq.ai/v2/proxy",
  apiKey: process.env.ORQ_API_KEY ?? '',
});

async function main() {
  try {
    console.log('--- Streaming started ---');

    let stream: any;
    try {
      // Use OpenAI SDK with Orq router for streaming
      stream = await client.chat.completions.create({
        model: 'openai/gpt-4o', // Use provider/model format
        messages: [{
          role: 'user',
          content: 'What are chunks in AI?'
        }],
        stream: true
      });

      console.log('Stream established successfully');
    } catch (e: any) {
      // Fallback for non-streaming
      console.log('Stream not available, falling back to non-streaming response');
      console.log('Error:', e?.message || e);

      const resp = await client.chat.completions.create({
        model: 'openai/gpt-4o',
        messages: [{
          role: 'user',
          content: 'What are chunks in AI?'
        }],
        stream: false
      });

      const content = resp.choices?.[0]?.message?.content ?? '';
      if (content) {
        process.stdout.write(String(content));
        console.log('\n--- Streaming finished ---');
        return;
      }
      console.log('\n(No content)');
      return;
    }

    // Iterate async chunks - router uses OpenAI-compatible format
    for await (const chunk of stream as any) {
      const content = chunk?.choices?.[0]?.delta?.content ?? '';

      if (content) {
        process.stdout.write(content);
      }

      if (process.env.VERBOSE_STREAM === 'true') {
        console.log('\n[chunk]', JSON.stringify(chunk, null, 2));
      }
    }

    console.log('\n--- Streaming finished ---');
  } catch (err: any) {
    console.error('Error:', err.message ?? err);
  }
}

main();

Streaming is ideal for applications where you want to display text to users as it’s generated, such as in chat interfaces or live assistants, improving perceived responsiveness.
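The OpenAI SDK hides the SSE wire format, but it can help to see what the chunks look like underneath. The sketch below assumes the OpenAI-compatible format, where each event is a `data:` line carrying a JSON chunk and the stream ends with `data: [DONE]`. `extractDeltas` and the sample payload are illustrative only, and the function assumes the whole payload is already available as one string (a real client has to buffer partial lines as they arrive):

```typescript
// Shape of the part of a streamed chunk this sketch cares about.
interface StreamChunk {
  choices?: Array<{ delta?: { content?: string | null } }>;
}

// Parse a raw SSE payload and return the text deltas in order.
function extractDeltas(raw: string): string[] {
  const deltas: string[] = [];
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank separator lines
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as StreamChunk;
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) deltas.push(content);
  }
  return deltas;
}

// Hand-written sample payload with two content chunks.
const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  '',
  'data: [DONE]',
  '',
].join('\n');

console.log(extractDeltas(sample).join('')); // prints Hello
```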

Retries & fallbacks

Orq.ai supports automatic fallback to alternative models if the primary model fails. If gpt-4o hits a rate limit or downtime, the request is retried automatically and may fall back to gpt-4o-mini or Anthropic claude-3-5-sonnet. Make sure that you have these models enabled in Orq.

import 'dotenv/config';
import OpenAI from 'openai';
// Note: `orq` is a gateway-specific field that the OpenAI SDK types do not
// declare; tsx runs this file without type-checking.

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY!,
  baseURL: 'https://api.orq.ai/v2/proxy',
});

async function main() {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    stream: true,
    messages: [
      { role: 'user', content: 'Explain what Streaming in Orq.ai is?' },
    ],

    orq: {
      retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
      fallbacks: [
        { model: 'openai/gpt-4o-mini' },
        { model: 'anthropic/claude-3-5-sonnet-20241022' },
      ],
    },
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
  console.log('\n');
}

main().catch(console.error);
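To make the interaction between `retry` and `fallbacks` concrete, here is a local simulation. It is a sketch under stated assumptions, not the gateway's actual implementation: it assumes each model gets one initial attempt plus up to `count` retries on the listed status codes, and that any other failure moves straight to the next fallback. `route` and its `send` callback are hypothetical names, and `send` is synchronous only to keep the example short:

```typescript
interface RetryConfig { count: number; on_codes: number[] }
interface Attempt { model: string; status: number }

// `send` stands in for one HTTP call and returns its status code.
function route(
  primary: string,
  fallbacks: string[],
  retry: RetryConfig,
  send: (model: string) => number,
): { servedBy: string | null; attempts: Attempt[] } {
  const attempts: Attempt[] = [];
  for (const model of [primary, ...fallbacks]) {
    // One initial try plus up to `retry.count` retries per model (assumed).
    for (let i = 0; i <= retry.count; i++) {
      const status = send(model);
      attempts.push({ model, status });
      if (status >= 200 && status < 300) return { servedBy: model, attempts };
      if (!retry.on_codes.includes(status)) break; // not retryable: next model
    }
  }
  return { servedBy: null, attempts };
}

// Simulate a rate-limited primary model and a healthy first fallback.
const result = route(
  'openai/gpt-4o',
  ['openai/gpt-4o-mini', 'anthropic/claude-3-5-sonnet-20241022'],
  { count: 3, on_codes: [429, 500, 502, 503, 504] },
  (model) => (model === 'openai/gpt-4o' ? 429 : 200),
);
console.log(result.servedBy, result.attempts.length); // prints openai/gpt-4o-mini 5
```

Under these assumptions the primary is tried four times before the first fallback answers, which is why the order of the `fallbacks` array matters.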

Caching

Orq.ai supports response caching to reduce latency and API usage for repeated requests. It uses exact_match caching, where the cache key is generated from the exact model, messages, and all parameters, ensuring identical requests hit the cache. The TTL (time-to-live) specifies how long the response is cached (e.g., 3600 seconds for 1 hour, max 86400 seconds). Below is a TypeScript implementation with caching, retries, and fallbacks:

import 'dotenv/config';
import OpenAI from 'openai';

interface OrqConfig {
  retry?: {
    count: number;
    on_codes: number[];
  };
  fallbacks?: Array<{ model: string }>;
  cache?: {
    type: 'exact_match';
    ttl: number;
  };
}

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/proxy',
});

async function main(): Promise<void> {
  try {
    const params = {
      model: 'openai/gpt-4o',
      stream: true as const,
      messages: [
        {
          role: 'user' as const,
          content: 'Explain what Streaming in Orq.ai is?',
        },
      ],
      orq: {
        retry: {
          count: 3,
          on_codes: [429, 500, 502, 503, 504],
        },
        fallbacks: [
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'openai/gpt-4o-mini' },
        ],
        cache: {
          type: 'exact_match' as const,
          ttl: 3600, // 1 hour
        },
      },
    };

    const stream = await client.chat.completions.create(
      params as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content ?? '';
      process.stdout.write(content);
    }
    console.log('\n');
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();

The first time you run the code, the request shows up in Traces as a cache-miss.

The response is stored in the cache after the first run. You see cache-miss the first time because Orq.ai has no prior response stored for that exact cache key. You can read more about cache here. When you run the same request a second time within the TTL, Traces shows cache-hit, meaning that Orq.ai successfully retrieved the cached response.
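The miss-then-hit behavior can be reproduced with a small in-memory model of exact-match caching. This is only a sketch: how Orq.ai actually derives its cache key is not described here, so the key below is assumed to be a canonical JSON string of the request, and `canonical` and `ExactMatchCache` are hypothetical names. The clock is passed in as `now` (milliseconds) so the TTL logic is easy to follow:

```typescript
// Serialize with sorted object keys so property order does not change the key.
function canonical(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(',')}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(',')}}`;
}

class ExactMatchCache {
  private store = new Map<string, { response: string; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlSeconds: number) {
    this.ttlMs = ttlSeconds * 1000;
  }

  get(request: unknown, now: number): string | undefined {
    const entry = this.store.get(canonical(request));
    if (!entry || entry.expiresAt <= now) return undefined; // miss or expired
    return entry.response;
  }

  set(request: unknown, response: string, now: number): void {
    this.store.set(canonical(request), { response, expiresAt: now + this.ttlMs });
  }
}

const demoCache = new ExactMatchCache(3600);
const req = { model: 'openai/gpt-4o', messages: [{ role: 'user', content: 'Hi' }] };
console.log(demoCache.get(req, 0) ?? 'cache-miss'); // prints cache-miss
demoCache.set(req, 'Hello!', 0);
console.log(demoCache.get(req, 1000) ?? 'cache-miss'); // prints Hello!
console.log(demoCache.get(req, 3600 * 1000) ?? 'cache-miss'); // prints cache-miss
```

Because the key covers the whole request, changing any parameter (even a single character in a message) produces a different key and therefore a miss.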

Knowledge Base

When to use:

When you want to enhance a foundational model’s responses with custom, domain-specific knowledge using Retrieval-Augmented Generation (RAG).
Orq.ai’s built-in RAG feature enables you to create a Knowledge Base with your documents (e.g., FAQs, manuals, or PDFs)
When you want to add a Vector Database (e.g., Pinecone, Qdrant) for control over embeddings and retrieval. For more see Using Vector databases with Orq

Knowledge Bases in Orq.ai support the following file types: pdf, txt, docx, csv, xls (10 MB max). Encrypted files are not supported. When you create a new Knowledge Base, you control the following variables:

`embedding_model`	The embedding model, selected from the supported models, that converts your input data (text, images, etc.) into vector embeddings (e.g. `text-embedding-3-large`)
`path`	Project name (e.g. `CustomerSupport`)
`key`	A unique key for your Knowledge Base (e.g. `Customer`)
`top_k`	Maximum number of relevant chunks to retrieve from the Knowledge Base (e.g. `top_k: 5` retrieves up to 5 chunks)
`threshold`	Minimum relevance score (0.0 to 1.0) for retrieved chunks (e.g. `threshold: 0.7` filters out chunks with scores below 0.7)
`search_type`	Search method for retrieving chunks (e.g. `hybrid_search` combines keyword and semantic search)
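The effect of `top_k` and `threshold` can be sketched locally. The example assumes the threshold is applied first and the highest-scoring remaining chunks are then kept; `selectChunks` and the sample scores are made up for illustration and do not call the Orq.ai API:

```typescript
interface ScoredChunk { text: string; score: number }

// Drop chunks below the threshold, then keep the topK best-scoring ones.
function selectChunks(chunks: ScoredChunk[], topK: number, threshold: number): ScoredChunk[] {
  return chunks
    .filter((c) => c.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const candidates: ScoredChunk[] = [
  { text: 'Refund policy', score: 0.91 },
  { text: 'Shipping times', score: 0.65 },
  { text: 'Upgrade steps', score: 0.78 },
  { text: 'Billing FAQ', score: 0.72 },
];

// With top_k = 2 and threshold = 0.7, one chunk fails the threshold
// and one more is cut by the top_k limit.
console.log(selectChunks(candidates, 2, 0.7).map((c) => c.text));
// prints [ 'Refund policy', 'Upgrade steps' ]
```

A higher threshold trades recall for precision, while `top_k` caps how much retrieved text is added to the prompt.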

Run the code to create a Knowledge Base:

customer-support.ts

import 'dotenv/config';
import { Orq } from '@orq-ai/node';

const orq = new Orq({
  apiKey: process.env.ORQ_API_KEY!,
});

async function createCustomerSupportKnowledge() {
  try {
    const result = await orq.knowledge.create({
      embeddingModel: 'text-embedding-3-large',
      path: 'CustomerSupport',     // Name of your project 
      key: 'Customer',             // Needs to be a unique key
      topK: 5,                     // Maximum number of relevant chunks to retrieve
      threshold: 0.7,              // Minimum relevance score (0.0 to 1.0)
      searchType: 'hybrid_search'  // Search method: 'hybrid_search', 'semantic', or 'keyword'
    });

    console.log('Knowledge base created successfully:', result);
    return result;
  } catch (error: any) {
    if (error.statusCode === 400 && error.body?.includes('already exists')) {
      console.log('Knowledge base "Customer" already exists. Retrieving existing knowledge base...');

      try {
        // Try to get the existing knowledge base
        const existing = await orq.knowledge.get({ key: 'Customer' });
        console.log('Using existing knowledge base:', existing);
        return existing;
      } catch (getError) {
        console.log('Could not retrieve existing knowledge base.');
        return { key: 'Customer', status: 'exists' };
      }
    }

    console.error('Error creating knowledge base:', error);
    throw error;
  }
}

createCustomerSupportKnowledge();

This is what a successful response should look like:

{
  _id: '$YOUR_KNOWLEDGE_ID',
  created: '2025-10-29T10:44:10.011Z',
  created_by_id: null,
  key: 'Customer',
  model: 'openai/text-embedding-3-large',
  domain_id: 'domain-id',
  path: 'CustomerSupport',
  retrieval_settings: { retrieval_type: 'hybrid_search', top_k: 5, threshold: 0 },
  updated_by_id: null,
  updated: '2025-10-29T10:44:10.011Z'
}

Save the Knowledge Base ID (the _id field) as YOUR_KNOWLEDGE_ID in the .env file, replacing the $YOUR_KNOWLEDGE_ID placeholder with the actual value:

echo 'YOUR_KNOWLEDGE_ID=$YOUR_KNOWLEDGE_ID' >> .env

If you want to complete this step with a GUI, see Create a Knowledge.

Add files to the Knowledge Base

Inside the main repository, create a documents directory and put the documents that you want to upload there. Orq.ai supports document types such as pdf, txt, docx, csv, and xls (10 MB max). Run the following code to upload the documents:

customer-support.ts

import 'dotenv/config';
import { Orq } from '@orq-ai/node';
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const orq = new Orq({
  apiKey: process.env.ORQ_API_KEY!
});

const filePath = path.join(__dirname, 'documents', 'CustomerSupportDoc.pdf');

orq.files.create({
  file: new File([fs.readFileSync(filePath)], 'CustomerSupportDoc.pdf', {
    type: 'application/pdf'
  }),
  purpose: 'retrieval'
})
  .then((data) => console.log(data))
  .catch(err => console.error(err));

This is what a successful response should look like:

{
  _id: '$FILE_ID',
  object_name: 'files-api/workspaces/workspace-id/retrieval/$FILE_ID.pdf',
  purpose: 'retrieval',
  file_name: '$FILE_ID.pdf',
  workspace_id: 'workspace-id',
  bytes: 118199,
  created: '2025-10-29T11:22:56.732Z'
}

Add the file id _id to the .env file:

echo 'FILE_ID=$FILE_ID' >> .env

If you want to do this step with a GUI see: Create file
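Since uploads are limited to certain file types and a maximum size, a local check before calling the API can save a round trip. The sketch below is an assumption-laden illustration: `checkUpload` is a hypothetical helper, the extension list mirrors the types named in this guide, and "10mb" is interpreted as 10 × 1024 × 1024 bytes:

```typescript
const SUPPORTED_EXTENSIONS = ['pdf', 'txt', 'docx', 'csv', 'xls'];
const MAX_BYTES = 10 * 1024 * 1024; // assumes "10mb" means 10 MiB

// Returns null when the file looks acceptable, otherwise a reason string.
function checkUpload(fileName: string, sizeBytes: number): string | null {
  const dot = fileName.lastIndexOf('.');
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : '';
  if (!SUPPORTED_EXTENSIONS.includes(ext)) return `Unsupported file type: .${ext}`;
  if (sizeBytes > MAX_BYTES) return `File is larger than 10 MB (${sizeBytes} bytes)`;
  return null;
}

console.log(checkUpload('CustomerSupportDoc.pdf', 118199)); // prints null
console.log(checkUpload('notes.md', 1024)); // prints Unsupported file type: .md
```

In the upload script you could call it with the result of `fs.statSync(filePath).size` before `orq.files.create`. The check cannot detect encrypted files, which the service also rejects.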

Connect the files with the Knowledge Base as datasource

import 'dotenv/config';
import { Orq } from '@orq-ai/node';

const orq = new Orq({ apiKey: process.env.ORQ_API_KEY! });

// Create datasource and search functions
const createDatasource = () => orq.knowledge.createDatasource({
  knowledgeId: process.env.YOUR_KNOWLEDGE_ID!,
  requestBody: { fileId: process.env.FILE_ID!, displayName: 'CustomerSupportDocs' }
});

const searchKnowledge = (question: string) => orq.knowledge.search({
  knowledgeId: process.env.YOUR_KNOWLEDGE_ID!,
  requestBody: { query: question, topK: 5 }
});

// Execute
createDatasource()
  .then(result => console.log('Datasource created successfully:', result))
  .catch(console.error);

export { createDatasource, searchKnowledge };

This is what a successful response looks like:

{
  _id: '$YOUR_KNOWLEDGE_ID',
  display_name: 'CustomerSupportDocs',
  file_id: '$FILE_ID',
  knowledge_id: '$YOUR_KNOWLEDGE_ID',
  status: 'queued',
  created: '2025-10-29T11:36:43.916Z',
  updated: '2025-10-29T11:36:43.916Z',
  created_by_id: null,
  update_by_id: null,
  chunks_count: 0
}

Add YOUR_KNOWLEDGE_ID to the .env file if you have not done so already, replacing the placeholder with your Knowledge Base ID:

echo 'YOUR_KNOWLEDGE_ID=$YOUR_KNOWLEDGE_ID' >> .env

Now you will be able to see the uploaded file under your Knowledge Base.

To do this step with the GUI, see Creating a new datasource.

When you upload documents to a Knowledge Base, Orq.ai breaks them down into smaller pieces of text called chunks. Think of it like dividing a book into manageable paragraphs or sections rather than trying to process the entire book at once.
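To illustrate the idea of chunking, here is a simple fixed-size splitter with overlap. It is a generic sketch, not the strategy Orq.ai uses internally (which is not described here); `chunkText`, `maxChars`, and `overlap` are hypothetical names:

```typescript
// Split text into windows of at most maxChars characters, where each
// window repeats the last `overlap` characters of the previous one.
function chunkText(text: string, maxChars: number, overlap: number): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new Error('Require maxChars > 0 and 0 <= overlap < maxChars');
  }
  const chunks: string[] = [];
  const step = maxChars - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

console.log(chunkText('abcdefghij', 4, 1)); // prints [ 'abcd', 'defg', 'ghij' ]
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighboring chunks, which tends to help retrieval. Production chunkers usually split on sentence or paragraph boundaries rather than raw character counts.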

This is the customer support chat with connected Knowledge Base:

import 'dotenv/config';
import OpenAI from 'openai';

interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

async function main(): Promise<void> {
  try {
    const requestParams = {
      model: 'openai/gpt-4o',
      stream: true,
      messages: [
        { role: 'user' as const, content: 'What are the best practices for customer support?' },
      ],
      orq: {
        retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
        fallbacks: [
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'openai/gpt-4o-mini' },
        ],
        cache: { type: 'exact_match' as const, ttl: 3600 },
        knowledge_bases: [
          {
            knowledge_id: process.env.YOUR_KNOWLEDGE_ID!,
            top_k: 5,
            threshold: 0.7,
            search_type: 'hybrid_search' as const,
          },
        ],
      },
    };

    console.log('Request:', JSON.stringify(requestParams, null, 2));

    const start = Date.now();
    const stream = await client.chat.completions.create(
      requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    let chunkCount = 0;
    for await (const chunk of stream) {
      chunkCount++;
      const content = chunk.choices[0]?.delta?.content ?? '';
      process.stdout.write(content);
    }

    console.log(`\n\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();

Once you run the code, you will be able to see the knowledge base retrieval on the Orq.ai dashboard.

Contact Tracking

When to use:

You want to identify and remember the user between chats or sessions.
You need to audit who asked what (e.g., Alice Smith asked about “refunds”).
You’re building user profiles, dashboards, or integrating with a CRM (e.g., Salesforce, HubSpot).
Your application serves external B2B clients and you want to monitor how many calls each client makes and at what cost.

For more details see Contact Tracking. The example below attaches a contact object to the request:

import 'dotenv/config';
import OpenAI from 'openai';

// Define the custom `orq` interface for TypeScript
interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
  contact?: {
    id: string;
    display_name?: string;
    email?: string;
    metadata?: Array<{ key: string; value: any }>; // Array of key-value pairs
    tags?: string[];
  };
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

async function main(): Promise<void> {
  try {
    if (!process.env.YOUR_KNOWLEDGE_ID) {
      throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
    }
    const requestParams = {
      model: 'openai/gpt-4o',
      stream: true,
      messages: [
        { role: 'user' as const, content: 'How do I upgrade my account?' },
      ],
      orq: {
        retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
        fallbacks: [
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'google/gemini-1.5-pro' },
          { model: 'openai/gpt-4o-mini' },
        ],
        cache: { enabled: true, type: 'exact_match', ttl: 3600 },
        knowledge_bases: [
          {
            knowledge_id: process.env.YOUR_KNOWLEDGE_ID, // e.g., ID for "ORQsupport"
            top_k: 5,
            threshold: 0.7,
            search_type: 'hybrid_search',
          },
        ],
        contact: {
          id: 'support-TICKET-789', // Unique ticket ID
          display_name: 'John Smith',
          email: 'john.smith@example.com', // placeholder address
          metadata: [
            { key: 'ticket_id', value: 'TICKET-789' },
            { key: 'customer_tier', value: 'premium' },
            { key: 'issue_category', value: 'billing' },
            { key: 'created_at', value: new Date().toISOString() },
          ],
          tags: ['support', 'billing-issue', 'premium-user'],
        },
      },
    };
    console.log('Request:', JSON.stringify(requestParams, null, 2));

    const start = Date.now();
    const stream = await client.chat.completions.create(
      requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    let responseText = '';
    let chunkCount = 0;
    for await (const chunk of stream) {
      chunkCount++;
      const content = chunk.choices[0]?.delta?.content ?? '';
      responseText += content;
      process.stdout.write(content);
    }

    console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
    console.log('Full Response:', responseText);
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();

Once your code snippet runs successfully, Contact Analytics shows the number of requests sent by the selected contact and lets you control the budget.
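Conceptually, per-contact analytics is an aggregation of request records keyed by `contact.id`. The sketch below shows that idea locally; the record shape, `usageByContact`, and `overBudget` are hypothetical and are not an Orq.ai API:

```typescript
interface RequestRecord { contactId: string; costUsd: number }
interface ContactUsage { requests: number; costUsd: number }

// Sum request counts and cost per contact id.
function usageByContact(records: RequestRecord[]): Map<string, ContactUsage> {
  const totals = new Map<string, ContactUsage>();
  for (const r of records) {
    const current = totals.get(r.contactId) ?? { requests: 0, costUsd: 0 };
    totals.set(r.contactId, {
      requests: current.requests + 1,
      costUsd: current.costUsd + r.costUsd,
    });
  }
  return totals;
}

// List the contacts whose total cost exceeds the budget.
function overBudget(totals: Map<string, ContactUsage>, budgetUsd: number): string[] {
  return Array.from(totals)
    .filter(([, usage]) => usage.costUsd > budgetUsd)
    .map(([id]) => id);
}

const records: RequestRecord[] = [
  { contactId: 'support-TICKET-789', costUsd: 0.25 },
  { contactId: 'support-TICKET-790', costUsd: 0.125 },
  { contactId: 'support-TICKET-789', costUsd: 0.5 },
];
const totals = usageByContact(records);
console.log(totals.get('support-TICKET-789')); // prints { requests: 2, costUsd: 0.75 }
console.log(overBudget(totals, 0.5)); // prints [ 'support-TICKET-789' ]
```

This is why a stable `contact.id` matters: requests sent with different ids for the same customer would be counted as separate contacts.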

Thread tracking

When to use:

Understand the back-and-forth between the user and the assistant
Track context drift in long conversations
Make sense of multi-step conversations at a glance

To enable thread tracking, try this version of the Customer Support app. To learn more see Threads.

import 'dotenv/config';
import OpenAI from 'openai';

// Define the custom `orq` interface for TypeScript
interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
  contact?: {
    id: string;
    display_name?: string;
    email?: string;
    metadata?: Array<{ key: string; value: any }>;
    tags?: string[];
  };
  thread?: {
    id: string;
    tags?: string[];
  };
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

async function main(): Promise<void> {
  try {
    if (!process.env.YOUR_KNOWLEDGE_ID) {
      throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
    }
    const ticketId = 'TICKET-789';
    const threadId = `support-${ticketId}-${Date.now()}`; // Unique thread ID
    const requestParams = {
      model: 'openai/gpt-4o',
      stream: true,
      messages: [
        { role: 'user' as const, content: 'How do I upgrade my account?' },
      ],
      orq: {
        retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
        fallbacks: [
          { model: 'openai/gpt-4o-mini' },
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'google/gemini-1.5-pro' },
        ],
        cache: { enabled: true, type: 'exact_match', ttl: 3600 },
        knowledge_bases: [
          {
            knowledge_id: process.env.YOUR_KNOWLEDGE_ID, // e.g., ID for "ORQsupport"
            top_k: 5,
            threshold: 0.7,
            search_type: 'hybrid_search',
          },
        ],
        contact: {
          id: `support-${ticketId}`,
          display_name: 'John Smith',
          email: 'john.smith@example.com', // placeholder address
          metadata: [
            { key: 'ticket_id', value: ticketId },
            { key: 'customer_tier', value: 'premium' },
            { key: 'issue_category', value: 'billing' },
            { key: 'created_at', value: new Date().toISOString() },
          ],
          tags: ['support', 'billing-issue', 'premium-user'],
        },
        thread: {
          id: threadId,
          tags: ['support', 'billing', 'user-interaction'],
        },
      },
    };
    console.log('Request:', JSON.stringify(requestParams, null, 2));

    const start = Date.now();
    const stream = await client.chat.completions.create(
      requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    let responseText = '';
    let chunkCount = 0;
    for await (const chunk of stream) {
      chunkCount++;
      const content = chunk.choices[0]?.delta?.content ?? '';
      responseText += content;
      process.stdout.write(content);
    }

    console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
    console.log('Full Response:', responseText);
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');
    console.log(`Thread ID: ${threadId}, Contact ID: support-${ticketId}`);
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();

Once your code snippet runs successfully, you will be able to see a detailed breakdown of your API call under Traces > Threads.

To group an initial request and its follow-ups in the same thread, send every request with the same thread.id (for example, support-TICKET-789-<timestamp>).
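A common pitfall is generating the thread id with Date.now() on every request, which puts each message in its own thread. A minimal sketch of creating the id once and reusing it for follow-ups; `startThread` and `followUp` are hypothetical helpers that only build the `thread` object shown in the example above:

```typescript
interface ThreadRef { id: string; tags: string[] }

// Create the thread reference once, when the conversation starts.
function startThread(ticketId: string, startedAtMs: number): ThreadRef {
  return { id: `support-${ticketId}-${startedAtMs}`, tags: ['support'] };
}

// Follow-up requests keep the same id; only the tags change.
function followUp(thread: ThreadRef): ThreadRef {
  return { id: thread.id, tags: [...thread.tags, 'follow-up'] };
}

const thread = startThread('TICKET-789', 1730000000000);
const second = followUp(thread);
console.log(thread.id); // prints support-TICKET-789-1730000000000
console.log(second.id === thread.id); // prints true
```

In a real application the thread id would typically be stored alongside the ticket or session so that it survives process restarts.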

Dynamic Inputs

When to use:

Whenever you want your script, program, or tool to handle variable data at runtime instead of hardcoding values. See also Using Third Party Vector Databases with Orq.ai.

import 'dotenv/config';
import OpenAI from 'openai';
import * as readline from 'readline/promises';
import { stdin as input, stdout as output } from 'process';

// Define the custom `orq` interface for TypeScript
interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
  contact?: {
    id: string;
    display_name?: string;
    email?: string;
    metadata?: Array<{ key: string; value: any }>;
    tags?: string[];
  };
  thread?: {
    id: string;
    tags?: string[];
  };
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

// Initialize readline for dynamic input
const rl = readline.createInterface({ input, output });

// Base configuration
const ticketId = 'TICKET-789';
const threadId = `support-${ticketId}-${Date.now()}`; // Unique thread ID
const contactId = `support-${ticketId}`;
const baseParams = {
  model: 'openai/gpt-4o',
  stream: true as const,
  orq: {
    retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
    fallbacks: [
      { model: 'openai/gpt-4o-mini' },
      { model: 'anthropic/claude-3-5-sonnet-20241022' },
      { model: 'google/gemini-1.5-pro' },
    ],
    cache: { enabled: true, type: 'exact_match', ttl: 3600 },
    knowledge_bases: [
      {
        knowledge_id: process.env.YOUR_KNOWLEDGE_ID ?? '',
        top_k: 5,
        threshold: 0.7,
        search_type: 'hybrid_search',
      },
    ],
    contact: {
      id: contactId,
      display_name: 'John Smith',
      email: 'john.smith@example.com', // placeholder address
      metadata: [
        { key: 'ticket_id', value: ticketId },
        { key: 'customer_tier', value: 'premium' },
        { key: 'issue_category', value: 'billing' },
        { key: 'created_at', value: new Date().toISOString() },
      ],
      tags: ['support', 'billing-issue', 'premium-user'],
    },
    thread: {
      id: threadId,
      tags: ['support', 'billing', 'user-interaction'],
    },
  },
};

async function sendRequest(
  params: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming,
  requestLabel: string
): Promise<string> {
  console.log(`\n--- ${requestLabel} ---`);
  console.log('Request:', JSON.stringify(params, null, 2));
  const start = Date.now();
  const stream = await client.chat.completions.create(params);
  let responseText = '';
  let chunkCount = 0;

  for await (const chunk of stream) {
    chunkCount++;
    const content = chunk.choices[0]?.delta?.content ?? '';
    responseText += content;
    process.stdout.write(content);
  }

  console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
  console.log('Full Response:', responseText);
  console.log(`Thread ID: ${threadId}, Contact ID: ${contactId}`);
  return responseText;
}

async function main(): Promise<void> {
  try {
    if (!process.env.YOUR_KNOWLEDGE_ID) {
      throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
    }

    // Store conversation history
    const conversationHistory: Array<{ role: 'user' | 'assistant'; content: string }> = [];

    // First dynamic input
    let userInput = await rl.question('Enter your first question (e.g., "How do I upgrade my account?"): ');
    if (!userInput.trim()) {
      throw new Error('First input cannot be empty');
    }

    const initialParams = {
      ...baseParams,
      messages: [{ role: 'user' as const, content: userInput }],
    };
    const initialResponse = await sendRequest(initialParams, 'First Request');
    conversationHistory.push(
      { role: 'user', content: userInput },
      { role: 'assistant', content: initialResponse }
    );
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');

    // Second dynamic input
    userInput = await rl.question('Enter your follow-up question (e.g., "I didn’t receive the confirmation email"): ');
    if (!userInput.trim()) {
      throw new Error('Follow-up input cannot be empty');
    }

    const followUpParams = {
      ...baseParams,
      messages: [...conversationHistory, { role: 'user' as const, content: userInput }],
      orq: {
        ...baseParams.orq,
        thread: {
          id: threadId, // Same thread ID
          tags: ['support', 'billing', 'user-interaction', 'follow-up'],
        },
      },
    };
    const followUpResponse = await sendRequest(followUpParams, 'Follow-up Request');
    conversationHistory.push(
      { role: 'user', content: userInput },
      { role: 'assistant', content: followUpResponse }
    );
    console.log('Cache status: Check if cached (if messages match previous run).');

  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  } finally {
    rl.close();
  }
}

main();

Advanced framework integrations

Orq.ai’s AI Gateway seamlessly integrates with popular AI development frameworks, allowing you to leverage existing tools and workflows while benefiting from gateway features like fallbacks, caching, and observability.

LangChain Integration

Orq.ai works with LangChain by pointing it to the gateway endpoint. This gives you access to fallback models, caching, and knowledge base retrieval while using LangChain’s abstractions. For a more detailed guide, see LangChain integration.

import { ChatOpenAI } from "@langchain/openai";

// Configure LangChain to use Orq.ai gateway
const llm = new ChatOpenAI({
  configuration: {
    baseURL: "https://api.orq.ai/v2/proxy",
  },
  openAIApiKey: "YOUR_API_KEY",
  modelName: "openai/gpt-4o",
});

const response = await llm.invoke("How do I reset my password?");

DSPy

DSPy programs can route through Orq.ai to gain automatic prompt optimization alongside gateway reliability features. For a more detailed guide, see DSPy Integration.

import * as dspy from "dspy-ai";

// Configure DSPy with Orq.ai gateway
const lm = new dspy.OpenAI({
  apiBase: "https://api.orq.ai/v2/proxy",
  apiKey: "YOUR_API_KEY",
  model: "openai/gpt-4o"
});

dspy.settings.configure({ lm: lm });

Base URL configuration

# Orq.ai Cloud (default)
https://api.orq.ai/v2/proxy

# Your on-premises deployment
https://your-domain.com/v2/proxy
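To switch between the cloud and an on-premises endpoint without editing code, the base URL can be read from the environment. This is a sketch: the `ORQ_BASE_URL` variable name and the `resolveBaseUrl` helper are assumptions made for this example, not an Orq.ai convention:

```typescript
const ORQ_CLOUD_BASE_URL = 'https://api.orq.ai/v2/proxy';

// Use ORQ_BASE_URL when set (hypothetical variable), else the cloud default.
function resolveBaseUrl(env: Record<string, string | undefined>): string {
  const raw = env['ORQ_BASE_URL']?.trim();
  if (!raw) return ORQ_CLOUD_BASE_URL;
  new URL(raw); // throws a TypeError if the value is not a valid URL
  return raw.replace(/\/+$/, ''); // drop trailing slashes
}

console.log(resolveBaseUrl({})); // prints https://api.orq.ai/v2/proxy
console.log(resolveBaseUrl({ ORQ_BASE_URL: 'https://your-domain.com/v2/proxy/' }));
// prints https://your-domain.com/v2/proxy
```

The result can be passed as `baseURL` when constructing the client, for example with `resolveBaseUrl(process.env)`.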

Conclusion

Orq.ai’s AI Gateway provides a unified, scalable, and production-ready solution for building reliable AI applications. By routing through a single API endpoint, you gain:

Unified access: Connect to multiple AI providers (OpenAI, Anthropic, AWS) through one API
High availability: Automatic fallbacks and retries ensure your application stays online
Cost efficiency: Response caching reduces API costs and latency
Smart context: Built-in knowledge base integration for domain-specific answers
Production observability: Comprehensive traces and OTEL compatibility for monitoring
Flexible deployment: Cloud, on-premises, or edge options to meet your needs
