
TL;DR

  • Learn how to use Orq AI Gateway
  • Connect primary & fallback AI providers to avoid vendor lock-in
  • Enable streaming for real-time responses and better UX
  • Add a knowledge base with your docs for contextual answers
  • Set up caching for recurring requests
  • Build a production-ready customer support agent in minutes

What are we going to build?

You will build a customer support application in Node.js using AI Gateway, where the support queries have access to the relevant business context from a knowledge base. The system will include a primary model (GPT-4o) and a fallback model (Claude Sonnet) that automatically activates during rate limits or outages. You’ll also learn to implement caching for user queries, contact tracking to monitor per-user LLM request volumes, and thread tracking to visualize complete conversation flows between users and the assistant.

What is an AI Gateway?

AI Gateway is a single unified API endpoint that lets you seamlessly route and manage requests across multiple AI model providers (e.g., OpenAI, Anthropic, Google, AWS). This comes in handy when you want to (a minimal sketch follows the list):
  • Avoid dependency on a single provider (vendor lock-in)
  • Automatically switch between providers in case of an outage
  • Scale reliably when usage surges
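Because every provider sits behind the same OpenAI-compatible endpoint, switching providers amounts to changing the model string. Below is a minimal sketch of the idea; the model identifiers are examples, and ORQ_API_KEY is configured in the steps that follow.
import 'dotenv/config';
import OpenAI from 'openai';

// One client, one endpoint: the gateway routes each request to the provider named in the model string
const gateway = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: 'https://api.orq.ai/v2/proxy',
});

async function ask(model: string, question: string) {
  const res = await gateway.chat.completions.create({
    model,
    messages: [{ role: 'user', content: question }],
  });
  return res.choices[0]?.message?.content ?? '';
}

async function main() {
  console.log(await ask('openai/gpt-4o', 'Hello!'));                        // routed to OpenAI
  console.log(await ask('anthropic/claude-3-5-sonnet-20241022', 'Hello!')); // routed to Anthropic
}

main().catch(console.error);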

Build the customer support chat

1. Set up the Node.js project

Inside your IDE of choice, set up the Node.js project. In this tutorial we will use the npm package manager; feel free to use alternatives such as pnpm.
npm init -y
Install Orq.ai SDK
npm add @orq-ai/node
Install the OpenAI SDK
npm install openai
Install TypeScript dependencies
npm install -D typescript @types/node tsx
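tsx runs TypeScript files directly, so a tsconfig.json is not strictly required. If you want one for editor support, a minimal configuration could look like the sketch below (an optional assumption, not something this tutorial depends on). Note that the file-upload example in step 6 uses import.meta.url, which may require "type": "module" in your package.json depending on your setup.
tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}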
Set up your API keys
npm install dotenv
First, inside the Orq dashboard, create a project that we can assign API keys to by clicking the + button next to the Project menu. Create a new project named CustomerSupport. To find your Orq API key, navigate to the Orq.ai dashboard and go to:
  1. Workspace settings
  2. API Keys
  3. Copy your key
From the dropdown, select the CustomerSupport project to assign the API key to. Then create a .env file; this is where you will paste the Orq API key from the step above:
echo "ORQ_API_KEY=your-orq-api-key-here" > .env
Add .env to your .gitignore
echo ".env" >> .gitignore
Create the customer-support.ts file with a Hello World example:
customer-support.ts
import 'dotenv/config';
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: 'https://api.orq.ai/v2/proxy'
});

async function main() {
  const response = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [
      {
        role: 'user',
        content: 'Hello, world!'
      }
    ]
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);
To execute the file from the terminal run:
npx tsx customer-support.ts 
2. Streaming data in real time

In this step we will use the OpenAI gpt-4o model to generate the responses. To connect any other model, such as claude-3-5-sonnet, follow the same steps. To enable models in the Orq AI Gateway:
  1. Navigate to Integrations
  2. Select OpenAI
  3. Click on View integration
Click on Setup your own API key. Log in to OpenAI’s API platform and copy your secret key. Then navigate back to the Orq.ai dashboard and paste the key inside the pop-up window that appears after you click the Setup your own API key button.
By default, when you make a POST request, the connection remains open until the entire response is ready, and then it closes. When you use streaming, however, the API switches to a Server-Sent Events (SSE) connection. This keeps the HTTP connection open and sends the response in small, real-time chunks as the data becomes available, which is essential for real-time customer chat interactions.
customer-support.ts
import 'dotenv/config';
import { OpenAI } from 'openai';

// Use OpenAI SDK with Orq AI Gateway proxy
const client = new OpenAI({
  baseURL: "https://api.orq.ai/v2/proxy",
  apiKey: process.env.ORQ_API_KEY ?? '',
});

async function main() {
  try {
    console.log('--- Streaming started ---');

    let stream: any;
    try {
      // Use OpenAI SDK with Orq router for streaming
      stream = await client.chat.completions.create({
        model: 'openai/gpt-4o', // Use provider/model format
        messages: [{
          role: 'user',
          content: 'What are chunks in AI?'
        }],
        stream: true
      });

      console.log('Stream established successfully');
    } catch (e: any) {
      // Fallback for non-streaming
      console.log('Stream not available, falling back to non-streaming response');
      console.log('Error:', e?.message || e);

      const resp = await client.chat.completions.create({
        model: 'openai/gpt-4o',
        messages: [{
          role: 'user',
          content: 'What are chunks in AI?'
        }],
        stream: false
      });

      const content = resp.choices?.[0]?.message?.content ?? '';
      if (content) {
        process.stdout.write(String(content));
        console.log('\n--- Streaming finished ---');
        return;
      }
      console.log('\n(No content)');
      return;
    }

    // Iterate async chunks - router uses OpenAI-compatible format
    for await (const chunk of stream as any) {
      const content = chunk?.choices?.[0]?.delta?.content ?? '';

      if (content) {
        process.stdout.write(content);
      }

      if (process.env.VERBOSE_STREAM === 'true') {
        console.log('\n[chunk]', JSON.stringify(chunk, null, 2));
      }
    }

    console.log('\n--- Streaming finished ---');
  } catch (err: any) {
    console.error('Error:', err.message ?? err);
  }
}

main();
Streaming is ideal for applications where you want to display text to users as it’s generated, such as chat interfaces or live assistants, improving perceived responsiveness.
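For example, here is a minimal sketch of forwarding the gateway stream to a browser over SSE with Node's built-in http module; the /chat route, the hard-coded question, and the port are illustrative assumptions.
import 'dotenv/config';
import http from 'http';
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/proxy',
});

// Each request to /chat opens an SSE response and forwards gateway chunks as they arrive
http.createServer(async (req, res) => {
  if (req.url !== '/chat') {
    res.writeHead(404);
    res.end();
    return;
  }
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  try {
    const stream = await client.chat.completions.create({
      model: 'openai/gpt-4o',
      stream: true,
      messages: [{ role: 'user', content: 'How do I reset my password?' }],
    });
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content ?? '';
      if (content) res.write(`data: ${JSON.stringify(content)}\n\n`);
    }
    res.write('data: [DONE]\n\n');
  } catch (err) {
    res.write(`data: ${JSON.stringify({ error: String(err) })}\n\n`);
  } finally {
    res.end();
  }
}).listen(3000, () => console.log('SSE demo on http://localhost:3000/chat'));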
3. Retries & fallbacks

Orq.ai allows automatic fallback to alternative models if the primary fails. If gpt-4o hits a rate limit or downtime, the request automatically retries and may fall back to Anthropic’s claude-3-5-sonnet or gpt-4o-mini. Make sure that you have these models enabled in Orq.
import 'dotenv/config';
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY!,
  baseURL: 'https://api.orq.ai/v2/proxy',
});

async function main() {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    stream: true,
    messages: [
      { role: 'user', content: 'Explain what Streaming in Orq.ai is?' },
    ],
    // Gateway options: retry transient errors, then fall back to the listed models in order
    orq: {
      retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
      fallbacks: [
        { model: 'openai/gpt-4o-mini' },
        { model: 'anthropic/claude-3-5-sonnet-20241022' },
      ],
    },
  } as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming);

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
  console.log('\n');
}

main().catch(console.error);
4. Caching

Orq.ai supports response caching to reduce latency and API usage for repeated requests. It uses exact_match caching, where the cache key is generated from the exact model, messages, and all parameters, ensuring identical requests hit the cache. The TTL (time-to-live) specifies how long the response is cached (e.g., 3600 seconds for 1 hour, max 86400 seconds). Below is a TypeScript implementation with caching, retries, and fallbacks:
import 'dotenv/config';
import OpenAI from 'openai';

interface OrqConfig {
  retry?: {
    count: number;
    on_codes: number[];
  };
  fallbacks?: Array<{ model: string }>;
  cache?: {
    type: 'exact_match';
    ttl: number;
  };
}

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/proxy',
});

async function main(): Promise<void> {
  try {
    const params = {
      model: 'openai/gpt-4o',
      stream: true as const,
      messages: [
        {
          role: 'user' as const,
          content: 'Explain what Streaming in Orq.ai is?',
        },
      ],
      orq: {
        retry: {
          count: 3,
          on_codes: [429, 500, 502, 503, 504],
        },
        fallbacks: [
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'openai/gpt-4o-mini' },
        ],
        cache: {
          type: 'exact_match' as const,
          ttl: 3600, // 1 hour
        },
      },
    };

    const stream = await client.chat.completions.create(
      params as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content ?? '';
      process.stdout.write(content);
    }
    console.log('\n');
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();
The first time you run the code, the request inside Traces will show cache-miss. Your cache is stored after the first run; you see cache-miss initially because Orq.ai has no prior response stored for that exact cache key. You can read more about cache here. When you run the same request a second time within the TTL, Traces will show cache-hit, meaning that Orq.ai successfully retrieved the cached response.
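Because the cache key is the exact request, you can observe the effect by timing two identical calls. Below is a minimal sketch; timings will vary, and the question text is only an example.
import 'dotenv/config';
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/proxy',
});

// Identical, non-streaming request so the exact_match cache key is the same both times
const params = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user' as const, content: 'What is response caching?' }],
  orq: { cache: { type: 'exact_match' as const, ttl: 3600 } },
};

async function timedCall(label: string) {
  const start = Date.now();
  await client.chat.completions.create(
    params as OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming
  );
  console.log(`${label}: ${Date.now() - start}ms`);
}

async function main() {
  await timedCall('First call (cache miss expected)');
  await timedCall('Second call (cache hit expected)'); // should return noticeably faster
}

main().catch(console.error);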
5. Knowledge Base

When to use:
  • When you want to enhance a foundational model’s responses with custom, domain-specific knowledge using Retrieval-Augmented Generation (RAG).
  • Orq.ai’s built-in RAG feature enables you to create a Knowledge Base with your documents (e.g., FAQs, manuals, or PDFs)
  • When you want to add a Vector Database (e.g., Pinecone, Qdrant) for control over embeddings and retrieval. For more see Using Vector databases with Orq
Knowledge Bases inside Orq.ai support the following file types: pdf, txt, docx, csv, xls (10 MB max). Encrypted files are not supported. When you create a new Knowledge Base, you have control over the following variables:
embedding_model: The embedding model, selected from the supported models; a family of models that converts your input data (text, images, etc.) into vector embeddings (e.g., text-embedding-3-large)
path: Project name (e.g., CustomerSupport)
key: A unique key for your Knowledge Base (e.g., Customer)
top_k: The maximum number of relevant chunks to retrieve from the Knowledge Base (e.g., top_k: 5 retrieves up to 5 chunks)
threshold: The minimum relevance score (0.0 to 1.0) for retrieved chunks (e.g., threshold: 0.7 filters out chunks scoring below 0.7)
search_type: The search method for retrieving chunks (e.g., hybrid_search combines keyword and semantic search)
Run the code to create a Knowledge Base:
customer-support.ts
import 'dotenv/config';
import { Orq } from '@orq-ai/node';

const orq = new Orq({
  apiKey: process.env.ORQ_API_KEY!,
});

async function createCustomerSupportKnowledge() {
  try {
    const result = await orq.knowledge.create({
      embeddingModel: 'text-embedding-3-large',
      path: 'CustomerSupport',     // Name of your project 
      key: 'Customer',             // Needs to be a unique key
      topK: 5,                     // Maximum number of relevant chunks to retrieve
      threshold: 0.7,              // Minimum relevance score (0.0 to 1.0)
      searchType: 'hybrid_search'  // Search method: 'hybrid_search', 'semantic', or 'keyword'
    });

    console.log('Knowledge base created successfully:', result);
    return result;
  } catch (error: any) {
    if (error.statusCode === 400 && error.body?.includes('already exists')) {
      console.log('Knowledge base "Customer" already exists. Retrieving existing knowledge base...');

      try {
        // Try to get the existing knowledge base
        const existing = await orq.knowledge.get({ key: 'Customer' });
        console.log('Using existing knowledge base:', existing);
        return existing;
      } catch (getError) {
        console.log('Could not retrieve existing knowledge base.');
        return { key: 'Customer', status: 'exists' };
      }
    }

    console.error('Error creating knowledge base:', error);
    throw error;
  }
}

createCustomerSupportKnowledge();
This is what a successful response should look like:

{
  _id: '$YOUR_KNOWLEDGE_ID',
  created: '2025-10-29T10:44:10.011Z',
  created_by_id: null,
  key: 'Customer',
  model: 'openai/text-embedding-3-large',
  domain_id: 'domain-id',
  path: 'CustomerSupport',
  retrieval_settings: { retrieval_type: 'hybrid_search', top_k: 5, threshold: 0 },
  updated_by_id: null,
  updated: '2025-10-29T10:44:10.011Z'
}
Make sure to save the Knowledge ID _id as YOUR_KNOWLEDGE_ID in the .env file.
echo 'YOUR_KNOWLEDGE_ID=$YOUR_KNOWLEDGE_ID' >> .env
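If you prefer not to copy the ID by hand, you can append it from the script itself. Below is a minimal sketch; saveKnowledgeId is a hypothetical helper, and result is assumed to be the object returned by orq.knowledge.create in the snippet above.
import fs from 'fs';

// Hypothetical helper: persist the Knowledge Base ID returned by orq.knowledge.create
function saveKnowledgeId(result: { _id: string }) {
  fs.appendFileSync('.env', `\nYOUR_KNOWLEDGE_ID=${result._id}\n`);
  console.log('Saved YOUR_KNOWLEDGE_ID to .env');
}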
If you want to complete this step with a GUI, see Create a Knowledge.
6. Add files to the Knowledge Base

Inside the main repository, create a documents directory and place the documents that you want to upload there. Orq.ai supports document types such as pdf, txt, docx, csv, and xls (10 MB max). Run the following code to upload the documents:
customer-support.ts
import 'dotenv/config';
import { Orq } from '@orq-ai/node';
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const orq = new Orq({
  apiKey: process.env.ORQ_API_KEY!
});

const filePath = path.join(__dirname, 'documents', 'CustomerSupportDoc.pdf');

orq.files.create({
  file: new File([fs.readFileSync(filePath)], 'CustomerSupportDoc.pdf', {
    type: 'application/pdf'
  }),
  purpose: 'retrieval'
})
  .then((data) => console.log(data))
  .catch(err => console.error(err));
This is what a successful response should look like:
{
  _id: '$FILE_ID',
  object_name: 'files-api/workspaces/workspace-id/retrieval/$FILE_ID.pdf',
  purpose: 'retrieval',
  file_name: '$FILE_ID.pdf',
  workspace_id: 'workspace-id',
  bytes: 118199,
  created: '2025-10-29T11:22:56.732Z'
}
Add the file id _id to the .env file:
echo 'FILE_ID=$FILE_ID' >> .env
If you want to do this step with a GUI, see Create file.
7. Connect the files to the Knowledge Base as a datasource

import 'dotenv/config';
import { Orq } from '@orq-ai/node';

const orq = new Orq({ apiKey: process.env.ORQ_API_KEY! });

// Create datasource and search functions
const createDatasource = () => orq.knowledge.createDatasource({
  knowledgeId: process.env.YOUR_KNOWLEDGE_ID!,
  requestBody: { fileId: process.env.FILE_ID!, displayName: 'CustomerSupportDocs' }
});

const searchKnowledge = (question: string) => orq.knowledge.search({
  knowledgeId: process.env.YOUR_KNOWLEDGE_ID!,
  requestBody: { query: question, topK: 5 }
});

// Execute
createDatasource()
  .then(result => console.log('Datasource created successfully:', result))
  .catch(console.error);

export { createDatasource, searchKnowledge };
This is what a successful response looks like:
{
  _id: '$YOUR_KNOWLEDGE_ID',
  display_name: 'CustomerSupportDocs',
  file_id: '$FILE_ID',
  knowledge_id: '$YOUR_KNOWLEDGE_ID',
  status: 'queued',
  created: '2025-10-29T11:36:43.916Z',
  updated: '2025-10-29T11:36:43.916Z',
  created_by_id: null,
  update_by_id: null,
  chunks_count: 0
}
Add YOUR_KNOWLEDGE_ID to the .env file:
echo "YOUR_KNOWLEDGE_ID=$YOUR_KNOWLEDGE_ID" >> .env
Now you will be able to see the uploaded file under your Knowledge Base. To do this step with a GUI, check Creating a new datasource.
When you upload documents to a Knowledge Base, Orq.ai breaks them down into smaller pieces of text called chunks. Think of it like dividing a book into manageable paragraphs or sections rather than trying to process the entire book at once.
This is the customer support chat with the connected Knowledge Base:
import 'dotenv/config';
import OpenAI from 'openai';

interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

async function main(): Promise<void> {
  try {
    const requestParams = {
      model: 'openai/gpt-4o',
      stream: true,
      messages: [
        { role: 'user' as const, content: 'What are the best practices for customer support?' },
      ],
      orq: {
        retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
        fallbacks: [
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'openai/gpt-4o-mini' },
        ],
        cache: { type: 'exact_match' as const, ttl: 3600 },
        knowledge_bases: [
          {
            knowledge_id: process.env.YOUR_KNOWLEDGE_ID!,
            top_k: 5,
            threshold: 0.7,
            search_type: 'hybrid_search' as const,
          },
        ],
      },
    };

    console.log('Request:', JSON.stringify(requestParams, null, 2));

    const start = Date.now();
    const stream = await client.chat.completions.create(
      requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    let chunkCount = 0;
    for await (const chunk of stream) {
      chunkCount++;
      const content = chunk.choices[0]?.delta?.content ?? '';
      process.stdout.write(content);
    }

    console.log(`\n\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();
Once you run the code, you will be able to see the knowledge base retrieval under Traces on the Orq.ai dashboard.
8. Contact Tracking

When to use:
  • You want to identify and remember the user between chats or sessions.
  • You need to audit who asked what (e.g., Alice Smith asked about “refunds”).
  • You’re building user profiles, dashboards, or integrating with a CRM (e.g., Salesforce, HubSpot).
  • Your application serves external B2B clients and you want to monitor how many calls each client makes to your application, and at what cost
For more details see Contact Tracking. If you are prototyping with cURL, paste the code snippet with your YOUR_API_KEY, YOUR_CONTACT_ID, and YOUR_DEPLOYMENT_KEY variables:
import 'dotenv/config';
import OpenAI from 'openai';

// Define the custom `orq` interface for TypeScript
interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
  contact?: {
    id: string;
    display_name?: string;
    email?: string;
    metadata?: Array<{ key: string; value: any }>; // Array of key-value pairs
    tags?: string[];
  };
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

async function main(): Promise<void> {
  try {
    if (!process.env.YOUR_KNOWLEDGE_ID) {
      throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
    }
    const requestParams = {
      model: 'openai/gpt-4o',
      stream: true,
      messages: [
        { role: 'user' as const, content: 'How do I upgrade my account?' },
      ],
      orq: {
        retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
        fallbacks: [
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'google/gemini-1.5-pro' },
          { model: 'openai/gpt-4o-mini' },
        ],
        cache: { enabled: true, type: 'exact_match', ttl: 3600 },
        knowledge_bases: [
          {
            knowledge_id: process.env.YOUR_KNOWLEDGE_ID, // e.g., ID for "ORQsupport"
            top_k: 5,
            threshold: 0.7,
            search_type: 'hybrid_search',
          },
        ],
        contact: {
          id: 'support-TICKET-789', // Unique ticket ID
          display_name: 'John Smith',
          email: 'john@company.com',
          metadata: [
            { key: 'ticket_id', value: 'TICKET-789' },
            { key: 'customer_tier', value: 'premium' },
            { key: 'issue_category', value: 'billing' },
            { key: 'created_at', value: new Date().toISOString() },
          ],
          tags: ['support', 'billing-issue', 'premium-user'],
        },
      },
    };
    console.log('Request:', JSON.stringify(requestParams, null, 2));

    const start = Date.now();
    const stream = await client.chat.completions.create(
      requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    let responseText = '';
    let chunkCount = 0;
    for await (const chunk of stream) {
      chunkCount++;
      const content = chunk.choices[0]?.delta?.content ?? '';
      responseText += content;
      process.stdout.write(content);
    }

    console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
    console.log('Full Response:', responseText);
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();
Once your code snippet runs successfully, you will be able to see under Contact Analytics the number of requests the selected contact has sent, and control their budget.
9. Thread tracking

When to use:
  • Understand the back-and-forth between the user and the assistant
  • Track context drift in long conversations
  • Make sense of multi-step conversations at a glance
To enable thread tracking, try this version of the customer support app. To learn more, see Threads.
import 'dotenv/config';
import OpenAI from 'openai';

// Define the custom `orq` interface for TypeScript
interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
  contact?: {
    id: string;
    display_name?: string;
    email?: string;
    metadata?: Array<{ key: string; value: any }>;
    tags?: string[];
  };
  thread?: {
    id: string;
    tags?: string[];
  };
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

async function main(): Promise<void> {
  try {
    if (!process.env.YOUR_KNOWLEDGE_ID) {
      throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
    }
    const ticketId = 'TICKET-789';
    const threadId = `support-${ticketId}-${Date.now()}`; // Unique thread ID
    const requestParams = {
      model: 'openai/gpt-4o',
      stream: true,
      messages: [
        { role: 'user' as const, content: 'How do I upgrade my account?' },
      ],
      orq: {
        retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
        fallbacks: [
          { model: 'openai/gpt-4o-mini' },
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'google/gemini-1.5-pro' },
        ],
        cache: { enabled: true, type: 'exact_match', ttl: 3600 },
        knowledge_bases: [
          {
            knowledge_id: process.env.YOUR_KNOWLEDGE_ID, // e.g., ID for "ORQsupport"
            top_k: 5,
            threshold: 0.7,
            search_type: 'hybrid_search',
          },
        ],
        contact: {
          id: `support-${ticketId}`,
          display_name: 'John Smith',
          email: 'john@company.com',
          metadata: [
            { key: 'ticket_id', value: ticketId },
            { key: 'customer_tier', value: 'premium' },
            { key: 'issue_category', value: 'billing' },
            { key: 'created_at', value: new Date().toISOString() },
          ],
          tags: ['support', 'billing-issue', 'premium-user'],
        },
        thread: {
          id: threadId,
          tags: ['support', 'billing', 'user-interaction'],
        },
      },
    };
    console.log('Request:', JSON.stringify(requestParams, null, 2));

    const start = Date.now();
    const stream = await client.chat.completions.create(
      requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
    );

    let responseText = '';
    let chunkCount = 0;
    for await (const chunk of stream) {
      chunkCount++;
      const content = chunk.choices[0]?.delta?.content ?? '';
      responseText += content;
      process.stdout.write(content);
    }

    console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
    console.log('Full Response:', responseText);
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');
    console.log(`Thread ID: ${threadId}, Contact ID: support-${ticketId}`);
  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  }
}

main();
Once your code snippet runs successfully, you can navigate to Traces -> Threads and see a detailed breakdown of your API call. If you send a request again, use the same thread.id (support-TICKET-789-<timestamp>) for both the initial and follow-up requests to group them in the same thread.
10. Dynamic Inputs

When to use:
  • You want to accept user questions interactively at runtime instead of hard-coding them
  • You want follow-up questions to carry the conversation history and stay grouped under the same contact and thread
The version below reads questions from the terminal with readline:
import 'dotenv/config';
import OpenAI from 'openai';
import * as readline from 'readline/promises';
import { stdin as input, stdout as output } from 'process';

// Define the custom `orq` interface for TypeScript
interface OrqConfig {
  retry?: { count: number; on_codes: number[] };
  fallbacks?: Array<{ model: string }>;
  cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
  knowledge_bases?: Array<{
    knowledge_id: string;
    top_k: number;
    threshold: number;
    search_type: 'hybrid_search';
  }>;
  contact?: {
    id: string;
    display_name?: string;
    email?: string;
    metadata?: Array<{ key: string; value: any }>;
    tags?: string[];
  };
  thread?: {
    id: string;
    tags?: string[];
  };
}

// Initialize the OpenAI client
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: 'https://api.orq.ai/v2/router',
});

// Initialize readline for dynamic input
const rl = readline.createInterface({ input, output });

// Base configuration
const ticketId = 'TICKET-789';
const threadId = `support-${ticketId}-${Date.now()}`; // Unique thread ID
const contactId = `support-${ticketId}`;
const baseParams = {
  model: 'openai/gpt-4o',
  stream: true as const,
  orq: {
    retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
    fallbacks: [
      { model: 'openai/gpt-4o-mini' },
      { model: 'anthropic/claude-3-5-sonnet-20241022' },
      { model: 'google/gemini-1.5-pro' },
    ],
    cache: { enabled: true, type: 'exact_match', ttl: 3600 },
    knowledge_bases: [
      {
        knowledge_id: process.env.YOUR_KNOWLEDGE_ID ?? '',
        top_k: 5,
        threshold: 0.7,
        search_type: 'hybrid_search',
      },
    ],
    contact: {
      id: contactId,
      display_name: 'John Smith',
      email: 'john@company.com',
      metadata: [
        { key: 'ticket_id', value: ticketId },
        { key: 'customer_tier', value: 'premium' },
        { key: 'issue_category', value: 'billing' },
        { key: 'created_at', value: new Date().toISOString() },
      ],
      tags: ['support', 'billing-issue', 'premium-user'],
    },
    thread: {
      id: threadId,
      tags: ['support', 'billing', 'user-interaction'],
    },
  },
};

async function sendRequest(
  params: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming,
  requestLabel: string
): Promise<string> {
  console.log(`\n--- ${requestLabel} ---`);
  console.log('Request:', JSON.stringify(params, null, 2));
  const start = Date.now();
  const stream = await client.chat.completions.create(params);
  let responseText = '';
  let chunkCount = 0;

  for await (const chunk of stream) {
    chunkCount++;
    const content = chunk.choices[0]?.delta?.content ?? '';
    responseText += content;
    process.stdout.write(content);
  }

  console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
  console.log('Full Response:', responseText);
  console.log(`Thread ID: ${threadId}, Contact ID: ${contactId}`);
  return responseText;
}

async function main(): Promise<void> {
  try {
    if (!process.env.YOUR_KNOWLEDGE_ID) {
      throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
    }

    // Store conversation history
    const conversationHistory: Array<{ role: 'user' | 'assistant'; content: string }> = [];

    // First dynamic input
    let userInput = await rl.question('Enter your first question (e.g., "How do I upgrade my account?"): ');
    if (!userInput.trim()) {
      throw new Error('First input cannot be empty');
    }

    const initialParams = {
      ...baseParams,
      messages: [{ role: 'user' as const, content: userInput }],
    };
    const initialResponse = await sendRequest(initialParams, 'First Request');
    conversationHistory.push(
      { role: 'user', content: userInput },
      { role: 'assistant', content: initialResponse }
    );
    console.log('Cache status: First run is always a cache miss; run again to check for hit.');

    // Second dynamic input
    userInput = await rl.question('Enter your follow-up question (e.g., "I didn’t receive the confirmation email"): ');
    if (!userInput.trim()) {
      throw new Error('Follow-up input cannot be empty');
    }

    const followUpParams = {
      ...baseParams,
      messages: [...conversationHistory, { role: 'user' as const, content: userInput }],
      orq: {
        ...baseParams.orq,
        thread: {
          id: threadId, // Same thread ID
          tags: ['support', 'billing', 'user-interaction', 'follow-up'],
        },
      },
    };
    const followUpResponse = await sendRequest(followUpParams, 'Follow-up Request');
    conversationHistory.push(
      { role: 'user', content: userInput },
      { role: 'assistant', content: followUpResponse }
    );
    console.log('Cache status: Check if cached (if messages match previous run).');

  } catch (error: unknown) {
    console.error('Error:', error instanceof Error ? error.message : String(error));
  } finally {
    rl.close();
  }
}

main();

Advanced framework integrations

Orq.ai’s AI Gateway seamlessly integrates with popular AI development frameworks, allowing you to leverage existing tools and workflows while benefiting from gateway features like fallbacks, caching, and observability.

LangChain Integration

Orq.ai works natively with LangChain by simply pointing to the gateway endpoint. This gives you access to fallback models, caching, and knowledge base retrieval while using LangChain’s abstractions. For a more detailed guide, see LangChain integration.
import { ChatOpenAI } from "@langchain/openai";

// Configure LangChain to use Orq.ai gateway
const llm = new ChatOpenAI({
  configuration: {
    baseURL: "https://api.orq.ai/v2/proxy",
  },
  openAIApiKey: "YOUR_API_KEY",
  modelName: "openai/gpt-4o",
});

const response = await llm.invoke("How do I reset my password?");
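As in the earlier examples, the API key passed here should be your Orq API key, since the request is authenticated by the gateway rather than sent to OpenAI directly.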

DSPy

DSPy programs can route through Orq.ai to gain automatic prompt optimization alongside gateway reliability features. Note that DSPy is a Python framework, so the snippet below is Python. For a more detailed guide, see DSPy Integration.
import dspy

# Configure DSPy to use the Orq.ai gateway
# (uses the classic dspy.OpenAI client; newer DSPy releases expose dspy.LM instead)
lm = dspy.OpenAI(
    api_base="https://api.orq.ai/v2/proxy",
    api_key="YOUR_API_KEY",
    model="openai/gpt-4o",
)

dspy.settings.configure(lm=lm)

Base URL configuration

# Orq.ai Cloud (default)
https://api.orq.ai/v2/proxy

# Your on-premises deployment
https://your-domain.com/v2/proxy
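If you run against both the cloud and an on-premises deployment, you can make the base URL configurable. A minimal sketch, assuming an ORQ_BASE_URL variable of your own naming:
import 'dotenv/config';
import OpenAI from 'openai';

// Fall back to the cloud endpoint when no on-prem URL is configured
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY ?? '',
  baseURL: process.env.ORQ_BASE_URL ?? 'https://api.orq.ai/v2/proxy',
});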

Conclusion

Orq.ai’s AI Gateway provides a unified, scalable, and production-ready solution for building reliable AI applications. By routing through a single API endpoint, you gain:
  1. Unified access: Connect to multiple AI providers (OpenAI, Anthropic, AWS) through one API
  2. High availability: Automatic fallbacks and retries ensure your application stays online
  3. Cost efficiency: Response caching reduces API costs and latency
  4. Smart context: Built-in knowledge base integration for domain-specific answers
  5. Production observability: Comprehensive traces and OTEL compatibility for monitoring
  6. Flexible deployment: Cloud, on-premises, or edge options to meet your needs