
# Build a customer support chatbot

> Build a production-ready customer support chatbot with streaming, fallbacks, caching, and RAG. Complete Node.js tutorial with knowledge base integration.

<Card title="TL;DR">
  * Learn how to use Orq AI Gateway
  * Connect primary & fallback AI providers to avoid vendor lock-in
  * Enable streaming for real-time responses and better UX
  * Add a knowledge base with your docs for contextual answers
  * Set up caching for recurring requests
  * Build a production-ready customer support agent in minutes
</Card>

## What are we going to build?

You will build a customer support application in Node.js using AI Gateway, where the support queries have access to the relevant business context from a knowledge base. The system will include a primary model (GPT-4o) and a fallback model (Claude Sonnet) that automatically activates during rate limits or outages.

You'll also learn to implement caching for user queries, identity tracing to monitor per-user LLM request volumes, and thread tracking to visualize complete conversation flows between users and the assistant.

```mermaid theme={"theme":{"light":"github-light","dark":"github-dark"}}
graph LR
    A[User Query] --> B[Orq AI Gateway]
    
    B --> C{Cache Enabled?}
    C -->|Yes - Cache Hit| D[Return Cached Response]
    C -->|Yes - Cache Miss| E[Search Knowledge Base]
    C -->|No| E
    
    E --> F[Enrich Query with Context]
    
    F --> G{Select Model}
    
    G -->|Primary Available| H[OpenAI GPT-4o]
    G -->|Rate Limit/Outage| I[Claude Sonnet Fallback]
    
    H -->|Error/Unavailable| I
    H -->|Success| J[Generate Response]
    I --> J
    
    J --> A
    D --> A
    
    B -.->|Track| K[Identity Tracing]
    B -.->|Track| L[Thread Tracking]
    
    style B fill:#e1f5ff
    style E fill:#f0e1ff
    style J fill:#e1ffe1
    style C fill:#fff4e1
```

## What is AI Gateway?

**AI Gateway** is a **single unified API endpoint** that lets you route and manage requests across multiple AI model providers (e.g., OpenAI, Anthropic, Google, AWS). This comes in handy when you want to:

* Avoid dependency on a single provider (vendor lock-in)
* Automatically switch between providers in case of an outage
* Scale reliably when the usage surges
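
As a rough illustration of the single-endpoint idea, the sketch below builds chat requests for two providers. Only the `model` string changes; the endpoint and payload shape stay the same. `buildChatRequest` and `GatewayRequest` are hypothetical names used for illustration, and the `/chat/completions` path is the one the OpenAI SDK appends to the base URL used later in this tutorial.

```typescript
// Hypothetical helper: every provider is reached through the same endpoint,
// so switching providers only means changing the `model` string.
const GATEWAY_BASE_URL = 'https://api.orq.ai/v3/router';

interface GatewayRequest {
  url: string;
  body: {
    model: string; // provider/model format, e.g. 'openai/gpt-4o'
    messages: Array<{ role: 'user' | 'system' | 'assistant'; content: string }>;
  };
}

function buildChatRequest(model: string, prompt: string): GatewayRequest {
  return {
    // The OpenAI SDK appends `/chat/completions` to its configured base URL.
    url: `${GATEWAY_BASE_URL}/chat/completions`,
    body: { model, messages: [{ role: 'user', content: prompt }] },
  };
}

const primary = buildChatRequest('openai/gpt-4o', 'Where is my order?');
const fallback = buildChatRequest('anthropic/claude-3-5-sonnet-20241022', 'Where is my order?');

console.log('Same endpoint:', primary.url === fallback.url);
console.log(primary.body.model, '->', fallback.body.model);
```

Because the request shape does not depend on the provider, application code can stay unchanged when the model is swapped.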

## Build the customer support chat

<Steps>
  <Step title="Set up the Node.js project">
    In your IDE of choice, set up the Node.js project. This tutorial uses the npm package manager, but feel free to use an alternative such as pnpm.

    ```
    npm init -y
    ```

    Install the [Orq.ai](http://Orq.ai) SDK

    ```
    npm add @orq-ai/node
    ```

    Install the OpenAI SDK

    ```
    npm install openai
    ```

    Install TypeScript dependencies

    ```
    npm install -D typescript @types/node tsx
    ```

    Install `dotenv` to load your API keys from a `.env` file

    ```
    npm install dotenv
    ```

    First, inside the Orq dashboard, create a project that the API keys can be assigned to by clicking the + button next to the Project menu:

    <img src="https://mintcdn.com/orqai/ZxxBGRboNk4uKU_4/images/1.png?fit=max&auto=format&n=ZxxBGRboNk4uKU_4&q=85&s=b0f53fdc73ea50800059cbff6483d220" alt="Add project" title="Add project" style={{ width:"28%" }} width="1076" height="1320" data-path="images/1.png" />

    Create a new project named `CustomerSupport`

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at14.01.50.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=9f187945fd70765ba93506036a568aae" alt="CustomerSupport" title="CustomerSupport" style={{ width:"62%" }} width="1800" height="1130" data-path="images/Screenshot2025-11-06at14.01.50.png" />

    To find your Orq API key, navigate to the [Orq.ai](http://Orq.ai) dashboard and follow these steps:

    1. Workspace settings
    2. API Keys
    3. Copy your key

           <img src="https://mintcdn.com/orqai/KmOi5q6C7zhrRGSN/images/Screenshot2025-11-06at13.37.11.png?fit=max&auto=format&n=KmOi5q6C7zhrRGSN&q=85&s=95db07f739952af349e4793dd7f49351" alt="How to find API Key in Orq" width="2638" height="1112" data-path="images/Screenshot2025-11-06at13.37.11.png" />

    From the dropdown, select the `CustomerSupport` project to assign the API key to:

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/apiassign.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=96c1353fb5a16a302f792602b8943cdf" alt="Add API Key" title="Add API Key" style={{ width:"45%" }} width="824" height="496" data-path="images/apiassign.png" />

    Create a `.env` file. This is where you will paste the Orq API key from the step above:

    ```
    echo "ORQ_API_KEY=your-orq-api-key-here" > .env
    ```

    Add `.env` to your `.gitignore`

    ```
    echo ".env" >> .gitignore
    ```

    Create the `customer-support.ts` file with a Hello World example:

    ```typescript customer-support.ts theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import OpenAI from 'openai';

    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY,
      baseURL: 'https://api.orq.ai/v3/router'
    });

    async function main() {
      const response = await client.chat.completions.create({
        model: 'openai/gpt-4o',
        messages: [
          {
            role: 'user',
            content: 'Hello, world!'
          }
        ]
      });

      console.log(response.choices[0].message.content);
    }

    main().catch(console.error);
    ```

    To execute the file from the terminal run:

    ```
    npx tsx customer-support.ts 
    ```

    <img src="https://mintcdn.com/orqai/KmOi5q6C7zhrRGSN/images/Screenshot2025-11-06at13.40.04.png?fit=max&auto=format&n=KmOi5q6C7zhrRGSN&q=85&s=12d89718b0e7a56aa24ccbcbbb99d372" alt="Hello world " width="1170" height="128" data-path="images/Screenshot2025-11-06at13.40.04.png" />
  </Step>

  <Step title="Streaming data in real time">
    In this step we will use the OpenAI `gpt-4o` model to generate the responses. To connect any other model, such as `claude-3-5-sonnet`, follow the same steps. To enable models in Orq AI Gateway:

    1. Navigate to Integrations
    2. Select OpenAI
    3. Click on View integration

    <img src="https://mintcdn.com/orqai/KmOi5q6C7zhrRGSN/images/Screenshot2025-11-06at13.43.19.png?fit=max&auto=format&n=KmOi5q6C7zhrRGSN&q=85&s=6515945dfa29359913c887e4f3e53f34" alt="Streaming data" title="Streaming data" style={{ width:"85%" }} width="1380" height="1450" data-path="images/Screenshot2025-11-06at13.43.19.png" />

    Click on Setup your own API key

    <img src="https://mintcdn.com/orqai/KmOi5q6C7zhrRGSN/images/Screenshot2025-11-06at13.43.56.png?fit=max&auto=format&n=KmOi5q6C7zhrRGSN&q=85&s=513cf8a7a48a304cd4ea6ff7dfc0252d" alt="Set up API" title="Set up API" style={{ width:"84%" }} width="1398" height="880" data-path="images/Screenshot2025-11-06at13.43.56.png" />

    Log in to [OpenAI's API platform](https://openai.com/) and copy your secret key:

    <img src="https://mintcdn.com/orqai/KmOi5q6C7zhrRGSN/images/Screenshot2025-11-06at13.46.17.png?fit=max&auto=format&n=KmOi5q6C7zhrRGSN&q=85&s=c992a7911dbfc6807b764682b7ca7924" alt="OpenAI" width="2290" height="384" data-path="images/Screenshot2025-11-06at13.46.17.png" />

    Navigate back to the [Orq.ai](http://Orq.ai) dashboard and paste your API key inside the pop-up window that appears after you click the Setup your own API key button:

    <img src="https://mintcdn.com/orqai/KmOi5q6C7zhrRGSN/images/Screenshot2025-11-06at13.47.03.png?fit=max&auto=format&n=KmOi5q6C7zhrRGSN&q=85&s=d834ba2ed2c598dac0478a5efb3c4dc2" alt="OpenAI setup" title="OpenAI setup" style={{ width:"85%" }} width="1442" height="868" data-path="images/Screenshot2025-11-06at13.47.03.png" />

    By default, when you make a POST request, the connection remains open until the entire response is ready, and then it closes.

    However, when you use streaming, the API switches to a Server-Sent Events (SSE) connection. This keeps the HTTP connection open and sends the response in small chunks as soon as the data becomes available, which is essential for real-time customer chat interactions.

    ```typescript customer-support.ts theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import { OpenAI } from 'openai';

    // Use OpenAI SDK with Orq AI Gateway proxy
    const client = new OpenAI({
      baseURL: "https://api.orq.ai/v3/router",
      apiKey: process.env.ORQ_API_KEY ?? '',
    });

    async function main() {
      try {
        console.log('--- Streaming started ---');

        let stream: any;
        try {
          // Use OpenAI SDK with Orq router for streaming
          stream = await client.chat.completions.create({
            model: 'openai/gpt-4o', // Use provider/model format
            messages: [{
              role: 'user',
              content: 'What are chunks in AI?'
            }],
            stream: true
          });

          console.log('Stream established successfully');
        } catch (e: any) {
          // Fallback for non-streaming
          console.log('Stream not available, falling back to non-streaming response');
          console.log('Error:', e?.message || e);

          const resp = await client.chat.completions.create({
            model: 'openai/gpt-4o',
            messages: [{
              role: 'user',
              content: 'What are chunks in AI?'
            }],
            stream: false
          });

          const content = resp.choices?.[0]?.message?.content ?? '';
          if (content) {
            process.stdout.write(String(content));
            console.log('\n--- Streaming finished ---');
            return;
          }
          console.log('\n(No content)');
          return;
        }

        // Iterate async chunks - router uses OpenAI-compatible format
        for await (const chunk of stream as any) {
          const content = chunk?.choices?.[0]?.delta?.content ?? '';

          if (content) {
            process.stdout.write(content);
          }

          if (process.env.VERBOSE_STREAM === 'true') {
            console.log('\n[chunk]', JSON.stringify(chunk, null, 2));
          }
        }

        console.log('\n--- Streaming finished ---');
      } catch (err: any) {
        console.error('Error:', err.message ?? err);
      }
    }

    main();
    ```

    Streaming is ideal for applications where you want to display text to users as it’s generated, such as in chat interfaces or live assistants, improving perceived responsiveness:

    <iframe src="https://drive.google.com/file/d/14_WpMA5a5IHKNCr93FVTDwneHHbHS18O/preview" width="650" height="235" allow="autoplay" frameborder="0" allowfullscreen />
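
    To make the chunk format concrete, here is a minimal sketch of what an OpenAI-compatible SSE body looks like on the wire and how the text deltas could be extracted from it. The OpenAI SDK already does this parsing for you; `extractDeltas` and the sample payload are hand-written for illustration and are not part of any SDK.

    ```typescript
    // Illustrative only: the OpenAI SDK parses SSE for you.
    // Each event is a `data: <json>` line; the stream ends with `data: [DONE]`.
    interface StreamChunk {
      choices: Array<{ delta: { content?: string } }>;
    }

    function extractDeltas(rawSse: string): string[] {
      const deltas: string[] = [];
      for (const line of rawSse.split('\n')) {
        const trimmed = line.trim();
        if (!trimmed.startsWith('data:')) continue; // skip blank separator lines
        const payload = trimmed.slice('data:'.length).trim();
        if (payload === '[DONE]') break; // end-of-stream sentinel
        const chunk = JSON.parse(payload) as StreamChunk;
        const content = chunk.choices[0]?.delta?.content;
        if (content) deltas.push(content);
      }
      return deltas;
    }

    // Hand-written sample body, not captured from a real response.
    const sampleSse = [
      'data: {"choices":[{"delta":{"content":"Hel"}}]}',
      '',
      'data: {"choices":[{"delta":{"content":"lo"}}]}',
      '',
      'data: [DONE]',
      '',
    ].join('\n');

    const streamedText = extractDeltas(sampleSse).join('');
    console.log(streamedText); // prints "Hello"
    ```

    Each `delta` carries only the newly generated text, which is why the streaming loop above writes every piece to stdout as it arrives instead of waiting for the full message.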
  </Step>

  <Step title="Retries & fallbacks">
    [Orq.ai](http://Orq.ai) allows automatic fallback to alternative models if the primary model fails. If `gpt-4o` hits a rate limit or downtime, the request is automatically retried and may fall back to `gpt-4o-mini` or Anthropic `claude-3-5-sonnet`. Make sure that you have these models enabled in Orq.

    ```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import OpenAI from 'openai';

    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY!,
      baseURL: 'https://api.orq.ai/v3/router',
    });

    async function main() {
      const stream = await client.chat.completions.create({
        model: 'openai/gpt-4o',
        stream: true,
        messages: [
          { role: 'user', content: 'Explain what Streaming in Orq.ai is?' },
        ],

        orq: {
          retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
          fallbacks: [
            { model: 'openai/gpt-4o-mini' },
            { model: 'anthropic/claude-3-5-sonnet-20241022' },
          ],
        },
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        process.stdout.write(content);
      }
      console.log('\n');
    }

    main().catch(console.error);
    ```
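
    The retry and fallback handling itself happens inside the gateway, so there is nothing to implement on the client. Purely as a mental model, the sketch below simulates one plausible ordering: retry the current model on the listed status codes, then move to the next fallback. The function `resolveModel`, the `Attempt` callback, and the assumption that `count` means "retries after the first attempt, per model" are all illustrative and not confirmed Orq behavior.

    ```typescript
    // Conceptual sketch only; the real retry/fallback logic runs inside the gateway.
    interface RetryPolicy {
      count: number;      // assumed: retries after the first attempt
      on_codes: number[]; // status codes that trigger a retry
    }

    type Attempt = (model: string) => number; // returns an HTTP status code

    function resolveModel(
      primary: string,
      fallbacks: string[],
      retry: RetryPolicy,
      attempt: Attempt,
    ): { model: string | null; attempts: number } {
      let attempts = 0;
      for (const model of [primary, ...fallbacks]) {
        // One initial try plus up to `retry.count` retries per model (assumption).
        for (let i = 0; i <= retry.count; i++) {
          attempts++;
          const status = attempt(model);
          if (status >= 200 && status < 300) return { model, attempts };
          if (!retry.on_codes.includes(status)) break; // not retryable: try next model
        }
      }
      return { model: null, attempts };
    }

    // Simulated outage: the primary is rate limited, the first fallback succeeds.
    const simulated: Attempt = (model) => (model === 'openai/gpt-4o' ? 429 : 200);

    const outcome = resolveModel(
      'openai/gpt-4o',
      ['openai/gpt-4o-mini', 'anthropic/claude-3-5-sonnet-20241022'],
      { count: 3, on_codes: [429, 500, 502, 503, 504] },
      simulated,
    );
    console.log(outcome);
    ```

    In this simulation the primary is tried four times (one attempt plus three retries) before the first fallback answers on the fifth attempt.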
  </Step>

  <Step title="Caching">
    [Orq.ai](http://Orq.ai) supports response caching to reduce latency and API usage for repeated requests. It uses `exact_match` caching, where the cache key is generated from the exact model, messages, and all parameters, ensuring identical requests hit the cache. The TTL (time-to-live) specifies how long the response is cached (e.g., 3600 seconds for 1 hour, max 86400 seconds). Below is a TypeScript implementation with caching, retries, and fallbacks:

    ```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import OpenAI from 'openai';

    interface OrqConfig {
      retry?: {
        count: number;
        on_codes: number[];
      };
      fallbacks?: Array<{ model: string }>;
      cache?: {
        type: 'exact_match';
        ttl: number;
      };
    }

    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY ?? '',
      baseURL: 'https://api.orq.ai/v3/router',
    });

    async function main(): Promise<void> {
      try {
        const params = {
          model: 'openai/gpt-4o',
          stream: true as const,
          messages: [
            {
              role: 'user' as const,
              content: 'Explain what Streaming in Orq.ai is?',
            },
          ],
          orq: {
            retry: {
              count: 3,
              on_codes: [429, 500, 502, 503, 504],
            },
            fallbacks: [
              { model: 'anthropic/claude-3-5-sonnet-20241022' },
              { model: 'openai/gpt-4o-mini' },
            ],
            cache: {
              type: 'exact_match' as const,
              ttl: 3600, // 1 hour
            },
          },
        };

        const stream = await client.chat.completions.create(
          params as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
        );

        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content ?? '';
          process.stdout.write(content);
        }
        console.log('\n');
      } catch (error: unknown) {
        console.error('Error:', error instanceof Error ? error.message : String(error));
      }
    }

    main();
    ```

    The first time you run the code, the request inside Traces will show `cache-miss`:

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at13.52.23.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=09715cff569f6e44886e071bb914f0a5" alt="Cache miss" width="2776" height="330" data-path="images/Screenshot2025-11-06at13.52.23.png" />

    The response is cached after the first run. You see `cache-miss` the first time because [Orq.ai](http://Orq.ai) has no prior response stored for that exact cache key. You can read more about caching [here](https://docs.orq.ai/docs/ai-gateway-cache#/).

    When you run the same request a second time within the TTL, Traces will show `cache-hit`, meaning that [Orq.ai](http://Orq.ai) successfully retrieved the cached response.

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at13.54.09.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=73998cfb662f5e952662a07bf222428b" alt="Cache hit" width="2398" height="282" data-path="images/Screenshot2025-11-06at13.54.09.png" />
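
    To see why only identical requests hit an `exact_match` cache, the sketch below derives a key by hashing the full request and checks an entry against its TTL. This is an assumption-laden illustration: `cacheKey`, `canonicalize`, and `isFresh` are hypothetical helpers, and Orq's actual key derivation may differ.

    ```typescript
    import { createHash } from 'node:crypto';

    // Serialize with sorted object keys so property order does not change the key.
    function canonicalize(value: unknown): string {
      if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
      if (value !== null && typeof value === 'object') {
        const entries = Object.entries(value as Record<string, unknown>)
          .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
          .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
        return `{${entries.join(',')}}`;
      }
      return JSON.stringify(value);
    }

    // Hypothetical key derivation; the gateway's real scheme may differ.
    function cacheKey(request: Record<string, unknown>): string {
      return createHash('sha256').update(canonicalize(request)).digest('hex');
    }

    // An entry is usable only while it is younger than the TTL (in seconds).
    function isFresh(storedAtMs: number, ttlSeconds: number, nowMs: number): boolean {
      return nowMs - storedAtMs < ttlSeconds * 1000;
    }

    const baseRequest = {
      model: 'openai/gpt-4o',
      messages: [{ role: 'user', content: 'Explain what Streaming in Orq.ai is?' }],
    };

    const sameKey = cacheKey(baseRequest) === cacheKey({ ...baseRequest });
    const changedKey = cacheKey(baseRequest) !== cacheKey({ ...baseRequest, temperature: 0.2 });
    console.log({ sameKey, changedKey, fresh: isFresh(0, 3600, 3_599_000) });
    ```

    Under this model, changing any parameter (here `temperature`) produces a different key, so the request is treated as a cache miss.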
  </Step>

  <Step title="Knowledge Base">
    <Info>
      **When to use**:

      * When you want to enhance a foundational model's responses with custom, domain-specific knowledge using Retrieval-Augmented Generation (RAG).
      * When you want to use [Orq.ai](http://Orq.ai)'s built-in RAG feature to create a Knowledge Base from your documents (e.g., FAQs, manuals, or PDFs).
      * When you want to add a Vector Database (e.g., Pinecone, Qdrant) for control over embeddings and retrieval. For more, see [Using Vector databases with Orq](https://docs.orq.ai/docs/using-thirdparty-vectordbs-with-orq#/).
    </Info>

    Knowledge Bases inside [Orq.ai](http://Orq.ai) support the following file types: pdf, txt, docx, csv, xls (10 MB max). Encrypted files are not supported.

    When you create a new Knowledge Base, you control the following settings:

    | Parameter         | Description |
    | ----------------- | ----------- |
    | `embedding_model` | Select the `embedding_model` from the [supported models](https://docs.orq.ai/docs/ai-gateway-supported-models#/). An embedding model converts your input data (text, images, etc.) into vector embeddings (e.g. `text-embedding-3-large`) |
    | `path`            | Project name (e.g. `CustomerSupport`) |
    | `key`             | A unique key for your Knowledge Base (e.g. `Customer`) |
    | `top_k`           | Defines the maximum number of relevant chunks to retrieve from the Knowledge Base (e.g., `top_k: 5` retrieves up to 5 chunks) |
    | `threshold`       | Sets the minimum relevance score (0.0 to 1.0) for retrieved chunks (e.g., `threshold: 0.7` filters out chunks with scores below 0.7) |
    | `search_type`     | Specifies the search method for retrieving chunks (e.g. `hybrid_search` combines keyword and semantic search) |
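
    To build intuition for how `top_k` and `threshold` interact, the sketch below applies both to a list of scored chunks: chunks below the threshold are dropped, then the highest-scoring `top_k` are kept. The `selectChunks` function and the scores are made up for illustration; the actual retrieval runs inside Orq and may rank results differently.

    ```typescript
    interface ScoredChunk {
      text: string;
      score: number; // relevance score between 0.0 and 1.0
    }

    // Drop chunks below the threshold, then keep the `topK` highest-scoring ones.
    function selectChunks(chunks: ScoredChunk[], topK: number, threshold: number): ScoredChunk[] {
      return chunks
        .filter((chunk) => chunk.score >= threshold)
        .sort((a, b) => b.score - a.score)
        .slice(0, topK);
    }

    // Made-up scores for illustration.
    const candidates: ScoredChunk[] = [
      { text: 'Refund policy', score: 0.91 },
      { text: 'Office address', score: 0.42 },
      { text: 'Upgrade steps', score: 0.78 },
      { text: 'Billing cycle', score: 0.7 },
    ];

    const selected = selectChunks(candidates, 2, 0.7);
    console.log(selected.map((chunk) => chunk.text)); // [ 'Refund policy', 'Upgrade steps' ]
    ```

    A higher `threshold` trades recall for precision, while `top_k` caps how much context is passed to the model.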

    Run the code to create a Knowledge Base:

    ```typescript customer-support.ts theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import { Orq } from '@orq-ai/node';

    const orq = new Orq({
      apiKey: process.env.ORQ_API_KEY!,
    });

    async function createCustomerSupportKnowledge() {
      try {
        const result = await orq.knowledge.create({
          embeddingModel: 'text-embedding-3-large',
          path: 'CustomerSupport',     // Name of your project 
          key: 'Customer',             // Needs to be a unique key
          topK: 5,                     // Maximum number of relevant chunks to retrieve
          threshold: 0.7,              // Minimum relevance score (0.0 to 1.0)
          searchType: 'hybrid_search'  // Search method: 'hybrid_search', 'semantic', or 'keyword'
        });

        console.log('Knowledge base created successfully:', result);
        return result;
      } catch (error: any) {
        if (error.statusCode === 400 && error.body?.includes('already exists')) {
          console.log('Knowledge base "Customer" already exists. Retrieving existing knowledge base...');

          try {
            // Try to get the existing knowledge base
            const existing = await orq.knowledge.get({ key: 'Customer' });
            console.log('Using existing knowledge base:', existing);
            return existing;
          } catch (getError) {
            console.log('Could not retrieve existing knowledge base.');
            return { key: 'Customer', status: 'exists' };
          }
        }

        console.error('Error creating knowledge base:', error);
        throw error;
      }
    }

    createCustomerSupportKnowledge();
    ```

    This is what a successful response looks like:

    ```

    {
      _id: '$YOUR_KNOWLEDGE_ID',
      created: '2025-10-29T10:44:10.011Z',
      created_by_id: null,
      key: 'Customer',
      model: 'openai/text-embedding-3-large',
      domain_id: 'domain-id',
      path: 'CustomerSupport',
      retrieval_settings: { retrieval_type: 'hybrid_search', top_k: 5, threshold: 0 },
      updated_by_id: null,
      updated: '2025-10-29T10:44:10.011Z'
    }
    ```

    Make sure to save the Knowledge ID (`_id`) as `YOUR_KNOWLEDGE_ID` in the `.env` file, replacing the `$YOUR_KNOWLEDGE_ID` placeholder below with the `_id` value from the response.

    ```
    echo 'YOUR_KNOWLEDGE_ID=$YOUR_KNOWLEDGE_ID' >> .env
    ```

    If you want to complete this step with a GUI see [Create a Knowledge](https://docs.orq.ai/reference/knowledge-bases/create-a-knowledge)
  </Step>

  <Step title="Add files to the Knowledge Base">
    Inside the project root, create a `documents` directory and put the documents that you want to upload there. [Orq.ai](http://Orq.ai) supports document types such as pdf, txt, docx, csv, and xls (10 MB max).

    Run the following code to upload the documents:

    ```typescript customer-support.ts theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import { Orq } from '@orq-ai/node';
    import fs from 'fs';
    import path from 'path';
    import { fileURLToPath } from 'url';

    const __filename = fileURLToPath(import.meta.url);
    const __dirname = path.dirname(__filename);

    const orq = new Orq({
      apiKey: process.env.ORQ_API_KEY!
    });

    const filePath = path.join(__dirname, 'documents', 'CustomerSupportDoc.pdf');

    orq.files.create({
      file: new File([fs.readFileSync(filePath)], 'CustomerSupportDoc.pdf', {
        type: 'application/pdf'
      }),
      purpose: 'retrieval'
    })
      .then((data) => console.log(data))
      .catch(err => console.error(err));
    ```

    This is what a successful response looks like:

    ```
    {
      _id: '$FILE_ID',
      object_name: 'files-api/workspaces/workspace-id/retrieval/$FILE_ID.pdf',
      purpose: 'retrieval',
      file_name: '$FILE_ID.pdf',
      workspace_id: 'workspace-id',
      bytes: 118199,
      created: '2025-10-29T11:22:56.732Z'
    }
    ```

    Add the file ID (`_id`) to the `.env` file, replacing the `$FILE_ID` placeholder below with the value from the response:

    ```
    echo 'FILE_ID=$FILE_ID' >> .env
    ```

    If you want to do this step with a GUI see: [Upload a file](https://docs.orq.ai/reference/files/upload-a-file)
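
    Since the next steps depend on values stored in `.env`, it can help to fail fast when one is missing or still holds a placeholder. The helper below is a hypothetical sketch (`requireEnv` is not part of the Orq or OpenAI SDKs); it takes the environment as an argument so that it can be tried without touching `process.env`.

    ```typescript
    // Hypothetical helper: fail fast when a required variable is missing or empty.
    function requireEnv(
      name: string,
      env: Record<string, string | undefined>,
    ): string {
      const value = env[name];
      if (!value || value.trim() === '') {
        throw new Error(`${name} is not set. Add it to your .env file.`);
      }
      // A leading `$` suggests the placeholder was copied without being replaced.
      if (value.startsWith('$')) {
        throw new Error(`${name} still contains a placeholder value (${value}).`);
      }
      return value;
    }

    // In-memory example; in the tutorial scripts you would pass `process.env`.
    const exampleEnv = {
      ORQ_API_KEY: 'orq-key',
      YOUR_KNOWLEDGE_ID: '$YOUR_KNOWLEDGE_ID',
    };

    const apiKey = requireEnv('ORQ_API_KEY', exampleEnv);
    let placeholderError = '';
    try {
      requireEnv('YOUR_KNOWLEDGE_ID', exampleEnv);
    } catch (error) {
      placeholderError = error instanceof Error ? error.message : String(error);
    }
    console.log(apiKey, '|', placeholderError);
    ```

    In a script, `requireEnv('FILE_ID', process.env)` could replace the non-null assertion `process.env.FILE_ID!` to produce a clearer error message.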
  </Step>

  <Step title="Connect the files with the Knowledge Base as datasource">
    Create a datasource that links the uploaded file to the Knowledge Base. The script below reads `YOUR_KNOWLEDGE_ID` and `FILE_ID` from the `.env` file:

    ```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import { Orq } from '@orq-ai/node';

    const orq = new Orq({ apiKey: process.env.ORQ_API_KEY! });

    // Create datasource and search functions
    const createDatasource = () => orq.knowledge.createDatasource({
      knowledgeId: process.env.YOUR_KNOWLEDGE_ID!,
      requestBody: { fileId: process.env.FILE_ID!, displayName: 'CustomerSupportDocs' }
    });

    const searchKnowledge = (question: string) => orq.knowledge.search({
      knowledgeId: process.env.YOUR_KNOWLEDGE_ID!,
      requestBody: { query: question, topK: 5 }
    });

    // Execute
    createDatasource()
      .then(result => console.log('Datasource created successfully:', result))
      .catch(console.error);

    export { createDatasource, searchKnowledge };
    ```

    This is what a successful response looks like:

    ```
    {
      _id: '$YOUR_KNOWLEDGE_ID',
      display_name: 'CustomerSupportDocs',
      file_id: '$FILE_ID',
      knowledge_id: '$YOUR_KNOWLEDGE_ID',
      status: 'queued',
      created: '2025-10-29T11:36:43.916Z',
      updated: '2025-10-29T11:36:43.916Z',
      created_by_id: null,
      update_by_id: null,
      chunks_count: 0
    }
    ```

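    Confirm that `YOUR_KNOWLEDGE_ID` is already in your `.env` file from the Knowledge Base step. Only if it is missing, add it, replacing the `$YOUR_KNOWLEDGE_ID` placeholder with your Knowledge ID:

    ```
    echo 'YOUR_KNOWLEDGE_ID=$YOUR_KNOWLEDGE_ID' >> .env
    ```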

    Now, you will be able to see the uploaded file under your Knowledge Base:

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at14.19.31.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=5ac3cfd2c7f488a6bbe466a88af109d0" alt="Your knowledge ID" width="2672" height="420" data-path="images/Screenshot2025-11-06at14.19.31.png" />

    To do this step with a GUI, see [Creating a new datasource](https://docs.orq.ai/reference/knowledge-bases/create-a-new-datasource)

    When you upload documents to a Knowledge Base, [Orq.ai](http://Orq.ai) breaks them down into smaller pieces of text called chunks. Think of it like dividing a book into manageable paragraphs or sections rather than trying to process the entire book at once.

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at14.20.32.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=c159476a90be914f30e8a446e2de4b48" alt="Chunks" width="2892" height="796" data-path="images/Screenshot2025-11-06at14.20.32.png" />
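
    As a simplified picture of chunking, the sketch below splits text into fixed-size pieces with a small overlap so that neighboring chunks share some context. This is not Orq's actual chunking strategy, which is not described here; `chunkText` and its parameters are hypothetical.

    ```typescript
    // Illustrative fixed-size chunker with overlap; not Orq's actual strategy.
    function chunkText(text: string, chunkSize: number, overlap: number): string[] {
      if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
        throw new Error('Expected chunkSize > 0 and 0 <= overlap < chunkSize');
      }
      const chunks: string[] = [];
      const step = chunkSize - overlap; // how far the window advances each time
      for (let start = 0; start < text.length; start += step) {
        chunks.push(text.slice(start, start + chunkSize));
        if (start + chunkSize >= text.length) break; // last chunk reached the end
      }
      return chunks;
    }

    const policy = 'Refunds are issued within 14 days of purchase.';
    const pieces = chunkText(policy, 20, 5);
    console.log(pieces);
    ```

    Each chunk is embedded and indexed separately, which is what lets a search return only the passages relevant to a question.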

    This is the customer support chat with connected Knowledge Base:

    ```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import OpenAI from 'openai';

    interface OrqConfig {
      retry?: { count: number; on_codes: number[] };
      fallbacks?: Array<{ model: string }>;
      cache?: { type: 'exact_match'; ttl: number };
      knowledge_bases?: Array<{
        knowledge_id: string;
        top_k: number;
        threshold: number;
        search_type: 'hybrid_search';
      }>;
    }

    // Initialize the OpenAI client
    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY ?? '',
      baseURL: 'https://api.orq.ai/v3/router',
    });

    async function main(): Promise<void> {
      try {
        const requestParams = {
          model: 'openai/gpt-4o',
          stream: true,
          messages: [
            { role: 'user' as const, content: 'What are the best practices for customer support?' },
          ],
          orq: {
            retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
            fallbacks: [
              { model: 'anthropic/claude-3-5-sonnet-20241022' },
              { model: 'openai/gpt-4o-mini' },
            ],
            cache: { type: 'exact_match' as const, ttl: 3600 },
            knowledge_bases: [
              {
                knowledge_id: process.env.YOUR_KNOWLEDGE_ID!,
                top_k: 5,
                threshold: 0.7,
                search_type: 'hybrid_search' as const,
              },
            ],
          },
        };

        console.log('Request:', JSON.stringify(requestParams, null, 2));

        const start = Date.now();
        const stream = await client.chat.completions.create(
          requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
        );

        let chunkCount = 0;
        for await (const chunk of stream) {
          chunkCount++;
          const content = chunk.choices[0]?.delta?.content ?? '';
          process.stdout.write(content);
        }

        console.log(`\n\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
        console.log('Cache status: First run is always a cache miss; run again to check for hit.');
      } catch (error: unknown) {
        console.error('Error:', error instanceof Error ? error.message : String(error));
      }
    }

    main();
    ```

    Once you run the code, you will be able to see the Knowledge Base retrieval on the [Orq.ai](http://Orq.ai) dashboard:

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at14.21.53.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=f05bedd44f3ac18bd6a6f038f1f0a72e" alt="Traces" width="2820" height="806" data-path="images/Screenshot2025-11-06at14.21.53.png" />
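
    If you send more than one kind of question, the request object above can be produced by a small factory so that the retry, fallback, cache, and Knowledge Base settings live in one place. The sketch below is one possible arrangement; `buildSupportRequest` and `SupportRequestOptions` are hypothetical names, and the field values simply mirror the example above.

    ```typescript
    // Hypothetical helper that assembles the request used in this step.
    interface SupportRequestOptions {
      question: string;
      knowledgeId: string;
      cacheTtlSeconds?: number;
    }

    function buildSupportRequest({ question, knowledgeId, cacheTtlSeconds = 3600 }: SupportRequestOptions) {
      return {
        model: 'openai/gpt-4o',
        stream: true as const,
        messages: [{ role: 'user' as const, content: question }],
        orq: {
          retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
          fallbacks: [
            { model: 'anthropic/claude-3-5-sonnet-20241022' },
            { model: 'openai/gpt-4o-mini' },
          ],
          cache: { type: 'exact_match' as const, ttl: cacheTtlSeconds },
          knowledge_bases: [
            { knowledge_id: knowledgeId, top_k: 5, threshold: 0.7, search_type: 'hybrid_search' as const },
          ],
        },
      };
    }

    // 'kb-example-id' is a placeholder, not a real Knowledge Base ID.
    const supportRequest = buildSupportRequest({
      question: 'What are the best practices for customer support?',
      knowledgeId: 'kb-example-id',
    });
    console.log(JSON.stringify(supportRequest.orq.knowledge_bases, null, 2));
    ```

    The returned object can be passed to `client.chat.completions.create` with the same type cast used in the example above.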
  </Step>

  <Step title="Identity Tracking">
    <Info>
      **When to use:**

      * You want to identify and remember the user between chats or sessions.
      * You need to audit who asked what (e.g., Alice Smith asked about "refunds").
      * You're building user profiles, dashboards, or integrating with a CRM (e.g., Salesforce, HubSpot).
      * Your application serves external B2B clients and you want to monitor how many calls each client makes to your application and at what cost.
    </Info>

    For more details see [Identity Tracking](/docs/proxy/identity-tracking)

    The following version of the app adds an `identity` object to the request. Make sure that `ORQ_API_KEY` and `YOUR_KNOWLEDGE_ID` are set in your `.env` file before running it:

    ```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import OpenAI from 'openai';

    // Define the custom `orq` interface for TypeScript
    interface OrqConfig {
      retry?: { count: number; on_codes: number[] };
      fallbacks?: Array<{ model: string }>;
      cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
      knowledge_bases?: Array<{
        knowledge_id: string;
        top_k: number;
        threshold: number;
        search_type: 'hybrid_search';
      }>;
      identity?: {
        id: string;
        display_name?: string;
        email?: string;
        metadata?: Array<{ key: string; value: any }>; // Array of key-value pairs
        tags?: string[];
      };
    }

    // Initialize the OpenAI client
    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY ?? '',
      baseURL: 'https://api.orq.ai/v3/router',
    });

    async function main(): Promise<void> {
      try {
        if (!process.env.YOUR_KNOWLEDGE_ID) {
          throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
        }
        const requestParams = {
          model: 'openai/gpt-4o',
          stream: true,
          messages: [
            { role: 'user' as const, content: 'How do I upgrade my account?' },
          ],
          orq: {
            retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
            fallbacks: [
              { model: 'anthropic/claude-3-5-sonnet-20241022' },
              { model: 'google/gemini-1.5-pro' },
              { model: 'openai/gpt-4o-mini' },
            ],
            cache: { enabled: true, type: 'exact_match', ttl: 3600 },
            knowledge_bases: [
              {
                knowledge_id: process.env.YOUR_KNOWLEDGE_ID, // ID of the Knowledge Base created earlier
                top_k: 5,
                threshold: 0.7,
                search_type: 'hybrid_search',
              },
            ],
            identity: {
              id: 'support-TICKET-789', // Unique ticket ID
              display_name: 'John Smith',
              email: 'john@company.com',
              metadata: [
                { key: 'ticket_id', value: 'TICKET-789' },
                { key: 'customer_tier', value: 'premium' },
                { key: 'issue_category', value: 'billing' },
                { key: 'created_at', value: new Date().toISOString() },
              ],
              tags: ['support', 'billing-issue', 'premium-user'],
            },
          },
        };
        console.log('Request:', JSON.stringify(requestParams, null, 2));

        const start = Date.now();
        const stream = await client.chat.completions.create(
          requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
        );

        let responseText = '';
        let chunkCount = 0;
        for await (const chunk of stream) {
          chunkCount++;
          const content = chunk.choices[0]?.delta?.content ?? '';
          responseText += content;
          process.stdout.write(content);
        }

        console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
        console.log('Full Response:', responseText);
        console.log('Cache status: First run is always a cache miss; run again to check for hit.');
      } catch (error: unknown) {
        console.error('Error:', error instanceof Error ? error.message : String(error));
      }
    }

    main();
    ```

    Once the snippet runs successfully, Identity Analytics shows how many requests the selected identity has sent, and you can [control the budget](/docs/analytics/identity#budget-control) for that identity.

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at14.26.12.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=0bc953d51316c1f08b0abc515c094c27" alt="Control the budget" width="2366" height="156" data-path="images/Screenshot2025-11-06at14.26.12.png" />
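
    Identity Analytics counts requests per `identity.id`, so it helps to derive the identity the same way for every request. The sketch below is one possible way to do that. `SupportTicket` and `buildIdentity` are hypothetical names introduced for illustration; only the output shape follows the `orq.identity` object used in this tutorial.

    ```typescript
    // Shape of the `orq.identity` object as used in this tutorial.
    interface OrqIdentity {
      id: string;
      display_name?: string;
      email?: string;
      metadata?: Array<{ key: string; value: string }>;
      tags?: string[];
    }

    // Hypothetical ticket record; replace with the type from your own support system.
    interface SupportTicket {
      ticketId: string;
      customerName: string;
      customerEmail: string;
      tier: 'free' | 'premium';
      category: string;
    }

    // Builds the identity from a ticket so every request for that ticket shares one id.
    function buildIdentity(ticket: SupportTicket): OrqIdentity {
      return {
        id: `support-${ticket.ticketId}`,
        display_name: ticket.customerName,
        email: ticket.customerEmail,
        metadata: [
          { key: 'ticket_id', value: ticket.ticketId },
          { key: 'customer_tier', value: ticket.tier },
          { key: 'issue_category', value: ticket.category },
        ],
        tags: ['support', `${ticket.category}-issue`, `${ticket.tier}-user`],
      };
    }

    const exampleIdentity = buildIdentity({
      ticketId: 'TICKET-789',
      customerName: 'John Smith',
      customerEmail: 'john@company.com',
      tier: 'premium',
      category: 'billing',
    });
    console.log(exampleIdentity.id); // support-TICKET-789
    ```

    The returned object can be passed as `orq.identity` in the request parameters shown above.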
  </Step>

  <Step title="Thread tracking">
    <Info>
      **When to use:**

      * Understand the back-and-forth between the user and the assistant
      * Track context drift in long conversations
      * Make sense of multi-step conversations at a glance
    </Info>

    To enable thread tracking, try this version of the customer support app. To learn more, see [Threads](/docs/observability/threads).

    ```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import OpenAI from 'openai';

    // Define the custom `orq` interface for TypeScript
    interface OrqConfig {
      retry?: { count: number; on_codes: number[] };
      fallbacks?: Array<{ model: string }>;
      cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
      knowledge_bases?: Array<{
        knowledge_id: string;
        top_k: number;
        threshold: number;
        search_type: 'hybrid_search';
      }>;
      identity?: {
        id: string;
        display_name?: string;
        email?: string;
        metadata?: Array<{ key: string; value: any }>;
        tags?: string[];
      };
      thread?: {
        id: string;
        tags?: string[];
      };
    }

    // Initialize the OpenAI client
    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY ?? '',
      baseURL: 'https://api.orq.ai/v3/router',
    });

    async function main(): Promise<void> {
      try {
        if (!process.env.YOUR_KNOWLEDGE_ID) {
          throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
        }
        const ticketId = 'TICKET-789';
        const threadId = `support-${ticketId}-${Date.now()}`; // Unique thread ID
        const requestParams = {
          model: 'openai/gpt-4o',
          stream: true,
          messages: [
            { role: 'user' as const, content: 'How do I upgrade my account?' },
          ],
          orq: {
            retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
            fallbacks: [
              { model: 'openai/gpt-4o-mini' },
              { model: 'anthropic/claude-3-5-sonnet-20241022' },
              { model: 'google/gemini-1.5-pro' },
            ],
            cache: { enabled: true, type: 'exact_match', ttl: 3600 },
            knowledge_bases: [
              {
                knowledge_id: process.env.YOUR_KNOWLEDGE_ID, // e.g., ID for "ORQsupport"
                top_k: 5,
                threshold: 0.7,
                search_type: 'hybrid_search',
              },
            ],
            identity: {
              id: `support-${ticketId}`,
              display_name: 'John Smith',
              email: 'john@company.com',
              metadata: [
                { key: 'ticket_id', value: ticketId },
                { key: 'customer_tier', value: 'premium' },
                { key: 'issue_category', value: 'billing' },
                { key: 'created_at', value: new Date().toISOString() },
              ],
              tags: ['support', 'billing-issue', 'premium-user'],
            },
            thread: {
              id: threadId,
              tags: ['support', 'billing', 'user-interaction'],
            },
          },
        };
        console.log('Request:', JSON.stringify(requestParams, null, 2));

        const start = Date.now();
        const stream = await client.chat.completions.create(
          requestParams as OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming
        );

        let responseText = '';
        let chunkCount = 0;
        for await (const chunk of stream) {
          chunkCount++;
          const content = chunk.choices[0]?.delta?.content ?? '';
          responseText += content;
          process.stdout.write(content);
        }

        console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
        console.log('Full Response:', responseText);
        console.log('Cache status: First run is always a cache miss; run again to check for hit.');
        console.log(`Thread ID: ${threadId}, Identity ID: support-${ticketId}`);
      } catch (error: unknown) {
        console.error('Error:', error instanceof Error ? error.message : String(error));
      }
    }

    main();
    ```

    Once the snippet runs successfully, open Traces --> Threads to see a detailed breakdown of your API call.

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at14.28.41.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=0e34c60c48cf2a7f5568b393eb4ba0bf" alt="Traces 2" width="2640" height="524" data-path="images/Screenshot2025-11-06at14.28.41.png" />

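    To group related requests, send the same `thread.id` (for example `support-TICKET-789-<timestamp>`) with both the initial and the follow-up requests. They then appear in the same thread. The snippet above derives the ID from `Date.now()`, so generate it once per conversation and reuse that value rather than creating a new one for each request: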

    <img src="https://mintcdn.com/orqai/MbTprtvL0twtWLxU/images/Screenshot2025-11-06at14.29.30.png?fit=max&auto=format&n=MbTprtvL0twtWLxU&q=85&s=501c7b9f54e21392a4c47849c433167e" alt="Streamed answer" width="2414" height="1138" data-path="images/Screenshot2025-11-06at14.29.30.png" />
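
    Grouping relies on every request in a conversation carrying an identical `thread.id`. A minimal sketch of one way to guarantee that is shown below. `createThreadSession` and `threadFor` are hypothetical helpers, not part of any Orq.ai SDK; they create the ID once and hand out the `orq.thread` object for each request.

    ```typescript
    // Shape of the `orq.thread` object as used in this tutorial.
    interface ThreadConfig {
      id: string;
      tags?: string[];
    }

    // Hypothetical helper: fixes the thread id when the conversation starts.
    function createThreadSession(ticketId: string, startedAt: number = Date.now()) {
      const id = `support-${ticketId}-${startedAt}`;
      return {
        id,
        // Returns the thread config for one request; only the tags vary.
        threadFor(extraTags: string[] = []): ThreadConfig {
          return { id, tags: ['support', ...extraTags] };
        },
      };
    }

    // A fixed timestamp keeps this example deterministic.
    const session = createThreadSession('TICKET-789', 1700000000000);
    const firstThread = session.threadFor();
    const followUpThread = session.threadFor(['follow-up']);
    console.log(firstThread.id === followUpThread.id); // true
    ```

    Spread the result into each request, for example `orq: { ...baseOrq, thread: session.threadFor(['follow-up']) }`, where `baseOrq` stands for the rest of your `orq` settings.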
  </Step>

  <Step title="Dynamic Inputs">
    <Info>
      **When to use:**

      * Whenever you want your script, program, or tool to handle variable data at runtime instead of hardcoding values. See also [Using Third Party Vector Databases with Orq.ai](https://docs.orq.ai/docs/tutorials/using-thirdparty-vectordbs-with-orq).
    </Info>

    ```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import 'dotenv/config';
    import OpenAI from 'openai';
    import * as readline from 'readline/promises';
    import { stdin as input, stdout as output } from 'process';

    // Define the custom `orq` interface for TypeScript
    interface OrqConfig {
      retry?: { count: number; on_codes: number[] };
      fallbacks?: Array<{ model: string }>;
      cache?: { enabled: boolean; type: 'exact_match'; ttl: number };
      knowledge_bases?: Array<{
        knowledge_id: string;
        top_k: number;
        threshold: number;
        search_type: 'hybrid_search';
      }>;
      identity?: {
        id: string;
        display_name?: string;
        email?: string;
        metadata?: Array<{ key: string; value: any }>;
        tags?: string[];
      };
      thread?: {
        id: string;
        tags?: string[];
      };
    }

    // Initialize the OpenAI client
    const client = new OpenAI({
      apiKey: process.env.ORQ_API_KEY ?? '',
      baseURL: 'https://api.orq.ai/v3/router',
    });

    // Initialize readline for dynamic input
    const rl = readline.createInterface({ input, output });

    // Base configuration
    const ticketId = 'TICKET-789';
    const threadId = `support-${ticketId}-${Date.now()}`; // Unique thread ID
    const identityId = `support-${ticketId}`;
    const baseParams = {
      model: 'openai/gpt-4o',
      stream: true as const, // Literal `true` so the params satisfy ChatCompletionCreateParamsStreaming
      orq: {
        retry: { count: 3, on_codes: [429, 500, 502, 503, 504] },
        fallbacks: [
          { model: 'openai/gpt-4o-mini' },
          { model: 'anthropic/claude-3-5-sonnet-20241022' },
          { model: 'google/gemini-1.5-pro' },
        ],
        cache: { enabled: true, type: 'exact_match', ttl: 3600 },
        knowledge_bases: [
          {
            knowledge_id: process.env.YOUR_KNOWLEDGE_ID ?? '',
            top_k: 5,
            threshold: 0.7,
            search_type: 'hybrid_search',
          },
        ],
        identity: {
          id: identityId,
          display_name: 'John Smith',
          email: 'john@company.com',
          metadata: [
            { key: 'ticket_id', value: ticketId },
            { key: 'customer_tier', value: 'premium' },
            { key: 'issue_category', value: 'billing' },
            { key: 'created_at', value: new Date().toISOString() },
          ],
          tags: ['support', 'billing-issue', 'premium-user'],
        },
        thread: {
          id: threadId,
          tags: ['support', 'billing', 'user-interaction'],
        },
      },
    };

    async function sendRequest(
      params: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming,
      requestLabel: string
    ): Promise<string> {
      console.log(`\n--- ${requestLabel} ---`);
      console.log('Request:', JSON.stringify(params, null, 2));
      const start = Date.now();
      const stream = await client.chat.completions.create(params);
      let responseText = '';
      let chunkCount = 0;

      for await (const chunk of stream) {
        chunkCount++;
        const content = chunk.choices[0]?.delta?.content ?? '';
        responseText += content;
        process.stdout.write(content);
      }

      console.log(`\nTime taken: ${Date.now() - start}ms, Chunks: ${chunkCount}`);
      console.log('Full Response:', responseText);
      console.log(`Thread ID: ${threadId}, Identity ID: ${identityId}`);
      return responseText;
    }

    async function main(): Promise<void> {
      try {
        if (!process.env.YOUR_KNOWLEDGE_ID) {
          throw new Error('YOUR_KNOWLEDGE_ID not set in .env');
        }

        // Store conversation history
        const conversationHistory: Array<{ role: 'user' | 'assistant'; content: string }> = [];

        // First dynamic input
        let userInput = await rl.question('Enter your first question (e.g., "How do I upgrade my account?"): ');
        if (!userInput.trim()) {
          throw new Error('First input cannot be empty');
        }

        const initialParams = {
          ...baseParams,
          messages: [{ role: 'user' as const, content: userInput }],
        };
        const initialResponse = await sendRequest(initialParams, 'First Request');
        conversationHistory.push(
          { role: 'user', content: userInput },
          { role: 'assistant', content: initialResponse }
        );
        console.log('Cache status: First run is always a cache miss; run again to check for hit.');

        // Second dynamic input
        userInput = await rl.question('Enter your follow-up question (e.g., "I didn’t receive the confirmation email"): ');
        if (!userInput.trim()) {
          throw new Error('Follow-up input cannot be empty');
        }

        const followUpParams = {
          ...baseParams,
          messages: [...conversationHistory, { role: 'user' as const, content: userInput }],
          orq: {
            ...baseParams.orq,
            thread: {
              id: threadId, // Same thread ID
              tags: ['support', 'billing', 'user-interaction', 'follow-up'],
            },
          },
        };
        const followUpResponse = await sendRequest(followUpParams, 'Follow-up Request');
        conversationHistory.push(
          { role: 'user', content: userInput },
          { role: 'assistant', content: followUpResponse }
        );
        console.log('Cache status: Check if cached (if messages match previous run).');

      } catch (error: unknown) {
        console.error('Error:', error instanceof Error ? error.message : String(error));
      } finally {
        rl.close();
      }
    }

    main();
    ```

    <iframe src="https://drive.google.com/file/d/1hg8YAtO1MgVdfS04vPVH2KQci-1Vq5NV/preview" width="700" height="200" allow="autoplay" />
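
    The script above handles exactly two inputs. For longer conversations, the history handling can be factored into small pure functions. The sketch below is one way to do this. `nextMessages`, `recordExchange`, and the `maxMessages` cap are illustrative names and choices, not part of the OpenAI SDK or Orq.ai.

    ```typescript
    type Turn = { role: 'user' | 'assistant'; content: string };

    // Returns the messages for the next request: prior turns plus the new user input.
    function nextMessages(history: readonly Turn[], userInput: string): Turn[] {
      const content = userInput.trim();
      if (!content) {
        throw new Error('Input cannot be empty');
      }
      return [...history, { role: 'user', content }];
    }

    // Appends a completed exchange and keeps only the most recent `maxMessages` entries.
    function recordExchange(
      history: readonly Turn[],
      userInput: string,
      reply: string,
      maxMessages = 20
    ): Turn[] {
      const updated: Turn[] = [
        ...history,
        { role: 'user', content: userInput.trim() },
        { role: 'assistant', content: reply },
      ];
      return maxMessages > 0 ? updated.slice(-maxMessages) : [];
    }

    let history: Turn[] = [];
    const question = 'How do I upgrade my account?';
    const outgoing = nextMessages(history, question);
    history = recordExchange(history, question, 'Open Settings, then Billing.');
    console.log(outgoing.length, history.length); // 1 2
    ```

    In a loop, pass `nextMessages(history, userInput)` as `messages` to `sendRequest`, then store the streamed reply with `recordExchange`. Keeping `maxMessages` even preserves complete user and assistant pairs.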
  </Step>
</Steps>

## Advanced framework integrations

Orq.ai's AI Gateway integrates with popular AI development frameworks, so you can keep your existing tools and workflows while benefiting from gateway features like fallbacks, caching, and observability.

## LangChain Integration

Orq.ai works with LangChain by pointing the client at the gateway endpoint. This gives you access to fallback models, caching, and knowledge base retrieval while using LangChain's abstractions. For a more detailed guide, see [LangChain integration](https://docs.orq.ai/docs/langchain-1#/).

```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
import { ChatOpenAI } from "@langchain/openai";

// Configure LangChain to use Orq.ai gateway
const llm = new ChatOpenAI({
  configuration: {
    baseURL: "https://api.orq.ai/v3/router",
  },
  openAIApiKey: "YOUR_API_KEY",
  modelName: "openai/gpt-4o",
});

const response = await llm.invoke("How do I reset my password?");
```

## DSPy

DSPy programs can route through Orq.ai, combining DSPy's automatic prompt optimization with the gateway's reliability features. For a more detailed guide, see [DSPy Integration](https://docs.orq.ai/docs/dspy-gateway#/).

```typescript theme={"theme":{"light":"github-light","dark":"github-dark"}}
import * as dspy from "dspy-ai";

// Configure DSPy with Orq.ai gateway
const lm = new dspy.OpenAI({
  apiBase: "https://api.orq.ai/v3/router",
  apiKey: "YOUR_API_KEY",
  model: "openai/gpt-4o"
});

dspy.settings.configure({ lm: lm });
```

## Base URL configuration

```
# Orq.ai Cloud (default)
https://api.orq.ai/v3/router

# Your on-premises deployment
https://your-domain.com/v3/router
```
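
One way to switch between these endpoints without code changes is to read the base URL from the environment. The sketch below assumes a hypothetical `ORQ_BASE_URL` variable, which is not an Orq.ai convention, and falls back to the cloud endpoint when it is unset.

```typescript
const DEFAULT_BASE_URL = 'https://api.orq.ai/v3/router';

// Picks the gateway base URL from an environment-like object.
// `ORQ_BASE_URL` is a hypothetical variable name chosen for this example.
function resolveBaseUrl(env: Record<string, string | undefined>): string {
  const override = env.ORQ_BASE_URL?.trim();
  if (!override) {
    return DEFAULT_BASE_URL;
  }
  // Strip trailing slashes so path joining stays predictable.
  return override.replace(/\/+$/, '');
}

console.log(resolveBaseUrl({})); // https://api.orq.ai/v3/router
console.log(resolveBaseUrl({ ORQ_BASE_URL: 'https://your-domain.com/v3/router/' })); // https://your-domain.com/v3/router
```

With the OpenAI client used throughout this tutorial, the result can be passed as `baseURL: resolveBaseUrl(process.env)`.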

## Conclusion

Orq.ai's AI Gateway provides a unified, scalable, and production-ready solution for building reliable AI applications. By routing through a single API endpoint, you gain:

1. **Unified access**: Connect to multiple AI providers (OpenAI, Anthropic, AWS) through one API
2. **High availability**: Automatic fallbacks and retries ensure your application stays online
3. **Cost efficiency**: Response caching reduces API costs and latency
4. **Smart context**: Built-in knowledge base integration for domain-specific answers
5. **Production observability**: Comprehensive traces and OTEL compatibility for monitoring
6. **Flexible deployment**: Cloud, on-premises, or edge options to meet your needs
