Use Cases
  • Chat UIs that show responses as they arrive, before generation completes.
  • Long-form generation (reports, code) where waiting for the full output hurts UX.
  • Agent workflows that surface reasoning steps or tool calls in real time.
  • Reducing perceived latency on slow models or large outputs.

Quick Start

Set stream to true in the request body to receive the response incrementally as it is generated:
curl -N -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": "Write a story about space exploration",
    "stream": true
  }'

Configuration

Parameter | Type    | Required | Description
stream    | boolean | Yes      | Enable streaming responses
All models support streaming: no additional configuration needed.

Response Format

Streaming chunks:
{
  "type": "response.output_text.delta",
  "delta": "Hello"
}
Final chunk:
{
  "type": "response.output_text.done"
}
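
A minimal sketch of how a client might fold these events into a final string. It assumes only the two event types shown above; the StreamEvent type and the collectText helper are hypothetical names used for illustration and are not part of any SDK. Any other event type is ignored.

```typescript
// Hypothetical event shape, modeled on the two chunk types shown above.
interface StreamEvent {
  type: string;
  delta?: string;
}

// Concatenates delta text until the final chunk is seen.
function collectText(events: Iterable<StreamEvent>): string {
  let text = "";
  for (const event of events) {
    if (event.type === "response.output_text.delta") {
      text += event.delta ?? "";
    } else if (event.type === "response.output_text.done") {
      break; // treated as the final chunk, per the format above
    }
  }
  return text;
}

const sample: StreamEvent[] = [
  { type: "response.output_text.delta", delta: "Hello" },
  { type: "response.output_text.delta", delta: ", world" },
  { type: "response.output_text.done" },
];
console.log(collectText(sample)); // prints "Hello, world"
```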

Code examples

curl -N -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": "Write a detailed explanation of quantum computing",
    "stream": true
  }'

Stream Processing Patterns

The examples in this section use the Chat Completions endpoint. The same patterns apply to the Responses API: replace chat.completions.create(...) with responses.create(...), update the endpoint to /v3/router/responses, and handle response.output_text.delta events instead of choices[0].delta.content.
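
To make that substitution concrete, here is a small sketch of a helper that extracts text from either chunk shape. The textFromChunk name and the loose AnyChunk type are assumptions made for illustration; the real SDK types are stricter.

```typescript
// Loose, hypothetical type covering both shapes described above.
interface AnyChunk {
  type?: string; // Responses API event type
  delta?: string; // Responses API text delta
  choices?: { delta?: { content?: string | null } }[]; // Chat Completions chunk
}

function textFromChunk(chunk: AnyChunk): string {
  // Responses API: text arrives in response.output_text.delta events
  if (chunk.type === "response.output_text.delta") {
    return chunk.delta ?? "";
  }
  // Chat Completions: text arrives in choices[0].delta.content
  return chunk.choices?.[0]?.delta?.content ?? "";
}
```

With a helper like this, the processing loops in this section can stay the same regardless of which endpoint produced the stream.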

Basic processing

const processStream = async (stream) => {
  let fullResponse = "";

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    if (content) {
      fullResponse += content;
      process.stdout.write(content); // Real-time output, without a newline after every chunk
    }

    // Check for completion
    if (chunk.choices[0]?.finish_reason) {
      console.log(`\nStream finished: ${chunk.choices[0].finish_reason}`);
      break;
    }
  }

  return fullResponse;
};

With error handling

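const updateUI = (content: string) => { process.stdout.write(content); }; // replace with your UI update logic

// Rejects if `promise` does not settle within `ms` milliseconds.
const withTimeout = <T>(promise: Promise<T>, ms: number): Promise<T> => {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Stream timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
};

const robustStreamProcessing = async (stream, idleTimeoutMs = 30000) => {
  const iterator = stream[Symbol.asyncIterator]();
  try {
    let response = "";
    while (true) {
      // An error thrown inside a setTimeout callback would escape this try/catch,
      // so the timeout is applied to each read as a rejecting promise instead.
      const { done, value: chunk } = await withTimeout(iterator.next(), idleTimeoutMs);
      if (done) break;

      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        response += content;
        // Update UI with new content
        updateUI(content);
      }

      if (chunk.choices[0]?.finish_reason) {
        break;
      }
    }

    return response;
  } catch (error) {
    console.error("Streaming error:", error);
    throw error;
  }
};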

Function Calling with Streaming

Stream tool calls as they’re generated:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"],
      },
    },
  },
];

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools,
  stream: true,
});

for await (const chunk of stream) {
  if (!chunk.choices.length) continue;
  const delta = chunk.choices[0].delta;
  if (delta.tool_calls?.[0]?.function?.arguments) {
    process.stdout.write(delta.tool_calls[0].function.arguments);
  } else if (delta.content) {
    process.stdout.write(delta.content);
  }
}
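
The loop above prints argument fragments as they arrive, but the fragments only form valid JSON once concatenated. One possible way to assemble them is sketched below. It assumes each tool_calls delta carries an index plus optional id, function.name, and function.arguments pieces, as in the OpenAI Chat Completions streaming format. ToolCallAssembler and the local types are hypothetical names, not SDK exports.

```typescript
// Minimal local type for one streamed tool-call fragment (an assumption that
// mirrors the Chat Completions delta shape; not imported from an SDK).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text, complete only after the stream ends
}

class ToolCallAssembler {
  private calls = new Map<number, AssembledToolCall>();

  // Merge one fragment into the call identified by its index.
  add(delta: ToolCallDelta): void {
    const call = this.calls.get(delta.index) ?? { id: "", name: "", arguments: "" };
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name = delta.function.name; // assumed to arrive whole
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
    this.calls.set(delta.index, call);
  }

  // Call after the stream finishes; parses each argument string as JSON.
  finish(): { id: string; name: string; args: unknown }[] {
    return Array.from(this.calls.entries())
      .sort(([a], [b]) => a - b)
      .map(([, c]) => ({ id: c.id, name: c.name, args: JSON.parse(c.arguments || "{}") }));
  }
}

const assembler = new ToolCallAssembler();
assembler.add({ index: 0, id: "call_1", function: { name: "get_weather", arguments: '{"loc' } });
assembler.add({ index: 0, function: { arguments: 'ation":"Paris"}' } });
console.log(assembler.finish());
```

In the streaming loop, each entry of delta.tool_calls would be passed to add(), and finish() would be called once a finish_reason is received.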

UI Integration Examples

React hook for streaming

import OpenAI from "openai";
import { useState, useCallback } from "react";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const useStreamingChat = () => {
  const [response, setResponse] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  const streamChat = useCallback(async (message) => {
    setIsStreaming(true);
    setResponse("");

    try {
      const stream = await client.chat.completions.create({
        model: "openai/gpt-4o",
        messages: [{ role: "user", content: message }],
        stream: true,
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || "";
        if (content) {
          setResponse((prev) => prev + content);
        }

        if (chunk.choices[0]?.finish_reason) {
          break;
        }
      }
    } catch (error) {
      console.error("Streaming failed:", error);
    } finally {
      // Always reset the flag, even if the stream ends without a finish_reason
      setIsStreaming(false);
    }
  }, []);

  return { response, isStreaming, streamChat };
};

Server-Sent Events (browser)

const streamWithSSE = async (message: string): Promise<void> => {
  const response = await fetch("/api/chat-stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`Request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const output = document.getElementById("response")!;
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

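    for (const rawLine of lines) {
      const line = rawLine.trimEnd(); // tolerate CRLF line endings
      if (line === "data: [DONE]") return; // end of stream: stop reading, not just this batch
      if (!line.startsWith("data: ")) continue;
      const data = JSON.parse(line.slice(6));
      const content = data.choices[0]?.delta?.content;
      // append() inserts a text node, so model output is never interpreted as HTML
      if (content) output.append(content);
    }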
  }
};

Performance Optimization

Chunk buffering

class StreamBuffer {
  private buffer: string;
  private flushInterval: number;
  private lastFlush: number;

  constructor(flushInterval = 50) {
    this.buffer = "";
    this.flushInterval = flushInterval;
    this.lastFlush = Date.now();
  }

  add(content: string): void {
    this.buffer += content;

    // Flush periodically or when buffer is large
    if (
      Date.now() - this.lastFlush > this.flushInterval ||
      this.buffer.length > 100
    ) {
      this.flush();
    }
  }

  // Also call this once after the stream ends so trailing content is not left in the buffer
  flush(): void {
    if (this.buffer) {
      this.onFlush(this.buffer);
      this.buffer = "";
      this.lastFlush = Date.now();
    }
  }

  onFlush(content: string): void {
    // Override this method
    console.log(content);
  }
}

Memory management

const processLargeStream = async (stream, maxMemory = 1000000) => {
  let totalLength = 0;
  const chunks = [];

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";

    if (content) {
      totalLength += content.length;
      chunks.push(content);

      // Prevent memory overflow
      if (totalLength > maxMemory) {
        console.warn("Stream too large, truncating");
        break;
      }
    }

    if (chunk.choices[0]?.finish_reason) {
      break;
    }
  }

  return chunks.join("");
};

Best Practices

Stream management

  • Set reasonable timeouts (30-60 seconds).
  • Implement proper error boundaries.
  • Handle network interruptions gracefully.
  • Provide user cancellation options.
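
One way to offer user cancellation is to consume the stream through a helper that watches an AbortSignal. The sketch below works over a generic AsyncIterable<string>; consumeWithCancel and fakeStream are hypothetical names. It checks the signal only between pieces, so it does not interrupt a read that is already pending, and wiring the signal into the underlying HTTP request is left out.

```typescript
// Reads text pieces until the source ends or the signal is aborted.
// Returns whatever was received so far plus a flag saying how it ended.
async function consumeWithCancel(
  source: AsyncIterable<string>,
  signal: AbortSignal,
  onText: (text: string) => void,
): Promise<{ text: string; cancelled: boolean }> {
  let text = "";
  for await (const piece of source) {
    if (signal.aborted) return { text, cancelled: true };
    text += piece;
    onText(piece);
  }
  return { text, cancelled: false };
}

// Stand-in for a real model stream.
async function* fakeStream(): AsyncGenerator<string> {
  yield "one ";
  yield "two ";
  yield "three";
}

// Simulate a "Stop" button: abort after the second piece has been shown.
const controller = new AbortController();
let shown = 0;
consumeWithCancel(fakeStream(), controller.signal, () => {
  if (++shown === 2) controller.abort();
}).then((result) => console.log(result));
```

Returning the partial text lets the UI keep what was already displayed instead of discarding it when the user stops generation.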

UI/UX considerations

  • Show typing indicators during streaming.
  • Allow users to stop generation.
  • Buffer small chunks for smoother display.
  • Handle rapid updates efficiently.

Error recovery example

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const streamWithRetry = async (input: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const stream = await client.responses.create({
        model: "openai/gpt-4o",
        input,
        stream: true,
      });

      let fullResponse = "";
      for await (const event of stream) {
        if (event.type === "response.output_text.delta") {
          fullResponse += event.delta;
          process.stdout.write(event.delta);
        }
      }
      return fullResponse;
    } catch (error) {
      if (attempt === maxRetries) throw error;

      console.log(`Stream attempt ${attempt} failed, retrying...`);
      await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
    }
  }
};

Troubleshooting

Stream cuts off unexpectedly
  • Check network stability.
  • Verify timeout settings.
  • Monitor for rate limiting.
  • Check model-specific limits.

Slow streaming performance
  • Optimize chunk processing.
  • Reduce buffer flush frequency.
  • Check network latency.
  • Consider model selection.

Memory issues
  • Implement chunk size limits.
  • Use streaming parsers.
  • Clear processed chunks.
  • Monitor memory usage.

Limitations

Limitation           | Impact                  | Workaround
Network interruption | Stream breaks           | Implement reconnection logic
Processing overhead  | Slight performance cost | Optimize chunk handling
Model variations     | Different chunk sizes   | Handle variable chunk lengths
Rate limiting        | Stream throttling       | Implement backoff strategies
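
As one illustration of the backoff workaround in the table, the sketch below computes capped exponential delays with jitter. The function name, the base and cap values, and the injectable random parameter are assumptions chosen for the example; they are not gateway settings.

```typescript
// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped,
// then scaled by a random factor in [0.5, 1) to spread out simultaneous retries.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 8000,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.round(exponential * (0.5 + random() / 2));
}

// With the random source fixed so the factor is exactly 1, the schedule is visible:
const noJitter = () => 1;
const schedule = [1, 2, 3, 4, 5, 6].map((n) => backoffDelayMs(n, 500, 8000, noJitter));
console.log(schedule.join(", ")); // prints "500, 1000, 2000, 4000, 8000, 8000"
```

A retry loop could await a timer for backoffDelayMs(attempt) between attempts instead of using a fixed or linear delay.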

Advanced Features

The examples in this section use the Chat Completions endpoint. The same patterns apply to the Responses API: replace chat.completions.create(...) with responses.create(...). For cURL, use /v3/router/responses.

Stream with other Gateway features

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const advancedStream = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain machine learning" }],
  stream: true,
  name: "StreamingBot-v1",
  cache: { type: "exact_match", ttl: 3600 },
  timeout: { call_timeout: 30000 },
});

Parallel streaming

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const processStream = async (stream) => {
  let fullResponse = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    if (content) fullResponse += content;
  }
  return fullResponse;
};

const parallelStreaming = async (queries) => {
  const streams = queries.map((query) =>
    client.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [{ role: "user", content: query }],
      stream: true,
    }),
  );

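  // Process all streams concurrently. Each create() call returns a promise,
  // so it must be awaited before the stream can be iterated.
  return Promise.all(streams.map(async (stream) => processStream(await stream)));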
};