Use Cases
Chat UIs that show responses as they arrive, before generation completes.
Long-form generation (reports, code) where waiting for the full output hurts UX.
Agent workflows that surface reasoning steps or tool calls in real time.
Reducing perceived latency on slow models or large outputs.
Quick Start
Set stream to true in the request body to receive output as it is generated instead of waiting for the complete response.
cURL
curl -N -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": "Write a story about space exploration",
    "stream": true
  }'
Configuration
Parameter | Type | Required | Description
stream | boolean | Yes | Enable streaming responses
All models support streaming; no additional configuration is needed.
Streaming chunks:
JSON
{
  "type": "response.output_text.delta",
  "delta": "Hello"
}
Final chunk:
JSON
{
  "type": "response.output_text.done"
}
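The two event shapes above can be consumed with a small reducer. The following is a minimal sketch, not part of the API: `StreamEvent`, `collectText`, and `demoEvents` are hypothetical names, only the `type` and `delta` fields shown above are assumed, and any other event type is ignored. It stops at the first `response.output_text.done` event, following the "final chunk" description above.

```typescript
// Hypothetical event shape, limited to the fields shown above.
interface StreamEvent {
  type: string;
  delta?: string;
}

// Concatenates text deltas until the "done" event (or the end of the stream).
async function collectText(
  events: AsyncIterable<StreamEvent>,
  onDelta: (text: string) => void = () => {},
): Promise<string> {
  let text = "";
  for await (const event of events) {
    if (event.type === "response.output_text.delta" && typeof event.delta === "string") {
      text += event.delta;
      onDelta(event.delta); // e.g. append to the UI as text arrives
    } else if (event.type === "response.output_text.done") {
      break; // text output is complete
    }
    // Any other event type is skipped in this sketch.
  }
  return text;
}

// Stand-in for a real stream, so the sketch runs without a network call.
async function* demoEvents(): AsyncGenerator<StreamEvent> {
  yield { type: "response.output_text.delta", delta: "Hello" };
  yield { type: "response.output_text.delta", delta: ", world" };
  yield { type: "response.output_text.done" };
}
```

Because `collectText` accepts any async iterable, the same function can be handed an SDK stream object in place of `demoEvents()`.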
Code examples
cURL
curl -N -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": "Write a detailed explanation of quantum computing",
    "stream": true
  }'
Stream Processing Patterns
The examples in this section use the Chat Completions endpoint. The same patterns apply to the Responses API: replace chat.completions.create(...) with responses.create(...), update the endpoint to /v3/router/responses, and handle response.output_text.delta events instead of choices[0].delta.content.
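One way to keep the processing patterns endpoint-agnostic is to normalize both chunk shapes into plain text before any other logic runs. The sketch below assumes only the fields named in the note above; `ChatChunk`, `ResponsesEvent`, and `extractText` are hypothetical names introduced for illustration.

```typescript
// Hypothetical minimal shapes for the two streaming formats.
interface ChatChunk {
  choices: Array<{
    delta?: { content?: string | null };
    finish_reason?: string | null;
  }>;
}

interface ResponsesEvent {
  type: string;
  delta?: string;
}

// Returns the text carried by a chunk from either endpoint, or "" if none.
function extractText(chunk: ChatChunk | ResponsesEvent): string {
  if ("choices" in chunk) {
    // Chat Completions: text lives in choices[0].delta.content
    return chunk.choices[0]?.delta?.content ?? "";
  }
  // Responses API: text lives in the delta of output_text.delta events
  return chunk.type === "response.output_text.delta" ? (chunk.delta ?? "") : "";
}
```

With a helper like this, the loops in this section can call `extractText(chunk)` instead of reading `chunk.choices[0]?.delta?.content` directly.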
Basic processing
TypeScript (Chat Completions)
const processStream = async (stream) => {
  let fullResponse = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    if (content) {
      fullResponse += content;
      console.log(content); // Real-time output
    }
    // Check for completion
    if (chunk.choices[0]?.finish_reason) {
      console.log(`\nStream finished: ${chunk.choices[0].finish_reason}`);
      break;
    }
  }
  return fullResponse;
};
With error handling
TypeScript (Chat Completions)
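const updateUI = (content: string) => { process.stdout.write(content); }; // replace with your UI update logic

const robustStreamProcessing = async (stream: AsyncIterable<any>, idleTimeoutMs = 30000) => {
  const iterator = stream[Symbol.asyncIterator]();
  let response = "";
  try {
    while (true) {
      // Race each read against an idle timer. Throwing inside a bare
      // setTimeout callback would bypass this try/catch entirely.
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("Stream timeout")), idleTimeoutMs);
      });
      let result: IteratorResult<any>;
      try {
        result = await Promise.race([iterator.next(), timeout]);
      } finally {
        clearTimeout(timer); // the timer restarts for the next chunk
      }
      if (result.done) break;
      const chunk = result.value;
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        response += content;
        // Update UI with new content
        updateUI(content);
      }
      if (chunk.choices[0]?.finish_reason) break;
    }
    return response;
  } catch (error) {
    console.error("Streaming error:", error);
    throw error;
  }
};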
Function Calling with Streaming
Stream tool calls as they’re generated:
TypeScript (Chat Completions)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"],
      },
    },
  },
];

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools,
  stream: true,
});

for await (const chunk of stream) {
  if (!chunk.choices.length) continue;
  const delta = chunk.choices[0].delta;
  if (delta.tool_calls?.[0]?.function?.arguments) {
    process.stdout.write(delta.tool_calls[0].function.arguments);
  } else if (delta.content) {
    process.stdout.write(delta.content);
  }
}
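The loop above prints argument fragments as they arrive, but a tool can only be invoked once its arguments form complete JSON. The sketch below shows one way to assemble the fragments. It assumes the OpenAI-compatible Chat Completions delta format, in which each fragment carries an `index` and the call `id` and function name arrive in the first fragment; `ToolCallDelta`, `ToolCall`, `accumulateToolCalls`, and `demoToolCalls` are hypothetical names.

```typescript
// Minimal, hypothetical shape of one streamed tool-call fragment.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text, complete only once the stream ends
}

// Merges fragments into per-call records keyed by `index`.
function accumulateToolCalls(
  calls: Map<number, ToolCall>,
  deltas: ToolCallDelta[] = [],
): Map<number, ToolCall> {
  for (const delta of deltas) {
    const call = calls.get(delta.index) ?? { id: "", name: "", arguments: "" };
    if (delta.id) call.id = delta.id;
    // Assumption: the name arrives whole, in the first fragment.
    if (delta.function?.name) call.name = delta.function.name;
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
    calls.set(delta.index, call);
  }
  return calls;
}

// Example: three fragments of a single call, merged and then parsed.
function demoToolCalls(): unknown {
  const calls = new Map<number, ToolCall>();
  accumulateToolCalls(calls, [
    { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  ]);
  accumulateToolCalls(calls, [{ index: 0, function: { arguments: '{"loc' } }]);
  accumulateToolCalls(calls, [{ index: 0, function: { arguments: 'ation":"Paris"}' } }]);
  const call = calls.get(0);
  // Parse only after the stream has finished; partial JSON would throw.
  return call ? JSON.parse(call.arguments) : undefined;
}
```

Inside the streaming loop, this would be called as `accumulateToolCalls(calls, delta.tool_calls)` for each chunk, with parsing deferred until a `finish_reason` is seen.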
UI Integration Examples
React hook for streaming
TypeScript (Chat Completions)
import OpenAI from "openai";
import { useState, useCallback } from "react";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const useStreamingChat = () => {
  const [response, setResponse] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  const streamChat = useCallback(async (message: string) => {
    setIsStreaming(true);
    setResponse("");
    try {
      const stream = await client.chat.completions.create({
        model: "openai/gpt-4o",
        messages: [{ role: "user", content: message }],
        stream: true,
      });
      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || "";
        if (content) {
          setResponse((prev) => prev + content);
        }
        if (chunk.choices[0]?.finish_reason) break;
      }
    } catch (error) {
      console.error("Streaming failed:", error);
    } finally {
      // Reset on completion, failure, or a stream that ends without finish_reason
      setIsStreaming(false);
    }
  }, []);

  return { response, isStreaming, streamChat };
};
Server-Sent Events (Browser):
TypeScript (Chat Completions)
const streamWithSSE = async (message: string): Promise<void> => {
  const response = await fetch("/api/chat-stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed: ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const output = document.getElementById("response")!;
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the incomplete trailing line for the next read
    for (const line of lines) {
      if (line === "data: [DONE]") return; // end of stream: stop reading entirely
      if (!line.startsWith("data: ")) continue;
      const data = JSON.parse(line.slice(6));
      const content = data.choices[0]?.delta?.content;
      // Append as a text node so model output is never interpreted as HTML
      if (content) output.append(content);
    }
  }
};
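The line handling in the reader loop above is the part most likely to go wrong, because a network read can end in the middle of a `data:` line. Pulling it into a small stateful parser makes it testable without a browser. This is a sketch under that assumption only; `createSSEParser` is a hypothetical helper, and it handles `data:` lines alone, not the other Server-Sent Events fields.

```typescript
// Returns a function that accepts decoded text fragments and returns the
// payload of every `data:` line completed by that fragment.
function createSSEParser(): (fragment: string) => string[] {
  let buffer = "";
  return (fragment: string): string[] => {
    buffer += fragment;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the unfinished tail for the next call
    const payloads: string[] = [];
    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
      if (line.startsWith("data:")) {
        // Drop the field name and the single optional space after the colon.
        payloads.push(line.slice(5).replace(/^ /, ""));
      }
    }
    return payloads;
  };
}

// Example: one JSON payload split across two network reads.
const parseFragment = createSSEParser();
console.log(parseFragment('data: {"x":'));
console.log(parseFragment('1}\n\ndata: [DONE]\n'));
```

The first call returns nothing because the line is still incomplete; the second returns both payloads once their newlines have arrived.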
Chunk buffering
class StreamBuffer {
  private buffer: string;
  private flushInterval: number;
  private lastFlush: number;

  constructor(flushInterval = 50) {
    this.buffer = "";
    this.flushInterval = flushInterval;
    this.lastFlush = Date.now();
  }

  add(content: string): void {
    this.buffer += content;
    // Flush periodically or when buffer is large
    if (
      Date.now() - this.lastFlush > this.flushInterval ||
      this.buffer.length > 100
    ) {
      this.flush();
    }
  }

  // Also call this once when the stream ends: add() only checks the timer
  // when new content arrives, so the last fragment may still be buffered.
  flush(): void {
    if (this.buffer) {
      this.onFlush(this.buffer);
      this.buffer = "";
      this.lastFlush = Date.now();
    }
  }

  onFlush(content: string): void {
    // Override this method
    console.log(content);
  }
}
Memory management
TypeScript (Chat Completions)
const processLargeStream = async (stream, maxMemory = 1000000) => {
  let totalLength = 0;
  const chunks = [];
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    if (content) {
      totalLength += content.length;
      chunks.push(content);
      // Prevent memory overflow
      if (totalLength > maxMemory) {
        console.warn("Stream too large, truncating");
        break;
      }
    }
    if (chunk.choices[0]?.finish_reason) {
      break;
    }
  }
  return chunks.join("");
};
Best Practices
Stream management
Set reasonable timeouts (30-60 seconds).
Implement proper error boundaries.
Handle network interruptions gracefully.
Provide user cancellation options.
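For the cancellation item above, one generic approach is to wrap the stream so that iteration stops when an `AbortSignal` fires. The sketch below is an illustration rather than a prescribed mechanism: `withCancellation` is a hypothetical helper, and it checks the signal only between chunks, so it does not interrupt a read that is already waiting on the network. If the client library exposes its own abort mechanism, that is usually the better choice.

```typescript
// Wraps any async iterable so that iteration ends once `signal` is aborted.
async function* withCancellation<T>(
  source: AsyncIterable<T>,
  signal: AbortSignal,
): AsyncGenerator<T> {
  for await (const item of source) {
    // Returning from inside the loop also closes the underlying iterator.
    if (signal.aborted) return;
    yield item;
  }
}

// Example: stop consuming after the second item.
async function demoCancellation(): Promise<number[]> {
  async function* numbers(): AsyncGenerator<number> {
    for (let i = 1; i <= 5; i++) yield i;
  }
  const controller = new AbortController();
  const seen: number[] = [];
  for await (const n of withCancellation(numbers(), controller.signal)) {
    seen.push(n);
    if (n === 2) controller.abort(); // e.g. the user clicked "Stop"
  }
  return seen;
}
```

In a UI, `controller.abort()` would be wired to a stop button, and the consuming loop would exit on the next chunk.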
UI/UX considerations
Show typing indicators during streaming.
Allow users to stop generation.
Buffer small chunks for smoother display.
Handle rapid updates efficiently.
Error recovery example
TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const streamWithRetry = async (input: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const stream = await client.responses.create({
        model: "openai/gpt-4o",
        input,
        stream: true,
      });
      let fullResponse = "";
      for await (const event of stream) {
        if (event.type === "response.output_text.delta") {
          fullResponse += event.delta;
          process.stdout.write(event.delta);
        }
      }
      return fullResponse;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      console.log(`Stream attempt ${attempt} failed, retrying...`);
      await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
    }
  }
};
Troubleshooting
Stream cuts off unexpectedly
Check network stability.
Verify timeout settings.
Monitor for rate limiting.
Check model-specific limits.
Slow streaming performance
Optimize chunk processing.
Reduce buffer flush frequency.
Check network latency.
Consider model selection.
Memory issues
Implement chunk size limits.
Use streaming parsers.
Clear processed chunks.
Monitor memory usage.
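As an illustration of the "use streaming parsers" and "clear processed chunks" items, the sketch below turns a stream of text fragments into complete lines while holding only the unfinished tail in memory. `completeLines` is a hypothetical helper, and splitting on newlines is an assumption; another delimiter may suit the output format better.

```typescript
// Yields complete lines from a stream of text fragments. Only the text after
// the last newline is retained between fragments.
async function* completeLines(
  fragments: AsyncIterable<string>,
): AsyncGenerator<string> {
  let tail = "";
  for await (const fragment of fragments) {
    const parts = (tail + fragment).split("\n");
    tail = parts.pop() ?? ""; // unfinished line, carried to the next fragment
    for (const line of parts) yield line;
  }
  if (tail) yield tail; // emit whatever remains when the stream ends
}

// Example: fragments that split lines at arbitrary points.
async function demoLines(): Promise<string[]> {
  async function* source(): AsyncGenerator<string> {
    yield "alpha\nbe";
    yield "ta\ngam";
    yield "ma";
  }
  const lines: string[] = [];
  for await (const line of completeLines(source())) lines.push(line);
  return lines;
}
```

Each yielded line can be processed and discarded immediately, so memory use stays bounded by the longest single line rather than by the full response.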
Limitations
Limitation | Impact | Workaround
Network interruption | Stream breaks | Implement reconnection logic
Processing overhead | Slight performance cost | Optimize chunk handling
Model variations | Different chunk sizes | Handle variable chunk lengths
Rate limiting | Stream throttling | Implement backoff strategies
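A common backoff strategy for the rate-limiting case is exponential delay with jitter. The sketch below is one possible shape, not a documented recommendation: `backoffDelay` is a hypothetical helper, and the 500 ms base and 30 s cap are assumed values to tune for your own limits.

```typescript
// Delay in milliseconds before retry number `attempt` (1-based). The ceiling
// doubles with each attempt up to `maxMs`; a random fraction of it is used
// ("full jitter") so concurrent clients do not retry in lockstep.
function backoffDelay(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

// Example: ceilings for the first five attempts (500, 1000, 2000, 4000, 8000 ms).
for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(attempt, backoffDelay(attempt));
}
```

In the `streamWithRetry` example, the fixed `1000 * attempt` wait could be replaced with `backoffDelay(attempt)`. The `random` parameter exists so the function can be tested deterministically.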
Advanced Features
The examples in this section use the Chat Completions endpoint. The same patterns apply to the Responses API: replace chat.completions.create(...) with responses.create(...). For cURL, use /v3/router/responses.
Stream with other Gateway features
TypeScript (Chat Completions)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const advancedStream = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Explain machine learning" }],
  stream: true,
  name: "StreamingBot-v1",
  cache: { type: "exact_match", ttl: 3600 },
  timeout: { call_timeout: 30000 },
});
Parallel streaming
TypeScript (Chat Completions)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const processStream = async (stream) => {
  let fullResponse = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    if (content) fullResponse += content;
  }
  return fullResponse;
};

const parallelStreaming = async (queries: string[]) => {
  // Process all streams concurrently. create() returns a promise, so each
  // stream is awaited before it is iterated.
  return Promise.all(
    queries.map(async (query) => {
      const stream = await client.chat.completions.create({
        model: "openai/gpt-4o",
        messages: [{ role: "user", content: query }],
        stream: true,
      });
      return processStream(stream);
    }),
  );
};
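`Promise.all` rejects as soon as any one stream fails, which discards the output of the streams that succeeded. Where partial results are acceptable, `Promise.allSettled` reports each outcome separately. The sketch below assumes a caller-supplied `run` function that streams one query to completion and returns its text; `StreamOutcome` and `streamAllSettled` are hypothetical names.

```typescript
interface StreamOutcome {
  query: string;
  text?: string; // set when the stream completed
  error?: string; // set when the stream failed
}

// Runs one streaming task per query and collects every outcome, so a single
// failed stream does not discard the others.
async function streamAllSettled(
  queries: string[],
  run: (query: string) => Promise<string>,
): Promise<StreamOutcome[]> {
  const results = await Promise.allSettled(queries.map((query) => run(query)));
  return results.map((result, i) =>
    result.status === "fulfilled"
      ? { query: queries[i], text: result.value }
      : { query: queries[i], error: String(result.reason) },
  );
}
```

Here `run` could be a function that creates a stream for the query and passes it to `processStream`, as in the example above.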