The AI Router exposes three OpenAI-compatible audio endpoints: text-to-speech, transcription, and translation. All endpoints support fallbacks, load balancing, and retries.
## Text to speech

| Provider | Model |
|---|---|
| OpenAI | openai/tts-1 |
| OpenAI | openai/tts-1-hd |
| OpenAI | openai/gpt-4o-mini-tts |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 |
| ElevenLabs | elevenlabs/eleven_turbo_v2_5 |
| ElevenLabs | elevenlabs/eleven_flash_v2_5 |
| ElevenLabs | elevenlabs/eleven_flash_v2 |
| Google AI | google-ai/gemini-2.5-flash-preview-tts |
| Google AI | google-ai/gemini-2.5-pro-preview-tts |
| Vertex AI | google/gemini-2.5-flash-preview-tts |
| Vertex AI | google/gemini-2.5-pro-preview-tts |

Convert text to audio using `POST /v2/router/audio/speech`.
```typescript
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://my.orq.ai/v2/router",
});

const response = await client.audio.speech.create({
  model: "openai/tts-1",
  voice: "alloy",
  input: "Hello, welcome to Acme Corp. How can I help you today?",
  response_format: "mp3",
});

// The response is a fetch Response; read the audio bytes and write them to disk.
const buffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("output.mp3", buffer);
```
### Streaming

Process audio chunks in real time as they arrive, which is useful for low-latency playback pipelines.
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://my.orq.ai/v2/router",
});

const response = await client.audio.speech.create({
  model: "openai/tts-1",
  voice: "alloy",
  input: "Hello, welcome to Acme Corp. How can I help you today?",
  response_format: "pcm", // raw PCM suits incremental playback
});

// Read the response body as a stream and handle each chunk as it arrives.
const reader = response.body!.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  processAudioChunk(value); // your handler: pipe to a speaker, buffer, etc.
}
```
### Parameters

| Parameter | Description |
|---|---|
| model | Model ID |
| input | Text to synthesize. Maximum length varies by provider |
| voice | Voice ID. See Voices below |
| response_format | Output format. Supported values vary by provider (mp3, opus, aac, flac, wav, pcm) |
| speed | Playback speed of the generated audio |
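
`speed` and `response_format` combine with the other request fields like any parameter. A minimal sketch reusing the `client` and `fs` from the example above (the 0.85 speed and wav output are arbitrary illustrative choices, not defaults):

```typescript
// Illustrative values: a slightly slower voice written out as WAV.
const slower = await client.audio.speech.create({
  model: "openai/tts-1-hd",
  voice: "nova",
  input: "Please listen carefully, as our menu options have changed.",
  response_format: "wav", // supported values vary by provider
  speed: 0.85, // 1.0 is normal speed
});
fs.writeFileSync("slow.wav", Buffer.from(await slower.arrayBuffer()));
```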
### Voices

| Provider | Voice |
|---|---|
| OpenAI | alloy, echo, fable, onyx, nova, shimmer |
| ElevenLabs | aria, roger, sarah, laura, charlie, george, callum, river, liam, charlotte, alice, matilda, will, jessica, eric, chris |
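
Voice IDs are passed through the same `voice` field and must match the model's provider. A minimal sketch pairing an ElevenLabs voice with an ElevenLabs model (reusing the `client` from the earlier examples):

```typescript
// Sketch: an ElevenLabs voice with a matching ElevenLabs model.
const speech = await client.audio.speech.create({
  model: "elevenlabs/eleven_multilingual_v2",
  voice: "sarah",
  input: "Bienvenue chez Acme Corp. Comment puis-je vous aider ?",
  response_format: "mp3",
});
```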
## Transcription

| Provider | Model |
|---|---|
| OpenAI | openai/whisper-1 |
| OpenAI | openai/gpt-4o-transcribe |
| OpenAI | openai/gpt-4o-mini-transcribe |
| ElevenLabs | elevenlabs/scribe_v1 |
| Groq | groq/whisper-large-v3 |
| Groq | groq/whisper-large-v3-turbo |
| Mistral | mistral/voxtral-mini-2507 |
| Azure | azure/whisper |
For the full and up-to-date list of transcription models, see Speech-to-Text models on the Supported Models page.
Transcribe an audio file to text using `POST /v2/router/audio/transcriptions`.
```typescript
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://my.orq.ai/v2/router",
});

const transcription = await client.audio.transcriptions.create({
  model: "openai/gpt-4o-transcribe",
  file: fs.createReadStream("meeting.mp3"),
  response_format: "json",
  language: "en", // ISO-639-1 code of the spoken language
});

console.log(transcription.text);
```
### Parameters

| Parameter | Description |
|---|---|
| model | Model ID |
| file | Audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| language | ISO-639-1 language code of the input audio (e.g. en, fr, de) |
| prompt | Optional text to guide the model's style or continue a previous segment |
| response_format | json, text, srt, verbose_json, or vtt |
| temperature | Sampling temperature between 0 and 1 |
| timestamp_granularities | Array of granularities: ["word"], ["segment"], or ["word", "segment"]. Requires verbose_json (see the sketch below). Not supported by all models |
| diarize | Annotate which speaker is talking in the file. ElevenLabs only |
| num_speakers | Maximum number of speakers to identify. ElevenLabs only |
| tag_audio_events | Tag non-speech events such as (laughter) or (applause). ElevenLabs only |
| enable_logging | Set to false to disable logging and enable zero data retention |
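
For example, word- and segment-level timestamps only take effect when `timestamp_granularities` is paired with `verbose_json`. A minimal sketch reusing the `client` and `fs` from above (the file name is illustrative):

```typescript
// verbose_json is required for timestamp_granularities to take effect.
const verbose = await client.audio.transcriptions.create({
  model: "openai/whisper-1",
  file: fs.createReadStream("meeting.mp3"),
  response_format: "verbose_json",
  timestamp_granularities: ["word", "segment"],
});

// The verbose payload carries per-word and per-segment timing arrays
// alongside the transcript text.
console.log(verbose.text);
```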
## Translation

The OpenAI translation endpoint supports only openai/whisper-1; gpt-4o-transcribe and gpt-4o-mini-transcribe do not support translation.

Transcribe and translate audio to English using `POST /v2/router/audio/translations`. The output is always in English, regardless of the source language.
```typescript
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://my.orq.ai/v2/router",
});

const translation = await client.audio.translations.create({
  model: "openai/whisper-1",
  file: fs.createReadStream("interview_french.mp3"),
  response_format: "json",
});

// The translated text is always English, whatever the source language.
console.log(translation.text);
```
Translation supports the same response_format and temperature parameters as transcription.
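
A minimal sketch passing those shared parameters, reusing the `client` and `fs` from above (the values are illustrative, and this assumes the `text` format returns the plain translated string rather than a JSON object, as with transcription):

```typescript
// Assumption: with response_format "text" the plain translated string comes back.
const text = await client.audio.translations.create({
  model: "openai/whisper-1",
  file: fs.createReadStream("interview_french.mp3"),
  response_format: "text",
  temperature: 0.2, // lower values give more deterministic output
});
console.log(text);
```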