Orq.ai Documentation - AI Gateway & LLM Collaboration Platform

The AI Router supports all input and output modalities through a single OpenAI-compatible API. All endpoints share the same base URL, authentication, and orq.ai router features: fallbacks, caching, load balancing, and retries.

Modality	Endpoint
Image input	`POST /v3/router/chat/completions`
PDF input	`POST /v3/router/chat/completions`
Image generation	`POST /v3/router/images/generations`
Image editing	`POST /v3/router/images/edits`
Image variations	`POST /v3/router/images/variations`
Text to speech	`POST /v3/router/audio/speech`
Transcription	`POST /v3/router/audio/transcriptions`
Translation	`POST /v3/router/audio/translations`

Every request uses the following base URL and authorization header:

BASE_URL=https://api.orq.ai/v3/router
Authorization: Bearer $ORQ_API_KEY
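
As a minimal sketch of how the shared base URL and bearer token combine into a request, the helper below assembles the URL, headers, and JSON body for a router endpoint using only the Python standard library. The name `build_router_request` and the placeholder key are illustrative assumptions, not part of the orq.ai API, and no request is actually sent.

```python
import json
import os
from typing import Optional, Tuple

BASE_URL = "https://api.orq.ai/v3/router"


def build_router_request(
    path: str, payload: dict, api_key: Optional[str] = None
) -> Tuple[str, dict, bytes]:
    """Return (url, headers, body) for a JSON POST to a router endpoint.

    Hypothetical helper for illustration: it only assembles the request.
    """
    # Fall back to the ORQ_API_KEY environment variable when no key is passed.
    key = api_key or os.environ.get("ORQ_API_KEY", "")
    url = f"{BASE_URL}/{path.lstrip('/')}"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return url, headers, body


url, headers, body = build_router_request(
    "chat/completions",
    {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
    api_key="sk-example",  # placeholder key for this sketch
)
print(url)
```

The returned tuple can be handed to any HTTP client; only the path changes between modalities.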

To see which models support a specific modality, filter the Supported Models page or check the Providers page in orq.ai.

Image input

Analyze images alongside text. Pass image URLs or base64-encoded files in chat/completions messages.

PDF input

Send PDF documents for extraction and analysis. Supported natively by compatible models.

Image generation

Generate, edit, and vary images using DALL-E 2, DALL-E 3, and GPT Image 1.

Audio

Convert text to speech, transcribe audio files, and translate audio to English.

Image input

Analyze images alongside text using POST /v3/router/chat/completions. Pass images as public URLs or base64-encoded data in the message content array.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe in detail."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

Supported formats

Format	Use case	Max size
JPEG/JPG	Photos, general images	20MB
PNG	Screenshots, diagrams	20MB
GIF	Static images only	20MB
WebP	Modern web images	20MB
Base64	Embedded image data	Model context limit
URLs	Public image links	Model context limit
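
The sketch below shows one way to turn a local image into a base64 `image_url` content part while checking the format and size limits from the table. It assumes the data URL form (`data:<mime>;base64,<data>`) used by OpenAI-compatible APIs and reads "20MB" as 20 * 1024 * 1024 bytes. The helper name `image_part` is hypothetical.

```python
import base64
import os

# Assumption: "20MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024

# File extensions for the formats listed in the table above.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_part(data: bytes, filename: str) -> dict:
    """Build an image_url content part from raw image bytes (hypothetical helper)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MIME_TYPES:
        raise ValueError(f"Unsupported image format: {ext or filename}")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("Image exceeds the 20MB limit; compress it before upload")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{MIME_TYPES[ext]};base64,{encoded}"},
    }


# The first eight bytes of any PNG file, used here as stand-in image data.
part = image_part(b"\x89PNG\r\n\x1a\n", "screenshot.png")
print(part["image_url"]["url"])
```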

Detail levels

Level	Resolution	Speed	Cost	Use case
`"low"`	512x512	Fast	Low	Quick overview
`"high"`	Full resolution	Slow	High	Detailed analysis
`"auto"`	Model decides	Medium	Medium	Balanced (default)

Set detail in the image_url object:

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/image.jpg",
    "detail": "high"
  }
}

Patterns

Multiple images:

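import OpenAI from "openai";

// OpenAI SDK client pointed at the AI Router
const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

// One text part followed by one image_url part per image
const content = [
  { type: "text", text: "Compare these before and after photos. What changes do you notice?" },
  { type: "image_url", image_url: { url: "https://example.com/before.jpg", detail: "high" } },
  { type: "image_url", image_url: { url: "https://example.com/after.jpg", detail: "high" } },
];

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content }],
});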

OCR and text extraction:

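// Assumes `client` is an OpenAI SDK client configured with the router base URL
// and `imageUrl` holds a public image URL or a base64 data URL.
const response = await client.chat.completions.create({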
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Extract all text from this image. Return as plain text, preserving formatting where possible.",
        },
        {
          type: "image_url",
          image_url: { url: imageUrl, detail: "high" },
        },
      ],
    },
  ],
});

Structured output:

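import os
from typing import List

from openai import OpenAI
from pydantic import BaseModel

# OpenAI SDK client pointed at the AI Router
client = OpenAI(
    api_key=os.environ["ORQ_API_KEY"],
    base_url="https://api.orq.ai/v3/router",
)

image_url = "https://example.com/image.jpg"  # public URL or base64 data URL

class ImageAnalysis(BaseModel):
    objects: List[str]
    text_content: str
    dominant_colors: List[str]
    confidence: float

response = client.beta.chat.completions.parse(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this image systematically"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    response_format=ImageAnalysis
)

# The response parsed into an ImageAnalysis instance
analysis = response.choices[0].message.parsed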

Limitations

Limitation	Details	Workaround
File size	20MB max per image	Compress before upload
Image count	Varies by model (5-16)	Process in batches
Video	Static images only	Extract frames for analysis
Privacy	Images sent to provider	Use on-premise models if needed
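
The "process in batches" workaround can be sketched as follows: the image URLs are split into groups no larger than the model's image limit, and one `messages` list is built per group. The default of 5 images per request is an assumption taken from the lower bound in the table; the function name `batch_image_messages` is hypothetical.

```python
from typing import Iterator, List


def batch_image_messages(
    prompt: str, image_urls: List[str], max_images: int = 5
) -> Iterator[list]:
    """Yield one `messages` list per batch of at most `max_images` images."""
    if max_images < 1:
        raise ValueError("max_images must be at least 1")
    for start in range(0, len(image_urls), max_images):
        batch = image_urls[start:start + max_images]
        # Each request repeats the prompt and carries its own slice of images.
        content = [{"type": "text", "text": prompt}]
        content += [{"type": "image_url", "image_url": {"url": u}} for u in batch]
        yield [{"role": "user", "content": content}]


urls = [f"https://example.com/img{i}.jpg" for i in range(12)]
batches = list(batch_image_messages("Describe each image.", urls, max_images=5))
images_per_batch = [len(messages[0]["content"]) - 1 for messages in batches]
print(images_per_batch)  # [5, 5, 2]
```

Each yielded list can be passed as the `messages` field of a separate chat/completions request.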

PDF input

Send PDF documents directly in chat messages for analysis and content extraction using POST /v3/router/chat/completions.

PDF input support varies by model. See the Supported Models page and check your provider’s documentation for PDF capability.

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Please analyze this PDF document and provide a summary"
          },
          {
            "type": "file",
            "file": {
              "file_data": "data:application/pdf;base64,YOUR_BASE64_ENCODED_PDF",
              "filename": "document.pdf"
            }
          }
        ]
      }
    ]
  }'

Parameters

Chat Completions (/v3/router/chat/completions):

Parameter	Type	Required	Description
`type`	`"file"`	Yes	Content type for file input
`file.file_data`	string	Yes	Data URI with base64 PDF content
`file.filename`	string	Yes	Name of the file for model context

Responses API (/v3/router/responses):

Parameter	Type	Required	Description
`type`	`"input_file"`	Yes	Content type for file input
`file_data`	string	Yes	Data URI with base64 PDF content
`filename`	string	Yes	Name of the file for model context

Format: data:application/pdf;base64,{base64_content}
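
To make the difference between the two parameter tables concrete, the sketch below builds the same PDF as a Chat Completions content part (nested under `file`) and as a Responses API content part (flat fields). The helper names are hypothetical; the field names come from the tables above.

```python
import base64


def pdf_data_uri(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as data:application/pdf;base64,{base64_content}."""
    return "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode("ascii")


def chat_file_part(pdf_bytes: bytes, filename: str) -> dict:
    """Content part for /v3/router/chat/completions: fields nested under `file`."""
    return {
        "type": "file",
        "file": {"file_data": pdf_data_uri(pdf_bytes), "filename": filename},
    }


def responses_file_part(pdf_bytes: bytes, filename: str) -> dict:
    """Content part for /v3/router/responses: fields at the top level."""
    return {
        "type": "input_file",
        "file_data": pdf_data_uri(pdf_bytes),
        "filename": filename,
    }


sample_pdf = b"%PDF-1.4 example bytes"  # stand-in for real file contents
chat_part = chat_file_part(sample_pdf, "document.pdf")
responses_part = responses_file_part(sample_pdf, "document.pdf")
print(chat_part["file"]["filename"], responses_part["type"])
```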

Use cases

Scenario	Example prompt
Contract analysis	"Extract key terms and obligations"
Invoice processing	"Extract amounts, dates, vendor info"
Research papers	"Summarize methodology and findings"
Form extraction	"Convert form data to JSON"

Limitations

Limitation	Details	Workaround
File size	Model context limits	Split large PDFs
Scanned documents	Quality varies by model	Use OCR preprocessing
Complex layouts	Tables and charts may not extract well	Use structured prompts
Security	Sensitive documents sent to provider	Use on-premise models

Image generation

Generate images from a text prompt using POST /v3/router/images/generations.

For the full and up-to-date list of supported image models, see Image models on the Supported Models page.

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.images.generate({
  model: "openai/gpt-image-1",
  prompt: "A futuristic city skyline at sunset, photorealistic",
  n: 1,
  size: "1024x1024",
});

console.log(response.data[0].b64_json?.slice(0, 40));

Parameters

Parameter	Description
`model`	Model ID
`prompt`	Text description of the desired image
`n`	Number of images to generate
`size`	Image dimensions (see Supported Models for per-model sizes)
`response_format`	`url` or `b64_json`. DALL-E 2/3 only; `gpt-image-1` always returns `b64_json`
`quality`	Image quality level. Values vary by model
`style`	`vivid` or `natural`. DALL-E 3 only
`background`	`transparent`, `opaque`, or `auto`. `gpt-image-1` only
`output_format`	`png`, `jpeg`, or `webp`. `gpt-image-1` only
`output_compression`	Compression level 0-100%. `gpt-image-1` only
`moderation`	`auto` or `low`. `gpt-image-1` only

Set response_format to url to receive a hosted image link, or b64_json to receive the image inline as a base64-encoded string.

{
  "created": 1234567890,
  "data": [
    {
      "b64_json": "iVBORw0KGgo..."
    }
  ]
}
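
A `b64_json` response has to be decoded before the image can be used. The sketch below decodes each entry of a response shaped like the JSON above and writes it to disk. The function name `save_images`, the output file naming, and the `png` default extension are assumptions for illustration; the sample payload is a stand-in, not real model output.

```python
import base64
import tempfile
from pathlib import Path
from typing import List


def save_images(response: dict, out_dir: str, extension: str = "png") -> List[Path]:
    """Decode every b64_json entry in `response` and write it under `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, item in enumerate(response["data"]):
        if "b64_json" not in item:
            continue  # entries returned as `url` must be downloaded instead
        path = out / f"image_{index}.{extension}"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Stand-in response containing only the 8-byte PNG signature.
sample = {
    "created": 1234567890,
    "data": [{"b64_json": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")}],
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_images(sample, tmp)
    first_bytes = saved[0].read_bytes()

print(saved[0].name, len(first_bytes))
```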

Image editing

Modify an existing image using a prompt and an optional mask with POST /v3/router/images/edits.

import fs from "fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.images.edit({
  model: "openai/gpt-image-1",
  image: fs.createReadStream("original.png"),
  prompt: "Add a sunset sky behind the buildings",
  size: "1024x1024",
});

console.log(response.data[0].b64_json?.slice(0, 40));

Parameter	Description
`model`	Model ID
`image`	PNG, WEBP, or JPEG file to edit. Some models accept an array of images
`prompt`	Text description of the desired edit
`mask`	Optional PNG mask where transparent areas indicate where to edit
`size`	Output image dimensions
`response_format`	`url` or `b64_json`. `gpt-image-1` always returns `b64_json`
`quality`	Image quality level. Values vary by model

Image variations

Generate variations of an existing image with POST /v3/router/images/variations. See Image models for which models support variations.

import fs from "fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.images.createVariation({
  model: "openai/dall-e-2",
  image: fs.createReadStream("original.png"),
  size: "1024x1024",
  response_format: "url",
});

response.data.forEach((img) => console.log(img.url));

Parameter	Description
`model`	Model ID
`image`	PNG image to create a variation of
`n`	Number of variations to generate (1-10)
`size`	Output image dimensions
`response_format`	`url` or `b64_json`

Fallbacks and reliability

Image endpoints support the same fallbacks and retry parameters as chat completions:

const response = await client.images.generate({
  model: "openai/gpt-image-1",
  prompt: "A mountain lake at dawn",
  size: "1024x1024",
  // @ts-ignore - orq.ai extension
  fallbacks: ["openai/dall-e-3", "openai/dall-e-2"],
});
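
In the example above, `fallbacks` is sent as an extra top-level field of the request body. A rough Python equivalent of that body is sketched below; the helper name `image_generation_payload` is hypothetical. With the OpenAI Python SDK, non-standard fields like this are usually passed through the `extra_body` argument rather than as keyword arguments, though that path is not shown in the source and is mentioned here as an assumption.

```python
import json
from typing import List, Optional


def image_generation_payload(
    prompt: str,
    model: str,
    fallbacks: Optional[List[str]] = None,
    size: str = "1024x1024",
) -> dict:
    """Build the JSON body for POST /v3/router/images/generations."""
    payload = {"model": model, "prompt": prompt, "size": size}
    if fallbacks:
        # orq.ai extension: models to try, in order, if the primary model fails.
        payload["fallbacks"] = list(fallbacks)
    return payload


payload = image_generation_payload(
    "A mountain lake at dawn",
    model="openai/gpt-image-1",
    fallbacks=["openai/dall-e-3", "openai/dall-e-2"],
)
print(json.dumps(payload, indent=2))
```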

Audio

The AI Router exposes three OpenAI-compatible audio endpoints. All support fallbacks, load balancing, and retries.

Text to speech

Convert text to audio using POST /v3/router/audio/speech.

Provider	Model
OpenAI	`openai/tts-1`
OpenAI	`openai/tts-1-hd`
OpenAI	`openai/gpt-4o-mini-tts`
ElevenLabs	`elevenlabs/eleven_multilingual_v2`
ElevenLabs	`elevenlabs/eleven_turbo_v2_5`
ElevenLabs	`elevenlabs/eleven_flash_v2_5`
ElevenLabs	`elevenlabs/eleven_flash_v2`
Google AI	`google-ai/gemini-2.5-flash-preview-tts`
Google AI	`google-ai/gemini-2.5-pro-preview-tts`
Vertex AI	`google/gemini-2.5-flash-preview-tts`
Vertex AI	`google/gemini-2.5-pro-preview-tts`

For the full and up-to-date list of TTS models, see Text-to-Speech models on the Supported Models page.

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.audio.speech.create({
  model: "openai/tts-1",
  voice: "alloy",
  input: "Hello, welcome to Acme Corp. How can I help you today?",
  response_format: "mp3",
});

const buffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("output.mp3", buffer);

Streaming: Process audio chunks in real time as they arrive, useful for low-latency playback pipelines.

const response = await client.audio.speech.create({
  model: "openai/tts-1",
  voice: "alloy",
  input: "Hello, welcome to Acme Corp. How can I help you today?",
  response_format: "pcm",
});

const reader = response.body!.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  processAudioChunk(value); // placeholder: your own handler for each Uint8Array chunk
}
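
Raw `pcm` output has no header, so a player needs to be told the sample format. As a sketch of what a chunk handler might do, the code below collects PCM chunks into a WAV container with the standard-library `wave` module. It assumes 24 kHz, 16-bit, mono samples, which is what OpenAI documents for its `pcm` format; other providers may differ, so treat these defaults as assumptions. The function name `pcm_chunks_to_wav` is hypothetical.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    sample_rate: int = 24000,  # assumption: OpenAI pcm output rate
    channels: int = 1,         # assumption: mono
    sample_width: int = 2,     # assumption: 16-bit samples (2 bytes)
) -> bytes:
    """Wrap headerless PCM chunks in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buffer.getvalue()


# Two stand-in chunks: 100 frames and 50 frames of 16-bit mono audio.
wav_bytes = pcm_chunks_to_wav([b"\x00\x00" * 100, b"\x01\x00" * 50])

with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    frames, rate = reader.getnframes(), reader.getframerate()

print(frames, rate)  # 150 24000
```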

Parameters:

Parameter	Description
`model`	Model ID
`input`	Text to synthesize. Maximum length varies by provider
`voice`	Voice ID. See voices table below
`response_format`	Output format: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`. Supported values vary by provider
`speed`	Playback speed of the generated audio

Voices:

Provider	Voices
OpenAI	`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
ElevenLabs	`aria`, `roger`, `sarah`, `laura`, `charlie`, `george`, `callum`, `river`, `liam`, `charlotte`, `alice`, `matilda`, `will`, `jessica`, `eric`, `chris`

Transcription

Transcribe an audio file to text using POST /v3/router/audio/transcriptions.

Provider	Model
OpenAI	`openai/whisper-1`
OpenAI	`openai/gpt-4o-transcribe`
OpenAI	`openai/gpt-4o-mini-transcribe`
ElevenLabs	`elevenlabs/scribe_v1`
Groq	`groq/whisper-large-v3`
Groq	`groq/whisper-large-v3-turbo`
Mistral	`mistral/voxtral-mini-2507`
Azure	`azure/whisper`

For the full and up-to-date list of transcription models, see Speech-to-Text models on the Supported Models page.

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const transcription = await client.audio.transcriptions.create({
  model: "openai/gpt-4o-transcribe",
  file: fs.createReadStream("meeting.mp3"),
  response_format: "json",
  language: "en",
});

console.log(transcription.text);

Parameters:

Parameter	Description
`model`	Model ID
`file`	Audio file to transcribe. Supported formats: `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, `webm`
`language`	ISO-639-1 language code of the input audio (e.g. `en`, `fr`, `de`)
`prompt`	Optional text to guide the model’s style or continue a previous segment
`response_format`	`json`, `text`, `srt`, `verbose_json`, or `vtt`
`temperature`	Sampling temperature between 0 and 1
`timestamp_granularities`	Array of granularities: `["word"]`, `["segment"]`, or `["word", "segment"]`. Requires `verbose_json`. Not supported by all models
`diarize`	Annotate which speaker is talking in the file. ElevenLabs only
`num_speakers`	Maximum number of speakers to identify. ElevenLabs only
`tag_audio_events`	Tag non-speech events such as `(laughter)` or `(applause)`. ElevenLabs only
`enable_logging`	Set to `false` to disable logging and enable zero data retention

Translation

Transcribe and translate audio to English using POST /v3/router/audio/translations. The output is always in English regardless of the source language.

The OpenAI translation endpoint supports only openai/whisper-1; gpt-4o-transcribe and gpt-4o-mini-transcribe do not support translation.

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const translation = await client.audio.translations.create({
  model: "openai/whisper-1",
  file: fs.createReadStream("interview_french.mp3"),
  response_format: "json",
});

console.log(translation.text);

Translation supports the same response_format and temperature parameters as transcription.
