Creates a model response for the given chat conversation with support for retries, fallbacks, prompts, and variables.

curl --request POST \
  --url https://api.orq.ai/v2/gateway/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {
      "role": "system",
      "content": "<string>",
      "name": "<string>"
    }
  ],
  "model": "<string>",
  "metadata": {},
  "audio": {
    "voice": "alloy",
    "format": "wav"
  },
  "frequency_penalty": 123,
  "max_tokens": 123,
  "max_completion_tokens": 123,
  "logprobs": true,
  "top_logprobs": 10,
  "n": 2,
  "presence_penalty": 123,
  "response_format": {
    "type": "text"
  },
  "reasoning_effort": "<string>",
  "verbosity": "<string>",
  "seed": 123,
  "stop": "<string>",
  "stream_options": {
    "include_usage": true
  },
  "thinking": {
    "type": "enabled",
    "budget_tokens": 123,
    "thinking_level": "low"
  },
  "temperature": 1,
  "top_p": 123,
  "top_k": 123,
  "tools": [
    {
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": ["<string>"],
          "additionalProperties": true
        },
        "strict": true
      },
      "type": "function"
    }
  ],
  "tool_choice": "none",
  "parallel_tool_calls": true,
  "modalities": ["text"],
  "orq": {
    "retry": {
      "count": 3,
      "on_codes": [429, 500, 502]
    },
    "fallbacks": [
      { "model": "openai/gpt-5" },
      { "model": "anthropic/claude-4-opus" }
    ],
    "contact": {
      "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
      "display_name": "Jane Doe",
      "email": "[email protected]"
    },
    "thread": {
      "id": "thread_01ARZ3NDEKTSV4RRFFQ69G5FAV",
      "tags": ["customer-support"]
    },
    "inputs": {
      "customer_name": "John Smith",
      "issue_type": "billing"
    },
    "cache": {
      "ttl": 3600,
      "type": "exact_match"
    },
    "knowledge_bases": [
      {
        "knowledge_id": "knowledge_01ARZ3NDEKTSV4RRFFQ69G5FAV",
        "top_k": 5
      }
    ],
    "timeout": {
      "call_timeout": 30000
    }
  },
  "stream": false
}'
Example response:

{
  "id": "<string>",
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "content": "<string>",
        "refusal": "<string>",
        "tool_calls": [
          {
            "id": "<string>",
            "type": "function",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            }
          }
        ],
        "role": "assistant",
        "reasoning": "<string>",
        "reasoning_signature": "<string>",
        "redacted_reasoning": "<string>",
        "audio": {
          "id": "<string>",
          "expires_at": 123,
          "data": "<string>",
          "transcript": "<string>"
        }
      },
      "index": 0,
      "logprobs": {
        "content": [
          {
            "token": "<string>",
            "logprob": 123,
            "bytes": [123],
            "top_logprobs": [
              {
                "token": "<string>",
                "logprob": 123,
                "bytes": [123]
              }
            ]
          }
        ],
        "refusal": [
          {
            "token": "<string>",
            "logprob": 123,
            "bytes": [123],
            "top_logprobs": [
              {
                "token": "<string>",
                "logprob": 123,
                "bytes": [123]
              }
            ]
          }
        ]
      }
    }
  ],
  "created": 123,
  "model": "<string>",
  "object": "chat.completion",
  "system_fingerprint": "<string>",
  "usage": {
    "completion_tokens": 123,
    "prompt_tokens": 123,
    "total_tokens": 123,
    "prompt_tokens_details": {
      "cached_tokens": 123,
      "cache_creation_tokens": 123,
      "audio_tokens": 123
    },
    "completion_tokens_details": {
      "reasoning_tokens": 123,
      "accepted_prediction_tokens": 123,
      "rejected_prediction_tokens": 123,
      "audio_tokens": 123
    }
  }
}
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
A list of messages comprising the conversation so far.
Developer-provided instructions that the model should follow, regardless of messages sent by the user.
The role of the message's author, in this case system.
The contents of the system message.
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
Model ID used to generate the response, like openai/gpt-4o or anthropic/claude-haiku-4-5-20251001. The AI Gateway offers a wide range of models with different capabilities, performance characteristics, and price points. Refer to the [Supported models](/docs/proxy/supported-models) page to browse available models.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can have a maximum length of 64 characters and values can have a maximum length of 512 characters.
Parameters for audio output. Required when audio output is requested with modalities: ["audio"].
The voice the model uses to respond. Supported voices are alloy, echo, fable, onyx, nova, and shimmer.
Specifies the output audio format. Must be one of wav, mp3, flac, opus, or pcm16.
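For example, a request for spoken output would combine modalities with the audio object (a minimal fragment; the voice and format shown are illustrative):

"modalities": ["text", "audio"],
"audio": { "voice": "alloy", "format": "wav" }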
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
[Deprecated]. The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
This value is now deprecated in favor of max_completion_tokens, and is not compatible with o1 series models.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
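Since max_tokens is deprecated, a capped request should use max_completion_tokens instead; a minimal sketch (model and message are illustrative):

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Summarize our refund policy." }],
  "max_completion_tokens": 1024
}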
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
Required range: x >= 1.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Constrains effort on reasoning for reasoning models. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
Adjusts response verbosity. Lower levels yield shorter answers.
If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
Up to 4 sequences where the API will stop generating further tokens.
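stop accepts a single string or an array of up to four sequences; for example (the sequences shown are illustrative):

"stop": ["\n\n", "Observation:"]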
Options for streaming response. Only set this when you set stream: true.
If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
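To receive token usage statistics with a streamed response, pair stream with stream_options (a minimal fragment):

"stream": true,
"stream_options": { "include_usage": true }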
Enables or disables the thinking mode capability. Options: enabled, disabled.
Determines how many tokens the model can use for its internal reasoning process. Larger budgets can enable more thorough analysis for complex problems, improving response quality. Must be ≥1024 and less than max_tokens.
The level of reasoning the model should use. This setting is supported only by gemini-3 models. If budget_tokens is specified and thinking_level is available, budget_tokens will be ignored. Options: low, high.
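For example, to enable extended thinking with an explicit budget (a minimal fragment; the values are illustrative and respect the constraint that budget_tokens is at least 1024 and below max_tokens):

"thinking": { "type": "enabled", "budget_tokens": 2048 },
"max_tokens": 4096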
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
Limits the model to consider only the top k most likely tokens at each step.
A list of tools the model may call.
The name of the function to call.
A description of what the function does, used by the model to choose when and how to call the function.
The parameters the function accepts, described as a JSON Schema object.
Whether to enable strict schema adherence when generating the function call.
The type of the tool. Currently, only function is supported.
Controls which (if any) tool is called by the model. Options: none, auto, required.
Whether to enable parallel function calling during tool use.
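As a concrete illustration (the get_weather function and its schema are hypothetical), a tool definition might look like:

"tools": [
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a given city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": { "type": "string", "description": "City name" }
        },
        "required": ["city"],
        "additionalProperties": false
      },
      "strict": true
    }
  }
],
"tool_choice": "auto"

Setting additionalProperties to false is the usual companion to strict: true, so the model cannot invent parameters outside the schema.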
Output types that you would like the model to generate. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use: ["text", "audio"].
Leverage Orq's intelligent routing capabilities to enhance your AI application with enterprise-grade reliability and observability. Orq provides automatic request management including retries on failures, model fallbacks for high availability, contact-level analytics tracking, conversation threading, and dynamic prompt templating with variable substitution.
The name to display on the trace. If not specified, the default system name will be used.
Retry configuration for the request
Prompt configuration for the request
Information about the contact making the request. If the contact does not exist, it will be created automatically.
Unique identifier for the contact
"contact_01ARZ3NDEKTSV4RRFFQ69G5FAV"
Display name of the contact
"Jane Doe"
Email address of the contact
URL to the contact's avatar or logo
"https://example.com/avatars/jane-doe.jpg"
A list of tags associated with the contact
["hr", "engineering"]
Thread information to group related requests
Unique identifier of the knowledge base to search
"customer-knowledge-base"
The number of results to return. If not provided, will default to the knowledge base configured top_k.
Required range: 1 <= x <= 100.
The threshold to apply to the search. If not provided, will default to the knowledge base configured threshold.
Required range: 0 <= x <= 1.
The type of search to perform. If not provided, will default to the knowledge base configured retrieval_type.
Options: vector_search, keyword_search, hybrid_search.
The metadata filter to apply to the search. See Searching a Knowledge Base for more information.
Additional search options
Whether to include the vector in the chunk
Whether to include the metadata in the chunk
Whether to include the scores in the chunk
Override the rerank configuration for this search. If not provided, will use the knowledge base configured rerank settings.
The name of the rerank model to use. Refer to the model list.
"cohere/rerank-multilingual-v3.0"
The threshold value used to filter the rerank results; only documents with a relevance score greater than the threshold will be returned.
Required range: 0 <= x <= 1.
The number of top results to return after reranking. If not provided, will default to the knowledge base configured top_k.
Required range: 1 <= x <= 100.
Override the agentic RAG configuration for this search. If not provided, will use the knowledge base configured agentic RAG settings.
The name of the model for the Agent to use. Refer to the model list.
The query to use to search the knowledge base. If not provided, the last user message from the request's messages will be used.
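Putting these options together, a knowledge base search configuration might look like the following (a sketch assuming the field names match the attribute descriptions above; the values are illustrative):

"knowledge_bases": [
  {
    "knowledge_id": "knowledge_01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "top_k": 5,
    "threshold": 0.75,
    "retrieval_type": "hybrid_search"
  }
]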
Array of models with weights for load balancing requests
[
  { "model": "openai/gpt-4o", "weight": 0.7 },
  { "model": "anthropic/claude-3-5-sonnet", "weight": 0.3 }
]
{
  "retry": { "count": 3, "on_codes": [429, 500, 502] },
  "fallbacks": [
    { "model": "openai/gpt-5" },
    { "model": "anthropic/claude-4-opus" }
  ],
  "contact": {
    "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "display_name": "Jane Doe",
    "email": "[email protected]"
  },
  "thread": {
    "id": "thread_01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "tags": ["customer-support"]
  },
  "inputs": {
    "customer_name": "John Smith",
    "issue_type": "billing"
  },
  "cache": { "ttl": 3600, "type": "exact_match" },
  "knowledge_bases": [
    {
      "knowledge_id": "knowledge_01ARZ3NDEKTSV4RRFFQ69G5FAV",
      "top_k": 5
    }
  ],
  "timeout": { "call_timeout": 30000 }
}
Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
Represents a chat completion response returned by the model, based on the provided input.
A unique identifier for the chat completion.
A list of chat completion choices. Can be more than one if n is greater than 1.
The reason the model stopped generating tokens.
Options: stop, length, tool_calls, content_filter, function_call.
A chat completion message generated by the model.
The type of the tool. Currently, only function is supported.
The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.
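For instance, a returned tool call might look like this (illustrative values; note that arguments is a JSON-encoded string you must parse and validate yourself):

"tool_calls": [
  {
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"city\": \"Amsterdam\"}"
    }
  }
]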
The role of the author of this message, in this case assistant.
Internal thought process of the model.
The signature holds a cryptographic token which verifies that the thinking block was generated by the model, and is verified when thinking is part of a multiturn conversation. This value should not be modified and should always be sent to the API when the reasoning is redacted. Currently only supported by Anthropic.
Occasionally the model's internal reasoning will be flagged by the provider's safety systems. When this occurs, the provider will encrypt the reasoning. This redacted reasoning is decrypted when passed back to the API, allowing the model to continue its response without losing context.
If the audio output modality is requested, this object contains data about the audio response from the model.
The index of the choice in the list of choices.
Log probability information for the choice.
A list of message content tokens with log probability information.
The token.
The log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely.
A list of integers representing the UTF-8 bytes representation of the token.
List of the most likely tokens and their log probability, at this token position.
The token.
The log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely.
A list of integers representing the UTF-8 bytes representation of the token.
A list of message refusal tokens with log probability information.
The token.
The log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely.
A list of integers representing the UTF-8 bytes representation of the token.
List of the most likely tokens and their log probability, at this token position.
The token.
The log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely.
A list of integers representing the UTF-8 bytes representation of the token.
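As an illustration, a single entry in logprobs.content might look like this (the token and probabilities are made up; bytes are the UTF-8 bytes of the token):

{
  "token": "Hello",
  "logprob": -0.31,
  "bytes": [72, 101, 108, 108, 111],
  "top_logprobs": [
    { "token": "Hello", "logprob": -0.31, "bytes": [72, 101, 108, 108, 111] },
    { "token": "Hi", "logprob": -1.52, "bytes": [72, 105] }
  ]
}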
The Unix timestamp (in seconds) of when the chat completion was created.
The model used for the chat completion.
The object type, which is always chat.completion.
This fingerprint represents the backend configuration that the model runs with.
Usage statistics for the completion request.
Number of tokens in the generated completion.
Number of tokens in the prompt.
Total number of tokens used in the request (prompt + completion).
The number of audio output tokens produced by the response.
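For example (illustrative counts; total_tokens is the sum of prompt and completion tokens, 58 + 204 = 262):

"usage": {
  "prompt_tokens": 58,
  "completion_tokens": 204,
  "total_tokens": 262
}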