Orq.ai Documentation - AI Gateway & LLM Collaboration Platform

Create response

curl --request POST \
  --url https://api.orq.ai/v2/router/responses \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "input": "<string>",
  "metadata": {},
  "temperature": 1,
  "top_p": 0.5,
  "previous_response_id": "<string>",
  "instructions": "<string>",
  "reasoning": {
    "effort": "low"
  },
  "max_output_tokens": 123,
  "text": {
    "format": {
      "type": "text"
    }
  },
  "include": [
    "code_interpreter_call.outputs"
  ],
  "parallel_tool_calls": true,
  "store": true,
  "service_tier": "auto",
  "tools": [
    {
      "type": "function",
      "name": "<string>",
      "parameters": {
        "type": "object",
        "properties": {},
        "required": [
          "<string>"
        ],
        "additionalProperties": true
      },
      "description": "<string>",
      "strict": true
    }
  ],
  "tool_choice": "none",
  "stream": false
}
'

{
  "id": "<string>",
  "object": "response",
  "created_at": 123,
  "status": "completed",
  "error": {
    "code": "<string>",
    "message": "<string>"
  },
  "incomplete_details": {
    "reason": "max_output_tokens"
  },
  "model": "<string>",
  "output": [
    {
      "id": "<string>",
      "type": "message",
      "role": "assistant",
      "status": "in_progress",
      "content": []
    }
  ],
  "parallel_tool_calls": true,
  "instructions": "<string>",
  "output_text": "<string>",
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123,
    "input_tokens_details": {
      "cached_tokens": 123
    },
    "output_tokens_details": {
      "reasoning_tokens": 123,
      "accepted_prediction_tokens": 123,
      "rejected_prediction_tokens": 123
    }
  },
  "temperature": 1,
  "top_p": 0.5,
  "max_output_tokens": 123,
  "previous_response_id": "<string>",
  "metadata": {},
  "tool_choice": "none",
  "tools": [
    {
      "type": "function",
      "name": "<string>",
      "parameters": {
        "type": "object",
        "properties": {},
        "required": [
          "<string>"
        ],
        "additionalProperties": true
      },
      "description": "<string>",
      "strict": true
    }
  ],
  "reasoning": {
    "effort": "<string>",
    "summary": "<string>"
  },
  "store": true,
  "text": {
    "format": {
      "type": "text"
    }
  },
  "truncation": "disabled",
  "user": "<string>",
  "service_tier": "auto",
  "background": true,
  "top_logprobs": 10,
  "logprobs": true
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
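A minimal sketch of building the same headers in Python. The helper name build_headers is hypothetical, and the token value is a placeholder you supply yourself.

```python
def build_headers(token: str) -> dict[str, str]:
    """Return the two headers sent by the curl example."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```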

Body

application/json

model

string

required

ID of the model to use. You can use the List models API to see all of your available models.

input

required

The actual user input(s) for the model. Can be a simple string, or an array of structured input items (messages, tool outputs) representing a conversation history or complex input.
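The two input forms can be sketched as follows. The helper build_request is hypothetical, and the role/content shape of the structured items is an assumption, since this page does not spell out the item schema.

```python
import json


def build_request(model: str, user_input) -> str:
    """Serialize the two required body fields, `model` and `input`."""
    if not isinstance(user_input, (str, list)):
        raise TypeError("input must be a string or a list of input items")
    return json.dumps({"model": model, "input": user_input})


# Form 1: a simple string.
simple = build_request("<model-id>", "Summarize this paragraph.")

# Form 2: a list of structured items representing a conversation history.
# ASSUMPTION: the {"role", "content"} item shape is illustrative only.
history = build_request(
    "<model-id>",
    [
        {"role": "user", "content": "What is nucleus sampling?"},
        {"role": "assistant", "content": "Sampling from the top_p probability mass."},
        {"role": "user", "content": "Give a short example."},
    ],
)
```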

metadata

object

Developer-defined key-value pairs that will be included in response objects

temperature

number | null

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Required range: 0 <= x <= 2

top_p

number | null

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

Required range: 0 <= x <= 1
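A client can check both documented ranges before sending a request. This is a sketch only; validate_sampling is a hypothetical helper, not part of any Orq.ai SDK.

```python
def validate_sampling(temperature=None, top_p=None) -> dict:
    """Check the documented ranges and return only the fields that were set."""
    params = {}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must satisfy 0 <= x <= 2")
        params["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must satisfy 0 <= x <= 1")
        params["top_p"] = top_p
    return params
```

Both fields are nullable, so the helper leaves out any value that was not provided instead of sending a default.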

previous_response_id

string | null

The ID of a previous response to continue the conversation from. The model will have access to the previous response context.
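Continuing a conversation might look like the sketch below, which copies the id of an earlier response object into the next request body. The function name build_follow_up is hypothetical.

```python
def build_follow_up(previous_response: dict, model: str, user_input: str) -> dict:
    """Build a request body that continues from an earlier response."""
    return {
        "model": model,
        "input": user_input,
        # The `id` field of the earlier response object.
        "previous_response_id": previous_response["id"],
    }
```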

instructions

string | null

Developer-provided instructions that the model should follow. Overrides the default system message.

reasoning

object

Configuration for reasoning models

max_output_tokens

integer | null

The maximum number of tokens that can be generated in the response

text

object

include

enum<string>[] | null

Specifies which (potentially large) fields to include in the response. By default, the results of Code Interpreter and file searches are excluded. Available options:

code_interpreter_call.outputs: Include the outputs of Code Interpreter tool calls
computer_call_output.output.image_url: Include the image URLs from computer use tool calls
file_search_call.results: Include the results of file search tool calls
message.input_image.image_url: Include URLs of input images
message.output_text.logprobs: Include log probabilities for output text (when logprobs is enabled)
reasoning.encrypted_content: Include encrypted reasoning content for reasoning models
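Because include only accepts the values listed above, a client can reject unknown entries before making the call. A minimal sketch, with validate_include as a hypothetical helper name:

```python
# The six options documented for the `include` field.
INCLUDE_OPTIONS = frozenset({
    "code_interpreter_call.outputs",
    "computer_call_output.output.image_url",
    "file_search_call.results",
    "message.input_image.image_url",
    "message.output_text.logprobs",
    "reasoning.encrypted_content",
})


def validate_include(values: list) -> list:
    """Return the values unchanged, or raise if any is not a documented option."""
    unknown = [v for v in values if v not in INCLUDE_OPTIONS]
    if unknown:
        raise ValueError(f"unknown include options: {unknown}")
    return list(values)
```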

parallel_tool_calls

boolean | null

Whether to enable parallel function calling during tool use.

store

boolean | null

default:true

Whether to store this response for use in distillations or evals.

service_tier

enum<string> | null

Specifies the latency tier to use for processing the request. Defaults to "auto".

Available options:

auto,

default,

flex,

priority,

null

tools

object[]

A list of tools the model may call. Use this to provide a list of functions the model may generate JSON inputs for.

A function tool definition

tool_choice

How the model should select which tool (or tools) to use when generating a response. Can be a string (none, auto, required) or an object to force a specific tool.

Available options:

none,

auto,

required
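The sketch below assembles a function tool with the fields shown in the request example (type, name, parameters, description, strict) and pairs it with a string tool_choice. The helper function_tool and the get_weather tool are hypothetical, made up for illustration.

```python
def function_tool(name: str, description: str, properties: dict,
                  required: list, strict: bool = True) -> dict:
    """Build a function tool definition mirroring the request example."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
        "strict": strict,
    }


# Hypothetical tool, used only to show the shape of the request body.
weather_tool = function_tool(
    name="get_weather",
    description="Look up the current weather for a city.",
    properties={"city": {"type": "string"}},
    required=["city"],
)

body = {
    "model": "<model-id>",
    "input": "What is the weather in Amsterdam?",
    "tools": [weather_tool],
    "tool_choice": "auto",  # one of: none, auto, required
}
```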

stream

boolean

default:false

Response

Returns a response object or a stream of events.

Represents the completed model response returned when stream is false

id

string

required

The unique identifier for the response

object

enum<string>

required

The object type, which is always "response"

Available options:

response

created_at

number

required

The Unix timestamp (in seconds) of when the response was created
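Since created_at is expressed in seconds, it can be converted to a timezone-aware datetime with the standard library. The helper name created_at_utc is hypothetical.

```python
from datetime import datetime, timezone


def created_at_utc(response: dict) -> datetime:
    """Convert the `created_at` Unix timestamp (seconds) to a UTC datetime."""
    return datetime.fromtimestamp(response["created_at"], tz=timezone.utc)
```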

status

enum<string>

required

The status of the response

Available options:

completed,

failed,

in_progress,

incomplete

error

object

required

The error that occurred, if any

incomplete_details

object

required

Details about why the response is incomplete
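One way a client might branch on status, error, and incomplete_details is sketched below. The helper text_or_raise is hypothetical, and raising RuntimeError for every non-completed status is a design choice of this sketch rather than documented behavior.

```python
def text_or_raise(response: dict) -> str:
    """Return `output_text` for a completed response; raise otherwise."""
    status = response["status"]
    if status == "completed":
        return response.get("output_text") or ""
    if status == "failed":
        error = response.get("error") or {}
        raise RuntimeError(f"failed: {error.get('code')}: {error.get('message')}")
    if status == "incomplete":
        details = response.get("incomplete_details") or {}
        raise RuntimeError(f"incomplete: {details.get('reason')}")
    # Remaining documented status: in_progress.
    raise RuntimeError(f"response not finished (status={status})")
```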

model

string

required

The model used to generate the response

output

object[]

required

The list of output items generated by the model

An assistant message output

parallel_tool_calls

boolean

required

instructions

string | null

The instructions provided for the response

output_text

string | null

A convenience field with the concatenated text from all text content parts
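To illustrate what "concatenated text from all text content parts" could mean, the sketch below rebuilds the value from the output list. It assumes each text content part looks like {"type": "output_text", "text": "..."}. That shape is not documented on this page (the sample shows an empty content array), so treat it as a guess. The name concat_output_text is hypothetical.

```python
def concat_output_text(output: list) -> str:
    """Join the text of all text content parts found in message output items."""
    parts = []
    for item in output:
        if item.get("type") != "message":
            continue  # skip non-message output items
        for part in item.get("content", []):
            # ASSUMPTION: text parts carry type "output_text" and a "text" key.
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```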

usage

object

Usage statistics for the response
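The nested usage object from the response sample can be flattened for logging, as in this sketch. The helper flatten_usage is hypothetical. Counters that are absent default to 0 here, which is an assumption of the sketch.

```python
def flatten_usage(usage: dict) -> dict:
    """Flatten the usage object from the response sample into one level."""
    input_details = usage.get("input_tokens_details") or {}
    output_details = usage.get("output_tokens_details") or {}
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "cached_tokens": input_details.get("cached_tokens", 0),
        "reasoning_tokens": output_details.get("reasoning_tokens", 0),
    }
```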

temperature

number | null

top_p

number | null

max_output_tokens

integer | null

previous_response_id

string | null

metadata

object

tool_choice

Controls which (if any) tool is called by the model

Available options:

none,

auto,

required

tools

object[]

A function tool definition

reasoning

object

store

boolean

text

object

truncation

enum<string> | null

default:disabled

Controls how the model handles inputs longer than the maximum token length

Available options:

auto,

disabled,

null

user

string | null

A unique identifier representing your end-user

service_tier

enum<string> | null

The service tier used for processing the request

Available options:

auto,

default,

flex,

priority,

null

background

boolean | null

Whether the response was processed in the background

top_logprobs

integer | null

The number of top log probabilities to return for each output token

Required range: 0 <= x <= 20

logprobs

boolean | null

Whether to return log probabilities of the output tokens
