POST /v3/router/responses
curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": "What is the capital of France?"
  }'
{
  "id": "resp_01KP6DFC5FB7K7K10TVP60PF81",
  "object": "response",
  "created_at": 1776184439,
  "completed_at": 1776184439,
  "status": "completed",
  "model": "openai/gpt-4o",
  "output": [
    {
      "type": "message",
      "id": "msg_01KP6DFCG3RF80BEBXP06XX258",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 83,
    "output_tokens": 54,
    "total_tokens": 137,
    "input_tokens_details": {
      "cached_tokens": 0,
      "cache_creation_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 38
    }
  }
}
This endpoint implements the OpenResponses specification — a multi-provider, interoperable LLM interface. orq.ai extends the spec with platform features like variables, memory, identity, and orq.ai tools. For a comprehensive guide with examples, see the Responses API documentation. For function tool continuation, see the step-by-step guide.
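The curl example above can be sketched in plain Python using only the standard library. Note that `create_response` is a hypothetical helper name for illustration, not part of an official SDK:

```python
import json
import os
import urllib.request

ORQ_API_URL = "https://api.orq.ai/v3/router/responses"

def create_response(payload: dict, api_key: str) -> dict:
    """POST a payload to the Responses endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        ORQ_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The same minimal request as the curl example above.
payload = {
    "model": "openai/gpt-4o",
    "input": "What is the capital of France?",
}
# response = create_response(payload, os.environ["ORQ_API_KEY"])
```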
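The assistant's text lives inside the `output` array, nested under message items. A small sketch of pulling it out of a response body like the one above:

```python
import json

# Response body as returned by the endpoint (abridged from the example above).
raw = """
{
  "id": "resp_01KP6DFC5FB7K7K10TVP60PF81",
  "object": "response",
  "status": "completed",
  "output": [
    {"type": "message", "role": "assistant", "status": "completed",
     "content": [{"type": "output_text",
                  "text": "The capital of France is Paris.",
                  "annotations": []}]}
  ],
  "usage": {"input_tokens": 83, "output_tokens": 54, "total_tokens": 137}
}
"""

def output_text(response: dict) -> str:
    """Concatenate all output_text parts from message items in the output array."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

response = json.loads(raw)
print(output_text(response))  # The capital of France is Paris.
```

Iterating over `output` rather than indexing `output[0]` is deliberate: the array can also contain non-message items such as function calls and reasoning entries.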

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
conversation
object

Conversation context for multi-turn interactions.

fallbacks
object[] | null

Fallback models to try if the primary model fails. Each entry specifies a model in provider/model format.
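A sketch of a request with a fallback chain. The spec above only confirms that each entry carries a model in `provider/model` format; the model names here are illustrative:

```python
# Each fallback entry names a model in provider/model format.
payload = {
    "model": "openai/gpt-4o",
    "input": "Summarize this support ticket.",
    "fallbacks": [
        {"model": "anthropic/claude-sonnet-4"},   # tried if the primary model fails
        {"model": "google/gemini-2.0-flash"},     # tried next (illustrative name)
    ],
}
```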

frequency_penalty
number<double>

Penalize new tokens based on their frequency in the text so far. Between -2.0 and 2.0.

identity
object

Identity/contact information for the end-user.

input

Input to the model: a string or an array of input items (messages, files, etc.).
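Besides a plain string, `input` accepts an array of input items. A sketch of a multi-message input, assuming the role/content message shape used by the OpenResponses / OpenAI Responses conventions:

```python
payload = {
    "model": "openai/gpt-4o",
    "instructions": "You are a terse geography tutor.",  # system prompt
    "input": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
        {"role": "user", "content": "And of Spain?"},
    ],
}
```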

instructions
string

System prompt / instructions for the model.

max_output_tokens
integer<int64>

Maximum number of tokens in the response output.

max_tool_calls
integer<int64>

Maximum number of tool call rounds in the agentic loop.

memory
object

Attach a memory store entity to enable persistent memory across requests. See Memory Stores documentation for setup.

metadata
object

Developer-defined key-value pairs attached to the response.

model
string

The model to use in provider/model format (e.g. openai/gpt-4o). Use agent/ to invoke a pre-configured agent from the orq.ai platform.

parallel_tool_calls
boolean

Whether to allow parallel tool calls.

presence_penalty
number<double>

Penalize new tokens based on whether they already appear in the text so far. Between -2.0 and 2.0.

previous_response_id
string

The ID of a previous response to continue from. Requires store to be true (default) on the original response.
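A sketch of continuing a conversation from an earlier response. Because `store` defaults to true, the first response's ID can be referenced in a follow-up request; the ID below is copied from the example response above:

```python
# First turn: store defaults to true, so the response is retrievable later.
first = {
    "model": "openai/gpt-4o",
    "input": "Pick a random European city.",
}

# Follow-up turn: continue from the stored response by ID.
follow_up = {
    "model": "openai/gpt-4o",
    "input": "What country is that city in?",
    "previous_response_id": "resp_01KP6DFC5FB7K7K10TVP60PF81",
}
```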

prompt_cache_key
string

Key for prompt caching across requests.

reasoning
object

Configure reasoning behavior. Set effort (none, minimal, low, medium, high, xhigh) to control how much the model thinks before answering. Higher effort means more reasoning tokens and better answers for complex tasks, at higher cost.
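A sketch of dialing up reasoning effort. The `effort` key and its values are taken from the description above; the effect depends on the target model being reasoning-capable:

```python
payload = {
    "model": "openai/gpt-4o",
    "input": "Prove that the square root of 2 is irrational.",
    # Valid efforts: none, minimal, low, medium, high, xhigh.
    # Higher effort spends more reasoning tokens for harder tasks.
    "reasoning": {"effort": "high"},
}
```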

retry
object

Retry configuration. Specify the number of retries and which HTTP status codes should trigger a retry.
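A sketch of a retry configuration. The description above only says the object takes a retry count and the HTTP status codes that trigger a retry; the key names below are hypothetical, so check the schema before relying on them:

```python
payload = {
    "model": "openai/gpt-4o",
    "input": "Hello",
    "retry": {
        "count": 2,                   # hypothetical key: number of retries
        "on_codes": [429, 500, 503],  # hypothetical key: status codes that trigger a retry
    },
}
```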

safety_identifier
string

Safety identifier for content filtering.

store
boolean

Whether to persist the response (default: true). When false, the response cannot be retrieved later and previous_response_id will not work for follow-up requests.

stream
boolean

If true, returns a stream of server-sent events.
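With `stream: true` the endpoint emits server-sent events. A minimal sketch of opting in and parsing the `data:` lines of an SSE stream; the sample event name is illustrative, not taken from this page:

```python
def iter_sse_events(lines):
    """Minimal SSE parser: yields the data payload of each event.
    `lines` is any iterable of decoded text lines (e.g. an HTTP response body)."""
    data = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            data.append(line[5:].lstrip())
        elif line == "" and data:   # blank line terminates an event
            yield "\n".join(data)
            data = []
    if data:
        yield "\n".join(data)

# Request body that opts into streaming.
payload = {"model": "openai/gpt-4o", "input": "Hi", "stream": True}

# Illustrative sample of what a decoded SSE body might look like.
sample = [
    'data: {"type": "response.output_text.delta"}',
    "",
    "data: [DONE]",
    "",
]
events = list(iter_sse_events(sample))
```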

stream_options
object
temperature
number<double>

Sampling temperature between 0 and 2.

text
object

Configuration for text output.

thread
object

Thread for grouping related requests.

tool_choice

How the model should use the provided tools. Can be a string shorthand or a specific function selector.

Available options:
auto,
none,
required
tools
(Function · object | orq.ai Tool · object)[]

Tools available to the model.

A tool definition. The "type" field determines the tool kind.
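A sketch of exposing one function tool and leaving the choice to the model. The flat function shape (`name` and `parameters` at the top level, JSON Schema for arguments) follows the Responses API convention and is an assumption here; `get_weather` is a hypothetical function:

```python
payload = {
    "model": "openai/gpt-4o",
    "input": "What's the weather in Paris?",
    "tools": [
        {
            "type": "function",               # the "type" field determines the tool kind
            "name": "get_weather",            # hypothetical function name
            "description": "Look up current weather for a city.",
            "parameters": {                   # JSON Schema for the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",  # or "none", "required", or a specific function selector
}
```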

top_logprobs
integer<int64>

Number of most likely tokens to return at each position.

top_p
number<double>

Nucleus sampling: sample only from the smallest set of tokens whose cumulative probability exceeds top_p. Between 0 and 1.

variables
object

Template variables for prompt substitution. Plain values fill {{variable}} placeholders in instructions. For secrets, use {"secret": true, "value": "sensitive-data"} — secrets are automatically passed to platform tools (Python, HTTP, MCP) and redacted from traces.
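A sketch combining a plain variable with a secret, using the secret format given above; the variable names and values are illustrative:

```python
payload = {
    "model": "openai/gpt-4o",
    "instructions": "Answer every question as {{persona}}.",
    "input": "Who are you?",
    "variables": {
        # Plain value: fills the {{persona}} placeholder in instructions.
        "persona": "a pirate",
        # Secret: passed to platform tools (Python, HTTP, MCP), redacted from traces.
        "api_token": {"secret": True, "value": "sensitive-data"},
    },
}
```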

Response

Returns a response object or a stream of events.

background
boolean
required
completed_at
integer<int64> | null
required
created_at
integer<int64>
required
error
object
required
frequency_penalty
number<double>
required
id
string
required
incomplete_details
object
required
input
any[] | null
required

Array of input items (messages, function call outputs, etc.)

instructions
string | null
required
max_output_tokens
integer<int64> | null
required
max_tool_calls
integer<int64> | null
required
metadata
object
required
model
string
required
object
string
required

Always "response"

output
any[] | null
required

Array of output items (messages, function calls, reasoning, etc.)

parallel_tool_calls
boolean
required
presence_penalty
number<double>
required
previous_response_id
string | null
required
prompt_cache_key
string | null
required
prompt_cache_retention
string | null
required
reasoning
object
required
safety_identifier
string | null
required
service_tier
enum<string>
required
Available options:
auto,
default,
flex,
priority
status
enum<string>
required
Available options:
queued,
in_progress,
completed,
failed,
incomplete,
requires_action
store
boolean
required
temperature
number<double>
required
text
any
required

Text output configuration including format and verbosity

tool_choice
any
required

Tool choice setting: "auto", "none", "required", or a specific function

tools
any[] | null
required

Array of tool configurations used in this response

top_logprobs
integer<int64>
required
top_p
number<double>
required
truncation
enum<string>
required
Available options:
disabled,
auto
usage
object
required
user
string | null
required
conversation
object
memory
object
variables
object