Creates a model response for the given input. Returns a response object or a stream of server-sent events.
curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": "What is the capital of France?"
  }'
{
  "id": "resp_01KP6DFC5FB7K7K10TVP60PF81",
  "object": "response",
  "created_at": 1776184439,
  "completed_at": 1776184439,
  "status": "completed",
  "model": "openai/gpt-4o",
  "output": [
    {
      "type": "message",
      "id": "msg_01KP6DFCG3RF80BEBXP06XX258",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 83,
    "output_tokens": 54,
    "total_tokens": 137,
    "input_tokens_details": {
      "cached_tokens": 0,
      "cache_creation_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 38
    }
  }
}
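The response body above can be consumed from any HTTP client. A minimal Python sketch (standard library only, with the sample response inlined) that pulls out the assistant text and token usage:

```python
import json

# The JSON body returned by the call above (abridged to the fields used here).
raw = '''{
  "id": "resp_01KP6DFC5FB7K7K10TVP60PF81",
  "object": "response",
  "status": "completed",
  "model": "openai/gpt-4o",
  "output": [
    {"type": "message", "role": "assistant", "status": "completed",
     "content": [{"type": "output_text",
                  "text": "The capital of France is Paris.",
                  "annotations": []}]}
  ],
  "usage": {"input_tokens": 83, "output_tokens": 54, "total_tokens": 137}
}'''

resp = json.loads(raw)

# Collect every output_text part from message items in the output array;
# output can also contain function calls, reasoning items, etc.
texts = [
    part["text"]
    for item in resp["output"] if item["type"] == "message"
    for part in item["content"] if part["type"] == "output_text"
]

print(texts[0])                       # The capital of France is Paris.
print(resp["usage"]["total_tokens"])  # 137
```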
This endpoint implements the OpenResponses specification: a multi-provider, interoperable LLM interface. orq.ai extends the spec with platform features such as variables, memory, identity, and orq.ai tools. For a comprehensive guide with examples, see the Responses API documentation. For function tool continuation, see the step-by-step guide.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Conversation context for multi-turn interactions.
Fallback models to try if the primary model fails. Each entry specifies a model in provider/model format.
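A sketch of a request with fallbacks. The field name (`fallback_models`) and the entry shape (plain provider/model strings) are assumptions based on the description above, and the model names are illustrative:

```python
# Fallbacks sketch; "fallback_models", the string entry shape, and the
# model names are assumptions, not confirmed by this page.
payload = {
    "model": "openai/gpt-4o",
    "fallback_models": ["anthropic/claude-sonnet-4", "google/gemini-2.5-flash"],
    "input": "What is the capital of France?",
}

# Every entry follows the provider/model format described above.
assert all(m.count("/") == 1 for m in payload["fallback_models"])
```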
Penalize new tokens based on their frequency in the text so far. Between -2.0 and 2.0.
Identity/contact information for the end-user.
Input to the model: a string or an array of input items (messages, files, etc.).
System prompt / instructions for the model.
Bound agent-loop execution. Fields: max_iterations (LLM turns), max_execution_time (seconds), max_cost (USD; send 0 to disable a manifest-configured cap), max_depth (sub-agent nesting), tool_timeout (seconds). Body values override agent-manifest defaults.
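The five limit fields above can be sketched as a request body. The field names come from this page; the wrapper key (`execution` here) and the agent name are assumptions, since the parameter name itself is not shown:

```python
# Agent-loop limits sketch; the "execution" wrapper key and the agent name
# are assumptions. Body values override agent-manifest defaults.
payload = {
    "model": "agent/support-agent",   # hypothetical pre-configured agent
    "input": "Summarize the open tickets.",
    "execution": {
        "max_iterations": 10,         # LLM turns
        "max_execution_time": 120,    # seconds
        "max_cost": 0,                # USD; 0 disables a manifest-configured cap
        "max_depth": 2,               # sub-agent nesting
        "tool_timeout": 30,           # seconds
    },
}
```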
Maximum number of tokens in the response output.
Maximum number of tool call rounds in the agentic loop.
Attach a memory store entity to enable persistent memory across requests. See Memory Stores documentation for setup.
Developer-defined key-value pairs attached to the response (OpenAI spec: Map<string, string>). Non-string values are rejected with a 400.
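A local sketch of the Map&lt;string, string&gt; constraint described above; the server performs the authoritative check and answers 400 on violation:

```python
# Client-side mirror of the metadata rule: keys and values must be strings.
def validate_metadata(metadata: dict) -> None:
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(f"metadata entries must be strings: {key!r}={value!r}")

validate_metadata({"team": "billing", "ticket": "T-1042"})  # passes

rejected = False
try:
    validate_metadata({"retry_count": 3})  # non-string value
except ValueError:
    rejected = True
```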
The model to use in provider/model format (e.g. openai/gpt-4o). Use agent/ to invoke a pre-configured agent from the orq.ai platform.
Whether to allow parallel tool calls.
Penalize new tokens based on their presence in the text so far. Between -2.0 and 2.0.
The ID of a previous response to continue from. Requires store to be true (default) on the original response.
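A multi-turn sketch: the follow-up request points at the stored first response by id. This only works when the first response kept the default store=true:

```python
# Continuation sketch; the id is taken from the example response on this page.
first_response_id = "resp_01KP6DFC5FB7K7K10TVP60PF81"

follow_up = {
    "model": "openai/gpt-4o",
    "previous_response_id": first_response_id,
    "input": "And what is its population?",
}
```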
Key for prompt caching across requests.
Configure reasoning behavior. Set effort (none, minimal, low, medium, high, xhigh) to control how much the model thinks before answering. Higher effort means more reasoning tokens and better answers for complex tasks, at higher cost.
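A reasoning sketch. The effort levels are listed on this page; the wrapper key (`reasoning`) is an assumption, since the parameter name is not shown:

```python
# Effort levels from the description above, lowest to highest; the
# "reasoning" wrapper key is an assumption.
EFFORT_LEVELS = ("none", "minimal", "low", "medium", "high", "xhigh")

payload = {
    "model": "openai/gpt-4o",
    "input": "Prove that the sum of two even numbers is even.",
    "reasoning": {"effort": "high"},  # more reasoning tokens, higher cost
}

assert payload["reasoning"]["effort"] in EFFORT_LEVELS
```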
Retry configuration. Specify the number of retries and which HTTP status codes should trigger a retry.
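A client-side view of the retry rule described above: a bounded number of retries, triggered only by configured HTTP status codes. This is illustrative; the router applies the configuration server-side, and the code list here is a hypothetical choice:

```python
# Hypothetical set of status codes configured to trigger a retry.
RETRY_CODES = {429, 500, 502, 503}

def should_retry(status: int, attempt: int, max_retries: int = 2) -> bool:
    return status in RETRY_CODES and attempt < max_retries

assert should_retry(429, attempt=0)        # rate limited: retry
assert not should_retry(404, attempt=0)    # not a configured code: give up
assert not should_retry(500, attempt=2)    # retry budget exhausted: give up
```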
Safety identifier for content filtering.
Whether to persist the response (default: true). When false, the response cannot be retrieved later and previous_response_id will not work for follow-up requests.
If true, returns a stream of server-sent events.
Sampling temperature between 0 and 2.
Template engine for variable substitution in instructions. Defaults to the agent manifest's engine when invoking an agent, otherwise text. One of: text, jinja, mustache.
Configuration for text output.
Thread for grouping related requests.
How the model should use the provided tools. Can be a string shorthand (auto, none, required) or a specific function selector.
Tools available to the model.
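A function tool sketch following the OpenAI Responses-style shape; verify the exact field layout against the OpenResponses spec linked above. The function name and schema are hypothetical:

```python
# Tools + tool_choice sketch; "get_weather" and its schema are illustrative.
payload = {
    "model": "openai/gpt-4o",
    "input": "What's the weather in Paris?",
    "tool_choice": "auto",  # or "none" / "required" / a function selector
    "tools": [
        {
            "type": "function",
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

assert payload["tool_choice"] in ("auto", "none", "required")
```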
A tool definition. The "type" field determines the tool kind.
Number of most likely tokens to return at each position.
Nucleus sampling parameter.
Template variables for prompt substitution. Plain values fill {{variable}} placeholders in instructions. For secrets, use {"secret": true, "value": "sensitive-data"} — secrets are automatically passed to platform tools (Python, HTTP, MCP) and redacted from traces.
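The two variable shapes above can be sketched locally. The substitution function is a client-side illustration of the plain-value behavior (the platform does this server-side); secrets keep the {"secret": true, "value": ...} shape and are not interpolated:

```python
import re

# One plain value and one secret; the secret value here is illustrative.
variables = {
    "customer": "Ada",
    "api_token": {"secret": True, "value": "sk-sensitive"},
}

def fill(template: str, variables: dict) -> str:
    """Sketch of plain-value {{variable}} substitution. Secrets are left
    untouched: they flow to platform tools and are redacted from traces."""
    def sub(match):
        value = variables.get(match.group(1), match.group(0))
        return value if isinstance(value, str) else match.group(0)
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

instructions = "Greet {{customer}} politely."
print(fill(instructions, variables))  # Greet Ada politely.
```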
Returns a response object or a stream of events.
Array of input items (messages, function call outputs, etc.)
Developer-defined key-value pairs attached to the response (OpenAI spec: Map<string, string>).
Always "response"
Array of output items (messages, function calls, reasoning, etc.)
Service tier: auto, default, flex, priority.
Status: queued, in_progress, completed, failed, incomplete, requires_action.
Text output configuration including format and verbosity.
Tool choice setting: "auto", "none", "required", or a specific function
Array of tool configurations used in this response
Truncation strategy: disabled, auto.