Create response
Creates a model response for the given input. Returns a response object or a stream of server-sent events.
This endpoint implements the OpenResponses specification — a multi-provider, interoperable LLM interface. orq.ai extends the spec with platform features like variables, memory, identity, and orq.ai tools. For a comprehensive guide with examples, see the Responses API documentation. For function tool continuation, see the step-by-step guide.Documentation Index
Fetch the complete documentation index at: https://docs.orq.ai/llms.txt
Use this file to discover all available pages before exploring further.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
Conversation context for multi-turn interactions.
Fallback models to try if the primary model fails. Each entry specifies a model in provider/model format.
Penalize new tokens based on their frequency in the text so far. Between -2.0 and 2.0.
Identity/contact information for the end-user.
Input to the model: a string or an array of input items (messages, files, etc.).
System prompt / instructions for the model.
Bound agent-loop execution. Fields: max_iterations (LLM turns), max_execution_time (seconds), max_cost (USD; send 0 to disable a manifest-configured cap), max_depth (sub-agent nesting), tool_timeout (seconds). Body values override agent-manifest defaults.
Maximum number of tokens in the response output.
Maximum number of tool call rounds in the agentic loop.
Attach a memory store entity to enable persistent memory across requests. See Memory Stores documentation for setup.
Developer-defined key-value pairs attached to the response (OpenAI spec: Map<string, string>). Non-string values are rejected with a 400.
The model to use in provider/model format (e.g. openai/gpt-4o). Use agent/ to invoke a pre-configured agent from the orq.ai platform.
Whether to allow parallel tool calls.
Penalize new tokens based on their presence in the text so far. Between -2.0 and 2.0.
The ID of a previous response to continue from. Requires store to be true (default) on the original response.
Key for prompt caching across requests.
Configure reasoning behavior. Set effort (none, minimal, low, medium, high, xhigh) to control how much the model thinks before answering. Higher effort means more reasoning tokens and better answers for complex tasks, at higher cost.
Retry configuration. Specify the number of retries and which HTTP status codes should trigger a retry.
Safety identifier for content filtering.
Whether to persist the response (default: true). When false, the response cannot be retrieved later and previous_response_id will not work for follow-up requests.
If true, returns a stream of server-sent events.
Sampling temperature between 0 and 2.
Template engine for variable substitution in instructions. Defaults to the agent manifest's engine when invoking an agent, otherwise text.
text, jinja, mustache Configuration for text output.
Thread for grouping related requests.
How the model should use the provided tools. Can be a string shorthand or a specific function selector.
auto, none, required Tools available to the model.
A tool definition. The "type" field determines the tool kind.
- Function
- orq.ai Tool
- MCP Tool
Number of most likely tokens to return at each position.
Nucleus sampling parameter.
Template variables for prompt substitution. Plain values fill {{variable}} placeholders in instructions. For secrets, use {"secret": true, "value": "sensitive-data"} — secrets are automatically passed to platform tools (Python, HTTP, MCP) and redacted from traces.
Response
Returns a response object or a stream of events.
Array of input items (messages, function call outputs, etc.)
Developer-defined key-value pairs attached to the response (OpenAI spec: Map<string, string>).
Always "response"
Array of output items (messages, function calls, reasoning, etc.)
auto, default, flex, priority queued, in_progress, completed, failed, incomplete, requires_action Text output configuration including format and verbosity
Tool choice setting: "auto", "none", "required", or a specific function
Array of tool configurations used in this response
disabled, auto