> ## Documentation Index
> Fetch the complete documentation index at: https://docs.orq.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Create a Deployment

> Create Orq.ai Deployments to ship LLM use cases to production. Configure model routing, invoke them via API or SDK, and monitor calls in real time.

**Deployments** ship Gen AI use cases to production with **Orq.ai** as an AI Gateway. All calls route through the platform, providing routing, monitoring, and security in one place. Connect with a single line of code, iterate without a code release, and benefit from full observability throughout.

Common use cases include customer support bots, RAG-powered document Q\&A, content generation pipelines, and any LLM feature that needs reliable model routing, versioning, and production monitoring.

* Set up a Deployment with a key, model, and system prompt in AI Studio or via MCP.
* Set the model, fallbacks, variables, knowledge base, tools, caching, and guardrails per Variant.
* Route traffic across Variants by environment, context attributes, or percentage split.
* Deploy and roll back configurations without a code release.
* Call a Deployment via API or SDK and pass identity, usage tracking, and extra parameters.
* Monitor requests, filter logs by Variant, and inspect full request details.

## Create a Deployment

Choose a [Project](/docs/ai-studio/get-started/projects) and folder, then select the button. Select **Deployment** from the entity picker.

Create Deployment dialog with fields for Deployment Key set to key123 and Model set to claude-3-7-sonnet-20250219.

Set the deployment key (alphanumeric) and select the primary model for the first Variant. The Variant editor opens.

Use the [Orq MCP server](/docs/integrations/code-assistants/mcp) to manage deployments directly from an AI code assistant.

**Find an existing deployment:**

```prompt wrap
Search for the "support-bot" deployment in my workspace
```

The assistant uses `search_entities` with `type: "deployment"` to locate deployments by name or key.

***

**Retrieve deployment configuration:**

```prompt wrap
Get the full configuration of the "support-bot" deployment
```

The assistant uses `get_deployment` to return the key, description, model, messages, and variant settings.

***

**Create a deployment:**

```prompt wrap
Create a customer support deployment called "support-bot" in the Default/Deployments project. Use GPT-4o with a professional, concise system prompt.
```

The assistant uses `create_deployment` with the specified `key`, `path`, and `variant` (model and messages). Use `list_models` first to find valid model IDs.

## Configure a Variant

**Variants** are different prompt and model configurations available behind one Deployment. A Deployment can hold any number of Variants.

On creation, the **Variant** screen opens for model and prompt setup.

A Variant Prompt is similar to any other prompt. To learn how to configure a Prompt, see [Creating a Prompt](/docs/ai-studio/prompts/prompts).

### Primary Model, Retries, and Fallback

The **Primary Model** panel defines the first model queried through this Variant.

**Retries**

In case of failure, configure how many times a query is retried with this model. Retries are only triggered when a retry count greater than 0 is configured in the Variant settings.
When retries are enabled, **Orq.ai** automatically retries the model provider API call if it returns one of the following HTTP status codes:

* 429 Rate Limit Exceeded
* 500 Internal Server Error
* 501 Not Implemented
* 502 Bad Gateway
* 503 Service Unavailable

**Error handling flow:**

1. If an error code above is returned and retries are configured (retry count > 0), **Orq.ai** retries the Primary Model.
2. If all retry attempts fail (or no retries are configured) AND a Fallback Model is configured, **Orq.ai** routes to the Fallback Model.
3. If the Fallback Model also fails, the error is returned to the calling application.

**Fallback Model**

The Fallback Model is triggered only if the Primary Model fails after all configured retries are exhausted. Fallback Models can have a different configuration from the Primary Model.

Primary Model section showing claude-opus-4-20250514 with Fallback Models configured to gpt-5.2 with reasoning effort, verbosity, and response format settings.
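
The error handling flow above can be sketched as a small local simulation. This is not Orq.ai's implementation: `ModelConfig`, `call_with_retries`, `invoke`, and the provider callable are hypothetical names introduced for illustration. The sketch also assumes that only the status codes listed above count as failures that lead to a retry or fallback, and that each fallback model may carry its own retry count.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Status codes the section lists as retryable.
RETRYABLE_STATUS = {429, 500, 501, 502, 503}


@dataclass
class ModelConfig:
    """Hypothetical stand-in for a model entry in a Variant."""
    name: str
    retry_count: int = 0  # retries only happen when this is greater than 0


# Hypothetical provider call: takes a model name, returns (status_code, body).
ProviderCall = Callable[[str], Tuple[int, str]]


def call_with_retries(model: ModelConfig, call: ProviderCall) -> Tuple[int, str, int]:
    """Try one model: the initial attempt plus up to retry_count retries."""
    attempts = 0
    status, body = 0, ""
    for _ in range(model.retry_count + 1):
        attempts += 1
        status, body = call(model.name)
        if status not in RETRYABLE_STATUS:
            break  # success or a non-retryable response: stop retrying
    return status, body, attempts


def invoke(primary: ModelConfig, fallbacks: List[ModelConfig],
           call: ProviderCall) -> Tuple[int, str, List[str]]:
    """Query the primary model first, then each fallback in configured order."""
    trace: List[str] = []
    status, body = 0, ""
    for model in [primary, *fallbacks]:
        status, body, attempts = call_with_retries(model, call)
        trace.append(f"{model.name} x{attempts}")
        if status not in RETRYABLE_STATUS:
            return status, body, trace
    # Every model failed: the last error goes back to the caller.
    return status, body, trace


if __name__ == "__main__":
    # Fake provider: the primary is always rate limited, the fallback succeeds.
    def fake_provider(model_name: str) -> Tuple[int, str]:
        if model_name == "primary-model":
            return 429, "rate limited"
        return 200, "ok"

    print(invoke(ModelConfig("primary-model", retry_count=2),
                 [ModelConfig("fallback-model")], fake_provider))
```

With a retry count of 2, the primary model is attempted three times (the initial call plus two retries) before the fallback is tried once.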

Multiple fallback models can be configured in a Deployment. They fall back to one another in order of configuration. Use the **Add extra fallback** button to declare another model.

See how fallbacks and retries work together in a production system. Read our cookbook [Customer Support Chat](/docs/tutorials/buildingcustomersupportchatwithaigateway).

**API invocation behavior**

When invoking a Deployment via the API, response timing depends on the retry and fallback configuration:

* **Success on first try**: Response returned immediately.
* **Retry scenario**: Response may be delayed by up to `base_latency × (retry_count + 1)` to account for the initial attempt plus all configured retries.
* **Fallback invoked**: Additional latency as the Fallback Model processes the request.
* **All retries and fallback failed**: Error returned to the calling application.

Set appropriate timeouts on API calls to account for retry and fallback latency.

### Structured Outputs

Configure **structured outputs** to ensure consistent and reliable responses from a Deployment. Structured outputs specify the exact format the model should follow when generating a response.

Two modes are available:

* **JSON Mode**: the model automatically returns a valid JSON object for every generation.
* **JSON Schema**: define a schema that explicitly describes the fields, types, and structure of the model output.

Once defined, a schema can be saved to the directory for reuse across multiple variants or deployments.

Primary Model settings with Response Format set to JSON Schema, showing a schema selector dropdown with get_weather and json_p3ft options.
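
To illustrate what a JSON Schema constrains, the sketch below checks a model response on the client side against a small schema. The schema name `get_weather` appears in the screenshot, but its fields (`city`, `temperature_c`) are invented here, and `validate` is a hypothetical helper that covers only required fields and basic types. A full JSON Schema validator would do considerably more.

```python
import json
from typing import List

# Hypothetical schema: the name get_weather comes from the screenshot,
# but the fields below are invented for illustration.
GET_WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
}

# Mapping from JSON Schema type names to Python types (small subset).
_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(raw: str, schema: dict) -> List[str]:
    """Return a list of problems; an empty list means the output matches."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems: List[str] = []
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field not in data:
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if is_stray_bool or not isinstance(value, _TYPES[spec["type"]]):
            problems.append(f"wrong type for {field}: expected {spec['type']}")
    return problems


if __name__ == "__main__":
    print(validate('{"city": "Paris", "temperature_c": 21.5}', GET_WEATHER_SCHEMA))
    print(validate('{"city": "Paris"}', GET_WEATHER_SCHEMA))
```

JSON Mode alone would accept both sample responses in the demo, since each is valid JSON. Only the schema check flags the second one for its missing field.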

### Variables and Prompt Templating

Reference dynamic values in the prompt using double braces: `{{variable_name}}`. Pass a key-value map to the `inputs` field when invoking and **Orq.ai** substitutes each variable before sending the prompt to the model.
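
The substitution step can be pictured with a minimal local sketch. The `render` function is a hypothetical helper, not part of any Orq.ai SDK. Tolerating whitespace inside the braces and raising an error on a missing variable are both assumptions, and the platform may behave differently in either case.

```python
import re

# Matches {{variable_name}}; whitespace inside the braces is tolerated (an assumption).
_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, inputs: dict) -> str:
    """Replace each {{name}} in the template with the matching value from inputs."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in inputs:
            # Assumed behavior for this sketch: fail loudly on a missing variable.
            raise KeyError(f"no input provided for variable: {name}")
        return str(inputs[name])

    return _VARIABLE.sub(replace, template)


if __name__ == "__main__":
    prompt = "Hello {{name}}, your plan is {{plan}}."
    print(render(prompt, {"name": "Ada", "plan": "pro"}))
```

The `inputs` dictionary here mirrors the key-value map described above: each key matches a variable name in the prompt.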