- Choose a Project and Folder and select the + button.
- Choose Deployment.

The modal shown during creation of a Deployment. Here you can set a key for the deployment (alphanumeric) and select the primary model of the first Variant used in this deployment. All parameters can be changed later.
Configuring a Variant
Variants are different prompt/model configurations available behind one Deployment. A Deployment can hold any number of Variants. For example, you can have Variants dedicated to answering in different languages, routing your Spanish users to a Spanish Variant and English users to an English one. When you first create a Deployment, you will be redirected to the Variant screen, where you can set up and configure your model and prompt.
Primary Model, Retries and Fallback
The main model configuration appears in the Primary Model panel. The Primary Model defines the first model that will be queried through this Variant.
Retries
A setting unique to the Primary Model is Retries: in case of failure, configure here how many times a query will be retried with this model. Once the configured number of Retries on your Primary Model is exhausted, orq.ai seamlessly switches to your Fallback Model.
Fallback Model
A common use case is a Fallback Model that handles a larger context window than the Primary Model. The Fallback Model may be more costly than the Primary Model, but it is only triggered when the user query cannot be handled within the Primary Model's context window. This makes for an optimized configuration: your spending is controlled depending on user inputs, and your users won't see an error when their query is sent to the Fallback Model. The Fallback Model can have a different configuration than the Primary Model.
The model configuration is right below the Prompt; you can choose any model you desire, and the Primary and Fallback Models can be configured independently from one another.
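To make Retries and the Fallback explicit, the following is a minimal conceptual sketch of the flow, not orq.ai code; the function and model names are placeholders. After the retries on the Primary Model are exhausted, the same query is sent to the Fallback Model.

```python
import random

def call_model(model: str, query: str) -> str:
    # Stand-in for a real model call; fails randomly to illustrate retries.
    if random.random() < 0.3:
        raise RuntimeError(f"{model} failed")
    return f"response from {model}"

def generate(query: str, primary: str, fallback: str, retries: int = 2) -> str:
    # One initial attempt on the Primary Model plus `retries` retries.
    for _ in range(1 + retries):
        try:
            return call_model(primary, query)
        except RuntimeError:
            continue
    # Primary attempts exhausted: the call is routed to the Fallback Model instead.
    return call_model(fallback, query)

print(generate("Summarize this support ticket...", primary="primary-model", fallback="fallback-model"))
```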
Structured Outputs
When configuring your model, you can define structured outputs to ensure consistent and reliable responses from your Deployment. Structured outputs let you specify the exact format the model should follow when generating its response, helping you enforce predictable data structures and simplifying integration with downstream systems. You can choose between two structured output modes:
- JSON Mode – the model automatically returns a valid JSON object for every generation.
- JSON Schema – define a schema that explicitly describes the fields, types, and structure of the model output. This provides full control over how responses are formatted.
This makes it easy to maintain consistent response formats across projects and ensures that all variants referencing the schema stay aligned when it’s updated.
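For illustration, a schema used in JSON Schema mode could look like the sketch below; the answer and confidence fields are hypothetical and only show the kind of structure you can enforce.

```python
# A hypothetical schema for JSON Schema mode; the fields `answer` and
# `confidence` are illustrative only, not fields orq.ai requires.
response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string", "description": "The reply shown to the user."},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False,
}
```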

Notes
Here you can store notes on the current Variant configuration. Notes are only visible to you and your colleagues; they are never sent to the model. Notes are especially useful when collaborating, to write down important information relevant to the current model configuration.
Preview
Preview lets you see the configuration payload for the corresponding Variant. This payload is what is used when you rely on orq.ai as a configuration manager.
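As a rough, illustrative sketch only (the Preview panel shows the exact shape for your Variant), such a payload typically carries the model, its parameters and the prompt messages:

```python
# Illustrative only: field names and values here are assumptions,
# check the Preview panel for the real shape of your Variant's payload.
variant_config = {
    "model": "example-provider/example-model",      # Primary Model of the Variant (placeholder name)
    "parameters": {"temperature": 0.2, "max_tokens": 512},
    "messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "{{question}}"},  # prompt variable filled at call time
    ],
}
```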
Integrating a Deployment
Code Snippet
By selecting the Code Snippet button at the top-right of the Variant page, you will see code snippets for integrating the current Variant into your application. You can also generate the code snippet by right-clicking on a Variant in the Routing tab.
The code snippet button at the top-right of the Variant page.

The Code Snippet panel.
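The generated snippets are the authoritative way to integrate. Purely as a sketch of the general pattern, with a placeholder endpoint URL and assumed field names rather than the documented orq.ai API, an invocation could look roughly like this:

```python
import requests

API_KEY = "YOUR_ORQ_API_KEY"  # placeholder
# Placeholder URL: copy the real endpoint from the Code Snippet panel.
ENDPOINT = "https://api.example.com/v2/deployments/invoke"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "key": "customer-support",                     # Deployment key chosen at creation
        "context": {"locale": "english"},              # routing context (see Routing below)
        "inputs": {"question": "Where is my order?"},  # prompt variables
    },
    timeout=30,
)
print(response.json())
```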
Adding a new Variant to a Deployment
A single Deployment can hold multiple Variants. Multiple Variants can help you handle different use cases and scenarios within one Deployment. Multiple Variants can be used at the same time through Routing, which we will see in the next chapter. At any time you can choose to add a new Variant to your Deployment by selecting the Variant name at the top-left of your screen and choosing Add variant.
At any time you can switch between Variants and add a new Variant to your Deployment.
Routing
Once a Variant is ready to be deployed, you need to configure the routing variables used to reach it. To open the Routing page, select the Routing title at the top-left of the panel.
Viewing Routing Configuration
Within the Routing panel, you can configure the contexts and conditions for which a Variant will be reached by your users. A Variant represents a version of a language model configured for your Deployment; it can have its own model configuration, prompt, and parameters. The Routing configuration is visualized as a table that maps Variants to Contexts:
- Each row represents a single Variant.
- Each column represents a single Context field.
- Each cell holds the Value a Context field must match for the user to be routed to that Variant.

An example of a routing table. Here, for instance, if the context field locale has the value german, the Deployment routes the user to Variant 2.
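Following the hypothetical request sketched under Code Snippet above, reaching Variant 2 in this example would only require sending the matching context value:

```python
# Hypothetical request body matching the example table: locale "german" routes to Variant 2.
payload = {
    "key": "customer-support",
    "context": {"locale": "german"},
    "inputs": {"question": "Wo ist meine Bestellung?"},
}
```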
Default variant
The first row (0) of your Routing is the default Variant for your Deployment. If no other rule can be matched, or if no context values are provided, the user will be routed to Variant 0.
Code Snippets
You can quickly access code snippets for each of the Variants in your routing table. These snippets contain the correct context values to reach your Variant. To do so, simply right-click on the Variant you want code snippets for and select Generate Code Snippets. A pop-up will open containing all the Code Snippets needed for integration within your systems.
Here we opened the code snippet for Variant 3; you can see the metadata correctly filled in within the snippets.
Adding a new context field
To add a new context field, press the + button at the top-right of the Routing table.
You can set a name for your field as well as its type, chosen between boolean, date, list, number, and string.

Routing Conditions
You can create a custom routing condition for each field, for any Variant. To create a condition, simply enter a value in the corresponding cell.
Condition Operators
By default, the = operator is used to test your condition (the context value must be equal to the routing condition value).
You can change the operator by clicking on the = symbol, then select the desired operator for your condition.

Different operators are available depending on the field type.
Using the Simulator to test Routing
At any time you can test your routing by opening the Simulator: select the Simulator icon at the top-right of the panel. The following modal will open:
Here you can enter values for each of the configured context fields and select Simulate to see which model the query will be routed to.
Versioning
Version Control lets you track all changes made to your model and prompt configuration. At each deployment, a new commit is made and the history is kept throughout. You can look back at all changes and revert to any version you desire.
Deploying a new version
When your configuration is complete and ready to be integrated into your systems, you can press the Deploy button on the Deployment’s Variant screen.
The Deploy button will be enabled once there are changes to commit.

Here you can define the new version for your Variant (a Major or Minor version change) as well as enter a description of the changes made to the configuration. Then choose to Deploy the changes immediately or to Save as Draft.
Comparing Changes
To compare changes between your configuration and the previous one, select the Compare Changes button at the top-right of the screen. Visualize changes to a configuration in a side-by-side JSON view.
An example of the side-by-side visualization of two versions of the same configuration.
Enabling Cache on a Deployment
Deployment generations can be cached to reduce processing time and cost. When an input that is already cached within the Deployment is received, the stored response is sent back directly without triggering a new generation. To enable caching, head to Deployment > Settings and select Enable Caching.
Caching happens Deployment-wide and currently doesn't support image models.
Configuring TTL
TTL (time to live) corresponds to the amount of time a cached response is stored and used before being invalidated. Once invalidated, a new LLM generation will be triggered. You can configure the time to live once Caching is enabled, choosing a value from the drop-down.
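Conceptually, the TTL check behaves like the sketch below (illustrative only; the actual cache is managed entirely by orq.ai):

```python
import time

def llm_generate(user_input: str) -> str:
    return f"generated answer for: {user_input}"  # stand-in for a real generation

CACHE: dict[str, tuple[float, str]] = {}  # input -> (stored_at, response)
TTL_SECONDS = 3600                        # example TTL picked from the drop-down

def cached_generate(user_input: str) -> str:
    entry = CACHE.get(user_input)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                   # still within TTL: reuse the stored response
    response = llm_generate(user_input)   # missing or invalidated: trigger a new generation
    CACHE[user_input] = (time.time(), response)
    return response
```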
Invalidate Cache
At any time you can choose to invalidate the cache by pressing the Invalidate button.
Evaluators and Guardrails
In the Deployment Settings it is possible to set Evaluators on the Inputs and Outputs of a generation.
Workflow

Guardrails are executed synchronously if they exist, while Evaluators are non-blocking to ensure the quickest response time to the user.
Input & Output Evaluators
You can add Evaluators available in your Library as Input or Output Evaluators for a Deployment. When adding an Evaluator here, you can intercept and asynchronously evaluate the Input sent to the configured model or the Output generated and sent back to the user. You can configure a Sample Rate (percentage) to define how frequently the Evaluator is run.
The Sample Rate goes from 0 (0%) to 100 (100%).
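As a rough illustration of what the percentage means, a Sample Rate of 25 implies roughly one in four generations is evaluated:

```python
import random

def should_evaluate(sample_rate: int) -> bool:
    # sample_rate is 0-100; e.g. 25 means roughly 25% of generations are evaluated.
    return random.uniform(0, 100) < sample_rate
```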
Guardrails
If an Evaluator has a Guardrail capability, it can be used as an Input Guardrail or Output Guardrail in a Deployment. A Guardrail will effectively deny a generation if its evaluation fails. In this case, an error is sent back to the user.
Guardrail Behaviour
You can choose between two behaviours in case your Guardrail does not pass on a generation:
- Retry the current generation.
- Use the Fallback behaviour: if the configured model has a fallback configured, the call will be retried there.
Security and Privacy
Input Masking
In your Deployment, when using inputs, you can flag a created input as PII (Personally Identifiable Information). This is recommended when processing sensitive data from your users (e.g. name, email, phone number). To do so, when configuring your input, choose PII (Personally Identifiable Information) from the Privacy drop-down.
Once deployed, the input value won't be logged within our systems.
Effects on logs
Flagging an Input as PII prevents its values from being logged within our systems. When opening a log, the input is shown in red, indicating that it wasn’t logged within our systems. This ensures that your clients’ data stays private.
Output masking
In Deployments you can configure generated outputs to be hidden from logs. Head to the Settings tab and enable the Output masking toggle:
Effect on logs
When Output Masking is enabled, logs won’t store the generated response. The following is displayed in place of the response:
Analytics
Once a Deployment is running and called from within your systems, you’ll be able to see detailed analytics of all requests made.
Logs
On top of analytics, you are able to visualize logs for all Variants. You have the following filters available:
- Variant: select a single Variant to see logs for.
- Evaluation: Matched (Routing rule was matched to a variant) or Default Matched (No routing rule was matched, default variant was chosen).
- Source: API, SDK, or Simulator to identify logs coming from different systems.

Logs overview