- Choose a Project and Folder and select the + button.
- Choose Deployment.

Configuring a Variant
Variants are different prompt/model configurations available behind one Deployment. A Deployment can hold any number of Variants. For example, you can have variants dedicated to answering in different languages, routing your Spanish users to a Spanish variant and English users to an English one. When you first create a Deployment you will be redirected to the Variant screen, where you can set up and configure your model and prompt. A Variant Prompt is similar to any other prompt; to learn how to configure a Prompt, see Creating a Prompt.
Primary Model, Retries and Fallback
The main model configuration appears in the Primary Model panel. The Primary Model defines the first model that will be queried through this Variant.
Retries
A configuration unique to the Primary Model is Retries. In case of failure, configure here how many times a query will be retried with this model. After the defined number of Retries on your Primary Model, orq.ai will seamlessly use your Fallback Model.
Fallback Model
A common use case is having a Fallback Model that can handle a bigger context window than the Primary Model. The Fallback Model is potentially more costly than the Primary Model, and is triggered only if the user query cannot be handled within the Primary Model’s context window. This makes for an optimized configuration: your spending is controlled depending on user inputs, and your users won’t see an error if their query is sent to your Fallback Model. The Fallback Model can have a different configuration than the Primary Model.
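The retry-then-fallback flow described above can be sketched roughly as follows. This is a simplification, not the orq.ai implementation; the model names and the `call_model` helper are hypothetical:

```python
# Sketch of primary/fallback behaviour with retries.
# `call_model` is a hypothetical function that queries a model and
# raises an exception on failure (e.g. context window exceeded).
def generate(query, call_model,
             primary="small-model", fallback="large-context-model",
             retries=2):
    # Try the Primary Model up to `retries + 1` times.
    for _attempt in range(retries + 1):
        try:
            return call_model(primary, query)
        except Exception:
            continue  # retry the Primary Model
    # All retries exhausted: seamlessly use the Fallback Model.
    return call_model(fallback, query)
```

The user never sees the Primary Model's failures; the fallback call happens transparently after the configured number of retries.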
Structured Outputs
When configuring your model, you can now define structured outputs to ensure consistent and reliable responses from your Deployment. Structured outputs let you specify the exact format the model should follow when generating its response, helping you enforce predictable data structures and simplifying integration with downstream systems. You can choose between two structured output modes:
- JSON Mode – the model automatically returns a valid JSON object for every generation.
- JSON Schema – define a schema that explicitly describes the fields, types, and structure of the model output. This provides full control over how responses are formatted.
This makes it easy to maintain consistent response formats across projects and ensures that all variants referencing the schema stay aligned when it’s updated.
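As an illustration, a JSON Schema for a support-ticket style response might look like the following. The field names and schema shape are purely illustrative, not a required format:

```json
{
  "name": "support_ticket",
  "schema": {
    "type": "object",
    "properties": {
      "category": { "type": "string" },
      "priority": { "type": "string", "enum": ["low", "medium", "high"] },
      "summary": { "type": "string" }
    },
    "required": ["category", "priority", "summary"]
  }
}
```

With a schema like this in place, every generation returns an object with exactly these fields, which downstream systems can parse without defensive checks.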

Using Knowledge Base in Deployment
You can ground your deployment’s responses in domain-specific knowledge by adding a Knowledge Base. To add a Knowledge Base to your deployment, open the Knowledge Base tab in the Configuration screen and select Knowledge Base. Knowledge Bases enable RAG (Retrieval-Augmented Generation), allowing your model to retrieve and use relevant information from your documentation or data sources to provide more accurate and contextual responses.
Configuration Options
When editing a Knowledge Base using the ... menu, you can choose between two query types:
- Last User Message – The user’s latest message is automatically used as a query to retrieve relevant chunks from the Knowledge Base.
- Query – A predefined query is used to retrieve chunks. You can also use Input Variables like {{query}} to make it dynamic at runtime.

You can reference a Knowledge Base directly in your prompt using the {{knowledge_base_key}} syntax, where knowledge_base_key is the identifier of your Knowledge Base. If the Knowledge Base is not explicitly referenced in the prompt, the retrieved chunks are automatically appended to the end of the system message.
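For example, assuming a Knowledge Base whose key is product_docs (a hypothetical identifier), a system prompt could place the retrieved chunks explicitly:

```
You are a support assistant. Answer using only the context below.

Context:
{{product_docs}}

If the context does not contain the answer, say that you don't know.
```

Placing the reference yourself gives you control over where the retrieved chunks appear relative to your instructions.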

To learn more about creating and configuring Knowledge Bases, see Creating a Knowledge Base.
Using Tools in Deployments
Tools can only be added and configured at the deployment level. In Orq, only Function tools are supported in deployments, enabling your model to call external functions during execution. To add a Function tool to a deployment, open the Tools tab in the deployment configuration and click Tool. You can then choose to:
- Create a new Tool – Define a custom function directly within the deployment
- Import an existing Tool – Select a previously created Function tool from your Resource library
Function tools enable structured function calling, allowing the model to invoke predefined business logic based on its responses.

To learn more about creating Function tools, see Creating Tools.
Integrating a Deployment
Code Snippet
By selecting the Code Snippet button at the top-right of the Variant page, you will see all code snippets to integrate the current variant within your application. You can also generate the code snippet by right-clicking on a variant in the routing tab.

Adding a new Variant to a Deployment
A single Deployment can hold multiple Variants. Multiple Variants can help you handle different use cases and scenarios within one Deployment. Multiple Variants can be used at the same time through Routing, which we will see in the next chapter. At any time you can choose to add a new Variant to your Deployment by selecting the Variant name at the top-left of your screen and choosing Add variant.
Routing
Once a variant is ready to be deployed, you need to configure the routing variables that determine how the variant is reached. To open the Routing page, select the Routing title at the top-left of the panel. Within the Routing panel, you can configure the contexts and conditions for which a Variant will be reached by your users.
Viewing Routing Configuration
A Variant represents a version of a language model configured for your deployment; it can have its own model configuration, prompt, and parameters. The Routing table creates a correspondence between Variants and Contexts: each row represents a single Variant, each column represents a single Context field, and each cell holds the Value of a Context field to be matched for a Variant.

Default variant
The first row (0) of your Routing is the default variant for your deployment. If no other rule matches, or if no context values are provided, the user will be routed to Variant 0.
Code Snippets
You can quickly access code snippets for each of the variants in your routing table. These snippets will contain the correct context values to reach your Variant. To do so, simply right-click on the Variant you want code snippets for and select Generate Code Snippets. A pop-up will open containing all the Code Snippets necessary for integration within your systems.
Adding a new context field
To add a new context field, press the + button at the top-right of the Routing table.
You can set a name for your field as well as a type, chosen between boolean, date, list, number, and string.

Routing Conditions
You can create a custom routing condition for each field, with any Variant. To create a condition, simply enter a value in the corresponding cell.
Condition Operators
By default, the = operator will be used to test your condition (the context value must be equal to the routing condition value).
You can change the operator by clicking on the = symbol and selecting the desired operator for your condition.
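A first-match-wins routing table of this kind can be sketched as follows. This is a simplification for illustration, not orq.ai's implementation; the variant names and context fields are hypothetical:

```python
import operator

# Supported condition operators (a subset, for illustration).
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}

# Each row: (variant name, {field: (operator, value)}).
# Row 0 is the default variant and has no conditions.
ROUTING = [
    ("default-variant", {}),
    ("spanish-variant", {"language": ("=", "es")}),
    ("premium-variant", {"plan": ("=", "premium")}),
]

def route(context):
    # Scan non-default rows; a row matches when every one of its
    # conditions holds for the provided context values.
    for variant, conditions in ROUTING[1:]:
        if conditions and all(
            field in context and OPS[op](context[field], value)
            for field, (op, value) in conditions.items()
        ):
            return variant
    # No rule matched (or no context provided): fall back to row 0.
    return ROUTING[0][0]
```

The key property mirrored here is that the default variant is always reachable: an empty or unmatched context never produces an error, only a fallback to row 0.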

Using the Simulator to test Routing
At any time you can test your routing by opening the Simulator via the Simulator icon at the top-right of the panel. The following modal will open:
Here you can enter values for each of the configured fields and select Simulate to see which Variant the query will be routed to.
Versioning
Version Control lets you track all changes made to your model and prompt configuration. At each deployment a new commit is made and history is kept throughout. You can look back at all changes and revert to any version you desire.
Deploying a new version
When your configuration is complete and ready to be integrated into your systems, you can press the Deploy button on the Deployment’s Variant screen.

Comparing Changes
To compare changes between your configuration and the previous one, select the Compare Changes button at the top-right of the screen. Visualize changes to a configuration in a side-by-side JSON view.
Enabling Cache on a Deployment
Deployment generations can be cached to reduce processing time and cost. When an input is received that is already cached within the Deployment, the stored response is sent back directly without triggering a new generation. To enable caching, head to Deployment > Settings and select Enable Caching.
The cache only works when there is an exact match
Configuring TTL
TTL (time to live) corresponds to the amount of time a cached response will be stored and used before being invalidated. Once invalidated, a new LLM generation will be triggered. You can configure the time to live once Caching is enabled by choosing a value from the drop-down.
Invalidate Cache
At any time you can choose to invalidate the cache by pressing the Invalidate button.
Evaluators and Guardrails
In a Deployment’s Settings it is possible to set Evaluators on the Inputs and Outputs of a Generation.
Workflow

Input & Output Evaluators
You can add Evaluators available in your Library as Input or Output Evaluators for a Deployment. When adding an Evaluator here, you can asynchronously intercept and evaluate the Input sent to the configured model or the Output generated and sent back to the user. You can configure a Sample Rate (percentage) to define the frequency at which the evaluator will be used.
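The Sample Rate amounts to evaluating only a random fraction of generations. A minimal sketch of that decision (illustrative, not orq.ai's implementation):

```python
import random

def should_evaluate(sample_rate_percent, rng=random.random):
    """Return True for roughly `sample_rate_percent`% of calls.

    With a sample rate of 25, about 1 in 4 generations is evaluated;
    `rng` is injectable so the behaviour can be tested deterministically.
    """
    return rng() * 100 < sample_rate_percent
```

Sampling keeps evaluation cost proportional to the configured rate instead of the full traffic volume.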
Guardrails
If an Evaluator has a Guardrail capability, it can be used as an Input Guardrail or Output Guardrail in a Deployment. A Guardrail will effectively deny a generation if its evaluation fails. In this case, an error will be sent back to the user.
Guardrail Behaviour
You can choose between two behaviours when your Guardrail does not pass a generation:
- Retry the current generation
- Use the fallback behaviour: if the configured model has a fallback configured, the call will be retried there.
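The two behaviours can be sketched as follows. The `call_model` and `guardrail` helpers are hypothetical, and this is a simplification of the real flow:

```python
def guarded_generate(query, call_model, guardrail, behaviour="retry",
                     primary="primary-model", fallback=None, max_retries=1):
    """Generate a response and enforce a guardrail on the output (sketch)."""
    output = call_model(primary, query)
    if guardrail(output):
        return output
    if behaviour == "retry":
        # Behaviour 1: retry the current generation.
        for _ in range(max_retries):
            output = call_model(primary, query)
            if guardrail(output):
                return output
    elif behaviour == "fallback" and fallback is not None:
        # Behaviour 2: try the configured fallback model instead.
        output = call_model(fallback, query)
        if guardrail(output):
            return output
    # Guardrail still failing: the generation is denied and an
    # error is sent back to the user.
    raise RuntimeError("Generation denied by guardrail")
```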
Security and Privacy
Input Masking
In your deployment, when using inputs, you can flag a created input as PII (Personally Identifiable Information). This is recommended when processing sensitive data from your users (e.g. name, email, phone number). To do so, when configuring your input, choose Personally Identifiable Information (PII) from the Privacy drop-down.
Effects on logs
Flagging an Input as PII prevents its values from being logged within our systems. When opening a log, the input will be shown in red, indicating that it wasn’t stored. This ensures that your clients’ data stays private.
Output masking
In Deployments you can add configuration to make generated outputs hidden from logs. Head to the Settings tab and enable the Output masking toggle:
Effect on logs
When Output Masking is enabled, logs won’t store the generated response. The following is displayed in place of the response:
Analytics
Once a Deployment is running and called from within your systems, you’ll be able to see detailed analytics of all requests made. To learn more about Analytics, see Deployment Analytics.
Logs
On top of analytics, you are able to visualize logs for all Variants. The following filters are available:
- Variant: select a single variant to see logs for.
- Evaluation: Matched (Routing rule was matched to a variant) or Default Matched (No routing rule was matched, default variant was chosen).
- Source: API, SDK, or Simulator to identify logs coming from different systems.
