Creating a Deployment

Creating a Deployment of a model to production

To create a Deployment, head to your Panel:

  • Choose a Project and Folder and select the + button.
  • Choose Deployment

You should see the following modal to configure your initial deployment:

The modal shown during creation of a Deployment. Here you can set a key for the deployment (alphanumeric) and select the primary model of the first Variant used in this deployment. All parameters can be changed later.

Configuring a Variant

Variants are different prompt / model configurations available behind one deployment.

A Deployment can hold any number of Variants.

For example, you can have different variants dedicated to answering in different languages, routing your Spanish users to Spanish variants and English users to an English one.

When you first create a Deployment, you are redirected to the Variant screen, where you can set up and configure your model and prompt.


A Variant Prompt is similar to any other prompt. To learn how to configure a Prompt, see Creating a Prompt.

Primary Model, Retries and Fallback

The main model configuration appears in the Primary Model panel. The Primary Model defines the first model that will be queried through this Variant.


A unique configuration is available for a Primary Model: Retries.

Retries sets how many times a failed query is retried with this model.

After the defined number of Retries on your Primary Model, the query is seamlessly sent to your Fallback Model.
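
The retry-then-fallback behavior described above can be pictured with a minimal Python sketch. It is not the platform's implementation: `call_with_fallback`, `flaky_primary`, and `stable_fallback` are hypothetical names, and the models are represented by plain callables. The sketch also assumes that Retries counts extra attempts after the first call, which the text above does not state.

```python
from typing import Callable


def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    query: str,
    retries: int = 2,
) -> str:
    """Try the primary model, retry on failure, then use the fallback.

    Assumption: `retries` means extra attempts after the first call,
    so the primary model is called at most `retries + 1` times.
    """
    for _attempt in range(retries + 1):
        try:
            return primary(query)
        except Exception:
            # Any error counts as a failed attempt in this sketch.
            continue
    # All primary attempts failed: hand the query to the fallback model.
    return fallback(query)


# Hypothetical stand-ins for real model calls.
calls = {"primary": 0}


def flaky_primary(query: str) -> str:
    calls["primary"] += 1
    raise RuntimeError("primary model unavailable")


def stable_fallback(query: str) -> str:
    return f"fallback answer to: {query}"


result = call_with_fallback(flaky_primary, stable_fallback, "Hello", retries=2)
```

With `retries=2`, the always-failing primary is called three times before the fallback answers, so the caller never sees the primary's error.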

Fallback Model

A common use-case is having a Fallback Model able to handle a bigger context window than the Primary Model.

In this setup, the Fallback Model is potentially more costly than the Primary Model, but it is triggered only when the user query cannot be handled within the Primary Model's context window. This keeps spending under control, because the more expensive model is used only for inputs that need it, and your users won't see an error when their query is sent to your Fallback Model.
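
As a rough illustration of the context-window use case, the sketch below picks a model based on an estimated token count. Everything here is an assumption made for the example: the function names, the window sizes, and the four-characters-per-token estimate. Real tokenizers and the platform's own logic will differ.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def choose_model(query: str, primary_window: int, fallback_window: int) -> str:
    """Return which model can handle the query, preferring the primary."""
    tokens = estimate_tokens(query)
    if tokens <= primary_window:
        return "primary"
    if tokens <= fallback_window:
        return "fallback"
    raise ValueError(f"query needs ~{tokens} tokens, more than either model allows")


# Hypothetical window sizes, in tokens.
short_choice = choose_model("Hi there", primary_window=4_000, fallback_window=32_000)
long_choice = choose_model("x" * 40_000, primary_window=4_000, fallback_window=32_000)
```

Short queries stay on the cheaper primary model, and only oversized ones reach the fallback.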

The Fallback Model can have a different configuration than the Primary Model.

The Fallback Model configuration is right below the Primary Model. You can choose any model you want for each and configure them independently from one another.


You can store notes on the current Variant configuration. Notes are visible only to you and your colleagues and are never sent to the model. They are especially useful when collaborating, for writing down information that is relevant to the current model configuration.


Preview shows the configuration payload for the corresponding Variant. This payload is used when you use the platform as a configuration manager.

Deploying a Variant

Deploying a new version of a Variant updates your Model configuration. All changes are tracked through Version control.


To learn more, see Deployment Versioning.

Integrating a Deployment

Code Snippet

By selecting the Code Snippet button at the top-right of the Variant page, you will see all code snippets to integrate the current variant within your application. You can also generate the code snippet by right-clicking on a variant in the routing tab.

The code snippet button at the top-right of the Variant page.

The Code Snippet panel.

Python, Node and cURL (shell script) are available for integration. All snippets will contain keys and context variables needed for the current variant to be reached.
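
To give a feel for what "keys and context variables" look like in a request, here is a hedged Python sketch that builds, but does not send, an HTTP request. The endpoint URL, the `key` and `context` field names, and the bearer-token header are all placeholders invented for this example. Always copy the real snippet from the Code Snippet panel.

```python
import json
import urllib.request

# Hypothetical values: copy the real ones from the Code Snippet panel.
API_URL = "https://api.example.com/deployments/invoke"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"  # placeholder credential


def build_request(deployment_key: str, context: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a deployment call."""
    body = json.dumps({"key": deployment_key, "context": context}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The context variables decide which variant the request reaches.
request = build_request("customer_support", {"language": "es"})
payload = json.loads(request.data)
```

Passing `request` to `urllib.request.urlopen` would send it. The sketch stops before that step because the endpoint is a placeholder.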


To learn more, see Integrating a Deployment.

Adding a new Variant to a Deployment

A single Deployment can hold multiple Variants.

Multiple Variants can help you handle different use cases and scenarios within one Deployment. Multiple Variants can be used at the same time through Routing, which we will see in the next chapter.

At any time you can choose to add a new Variant to your Deployment by selecting the Variant name at the top-left of your screen and choosing Add variant.

At any time you can switch between Variants and add a new Variant to your Deployment.


Once a variant is ready to be deployed, you need to configure the routing variables to reach the variant. To open the Routing page, select the Routing title at the top-left of the panel.

Within the Routing panel, you can configure the contexts and conditions for which a Variant will be reached by your users.
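
Conceptually, routing maps a request context to a Variant. The Python sketch below illustrates the idea with the language example from earlier. The rule format, the `route` function, and the first-match-wins order are assumptions made for illustration and do not describe the platform's actual rule engine.

```python
# Hypothetical routing rules: the first rule whose conditions all match wins.
RULES = [
    {"variant": "spanish", "conditions": {"language": "es"}},
    {"variant": "english", "conditions": {"language": "en"}},
]
DEFAULT_VARIANT = "english"


def route(context: dict) -> tuple[str, str]:
    """Return (variant, evaluation) for a request context."""
    for rule in RULES:
        conditions = rule["conditions"]
        if all(context.get(name) == value for name, value in conditions.items()):
            return rule["variant"], "Matched"
    # No rule matched: fall back to the default variant.
    return DEFAULT_VARIANT, "Default Matched"


matched = route({"language": "es"})
defaulted = route({"language": "fr"})
```

A Spanish context reaches the `spanish` variant as `Matched`, while an unlisted language lands on the default variant as `Default Matched`.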


To learn more, see Deployment Routing.


Once a Deployment is running and called from within your systems, you'll be able to see detailed analytics of all requests made.

Here you will be able to see metrics for requests coming into all your variants, including cost, latency (P95, P99), and error rate.
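
For readers unfamiliar with these metrics, the sketch below computes P95 and P99 latency, error rate, and total cost from a list of request records. The record fields and sample data are made up for the example, and the nearest-rank percentile used here is only one common definition. The dashboard may compute its percentiles differently.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct% of N)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request records; field names are assumptions.
requests_log = [
    {"latency_ms": 100 + 10 * i, "cost": 0.002, "error": i % 25 == 0}
    for i in range(1, 101)
]

latencies = [record["latency_ms"] for record in requests_log]
p95 = percentile(latencies, 95)  # 95% of requests were at least this fast
p99 = percentile(latencies, 99)
error_rate = sum(record["error"] for record in requests_log) / len(requests_log)
total_cost = sum(record["cost"] for record in requests_log)
```

P95 and P99 describe the slowest tail of requests, which an average latency would hide.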

To see metrics for a specific variant, use the Variant drop-down menu at the top-left of the page. You can also select a time window, which defaults to 30 days.


In addition to analytics, you can view logs for all Variants.

You have the following filters available:

  • Variant: select a single variant to see logs for.
  • Evaluation: Matched (Routing rule was matched to a variant) or Default Matched (No routing rule was matched, default variant was chosen).
  • Source: API, SDK, or Simulator to identify logs coming from different systems.
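
The three filters combine as an AND: a log is shown only if it satisfies every filter that is set. The Python sketch below illustrates this on in-memory records. The record layout and the `filter_logs` helper are hypothetical and only mirror the filter names listed above.

```python
from typing import Optional

# Hypothetical log records; field names mirror the filters above.
LOGS = [
    {"variant": "spanish", "evaluation": "Matched", "source": "API"},
    {"variant": "english", "evaluation": "Default Matched", "source": "SDK"},
    {"variant": "english", "evaluation": "Matched", "source": "Simulator"},
]


def filter_logs(
    logs: list[dict],
    variant: Optional[str] = None,
    evaluation: Optional[str] = None,
    source: Optional[str] = None,
) -> list[dict]:
    """Keep logs that match every filter that is set; None means no filter."""
    wanted = {"variant": variant, "evaluation": evaluation, "source": source}
    return [
        log
        for log in logs
        if all(value is None or log[field] == value for field, value in wanted.items())
    ]


english_logs = filter_logs(LOGS, variant="english")
default_matched = filter_logs(LOGS, evaluation="Default Matched")
```

Leaving a filter unset keeps all values for that field, so calling `filter_logs(LOGS)` with no filters returns every log.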

You can view details for a single log by clicking on a log line. This opens a panel containing all the details for the log, including context, requests, and parameters sent to your Deployment.

Logs overview
