Setting up a Deployment
A walkthrough of all functionalities within Deployments.
Within the Deployment module, you can connect your prompt variants to your systems. Deployments handle all integrations, operations, and monitoring.
On this page, we will walk through the ways to create a new Deployment and configure variants, models, and routings.
Create a new Deployment
To create a new deployment, select Create Deployment.
You should see the following modal to configure your initial deployment:
Configuring a Variant
Variants are different model or prompt configurations available behind one deployment. You can have as many variants as needed based on your customization needs. For example, you can have different variants dedicated to answering in different languages, routing your Spanish users to Spanish variants and English users to an English one.
When you first create a Deployment you will be redirected to the Variant screen, where you can setup and configure your model and prompt.
Primary Model
The main model configuration appears in the Primary Model panel. The Primary Model defines the first model that will be queried through this Variant.
By toggling the panel, you can configure all Model Parameters for the language model.
To change the model used, click on the model name to open a list of available models.
Only the models toggled on in the Model Garden will be available
Setting up the right parameters is important. Especially the Max Tokens parameter, since you want to make sure that you allow the model enough tokens to use for the input & output.
Retries and Fallbacks
A unique configuration is available for a Primary Model: Retries.
In case of failure, you can configure how many times a query is going to be retried with this model.
After the defined amount of Retries on your Primary Model, orq.ai will seamlessly use your Fallback Model.
One common use-case is having a Fallback Model able to handle a bigger context window than the Primary Model. The Fallback Model is potentially more costly than the Primary Model. It will be triggered only if the user query cannot be handled within the Primary Model's context window. This makes for an optimized configuration: your spending will be controlled depending on user inputs, and your users won't see any error if their query is sent to your fallback model.
The Fallback Model can have a different configuration than the Primary Model.
Variables
You can use variables within your Prompt Template to make them dynamic.
To add a variable, simply type {{variable_key}}
within a message, a new variable entry will appear within your panel:
When editing your variable configuration, you are able to define the following:
- Privacy, if your variable contains PII (Personal Identifiable Information), you can configure the variable so that orq.ai won't log and retain its value.
- Default value that will be used during generation if no value is given.
Tools
With tools you can use function calling within your LLM call.
Function Calling lets you reliably generate structured output with a language model. This is especially helpful when integrating between your language model and other systems.
To learn more about Function Calling, see Function calling in Deployments
Notes
Here you can store notes on the current Variant configuration. Notes are only visible to yourself and your colleagues, they will never be sent to the model. Notes are especially useful when collaborating, to write down some important information that is relevant to current model configuration.
Preview
Preview lets you see the configuration payload for the corresponding variant. This payload will be used when you're using Orq.ai as a configuration manager.
Prompt
On the right side of the Variant screen, you can configure the prompts for your model. At least one message is required to prepare your variant for deployment.
Add Message to Prompt
Here, you can enter a message that the model will receive before generating responses. To add more messages, select the Add Message button.
You can set a role for the message. The following are the available roles.
Role | Description | Example |
---|---|---|
System | A guideline or context for the language model, directing how it should interpret and respond to requests. | "You are an expert botanist. Respond briefly to questions with one-line answers." |
User | An actual query posed by the user. | "Which plants thrive in shady environments?" |
Assistant | Responses to user queries by the language model. | "Ferns, Hostas, and Hydrangeas are some plants that thrive in shady environments." |
Prompt Generator
You can choose to use AI to generate your prompt, to learn more see Prompt Generator.
Tokens and Cost
Above your messages, you will be able to see the estimated number of tokens and costs for each generation. Token count and cost are calculated using the provided default prompt variables. These tokens and costs are only calculated for the input. After the LLM call is executed, the full token count and costs will be shown in the Logs.
Opening Prompt in Playground
At any time, you can choose to open your current prompt configuration within the Playground. This lets you test the exact same configuration in an offline environment.
To do so, select the Open Playground button at the top-right of the panel.
Deploying a Variant
Variant Versioning
Deploying a new version of a Variant updates your Model configuration. All changes are tracked through Version control.
To learn more see Prompt version control
Integrating a Deployment
Code Snippet
By selecting the Code Snippet button at the top-right of the Variant page, you will see all code snippets to integrate the current variant within your application. You can also generate the code snippet by right-clicking on a variant in the routing tab.
Python, Node and cURL (shell script) are available for integration. All snippets will contain keys and context variables needed for the current variant to be reached.
To learn more see Integrating a Deployment
Adding a new Variant to a Deployment
A single Deployment can hold multiple Variants.
Multiple Variants can help you handle different use cases and scenarios within one Deployment. Multiple Variants can be used at the same time through Routing, which we will see in the next chapter.
At any time you can choose to add a new Variant to your Deployment by selecting the Variant name at the top-left of your screen and choosing Add variant.
Routing
Once a variant is ready to be deployed, you need to configure the routing variables to reach the variant. To open the Routing page, select the Routing title at the top-left of the panel.
Within the Routing panel, you can configure the contexts and conditions for which a Variant will be reached by your users.
To learn more, see Routing with the Business Rules Engine.
Analytics
Once a Deployment is running and called from within your systems, you'll be able to see detailed analytics of all requests made.
Here you will be able to see metrics for requests coming into all your variants, including cost, latency (P95, P99), and error rate.
You can select a specific variant to see metrics for by using the Variant
drop-down menu at the top-left of the page. You can also select a time window, which defaults to 30 days.
Logs
On top of analytics, you are able to visualize logs for all Variants.
You have the following filters available:
- Variant to select a single variant to see logs for.
- Evaluation: Matched (Routing rule was matched to a variant) or Default Matched (No routing rule was matched, default variant was chosen).
- Source: API, SDK, or Simulator to identify logs coming from different systems.
You can view details for a single log by clicking on a log line. This opens a panel containing all the details for the log, including context, requests, and parameters sent to your Deployment.
Updated 3 months ago