
# Create a Deployment

> Create Orq.ai Deployments to ship LLM use cases to production. Configure model routing, invoke them via API or SDK, and monitor calls in real time.

**Deployments** ship Gen AI use cases to production with **Orq.ai** as an AI Gateway. All calls route through the platform, providing routing, monitoring, and security in one place. Connect with a single line of code, iterate without a code release, and benefit from full observability throughout.

Common use cases include customer support bots, RAG-powered document Q\&A, content generation pipelines, and any LLM feature that needs reliable model routing, versioning, and production monitoring.

<CardGroup cols={3}>
  <Card title="Create" icon="rocket" href="#create-a-deployment">
    Set up a Deployment with a key, model, and system prompt in AI Studio or via MCP.
  </Card>

  <Card title="Configure" icon="sliders" href="#configure-a-variant">
    Set the model, fallbacks, variables, knowledge base, tools, caching, and guardrails per Variant.
  </Card>

  <Card title="Routing" icon="code-fork" href="#routing">
    Route traffic across Variants by environment, context attributes, or percentage split.
  </Card>

  <Card title="Versioning" icon="code-branch" href="#versioning">
    Deploy and roll back configurations without a code release.
  </Card>

  <Card title="Invoke" icon="code" href="#invoke-a-deployment">
    Call a Deployment via API or SDK and pass identity, usage tracking, and extra parameters.
  </Card>

  <Card title="Analytics" icon="chart-line" href="#analytics-and-logs">
    Monitor requests, filter logs by Variant, and inspect full request details.
  </Card>
</CardGroup>

## Create a Deployment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    <Steps>
      <Step title="Open the AI Studio">
        Choose a [Project](/docs/ai-studio/get-started/projects) and folder, then select the <kbd><Icon icon="plus" /></kbd> button.
      </Step>

      <Step title="Choose Deployment">
        Select **Deployment** from the entity picker.

        <Frame caption="Configure the deployment key and primary model. All settings can be changed later.">
          <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/e01c6d159a6f92fbb93734e5fb0dc220e1a6723eb92d1457eec46f8e55e78adc-iScreen_Shoter_-_Google_Chrome_-_250307122502.jpg?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=ead029bf4f4bae9ff7fde53b77ce1dd7" alt="Create Deployment dialog with fields for Deployment Key set to key123 and Model set to claude-3-7-sonnet-20250219." width="1068" height="607" data-path="images/docs/e01c6d159a6f92fbb93734e5fb0dc220e1a6723eb92d1457eec46f8e55e78adc-iScreen_Shoter_-_Google_Chrome_-_250307122502.jpg" />
        </Frame>
      </Step>

      <Step title="Configure the initial Variant">
        Set the deployment key (alphanumeric) and select the primary model for the first Variant. The Variant editor opens.
      </Step>
    </Steps>
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    Use the [Orq MCP server](/docs/integrations/code-assistants/mcp) to manage deployments directly from an AI code assistant.

    **Find an existing deployment:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Search for the "support-bot" deployment in my workspace
    ```

    The assistant uses `search_entities` with `type: "deployment"` to locate deployments by name or key.

    ***

    **Retrieve deployment configuration:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Get the full configuration of the "support-bot" deployment
    ```

    The assistant uses `get_deployment` to return the key, description, model, messages, and variant settings.

    ***

    **Create a deployment:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a customer support deployment called "support-bot" in the Default/Deployments project. Use GPT-4o with a professional, concise system prompt.
    ```

    The assistant uses `create_deployment` with the specified `key`, `path`, and `variant` (model and messages). Use `list_models` first to find valid model IDs.
  </Tab>
</Tabs>

## Configure a Variant

**Variants** are different prompt and model configurations available behind one Deployment. A Deployment can hold any number of Variants.

On creation, the **Variant** screen opens for model and prompt setup.

<Info>
  A Variant Prompt is similar to any other prompt. To learn how to configure a Prompt, see [Creating a Prompt](/docs/ai-studio/prompts/prompts).
</Info>

### Primary Model, Retries, and Fallback

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    The **Primary Model** panel defines the first model queried through this Variant.

    **Retries**

    Configure how many times a failed query is retried with this model.

    <Info>
      Retries are only triggered when a retry count greater than 0 is configured in the Variant settings.

      When retries are enabled, **Orq.ai** automatically retries the model provider API call if it returns one of the following HTTP status codes:

      * 429 Rate Limit Exceeded
      * 500 Internal Server Error
      * 501 Not Implemented
      * 502 Bad Gateway
      * 503 Service Unavailable
    </Info>

    **Error handling flow:**

    1. If an error code above is returned and retries are configured (retry count > 0), **Orq.ai** retries the Primary Model.
    2. If all retry attempts fail (or no retries are configured) AND a Fallback Model is configured, **Orq.ai** routes to the Fallback Model.
    3. If the Fallback Model also fails, the error is returned to the calling application.

    **Fallback Model**

    The Fallback Model is triggered only if the Primary Model fails after all configured retries are exhausted. Fallback Models can have a different configuration from the Primary Model.

    <Frame caption="The Fallback Model configuration is right below the main model configuration. Configure them independently.">
      <img src="https://mintcdn.com/orqai/HI0EZ1zMxSbxMGnn/images/fallback-model.png?fit=max&auto=format&n=HI0EZ1zMxSbxMGnn&q=85&s=80f6fbef01891992fb00345d745b998d" alt="Primary Model section showing claude-opus-4-20250514 with Fallback Models configured to gpt-5.2 with reasoning effort, verbosity, and response format settings." width="626" height="452" data-path="images/fallback-model.png" />
    </Frame>

    <Tip>
      Multiple fallback models can be configured in a Deployment. They fall back to one another in order of configuration. Use the **Add extra fallback** button to declare another model.
    </Tip>

    <Callout icon="hat-chef" color="#7ecece">
      See how fallbacks and retries work together in a production system. Read our cookbook [Customer Support Chat](/docs/tutorials/buildingcustomersupportchatwithaigateway).
    </Callout>
  </Tab>
</Tabs>
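
The error handling flow described above can be sketched locally as follows. This is a minimal illustration and not the platform's implementation: `ProviderError`, `call_with_retries`, and `invoke` are hypothetical names, and the sketch assumes that each fallback is tried once and that a non-retryable error skips straight to the fallback.

```python
from typing import Callable, Sequence

# HTTP status codes the documentation lists as retryable.
RETRYABLE_STATUS = {429, 500, 501, 502, 503}


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


Model = Callable[[str], str]  # hypothetical stand-in for a model call


def call_with_retries(model: Model, prompt: str, retry_count: int) -> str:
    """Make one initial attempt plus up to `retry_count` retries."""
    attempts = max(retry_count, 0) + 1
    for attempt in range(attempts):
        try:
            return model(prompt)
        except ProviderError as err:
            is_last_attempt = attempt == attempts - 1
            # Assumption: non-retryable statuses are not retried at all.
            if err.status not in RETRYABLE_STATUS or is_last_attempt:
                raise
    raise RuntimeError("unreachable: the loop always returns or raises")


def invoke(primary: Model, fallbacks: Sequence[Model], prompt: str, retry_count: int) -> str:
    """Retry the primary, then try fallbacks in order, then surface the error."""
    try:
        return call_with_retries(primary, prompt, retry_count)
    except ProviderError as err:
        last_error = err
    for fallback in fallbacks:
        try:
            # Assumption: each fallback is tried once in configuration order.
            return fallback(prompt)
        except ProviderError as err:
            last_error = err
    raise last_error
```

With `retry_count=2`, a primary model that always returns 503 is called three times before the first fallback is tried.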

**API invocation behavior**

When invoking a Deployment via the API, response timing depends on the retry and fallback configuration:

* **Success on first try**: Response returned immediately.
* **Retry scenario**: Response may be delayed by up to `base_latency × (retry_count + 1)` to account for the initial attempt plus all configured retries.
* **Fallback invoked**: Additional latency as the Fallback Model processes the request.
* **All retries and fallback failed**: Error returned to the calling application.

Set appropriate timeouts on API calls to account for retry and fallback latency.
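
As a rough aid for choosing a client timeout, the bound above can be turned into a small calculation. This is a sketch under stated assumptions: `worst_case_latency` is a hypothetical helper, the fallback is assumed to add one extra call, and the 20% safety margin is an arbitrary illustrative choice.

```python
def worst_case_latency(
    base_latency_s: float,
    retry_count: int,
    fallback_latency_s: float = 0.0,
) -> float:
    """Upper bound: initial attempt plus retries on the primary, plus one fallback call."""
    primary_budget_s = base_latency_s * (retry_count + 1)
    return primary_budget_s + fallback_latency_s


# Example: 4 s per primary attempt, 2 retries, 6 s for the fallback model.
budget_s = worst_case_latency(base_latency_s=4.0, retry_count=2, fallback_latency_s=6.0)

# Hypothetical safety margin on top of the computed bound.
client_timeout_s = budget_s * 1.2
```

The resulting `client_timeout_s` is the kind of value to pass to whatever timeout setting the HTTP client or SDK in use exposes.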

### Structured Outputs

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Configure **structured outputs** to ensure consistent and reliable responses from a Deployment. Structured outputs specify the exact format the model should follow when generating a response.

    Two modes are available:

    * **JSON Mode**: the model automatically returns a valid JSON object for every generation.
    * **JSON Schema**: define a schema that explicitly describes the fields, types, and structure of the model output.

    Once defined, a schema can be saved to the directory for reuse across multiple variants or deployments.

    <Frame caption="Set the Response Format to JSON Schema to configure structured output.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-json-schema-configuration.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=f0cef30df85bffdfe6209f96a911e1dd" alt="Primary Model settings with Response Format set to JSON Schema, showing a schema selector dropdown with get_weather and json_p3ft options." width="591" height="304" data-path="images/deployment-json-schema-configuration.png" />
    </Frame>
  </Tab>
</Tabs>
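
To make the difference between the two modes concrete, the sketch below defines a small schema and checks a model output against it on the client side. The schema and its field names (`city`, `temperature_c`) are hypothetical, and `conforms` covers only a small subset of JSON Schema (flat objects, `required`, basic types, `additionalProperties`). It is a defensive check for illustration, not a replacement for a full validator.

```python
import json

# Hypothetical schema; field names are illustrative only.
WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
    "additionalProperties": False,
}

_PY_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def conforms(raw_output: str, schema: dict) -> bool:
    """Check a flat JSON object against a small subset of JSON Schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # JSON Mode alone would already rule this case out
    if not isinstance(data, dict):
        return False
    props = schema.get("properties", {})
    if any(key not in data for key in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False and any(k not in props for k in data):
        return False
    for key, value in data.items():
        expected = props.get(key, {}).get("type")
        if expected is None:
            continue
        # In Python, bool is a subclass of int; JSON Schema treats them as distinct.
        if isinstance(value, bool) and expected != "boolean":
            return False
        if not isinstance(value, _PY_TYPES[expected]):
            return False
    return True
```

An output such as `{"city": "Paris"}` is valid JSON and would pass JSON Mode, but it fails this schema because `temperature_c` is missing.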

### Variables and Prompt Templating

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Reference dynamic values in the prompt using double braces: `{{variable_name}}`. Pass a key-value map to the `inputs` field when invoking and **Orq.ai** substitutes each variable before sending the prompt to the model.

    <div className="max-w-md">
      <iframe src="https://www.youtube.com/embed/lpXHUBAd-eU" title="How to Use Variables and Fallback Models in Orq.ai Deployments" frameborder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen />
    </div>

    **Orq.ai** supports three template engines. Select the **Template Engine** from the Variant Settings panel:

    * **Text** (default): variables use `{{double_braces}}` syntax.
    * **Jinja**: full templating with conditionals, loops, filters, and more.
    * **Mustache**: logic-less templating with sections.

    <Frame caption="Select a Template Engine in the Variant Settings panel.">
      <img src="https://mintcdn.com/orqai/NVptt7X89f0374MS/images/template-engine.png?fit=max&auto=format&n=NVptt7X89f0374MS&q=85&s=91e6906e1ad39dab6684a3e2080b74db" alt="Template Engine dropdown with Text currently selected and options for Jinja and Mustache." width="649" height="254" data-path="images/template-engine.png" />
    </Frame>

    **Example: support bot that adapts by subscription tier**

    <AccordionGroup>
      <Accordion title="Jinja example" icon="code">
        <Steps>
          <Step title="Prompt template">
            ```jinja Jinja theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for {{company_name}}.

            {% if user_tier == "premium" %}
            {{customer_name}} is a premium customer. Greet them by name and let them know they have priority support with a 2-hour response SLA.
            {% else %}
            {{customer_name}} is on the free plan. Let them know the standard response time is 24 hours.
            {% endif %}
            ```
          </Step>

          <Step title="Template in the Studio">
            <Frame>
              <img src="https://mintcdn.com/orqai/HVm7-3vBg7cwVv2-/images/jinja-studio.png?fit=max&auto=format&n=HVm7-3vBg7cwVv2-&q=85&s=8762ea2c3ebdcb5f539ec314a09fad8f" alt="System prompt in the Studio editor showing a Jinja template with if/else blocks for premium and free tier customers using is_premium, customer_name, and company_name variables." width="819" height="337" data-path="images/jinja-studio.png" />
            </Frame>
          </Step>

          <Step title="Call the deployment">
            <CodeGroup>
              ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
              response = client.deployments.invoke(
                  key="support-bot",
                  inputs={
                      "company_name": "Acme",
                      "customer_name": "Sarah",
                      "user_tier": "premium",
                  }
              )
              ```

              ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
              const response = await client.deployments.invoke({
                key: "support-bot",
                inputs: {
                  company_name: "Acme",
                  customer_name: "Sarah",
                  user_tier: "premium",
                },
              });
              ```
            </CodeGroup>
          </Step>

          <Step title="Trace">
            <Frame>
              <img src="https://mintcdn.com/orqai/HVm7-3vBg7cwVv2-/images/jinja-studio-trace.png?fit=max&auto=format&n=HVm7-3vBg7cwVv2-&q=85&s=652b936523154b0ddf2f59d86bcc2cda" alt="Trace view showing a rendered Jinja template for gpt-3.5-turbo with company_name set to Acme, customer_name to Sarah, and is_premium to true, generating a priority support greeting." width="1019" height="748" data-path="images/jinja-studio-trace.png" />
            </Frame>
          </Step>
        </Steps>
      </Accordion>

      <Accordion title="Mustache example" icon="code">
        <Steps>
          <Step title="Prompt template">
            ```handlebars Mustache theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for {{company_name}}.

            {{! Pass is_premium: true for premium customers, false for free plan }}
            {{# is_premium}}
            {{customer_name}} is a premium customer. Greet them by name with priority support and a 2-hour SLA.
            {{/ is_premium}}
            {{^ is_premium}}
            {{customer_name}} is on the free plan. Standard response time is 24 hours.
            {{/ is_premium}}
            ```
          </Step>

          <Step title="Template in the Studio">
            <Frame>
              <img src="https://mintcdn.com/orqai/vUxywKg0A2hpKhUw/images/mustache-studio.png?fit=max&auto=format&n=vUxywKg0A2hpKhUw&q=85&s=d55d988a9c8bf51003a165c85f70f618" alt="System prompt in the Studio editor showing a Mustache template with {{#is_premium}} and {{^is_premium}} sections for premium and free plan customers." width="1075" height="550" data-path="images/mustache-studio.png" />
            </Frame>
          </Step>

          <Step title="Call the deployment">
            <CodeGroup>
              ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
              response = client.deployments.invoke(
                  key="support-bot",
                  inputs={
                      "company_name": "Acme",
                      "customer_name": "Sarah",
                      "is_premium": True,
                  }
              )
              ```

              ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
              const response = await client.deployments.invoke({
                key: "support-bot",
                inputs: {
                  company_name: "Acme",
                  customer_name: "Sarah",
                  is_premium: true,
                },
              });
              ```
            </CodeGroup>
          </Step>

          <Step title="Trace">
            <Frame>
              <img src="https://mintcdn.com/orqai/vUxywKg0A2hpKhUw/images/mustache-studio-trace.png?fit=max&auto=format&n=vUxywKg0A2hpKhUw&q=85&s=0fe50b725aa551f9f6b431808f3ece04" alt="Trace view showing a rendered Mustache template for gpt-3.5-turbo with company_name set to Acme, customer_name to Sarah, and is_premium to true, with the assistant greeting Sarah as a premium customer." width="1018" height="671" data-path="images/mustache-studio-trace.png" />
            </Frame>
          </Step>
        </Steps>
      </Accordion>
    </AccordionGroup>
  </Tab>

  <Tab title="API & SDK" icon="code">
    Add `{{variable_name}}` placeholders to the prompt and pass the corresponding values in the `inputs` field at invoke time. **Orq.ai** substitutes each key before sending the prompt to the model.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
        --url https://api.orq.ai/v2/deployments/invoke \
        --header 'Authorization: Bearer <ORQ_API_KEY>' \
        --header 'Content-Type: application/json' \
        --data '{
          "key": "my-deployment",
          "context": {"environments": "production"},
          "inputs": {
            "customer_name": "John Smith",
            "user_tier": "premium"
          }
        }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          inputs={
              "customer_name": "John Smith",
              "user_tier": "premium",
          },
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const generation = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        inputs: {
          customer_name: 'John Smith',
          user_tier: 'premium',
        },
      });

      console.log(generation.choices[0].message.content);
      ```
    </CodeGroup>
  </Tab>
</Tabs>
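
Conceptually, the default Text engine replaces each `{{variable_name}}` placeholder with the matching value from `inputs`. The sketch below imitates that substitution locally so a prompt can be previewed before invoking. `render_text_template` is a hypothetical helper, and leaving unknown variables untouched is an assumption rather than documented platform behavior.

```python
import re

# Matches {{variable_name}} with word characters only.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_text_template(template: str, inputs: dict) -> str:
    """Replace each {{name}} with str(inputs[name]) when the key is present."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Assumption: placeholders without a matching input are left as-is.
        return str(inputs[name]) if name in inputs else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


preview = render_text_template(
    "Hello {{customer_name}}, you are on the {{user_tier}} plan.",
    {"customer_name": "John Smith", "user_tier": "premium"},
)
```

Previewing a template this way can help catch misspelled variable names, since they remain visible as unreplaced placeholders.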

<Info>
  For a complete reference of all template features including filters, macros, nested objects, and more, see [Prompt Templating](/docs/ai-studio/prompts/prompt-templating).
</Info>

<Info>
  To prevent sensitive input values from appearing in traces and logs, see [Security and Privacy](/docs/ai-studio/ai-engineering/deployments#security-and-privacy).
</Info>

### Knowledge Base

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Ground a Deployment's responses in domain-specific knowledge by adding a [Knowledge Base](/docs/ai-studio/ai-engineering/knowledge-bases-memory-stores).

    Open the deployment configuration, go to **Knowledge Bases**, then select <kbd className="key"><Icon icon="circle-plus" color="#fff" /> Knowledge Base</kbd>.

    <Info>
      Knowledge Bases enable RAG (Retrieval-Augmented Generation), allowing the model to retrieve and use relevant information from documentation or data sources to provide more accurate and contextual responses.
    </Info>

    **Configuration options** (via the `...` menu on an attached Knowledge Base):

    * **Last User Message**: the user's latest message is automatically used as a query to retrieve relevant chunks.
    * **Query**: a predefined query is used to retrieve chunks. Use Input Variables like `{{query}}` to make it dynamic at runtime.

    <Frame caption="Configure which Knowledge Base to use and how it should be queried in the deployment.">
      <img src="https://mintcdn.com/orqai/mkK-RgpJxyAg_Wxr/images/knowledge-deployment-config.png?fit=max&auto=format&n=mkK-RgpJxyAg_Wxr&q=85&s=2e16bf5933b1dac3c9d6865a60cb3148" alt="Edit Knowledge Base dialog with Knowledge Base set to knowledge and Type set to Last User Message." width="566" height="354" data-path="images/knowledge-deployment-config.png" />
    </Frame>

    <Info>
      To learn more about creating and configuring Knowledge Bases, see [Knowledge Bases](/docs/ai-studio/ai-engineering/knowledge-bases-memory-stores).
    </Info>

    Reference the Knowledge Base in the prompt using the `{{knowledge_base_key}}` syntax, where `knowledge_base_key` is the identifier of the Knowledge Base. If the Knowledge Base is not explicitly referenced in the prompt, retrieved chunks are automatically appended to the end of the system message.

    <Frame caption="Using a Knowledge Base in a prompt.">
      <img src="https://mintcdn.com/orqai/ffoLsHE_4rLFpEJ6/images/knowledge-deployment-use.png?fit=max&auto=format&n=ffoLsHE_4rLFpEJ6&q=85&s=628a972376853efaf126474f91fbf783" alt="Deployment settings showing a Knowledge Base named knowledge in the settings panel, with the {knowledge} variable highlighted in the system prompt." width="1454" height="761" data-path="images/knowledge-deployment-use.png" />
    </Frame>

    <Callout icon="hat-chef" color="#7ecece">
      See knowledge base retrieval used end-to-end in a working deployment. Read our cookbook [Multilingual FAQ Bot](/docs/tutorials/multilingual-faq-bot).
    </Callout>
  </Tab>

  <Tab title="API & SDK" icon="code">
    When invoking a Deployment that uses a Knowledge Base, set `include_retrievals: true` in `invoke_options` to embed the retrieval chunks in the response.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --location 'https://api.orq.ai/v2/deployments/invoke' \
      --header 'Content-Type: application/json' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer <ORQ_API_KEY>' \
      --data '{
          "key": "deployment_key",
          "messages": [
              {
                  "role": "user",
                  "content": ""
              }
          ],
          "invoke_options": {
              "include_retrievals": true
          }
      }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="deployment_key",
          messages=[
              {
                  "role": "user",
                  "content": "",
              }
          ],
          invoke_options={"include_retrievals": True},
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: "deployment_key",
        messages: [
          {
            role: "user",
            content: "",
          },
        ],
        invokeOptions: { includeRetrievals: true },
      });
      ```
    </CodeGroup>

    Retrievals are returned in the `retrievals` field of the response. Each chunk includes source details and scores:

    <Frame caption="Retrieval results are returned in the retrievals field, containing document details, file metadata, and search scores.">
      <img src="https://mintcdn.com/orqai/x_6IXnot9ETOc_0g/images/docs/618f20adfcf1bf3fe7f8bb74d695ea9376ee7530445fe689d0de294de2b6e1a7-690fef20-7a3a-4b17-a444-e46decc3958f.png?fit=max&auto=format&n=x_6IXnot9ETOc_0g&q=85&s=6a96fefaab8b58ee2d90682a07619216" alt="API response in a REST client showing the retrievals array with document metadata including file names, file type, page number, and search score for apple_annual_report_2023.pdf." width="4096" height="1878" data-path="images/docs/618f20adfcf1bf3fe7f8bb74d695ea9376ee7530445fe689d0de294de2b6e1a7-690fef20-7a3a-4b17-a444-e46decc3958f.png" />
    </Frame>

    ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
        "retrievals": [
            {
                "document": "<chunk_data>",
                "metadata": {
                    "file_name": "<filename>",
                    "file_type": "application/pdf",
                    "page_number": 24,
                    "search_score": 0.7886787056922913,
                    "rerank_score": 0.19868536
                }
            }
        ]
    }
    ```

    <Callout icon="hat-chef" color="#7ecece">
      See knowledge base retrievals wired into a complete application. Read our cookbook [Multilingual FAQ Bot](/docs/tutorials/multilingual-faq-bot).
    </Callout>
  </Tab>
</Tabs>
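
Once retrievals are included in the response, a caller may want to keep only the most relevant chunks, for example to display sources. The sketch below works on a parsed response body shaped like the JSON excerpt above. `top_chunks`, the sample values, and the `0.5` threshold are hypothetical choices for illustration.

```python
import json
from typing import List

# Sample body shaped like the response excerpt above; values are placeholders.
SAMPLE_BODY = json.loads("""
{
  "retrievals": [
    {"document": "<chunk_a>",
     "metadata": {"file_name": "<filename>", "page_number": 24,
                  "search_score": 0.79, "rerank_score": 0.20}},
    {"document": "<chunk_b>",
     "metadata": {"file_name": "<filename>", "page_number": 3,
                  "search_score": 0.41, "rerank_score": 0.65}}
  ]
}
""")


def top_chunks(body: dict, min_search_score: float = 0.5) -> List[dict]:
    """Keep chunks at or above the threshold, highest search_score first."""
    retrievals = body.get("retrievals") or []
    kept = [
        r for r in retrievals
        if r.get("metadata", {}).get("search_score", 0.0) >= min_search_score
    ]
    return sorted(kept, key=lambda r: r["metadata"]["search_score"], reverse=True)


sources = [chunk["document"] for chunk in top_chunks(SAMPLE_BODY)]
```

Sorting by `rerank_score` instead is a one-line change to the `key` function when a reranker is configured on the Knowledge Base.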

### Tools

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Tools can only be added and configured at the **deployment** level. Only **Function tools** are supported in Deployments, enabling the model to call external functions during execution.

    To add a Function tool, open the **Tools** tab in the deployment configuration and click <kbd className="key"><Icon icon="circle-plus" color="#fff" /> Tool</kbd>:

    * **Create a new Tool**: define a custom function directly within the deployment.
    * **Import an existing Tool**: select a previously created Function tool from the resource library.

    <Frame caption="Configure function tools the model can call during deployment execution.">
      <img src="https://mintcdn.com/orqai/R7t6xmeUretxbcfc/images/tools-deployment-config.png?fit=max&auto=format&n=R7t6xmeUretxbcfc&q=85&s=0926d22591e9ee13d85f05a37ed6fb5a" alt="Tools section with a CurrentDate tool listed and an Add Tool button." width="557" height="150" data-path="images/tools-deployment-config.png" />
    </Frame>

    <Info>
      To learn more about creating Function tools, see [Creating Tools](/docs/ai-studio/ai-engineering/create-tools).
    </Info>
  </Tab>
</Tabs>

### Cache

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Variant generation can be cached to reduce processing time and cost. When an input is received that matches a cached entry within the Variant, the stored response is returned directly without triggering a new generation.

    To enable caching, open the **Variant Settings** tab and select **Enabled** in the Caching section. The cache can be manually invalidated at any time by clicking the configuration icon.

    <Frame caption="Configure the cache expiration time.">
      <img src="https://mintcdn.com/orqai/vOJWhuVD9oBrDhcx/images/deployment-variant-caching.png?fit=max&auto=format&n=vOJWhuVD9oBrDhcx&q=85&s=19274aa9dca644973c478d57f446605b" alt="Cache settings with the Enabled toggle on and an Expires in dropdown open, showing options from 1 hour to 2 weeks." width="554" height="469" data-path="images/deployment-variant-caching.png" />
    </Frame>

    **TTL (time to live)** is the length of time a cached response is stored before it is invalidated. After invalidation, the next matching request triggers a new LLM generation. Configure the TTL from the drop-down once Caching is enabled.

    <Info>
      The cache returns a stored response only when the incoming input exactly matches a cached entry. Image models are not supported.
    </Info>
  </Tab>
</Tabs>
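
The exact-match and TTL behavior described above can be illustrated with a small in-memory cache. This is a conceptual sketch only: `TTLCache` and `cache_key` are hypothetical, and deriving the key from a canonical JSON dump of the inputs is an assumption about what "exact match" means, not a description of the platform's internals.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple


def cache_key(inputs: dict) -> str:
    """Hypothetical key: canonical JSON, so only identical inputs match."""
    return json.dumps(inputs, sort_keys=True)


class TTLCache:
    """Store responses for `ttl_seconds`, then treat them as missing."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: the caller must generate again
            return None
        return value

    def invalidate(self) -> None:
        """Manual invalidation: drop every cached response."""
        self._entries.clear()
```

Because the key is an exact match, inputs that differ only in letter case or whitespace produce a cache miss and a fresh generation.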

### Evaluators and Guardrails

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    [Evaluators](/docs/ai-studio/optimize/evaluators) and Guardrails are configured as separate sections in the Variant settings. Both operate on the generation pipeline, but they behave differently.

    <Frame caption="Guardrails execute synchronously and can block a generation, while Evaluators run asynchronously and never block the response.">
      <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/a6be2fe5b5e1290b3d2132212a9ec6e74287d7b6cb896d86f8f4838b5a9bcf73-Guardrails_and_Evaluators_-_Deployment.png?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=1c40c78aac37b6819e2ab2bfceaba12e" alt="Flow diagram showing a user query passing through Input Guardrails synchronously, then Deployment Model Generation, then Output Guardrails, with Input and Output Evaluators running asynchronously and fail paths returning an Error Response." width="3278" height="1779" data-path="images/docs/a6be2fe5b5e1290b3d2132212a9ec6e74287d7b6cb896d86f8f4838b5a9bcf73-Guardrails_and_Evaluators_-_Deployment.png" />
    </Frame>

    **Evaluators**

    Click <kbd className="key"><Icon icon="circle-plus" color="#fff" /> Evaluator</kbd> to add an evaluator from the Library. Configure each evaluator as:

    * <kbd><Icon icon="arrow-up-right" color="#22c55e" /></kbd> **Input evaluator**: runs evaluation on the input sent to the model.
    * <kbd><Icon icon="arrow-down-left" color="#ef4444" /></kbd> **Output evaluator**: runs evaluation on the output generated by the model.

    Evaluators run **asynchronously** and never block the response.

    <Frame caption="Configure a Sample Rate (0–100%) on each evaluator to control how frequently it runs.">
      <img src="https://mintcdn.com/orqai/4f-ka8j82TWkynBc/images/evaluator-variant.png?fit=max&auto=format&n=4f-ka8j82TWkynBc&q=85&s=2b05a024b244f5fc74171dfe78e2b71c" alt="Guardrails section listing input_contains_pii and output_toxicity, and Evaluators section showing HTTP Evaluator at 15%, with a Sample Rate popover displaying 15%." width="589" height="357" data-path="images/evaluator-variant.png" />
    </Frame>

    <Note>
      Evaluators do not run when using the [**Test** panel](#test-a-deployment) in AI Studio. To trigger evaluators, invoke the Deployment externally via the [API or SDK](#invoke-a-deployment).
    </Note>

    **Guardrails**

    Click <kbd className="key"><Icon icon="circle-plus" color="#fff" /> Guardrail</kbd> to add a guardrail-capable evaluator from the Library.

    A Guardrail runs **synchronously** and will **deny** the generation if its evaluation fails, returning an error to the user. Guardrails can be configured as:

    * <kbd><Icon icon="arrow-up-right" color="#22c55e" /></kbd> **Input Guardrail**: runs **before** the input is sent to the model.
    * <kbd><Icon icon="arrow-down-left" color="#ef4444" /></kbd> **Output Guardrail**: runs **after** generation, before the response is returned to the client.

    **Guardrail behavior when a guardrail fails:**

    | Behavior     | Description                                                                                                         |
    | ------------ | ------------------------------------------------------------------------------------------------------------------- |
    | **Retry**    | Triggers a new generation attempt. Use this when a transient or non-deterministic failure may resolve on retry.     |
    | **Fallback** | Executes the fallback model configured on the Deployment. Use this for a safe default response instead of retrying. |

    Guardrail behavior is configured per Deployment and applies to all guardrails attached to it.

    <Warning>
      **Output Guardrails and Streaming**: When a Deployment is invoked with streaming enabled, output guardrails are deactivated, because they cannot be evaluated reliably on partial chunks.
    </Warning>

    <Callout icon="hat-chef" color="#7ecece">
      See guardrails put to the test against adversarial inputs. Read our cookbook [Red Teaming](/docs/tutorials/red-teaming).
    </Callout>
  </Tab>
</Tabs>
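
The guardrail flow described above can be pictured with a small simulation. This is a conceptual sketch only, not how **Orq.ai** implements guardrails: `run_with_guardrails`, `Behavior`, `GuardrailDenied`, and the callables passed in are hypothetical names, and the single extra attempt after a failed output guardrail is an assumption made for illustration.

```python
from enum import Enum
from typing import Callable


class Behavior(Enum):
    RETRY = "retry"
    FALLBACK = "fallback"


class GuardrailDenied(Exception):
    """Raised when a guardrail rejects the input or the final output."""


def run_with_guardrails(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    input_ok: Callable[[str], bool],
    output_ok: Callable[[str], bool],
    behavior: Behavior,
) -> str:
    # Input guardrail: evaluated before anything is sent to the model.
    if not input_ok(prompt):
        raise GuardrailDenied("input guardrail failed")

    output = primary(prompt)
    if output_ok(output):
        return output

    # Output guardrail failed: apply the configured behavior once (assumption).
    second_model = primary if behavior is Behavior.RETRY else fallback
    output = second_model(prompt)
    if output_ok(output):
        return output
    raise GuardrailDenied("output guardrail failed")


# Hypothetical models and checks: the first generation fails, the retry passes.
attempts = iter(["rude reply", "polite reply"])
result = run_with_guardrails(
    prompt="hello",
    primary=lambda _prompt: next(attempts),
    fallback=lambda _prompt: "safe default reply",
    input_ok=lambda text: "ssn" not in text.lower(),
    output_ok=lambda text: "rude" not in text,
    behavior=Behavior.RETRY,
)
print(result)  # polite reply
```

With `Behavior.FALLBACK`, the second attempt would go to the `fallback` callable instead of repeating the primary model.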

### Security and Privacy

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    **Input Masking**

    Inputs in a Variant can be flagged as PII (Personally Identifiable Information). This is recommended when processing sensitive user data such as names, email addresses, or phone numbers.

    To configure this, open the **Security** tab when editing an input and choose **Personally Identifiable Information (PII)** from the Privacy drop-down.

    <Frame caption="Once deployed, the input value will not be logged within Orq systems.">
      <img src="https://mintcdn.com/orqai/R7t6xmeUretxbcfc/images/variable-pii.png?fit=max&auto=format&n=R7t6xmeUretxbcfc&q=85&s=7b66c02cebefb3ccb499e95410e5cc34" alt="Variables section with a Question variable and a privacy dropdown showing None and Personal Identifiable Information (PII) options." width="758" height="194" data-path="images/variable-pii.png" />
    </Frame>

    Flagging an input as PII removes its values from logs and traces. When opening a log or trace, the input is shown in red to indicate it was not logged. The API response itself still includes the PII value.

    <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/955b175e6f005d7f112a98e54d3468acb0b72f001a150ced0ac4262128321179-iScreen_Shoter_-_Google_Chrome_-_250317122231.jpg?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=700d1089a333f51107572e98bb4a0a05" alt="Trace detail for gpt-4o showing a user message say hello to {name} and the assistant reply Hello, [name]! How are you today?" width="1113" height="823" data-path="images/docs/955b175e6f005d7f112a98e54d3468acb0b72f001a150ced0ac4262128321179-iScreen_Shoter_-_Google_Chrome_-_250317122231.jpg" />

    <Note>The API response still includes the PII value. Only the input and output logs and traces stored in **Orq.ai** omit it.</Note>

    **Output Masking**

    Enable output masking to hide generated outputs from logs and traces. Open the **Security** tab in the Variant and enable the **Output Masking** toggle.

    <img src="https://mintcdn.com/orqai/55N7ogp78VJHeSpN/images/variant-output-masking.png?fit=max&auto=format&n=55N7ogp78VJHeSpN&q=85&s=2c5239c4d02963abfc4b9a71ae3af464" alt="Variables section with city and date variables, and a Masking section with the Output Masking toggle enabled." width="545" height="259" data-path="images/variant-output-masking.png" />

    When Output Masking is enabled, logs and traces will not store the generated response.

    <img src="https://mintcdn.com/orqai/4YNqGRNpuZNyo0_T/images/output-masking-410.png?fit=max&auto=format&n=4YNqGRNpuZNyo0_T&q=85&s=51d32de03bcf8b72edd8a37c9de5b35b" alt="A masked output field with a striped pattern and a tooltip reading The response from the model was masked due to your deployment settings." width="294" height="137" data-path="images/output-masking-410.png" />
  </Tab>
</Tabs>
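
The effect of input and output masking on stored records can be sketched as follows. This is an illustrative model only: `build_log_record` and the `"[masked]"` placeholder are hypothetical, and the actual log format in **Orq.ai** may differ. The point it shows is that masking changes what is logged, not what the API returns.

```python
from typing import Any

MASK = "[masked]"  # hypothetical placeholder; the UI marks masked values differently


def build_log_record(
    inputs: dict[str, Any],
    pii_inputs: set[str],
    output: str,
    mask_output: bool,
) -> dict[str, Any]:
    """Return what would be stored in a log, leaving the API response untouched."""
    logged_inputs = {
        name: MASK if name in pii_inputs else value
        for name, value in inputs.items()
    }
    return {
        "inputs": logged_inputs,
        "output": MASK if mask_output else output,
    }


api_response = "Hello, Jane!"  # still returned to the caller in full
record = build_log_record(
    inputs={"name": "Jane", "city": "Amsterdam"},
    pii_inputs={"name"},
    output=api_response,
    mask_output=True,
)
print(record)
```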

## Add a Variant

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    A single Deployment can hold multiple Variants. Each Variant can handle a different use case or scenario, and Variants can be served simultaneously through Routing.

    To add a new Variant, select the Variant name at the top-left of the screen and choose **Add variant**.

    <Frame caption="Switch between Variants and add a new Variant to a Deployment at any time.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-add-variant.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=afdaca319e1fcede466a40fc56f296a6" alt="Variant context menu with options including Edit, Duplicate, Share, Create Variant, Change, and Delete." width="259" height="324" data-path="images/deployment-add-variant.png" />
    </Frame>
  </Tab>
</Tabs>

## Routing

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Once a Variant is ready to be deployed, configure the routing variables to control which Variant is reached. Open the **Routing** page by selecting **Routing** at the top-left of the panel.

    <div className="max-w-md">
      <iframe src="https://www.youtube.com/embed/ROst-LlR2tk" title="YouTube video player" frameborder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen />
    </div>

    The Routing panel maps Variants to Context field values:

    > Each row represents a single Variant.
    >
    > Each column represents a single Context field.
    >
    > Each cell represents a Value for a Context field to be matched with a Variant.

    <Frame caption="An example routing table.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-routing.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=f3f854020535eae375ec8d9fb6f749fd" alt="Routing table for city_weather_experiment showing four variants: default matching all contexts, v1 uk for production/en, v1 germany for production/de, and v1 france for develop/fr with is_admin true." width="1170" height="312" data-path="images/deployment-routing.png" />
    </Frame>

    **Default variant:** The first row (0) is the default Variant. If no routing rule matches, or no context values are provided, the request is routed to Variant 0.

    **Code Snippets**

    Right-click any Variant in the Routing table and select **Generate Code Snippet** to get ready-to-use code for that specific Variant. Snippets include the context values needed to reach the selected Variant.

    <Frame caption="Right-click on the variant path to generate code snippets.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-routing-code-snippet.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=5609ab7bb4db0f1d57413a9d1efda7df" alt="Routing table with a right-click context menu open on the v1 uk row, showing options including Generate code snippet." width="1162" height="454" data-path="images/deployment-routing-code-snippet.png" />
    </Frame>

    **Context Fields**

    To add a new context field, press the <kbd><Icon icon="plus" /></kbd> button at the top right of the Routing table. Set a name and type for the field: `boolean`, `date`, `list`, `number`, or `string`.

    <img src="https://mintcdn.com/orqai/83k4_RKHhrJhLScC/images/deployments-context-field-type-dropdown.png?fit=max&auto=format&n=83k4_RKHhrJhLScC&q=85&s=ef30bea9b43b34d1e1bd91b8630362d6" alt="Context field creation dropdown with field_name entered and type options including Boolean, Date, List, Number, and String." width="2178" height="1084" data-path="images/deployments-context-field-type-dropdown.png" />

    **Routing Conditions**

    Create a custom routing condition for each field and Variant by entering a value in the corresponding cell. By default, the `=` operator is used. Click `=` to change the operator.

    <Frame caption="Different operators are available depending on the field type.">
      <img src="https://mintcdn.com/orqai/83k4_RKHhrJhLScC/images/deployments-routing-condition-operator-dropdown.png?fit=max&auto=format&n=83k4_RKHhrJhLScC&q=85&s=96fc3088bfeb8f8c626cbdc6e67f34ef" alt="Operator dropdown showing options: Is, Is not, Less than, Greater than, Less than or equal, and Greater than or equal." width="2108" height="856" data-path="images/deployments-routing-condition-operator-dropdown.png" />
    </Frame>

    **Simulator**

    Routing can be tested at any time by opening the Simulator via the Simulator icon at the top-right of the Routing panel. Enter a value for each context field and select **Simulate** to see which Variant the query routes to.
  </Tab>
</Tabs>
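
The routing table can be modelled as a list of rows, each holding one condition per context field. The sketch below is a rough mental model, not the platform's matching algorithm: the `route` function, the operator names, the `locale` field, and the rule that the first matching non-default row wins are all assumptions. Only the default-row behavior (row 0 is used when nothing matches) comes from the description above.

```python
import operator
from typing import Any, Callable

# Operators shown in the Routing UI, mapped to Python comparisons (names assumed).
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "is": operator.eq,
    "is_not": operator.ne,
    "less_than": operator.lt,
    "greater_than": operator.gt,
    "less_than_or_equal": operator.le,
    "greater_than_or_equal": operator.ge,
}

Condition = tuple[str, Any]  # (operator name, expected value)
Row = tuple[str, dict[str, Condition]]  # (variant name, conditions per field)


def route(rows: list[Row], context: dict[str, Any]) -> str:
    """Return the first non-default Variant whose conditions all match, else row 0."""
    default_variant = rows[0][0]
    for variant, conditions in rows[1:]:
        if all(
            field in context and OPERATORS[op](context[field], expected)
            for field, (op, expected) in conditions.items()
        ):
            return variant
    return default_variant


rows: list[Row] = [
    ("default", {}),
    ("v1 uk", {"environments": ("is", "production"), "locale": ("is", "en")}),
    ("v1 germany", {"environments": ("is", "production"), "locale": ("is", "de")}),
    (
        "v1 france",
        {
            "environments": ("is", "develop"),
            "locale": ("is", "fr"),
            "is_admin": ("is", True),
        },
    ),
]

print(route(rows, {"environments": "production", "locale": "de"}))  # v1 germany
print(route(rows, {}))  # default
```

The Simulator in the Routing panel remains the authoritative way to check which Variant a given context reaches.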

## Versioning

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Version control tracks all changes to the model and prompt configuration. A new commit is made on each deployment and history is preserved throughout. All changes can be viewed, and any prior version can be restored.

    <div className="max-w-md">
      <iframe src="https://www.youtube.com/embed/5AJA-KHWu9Y" title="How to Use Version Control, SDK Snippets, and PII Filtering in Orq.ai Deployments" frameborder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen />
    </div>

    **Deploying a New Version**

    When the configuration is ready, press the **Deploy** button on the Variant screen.

    <Frame caption="The Deploy button is enabled once there are changes to commit.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-code-snippet-button.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=5b63d03a432251cdd9bf531d6e6728b2" alt="Variant toolbar showing share, code snippet, history, and external link buttons alongside the Deploy button." width="266" height="48" data-path="images/deployment-code-snippet-button.png" />
    </Frame>

    The deployment modal prompts for the new version (Major or Minor bump), a description of the changes, and whether to deploy immediately or save as a draft.

    **Saving a Draft** commits the changes to a new version without making them publicly available. They become public on the next deployment.

    **Comparing Changes**

    Select the **Compare Changes** button at the top-right to visualize changes between configurations in a side-by-side JSON view. Restore a previous version by selecting it in the left panel and clicking **Restore**.

    <Frame caption="Side-by-side visualization of two versions of the same Variant.">
      <img src="https://mintcdn.com/orqai/MGpeNVsx2VMjZoDd/images/deployment-version-compare.png?fit=max&auto=format&n=MGpeNVsx2VMjZoDd&q=85&s=f697d32cd10550b3d3f40c266fd0bcec" alt="Prompt template changes dialog showing a side-by-side diff between Base v1.1 and Compare v1.0 Published, highlighting a tools array added in the newer version." width="2950" height="1968" data-path="images/deployment-version-compare.png" />
    </Frame>
  </Tab>
</Tabs>
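
As a small illustration of the Major and Minor bump choice in the deployment modal, the sketch below computes the next version label. It assumes a two-part `MAJOR.MINOR` label (as in the `v1.0` and `v1.1` labels in the comparison view) and that a Major bump resets the minor number to 0. Both are assumptions, and `bump_version` is a hypothetical helper, not part of the platform.

```python
def bump_version(current: str, bump: str) -> str:
    """Return the next MAJOR.MINOR label for a 'major' or 'minor' bump."""
    major, minor = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0"
    if bump == "minor":
        return f"{major}.{minor + 1}"
    raise ValueError(f"unknown bump type: {bump!r}")


print(bump_version("1.1", "minor"))  # 1.2
print(bump_version("1.1", "major"))  # 2.0
```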

## Test a Deployment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Click <kbd>Test</kbd> in the Deployment toolbar to open the Test panel. Enter values for any configured variables and submit to see the model response inline. No code required.

    This is useful for quickly checking prompt content and model behavior during development.

    <Frame caption="The Test panel with variable inputs and the model response.">
      <div className="max-w-[70%]">
        <img src="https://mintcdn.com/orqai/Vvtq1CVIs20Zcvqz/images/deployment-test.png?fit=max&auto=format&n=Vvtq1CVIs20Zcvqz&q=85&s=626b2ff7c209eee43c374f7b5ceb9fea" alt="Test panel showing a Variables section with company_name, user_question, and tone inputs, and the model response below." width="589" height="1164" data-path="images/deployment-test.png" />
      </div>
    </Frame>

    <Warning>
      Evaluators configured on the Deployment do not run in the Test panel. To trigger evaluators, invoke the Deployment via the [API or SDK](#invoke-a-deployment).
    </Warning>
  </Tab>
</Tabs>

## Invoke a Deployment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Use the **Code Snippet** button at the top-right of the Variant page to get ready-to-use integration code for Python, Node.js, and cURL. All snippets include the keys and context variables needed to reach the current Variant.

    <div className="max-w-md">
      <iframe src="https://www.youtube.com/embed/Qa-aNeUlYPw" title="From Invocation to Insight: Running Orq.ai Deployments and Reading the Logs" frameborder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen />
    </div>

    <Frame caption="The Code Snippet button at the top-right of the Variant page.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-code-snippet-button.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=5b63d03a432251cdd9bf531d6e6728b2" alt="Variant toolbar showing share, code snippet, history, and external link buttons alongside the Deploy button." width="266" height="48" data-path="images/deployment-code-snippet-button.png" />
    </Frame>

    <Frame caption="The Code Snippet panel with all integration languages.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-code-snippet.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=9f1c49e029e46ab2213fd8d0cbf06761" alt="Invoke a Deployment dialog with cURL, Python, and TypeScript tabs, showing a curl command for city_weather_experiment_c3jt_49 with city and date as inputs." width="834" height="706" data-path="images/deployment-code-snippet.png" />
    </Frame>

    Code snippets per Variant are also accessible from the Routing page:

    1. Open a Deployment and go to the **Routing** page.

           <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-routing-menu.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=49bfc17061ab17199b560f737fecde59" alt="The routing context menu on the Routing page showing options including Generate Code Snippet." width="898" height="57" data-path="images/deployment-routing-menu.png" />

    2. Right-click the target Variant and select **Generate Code Snippet**.
  </Tab>

  <Tab title="API & SDK" icon="code">
    Invoke a Deployment by sending a request to the `/v2/deployments/invoke` endpoint. **Orq.ai** routes the request to the correct Variant, applies all configured settings, and returns the model's response.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
        --url https://api.orq.ai/v2/deployments/invoke \
        --header 'Authorization: Bearer <ORQ_API_KEY>' \
        --header 'Content-Type: application/json' \
        --data '{
          "key": "my-deployment",
          "context": {"environments": "production"}
        }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const generation = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
      });

      console.log(generation.choices[0].message.content);
      ```
    </CodeGroup>

    <Tip>See the full [Invoke API reference](/reference/deployments/invoke).</Tip>

    **Usage Tracking**

    To track token consumption for a deployment call, set `include_usage` to `true` in `invoke_options`. The API response then includes usage metrics.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "context": {
          "environments": "production"
        },
        "invoke_options": {
          "include_usage": true
        }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          invoke_options={"include_usage": True}
      )

      print(f"Prompt tokens: {generation.usage.prompt_tokens}")
      print(f"Completion tokens: {generation.usage.completion_tokens}")
      print(f"Total tokens: {generation.usage.total_tokens}")
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        invokeOptions: { includeUsage: true },
      });

      console.log(`Prompt tokens: ${deployment.usage?.promptTokens}`);
      console.log(`Completion tokens: ${deployment.usage?.completionTokens}`);
      console.log(`Total tokens: ${deployment.usage?.totalTokens}`);
      ```
    </CodeGroup>

    With this option set, the `usage` object in the response reports `prompt_tokens`, `completion_tokens`, and `total_tokens`.

    **Identity**

    Associate an identity with deployment invocations for tracking and personalization.

    **Identity fields:**

    * `id`: Unique identifier for the identity (required).
    * `display_name`: Display name of the identity.
    * `email`: Email address of the identity.
    * `logo_url`: URL to the identity's avatar or logo.
    * `tags`: List of tags associated with the identity.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "identity": {
          "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
          "display_name": "Jane Doe",
          "email": "jane.doe@example.com",
          "logo_url": "https://example.com/avatars/jane-doe.jpg",
          "tags": ["hr", "engineering"]
        }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          identity={
              "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
              "display_name": "Jane Doe",
              "email": "jane.doe@example.com",
              "logo_url": "https://example.com/avatars/jane-doe.jpg",
              "tags": ["hr", "engineering"]
          }
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        identity: {
          id: 'contact_01ARZ3NDEKTSV4RRFFQ69G5FAV',
          displayName: 'Jane Doe',
          email: 'jane.doe@example.com',
          logoUrl: 'https://example.com/avatars/jane-doe.jpg',
          tags: ['hr', 'engineering'],
        },
      });

      console.log(deployment?.choices[0].message.content);
      ```
    </CodeGroup>
  </Tab>
</Tabs>
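
When calling the endpoint without the SDK, it can help to assemble the request body in one place. The following is a minimal sketch using only the Python standard library. The helper names `build_invoke_payload` and `build_request` are hypothetical; the endpoint URL, headers, and field names (`key`, `context`, `identity`, `invoke_options.include_usage`) are taken from the examples above. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Optional

INVOKE_URL = "https://api.orq.ai/v2/deployments/invoke"


def build_invoke_payload(
    key: str,
    context: Optional[dict[str, Any]] = None,
    identity: Optional[dict[str, Any]] = None,
    include_usage: bool = False,
) -> dict[str, Any]:
    """Assemble the JSON body, leaving out sections that were not supplied."""
    payload: dict[str, Any] = {"key": key}
    if context:
        payload["context"] = context
    if identity:
        # `id` is the only identity field documented as required.
        if "id" not in identity:
            raise ValueError("identity requires an 'id'")
        payload["identity"] = identity
    if include_usage:
        payload["invoke_options"] = {"include_usage": True}
    return payload


def build_request(api_key: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Create the POST request without sending it."""
    return urllib.request.Request(
        INVOKE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_invoke_payload(
    key="my-deployment",
    context={"environments": "production"},
    identity={"id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV"},
    include_usage=True,
)
request = build_request("<ORQ_API_KEY>", payload)
# To send it: urllib.request.urlopen(request)
print(json.dumps(payload, indent=2))
```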

### Extra Parameters

<Tabs>
  <Tab title="API & SDK" icon="code">
    Use `extra_params` to pass parameters not directly exposed by the **Orq.ai** panel, or to override existing model configuration at runtime.

    **Passing an unsupported parameter:**

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "context": { "environments": "production" },
        "extra_params": { "presence_penalty": 1.0 }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          extra_params={"presence_penalty": 1.0}
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        extraParams: { presence_penalty: 1.0 },
      });

      console.log(deployment?.choices[0].message.content);
      ```
    </CodeGroup>

    <Warning>
      Overwriting existing parameters can impact the model configuration. Use with caution.
    </Warning>

    **Overwriting an existing parameter at runtime:**

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "context": { "environments": "production" },
        "extra_params": { "temperature": 0.4 }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          extra_params={"temperature": 0.4}
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        extraParams: { temperature: 0.4 },
      });

      console.log(deployment?.choices[0].message.content);
      ```
    </CodeGroup>
  </Tab>
</Tabs>
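
The two uses of `extra_params` above, adding a parameter and overriding one, can be pictured as a dictionary overlay in which runtime values take precedence. This is a conceptual sketch under that assumption. `effective_parameters` and the sample configured values are hypothetical and do not describe the platform's internal merge logic.

```python
from typing import Any


def effective_parameters(
    configured: dict[str, Any], extra_params: dict[str, Any]
) -> dict[str, Any]:
    """Overlay runtime extra_params on the configured values; runtime wins."""
    return {**configured, **extra_params}


configured = {"temperature": 0.7, "max_tokens": 512}  # hypothetical Variant settings

# A parameter that is not configured is added alongside the existing ones.
added = effective_parameters(configured, {"presence_penalty": 1.0})
# A parameter that is already configured is replaced for this call only.
overridden = effective_parameters(configured, {"temperature": 0.4})

print(added)
print(overridden)
```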

### Attach Files

<Tabs>
  <Tab title="API & SDK" icon="code">
    <Note>
      The `file_ids` / `fileIds` parameter on deployment invocations is deprecated and will be removed in a future release. Use native file attachment instead.
    </Note>

    Two options are available for attaching files to a Deployment:

    1. Send PDFs directly to the model in the invocation payload.
    2. Attach a [Knowledge Base](/docs/ai-studio/ai-engineering/knowledge-bases-memory-stores) to the Deployment.

    **Sending PDFs Directly to the Model**

    <Warning>
      This feature is only supported with OpenAI, Anthropic, and Google Gemini models.
    </Warning>

    Embed files directly in the [Invoke](/reference/deployments/invoke) payload by adding a content part of type `file` to a message. Provide the file as a standard data URI: `data:<media-type>;base64,` followed by the base64-encoded file data (for a PDF, `data:application/pdf;base64,...`).

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "key",
        "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "prompt" },
              {
                "type": "file",
                "file": {
                  "file_data": "data:application/pdf;base64,<base64-encoded-data>"
                }
              }
            ]
          }
        ]
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="deployment_key",
          messages=[
              {
                  "role": "user",
                  "content": [
                      { "type": "text", "text": "prompt" },
                      {
                          "type": "file",
                          "file": {
                              "file_data": "data:application/pdf;base64,<base64-encoded-data>",
                              "filename": "filename.pdf"
                          }
                      }
                  ]
              }
          ],
          metadata={
              "user_id": "123",
              "session_id": "456",
          }
      )
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const generation = await client.deployments.invoke({
        key: 'deployment_key',
        messages: [
          {
            role: 'user',
            content: [
              { type: 'text', text: 'prompt' },
              {
                type: 'file',
                file: {
                  fileData: 'data:application/pdf;base64,<base64-encoded-data>',
                  filename: 'filename.pdf'
                }
              }
            ]
          }
        ],
        metadata: { user_id: '123', session_id: '456' }
      });
      ```
    </CodeGroup>

    <Callout icon="hat-chef" color="#7ecece">
      See PDF inputs used to extract structured data end-to-end. Read our cookbook [PDF Extraction](/docs/tutorials/pdf-extraction).
    </Callout>

    **Knowledge Base vs. Direct File Attachment**

    **Use a Knowledge Base when:** the information is reused across many requests and RAG (targeted chunk retrieval) is sufficient. Knowledge Bases retrieve relevant chunks but not the full document.

    **Use direct file attachment when:** the task requires full-document understanding (e.g. summarization, legal review, detailed analysis), the document is ad-hoc or session-specific, or the data is too sensitive for a shared knowledge repository.

    <Info>
      Read how to set up a [Knowledge Base](/docs/ai-studio/ai-engineering/knowledge-bases-memory-stores) or [use a Knowledge Base in a prompt](/docs/ai-studio/ai-engineering/knowledge-bases-memory-stores#search-a-knowledge-base).
    </Info>
  </Tab>
</Tabs>
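
For the direct PDF attachment described in this section, the data URI has to be built from the file's bytes. The sketch below shows one way to do that with the standard library. `pdf_data_uri` and `file_message` are hypothetical helper names, and the message shape mirrors the examples above. The resulting dictionary can be passed in the `messages` list of an invocation.

```python
import base64
from typing import Any


def pdf_data_uri(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a base64 data URI."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def file_message(prompt: str, pdf_bytes: bytes, filename: str) -> dict[str, Any]:
    """Build a user message with a text part and a file part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "file",
                "file": {
                    "file_data": pdf_data_uri(pdf_bytes),
                    "filename": filename,
                },
            },
        ],
    }


# In practice, read the bytes from disk, e.g. pathlib.Path("report.pdf").read_bytes().
pdf_bytes = b"%PDF-1.4 example"
message = file_message("Summarize this document.", pdf_bytes, "report.pdf")
print(message["content"][1]["file"]["file_data"][:28])
```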

## Analytics and Logs

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Once a Deployment is running and receiving traffic, detailed analytics of all requests are available.

    **Logs** show requests per Variant. Filters available:

    * **Variant**: select a single Variant to filter logs.
    * **Evaluation**: **Matched** (a routing rule was matched) or **Default Matched** (no routing rule matched, default Variant was used).
    * **Source**: **API**, **SDK**, or **Simulator**.

    Click any log line to open a detail panel showing context, requests, and parameters sent to the Deployment.

    <Frame caption="Logs overview.">
      <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/cce77f539b0004784a155a2f329a5cde60cc2652d5c6e4e4b664d1b20a24aaca-Screenshot_2025-03-25_at_13.25.18.png?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=5b89f0d360ceb40a0846bff23192021d" alt="Logs tab for the NPS_functioncall deployment showing five entries for variant 4o using gpt-4o via OpenAI, all with status 200." width="2372" height="816" data-path="images/docs/cce77f539b0004784a155a2f329a5cde60cc2652d5c6e4e4b664d1b20a24aaca-Screenshot_2025-03-25_at_13.25.18.png" />
    </Frame>
  </Tab>
</Tabs>
