
# Deployments

> Ship LLM use cases to production with Orq.ai. Configure model routing, invoke via API or SDK, and monitor in real time.

**Deployments** ship Gen AI use cases to production with **Orq.ai** as an AI Gateway. All calls route through the platform, providing routing, monitoring, and security in one place. Connect with a single line of code, iterate without a code release, and benefit from full observability throughout.

Common use cases include customer support bots, RAG-powered document Q\&A, content generation pipelines, and any LLM feature that needs reliable model routing, versioning, and production monitoring.

<CardGroup cols={3}>
  <Card title="Create" icon="rocket" href="#create-a-deployment">
    Set up a Deployment with a key, model, and system prompt in AI Studio or via MCP.
  </Card>

  <Card title="Configure" icon="sliders" href="#configure-a-variant">
    Set the model, fallbacks, variables, knowledge base, tools, caching, and guardrails per Variant.
  </Card>

  <Card title="Routing" icon="code-fork" href="#routing">
    Route traffic across Variants by environment, context attributes, or percentage split.
  </Card>

  <Card title="Versioning" icon="code-branch" href="#versioning">
    Deploy and roll back configurations without a code release.
  </Card>

  <Card title="Invoke" icon="code" href="#invoke-a-deployment">
    Call a Deployment via API or SDK and pass identity, usage tracking, and extra parameters.
  </Card>

  <Card title="Analytics" icon="chart-line" href="#analytics-and-logs">
    Monitor requests, filter logs by Variant, and inspect full request details.
  </Card>
</CardGroup>

## Create a Deployment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    <Steps>
      <Step title="Open the AI Studio">
        Choose a [Project](/docs/projects/overview) and folder, then select the `+` button.
      </Step>

      <Step title="Choose Deployment">
        Select **Deployment** from the entity picker.

        <Frame caption="Configure the deployment key and primary model. All settings can be changed later.">
          <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/e01c6d159a6f92fbb93734e5fb0dc220e1a6723eb92d1457eec46f8e55e78adc-iScreen_Shoter_-_Google_Chrome_-_250307122502.jpg?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=ead029bf4f4bae9ff7fde53b77ce1dd7" alt="Create Deployment dialog with fields for Deployment Key set to key123 and Model set to claude-3-7-sonnet-20250219." width="1068" height="607" data-path="images/docs/e01c6d159a6f92fbb93734e5fb0dc220e1a6723eb92d1457eec46f8e55e78adc-iScreen_Shoter_-_Google_Chrome_-_250307122502.jpg" />
        </Frame>
      </Step>

      <Step title="Configure the initial Variant">
        Set the deployment key (alphanumeric) and select the primary model for the first Variant. The Variant editor opens.
      </Step>
    </Steps>
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    Use the [Orq MCP server](/docs/integrations/code-assistants/mcp) to manage deployments directly from an AI code assistant.

    **Find an existing deployment:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Search for the "support-bot" deployment in my workspace
    ```

    The assistant uses `search_entities` with `type: "deployment"` to locate deployments by name or key.

    ***

    **Retrieve deployment configuration:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Get the full configuration of the "support-bot" deployment
    ```

    The assistant uses `get_deployment` to return the key, description, model, messages, and variant settings.

    ***

    **Create a deployment:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create a customer support deployment called "support-bot" in the Default/Deployments project. Use GPT-4o with a professional, concise system prompt.
    ```

    The assistant uses `create_deployment` with the specified `key`, `path`, and `variant` (model and messages). Use `list_models` first to find valid model IDs.
  </Tab>
</Tabs>

## Configure a Variant

**Variants** are alternative prompt and model configurations served behind a single Deployment. A Deployment can hold any number of Variants.

When a Deployment is created, the **Variant** screen opens so the model and prompt can be configured.

<Info>
  A Variant Prompt is similar to any other prompt. To learn how to configure a Prompt, see [Creating a Prompt](/docs/prompts/creating).
</Info>

### Primary Model, Retries, and Fallback

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    The **Primary Model** panel defines the first model queried through this Variant.

    **Retries**

    In case of failure, configure how many times a query is retried with this model.

    <Info>
      Retries are only triggered when a retry count greater than 0 is configured in the Variant settings.

      When retries are enabled, **Orq.ai** automatically retries the model provider API call if it returns one of the following HTTP status codes:

      * 429 Rate Limit Exceeded
      * 500 Internal Server Error
      * 501 Not Implemented
      * 502 Bad Gateway
      * 503 Service Unavailable
    </Info>

    **Error handling flow:**

    1. If an error code above is returned and retries are configured (retry count > 0), **Orq.ai** retries the Primary Model.
    2. If all retry attempts fail (or no retries are configured) and a Fallback Model is configured, **Orq.ai** routes the request to the Fallback Model.
    3. If the Fallback Model also fails, the error is returned to the calling application.
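    The flow above can be modelled as a small routing loop. This is a minimal sketch for illustration, not Orq.ai's implementation. Each model is a hypothetical callable returning an HTTP status and a body. Each entry in `chain` carries its own retry count, which is an assumption. A non-retryable error skips straight to the next model, a case the steps above do not cover.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from typing import Callable, Sequence, Tuple

    # Status codes listed above as triggering a retry.
    RETRYABLE = {429, 500, 501, 502, 503}

    # Hypothetical stand-in for a provider call: returns (HTTP status, body).
    ModelCall = Callable[[str], Tuple[int, str]]


    def invoke_with_fallbacks(
        prompt: str, chain: Sequence[Tuple[ModelCall, int]]
    ) -> Tuple[int, str]:
        """Try (model, retry_count) pairs in order: primary first, then fallbacks."""
        if not chain:
            raise ValueError("at least a primary model is required")
        last: Tuple[int, str] = (0, "")
        for call, retry_count in chain:
            # One initial attempt plus up to `retry_count` retries on this model.
            for _ in range(retry_count + 1):
                last = call(prompt)
                if last[0] < 400:
                    return last  # success: skip remaining retries and fallbacks
                if last[0] not in RETRYABLE:
                    break  # assumption: a non-retryable error moves to the next model
        # Every model failed: return the last error to the caller.
        return last
    ```

    For example, `invoke_with_fallbacks(prompt, [(primary, 2), (fallback, 0)])` makes at most three attempts on `primary` before trying `fallback` once.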

    **Fallback Model**

    The Fallback Model is triggered only if the Primary Model fails after all configured retries are exhausted. Fallback Models can have a different configuration from the Primary Model.

    <Frame caption="The Fallback Model configuration is right below the main model configuration. Configure them independently.">
      <img src="https://mintcdn.com/orqai/HI0EZ1zMxSbxMGnn/images/fallback-model.png?fit=max&auto=format&n=HI0EZ1zMxSbxMGnn&q=85&s=80f6fbef01891992fb00345d745b998d" alt="Primary Model section showing claude-opus-4-20250514 with Fallback Models configured to gpt-5.2 with reasoning effort, verbosity, and response format settings." width="626" height="452" data-path="images/fallback-model.png" />
    </Frame>

    <Tip>
      Multiple fallback models can be configured in a Deployment. They fall back to one another in order of configuration. Use the **Add extra fallback** button to declare another model.
    </Tip>

    <Callout icon="hat-chef" color="#7ecece">
      See how fallbacks and retries work together in a production system. Read our cookbook [Customer Support Chat](/docs/tutorials/buildingcustomersupportchatwithaigateway).
    </Callout>
  </Tab>
</Tabs>

**API invocation behavior**

When invoking a Deployment via the API, response timing depends on the retry and fallback configuration:

* **Success on first try**: Response returned immediately.
* **Retry scenario**: Response may be delayed by up to `base_latency × (retry_count + 1)` to account for the initial attempt plus all configured retries.
* **Fallback invoked**: Additional latency as the Fallback Model processes the request.
* **All retries and fallback failed**: Error returned to the calling application.

Set client-side timeouts long enough to cover the worst-case retry and fallback latency.
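To turn the estimate above into a number, a client-side timeout can be sized for the worst case. In that case, every attempt on the Primary Model fails and each Fallback Model then runs once. This is a rough sketch. The latency inputs and the 20% safety margin are hypothetical values that should be replaced with measurements from your own traffic.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from typing import Sequence


def timeout_budget(
    primary_latency_s: float,
    retry_count: int,
    fallback_latencies_s: Sequence[float],
    margin: float = 1.2,
) -> float:
    """Rough worst-case seconds to wait for a Deployment response."""
    # Initial attempt plus every configured retry on the Primary Model.
    primary_worst = primary_latency_s * (retry_count + 1)
    # Assumption: each Fallback Model is attempted once, in order.
    fallback_worst = sum(fallback_latencies_s)
    # Pad the total to absorb network jitter and provider variance.
    return (primary_worst + fallback_worst) * margin
```

Consider a 2-second primary model with `retry_count=2` and one 3-second fallback. The budget is (2 × 3 + 3) × 1.2, or about 10.8 seconds.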

### Structured Outputs

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Configure **structured outputs** to ensure consistent and reliable responses from a Deployment. Structured outputs specify the exact format the model should follow when generating a response.

    Two modes are available:

    * **JSON Mode**: the model automatically returns a valid JSON object for every generation.
    * **JSON Schema**: define a schema that explicitly describes the fields, types, and structure of the model output.

    Once defined, a schema can be saved to the directory for reuse across multiple variants or deployments.

    <Frame caption="Set the Response Format to JSON Schema to configure structured output.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-json-schema-configuration.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=f0cef30df85bffdfe6209f96a911e1dd" alt="Primary Model settings with Response Format set to JSON Schema, showing a schema selector dropdown with get_weather and json_p3ft options." width="591" height="304" data-path="images/deployment-json-schema-configuration.png" />
    </Frame>
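    As an illustration, a JSON Schema for a weather lookup might look like the dictionary below. It is loosely modelled on the `get_weather` entry in the screenshot above, and its field names are hypothetical. The helper is a deliberately small, stdlib-only check of required fields and top-level types in a model reply. A dedicated JSON Schema validator is the better choice for full validation.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import json
    from typing import Any, Dict

    # Hypothetical schema: field names are illustrative only.
    WEATHER_SCHEMA: Dict[str, Any] = {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "temperature_c": {"type": "number"},
            "conditions": {"type": "string"},
        },
        "required": ["city", "temperature_c"],
        "additionalProperties": False,
    }

    # Map JSON Schema primitive type names to Python types.
    _PY_TYPES = {
        "string": str,
        "number": (int, float),
        "integer": int,
        "boolean": bool,
        "object": dict,
        "array": list,
    }


    def check_response(raw: str, schema: Dict[str, Any]) -> Dict[str, Any]:
        """Parse a model reply and check required keys and top-level types."""
        data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        for key in schema.get("required", []):
            if key not in data:
                raise ValueError(f"missing required field: {key}")
        props = schema.get("properties", {})
        for key, value in data.items():
            spec = props.get(key)
            if spec is None:
                if schema.get("additionalProperties") is False:
                    raise ValueError(f"unexpected field: {key}")
                continue
            # bool is a subclass of int in Python, so reject it for numeric types.
            if isinstance(value, bool) and spec["type"] in ("number", "integer"):
                raise ValueError(f"wrong type for {key}")
            if not isinstance(value, _PY_TYPES[spec["type"]]):
                raise ValueError(f"wrong type for {key}")
        return data
    ```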
  </Tab>
</Tabs>

### Variables and Prompt Templating

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Reference dynamic values in the prompt using double braces: `{{variable_name}}`. Pass a key-value map to the `inputs` field when invoking, and **Orq.ai** substitutes each variable before sending the prompt to the model.

    **Orq.ai** supports three template engines. Select the **Template Engine** from the Variant Settings panel:

    * **Text** (default): variables use `{{double_braces}}` syntax.
    * **Jinja**: full templating with conditionals, loops, filters, and more.
    * **Mustache**: logic-less templating with sections.

    <Frame caption="Select a Template Engine in the Variant Settings panel.">
      <img src="https://mintcdn.com/orqai/NVptt7X89f0374MS/images/template-engine.png?fit=max&auto=format&n=NVptt7X89f0374MS&q=85&s=91e6906e1ad39dab6684a3e2080b74db" alt="Template Engine dropdown with Text currently selected and options for Jinja and Mustache." width="649" height="254" data-path="images/template-engine.png" />
    </Frame>
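    For the default **Text** engine, the substitution step can be sketched as a single regular-expression pass over the prompt. This is an illustration, not Orq.ai's template engine. Leaving unknown placeholders untouched is an assumption made here.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import re
    from typing import Any, Mapping

    # Matches {{name}} where name is a simple identifier.
    _PLACEHOLDER = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")


    def render_text_template(template: str, inputs: Mapping[str, Any]) -> str:
        """Replace each {{name}} with str(inputs[name]); leave unknown names as-is."""

        def substitute(match: "re.Match") -> str:
            name = match.group(1)
            return str(inputs[name]) if name in inputs else match.group(0)

        return _PLACEHOLDER.sub(substitute, template)
    ```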

    **Example: support bot that adapts by subscription tier**

    <AccordionGroup>
      <Accordion title="Jinja example" icon="code">
        <Steps>
          <Step title="Prompt template">
            ```jinja Jinja theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for {{company_name}}.

            {% if user_tier == "premium" %}
            {{customer_name}} is a premium customer. Greet them by name and let them know they have priority support with a 2-hour response SLA.
            {% else %}
            {{customer_name}} is on the free plan. Let them know the standard response time is 24 hours.
            {% endif %}
            ```
          </Step>

          <Step title="Template in the Studio">
            <Frame>
              <img src="https://mintcdn.com/orqai/HVm7-3vBg7cwVv2-/images/jinja-studio.png?fit=max&auto=format&n=HVm7-3vBg7cwVv2-&q=85&s=8762ea2c3ebdcb5f539ec314a09fad8f" alt="System prompt in the Studio editor showing a Jinja template with if/else blocks for premium and free tier customers using is_premium, customer_name, and company_name variables." width="819" height="337" data-path="images/jinja-studio.png" />
            </Frame>
          </Step>

          <Step title="Call the deployment">
            <CodeGroup>
              ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
              response = client.deployments.invoke(
                  key="support-bot",
                  inputs={
                      "company_name": "Acme",
                      "customer_name": "Sarah",
                      "user_tier": "premium",
                  }
              )
              ```

              ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
              const response = await client.deployments.invoke({
                key: "support-bot",
                inputs: {
                  company_name: "Acme",
                  customer_name: "Sarah",
                  user_tier: "premium",
                },
              });
              ```
            </CodeGroup>
          </Step>

          <Step title="Trace">
            <Frame>
              <img src="https://mintcdn.com/orqai/HVm7-3vBg7cwVv2-/images/jinja-studio-trace.png?fit=max&auto=format&n=HVm7-3vBg7cwVv2-&q=85&s=652b936523154b0ddf2f59d86bcc2cda" alt="Trace view showing a rendered Jinja template for gpt-3.5-turbo with company_name set to Acme, customer_name to Sarah, and is_premium to true, generating a priority support greeting." width="1019" height="748" data-path="images/jinja-studio-trace.png" />
            </Frame>
          </Step>
        </Steps>
      </Accordion>

      <Accordion title="Mustache example" icon="code">
        <Steps>
          <Step title="Prompt template">
            ```handlebars Mustache theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for {{company_name}}.

            {{! Pass is_premium: true for premium customers, false for free plan }}
            {{# is_premium}}
            {{customer_name}} is a premium customer. Greet them by name with priority support and a 2-hour SLA.
            {{/ is_premium}}
            {{^ is_premium}}
            {{customer_name}} is on the free plan. Standard response time is 24 hours.
            {{/ is_premium}}
            ```
          </Step>

          <Step title="Template in the Studio">
            <Frame>
              <img src="https://mintcdn.com/orqai/vUxywKg0A2hpKhUw/images/mustache-studio.png?fit=max&auto=format&n=vUxywKg0A2hpKhUw&q=85&s=d55d988a9c8bf51003a165c85f70f618" alt="System prompt in the Studio editor showing a Mustache template with {{#is_premium}} and {{^is_premium}} sections for premium and free plan customers." width="1075" height="550" data-path="images/mustache-studio.png" />
            </Frame>
          </Step>

          <Step title="Call the deployment">
            <CodeGroup>
              ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
              response = client.deployments.invoke(
                  key="support-bot",
                  inputs={
                      "company_name": "Acme",
                      "customer_name": "Sarah",
                      "is_premium": True,
                  }
              )
              ```

              ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
              const response = await client.deployments.invoke({
                key: "support-bot",
                inputs: {
                  company_name: "Acme",
                  customer_name: "Sarah",
                  is_premium: true,
                },
              });
              ```
            </CodeGroup>
          </Step>

          <Step title="Trace">
            <Frame>
              <img src="https://mintcdn.com/orqai/vUxywKg0A2hpKhUw/images/mustache-studio-trace.png?fit=max&auto=format&n=vUxywKg0A2hpKhUw&q=85&s=0fe50b725aa551f9f6b431808f3ece04" alt="Trace view showing a rendered Mustache template for gpt-3.5-turbo with company_name set to Acme, customer_name to Sarah, and is_premium to true, with the assistant greeting Sarah as a premium customer." width="1018" height="671" data-path="images/mustache-studio-trace.png" />
            </Frame>
          </Step>
        </Steps>
      </Accordion>
    </AccordionGroup>
  </Tab>

  <Tab title="API & SDK" icon="code">
    Add `{{variable_name}}` placeholders to the prompt and pass the corresponding values in the `inputs` field at invoke time. **Orq.ai** substitutes each key before sending the prompt to the model.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
        --url https://api.orq.ai/v2/deployments/invoke \
        --header 'Authorization: Bearer <ORQ_API_KEY>' \
        --header 'Content-Type: application/json' \
        --data '{
          "key": "my-deployment",
          "context": {"environments": "production"},
          "inputs": {
            "customer_name": "John Smith",
            "user_tier": "premium"
          }
        }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          inputs={
              "customer_name": "John Smith",
              "user_tier": "premium",
          },
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const generation = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        inputs: {
          customer_name: 'John Smith',
          user_tier: 'premium',
        },
      });

      console.log(generation.choices[0].message.content);
      ```
    </CodeGroup>
  </Tab>
</Tabs>

<Info>
  For a complete reference of all template features including filters, macros, nested objects, and more, see [Prompt Templating](/docs/prompts/templating).
</Info>

<Info>
  To prevent sensitive input values from appearing in traces and logs, see [Security and Privacy](/docs/deployments/creating#security-and-privacy).
</Info>

### Knowledge Base

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Ground a Deployment's responses in domain-specific knowledge by adding a [Knowledge Base](/docs/knowledge/overview).

    Open the deployment configuration, go to **Knowledge Bases**, then select <Icon icon="circle-plus" /> **Knowledge Base**.

    <Info>
      Knowledge Bases enable RAG (Retrieval-Augmented Generation), allowing the model to retrieve and use relevant information from documentation or data sources to provide more accurate and contextual responses.
    </Info>

    **Configuration options** (via the `...` menu on an attached Knowledge Base):

    * **Last User Message**: the user's latest message is automatically used as a query to retrieve relevant chunks.
    * **Query**: a predefined query is used to retrieve chunks. Use Input Variables like `{{query}}` to make it dynamic at runtime.
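    The two query modes can be sketched as follows. The function, its mode names, and its arguments are hypothetical. They only show which text would be sent to retrieval in each mode. The `{{query}}` placeholder is filled with a plain string replacement.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from typing import Dict, List, Mapping, Optional


    def retrieval_query(
        mode: str,
        messages: List[Dict[str, str]],
        query_template: str = "",
        inputs: Optional[Mapping[str, str]] = None,
    ) -> str:
        """Return the text used to search the Knowledge Base (illustrative only)."""
        if mode == "last_user_message":
            # Walk backwards to find the most recent user turn.
            for message in reversed(messages):
                if message["role"] == "user":
                    return message["content"]
            raise ValueError("no user message to use as a query")
        if mode == "query":
            # Fill {{name}} placeholders in the predefined query from the inputs.
            query = query_template
            for name, value in (inputs or {}).items():
                query = query.replace("{{" + name + "}}", value)
            return query
        raise ValueError(f"unknown mode: {mode}")
    ```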

    <Frame caption="Configure which Knowledge Base to use and how it should be queried in the deployment.">
      <img src="https://mintcdn.com/orqai/mkK-RgpJxyAg_Wxr/images/knowledge-deployment-config.png?fit=max&auto=format&n=mkK-RgpJxyAg_Wxr&q=85&s=2e16bf5933b1dac3c9d6865a60cb3148" alt="Edit Knowledge Base dialog with Knowledge Base set to knowledge and Type set to Last User Message." width="566" height="354" data-path="images/knowledge-deployment-config.png" />
    </Frame>

    <Info>
      To learn more about creating and configuring Knowledge Bases, see [Knowledge Bases](/docs/knowledge/overview).
    </Info>

    Reference the Knowledge Base in the prompt using the `{{knowledge_base_key}}` syntax, where `knowledge_base_key` is the identifier of the Knowledge Base. If the Knowledge Base is not explicitly referenced in the prompt, retrieved chunks are automatically appended to the end of the system message.
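    Here is a minimal sketch of that placement rule. It assumes retrieved chunks are joined with blank lines, since the exact formatting Orq.ai uses is not documented here.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from typing import List


    def inject_chunks(system_message: str, kb_key: str, chunks: List[str]) -> str:
        """Place retrieved chunks where {{kb_key}} appears, else append them."""
        context = "\n\n".join(chunks)
        placeholder = "{{" + kb_key + "}}"
        if placeholder in system_message:
            return system_message.replace(placeholder, context)
        # Not referenced in the prompt: append to the end of the system message.
        return system_message + "\n\n" + context
    ```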

    <Frame caption="Using a Knowledge Base in a prompt.">
      <img src="https://mintcdn.com/orqai/ffoLsHE_4rLFpEJ6/images/knowledge-deployment-use.png?fit=max&auto=format&n=ffoLsHE_4rLFpEJ6&q=85&s=628a972376853efaf126474f91fbf783" alt="Deployment settings showing a Knowledge Base named knowledge in the settings panel, with the {knowledge} variable highlighted in the system prompt." width="1454" height="761" data-path="images/knowledge-deployment-use.png" />
    </Frame>

    <Callout icon="hat-chef" color="#7ecece">
      See knowledge base retrieval used end-to-end in a working deployment. Read our cookbook [Multilingual FAQ Bot](/docs/tutorials/multilingual-faq-bot).
    </Callout>
  </Tab>

  <Tab title="API & SDK" icon="code">
    When invoking a Deployment that uses a Knowledge Base, set `include_retrievals: true` in `invoke_options` to embed the retrieval chunks in the response.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --location 'https://api.orq.ai/v2/deployments/invoke' \
      --header 'Content-Type: application/json' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer <ORQ_API_KEY>' \
      --data '{
          "key": "deployment_key",
          "messages": [
              {
                  "role": "user",
                  "content": "What was total revenue last year?"
              }
          ],
          "invoke_options": {
              "include_retrievals": true
          }
      }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="deployment_key",
          messages=[
              {
                  "role": "user",
                  "content": "What was total revenue last year?",
              }
          ],
          invoke_options={"include_retrievals": True},
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const generation = await client.deployments.invoke({
        key: "deployment_key",
        messages: [
          {
            role: "user",
            content: "What was total revenue last year?",
          },
        ],
        invokeOptions: { includeRetrievals: true },
      });

      console.log(generation.choices[0].message.content);
      ```
    </CodeGroup>

    Retrievals are returned in the `retrievals` field of the response. Each chunk includes source details and scores:

    <Frame caption="Retrieval results are returned in the retrievals field, with document details, file metadata, and search scores for each chunk.">
      <img src="https://mintcdn.com/orqai/x_6IXnot9ETOc_0g/images/docs/618f20adfcf1bf3fe7f8bb74d695ea9376ee7530445fe689d0de294de2b6e1a7-690fef20-7a3a-4b17-a444-e46decc3958f.png?fit=max&auto=format&n=x_6IXnot9ETOc_0g&q=85&s=6a96fefaab8b58ee2d90682a07619216" alt="API response in a REST client showing the retrievals array with document metadata including file names, file type, page number, and search score for apple_annual_report_2023.pdf." width="4096" height="1878" data-path="images/docs/618f20adfcf1bf3fe7f8bb74d695ea9376ee7530445fe689d0de294de2b6e1a7-690fef20-7a3a-4b17-a444-e46decc3958f.png" />
    </Frame>

    ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
    {
        "retrievals": [
            {
                "document": "<chunk_data>",
                "metadata": {
                    "file_name": "<filename>",
                    "file_type": "application/pdf",
                    "page_number": 24,
                    "search_score": 0.7886787056922913,
                    "rerank_score": 0.19868536
                }
            }
        ]
    }
    ```
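    Once the response is parsed into a dictionary, the chunks can be filtered and ranked on the client. This sketch uses the field names from the example payload above. The 0.5 threshold and the choice to sort by `rerank_score` are arbitrary illustrations.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from typing import Any, Dict, List


    def top_chunks(
        response: Dict[str, Any], min_search_score: float = 0.5, limit: int = 3
    ) -> List[Dict[str, Any]]:
        """Keep chunks above a search-score threshold, best reranked first."""
        chunks = [
            r
            for r in response.get("retrievals", [])
            if r["metadata"].get("search_score", 0.0) >= min_search_score
        ]
        # Chunks without a rerank score sort last.
        chunks.sort(
            key=lambda r: r["metadata"].get("rerank_score", float("-inf")),
            reverse=True,
        )
        return chunks[:limit]
    ```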

    <Callout icon="hat-chef" color="#7ecece">
      See knowledge base retrievals wired into a complete application. Read our cookbook [Multilingual FAQ Bot](/docs/tutorials/multilingual-faq-bot).
    </Callout>
  </Tab>
</Tabs>

### Tools

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Tools can only be added and configured at the **deployment** level. Only **Function tools** are supported in Deployments, so the model can request calls to external functions during execution.

    To add a Function tool, open the **Tools** tab in the deployment configuration and click <Icon icon="circle-plus" /> **Tool**:

    * **Create a new Tool**: define a custom function directly within the deployment.
    * **Import an existing Tool**: select a previously created Function tool from the resource library.

    <Frame caption="Configure function tools the model can call during deployment execution.">
      <img src="https://mintcdn.com/orqai/R7t6xmeUretxbcfc/images/tools-deployment-config.png?fit=max&auto=format&n=R7t6xmeUretxbcfc&q=85&s=0926d22591e9ee13d85f05a37ed6fb5a" alt="Tools section with a CurrentDate tool listed and an Add Tool button." width="557" height="150" data-path="images/tools-deployment-config.png" />
    </Frame>
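    A Function tool is typically described by a name, a description, and a JSON Schema for its parameters. The application runs the function when the model requests it. The sketch below assumes the OpenAI-style `{"type": "function", ...}` shape and a hypothetical `current_date` function, similar to the CurrentDate tool shown above. Check the tool editor for the exact format Orq.ai expects.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import json
    from datetime import date
    from typing import Any, Callable, Dict

    # Hypothetical tool definition in the OpenAI-style function-calling shape.
    CURRENT_DATE_TOOL: Dict[str, Any] = {
        "type": "function",
        "function": {
            "name": "current_date",
            "description": "Return today's date in ISO 8601 format.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    }

    # Local implementations the application runs when the model asks for a tool.
    HANDLERS: Dict[str, Callable[..., Any]] = {
        "current_date": lambda: date.today().isoformat(),
    }


    def run_tool_call(name: str, arguments_json: str) -> str:
        """Execute a requested tool call and return its result as a JSON string."""
        if name not in HANDLERS:
            raise KeyError(f"no handler for tool: {name}")
        args = json.loads(arguments_json or "{}")
        return json.dumps({"result": HANDLERS[name](**args)})
    ```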

    <Info>
      To learn more about creating Function tools, see [Creating Tools](/docs/tools/overview).
    </Info>
  </Tab>
</Tabs>

### Cache

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Variant generation can be cached to reduce processing time and cost. When an input is received that matches a cached entry within the Variant, the stored response is returned directly without triggering a new generation.

    To enable caching, open the **Variant Settings** tab and select **Enabled** in the Caching section. The cache can be manually invalidated at any time by clicking the configuration icon.

    <Frame caption="Configure the cache expiration time.">
      <img src="https://mintcdn.com/orqai/vOJWhuVD9oBrDhcx/images/deployment-variant-caching.png?fit=max&auto=format&n=vOJWhuVD9oBrDhcx&q=85&s=19274aa9dca644973c478d57f446605b" alt="Cache settings with the Enabled toggle on and an Expires in dropdown open, showing options from 1 hour to 2 weeks." width="554" height="469" data-path="images/deployment-variant-caching.png" />
    </Frame>

    **TTL (time to live)** corresponds to the amount of time a cached response is stored before being invalidated. Once invalidated, a new LLM generation is triggered. Configure the TTL from the drop-down once Caching is enabled.

    <Info>
      The cache only returns a stored response when the input is an exact match. Caching is not supported for image models.
    </Info>
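    The exact-match lookup with a TTL described above can be sketched as a small in-memory cache. This illustrates the concept and is not Orq.ai's cache. The key is a SHA-256 hash of the request serialized with sorted keys, so any difference in the inputs or messages produces a miss. The injectable `clock` exists only to make the sketch testable.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import hashlib
    import json
    import time
    from typing import Any, Callable, Dict, Optional, Tuple


    class ExactMatchCache:
        """In-memory cache keyed on the exact request, with a time-to-live."""

        def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
            self.ttl = ttl_seconds
            self.clock = clock
            self._entries: Dict[str, Tuple[float, Any]] = {}

        @staticmethod
        def key(request: Dict[str, Any]) -> str:
            # sort_keys makes key order irrelevant; any value change alters the hash.
            canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
            return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

        def get(self, request: Dict[str, Any]) -> Optional[Any]:
            k = self.key(request)
            entry = self._entries.get(k)
            if entry is None:
                return None
            stored_at, response = entry
            if self.clock() - stored_at >= self.ttl:
                # Expired: drop it so the next call triggers a fresh generation.
                del self._entries[k]
                return None
            return response

        def put(self, request: Dict[str, Any], response: Any) -> None:
            self._entries[self.key(request)] = (self.clock(), response)

        def invalidate(self) -> None:
            # Mirrors manual invalidation: clear every entry.
            self._entries.clear()
    ```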
  </Tab>
</Tabs>

### Evaluators and Guardrails

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    [Evaluators](/docs/evaluators/build) and Guardrails are configured as separate sections in the Variant settings. Both hook into the generation pipeline, but they behave differently: Guardrails can block a generation, while Evaluators never do.

    <Frame caption="Guardrails execute synchronously and can block a generation, while Evaluators run asynchronously and never block the response.">
      <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/a6be2fe5b5e1290b3d2132212a9ec6e74287d7b6cb896d86f8f4838b5a9bcf73-Guardrails_and_Evaluators_-_Deployment.png?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=1c40c78aac37b6819e2ab2bfceaba12e" alt="Flow diagram showing a user query passing through Input Guardrails synchronously, then Deployment Model Generation, then Output Guardrails, with Input and Output Evaluators running asynchronously and fail paths returning an Error Response." width="3278" height="1779" data-path="images/docs/a6be2fe5b5e1290b3d2132212a9ec6e74287d7b6cb896d86f8f4838b5a9bcf73-Guardrails_and_Evaluators_-_Deployment.png" />
    </Frame>

    **Evaluators**

    Click <Icon icon="circle-plus" /> **Evaluator** to add an evaluator from the Library. Configure each evaluator as:

    * <Icon icon="arrow-up-right" color="#22c55e" /> **Input evaluator**: runs evaluation on the input sent to the model.
    * <Icon icon="arrow-down-left" color="#ef4444" /> **Output evaluator**: runs evaluation on the output generated by the model.

    Evaluators run **asynchronously** and never block the response.

    <Frame caption="Configure a Sample Rate (0–100%) on each evaluator to control how frequently it runs.">
      <img src="https://mintcdn.com/orqai/4f-ka8j82TWkynBc/images/evaluator-variant.png?fit=max&auto=format&n=4f-ka8j82TWkynBc&q=85&s=2b05a024b244f5fc74171dfe78e2b71c" alt="Guardrails section listing input_contains_pii and output_toxicity, and Evaluators section showing HTTP Evaluator at 15%, with a Sample Rate popover displaying 15%." width="589" height="357" data-path="images/evaluator-variant.png" />
    </Frame>
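    A Sample Rate can be thought of as a per-request coin flip. This sketch only illustrates the idea; Orq.ai's actual sampling method is not described here. Passing a seeded `random.Random` makes the result reproducible.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    import random
    from typing import Optional


    def should_run_evaluator(
        sample_rate_pct: float, rng: Optional[random.Random] = None
    ) -> bool:
        """Return True for roughly `sample_rate_pct` percent of requests."""
        if not 0 <= sample_rate_pct <= 100:
            raise ValueError("sample rate must be between 0 and 100")
        rng = rng or random.Random()
        # random() is in [0, 1), so 0% never runs and 100% always runs.
        return rng.random() < sample_rate_pct / 100
    ```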

    **Guardrails**

    Click <Icon icon="circle-plus" /> **Guardrail** to add a guardrail-capable evaluator from the Library.

    A Guardrail runs **synchronously** and will **deny** the generation if its evaluation fails, returning an error to the user. Guardrails can be configured as:

    * <Icon icon="arrow-up-right" color="#22c55e" /> **Input Guardrail**: runs **before** the input is sent to the model.
    * <Icon icon="arrow-down-left" color="#ef4444" /> **Output Guardrail**: runs **after** generation, before the response is returned to the client.

    **Guardrail behavior when a guardrail fails:**

    | Behavior     | Description                                                                                                         |
    | ------------ | ------------------------------------------------------------------------------------------------------------------- |
    | **Retry**    | Triggers a new generation attempt. Use this when a transient or non-deterministic failure may resolve on retry.     |
    | **Fallback** | Executes the fallback model configured on the Deployment. Use this for a safe default response instead of retrying. |

    Guardrail behavior is configured per Deployment and applies to all guardrails attached to it.
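    The synchronous guardrail flow and the two failure behaviors can be sketched as below. Every name here is hypothetical. Guardrails are modelled as predicates that return `True` when the check passes. A failing input guardrail is assumed to deny the request directly, since regenerating from the same input would not change it. `max_retries=1` is an arbitrary choice because the number of retry attempts is not specified above.

    ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
    from typing import Callable, List

    Check = Callable[[str], bool]  # returns True when the guardrail passes


    class GuardrailError(Exception):
        """Raised when a guardrail denies the generation."""


    def guarded_generate(
        user_input: str,
        generate: Callable[[str], str],
        fallback_generate: Callable[[str], str],
        input_guardrails: List[Check],
        output_guardrails: List[Check],
        behavior: str = "retry",  # "retry" or "fallback"
        max_retries: int = 1,
    ) -> str:
        # Input guardrails run before the model is called.
        if not all(check(user_input) for check in input_guardrails):
            raise GuardrailError("input guardrail failed")

        output = generate(user_input)
        attempts = 0
        # Output guardrails run after generation, before the response is returned.
        while not all(check(output) for check in output_guardrails):
            if behavior == "retry" and attempts < max_retries:
                attempts += 1
                output = generate(user_input)  # new generation attempt
            elif behavior == "fallback" and attempts == 0:
                attempts += 1
                output = fallback_generate(user_input)  # fallback model
            else:
                raise GuardrailError("output guardrail failed")
        return output
    ```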

    <Warning>
      **Output Guardrails and Streaming**: When a deployment is invoked with streaming enabled, output guardrails are disabled because they cannot be evaluated reliably on partial chunks.
    </Warning>

    <Callout icon="hat-chef" color="#7ecece">
      See guardrails put to the test against adversarial inputs. Read our cookbook [Red Teaming](/docs/tutorials/red-teaming).
    </Callout>
  </Tab>
</Tabs>

### Security and Privacy

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    **Input Masking**

    Inputs in a Variant can be flagged as PII (Personally Identifiable Information). This is recommended when processing sensitive user data such as names, email addresses, or phone numbers.

    To configure this, open the **Security** tab when editing an input and choose **Personally Identifiable Information (PII)** from the Privacy drop-down.

    <Frame caption="Once deployed, the input value will not be logged within Orq systems.">
      <img src="https://mintcdn.com/orqai/R7t6xmeUretxbcfc/images/variable-pii.png?fit=max&auto=format&n=R7t6xmeUretxbcfc&q=85&s=7b66c02cebefb3ccb499e95410e5cc34" alt="Variables section with a Question variable and a privacy dropdown showing None and Personal Identifiable Information (PII) options." width="758" height="194" data-path="images/variable-pii.png" />
    </Frame>

    Flagging an input as PII removes its values from logs and traces. When opening a log or trace, the input is shown in red to indicate it was not logged. The API response itself still includes the PII value.

    <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/955b175e6f005d7f112a98e54d3468acb0b72f001a150ced0ac4262128321179-iScreen_Shoter_-_Google_Chrome_-_250317122231.jpg?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=700d1089a333f51107572e98bb4a0a05" alt="Trace detail for gpt-4o showing a user message say hello to {name} and the assistant reply Hello, [name]! How are you today?" width="1113" height="823" data-path="images/docs/955b175e6f005d7f112a98e54d3468acb0b72f001a150ced0ac4262128321179-iScreen_Shoter_-_Google_Chrome_-_250317122231.jpg" />

    <Note>The API response still includes the PII value, but **Orq.ai** does not store it in input or output logs and traces.</Note>

    **Output Masking**

    Enable output masking to hide generated outputs from logs and traces. Open the **Security** tab in the Variant and enable the **Output masking** toggle.

    <img src="https://mintcdn.com/orqai/55N7ogp78VJHeSpN/images/variant-output-masking.png?fit=max&auto=format&n=55N7ogp78VJHeSpN&q=85&s=2c5239c4d02963abfc4b9a71ae3af464" alt="Variables section with city and date variables, and a Masking section with the Output Masking toggle enabled." width="545" height="259" data-path="images/variant-output-masking.png" />

    When Output Masking is enabled, logs and traces will not store the generated response.

    <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/d8df910b899ccafe149c36a4119f7ab8e3c912ee79f48bade9176f05282e5aa3-Screenshot_2024-10-11_at_15.59.17.png?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=d6f7ee856c1146038e967f8fa4911f0a" alt="A masked output field with a striped pattern and a tooltip reading The response from the model was masked due to your deployment settings." width="2244" height="402" data-path="images/docs/d8df910b899ccafe149c36a4119f7ab8e3c912ee79f48bade9176f05282e5aa3-Screenshot_2024-10-11_at_15.59.17.png" />
  </Tab>
</Tabs>
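
To make the effect of masking concrete, the sketch below builds the record a log store would keep for a single call. It is a hypothetical illustration only: `build_log_record` and the `[masked]` placeholder are not part of the Orq.ai API, and the platform hides masked values in its own way (for example, PII inputs are shown in red). What it shows is that the caller still receives the full response while the stored record omits the masked values.

```python
from typing import Any

MASKED = "[masked]"  # hypothetical placeholder for a value that is not stored


def build_log_record(
    inputs: dict[str, Any],
    pii_inputs: set[str],
    output: str,
    mask_output: bool,
) -> dict[str, Any]:
    """Hypothetical: the record a log store keeps when masking is enabled."""
    logged_inputs = {
        name: MASKED if name in pii_inputs else value for name, value in inputs.items()
    }
    return {"inputs": logged_inputs, "output": MASKED if mask_output else output}


inputs = {"name": "Jane Doe", "city": "Amsterdam"}
response_text = "Hello, Jane Doe!"  # what the API caller receives, unmasked

record = build_log_record(inputs, pii_inputs={"name"}, output=response_text, mask_output=True)
print(record)
```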

## Add a Variant

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    A single Deployment can hold multiple Variants, each handling a different use case or scenario. Variants can be served simultaneously through Routing.

    To add a new Variant, select the Variant name at the top-left of the screen and choose **Add variant**.

    <Frame caption="Switch between Variants and add a new Variant to a Deployment at any time.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-add-variant.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=afdaca319e1fcede466a40fc56f296a6" alt="Variant context menu with options including Edit, Duplicate, Share, Create Variant, Change, and Delete." width="259" height="324" data-path="images/deployment-add-variant.png" />
    </Frame>
  </Tab>
</Tabs>

## Routing

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Once a Variant is ready to be deployed, configure routing to control which Variant each request reaches. Open the **Routing** page by selecting **Routing** at the top-left of the panel.

    <iframe width="560" height="315" src="https://www.youtube.com/embed/ROst-LlR2tk" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

    The Routing panel maps Variants to Context field values:

    * Each row represents a single Variant.
    * Each column represents a single Context field.
    * Each cell holds the Value a Context field must match for the request to reach that Variant.

    <Frame caption="An example routing table.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-routing.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=f3f854020535eae375ec8d9fb6f749fd" alt="Routing table for city_weather_experiment showing four variants: default matching all contexts, v1 uk for production/en, v1 germany for production/de, and v1 france for develop/fr with is_admin true." width="1170" height="312" data-path="images/deployment-routing.png" />
    </Frame>

    **Default variant:** The first row (0) is the default Variant. If no routing rules match, or no context values are provided, the request is routed to Variant 0.

    **Code Snippets**

    Right-click any Variant in the Routing table and select **Generate Code Snippet** to get ready-to-use code for that specific Variant. The snippets include the context values needed to reach the selected Variant.

    <Frame caption="Right-click on the variant path to generate code snippets.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-routing-code-snippet.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=5609ab7bb4db0f1d57413a9d1efda7df" alt="Routing table with a right-click context menu open on the v1 uk row, showing options including Generate code snippet." width="1162" height="454" data-path="images/deployment-routing-code-snippet.png" />
    </Frame>

    **Context Fields**

    To add a new context field, press the `+` button at the top right of the Routing table. Set a name and type for the field: `boolean`, `date`, `list`, `number`, or `string`.

    <img src="https://mintcdn.com/orqai/x_6IXnot9ETOc_0g/images/docs/65c573f-Screenshot_2024-07-18_at_17.00.34.png?fit=max&auto=format&n=x_6IXnot9ETOc_0g&q=85&s=4bad19d7dc34b749521bd5e06e4878b0" alt="Context field creation dropdown with field_name entered and type options including Boolean, Date, List, Number, and String." width="2178" height="1084" data-path="images/docs/65c573f-Screenshot_2024-07-18_at_17.00.34.png" />

    **Routing Conditions**

    Create a custom routing condition for each field and Variant by entering a value in the corresponding cell. By default, the `=` operator is used. Click `=` to change the operator.

    <Frame caption="Different operators are available depending on the field type.">
      <img src="https://mintcdn.com/orqai/x_6IXnot9ETOc_0g/images/docs/79ac6ff-Screenshot_2024-07-18_at_17.00.17.png?fit=max&auto=format&n=x_6IXnot9ETOc_0g&q=85&s=43d9d5d1b5516ed5f88eca7133be0e52" alt="Operator dropdown showing options: Is, Is not, Less than, Greater than, Less than or equal, and Greater than or equal." width="2108" height="856" data-path="images/docs/79ac6ff-Screenshot_2024-07-18_at_17.00.17.png" />
    </Frame>

    **Simulator**

    Test routing at any time by opening the Simulator via the Simulator icon at the top-right of the Routing panel. Enter a value for each context field and select **Simulate** to see which Variant the request routes to.
  </Tab>
</Tabs>
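
The routing rules can be approximated locally to reason about which Variant a given context reaches. This is a hypothetical sketch, not the platform's evaluator: the field names `environments`, `locale`, and `is_admin` are modeled on the example routing table, and the code assumes rows after the default are checked top to bottom with the first fully matching row winning. The Simulator remains the authoritative answer.

```python
import operator
from typing import Any, Callable

# Operators mirroring the routing-condition dropdown: Is, Is not, Less than,
# Greater than, Less than or equal, Greater than or equal.
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

Condition = tuple[str, Any]  # (operator, value) for one cell


def route(rows: list[tuple[str, dict[str, Condition]]], context: dict[str, Any]) -> str:
    """Return the Variant a context routes to; row 0 is the default Variant.

    Assumption: rows after the default are checked top to bottom and the first
    row whose conditions all match wins. A field missing from the context never
    matches, so an empty context always lands on the default.
    """
    for name, conditions in rows[1:]:
        if conditions and all(
            field in context and OPS[op](context[field], value)
            for field, (op, value) in conditions.items()
        ):
            return name
    return rows[0][0]


# Hypothetical field names modeled on the example routing table.
routing_table = [
    ("default", {}),
    ("v1 uk", {"environments": ("=", "production"), "locale": ("=", "en")}),
    ("v1 germany", {"environments": ("=", "production"), "locale": ("=", "de")}),
    ("v1 france", {"environments": ("=", "develop"), "locale": ("=", "fr"), "is_admin": ("=", True)}),
]

print(route(routing_table, {"environments": "production", "locale": "de"}))  # v1 germany
print(route(routing_table, {"environments": "develop", "locale": "fr"}))  # default
```

The second call falls back to the default because `is_admin` is not sent, which is the kind of silent miss the **Default Matched** log filter helps surface.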

## Versioning

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Version control tracks every change to the model and prompt configuration. Each deployment creates a new commit and the full history is preserved, so any change can be reviewed and any prior version restored.

    **Deploying a New Version**

    When the configuration is ready, press the **Deploy** button on the Variant screen.

    <Frame caption="The Deploy button is enabled once there are changes to commit.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-code-snippet-button.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=5b63d03a432251cdd9bf531d6e6728b2" alt="Variant toolbar showing share, code snippet, history, and external link buttons alongside the Deploy button." width="266" height="48" data-path="images/deployment-code-snippet-button.png" />
    </Frame>

    The deployment modal prompts for the new version (Major or Minor bump), a description of the changes, and whether to deploy immediately or save as a draft.

    **Saving a Draft** commits the changes to a new version without making them publicly available. They become public with the next deployment.

    **Comparing Changes**

    Select the **Compare Changes** button at the top-right to visualize changes between configurations in a side-by-side JSON view. Restore a previous version by selecting it in the left panel and clicking **Restore**.

    <Frame caption="Side-by-side visualization of two versions of the same Variant.">
      <img src="https://mintcdn.com/orqai/x_6IXnot9ETOc_0g/images/docs/0ff5557-Screenshot_2024-05-05_at_17.32.13.png?fit=max&auto=format&n=x_6IXnot9ETOc_0g&q=85&s=3d1194edd9cb8920f8c3c68e6bffe0d5" alt="Prompt template changes dialog showing a side-by-side diff between Base v1.1 and Compare v1.0 Published, highlighting a tools array added in the newer version." width="2950" height="1968" data-path="images/docs/0ff5557-Screenshot_2024-05-05_at_17.32.13.png" />
    </Frame>
  </Tab>
</Tabs>
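
The Major/Minor bump and draft semantics can be modeled with a small commit history. A minimal sketch under assumptions: versions follow a `MAJOR.MINOR` format like the `v1.0` and `v1.1` labels in the comparison view, and the published configuration is the latest commit not saved as a draft. `Commit`, `bump_version`, and `published` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    version: str
    description: str
    draft: bool


def bump_version(current: str, bump: str) -> str:
    """Apply a Major or Minor bump to a 'MAJOR.MINOR' version string."""
    major, minor = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0"
    if bump == "minor":
        return f"{major}.{minor + 1}"
    raise ValueError(f"unknown bump: {bump!r}")


def published(history: list[Commit]) -> Optional[Commit]:
    """The latest non-draft commit; draft changes go live with the next deploy."""
    for commit in reversed(history):
        if not commit.draft:
            return commit
    return None


history = [Commit("1.0", "Initial prompt", draft=False)]
history.append(Commit(bump_version("1.0", "minor"), "Add tools", draft=True))
print(published(history).version)  # 1.0, the draft is not public yet

history.append(Commit(bump_version("1.1", "major"), "Switch model", draft=False))
print(published(history).version)  # 2.0
```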

## Invoke a Deployment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Use the **Code Snippet** button at the top-right of the Variant page to get ready-to-use integration code in cURL, Python, and TypeScript. Each snippet includes the Deployment key and the context values needed to reach the current Variant.

    <Frame caption="The Code Snippet button at the top-right of the Variant page.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-code-snippet-button.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=5b63d03a432251cdd9bf531d6e6728b2" alt="Variant toolbar showing share, code snippet, history, and external link buttons alongside the Deploy button." width="266" height="48" data-path="images/deployment-code-snippet-button.png" />
    </Frame>

    <Frame caption="The Code Snippet panel with all integration languages.">
      <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-code-snippet.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=9f1c49e029e46ab2213fd8d0cbf06761" alt="Invoke a Deployment dialog with cURL, Python, and TypeScript tabs, showing a curl command for city_weather_experiment_c3jt_49 with city and date as inputs." width="834" height="706" data-path="images/deployment-code-snippet.png" />
    </Frame>

    Code snippets per Variant are also accessible from the Routing page:

    1. Open a Deployment and go to the **Routing** page.
    2. Right-click the target Variant and select **Generate Code Snippet**.

       <img src="https://mintcdn.com/orqai/aNCOui-yQmuILSqI/images/deployment-routing-menu.png?fit=max&auto=format&n=aNCOui-yQmuILSqI&q=85&s=49bfc17061ab17199b560f737fecde59" alt="The routing context menu on the Routing page showing options including Generate Code Snippet." width="898" height="57" data-path="images/deployment-routing-menu.png" />
  </Tab>

  <Tab title="API & SDK" icon="code">
    Invoke a Deployment by sending a request to the `/v2/deployments/invoke` endpoint. **Orq.ai** routes the request to the correct Variant, applies all configured settings, and returns the model's response.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
        --url https://api.orq.ai/v2/deployments/invoke \
        --header 'Authorization: Bearer <ORQ_API_KEY>' \
        --header 'Content-Type: application/json' \
        --data '{
          "key": "my-deployment",
          "context": {"environments": "production"}
        }'
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import os

      from orq_ai_sdk import Orq

      client = Orq(api_key=os.environ["ORQ_API_KEY"])

      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import { Orq } from '@orq-ai/node';

      const client = new Orq({ apiKey: process.env.ORQ_API_KEY });

      const generation = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
      });

      console.log(generation?.choices[0].message.content);
      ```
    </CodeGroup>

    <Tip>See the full [Invoke API reference](/reference/deployments/invoke).</Tip>

    **Usage Tracking**

    Set `invoke_options.include_usage` to `true` to include token usage metrics in the response for every deployment call.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "context": {
          "environments": "production"
        },
        "invoke_options": {
          "include_usage": true
        }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          invoke_options={"include_usage": True}
      )

      print(f"Prompt tokens: {generation.usage.prompt_tokens}")
      print(f"Completion tokens: {generation.usage.completion_tokens}")
      print(f"Total tokens: {generation.usage.total_tokens}")
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        invokeOptions: { includeUsage: true },
      });

      console.log(`Prompt tokens: ${deployment.usage?.promptTokens}`);
      console.log(`Completion tokens: ${deployment.usage?.completionTokens}`);
      console.log(`Total tokens: ${deployment.usage?.totalTokens}`);
      ```
    </CodeGroup>

    The response includes `prompt_tokens`, `completion_tokens`, and `total_tokens`.

    **Identity**

    Associate an identity with deployment invocations for tracking and personalization.

    **Identity fields:**

    * `id`: Unique identifier for the identity (required).
    * `display_name`: Display name of the identity.
    * `email`: Email address of the identity.
    * `logo_url`: URL to the identity's avatar or logo.
    * `tags`: List of tags associated with the identity.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "identity": {
          "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
          "display_name": "Jane Doe",
          "email": "jane.doe@example.com",
          "logo_url": "https://example.com/avatars/jane-doe.jpg",
          "tags": ["hr", "engineering"]
        }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          identity={
              "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
              "display_name": "Jane Doe",
              "email": "jane.doe@example.com",
              "logo_url": "https://example.com/avatars/jane-doe.jpg",
              "tags": ["hr", "engineering"]
          }
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        identity: {
          id: 'contact_01ARZ3NDEKTSV4RRFFQ69G5FAV',
          displayName: 'Jane Doe',
          email: 'jane.doe@example.com',
          logoUrl: 'https://example.com/avatars/jane-doe.jpg',
          tags: ['hr', 'engineering'],
        },
      });

      console.log(deployment?.choices[0].message.content);
      ```
    </CodeGroup>
  </Tab>
</Tabs>
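
Without the SDK, the same invocation can be built with Python's standard library. A minimal sketch, assuming the JSON body mirrors the cURL examples (`key`, `context`, `identity`, `invoke_options.include_usage`) and that the API key is read from an `ORQ_API_KEY` environment variable; `build_invoke_request` is a hypothetical helper. The request is only constructed here, so the code runs offline.

```python
import json
import os
import urllib.request
from typing import Any, Optional

INVOKE_URL = "https://api.orq.ai/v2/deployments/invoke"


def build_invoke_request(
    key: str,
    context: Optional[dict[str, Any]] = None,
    identity: Optional[dict[str, Any]] = None,
    include_usage: bool = False,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL examples."""
    body: dict[str, Any] = {"key": key}
    if context:
        body["context"] = context
    if identity:
        if "id" not in identity:  # `id` is the only required identity field
            raise ValueError("identity requires an 'id'")
        body["identity"] = identity
    if include_usage:
        body["invoke_options"] = {"include_usage": True}
    token = api_key if api_key is not None else os.environ.get("ORQ_API_KEY", "")
    return urllib.request.Request(
        INVOKE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_invoke_request(
    "my-deployment",
    context={"environments": "production"},
    identity={"id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV", "display_name": "Jane Doe"},
    include_usage=True,
)

# Sending it requires network access and a valid key; the response fields
# below are assumed from the SDK examples:
# with urllib.request.urlopen(request) as response:
#     generation = json.load(response)
#     print(generation["choices"][0]["message"]["content"])
#     print(generation["usage"]["total_tokens"])
```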

### Extra Parameters

<Tabs>
  <Tab title="API & SDK" icon="code">
    Use `extra_params` to pass parameters not directly exposed by the **Orq.ai** panel, or to override existing model configuration at runtime.

    **Passing an unsupported parameter:**

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "context": { "environments": "production" },
        "extra_params": { "presence_penalty": 1.0 }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          extra_params={"presence_penalty": 1.0}
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        extraParams: { presence_penalty: 1.0 },
      });

      console.log(deployment?.choices[0].message.content);
      ```
    </CodeGroup>

    <Warning>
      Overwriting existing parameters changes the model configuration defined on the Variant. Use with caution.
    </Warning>

    **Overwriting an existing parameter at runtime:**

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "context": { "environments": "production" },
        "extra_params": { "temperature": 0.4 }
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          context={"environments": "production"},
          extra_params={"temperature": 0.4}
      )

      print(generation.choices[0].message.content)
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const deployment = await client.deployments.invoke({
        key: 'my-deployment',
        context: { environments: 'production' },
        extraParams: { temperature: 0.4 },
      });

      console.log(deployment?.choices[0].message.content);
      ```
    </CodeGroup>
  </Tab>
</Tabs>
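
A rough mental model of how `extra_params` interacts with the Variant configuration is a per-request shallow merge in which runtime values win. This is an assumption for illustration, and `effective_params` is a hypothetical helper, not part of the SDK. It shows why overriding `temperature` changes the call while adding `presence_penalty` only extends it, and that the stored Variant configuration is left untouched.

```python
from typing import Any


def effective_params(variant_params: dict[str, Any], extra_params: dict[str, Any]) -> dict[str, Any]:
    """Assumed model: a shallow merge per request where runtime values win."""
    return {**variant_params, **extra_params}


variant_params = {"temperature": 0.7, "max_tokens": 512}  # hypothetical Variant config

print(effective_params(variant_params, {"temperature": 0.4}))
# {'temperature': 0.4, 'max_tokens': 512}
print(effective_params(variant_params, {"presence_penalty": 1.0}))
# {'temperature': 0.7, 'max_tokens': 512, 'presence_penalty': 1.0}
print(variant_params)  # unchanged: {'temperature': 0.7, 'max_tokens': 512}
```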

### Attach Files

<Tabs>
  <Tab title="API & SDK" icon="code">
    <Note>
      The `file_ids` / `fileIds` parameter on deployment invocations is deprecated and will be removed in a future release. Use native file attachment instead.
    </Note>

    Two options are available for attaching files to a Deployment:

    1. Send PDFs directly to the model in the invocation payload.
    2. Attach a [Knowledge Base](/docs/knowledge/overview) to the Deployment.

    **Sending PDFs Directly to the Model**

    <Warning>
      This feature is only supported with OpenAI, Anthropic, and Google Gemini models.
    </Warning>

    Embed files directly in the [Invoke](/reference/deployments/invoke) payload by adding a `file` content part to a message. The `file_data` field takes a standard data URI: `data:<mime-type>;base64,` followed by the base64-encoded file data. For a PDF, the prefix is `data:application/pdf;base64,`.

    <CodeGroup>
      ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
      curl --request POST \
           --url https://api.orq.ai/v2/deployments/invoke \
           --header 'accept: application/json' \
           --header 'authorization: Bearer <orq-api-key>' \
           --header 'content-type: application/json' \
           --data '
      {
        "key": "my-deployment",
        "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "Summarize this document." },
              {
                "type": "file",
                "file": {
                  "file_data": "data:application/pdf;base64,<base64-encoded-data>",
                  "filename": "document.pdf"
                }
              }
            ]
          }
        ]
      }
      '
      ```

      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      generation = client.deployments.invoke(
          key="my-deployment",
          messages=[
              {
                  "role": "user",
                  "content": [
                      {"type": "text", "text": "Summarize this document."},
                      {
                          "type": "file",
                          "file": {
                              "file_data": "data:application/pdf;base64,<base64-encoded-data>",
                              "filename": "document.pdf"
                          }
                      }
                  ]
              }
          ],
          metadata={
              "user_id": "123",
              "session_id": "456",
          }
      )
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      const generation = await client.deployments.invoke({
        key: 'my-deployment',
        messages: [
          {
            role: 'user',
            content: [
              { type: 'text', text: 'Summarize this document.' },
              {
                type: 'file',
                file: {
                  fileData: 'data:application/pdf;base64,<base64-encoded-data>',
                  filename: 'document.pdf'
                }
              }
            ]
          }
        ],
        metadata: { user_id: '123', session_id: '456' }
      });
      ```
    </CodeGroup>

    <Callout icon="hat-chef" color="#7ecece">
      See PDF inputs used to extract structured data end-to-end. Read our cookbook [PDF Extraction](/docs/tutorials/pdf-extraction).
    </Callout>

    **Knowledge Base vs. Direct File Attachment**

    **Use a Knowledge Base when:** the information is reused across many requests and RAG (targeted chunk retrieval) is sufficient. Knowledge Bases retrieve relevant chunks but not the full document.

    **Use direct file attachment when:** the task requires full-document understanding (e.g. summarization, legal review, detailed analysis), the document is ad-hoc or session-specific, or the data is too sensitive for a shared knowledge repository.

    <Info>
      Read how to set up a [Knowledge Base](/docs/knowledge/overview) or [use a Knowledge Base in a prompt](/docs/knowledge/overview#search-a-knowledge-base).
    </Info>
  </Tab>
</Tabs>
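
Building the data URI by hand is a common source of errors. A minimal sketch using only the standard library: `pdf_data_uri` and `file_message` are hypothetical helpers whose output matches the message shape in the examples above, and the commented invocation assumes an initialized `client` as in the SDK examples.

```python
import base64
from typing import Any


def pdf_data_uri(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a data URI: data:<mime-type>;base64,<data>."""
    return "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode("ascii")


def file_message(prompt: str, pdf_bytes: bytes, filename: str) -> dict[str, Any]:
    """Build a user message with a text part followed by a file part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "file", "file": {"file_data": pdf_data_uri(pdf_bytes), "filename": filename}},
        ],
    }


# In practice the bytes come from a file, e.g. Path("contract.pdf").read_bytes().
message = file_message("Summarize this document.", b"%PDF-1.4 minimal example", "document.pdf")

# With an initialized SDK client:
# generation = client.deployments.invoke(key="my-deployment", messages=[message])
```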

## Analytics and Logs

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Once a Deployment is running and receiving traffic, detailed analytics of all requests are available.

    **Logs** show requests per Variant. Filters available:

    * **Variant**: select a single Variant to filter logs.
    * **Evaluation**: **Matched** (a routing rule was matched) or **Default Matched** (no routing rule matched, default Variant was used).
    * **Source**: **API**, **SDK**, or **Simulator**.

    Click any log line to open a detail panel showing context, requests, and parameters sent to the Deployment.

    <Frame caption="Logs overview.">
      <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/cce77f539b0004784a155a2f329a5cde60cc2652d5c6e4e4b664d1b20a24aaca-Screenshot_2025-03-25_at_13.25.18.png?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=5b89f0d360ceb40a0846bff23192021d" alt="Logs tab for the NPS_functioncall deployment showing five entries for variant 4o using gpt-4o via OpenAI, all with status 200." width="2372" height="816" data-path="images/docs/cce77f539b0004784a155a2f329a5cde60cc2652d5c6e4e4b664d1b20a24aaca-Screenshot_2025-03-25_at_13.25.18.png" />
    </Frame>
  </Tab>
</Tabs>
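
If log records are collected on your side (for example, copied from the Logs view), the same filters can be reproduced in a few lines. A hypothetical sketch: the record shape with `variant`, `evaluation`, and `source` keys is assumed for illustration and is not an Orq.ai export format. A high share of **Default Matched** entries can indicate routing rules that never match the context being sent.

```python
from collections import Counter
from typing import Any, Optional


def filter_logs(
    logs: list[dict[str, Any]],
    variant: Optional[str] = None,
    evaluation: Optional[str] = None,
    source: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Keep records matching every filter that is set (None means 'any')."""
    wanted = {"variant": variant, "evaluation": evaluation, "source": source}
    return [
        log for log in logs
        if all(value is None or log.get(field) == value for field, value in wanted.items())
    ]


logs = [  # hypothetical record shape
    {"variant": "v1 uk", "evaluation": "Matched", "source": "API"},
    {"variant": "default", "evaluation": "Default Matched", "source": "SDK"},
    {"variant": "v1 uk", "evaluation": "Matched", "source": "Simulator"},
]

print(len(filter_logs(logs, variant="v1 uk", source="API")))  # 1
evaluations = Counter(log["evaluation"] for log in logs)
print(evaluations["Default Matched"] / len(logs))  # share routed to the default Variant
```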
