# Build Experiments

> Test prompts and models at scale. Compare performance metrics, evaluate outputs, and iterate on configurations via the AI Studio, API, or Orq MCP.

**Experiments** run model generations across a [Dataset](/docs/datasets/build) and record **Latency**, **Cost**, and **Time to First Token** for each generation. Results can be reviewed manually or scored automatically with [Evaluators](/docs/evaluators/build) and Human Reviews. For code-driven experiments, **Orq.ai** provides the **[evaluatorq](https://github.com/orq-ai/orqkit)** framework to define jobs, evaluators, and data sources programmatically and sync results back to the AI Studio.

<iframe src="https://www.youtube.com/embed/jSYIO4wuhrs" title="YouTube video player" frameborder="0" className="w-full aspect-video rounded-xl" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen />

## Use Cases

<AccordionGroup>
  <Accordion title="Compare models side by side" icon="sparkles">
    Run the same dataset through multiple models to compare output quality, cost, and latency. Works for newly released models, fine-tuned models, and private models added to the [AI Router](/docs/model-garden/overview).
  </Accordion>

  <Accordion title="Optimise prompts" icon="pen-to-square">
    Test multiple prompt variants on the same dataset. Use evaluators like Cosine Similarity to quantitatively assess which version produces the best results.
  </Accordion>

  <Accordion title="Pre-deployment and regression testing" icon="flask">
    Run experiments against your current prompt configuration before shipping changes. Use historical datasets to verify that updates haven't degraded performance in any area.
  </Accordion>

  <Accordion title="Security and red teaming" icon="shield-halved">
    Test how your model responds to jailbreak attempts and adversarial inputs in a controlled environment before putting it into production.
  </Accordion>
</AccordionGroup>

## Prerequisites

<CardGroup cols={3}>
  <Card title="Dataset" icon="database" href="/docs/datasets/build">
    A Dataset with Inputs, Messages, and/or Expected Outputs
  </Card>

  <Card title="AI Router" icon="code-fork" href="/docs/model-garden/overview">
    Models added to the AI Router
  </Card>

  <Card title="API Key" icon="key" href="/docs/administer/api-keys">
    An API Key (API and MCP only)
  </Card>
</CardGroup>

## Create an Experiment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    In the AI Studio, choose a [Project](/docs/projects/overview) and folder, click the `+` button, and select **Experiment**.

    Select a [Dataset](/docs/datasets/build) and one or more models, then click **Create**. Use the search field to find datasets quickly.

    You are taken to the Experiment Studio where you configure data entries and tasks before running.
  </Tab>

  <Tab title="API & SDK" icon="code">
    Use the **[evaluatorq framework](https://github.com/orq-ai/orqkit)** to run experiments from code.

    **Install:**

    <CodeGroup>
      ```bash Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      pip install orq-ai-sdk
      pip install evaluatorq
      ```

      ```bash Node.js theme={"theme":{"light":"github-light","dark":"github-dark"}}
      npm install @orq-ai/evaluatorq
      npm install @orq-ai/node
      ```
    </CodeGroup>

    **Configure environment:**

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    export ORQ_API_KEY="your-api-key"
    export ORQ_ENV="production"
    export ORQ_EVALUATOR_ID="your-evaluator-ulid"  # optional
    ```

    <Warning>
      `ORQ_API_KEY` is required to invoke Deployments and Agents, run Evaluators, and sync results to the **Orq.ai** UI. Without it, experiments run locally only.
    </Warning>

    **Define your data.** Choose one of three approaches:

    <AccordionGroup>
      <Accordion title="Reference an existing Dataset (recommended)">
        <CodeGroup>
          ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
          from evaluatorq import DatasetIdInput

          dataset_id = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
          # Wrap the ID in DatasetIdInput and pass it as `data` to evaluatorq in the Run step
          data = DatasetIdInput(dataset_id=dataset_id)
          ```

          ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
          import { DatasetIdInput } from "@orq-ai/evaluatorq";

          const datasetId = "01ARZ3NDEKTSV4RRFFQ69G5FAV";
          // Pass { datasetId } as `data` to evaluatorq in the Run step
          ```
        </CodeGroup>
      </Accordion>

      <Accordion title="Load from CSV or JSON">
        <CodeGroup>
          ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
          import csv, json
          from evaluatorq import DataPoint

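          # newline="" lets the csv module handle line endings itself
          with open("test_data.csv", newline="", encoding="utf-8") as f:
              test_data = [DataPoint(inputs=row) for row in csv.DictReader(f)]

          # or from JSON (expects a top-level list of objects)
          with open("test_data.json", encoding="utf-8") as f:
              test_data = [DataPoint(inputs=item) for item in json.load(f)]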
          ```

          ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
          import { DataPoint } from "@orq-ai/evaluatorq";
          import * as fs from "fs";

          // Load from JSON; for CSV, parse the file into row objects first and map them the same way
          const data = JSON.parse(fs.readFileSync("test_data.json", "utf-8"));
          const testData: DataPoint[] = data.map((item: any) => ({ inputs: item }));
          ```
        </CodeGroup>
      </Accordion>

      <Accordion title="Define inline">
        <CodeGroup>
          ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
          from evaluatorq import DataPoint

          test_data = [
              DataPoint(inputs={"text": "Cinderella tells the story of a kind young woman..."}),
              DataPoint(inputs={"text": "Little Red Riding Hood follows a girl traveling..."}),
          ]
          ```

          ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
          import { DataPoint } from "@orq-ai/evaluatorq";

          const testData: DataPoint[] = [
              { inputs: { text: "Cinderella tells the story of a kind young woman..." } },
              { inputs: { text: "Little Red Riding Hood follows a girl traveling..." } },
          ];
          ```
        </CodeGroup>
      </Accordion>
    </AccordionGroup>

    <Tip>See the [evaluatorq Tutorial](/docs/tutorials/evaluator-q) for advanced patterns including third-party framework integration and CI/CD setup.</Tip>
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    **Create an experiment from an existing dataset:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment comparing GPT-5.2 and Claude Sonnet 4.6 using the "user-queries" dataset
    ```

    The assistant uses `search_entities` to find the dataset, then `create_experiment` with two model configurations and `auto_run` enabled.

    ***

    **Compare two prompt strategies:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment using the "customer-feedback" dataset with two prompts: one focused on empathy and one on brevity. Run it and summarize the results.
    ```

    The assistant uses `create_experiment` with two prompt variants and `auto_run` enabled, then `get_experiment_run` to retrieve and summarise the evaluation metrics.
  </Tab>
</Tabs>
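
When loading test data from CSV in code, every value arrives as a string, and stray whitespace or blank rows can end up in the experiment. The sketch below shows one way to clean rows before wrapping them; `load_rows` and the `language` column are hypothetical names used for illustration, and only the standard library is used so it runs without the SDK.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(path: str) -> list[dict[str, str]]:
    """Read a CSV with a header row into dicts, trimming whitespace and skipping blank rows."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for raw in csv.DictReader(f):
            # Missing cells come back as None; normalise them to empty strings.
            row = {
                key.strip(): (value or "").strip()
                for key, value in raw.items()
                if key is not None
            }
            if any(row.values()):
                rows.append(row)
    return rows


# Demo with a temporary file: one real row and one row of empty cells.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "test_data.csv"
    csv_path.write_text(
        "text,language\n  Cinderella tells the story... ,en\n,\n",
        encoding="utf-8",
    )
    rows = load_rows(str(csv_path))

print(rows)
```

Each resulting dict can then be wrapped as `DataPoint(inputs=row)`, as in the CSV example in the API & SDK tab.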

### Configure Tasks

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    The left side of the Experiment table shows the loaded Dataset entries. Each row runs separately against each configured task.

    Add new test rows with the **Add Row** button. Edit Inputs, Messages, and Expected Outputs by selecting any cell.

    <Tip>
      Columns can be reorganised and hidden using the <Icon icon="ellipsis" /> menu.
    </Tip>

    <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/d22119f2c29008b097fa145ed1f71d86af4a37dc69acb0725774aa3f7b912673-iScreen_Shoter_-_Google_Chrome_-_250210113359.jpg?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=45278776ebcc821500e8971d89f72c23" alt="CS_demo experiment grid in Draft state showing Inputs, Messages, Expected Output, and Response columns with gpt-4o and claude-3-5-sonnet variants and 10 dataset rows." width="3542" height="1295" data-path="images/docs/d22119f2c29008b097fa145ed1f71d86af4a37dc69acb0725774aa3f7b912673-iScreen_Shoter_-_Google_Chrome_-_250210113359.jpg" />

    To add a task, open the sidebar and select **+Task**:

    <AccordionGroup>
      <Accordion title="Configure a Model" icon="cubes">
        Select a model to open the Prompt panel. Configure the prompt template using:

        * The **Messages** column from the dataset.
        * A configured **Prompt**.
        * A combination of both.

        <Frame caption="Open the Prompt panel by selecting the model name on the left panel.">
          <img src="https://mintcdn.com/orqai/kym08_pOTNRFhXF_/images/experiment-prompt-model.png?fit=max&auto=format&n=kym08_pOTNRFhXF_&q=85&s=a232f223f23af3981843d5166e9fb806" alt="Experiment view with the Prompt panel open on the right, showing model settings for gpt-4.2 including temperature, max tokens, and messaging column configuration." width="1629" height="960" data-path="images/experiment-prompt-model.png" />
        </Frame>

        <Info>
          To learn more about Prompt Template configuration, see [Creating a Prompt](/docs/prompts/creating).
        </Info>
      </Accordion>

      <Accordion title="Configure an Agent" icon="robot">
        Choose an Agent from the **+Task** menu. Its configuration is automatically loaded as a new column.

        The agent prompt can use:

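        * **Instructions + Messages**: the agent's own instructions and messages only.
        * **Instructions + Dataset Messages**: the agent's instructions combined with the **Messages** column from the dataset.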

        <Frame caption="Open the Prompt panel by selecting the Agent name on the left panel.">
          <img src="https://mintcdn.com/orqai/kym08_pOTNRFhXF_/images/experiment-agent.png?fit=max&auto=format&n=kym08_pOTNRFhXF_&q=85&s=4bd216afead03e625a6757fa34897cf9" alt="Experiment view with the Agent panel open on the right, showing the bank_creditcard_agent_gpt_4.2 agent with instructions for Dutch Royal Bank Credit Card Support." width="1632" height="955" data-path="images/experiment-agent.png" />
        </Frame>

        <Info>
          To learn more about Agent configuration, see [Build Agents](/docs/agents/build).
        </Info>
      </Accordion>
    </AccordionGroup>
  </Tab>

  <Tab title="API & SDK" icon="code">
    Define jobs using the `@job` decorator (Python) or `job()` function (TypeScript). Each job defines one variant to test.

    <CodeGroup>
      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import asyncio, os
      from evaluatorq import job, DataPoint
      from orq_ai_sdk import Orq

      orq_client = Orq(
          api_key=os.getenv("ORQ_API_KEY"),
          server_url=os.getenv("ORQ_SERVER_URL", "https://my.orq.ai")
      )

      def extract_response_text(response):
          if hasattr(response, "output") and response.output:
              if isinstance(response.output, list) and len(response.output) > 0:
                  part = response.output[0]
                  if hasattr(part, "parts") and part.parts:
                      return part.parts[0].text if hasattr(part.parts[0], "text") else str(part.parts[0])
          if hasattr(response, "content"):
              if isinstance(response.content, list):
                  return " ".join(part.text if hasattr(part, "text") else str(part) for part in response.content)
              return str(response.content)
          return str(response)

      @job("summarize-variant-a")
      async def summarize_variant_a(data: DataPoint, row: int):
          response = await asyncio.to_thread(
              orq_client.deployments.invoke,
              key="summarization_v2",
              context={"environments": [], "reasoning": ["minimal"]},
              inputs={"text": data.inputs["text"]},
          )
          return {"variant": "variant-a", "input": data.inputs["text"], "summary": extract_response_text(response)}

      @job("summarize-variant-b")
      async def summarize_variant_b(data: DataPoint, row: int):
          response = await asyncio.to_thread(
              orq_client.deployments.invoke,
              key="summarization_v2",
              context={"environments": [], "reasoning": ["medium"]},
              inputs={"text": data.inputs["text"]},
          )
          return {"variant": "variant-b", "input": data.inputs["text"], "summary": extract_response_text(response)}
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import { job, DataPoint } from "@orq-ai/evaluatorq";
      import { Orq } from "@orq-ai/node";

      const orqClient = new Orq({
          apiKey: process.env.ORQ_API_KEY,
          serverUrl: process.env.ORQ_SERVER_URL || "https://my.orq.ai",
      });

      function extractResponseText(response: any): string {
          if (response?.output?.[0]?.parts?.[0]?.text) return response.output[0].parts[0].text;
          if (Array.isArray(response?.content)) return response.content.map((p: any) => p.text || String(p)).join(" ");
          if (response?.content) return String(response.content);
          return String(response);
      }

      const summarizeVariantA = job("summarize-variant-a", async (data: DataPoint) => {
          const response = await orqClient.deployments.invoke({
              key: "summarization_v2",
              context: { environments: [], reasoning: ["minimal"] },
              inputs: { text: data.inputs.text as string },
          });
          return { variant: "variant-a", input: data.inputs.text, summary: extractResponseText(response) };
      });

      const summarizeVariantB = job("summarize-variant-b", async (data: DataPoint) => {
          const response = await orqClient.deployments.invoke({
              key: "summarization_v2",
              context: { environments: [], reasoning: ["medium"] },
              inputs: { text: data.inputs.text as string },
          });
          return { variant: "variant-b", input: data.inputs.text, summary: extractResponseText(response) };
      });
      ```
    </CodeGroup>

    <Tip>Jobs can invoke [Deployments](/docs/deployments/overview), [Agents](/docs/agents/build), or [Prompts](/docs/prompts/overview). Third-party frameworks (LangGraph, CrewAI, LlamaIndex, AutoGen) can be integrated to compare against Orq features side by side.</Tip>
  </Tab>
</Tabs>
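
The Python jobs in the API & SDK tab wrap the synchronous `deployments.invoke` call in `asyncio.to_thread`. The sketch below illustrates why that pattern is useful: a blocking call moved to a worker thread does not stall the event loop, so several rows can be in flight at once. `fake_invoke` is a hypothetical stand-in for the SDK call, and how evaluatorq itself schedules jobs is not shown here.

```python
import asyncio
import time


def fake_invoke(text: str) -> str:
    """Hypothetical stand-in for a blocking SDK call such as orq_client.deployments.invoke."""
    time.sleep(0.1)  # simulate network latency
    return text.upper()


async def run_job(text: str) -> dict:
    # to_thread runs the blocking call in a worker thread so the event loop stays free.
    summary = await asyncio.to_thread(fake_invoke, text)
    return {"input": text, "summary": summary}


async def main() -> list:
    texts = ["cinderella", "little red riding hood"]
    # gather schedules both jobs at once; results come back in input order.
    return await asyncio.gather(*(run_job(t) for t in texts))


results = asyncio.run(main())
print(results)
```

Calling the blocking function directly inside an `async def` would still work, but each call would hold the event loop until it returned.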

#### Variables and Prompt Templating

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Reference dataset inputs in your prompt using `{{variable_name}}`. Values come from the **Inputs** column and are substituted per row when the experiment runs.

    Select the **Template Engine** from the Prompt Settings panel:

    * **Text** (default): `{{double_braces}}` syntax.
    * **Jinja**: conditionals, loops, filters, and more.
    * **Mustache**: logic-less templating with sections.

    <Frame caption="Select a Template Engine in the Prompt Settings panel.">
      <img src="https://mintcdn.com/orqai/HVm7-3vBg7cwVv2-/images/experiment-engine.png?fit=max&auto=format&n=HVm7-3vBg7cwVv2-&q=85&s=52c933788aa84e0eed529e5f66fbfe54" alt="Engine dropdown in the Prompt panel with Jinja selected and options for Text, Jinja, and Mustache." width="553" height="243" data-path="images/experiment-engine.png" />
    </Frame>

    <Tabs>
      <Tab title="Jinja">
        <Steps>
          <Step title="Prompt template">
            ```jinja theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for {{company_name}}.

            {% if user_tier == "premium" %}
            {{customer_name}} is a premium customer. Greet them by name with priority support and a 2-hour SLA.
            {% else %}
            {{customer_name}} is on the free plan. Standard response time is 24 hours.
            {% endif %}
            ```
          </Step>

          <Step title="Dataset inputs">
            ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
            { "company_name": "Acme", "customer_name": "Sarah", "user_tier": "premium" }
            ```
          </Step>

          <Step title="Rendered prompt">
            ```text wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for Acme.

            Sarah is a premium customer. Greet them by name with priority support and a 2-hour SLA.
            ```
          </Step>
        </Steps>
      </Tab>

      <Tab title="Mustache">
        <Steps>
          <Step title="Prompt template">
            ```handlebars theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for {{company_name}}.

            {{# is_premium}}
            {{customer_name}} is a premium customer. Priority support with a 2-hour SLA.
            {{/ is_premium}}
            {{^ is_premium}}
            {{customer_name}} is on the free plan. Standard response time is 24 hours.
            {{/ is_premium}}
            ```
          </Step>

          <Step title="Dataset inputs">
            ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
            { "company_name": "Acme", "customer_name": "Sarah", "is_premium": true }
            ```
          </Step>

          <Step title="Rendered prompt">
            ```text wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
            You are a support assistant for Acme.

            Sarah is a premium customer. Priority support with a 2-hour SLA.
            ```
          </Step>
        </Steps>
      </Tab>
    </Tabs>

    <Info>
      For a complete reference of template features, see [Prompt Templating](/docs/prompts/templating).
    </Info>
  </Tab>
</Tabs>
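
As a rough mental model of per-row substitution with the **Text** engine, the sketch below replaces `{{variable_name}}` placeholders with values from each row's inputs. It is an illustration only: `render_text_template` is a hypothetical helper, and leaving unknown placeholders untouched is an assumption, not documented Orq.ai behaviour.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_text_template(template: str, inputs: dict) -> str:
    """Replace each {{name}} with inputs[name]; unknown names are left as-is (an assumption)."""

    def substitute(match):
        name = match.group(1)
        return str(inputs[name]) if name in inputs else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


rows = [
    {"company_name": "Acme", "customer_name": "Sarah"},
    {"company_name": "Globex", "customer_name": "Lee"},
]
template = "You are a support assistant for {{company_name}}. Greet {{customer_name}} by name."

# One rendered prompt per dataset row.
rendered = [render_text_template(template, row) for row in rows]
for prompt in rendered:
    print(prompt)
```

Conditionals and loops, as in the Jinja and Mustache examples, need a real template engine rather than plain substitution.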

#### Tool Calls for Agents

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    When using agents, attach **executable tools** that run in real time during the experiment. These perform actual operations (HTTP requests, Python code, MCP calls).

    1. Open the agent configuration panel.
    2. Select **Add Tool** in the **Tools** section.
    3. Choose from available tools in your project.

    <Info>
      See [Build Agents](/docs/agents/build) for full tool configuration options.
    </Info>
  </Tab>
</Tabs>

#### Tool Calls for Prompts (Historical Testing)

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Add a **historical Tool Call** chain to a model's execution to test how it handles specific tool payloads or error scenarios.

    <Warning>
      These tool calls are **simulated and do not execute**. They provide historical context to test function calling behaviour. For real executable tools, use [Tool Calls for Agents](#tool-calls-for-agents) above.
    </Warning>

    Use the <Icon icon="wrench" /> button to add a tool call to any message. Configure:

    * **Function Name**: which tool was called.
    * **Input**: the payload sent to the tool.
    * **Output**: the response the tool returned.

    <Frame caption="Configuring a tool call input and output.">
      <img src="https://mintcdn.com/orqai/598O1ftLlq3U7tj-/images/add-tool-call-experiment.png?fit=max&auto=format&n=598O1ftLlq3U7tj-&q=85&s=ee6c44fc7842df5a6cbcc89d8906085e" alt="Add Tool Call Experiment" className="mx-auto" style={{width:"79%"}} width="501" height="760" data-path="images/add-tool-call-experiment.png" />
    </Frame>
  </Tab>
</Tabs>

### Configure Evaluators

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    To add an Evaluator, go to the right of the Experiment table and select **Add new Column > Evaluator**.

    The panel shows all Evaluators available in the current [Project](/docs/projects/overview). Enable the toggle to add an Evaluator as a new column.

    <img src="https://mintcdn.com/orqai/x_6IXnot9ETOc_0g/images/docs/51205d5b61a55af182a21c5c3e85f2e86ad55e31736f56963d6481ba50689285-Screenshot_2025-02-10_at_11.47.17.png?fit=max&auto=format&n=x_6IXnot9ETOc_0g&q=85&s=90a01b49ad854f7054610f5bba07c059" alt="Evaluators selection panel showing available evaluators including Contains Any, Contains None, Context Recall, Cosine Similarity, demo-evaluator, demo-json, Fact Checking Knowledge Base, and Factchecker with toggle controls." width="2486" height="1706" data-path="images/docs/51205d5b61a55af182a21c5c3e85f2e86ad55e31736f56963d6481ba50689285-Screenshot_2025-02-10_at_11.47.17.png" />

    <Info>
      To add Evaluators to your project, see [Evaluators](/docs/evaluators/build). Import from the [Hub](/docs/hub/overview#evaluators) or create a custom [LLM Evaluator](/docs/evaluators/build#llm-evaluator).
    </Info>
  </Tab>

  <Tab title="API & SDK" icon="code">
    Define evaluators as async functions that return an `EvaluationResult` with a `value` (a score from 0.0 to 1.0) and an `explanation`.

    <AccordionGroup>
      <Accordion title="Local evaluator" icon="code">
        <CodeGroup>
          ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
          from evaluatorq import EvaluationResult

          async def word_count_scorer(params):
              word_count = len(params["output"].get("summary", "").split())
              if word_count >= 10:
                  return EvaluationResult(value=1.0, explanation=f"Sufficient ({word_count} words)")
              elif word_count >= 5:
                  return EvaluationResult(value=0.5, explanation=f"Partial ({word_count} words)")
              else:
                  return EvaluationResult(value=0.0, explanation=f"Too short ({word_count} words)")
          ```

          ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
          const wordCountScorer = async (params: any) => {
              // Split on any whitespace so tabs and newlines are handled like Python's split()
              const wordCount = (params?.output?.summary || "").split(/\s+/).filter((w: string) => w.length > 0).length;
              return {
                  value: wordCount >= 10 ? 1.0 : wordCount >= 5 ? 0.5 : 0.0,
                  explanation: `Word count: ${wordCount}`,
              };
          };
          ```
        </CodeGroup>
      </Accordion>

      <Accordion title="Orq Evaluator" icon="brain">
        <CodeGroup>
          ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
          import asyncio, os
          from evaluatorq import EvaluationResult

          # Reuses the `orq_client` created in the job definitions under Configure Tasks
          EVAL_ID = os.environ.get("ORQ_EVALUATOR_ID", "your-evaluator-id")

          async def summarization_quality_scorer(params):
              data, output = params["data"], params["output"]
              source_text = (data.inputs.get("text") or "").strip()
              summary = (output.get("summary") or "").strip()
              if not summary or not source_text:
                  return EvaluationResult(value=0.0, explanation="Missing source or summary")
              evaluation = await asyncio.to_thread(
                  orq_client.evals.invoke,
                  id=EVAL_ID, query=source_text, output=summary,
                  reference=None, messages=[], retrievals=[],
              )
              return EvaluationResult(value=float(evaluation.value.value), explanation=str(evaluation.value.explanation or ""))
          ```

          ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
          // Reuses the `orqClient` created in the job definitions under Configure Tasks
          const EVAL_ID = process.env.ORQ_EVALUATOR_ID || "your-evaluator-id";

          const summarizationQualityScorer = async (params: any) => {
              const sourceText = (params.data?.inputs?.text || "").trim();
              const summary = (params.output?.summary || "").trim();
              if (!summary || !sourceText) return { value: 0.0, explanation: "Missing source or summary" };
              const evaluation = await orqClient.evals.invoke({
                  id: EVAL_ID, query: sourceText, output: summary,
                  reference: undefined, messages: [], retrievals: [],
              });
              return { value: parseFloat(evaluation.value.value), explanation: evaluation.value.explanation || "" };
          };
          ```
        </CodeGroup>
      </Accordion>

      <Accordion title="Third-party evaluator (DeepEval)" icon="puzzle">
        <CodeGroup>
          ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
          import asyncio
          from evaluatorq import EvaluationResult

          async def deepeval_relevancy_scorer(params):
              # Imported lazily so DeepEval is only required when this scorer is used
              from deepeval.metrics import AnswerRelevancyMetric
              from deepeval.test_case import LLMTestCase
              source_text = (params["data"].inputs.get("text") or "").strip()
              summary = (params["output"].get("summary") or "").strip()
              if not summary or not source_text:
                  return EvaluationResult(value=0.0, explanation="Missing source or summary")
              metric = AnswerRelevancyMetric(threshold=0.5)
              test_case = LLMTestCase(input=source_text, actual_output=summary)
              # measure() is blocking; DeepEval stores the score on the metric instance
              await asyncio.to_thread(metric.measure, test_case)
              return EvaluationResult(value=float(metric.score), explanation=f"DeepEval relevancy: {metric.score:.2f}")
          ```
        </CodeGroup>
      </Accordion>
    </AccordionGroup>

    <Tip>See the [evaluatorq Tutorial](/docs/tutorials/evaluator-q) for more evaluator patterns including Ragas and other frameworks.</Tip>
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    **Create an experiment with an evaluator:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment from the "qa-dataset" dataset with the "tone-scorer" evaluator attached
    ```

    The assistant uses `search_entities` to find the dataset and evaluator, then `create_experiment` with both the dataset ID and evaluator ID, with `auto_run` enabled.

    ***

    **Create an evaluator first, then run an experiment:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an LLM-as-a-Judge evaluator that scores responses on tone, then run an experiment on the "customer-feedback" dataset using that evaluator
    ```

    The assistant uses `create_llm_eval` to create the evaluator, then `create_experiment` with the returned evaluator key.
  </Tab>
</Tabs>
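
A scorer does not have to return only 0.0, 0.5, or 1.0; any value in the 0.0 to 1.0 range works. The sketch below computes a proportional score from keyword coverage and can be run on its own to check a scorer before wiring it into an experiment. The `EvaluationResult` dataclass here is a local stand-in for the evaluatorq class so the example has no dependencies, and `REQUIRED_KEYWORDS` is a hypothetical list chosen for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EvaluationResult:
    """Local stand-in for evaluatorq's EvaluationResult, so the sketch runs without the package."""
    value: float
    explanation: str


REQUIRED_KEYWORDS = ["cinderella", "prince"]  # hypothetical keywords for illustration


async def keyword_coverage_scorer(params: dict) -> EvaluationResult:
    # Same params shape as the scorers above: the job's return value is under "output".
    summary = (params["output"].get("summary") or "").lower()
    hits = [kw for kw in REQUIRED_KEYWORDS if kw in summary]
    score = len(hits) / len(REQUIRED_KEYWORDS)
    return EvaluationResult(
        value=score,
        explanation=f"Matched {len(hits)} of {len(REQUIRED_KEYWORDS)} keywords",
    )


# Call the scorer directly with a hand-built params dict to sanity check it.
result = asyncio.run(
    keyword_coverage_scorer({"output": {"summary": "Cinderella goes to the ball."}})
)
print(result.value, result.explanation)
```

In a real experiment, import `EvaluationResult` from `evaluatorq` instead of defining it locally.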

#### Human Reviews

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    To add a Human Review column, find the **Human Review** panel and select **Add Human Review**.

    <Frame caption="Human Reviews appear as a new column. Each output can be reviewed individually.">
      <img src="https://mintcdn.com/orqai/3nt6UkYDp2QEiEBs/images/human-review-experiment.png?fit=max&auto=format&n=3nt6UkYDp2QEiEBs&q=85&s=cb26a4593588ca1bea7b3ba0fec797a4" alt="Experiment grid with a Select Feedback dialog open showing Good and Bad options with an explanation field, and Bad selected with the note Could've offered a link to relevant documentation." width="1118" height="668" data-path="images/human-review-experiment.png" />
    </Frame>

    <Info>
      To learn more, see [Human Reviews](/docs/projects/human-review).
    </Info>
  </Tab>
</Tabs>

## Run an Experiment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Click the **Run** button to start the experiment. Depending on the dataset size, all generations may take a few minutes to complete. The status changes to **Completed** when done.

    <Info>
      To start a new iteration with different prompts or data, use the **New Run** button. A new Experiment Run is created in **Draft** state.
    </Info>
  </Tab>

  <Tab title="API & SDK" icon="code">
    Pass your data, jobs, and evaluators to `evaluatorq()`:

    <CodeGroup>
      ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import asyncio
      from evaluatorq import evaluatorq, DatasetIdInput

      async def main():
          await evaluatorq(
              "compare-summarization-variants",
              data=DatasetIdInput(dataset_id="01ARZ3NDEKTSV4RRFFQ69G5FAV"),
              jobs=[summarize_variant_a, summarize_variant_b],
              evaluators=[
                  {"name": "word-count", "scorer": word_count_scorer},
                  {"name": "quality", "scorer": summarization_quality_scorer},
              ],
          )

      if __name__ == "__main__":
          asyncio.run(main())
      ```

      ```typescript TypeScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
      import { evaluatorq } from "@orq-ai/evaluatorq";

      await evaluatorq("compare-summarization-variants", {
          data: { datasetId: "01ARZ3NDEKTSV4RRFFQ69G5FAV" },
          jobs: [summarizeVariantA, summarizeVariantB],
          evaluators: [
              { name: "word-count", scorer: wordCountScorer },
              { name: "quality", scorer: summarizationQualityScorer },
          ],
      });
      ```
    </CodeGroup>

    Once complete, `evaluatorq` prints a summary table in the terminal and a URL to the results in the **Orq.ai** AI Studio.

    <Frame caption="Terminal output after an experiment run.">
      <img src="https://mintcdn.com/orqai/UyqHKZasjtJIMOwi/images/terminal-evaluatorq.png?fit=max&auto=format&n=UyqHKZasjtJIMOwi&q=85&s=e5663a845a87ced172a406edb5831c95" alt="Terminal output showing an evaluation completed summary with 3 total data points, quality and word-count evaluator scores, and a link to view results in Orq.ai." width="1385" height="704" data-path="images/terminal-evaluatorq.png" />
    </Frame>

    **Add evaluators from the UI after a code run:**

    Once the experiment completes, attach evaluators and re-run evaluations directly in the AI Studio without touching code. Use the <Icon icon="circle-plus" /> **Evaluator** button to attach any evaluator and trigger a new evaluation pass.

    <Frame caption="Use the + Evaluator button to attach evaluators to an evaluatorq experiment run.">
      <img src="https://mintcdn.com/orqai/6kvJGT17Rfyyilmw/images/evaluatorq-evaluators-ui.png?fit=max&auto=format&n=6kvJGT17Rfyyilmw&q=85&s=4d25f778efe3b5dd2762a757ded2cef8" alt="vercel-multi-agent-eval experiment Run view showing Tasks (research-agent, math-agent) and Evaluators (city-relevance, correctness, quality-rubric, tool-usage, length_less_than_uqrv, llm_evaluator_tmmq) in the left panel." width="1724" height="944" data-path="images/evaluatorq-evaluators-ui.png" />
    </Frame>

    <Card title="evaluatorq Tutorial" icon="book-open" href="/docs/tutorials/evaluator-q" arrow="true">
      Advanced patterns: comparing Deployments and Agents, third-party framework integration, multi-job workflows, CI/CD integration.
    </Card>

    <Card title="Red Teaming LLMs with evaluatorq" icon="shield-halved" href="/docs/tutorials/red-teaming" arrow="true">
      Probing LLM deployments and agents for security vulnerabilities using the evaluatorq red teaming CLI.
    </Card>
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    **Run an experiment with auto-run enabled:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Create an experiment comparing GPT-5.2 and Claude Sonnet 4.6 on the "user-queries" dataset and run it automatically
    ```

    The assistant uses `create_experiment` with `auto_run: true` and returns the experiment ID once both configurations have run.

    ***

    **List recent runs:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the latest experiment runs in my workspace
    ```

    The assistant uses `list_experiment_runs` with cursor pagination to retrieve recent runs.
  </Tab>
</Tabs>
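
The `evaluatorq()` call in the **API & SDK** tab references scorers such as `word_count_scorer` without showing them. The sketch below illustrates what such a scorer could compute. It is plain Python and makes assumptions that are not confirmed by this page: that a scorer is an async function, that it receives a `params` dictionary with an `output` key, and that it returns a `value` plus an `explanation`. The `MAX_WORDS` limit is also hypothetical. Check the evaluatorq reference for the exact scorer contract.

```python
import asyncio

MAX_WORDS = 50  # hypothetical limit, chosen only for illustration


async def word_count_scorer(params: dict) -> dict:
    """Score a summary by whether it stays within MAX_WORDS.

    Assumption: params["output"] holds the text produced by a job.
    """
    output = str(params.get("output", ""))
    word_count = len(output.split())
    return {
        "value": word_count <= MAX_WORDS,
        "explanation": f"{word_count} words (limit {MAX_WORDS})",
    }


if __name__ == "__main__":
    result = asyncio.run(word_count_scorer({"output": "A short summary of the text."}))
    print(result)
```

Keeping the scorer a pure function of the job output makes it easy to unit test before wiring it into an experiment.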

### Evaluation-Only Mode

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    To score existing responses in your dataset without generating new outputs:

    1. Set up the experiment with a dataset that already contains responses in the **Messages** column.
    2. Do not select a prompt during setup.
    3. Add your evaluators.
    4. Run the experiment.
  </Tab>
</Tabs>
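
Conceptually, evaluation-only mode applies evaluators to responses that already exist instead of calling a model. The sketch below illustrates that idea in plain Python. It does not call the Orq.ai API, and the row layout and the `exact_match` evaluator are hypothetical.

```python
from typing import Callable

# Hypothetical rows: each already carries a stored response, so no model is called.
rows = [
    {"input": "2 + 2", "response": "4", "expected": "4"},
    {"input": "Capital of France", "response": "Lyon", "expected": "Paris"},
]


def exact_match(response: str, expected: str) -> float:
    """Return 1.0 when the stored response equals the expected output."""
    return 1.0 if response.strip() == expected.strip() else 0.0


def evaluate_existing(rows: list[dict], evaluator: Callable[[str, str], float]) -> list[float]:
    """Score stored responses without generating new outputs."""
    return [evaluator(row["response"], row["expected"]) for row in rows]


scores = evaluate_existing(rows, exact_match)
print(scores)
```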

### Run a Single Prompt

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    To run a single task against the dataset without re-running the whole experiment, click <Icon icon="angle-down" /> in the task's column header and choose **Run**.

    <Frame>
      <img src="https://mintcdn.com/orqai/7Dru9SOm-qTNSU3m/images/contextual-menu-for-experiment-model-run.png?fit=max&auto=format&n=7Dru9SOm-qTNSU3m&q=85&s=bb909234906320056e81df8d86e23e4d" alt="Context menu on the gpt-5-mini column header showing options: Run, Settings, Duplicate, Hide Column, and Delete." width="390" height="293" data-path="images/contextual-menu-for-experiment-model-run.png" />
    </Frame>
  </Tab>
</Tabs>

### Partial Runs

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Hover over a cell and click <Icon icon="arrows-rotate-reverse" /> to re-run that row only.

    <img src="https://mintcdn.com/orqai/pyKgLAXgUb0ooMkd/images/re-run-prompt.png?fit=max&auto=format&n=pyKgLAXgUb0ooMkd&q=85&s=bbad88effac57b992000d893d1cb5d6a" alt="Re Run Prompt" className="mx-auto" style={{width:"61%"}} width="325" height="200" data-path="images/re-run-prompt.png" />

    Select **Partial Run** from the Run menu to run every cell that is in an Error state or has not been run yet.

    <img src="https://mintcdn.com/orqai/pyKgLAXgUb0ooMkd/images/partial-run-experiment.png?fit=max&auto=format&n=pyKgLAXgUb0ooMkd&q=85&s=cd3602a9b84e106b35e16e0280532b5f" alt="Partial Run" className="mx-auto" style={{width:"63%"}} width="553" height="278" data-path="images/partial-run-experiment.png" />
  </Tab>
</Tabs>
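
The selection rule behind a Partial Run (cells in Error, plus cells that have not run) can be sketched as a simple filter. The status names and the `(row, task)` cell keys below are hypothetical and serve only to illustrate the rule. They are not the platform's internal representation.

```python
from enum import Enum


class CellStatus(Enum):
    # Hypothetical status names, used only for this illustration.
    COMPLETED = "completed"
    ERROR = "error"
    NOT_RUN = "not_run"


def cells_for_partial_run(cells: dict[tuple[int, str], CellStatus]) -> list[tuple[int, str]]:
    """Return (row, task) keys for cells that errored or never ran."""
    pending = {CellStatus.ERROR, CellStatus.NOT_RUN}
    return sorted(key for key, status in cells.items() if status in pending)


cells = {
    (0, "gpt-5-mini"): CellStatus.COMPLETED,
    (1, "gpt-5-mini"): CellStatus.ERROR,
    (2, "gpt-5-mini"): CellStatus.NOT_RUN,
}
print(cells_for_partial_run(cells))
```

Completed cells are left untouched, which is why a Partial Run avoids paying again for generations that already succeeded.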

### Add Evaluators After Running

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Add extra Evaluators or Human Reviews to an already-completed run. Use the drop-down on the Evaluator column to run only the newly added evaluations without re-running model generations.

    <Frame caption="Use the drop-down on your Evaluator column to run newly added Evaluations.">
      <img src="https://mintcdn.com/orqai/MIQvMD51vcgugI2x/images/experiment-extra-evaluator.png?fit=max&auto=format&n=MIQvMD51vcgugI2x&q=85&s=f1ea64cc8011821903aff127991c56c9" alt="Experiment Extra Evaluator" className="mx-auto" style={{width:"67%"}} width="708" height="474" data-path="images/experiment-extra-evaluator.png" />
    </Frame>
  </Tab>
</Tabs>

## View Results

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Once the experiment status changes to **Completed**, open the **Review** tab.

    <Frame caption="The Review tab provides detailed insights into individual responses and their metrics.">
      <img src="https://mintcdn.com/orqai/kym08_pOTNRFhXF_/images/experiment-review.png?fit=max&auto=format&n=kym08_pOTNRFhXF_&q=85&s=be360a70813460d7f621ad36a1e85f35" alt="Review tab for demo-experiment showing response 1 of 48 with a gpt-5 prompt, Feedback quality slider, HumanReview sentiment buttons, and BERT evaluator scores." width="1425" height="758" data-path="images/experiment-review.png" />
    </Frame>

    The Review tab has two views:

    * <Icon icon="eye" /> **Review**: inspect each model output individually.
    * <Icon icon="columns" /> **Compare**: view multiple model outputs side by side.
  </Tab>

  <Tab title="API & SDK" icon="code">
    Results sync to the **Orq.ai** AI Studio automatically when `ORQ_API_KEY` is set. The framework prints the experiment URL at the end of the run.

    <Frame caption="The Orq.ai UI after a code-triggered experiment run.">
      <img src="https://mintcdn.com/orqai/UyqHKZasjtJIMOwi/images/ui-evaluatorq.png?fit=max&auto=format&n=UyqHKZasjtJIMOwi&q=85&s=5303ed03daa833feb0b459014da6f3d0" alt="compare-summarization experiment Run #5 showing Tasks (summarize-variant-a, summarize-variant-b) with input texts and quality evaluator scores for each row." width="1799" height="570" data-path="images/ui-evaluatorq.png" />
    </Frame>

    [LangGraph](/docs/proxy/frameworks/langgraph) and [Vercel AI SDK](/docs/proxy/frameworks/vercel-ai) agent executions are fully visualised in the UI, including individual steps and tool invocations.

    <Frame caption="Vercel AI SDK execution trace with all agent steps and tool invocations visible in Orq.ai.">
      <img src="https://mintcdn.com/orqai/6kvJGT17Rfyyilmw/images/evaluatorq-vercel.png?fit=max&auto=format&n=6kvJGT17Rfyyilmw&q=85&s=4044de000baf76ddb26d19b0d68e75b7" alt="vercel-multi-agent-eval experiment Review showing Job 1 of 8 for research-agent with a knowledgeBase tool call using topic population of Tokyo 2023, and evaluator scores in the Feedback panel." width="1724" height="816" data-path="images/evaluatorq-vercel.png" />
    </Frame>
  </Tab>

  <Tab title="MCP" icon="https://mintcdn.com/orqai/i7ZhKI7LFRfXU7ox/images/logos/mcp.svg?fit=max&auto=format&n=i7ZhKI7LFRfXU7ox&q=85&s=cef7916eb5fe1f6bb97541398d3f7639" width="16" height="16" data-path="images/logos/mcp.svg">
    **Export results:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Export the latest experiment run as CSV
    ```

    The assistant uses `list_experiment_runs` to find the most recent run, then `get_experiment_run` with CSV export format and returns a signed download URL.

    ***

    **Get results for a specific run:**

    ```prompt wrap theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Show me the results for experiment run ID "01ARZ3NDEKTSV4RRFFQ69G5FAV"
    ```

    The assistant uses `get_experiment_run` to retrieve the full run including all evaluation scores.
  </Tab>
</Tabs>

### Column Result Overview

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Each response column shows an aggregated summary at the top: average evaluator score, latency, and cost across all rows.

    <Frame caption="Each response column shows an aggregated result overview: evaluator score, latency, and cost.">
      <img src="https://mintcdn.com/orqai/1LTUSDrjrmE49Lpa/images/experiments-result-overview.png?fit=max&auto=format&n=1LTUSDrjrmE49Lpa&q=85&s=38837b45d077149d2481bc19fb15099b" alt="Experiment results grid showing gpt-4o and basic_translator variant columns with a tooltip over gpt-4o showing Pass Rate 33%, Avg. Latency 2,354ms, Avg. Cost $0.00218, Input Tokens 2,376, and Total Tokens 3,090." width="1097" height="340" data-path="images/experiments-result-overview.png" />
    </Frame>
  </Tab>
</Tabs>
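
The per-column summary is an aggregation over all rows in that column. A minimal sketch of that arithmetic, assuming a hypothetical `CellResult` record with a pass/fail evaluator outcome, latency, and cost per row:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CellResult:
    # Hypothetical per-row record for one response column.
    passed: bool
    latency_ms: float
    cost_usd: float


def summarize_column(results: list[CellResult]) -> dict[str, float]:
    """Aggregate one column: pass rate, average latency, average cost."""
    if not results:
        raise ValueError("cannot summarize an empty column")
    return {
        "pass_rate": sum(r.passed for r in results) / len(results),
        "avg_latency_ms": mean(r.latency_ms for r in results),
        "avg_cost_usd": mean(r.cost_usd for r in results),
    }


# Illustrative values only.
column = [
    CellResult(passed=True, latency_ms=2000.0, cost_usd=0.002),
    CellResult(passed=False, latency_ms=2500.0, cost_usd=0.003),
    CellResult(passed=False, latency_ms=3000.0, cost_usd=0.001),
]
print(summarize_column(column))
```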

### Review Mode

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    The **Review** mode shows each output individually with:

    * **Inputs and Outputs**: full conversation context with system prompts, user messages, and model responses.
    * **Metrics**: latency, TTFT, token usage breakdown, cost, model details, streaming status.
    * **Human Review and Feedback**: rate and annotate outputs.
    * **Defects and Evaluators**: automated evaluation results.

    Use <Icon icon="chevron-down" /> / <Icon icon="chevron-up" /> or `J`/`K` to navigate between responses.

    <Note>
      Annotations and Human Reviews can only be added in Review mode. Compare mode is read-only.
    </Note>
  </Tab>
</Tabs>

### Compare Mode

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Visualise multiple model executions side by side. Variables and Expected Outputs are shown on the left. Evaluator scores appear at the bottom.

    <Frame caption="View multiple model generations side by side.">
      <img src="https://mintcdn.com/orqai/kym08_pOTNRFhXF_/images/models-comparison-experiment.png?fit=max&auto=format&n=kym08_pOTNRFhXF_&q=85&s=4e8a2c14d4617a45060b4db169ef706a" style={{width:"100%"}} alt="View multiple model generations side by side." width="1625" height="904" data-path="images/models-comparison-experiment.png" />
    </Frame>
  </Tab>
</Tabs>

### Tool Call History

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    When reviewing a model execution, you can inspect the step-by-step tool call history, including the payloads sent and the responses received.

    <Frame caption="See the model interpretation and reasoning around each tool call.">
      <img src="https://mintcdn.com/orqai/-GOD-4cxQAoeO49V/images/experiment-tool-history.png?fit=max&auto=format&n=-GOD-4cxQAoeO49V&q=85&s=0713306205e65f9995fe019cfefcdcd1" className="mx-auto" style={{width:"59%"}} alt="See the model interpretation and reasoning around each tool call." width="562" height="1221" data-path="images/experiment-tool-history.png" />
    </Frame>
  </Tab>
</Tabs>

### Multiple Runs

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    Use the **Runs** tab to see all previous runs for an experiment and compare Evaluator results across runs at a glance.

    <Frame caption="See at a glance how results evolved between two experiment runs.">
      <img src="https://mintcdn.com/orqai/x_6IXnot9ETOc_0g/images/docs/1c014acf1a2cbb955d4e565534869748d12433e5badf78eb667036dd2b216dda-image.png?fit=max&auto=format&n=x_6IXnot9ETOc_0g&q=85&s=1f49462a49498c48fdb684cc4b9b336c" alt="Runs tab for a New experiment showing a table with Status, Prompt, Cosine Similarity, JSON Schema Evaluator, Run, Creator, and Added columns, listing two Completed runs using gpt-4.1." width="2306" height="688" data-path="images/docs/1c014acf1a2cbb955d4e565534869748d12433e5badf78eb667036dd2b216dda-image.png" />
    </Frame>
  </Tab>
</Tabs>
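
Comparing evaluator results across runs amounts to a per-evaluator difference between two sets of averages. The sketch below shows that calculation in plain Python. The scores are hypothetical, and the evaluator names are borrowed from the screenshot above.

```python
def score_deltas(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Return current minus previous for evaluators present in both runs."""
    shared = previous.keys() & current.keys()
    return {name: round(current[name] - previous[name], 4) for name in sorted(shared)}


# Hypothetical average evaluator scores for two runs.
run_1 = {"Cosine Similarity": 0.71, "JSON Schema Evaluator": 0.90}
run_2 = {"Cosine Similarity": 0.78, "JSON Schema Evaluator": 0.85}
print(score_deltas(run_1, run_2))
```

A positive delta means the evaluator's average improved in the later run. Evaluators that exist in only one run are skipped, since there is nothing to compare them against.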

### Export Results

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    <Frame caption="Exports are available after an experiment runs successfully.">
      <img src="https://mintcdn.com/orqai/kym08_pOTNRFhXF_/images/experiment-export.png?fit=max&auto=format&n=kym08_pOTNRFhXF_&q=85&s=7af0843af89d7f0866ce573f29ec170a" alt="Experiment context menu showing Edit, Duplicate, Share, Export with CSV, JSON, and JSON Lines options, Move to, and Delete." width="524" height="312" data-path="images/experiment-export.png" />
    </Frame>

    The exported file contains the dataset entries, model configuration, responses, metrics (including Time to First Token), and Human Reviews.

    <Frame caption="Example CSV export: each column holds data entries and generated responses.">
      <img src="https://mintcdn.com/orqai/E8L3R46ivX7g9-QI/images/docs/df7d94697a5462ba1c1a6aa7e882abadb21456231209220ed5931f1944dc81a1-Screenshot_2025-03-25_at_14.16.29.png?fit=max&auto=format&n=E8L3R46ivX7g9-QI&q=85&s=482ad112bbbb743cd52cbcc9102d8c87" alt="CSV export table showing experiment log rows with timestamp, status, model, template, context, reference, and llm_response columns for gpt-3.5-turbo and meta-llama models answering questions about historical figures." width="3134" height="1150" data-path="images/docs/df7d94697a5462ba1c1a6aa7e882abadb21456231209220ed5931f1944dc81a1-Screenshot_2025-03-25_at_14.16.29.png" />
    </Frame>
  </Tab>
</Tabs>
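
An exported CSV can be analysed with standard tooling. The sketch below counts completed rows per model using only the Python standard library. The column names follow the example screenshot above and may differ in a real export, and the sample rows and status values are made up for illustration.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported file. Column names follow the example screenshot
# and may differ in a real export; the rows are illustrative only.
SAMPLE_EXPORT = """timestamp,status,model,llm_response
2025-03-25T14:00:00Z,completed,gpt-3.5-turbo,Answer A
2025-03-25T14:00:01Z,completed,meta-llama,Answer B
2025-03-25T14:00:02Z,error,meta-llama,
"""


def completed_per_model(csv_text: str) -> Counter:
    """Count completed rows per model in an exported CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["model"] for row in reader if row["status"] == "completed")


print(completed_per_model(SAMPLE_EXPORT))
```

To read a real export, replace `io.StringIO(csv_text)` with a file opened via `open(path, newline="")`.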

### Duplicate an Experiment

<Tabs>
  <Tab title="AI Studio" icon="https://mintcdn.com/orqai/My16MDKJXrKALEHC/images/logos/ai-studio-round.svg?fit=max&auto=format&n=My16MDKJXrKALEHC&q=85&s=ac04dd509320d58ab9701cb6d6137733" width="100" height="100" data-path="images/logos/ai-studio-round.svg">
    To duplicate an experiment with all its configuration (dataset, prompts, evaluators):

    1. Open the experiment.
    2. Click <Icon icon="ellipsis" /> in the top-right corner.
    3. Select **Duplicate**.
    4. Provide a new name and click **Confirm**.
  </Tab>
</Tabs>
