Once you have a Deployment with model configurations ready to be exposed to your users, you can start the integration process, which involves invoking your Deployments from within your own environment. In this document, we will see how to fetch prepared code snippets for your Deployment and use them to integrate orq.ai into your systems.
If you don’t have Deployments ready to be integrated, see Creating a Deployment.

Getting Code Snippets

The first step of the integration is fetching the code for the chosen Deployment. Each Deployment can contain several Variants.
Which Variant is exposed is configured through Routing; to learn more, see Deployment Routing.
You can get a snippet for a Variant in two ways:

Via the Routing Page

  • Open a Deployment and go to the Routing Page.
  • Right-click on the Variant you want to integrate.
  • Select Generate Code Snippet.

Via the Variant Page

  • Open a Deployment and go to the Variant Page.
  • Press the Code Snippet icon at the top-right of the Studio.

[Image: Code snippet button]

The following panel will open:
In this panel, all context attributes are pre-filled so that your Routing rules are respected. To learn more about context attributes and routing, see Deployment Routing.

This panel contains the code necessary to invoke the selected Variant.

Using Code Snippet

You have multiple languages available to integrate your Deployment. Currently we support Python, JavaScript (Node.js), and shell (cURL).

Getting Credentials

The first step of any integration is to have an API key ready to be used.
If you don’t have an API key yet, you can fetch one from your panel; see how in our Authentication documentation.

Initializing a client

Depending on the chosen programming language, you will have different methods to initialize your client. All methods require the previously acquired API key.
To learn more about client initialization, see our authentication tutorial using our Client Libraries.
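For a plain-HTTP integration, a reusable session that carries your API key can play the role of a client. The sketch below is not the official SDK initialization (see the Client Libraries tutorial for that); it simply prepares the headers used by the examples in this document, and assumes your key is exported as the environment variable ORQ_API_KEY.
import os

import requests

# Minimal plain-HTTP "client": a session that sends the API key and JSON
# headers on every request. The official Python and Node.js SDKs handle
# this for you.
session = requests.Session()
session.headers.update({
    "authorization": f"Bearer {os.environ['ORQ_API_KEY']}",  # assumed env variable
    "accept": "application/json",
    "content-type": "application/json",
})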

Invoking a deployment

Once your authentication layer is ready, you can invoke your Deployment. Invoking means sending a query to the underlying model, which can include your user’s request; orq.ai takes care of reaching the correct language model with all prepared configurations and returns the model’s response.
To learn more about Deployment invocation, see our tutorial using our Client Libraries.
Once you have invoked a first Deployment, look into our SDKs for the available options and calls.
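As a minimal sketch of what such an invocation looks like over plain HTTP (reusing the session from the initialization sketch above; my-deployment and the context values are placeholders mirroring the cURL examples below):
# Invoke a Deployment (sketch): the payload mirrors the cURL examples below.
response = session.post(
    "https://api.orq.ai/v2/deployments/invoke",
    json={
        "key": "my-deployment",                    # your Deployment key
        "context": {"environment": "production"},  # attributes evaluated by your Routing rules
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
response.raise_for_status()
print(response.json())  # the model's response payload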

Extra Parameters

The extra_params field is a powerful tool to access parameters not directly exposed by the Orq.ai panel, or to modify preexisting settings depending on a particular scenario.

Usage Tracking

Track token consumption for every Deployment call by including usage metrics in your API responses. This helps you monitor and optimize your LLM costs in real time. To enable usage tracking, set include_usage: true in the invoke_options parameter when calling your Deployment. The response includes:
  • prompt_tokens - Number of tokens in the input
  • completion_tokens - Number of tokens in the generated output
  • total_tokens - Combined token count (prompt + completion)
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "invoke_options": {
    "include_usage": true
  }
}
'
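The same request in Python is a one-field change; a sketch reusing the earlier session (print the raw JSON of your own call to see exactly where the token fields are placed in the response body):
# Request usage metrics alongside the generation (sketch).
response = session.post(
    "https://api.orq.ai/v2/deployments/invoke",
    json={
        "key": "my-deployment",
        "context": {"environment": "production"},
        "invoke_options": {"include_usage": True},
    },
)
# Inspect the payload for prompt_tokens, completion_tokens and total_tokens.
print(response.json())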

Unsupported Parameters

Not all parameters offered by model providers are natively supported by Orq.ai when using Invoke. Our API offers a way to pass such parameters using the extra_params field. Example:
Here we inject the presence_penalty parameter into the model generation. This parameter is available from the provider but not natively exposed through the orq API.
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "extra_params": {
    "presence_penalty": 1.0
  }
}
'

Overwriting Existing Parameters

Overwriting existing parameters can impact your model configuration; use this feature with caution.
The extra_params field can also be used to overwrite the Model Configuration defined within the Deployment. At runtime, you can dynamically override parameters previously defined within Orq.ai. Example: overwriting temperature.
Here we use extra_params to override the temperature parameter, which may also be defined within your Prompt Configuration.
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "extra_params": {
    "temperature": 0.4
  }
}
'
Example: Overwriting response_format
All parameters can be overwritten, including complex ones. In this example, we overwrite response_format to dynamically set the response format for the generation to a predefined JSON object.
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "extra_params": {
    "response_format": {
       "type": "json_schema",
       "json_schema": <schema>
    }
  }
}
'

# Here <schema> is a valid JSON object containing the definition of the fields to return
# Example:
# {
#  "name": "object_name",
#  "strict": true,
#  "schema": {
#    "type": "object",
#    "properties": {
#      "field1": {
#        "type": "integer",
#        "description": "First integer field"
#      },
#      "field2": {
#        "type": "integer",
#        "description": "Second integer field"
#      }
#    },
#    "additionalProperties": false,
#    "required": [
#      "field1",
#      "field2"
#    ]
#  }
# }
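To make the <schema> placeholder concrete, here is a Python sketch that inlines the example schema from the comment above into the extra_params override (field1/field2 are the illustrative field names used there; the session comes from the earlier initialization sketch):
# Build the response_format override with a concrete json_schema (sketch).
schema = {
    "name": "object_name",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "field1": {"type": "integer", "description": "First integer field"},
            "field2": {"type": "integer", "description": "Second integer field"},
        },
        "additionalProperties": False,
        "required": ["field1", "field2"],
    },
}

response = session.post(
    "https://api.orq.ai/v2/deployments/invoke",
    json={
        "key": "my-deployment",
        "context": {"environment": "production"},
        "extra_params": {
            "response_format": {"type": "json_schema", "json_schema": schema},
        },
    },
)
print(response.json())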

Attaching Files to Deployment

There are three ways to attach files to a model Deployment:
  1. Attaching PDFs directly to the model in a Deployment.
  2. Uploading a file and including that file to the Deployment.
  3. Attaching a Knowledge Base to a Deployment.

Sending PDFs directly to the model

This feature is only supported with OpenAI, Anthropic, and Google Gemini models.
For compatible models, files can be embedded directly within the Invoke payload by adding a message content part of type file. The message should hold the file as a standard data URI: data:<content/type>;base64, followed by the Base64-encoded file data. See below how to use this message type:
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <your_orq_key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "key",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "prompt"
        },
        {
          "type": "file",
          "file": {
            "file_data": "data:application/pdf;base64,<base64-encoded-data>"
          }
        }
      ]
    }
  ]
}
'
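To produce the file_data value, Base64-encode the file and prepend the data URI prefix. A minimal Python sketch (document.pdf is a placeholder path):
import base64

# Encode a local PDF as a data URI for the "file" message part (sketch).
with open("document.pdf", "rb") as f:  # placeholder path
    encoded = base64.b64encode(f.read()).decode("ascii")

file_part = {
    "type": "file",
    "file": {"file_data": f"data:application/pdf;base64,{encoded}"},
}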

Attaching files to a Deployment

Attaching files to a Deployment is a two-step process: first, upload the file, which returns an id; then pass this id when invoking the Deployment.

Step 1: Upload a file

To attach files during generation, they need to be uploaded before the generation happens. To upload a file, use the following API call:
You can find the latest SDK documentation in the Python SDK and Node.js SDK references.
curl --location 'https://api.orq.ai/v2/files' \
--header 'Authorization: Bearer xxxxxx' \
--form 'purpose="retrieval"' \
--form 'file=@"/Users/cormick/Downloads/filename.pdf"'
Here is an example response; store the _id for later use.
{
    "_id": "file_01JA5D27ZVW2N702Z0D3B1G8EK",
    "object_name": "files-api/workspaces/e747f6ac-19b0-47cd-8e79-0e1bf72b2a3e/retrieval/file_01JA5D27ZVW2N702Z0D3B1G8EK.vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "purpose": "retrieval",
    "file_name": "file_01JA5D27ZVW2N702Z0D3B1G8EK.vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "bytes": 5295,
    "created": "2024-10-14T11:36:54.189Z"
}

Step 2: Attach a file during invocation

When invoking a Deployment, include your file id in the file_ids / fileIds array as follows:
curl --location 'https://api.orq.ai/v2/deployments/invoke' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer xxxxx' \
--data '{
    "key": "deployment_key",
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ],
    "file_ids": [
        "file_01JA5D27ZVW2N702Z0D3B1G8EK"
    ]
}'
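Putting the two steps together over plain HTTP, as a Python sketch (filename.pdf and the prompt are placeholders; the official SDKs expose equivalent calls, see their documentation for the exact signatures):
import os

import requests

headers = {"authorization": f"Bearer {os.environ['ORQ_API_KEY']}"}  # assumed env variable

# Step 1: upload the file as a multipart form, mirroring the cURL example;
# requests sets the multipart content-type header itself.
with open("filename.pdf", "rb") as f:  # placeholder path
    upload = requests.post(
        "https://api.orq.ai/v2/files",
        headers=headers,
        data={"purpose": "retrieval"},
        files={"file": f},
    )
file_id = upload.json()["_id"]

# Step 2: reference the returned _id when invoking the Deployment.
response = requests.post(
    "https://api.orq.ai/v2/deployments/invoke",
    headers=headers,
    json={
        "key": "deployment_key",
        "messages": [{"role": "user", "content": "Summarize the attached file."}],
        "file_ids": [file_id],
    },
)
print(response.json())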

Attaching a Knowledge Base to a Deployment

Read here how to set up a Knowledge Base, or how to use a Knowledge Base in a Prompt.

When to use Knowledge Base vs Attaching files

The need for full context understanding

Knowledge Bases and RAG (Retrieval Augmented Generation) retrieve relevant chunks, which works for focused queries but falls short for tasks like summarization that require full-document understanding. Attaching files gives the LLM access to the entire document, ensuring it has the complete context. For example, when summarizing reports, legal cases, or research papers, the LLM needs to process the full document to capture key details and connections that partial text retrieval can’t provide. Full context access leads to better comprehension and more accurate outputs, particularly for tasks requiring a holistic view, such as summarization and detailed analysis.

Dynamic document context

Unlike a static knowledge base, attached files can provide ad-hoc, context-specific documents for one-time or immediate use without the need for integration into a broader knowledge repository. When a user is dealing with unique documents—such as one-off reports, meeting notes, or specific contracts—they can attach these files directly to a deployment. The LLM can instantly use these documents to provide answers or insights. This feature is especially useful for situations where time-sensitive or project-specific documents need to be used on the fly, giving flexibility to quickly incorporate new, temporary knowledge without modifying or updating the knowledge base.

Private or sensitive data

Due to privacy concerns, confidential or sensitive files (e.g., contracts and medical records) may not be suitable for a general knowledge base. Attaching files directly allows secure, temporary interaction with this data.

Knowledge Base Retrievals

When querying a Deployment using a Knowledge Base, it is possible to fetch the details of the knowledge base retrievals during generation.

Invoking with retrieval

When invoking a Deployment, use the optional include_retrievals field to embed the retrieved chunks within the response payload. Here is an example of how to set include_retrievals in the invoke_options object of your request payload.
curl --location 'https://api.orq.ai/v2/deployments/invoke' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer xxxxx' \
--data '{
    "key": "deployment_key",
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ],
    "invoke_options": {
        "include_retrievals": true
    }
}'
Your invocation will then embed the retrieved chunks in the retrievals field of the response, as follows.

Each retrieval result contains the document chunk as well as metadata related to the source file and the search scores.

The retrievals are returned in the following format: an array of chunks, where each chunk holds the source details and scores (search and re-ranking).
{
    "retrievals": [
        {
            "document": "<chunk_data>",
            "metadata": {
                "file_name": "<filename>",
                "file_type": "application/pdf",
                "page_number": 24,
                "search_score": 0.7886787056922913,
                "rerank_score": 0.19868536
            }
        },
        {
            "document": "<chunk_data>",
            "metadata": {
                "file_name": "<filename>",
                "file_type": "application/pdf",
                "page_number": 25,
                "search_score": 0.746787056030011,
                "rerank_score": 0.1683825
            }
        }
    ]
}
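A short Python sketch for reading this structure out of a parsed response (assuming the chunks arrive under a top-level retrievals key, as shown above; body is the parsed JSON of an invocation made with include_retrievals):
# List the retrieved chunks with their scores (sketch).
# body = response.json() from an invocation with include_retrievals enabled.
for chunk in body.get("retrievals", []):
    meta = chunk["metadata"]
    print(
        f"{meta['file_name']} (page {meta['page_number']}): "
        f"search={meta['search_score']:.3f}, rerank={meta['rerank_score']:.3f}"
    )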