Once you have a Deployment with model configurations ready to be exposed to your users, you can start the integration process, which involves invoking your Deployments from within your own environment. In this document, we will see how to fetch prepared code snippets for your Deployment and use them to integrate orq.ai into your systems.
If you don’t have Deployments ready to be integrated, see Creating a Deployment.

Getting Code Snippets

The first step of the integration is fetching the code for the chosen Deployment. Each Deployment can contain several Variants.
Which Variant is exposed is configured through Routing; to learn more, see Deployment Routing.
You can get a snippet for a Variant in two ways:

Via the Routing Page

  • Open a Deployment and go to the Routing Page.
  • Right-click on the Variant you want to integrate.
  • Select Generate Code Snippet.

Via the Variant Page

  • Open a Deployment and go to the Variant Page.
  • Press the Code Snippet icon at the top-right of the Studio.

[Image: Code snippet button]

The following panel will open:
In this panel, all context attributes are pre-filled so that your Routing rules are respected. To learn more about context attributes and routing, see Deployment Routing.

This panel contains the code necessary to invoke the selected Variant.

Using Code Snippet

You have multiple languages available to integrate your Deployment. Currently we support Python, JavaScript (Node.js), and shell (cURL).

Getting Credentials

The first step of any integration is to have an API key ready to be used.
If you don’t have an API key yet, you can fetch one from your panel; see how in our Authentication documentation.

Initializing a client

Depending on the chosen programming language, you will have different methods to initialize your client. All methods require the previously acquired API key.
To learn more about client initialization, see our authentication tutorial using our Client Libraries.
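For a plain-HTTP integration, a reusable session that carries your API key can play the role of a client. The sketch below is not the official SDK initialization (see the Client Libraries tutorial for that); it simply prepares the headers used by the examples in this document, and assumes your key is exported as the environment variable ORQ_API_KEY.
import os

import requests

# Minimal plain-HTTP "client": a session that sends the API key and JSON
# headers on every request. The official Python and Node.js SDKs handle
# this for you.
session = requests.Session()
session.headers.update({
    "authorization": f"Bearer {os.environ['ORQ_API_KEY']}",  # assumed env variable
    "accept": "application/json",
    "content-type": "application/json",
})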

Invoking a deployment

Once your authentication layer is ready, you can invoke your Deployment. Invoking means sending a query to the underlying model, which can include your user’s request; orq.ai takes care of reaching the correct language model with all prepared configurations and returns the model’s response.
To learn more about Deployment invocation, see our tutorial using our Client Libraries.
Once you have invoked a first Deployment, look into our SDKs for the available options and calls.
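As a minimal sketch of what such an invocation looks like over plain HTTP (reusing the session from the initialization sketch above; my-deployment and the context values are placeholders mirroring the cURL examples below):
# Invoke a Deployment (sketch): the payload mirrors the cURL examples below.
response = session.post(
    "https://api.orq.ai/v2/deployments/invoke",
    json={
        "key": "my-deployment",                    # your Deployment key
        "context": {"environment": "production"},  # attributes evaluated by your Routing rules
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
response.raise_for_status()
print(response.json())  # the model's response payload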

Extra Parameters

The extra_params field is a powerful tool to access parameters not directly exposed by the Orq.ai panel, or to modify preexisting settings depending on a particular scenario.

Usage Tracking

Track token consumption for every Deployment call by including usage metrics in your API responses. This helps you monitor and optimize your LLM costs in real time. To enable usage tracking, set include_usage: true in the invoke_options parameter when calling your Deployment. The response includes:
  • prompt_tokens - Number of tokens in the input
  • completion_tokens - Number of tokens in the generated output
  • total_tokens - Combined token count (prompt + completion)
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "invoke_options": {
    "include_usage": true
  }
}
'
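The same request in Python is a one-field change; a sketch reusing the earlier session (print the raw JSON of your own call to see exactly where the token fields are placed in the response body):
# Request usage metrics alongside the generation (sketch).
response = session.post(
    "https://api.orq.ai/v2/deployments/invoke",
    json={
        "key": "my-deployment",
        "context": {"environment": "production"},
        "invoke_options": {"include_usage": True},
    },
)
# Inspect the payload for prompt_tokens, completion_tokens and total_tokens.
print(response.json())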

Unsupported Parameters

Not all parameters offered by model providers are natively supported by Orq.ai when using Invoke. Our API offers a way to pass such parameters using the extra_params field. Example:
Here we inject the presence_penalty parameter into the model generation. This parameter is available from the provider but not natively exposed through the orq API.
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "extra_params": {
    "presence_penalty": 1.0
  }
}
'

Overwriting Existing Parameters

Overwriting existing parameters can impact your model configuration; use this feature with caution.
The extra_params field can also be used to overwrite the Model Configuration defined within the Deployment. At runtime, you can dynamically override parameters previously defined within Orq.ai. Example: overwriting temperature.
Here we use extra_params to override the temperature parameter, which may also be defined within your Prompt Configuration.
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "extra_params": {
    "temperature": 0.4
  }
}
'
Example: Overwriting response_format
All parameters can be overwritten, including complex ones. In this example, we overwrite response_format to dynamically set the response format for the generation to a predefined JSON object.
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <orq-api-key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "my-deployment",
  "context": {
    "environment": "production"
  },
  "extra_params": {
    "response_format": {
       "type": "json_schema",
       "json_schema": <schema>
    }
  }
}
'

# Here <schema> is a valid JSON object containing the definition of the fields to return
# Example:
# {
#  "name": "object_name",
#  "strict": true,
#  "schema": {
#    "type": "object",
#    "properties": {
#      "field1": {
#        "type": "integer",
#        "description": "First integer field"
#      },
#      "field2": {
#        "type": "integer",
#        "description": "Second integer field"
#      }
#    },
#    "additionalProperties": false,
#    "required": [
#      "field1",
#      "field2"
#    ]
#  }
# }
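To make the <schema> placeholder concrete, here is a Python sketch that inlines the example schema from the comment above into the extra_params override (field1/field2 are the illustrative field names used there; the session comes from the earlier initialization sketch):
# Build the response_format override with a concrete json_schema (sketch).
schema = {
    "name": "object_name",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "field1": {"type": "integer", "description": "First integer field"},
            "field2": {"type": "integer", "description": "Second integer field"},
        },
        "additionalProperties": False,
        "required": ["field1", "field2"],
    },
}

response = session.post(
    "https://api.orq.ai/v2/deployments/invoke",
    json={
        "key": "my-deployment",
        "context": {"environment": "production"},
        "extra_params": {
            "response_format": {"type": "json_schema", "json_schema": schema},
        },
    },
)
print(response.json())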

Attaching Files to Deployment

There are three ways to attach files to a model Deployment:
  1. Attaching PDFs directly to the model in a Deployment.
  2. Uploading a file and including that file to the Deployment.
  3. Attaching a Knowledge Base to a Deployment.

Sending PDFs directly to the model

This feature is only supported with OpenAI, Anthropic, and Google Gemini models.
For compatible models, files can be embedded directly within the Invoke payload by adding a message content part of type file. The message should hold the file as a standard data URI: data:<content/type>;base64, followed by the Base64-encoded file data. See below how to use this message type:
curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <your_orq_key>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "key",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "prompt"
        },
        {
          "type": "file",
          "file": {
            "file_data": "data:application/pdf;base64,<base64-encoded-data>"
          }
        }
      ]
    }
  ]
}
'
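To produce the file_data value, Base64-encode the file and prepend the data URI prefix. A minimal Python sketch (document.pdf is a placeholder path):
import base64

# Encode a local PDF as a data URI for the "file" message part (sketch).
with open("document.pdf", "rb") as f:  # placeholder path
    encoded = base64.b64encode(f.read()).decode("ascii")

file_part = {
    "type": "file",
    "file": {"file_data": f"data:application/pdf;base64,{encoded}"},
}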

Attaching files to a Deployment

Attaching files to a Deployment is a two-step process: first, upload the file, which returns an id; then pass this id when invoking the Deployment.

Step 1: Upload a file

To attach files during generation, they need to be uploaded before the generation happens. To upload a file, use the following API call:
You can find the latest SDK documentation in the Python SDK and Node.js SDK references.
curl --location 'https://api.orq.ai/v2/files' \
--header 'Authorization: Bearer xxxxxx' \
--form 'purpose="retrieval"' \
--form 'file=@"/Users/cormick/Downloads/filename.pdf"'
Here is an example response; store the _id for later use.
{
    "_id": "file_01JA5D27ZVW2N702Z0D3B1G8EK",
    "object_name": "files-api/workspaces/e747f6ac-19b0-47cd-8e79-0e1bf72b2a3e/retrieval/file_01JA5D27ZVW2N702Z0D3B1G8EK.vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "purpose": "retrieval",
    "file_name": "file_01JA5D27ZVW2N702Z0D3B1G8EK.vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "bytes": 5295,
    "created": "2024-10-14T11:36:54.189Z"
}

Step 2: Attach a file during invocation

When invoking a Deployment, include your file id in the file_ids / fileIds array as follows:
curl --location 'https://api.orq.ai/v2/deployments/invoke' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer xxxxx' \
--data '{
    "key": "deployment_key",
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ],
    "file_ids": [
        "file_01JA5D27ZVW2N702Z0D3B1G8EK"
    ]
}'
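Putting the two steps together over plain HTTP, as a Python sketch (filename.pdf and the prompt are placeholders; the official SDKs expose equivalent calls, see their documentation for the exact signatures):
import os

import requests

headers = {"authorization": f"Bearer {os.environ['ORQ_API_KEY']}"}  # assumed env variable

# Step 1: upload the file as a multipart form, mirroring the cURL example;
# requests sets the multipart content-type header itself.
with open("filename.pdf", "rb") as f:  # placeholder path
    upload = requests.post(
        "https://api.orq.ai/v2/files",
        headers=headers,
        data={"purpose": "retrieval"},
        files={"file": f},
    )
file_id = upload.json()["_id"]

# Step 2: reference the returned _id when invoking the Deployment.
response = requests.post(
    "https://api.orq.ai/v2/deployments/invoke",
    headers=headers,
    json={
        "key": "deployment_key",
        "messages": [{"role": "user", "content": "Summarize the attached file."}],
        "file_ids": [file_id],
    },
)
print(response.json())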

Attaching a Knowledge Base to a Deployment

Read here how to set up a Knowledge Base, or how to use a Knowledge Base in a Prompt.

When to use Knowledge Base vs Attaching files

The need for full context understanding

Knowledge Bases and RAG (Retrieval Augmented Generation) retrieve relevant chunks, which works for focused queries but falls short for tasks like summarization that require full-document understanding. Attaching files gives the LLM access to the entire document, ensuring it has the complete context. For example, when summarizing reports, legal cases, or research papers, the LLM needs to process the full document to capture key details and connections that partial text retrieval can’t provide. Full context access leads to better comprehension and more accurate outputs, particularly for tasks requiring a holistic view, such as summarization and detailed analysis.

Dynamic document context

Unlike a static knowledge base, attached files can provide ad-hoc, context-specific documents for one-time or immediate use without the need for integration into a broader knowledge repository. When a user is dealing with unique documents—such as one-off reports, meeting notes, or specific contracts—they can attach these files directly to a deployment. The LLM can instantly use these documents to provide answers or insights. This feature is especially useful for situations where time-sensitive or project-specific documents need to be used on the fly, giving flexibility to quickly incorporate new, temporary knowledge without modifying or updating the knowledge base.

Private or sensitive data

Due to privacy concerns, confidential or sensitive files (e.g., contracts and medical records) may not be suitable for a general knowledge base. Attaching files directly allows secure, temporary interaction with this data.

Knowledge Base Retrievals

When querying a Deployment using a Knowledge Base, it is possible to fetch the details of the knowledge base retrievals during generation.

Invoking with retrieval

When invoking a Deployment, use the optional include_retrievals field to embed the retrieved chunks within the response payload. Here is an example of how to set include_retrievals in the invoke_options object of your request payload.
curl --location 'https://api.orq.ai/v2/deployments/invoke' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer xxxxx' \
--data '{
    "key": "deployment_key",
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ],
    "invoke_options": {
        "include_retrievals": true
    }
}'
Your invocation will then embed the retrieved chunks in the retrievals field of the response, as follows.

Each retrieval result contains the document chunk as well as metadata related to the source file and the search scores.

The retrievals are returned in the following format: an array of chunks, where each chunk holds the source details and scores (search and re-ranking).
{
    "retrievals": [
        {
            "document": "<chunk_data>",
            "metadata": {
                "file_name": "<filename>",
                "file_type": "application/pdf",
                "page_number": 24,
                "search_score": 0.7886787056922913,
                "rerank_score": 0.19868536
            }
        },
        {
            "document": "<chunk_data>",
            "metadata": {
                "file_name": "<filename>",
                "file_type": "application/pdf",
                "page_number": 25,
                "search_score": 0.746787056030011,
                "rerank_score": 0.1683825
            }
        }
    ]
}
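A short Python sketch for reading this structure out of a parsed response (assuming the chunks arrive under a top-level retrievals key, as shown above; body is the parsed JSON of an invocation made with include_retrievals):
# List the retrieved chunks with their scores (sketch).
# body = response.json() from an invocation with include_retrievals enabled.
for chunk in body.get("retrievals", []):
    meta = chunk["metadata"]
    print(
        f"{meta['file_name']} (page {meta['page_number']}): "
        f"search={meta['search_score']:.3f}, rerank={meta['rerank_score']:.3f}"
    )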