Vision models with the orq.ai SDKs

Vision models can not only process images but also answer questions about them. This is a significant advance over traditional language models, which were limited to understanding and responding to text inputs.

Quick Start

The API supports two methods for submitting an image to the model:

  1. Image URL: provide a direct URL that links to the image.
  2. Data URL: alternatively, submit a Data URL (Base64-encoded image data). Note: certain models may require a mimeType to be included for correct operation (e.g. data:image/jpeg;base64,/9j/4QDcRXhpZgAASUkqAAg...). A minimal sketch of building such a Data URL follows below.
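
The following sketch shows one way to build a Data URL from a local file using Python's standard library. The file name "invoice.jpg" is a placeholder, and inferring the mimeType with the mimetypes module is an assumption about what your model provider expects; adjust it to your own files and provider requirements.

import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local image file as a Data URL with its mime type prefixed."""
    mime_type, _ = mimetypes.guess_type(path)  # e.g. "image/jpeg"
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# The resulting string can be used wherever an image URL is expected, e.g.:
# {"type": "image_url", "image_url": {"url": to_data_url("invoice.jpg")}}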

Unified API

To make the API easier to use, we provide a unified data model that streamlines communication with it.

For this, we use the data structure defined by OpenAI. Further information can be found in OpenAI's documentation.

Currently, our system supports images only in messages where the role is set to user.

How to use

import os

from orq_ai_sdk import OrqAI

client = OrqAI(
    api_key=os.environ.get("ORQ_API_KEY", "__API_KEY__"), environment="production"
)

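# Invoke the deployment with a single user message that combines a text prompt and two image URLs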
generation = client.deployments.invoke(
    key="invoice_analyzer",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Describe what is on the images", "type": "text"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/GodfreyKneller-IsaacNewton-1689.jpg/340px-GodfreyKneller-IsaacNewton-1689.jpg"
                    },
                },
            ],
        }
    ],
)


print(generation.choices[0].message.content)

import { createClient } from '@orq-ai/node';

const client = createClient({
  apiKey: 'orquesta-api-key',
  environment: 'production',
});

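// Invoke the deployment with a single user message that combines a text prompt and two image URLs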
const deployment = await client.deployments.invoke({
  key: 'invoice_analyzer',
  messages: [
    {
      role: 'user',
      content: [
        { text: 'Describe what is on the images', type: 'text' },
        {
          type: 'image_url',
          image_url: {
            url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
          },
        },
        {
          type: 'image_url',
          image_url: {
            url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/GodfreyKneller-IsaacNewton-1689.jpg/340px-GodfreyKneller-IsaacNewton-1689.jpg',
          },
        },
      ],
    },
  ],
});

console.log(deployment?.choices[0].message.content);

curl --location 'https://api.orq.ai/v2/deployments/invoke' \
    --header 'Authorization: <YOUR_API_KEY>' \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    --data '{
    "key": "invoice_analyzer",
    "context": {
        "environments": "production"
    },
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "Describe what is on the images",
                    "type": "text"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/GodfreyKneller-IsaacNewton-1689.jpg/340px-GodfreyKneller-IsaacNewton-1689.jpg"
                    }
                }
            ]
        }
    ]
}'

Important notes

  1. Vision models will be treated as chat models if no image is provided in the API call.
  2. If the API call includes images but the model used is of the chat or completion type, the images will be ignored.
  3. The number of images supported varies by vision model. We recommend reviewing the model provider's documentation to understand how its API handles images.
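
As an illustration of the third note, the helper below caps the number of image parts in a message's content before it is sent. This is a hypothetical convenience function, not part of the SDK, and the default of 10 images is an arbitrary placeholder rather than a documented limit; use the limit stated by your model provider.

def cap_image_parts(content, max_images=10):
    """Keep all text parts but at most `max_images` image_url parts."""
    kept, image_count = [], 0
    for part in content:
        if part.get("type") == "image_url":
            if image_count >= max_images:
                continue  # drop image parts beyond the assumed provider limit
            image_count += 1
        kept.append(part)
    return kept

# Usage: messages=[{"role": "user", "content": cap_image_parts(parts)}]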