
Introduction

orq.ai exposes APIs to manage and enrich Datasets programmatically. This page covers the common use cases for creating, enriching, and fetching Datasets through the API.

Prerequisites

An API key is required to use the SDKs or the HTTP API.
To obtain one, see Authentication.


Creating a Dataset

To create a Dataset, we’ll use the Create Dataset API. The following information is required:
  • a unique name.
  • the path within the orq.ai workspace (see Projects).
curl --request POST \
     --url https://api.orq.ai/v2/datasets \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY' \
     --header 'content-type: application/json' \
     --data '
{
  "display_name": "MyDataset",
  "path": "Default"
}
'
The API responds with the payload below. The _id value is the dataset ID used in subsequent calls.
{
  "display_name": "MyDataset",
  "path": "Default",
  "_id": "<dataset_id>",
  "workspace_id": "<workspace_id>",
  "created": "2025-06-05T13:16:24.865Z",
  "updated": "2025-06-05T13:16:24.865Z",
  "created_by_id": null,
  "updated_by_id": null,
  "project_id": "<project_id>",
  "metadata": {
    "total_versions": 0,
    "datapoints_count": 0
  }
}

Adding Datapoints to a Dataset

Datapoints are entries in a Dataset. You can add between 1 and 5,000 datapoints in a single API request. To create datapoints, we’ll use the Create Datapoints API. The expected payload contains:
  • The previously acquired dataset ID
  • An array of datapoints, where each contains:
    • Inputs – Variables that can be used in the prompt message, e.g., {{firstname}}
    • Messages – The prompt template, structured with system, user, and assistant roles
    • Expected Output – Reference responses that evaluators use to compare against newly generated outputs
curl --request POST \
       --url https://api.orq.ai/v2/datasets/DATASET_ID/datapoints \
       --header 'accept: application/json' \
       --header 'authorization: Bearer ORQ_API_KEY' \
       --header 'content-type: application/json' \
       --data '[
    {
      "inputs": {"country": "France"},
      "messages": [
        {"role": "user", "content": "Capital of {{country}}?"},
        {"role": "assistant", "content": "Paris"}
      ],
      "expected_output": "Paris"
    },
    {
      "inputs": {"country": "Germany"},
      "messages": [
        {"role": "user", "content": "Capital of {{country}}?"},
        {"role": "assistant", "content": "Berlin"}
      ],
      "expected_output": "Berlin"
    },
    {
      "inputs": {"country": "Spain"},
      "messages": [
        {"role": "user", "content": "Capital of {{country}}?"},
        {"role": "assistant", "content": "Madrid"}
      ],
      "expected_output": "Madrid"
    }
  ]'
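The payload above can also be assembled programmatically. Below is a minimal Python sketch; make_capital_datapoint is a hypothetical helper, not part of the orq SDK, shown only to illustrate the datapoint shape.

```python
def make_capital_datapoint(country: str, capital: str) -> dict:
    """Build one datapoint in the shape the Create Datapoints API expects."""
    return {
        # Template variables referenced as {{country}} in the messages
        "inputs": {"country": country},
        # Prompt template with user and assistant roles
        "messages": [
            {"role": "user", "content": "Capital of {{country}}?"},
            {"role": "assistant", "content": capital},
        ],
        # Reference answer that evaluators compare against generated outputs
        "expected_output": capital,
    }

datapoints = [
    make_capital_datapoint(country, capital)
    for country, capital in [("France", "Paris"), ("Germany", "Berlin"), ("Spain", "Madrid")]
]
```

The resulting list can be sent directly as the request body of the curl call above.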

Batch Limits

  • Minimum: 1 datapoint per request
  • Maximum: 5,000 datapoints per request
  • Requests with more than 500 datapoints are automatically processed in optimized chunks
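Per the limits above, the API already splits requests larger than 500 datapoints into chunks server-side. If you prefer to chunk client-side, for example to retry failed chunks independently, a sketch (chunk_datapoints is a hypothetical helper, not part of the orq SDK):

```python
def chunk_datapoints(datapoints: list, chunk_size: int = 500) -> list:
    """Split a datapoint list into chunks no larger than chunk_size."""
    if not 1 <= len(datapoints) <= 5000:
        raise ValueError("a single request accepts 1 to 5,000 datapoints")
    return [datapoints[i:i + chunk_size] for i in range(0, len(datapoints), chunk_size)]

# 1,200 datapoints become three requests of 500, 500, and 200
chunks = chunk_datapoints([{"expected_output": str(n)} for n in range(1200)])
```

Each chunk can then be submitted as its own Create Datapoints request.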

Large Batch Example

For datasets with many entries, you can programmatically generate and submit datapoints:
from orq_ai_sdk import Orq
import os

with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    # Generate 1000 datapoints programmatically
    datapoints = []
    for i in range(1000):
        datapoints.append({
            "inputs": {
                "number": i,
                "operation": "square"
            },
            "messages": [
                {"role": "user", "content": f"What is {i} squared?"},
                {"role": "assistant", "content": f"{i} squared is {i**2}"}
            ],
            "expected_output": str(i**2)
        })

    # Create all datapoints in one request
    res = orq.datasets.create_datapoint(
        dataset_id="DATASET_ID",
        request_body=datapoints
    )

    print(f"Created {len(res)} datapoints")

Creating an Image Dataset

This guide walks you through creating a dataset and populating it with images using the Orq API. You’ll need images in a supported format (JPEG, PNG, GIF, WEBP).
Step 1: Create a Dataset

Start by creating a new Dataset to organize your images. We’ll use the Create Dataset endpoint.
curl -X POST https://api.orq.ai/v2/datasets \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "Image Analysis Dataset",
    "path": "Default"
  }'
Response:
{
  "id": "01K99J7RF0W7PSV4XTJM5XZR1J",
  "display_name": "Image Analysis Dataset",
  "created_at": "2025-01-15T10:30:00Z",
  "updated_at": "2025-01-15T10:30:00Z"
}
Step 2: Convert Images to Base64

Images must be encoded as base64 data URLs before they are added to the dataset:
# Convert a single image to base64 using bash
IMAGE_PATH="/path/to/image.jpg"
IMAGE_DATA=$(base64 < "$IMAGE_PATH" | tr -d '\n')
MIME_TYPE="image/jpeg"

# Create the data URL
DATA_URL="data:${MIME_TYPE};base64,${IMAGE_DATA}"
echo "$DATA_URL"
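The same conversion in Python, mirroring the bash above (to_data_url is a hypothetical helper; the full example later on this page performs the same encoding from a file path):

```python
import base64

def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```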
Step 3: Add Your Images to the Dataset

We’ll use the Create Datapoints endpoint.
# Convert image to base64 and send to API
DATASET_ID="01K99J7RF0W7PSV4XTJM5XZR1J"
IMAGE_PATH="/path/to/image.jpg"
IMAGE_DATA=$(base64 < "$IMAGE_PATH" | tr -d '\n')

curl -X POST "https://api.orq.ai/v2/datasets/$DATASET_ID/datapoints" \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[{
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe what you see in this image"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,'$IMAGE_DATA'",
              "detail": "auto"
            }
          }
        ]
      }
    ]
  }]'

See the Full Example

Here’s a complete working example with error handling:
from orq_ai_sdk import Orq
import base64
import glob
import os

# Initialize client
orq = Orq(api_key=os.getenv("ORQ_API_KEY"))

# Helper function
def image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode('utf-8')

    extension = image_path.lower().split('.')[-1]
    mime_types = {
        'png': 'image/png',
        'gif': 'image/gif',
        'webp': 'image/webp',
        'jpg': 'image/jpeg',
        'jpeg': 'image/jpeg'
    }
    mime_type = mime_types.get(extension, 'image/jpeg')
    return f"data:{mime_type};base64,{encoded}"

# Create dataset
dataset = orq.datasets.create(
    request={
        "display_name": "My Image Dataset",
        "path": "Default",
    }
)
dataset_id = dataset.id
print(f"Dataset created: {dataset_id}")

# Process images
image_extensions = ['jpg', 'jpeg', 'png', 'gif', 'webp']
images_folder = "/path/to/images"

all_images = []
for ext in image_extensions:
    all_images.extend(glob.glob(os.path.join(images_folder, f'*.{ext}')))
    all_images.extend(glob.glob(os.path.join(images_folder, f'*.{ext.upper()}')))

successful = 0
failed = 0

for idx, image_path in enumerate(all_images, 1):
    try:
        image_data = image_to_base64(image_path)

        orq.datasets.create_datapoint(
            dataset_id=dataset_id,
            request_body=[{
                "messages": [{
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Describe this image"
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": image_data,
                                "detail": "auto"
                            }
                        }
                    ]
                }]
            }]
        )

        successful += 1
        print(f"[{idx}/{len(all_images)}] Added {os.path.basename(image_path)}")

    except Exception as e:
        failed += 1
        print(f"[{idx}/{len(all_images)}] Failed {os.path.basename(image_path)}: {str(e)}")

print(f"\n{'='*50}")
print(f"Complete! Added {successful}/{len(all_images)} images")
print(f"Dataset ID: {dataset_id}")

Image Format Details

Supported Formats

Format   MIME Type    Extensions
JPEG     image/jpeg   .jpg, .jpeg
PNG      image/png    .png
GIF      image/gif    .gif
WebP     image/webp   .webp

Detail Parameter

The detail parameter controls how the image is processed:
  • auto (recommended): Automatically optimizes based on image size
  • low: Faster processing, lower token usage
  • high: More detailed analysis, higher token usage

Error Handling

Common issues and solutions:
Error                Cause                        Solution
Invalid API key      Authentication failed        Check your API key in console.orq.ai/settings/api-keys
File not found       Image path is incorrect      Verify the image path and file permissions
Unsupported format   Image format not supported   Convert to JPEG, PNG, GIF, or WebP
Payload too large    Image file is too large      Compress or resize images before upload
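Several of these errors can be caught client-side before calling the API. A sketch of a pre-upload check (validate_image is a hypothetical helper; the 20 MB limit is an assumption, check the API reference for the actual maximum payload size):

```python
import os

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def validate_image(path: str, max_bytes: int = 20 * 1024 * 1024) -> None:
    """Raise before upload for unsupported formats, missing files, or oversized images.

    max_bytes is an assumed limit, not one documented by the API.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: convert {ext} to JPEG, PNG, GIF, or WebP")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"image not found: {path}")
    if os.path.getsize(path) > max_bytes:
        raise ValueError("payload too large: compress or resize the image before upload")
```

Running this before the base64 conversion step avoids spending time encoding an image the API would reject.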

Listing Datasets

List Datasets using the List Datasets API.
curl --request GET \
     --url https://api.orq.ai/v2/datasets \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY'
The following response is sent by the API:
{
  "object": "list",
  "data": [
    {
      "_id": "<dataset_id>",
      "created": "2024-10-04T05:21:16.992Z",
      "created_by_id": "<user_id>",
      "display_name": "demo-collection",
      "metadata": {
        "total_versions": 0,
        "datapoints_count": 0
      },
      "parent_id": null,
      "project_id": "<project_id>",
      "updated": "2024-10-04T05:21:16.992Z",
      "updated_by_id": "<user_id>",
      "version": null,
      "workspace_id": "<workspace_id>"
    }
 ]
}
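The response is a list envelope, so extracting dataset IDs for later calls is straightforward. A sketch using a trimmed version of the sample payload above (placeholder values as shown):

```python
import json

# A trimmed List Datasets response, matching the sample above (placeholder values)
response = json.loads("""
{
  "object": "list",
  "data": [
    {"_id": "<dataset_id>", "display_name": "demo-collection"}
  ]
}
""")

# Map display names to dataset IDs for use in subsequent API calls
ids_by_name = {d["display_name"]: d["_id"] for d in response["data"]}
```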

Fetching a Dataset

Fetch a Dataset using the Retrieve a Dataset API.
Replace DATASET_ID with a previously acquired dataset ID.
curl --request GET \
     --url https://api.orq.ai/v2/datasets/DATASET_ID \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY'
The following response is sent by the API.
{
  "_id": "<dataset_id>",
  "display_name": "MyDataset",
  "path": "Default",
  "workspace_id": "<workspace_id>",
  "created": "2025-06-05T13:16:24.865Z",
  "updated": "2025-06-05T13:16:24.865Z",
  "created_by_id": null,
  "updated_by_id": null,
  "project_id": "<project_id>",
  "metadata": {
    "total_versions": 0,
    "datapoints_count": 4
  }
}
Once a Dataset is created and populated with Datapoints, it can be used in an Experiment; to learn more, see Creating an Experiment.