Using Datasets via the API

Introduction

orq.ai exposes an API for managing and enriching Datasets programmatically. This page covers the common use cases: creating a Dataset, adding Datapoints to it, and listing and fetching Datasets through the API.

Prerequisite

To get started, you need an API key. Both the SDKs and the HTTP API use it.


To create an API key, see Authentication.

SDKs

Creating a Dataset

To create a Dataset, we'll use the Create Dataset API.

The following information is required to create a Dataset:

  • a unique name (display_name).
  • the path within the orq.ai workspace (path, see Projects).
curl --request POST \
     --url https://api.orq.ai/v2/datasets \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY' \
     --header 'content-type: application/json' \
     --data '
{
  "display_name": "MyDataset",
  "path": "Default"
}
'
from orq_ai_sdk import Orq
import os


with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    res = orq.datasets.create(request={
        "display_name": "bad_reviewed_logs",
        "path": "Default",
    })

    assert res is not None

    # Handle response
    print(res)
import { Orq } from "@orq-ai/node";

const orq = new Orq({
  apiKey: process.env["ORQ_API_KEY"] ?? "",
});

async function run() {
  const result = await orq.datasets.create({
    displayName: "bad_reviewed_logs",
    path: "Default",
  });

  console.log(result);
}

run();

The API responds with the following payload. Its _id field contains the dataset ID, which is needed for the subsequent calls:

{
  "display_name": "MyDataset",
  "path": "Default",
  "_id": "<dataset_id>",
  "workspace_id": "<workspace_id>",
  "created": "2025-06-05T13:16:24.865Z",
  "updated": "2025-06-05T13:16:24.865Z",
  "created_by_id": null,
  "updated_by_id": null,
  "project_id": "<project_id>",
  "metadata": {
    "total_versions": 0,
    "datapoints_count": 0
  }
}
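If you prefer not to use an SDK, the same call can be made with Python's standard library. The sketch below assumes the endpoint and Bearer-token header shown in the curl example. It builds the request without sending it and reads the dataset ID from a response body. build_create_dataset_request and extract_dataset_id are hypothetical helper names, not part of any orq.ai SDK.

```python
import json
import urllib.request

API_BASE = "https://api.orq.ai/v2"  # base URL taken from the curl examples


def build_create_dataset_request(
    api_key: str, display_name: str, path: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/datasets request."""
    body = json.dumps({"display_name": display_name, "path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/datasets",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
    )


def extract_dataset_id(response_body: str) -> str:
    """Return the dataset ID (the "_id" field) from a Create Dataset response."""
    return json.loads(response_body)["_id"]
```

To send the request, pass it to urllib.request.urlopen. Then call extract_dataset_id on the decoded response body to keep the ID for the calls that follow.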

Adding a Datapoint to a Dataset

A Datapoint is a single entry in a Dataset.

To create a Datapoint, we'll use the Create a Datapoint API.

The expected payload contains the following fields:

  • the dataset ID acquired previously, passed in the request URL.
  • inputs: variables that can be referenced in the prompt messages, e.g., {{firstname}}.
  • messages: the prompt template, structured with system, user, and assistant roles.
  • expected_output: a reference response that evaluators compare against newly generated outputs.
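The following sketch shows how inputs relate to messages. It replaces each {{variable}} placeholder with the matching value from inputs. It is only an illustration: how orq.ai actually renders templates (whitespace inside the braces, behavior for missing keys) is an assumption here, and render_messages is a hypothetical helper.

```python
import re

# Matches {{name}} placeholders; tolerating spaces inside the braces is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_messages(messages: list[dict], inputs: dict) -> list[dict]:
    """Return copies of messages with every {{name}} replaced by inputs[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            # Failing loudly is a design choice of this sketch, not documented API behavior.
            raise KeyError(f"missing input: {name}")
        return str(inputs[name])

    # Build new dicts so the caller's messages are left unchanged.
    return [
        {**message, "content": PLACEHOLDER.sub(substitute, message["content"])}
        for message in messages
    ]
```

For example, with inputs {"language": "english"}, the system message "you're a helpful assistant, answer in {{language}}" renders as "you're a helpful assistant, answer in english".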
curl --request POST \
     --url https://api.orq.ai/v2/datasets/DATASET_ID/datapoints \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY' \
     --header 'content-type: application/json' \
     --data @- <<EOF
{
  "inputs": {
    "language": "english"
  },
  "messages": [
    {
      "role": "system",
      "content": "you're a helpful assistant, answer in {{language}}"
    },
    {
      "role": "user",
      "content": "what's the capital of Australia?"
    },
    {
      "role": "assistant",
      "content": "Sydney"
    }
  ],
  "expected_output": "Canberra"
}
EOF
from orq_ai_sdk import Orq
import os


with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    res = orq.datasets.create_datapoint(
        dataset_id="DATASET_ID",
        inputs={"inputKey": "value"},
        messages=[{
            "role": "system",
            "content": "You are a helpful assistant"
        }, {
            "role": "user",
            "content": "Give me a short explanation of what an AI Engineer does"
        }],
    )

    assert res is not None

    # Handle response
    print(res)
import { Orq } from "@orq-ai/node";

const orq = new Orq({
  apiKey: process.env["ORQ_API_KEY"] ?? "",
});

async function run() {
  const result = await orq.datasets.createDatapoint({
    datasetId: "<id>",
    inputs: {
      "inputKey": "value"
    },
    messages: [
      {
        "role": "system",
        "content": "You are a helpful assistant"
      }, {
        "role": "user",
        "content": "Give me a short explanation of what an AI Engineer does"
      }
    ]
  });

  console.log(result);
}

run();

The following response is sent by the API:

{
  "inputs": {
    "language": "english"
  },
  "messages": [
    {
      "role": "system",
      "content": "you're a helpful assistant, answer in {{language}}"
    },
    {
      "role": "user",
      "content": "what's the capital of Australia?"
    },
    {
      "role": "assistant",
      "content": "Sydney"
    }
  ],
  "expected_output": "Canberra",
  "_id": "<datapoint_id>",
  "dataset_id": "<dataset_id>",
  "workspace_id": "<workspace_id>",
  "created_by_id": null,
  "updated_by_id": null
}
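Problems in a payload are easier to catch before the request is sent than from an API error. The sketch below checks a single Datapoint payload against the structure described in this section. Each message needs a role of system, user, or assistant and a string content, and inputs, when present, must be an object. These rules are inferred from the examples on this page and do not cover the API's full validation. validate_datapoint is a hypothetical helper.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # roles named on this page


def validate_datapoint(payload: dict) -> list[str]:
    """Return a list of problems found in a Datapoint payload (empty if none).

    The checks are inferred from the documented examples, not from an API schema.
    """
    problems = []
    inputs = payload.get("inputs", {})
    if not isinstance(inputs, dict):
        problems.append("inputs must be an object")
    messages = payload.get("messages", [])
    if not isinstance(messages, list):
        return problems + ["messages must be a list"]
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"messages[{index}] has invalid or missing role: {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"messages[{index}] is missing a string content")
    return problems
```

For example, a message that has content but no role is reported as a problem, while the payload in the response above passes.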


Adding Multiple Datapoints to a Dataset

For larger Datasets, consider adding Datapoints in bulk with the Create multiple Datapoints API.

The payload contains an items array in which each entry has the same structure as the single-Datapoint payload above.
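When there are too many Datapoints to send comfortably in one request, they can be split across several bulk calls. The sketch below groups Datapoints into payloads with the {"items": [...]} shape used by the bulk endpoint. The default batch size of 100 is an arbitrary assumption, not a documented limit, so check the API reference for actual limits. build_bulk_payloads is a hypothetical helper.

```python
from collections.abc import Iterable, Iterator


def build_bulk_payloads(datapoints: Iterable[dict], batch_size: int = 100) -> Iterator[dict]:
    """Yield {"items": [...]} payloads holding at most batch_size datapoints each.

    batch_size=100 is an illustrative default, not a documented API limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list[dict] = []
    for datapoint in datapoints:
        batch.append(datapoint)
        if len(batch) == batch_size:
            yield {"items": batch}
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield {"items": batch}
```

Each yielded payload would be sent as the body of one POST to /v2/datasets/DATASET_ID/datapoints/bulk.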

curl --request POST \
     --url https://api.orq.ai/v2/datasets/DATASET_ID/datapoints/bulk \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY' \
     --header 'content-type: application/json' \
     --data '
{
  "items": [
    {
      "inputs": {
        "InputKey": "New Value"
      },
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant"
        },
        {
          "role": "user",
          "content": "Tell me briefly what being an AI Engineer entails"
        }
      ]
    },
    {
      "inputs": {
        "newKey": "New Value2"
      },
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant"
        },
        {
          "role": "user",
          "content": "Tell me briefly what being a Prompt Engineer entails"
        }
      ]
    }
  ]
}
'
from orq_ai_sdk import Orq
import os


with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    res = orq.datasets.create_datapoints(dataset_id="DATASET_ID", items=[
        {
            'inputs': {"inputKey": "value"}, 
            'messages': [{
                "role": "system",
                "content": "You are a helpful assistant"
                }, {
                "role": "user",
                "content": "Give me a short explanation of what an AI Engineer does"
                }]
        }, 
        {
            'inputs': {"inputKey": "value"}, 
            'messages': [{
                "role": "system",
                "content": "You are a helpful assistant"
                }, {
                "role": "user",
                "content": "Give me a short explanation of what an AI Engineer does"
                }]
        }])

    assert res is not None

    # Handle response
    print(res)
import { Orq } from "@orq-ai/node";

const orq = new Orq({
  apiKey: process.env["ORQ_API_KEY"] ?? "",
});

async function run() {
  const result = await orq.datasets.createDatapoints({
    datasetId: "DATASET_ID",
    items: [{
      inputs: {
        "inputKey": "value"
      },
      messages: [{
        "role": "system",
        "content": "You are a helpful assistant"
      }, {
        "role": "user",
        "content": "Give me a short explanation of what an AI Engineer does"
      }]
    }, {
      inputs: {
        "inputKey": "value"
      },
      messages: [{
        "role": "system",
        "content": "You are a helpful assistant"
      }, {
        "role": "user",
        "content": "Give me a short explanation of what an AI Engineer does"
      }]
    }],
  });

  console.log(result);
}

run();

The following response is sent by the API:

[
    {
        "inputs": {
            "InputKey": "New Value"
        },
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "Tell me briefly what being an AI Engineer entails"
            }
        ],
        "dataset_id": "<dataset_id>",
        "_id": "<datapoint_id>",
        "workspace_id": "<workspace_id>",
        "created_by_id": null,
        "updated_by_id": null
    },
    {
        "inputs": {
            "newKey": "New Value2"
        },
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "Tell me briefly what being a Prompt Engineer entails"
            }
        ],
        "dataset_id": "<dataset_id>",
        "_id": "<datapoint_id>",
        "workspace_id": "<workspace_id>",
        "created_by_id": null,
        "updated_by_id": null
    }
]
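The bulk response is a JSON array with one object per created Datapoint. The sketch below collects the new Datapoint IDs, for example to log them or reference them later. It relies only on the _id and dataset_id fields shown in the response above, and collect_datapoint_ids is a hypothetical helper.

```python
import json


def collect_datapoint_ids(response_body: str, dataset_id: str) -> list[str]:
    """Return the _id of every Datapoint in a bulk-create response.

    Raises ValueError if an item reports a different dataset than the one targeted.
    """
    ids = []
    for item in json.loads(response_body):
        if item.get("dataset_id") != dataset_id:
            raise ValueError(f"unexpected dataset_id: {item.get('dataset_id')!r}")
        ids.append(item["_id"])
    return ids
```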

Listing Datasets

List Datasets using the List Datasets API.

curl --request GET \
     --url https://api.orq.ai/v2/datasets \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY'
from orq_ai_sdk import Orq
import os


with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    res = orq.datasets.list(limit=10)

    assert res is not None

    # Handle response
    print(res)
import { Orq } from "@orq-ai/node";

const orq = new Orq({
  apiKey: process.env["ORQ_API_KEY"] ?? "",
});

async function run() {
  const result = await orq.datasets.list({});

  console.log(result);
}

run();

The following response is sent by the API:

{
  "object": "list",
  "data": [
    {
      "_id": "<dataset_id>",
      "created": "2024-10-04T05:21:16.992Z",
      "created_by_id": "<user_id>",
      "display_name": "demo-collection",
      "metadata": {
        "total_versions": 0,
        "datapoints_count": 0
      },
      "parent_id": null,
      "project_id": "<project_id>",
      "updated": "2024-10-04T05:21:16.992Z",
      "updated_by_id": "<user_id>",
      "version": null,
      "workspace_id": "<workspace_id>"
    }
  ]
}
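The list response wraps the Datasets in a data array. A common next step is to look up a Dataset's ID by its display name. The sketch below does this using only the fields shown in the response above. Because Dataset names are unique, it raises an error on a duplicate rather than silently overwriting one entry with another. index_datasets_by_name is a hypothetical helper. If the API paginates results (the Python example passes limit=10), only the Datasets on the fetched page are indexed.

```python
import json


def index_datasets_by_name(response_body: str) -> dict[str, str]:
    """Map each Dataset's display_name to its _id from a List Datasets response."""
    payload = json.loads(response_body)
    index: dict[str, str] = {}
    for dataset in payload.get("data", []):
        name = dataset["display_name"]
        if name in index:
            # Names are expected to be unique; surface duplicates instead of overwriting.
            raise ValueError(f"duplicate display_name: {name!r}")
        index[name] = dataset["_id"]
    return index
```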

Fetching a Dataset

Fetch a Dataset using the Retrieve a Dataset API.

Replace DATASET_ID with the ID of a previously created Dataset.

curl --request GET \
     --url https://api.orq.ai/v2/datasets/DATASET_ID \
     --header 'accept: application/json' \
     --header 'authorization: Bearer ORQ_API_KEY'
from orq_ai_sdk import Orq
import os


with Orq(
    api_key=os.getenv("ORQ_API_KEY", ""),
) as orq:

    res = orq.datasets.retrieve(dataset_id="<id>")

    assert res is not None

    # Handle response
    print(res)
import { Orq } from "@orq-ai/node";

const orq = new Orq({
  apiKey: process.env["ORQ_API_KEY"] ?? "",
});

async function run() {
  const result = await orq.datasets.retrieve({
    datasetId: "<id>",
  });

  console.log(result);
}

run();

The following response is sent by the API:

{
  "_id": "<dataset_id>",
  "display_name": "MyDataset",
  "path": "Default",
  "workspace_id": "<workspace_id>",
  "created": "2025-06-05T13:16:24.865Z",
  "updated": "2025-06-05T13:16:24.865Z",
  "created_by_id": null,
  "updated_by_id": null,
  "project_id": "<project_id>",
  "metadata": {
    "total_versions": 0,
    "datapoints_count": 4
  }
}
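The metadata in the retrieved Dataset reports how many Datapoints it holds, which gives a quick sanity check after adding Datapoints. The sketch below builds the GET request with Python's standard library. It percent-encodes the ID so that an unexpected character cannot alter the URL path, and it reads metadata.datapoints_count from a response body. Both helper names are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.orq.ai/v2"  # base URL taken from the curl examples


def build_retrieve_dataset_request(api_key: str, dataset_id: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v2/datasets/{dataset_id} request."""
    safe_id = urllib.parse.quote(dataset_id, safe="")  # encode "/" and other reserved chars
    return urllib.request.Request(
        f"{API_BASE}/datasets/{safe_id}",
        method="GET",
        headers={"accept": "application/json", "authorization": f"Bearer {api_key}"},
    )


def datapoints_count(response_body: str) -> int:
    """Return metadata.datapoints_count from a Retrieve Dataset response."""
    return json.loads(response_body)["metadata"]["datapoints_count"]
```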



Once a Dataset is created and populated with Datapoints, it can be used in an Experiment. To learn more, see Creating an Experiment.