All three solutions store information that an agent can retrieve, but they serve different purposes depending on where the data lives and how it changes.

- **Knowledge Bases** index documents you upload into Orq.ai. The platform handles embeddings, chunking, and retrieval. Use them when you have documents to ingest and want a fully managed RAG pipeline.
- **External Knowledge Bases** connect to a vector database you already operate. Orq.ai calls your API at query time and passes the results to the model. Use them when your data cannot leave your infrastructure, or when you already have an embedding pipeline.
- **Memory Stores** store arbitrary text per entity, such as a user or session. Documents accumulate over time and are retrieved semantically on each interaction. Use them when your agent needs to remember what a specific person said or did in a previous conversation.
A Knowledge Base is a database that provides relevant, specific information for an LLM to retrieve at query time. Knowledge can include domain-specific or business-specific information, ensuring the details surfaced to models are accurate and relevant.
- `key`: the name used to reference the Knowledge Base
- `embedding_model`: formatted as `supplier/model_name`, for example `cohere/embed-english-v3.0`. Find embedding models in the AI Router by filtering for Model Type = Embedding.
- `path`: the Project and folder, formatted as `project/path`, for example `Default/Production`
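As a minimal sketch, creating a Knowledge Base over the API might look like the following. The `POST /v2/knowledge` path is an assumption inferred from the search endpoint shown later; verify the path and field names against the API reference.

```bash
# Hypothetical sketch: create a Knowledge Base via the API.
# Endpoint path and field names are assumptions; check the API reference.
curl --location 'https://api.orq.ai/v2/knowledge' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "key": "logistics-faq",
    "embedding_model": "cohere/embed-english-v3.0",
    "path": "Default/Production"
  }'
```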
A source represents a document loaded within the Knowledge Base. Documents are parsed and split into chunks that models search and retrieve at query time.
Save the `file_id` returned in the response, then create a datasource with the Create a datasource API. Required fields: `knowledge_id` and `display_name`; optionally pass `file_id` to pre-populate the datasource.
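A hedged sketch of the two-step flow (the endpoint paths are assumptions, and `knowledge_id` is supplied in the path here rather than the body; check the API reference):

```bash
# Hypothetical sketch: upload a file, then attach it as a datasource.
# Endpoint paths are assumptions; check the API reference.
curl --location 'https://api.orq.ai/v2/files' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --form 'file=@"./logistics-faq.docx"'

# Use the file_id returned by the upload above.
curl --location 'https://api.orq.ai/v2/knowledge/KNOWLEDGE_BASE_ID/datasources' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "display_name": "Logistics FAQ.docx",
    "file_id": "FILE_ID_FROM_UPLOAD"
  }'
```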
When using the AI Studio, you only have access to the following chunking strategies. For more options, see the API & SDK tab.
Default
Automatically sets chunking and preprocessing rules. Recommended if you're new to chunking.
Advanced
- **Maximum Chunk Length**: defines the maximum size of each chunk. Larger sizes mean more information per chunk.
- **Chunk Overlap**: defines the number of characters overlapping neighboring chunks. Higher values increase redundancy between chunks but improve the likelihood that relevant information is returned to models.
Use the sidebar to preview chunks using the chosen chunking strategy.
Use the Chunking API to prepare content for datasource ingestion before adding chunks manually. Common parameters:
- `text` (required): the text content to chunk
- `strategy` (required): `token`, `sentence`, `recursive`, `semantic`, or `agentic`
- `metadata` (optional, default: `true`): include metadata per chunk (`start_index`, `end_index`, `token_count`)
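As a hedged illustration, a Chunking API call with these parameters might look like this (the `/v2/chunking` path is an assumption; the parameters are the ones listed above):

```bash
# Hypothetical sketch: chunk raw text before manual ingestion.
# Endpoint path is an assumption; check the API reference.
curl --location 'https://api.orq.ai/v2/chunking' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "text": "Your document text goes here.",
    "strategy": "sentence",
    "metadata": true
  }'
```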
Chunk Settings and Strategies

Larger chunks hold more information but increase token use and generation cost.
Token
Splits text into chunks based on token count. Best for ensuring chunks fit within LLM context windows and maintaining consistent chunk sizes for embedding models.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `chunk_overlap` | Number of tokens to overlap between chunks | 0 |
Sentence
Splits text at sentence boundaries while respecting token limits. Ideal for maintaining semantic coherence and readability.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `chunk_overlap` | Number of overlapping tokens between chunks | 0 |
| `min_sentences_per_chunk` | Minimum number of sentences per chunk | 1 |
Recursive
Recursively splits text using a hierarchy of separators (paragraphs, sentences, words). Versatile general-purpose chunker that preserves document structure.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `separators` | Hierarchy of separators to use | `["\n\n", "\n", " ", ""]` |
| `min_characters_per_chunk` | Minimum characters allowed per chunk | 24 |
Semantic
Groups semantically similar sentences using embeddings. Excellent for maintaining topic coherence and context within chunks.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `embedding_model` | Embedding model for similarity (required) | - |
| `dimensions` | Number of dimensions for embedding output | - |
| `threshold` | Similarity threshold (0-1) or `"auto"` | `"auto"` |
| `mode` | Chunking mode: `"window"` or `"sentence"` | `"window"` |
| `similarity_window` | Window size for similarity comparison | 1 |
Agentic
AI-powered intelligent chunking that uses an LLM to determine optimal split points. Best for complex documents requiring intelligent segmentation.
| Parameter | Description | Default |
| --- | --- | --- |
| `model` | LLM model to use for chunking (required) | - |
| `chunk_size` | Maximum tokens per chunk | 1024 |
| `candidate_size` | Size of candidate splits for LLM evaluation | 128 |
| `min_characters_per_chunk` | Minimum characters allowed per chunk | 24 |
Fast
High-performance SIMD-optimized byte-level chunking. Best for large files (>1MB) where speed and memory efficiency are critical. 2x faster and 3x less memory than token-based chunking.
| Parameter | Description | Default |
| --- | --- | --- |
| `target_size` | Target chunk size in bytes | 4096 |
| `delimiters` | Single-byte delimiters to split on (e.g., `"\n.?!"`) | `"\n.?"` |
| `pattern` | Multi-byte pattern for splitting (e.g., `"▁"` for SentencePiece) | - |
| `prefix` | Attach delimiter to start of next chunk | false |
| `consecutive` | Split at start of consecutive delimiter runs | false |
| `forward_fallback` | Search forward if no delimiter found backward | false |
**When to use Fast:** large files (>1MB), high-throughput ingestion, memory-constrained environments.

**When NOT to use Fast:** when you need precise token counts for embedding models, small documents where speed isn't critical, or when semantic boundaries matter more than byte boundaries.
Each chunk in a Knowledge Base can carry a metadata object: a set of key-value pairs that describe the chunk's origin, topic, or any custom attribute relevant to your use case. Metadata lets you store all your content in a single Knowledge Base while still scoping retrieval to exactly the right subset of chunks at query time.

Common use cases:
- **Multi-tenant RAG**: tag chunks by `client_id` to isolate results per customer.
- **Source filtering**: filter by `filetype` or `source` to restrict results to PDFs, support tickets, or a specific data feed.
- **Topic scoping**: tag chunks by `topic` or `category` and filter queries to stay on a single subject.
AI Studio
API & SDK
Open a chunk from the datasource view to access the Edit Chunk panel. The panel has three sections:
Text: the chunk content.
Metadata: a JSON editor pre-filled with the current metadata, or {} if none has been set.
Enabled: toggle to enable or disable the chunk.
Edit the metadata JSON directly and save. The metadata object must be valid JSON with all values as strings, numbers, or booleans. Nested arrays or objects are not supported.
Pass an optional metadata object when creating chunks. Metadata values must be primitive types: strings, numbers, or booleans.
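For example, a valid metadata object might look like the sketch below. The field names are illustrative; note that every value is a primitive (string, number, or boolean), with no nested arrays or objects.

```json
{
  "client_id": "acme-corp",
  "filetype": "pdf",
  "topic": "billing",
  "page": 12,
  "reviewed": true
}
```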
Modify the data loaded within your sources to clean or anonymize it. Toggle on each cleanup option within the Data Cleanup panel.
| Option | Description |
| --- | --- |
| Delete emails | Removes email addresses from chunk content |
| Delete credit cards | Removes credit card numbers from chunk content |
| Delete phone numbers | Removes phone numbers from chunk content |
| Clean bullet points | Normalizes bullet point formatting |
| Clean numbered lists | Normalizes numbered list formatting |
| Clean unicode | Removes or normalizes non-standard unicode characters |
| Clean dashes | Removes or normalizes dash characters |
| Clean whitespaces | Removes excess whitespace from chunk content |
Pass `chunking_cleanup_options` inside `chunking_options` when creating a datasource to clean or anonymize source content before it is chunked and indexed. A sketch of the request fragment follows the table below.
| Option | Description |
| --- | --- |
| `delete_emails` | Removes email addresses from chunk content |
| `delete_credit_cards` | Removes credit card numbers from chunk content |
| `delete_phone_numbers` | Removes phone numbers from chunk content |
| `clean_bullet_points` | Normalizes bullet point formatting |
| `clean_numbered_list` | Normalizes numbered list formatting |
| `clean_unicode` | Removes or normalizes non-standard unicode characters |
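As an illustrative sketch, the relevant fragment of a create-datasource request body might look like this. The surrounding field names and structure are assumptions; the cleanup option names come from the table above.

```json
{
  "display_name": "Support tickets export",
  "file_id": "FILE_ID_FROM_UPLOAD",
  "chunking_options": {
    "strategy": "sentence",
    "chunk_size": 512,
    "chunking_cleanup_options": {
      "delete_emails": true,
      "delete_phone_numbers": true,
      "clean_unicode": true
    }
  }
}
```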
An embedding model is a machine learning tool that transforms complex, high-dimensional data into simpler, numerical values that machines can understand, enabling semantic search.

Configure which embedding model to use to query the Knowledge Base from the Knowledge Settings panel.
Agentic RAG incorporates AI agents into the RAG pipeline to orchestrate its components and perform additional actions beyond simple information retrieval, overcoming the limitations of a non-agentic pipeline.

Enable the Agentic RAG toggle in Knowledge Settings, then select a Model to use. The chosen model drives two actions:
Document Grading: ensures only relevant chunks are retrieved.
Query Refinement: rewrites the query if needed to improve retrieval quality.
Example: Query Refinement
For example, the input query *is my suitcase too big?* is reformulated to *luggage size requirements and restrictions for carry-on and checked baggage*.
Several search modes are available for finding information in Knowledge Bases:
Vector Search
Vector search is the fastest method of searching through a database built from your Knowledge Sources. The system embeds the user query and looks for the text segments whose vector representations are most similar to it. The search returns the preprocessed chunks from the sources most relevant to the user's query.
Keyword Search
Keyword Search retrieves relevant results by indexing the entire content and searching for segments containing the words from the user’s query.
Hybrid Search
Hybrid search uses both Vector and Keyword search, then combines results and returns the most relevant chunks to the model.
Search Settings
Chunk limit
Sets how many of the chunks most similar to the user's question are returned.
Threshold
Controls the relevance of results on a scale from 0 to 1. Results scoring below the threshold are excluded from retrieval. The closer the threshold is to 1, the more relevant and narrow the results will be.

Setting the threshold too high can yield few or no results.
Reranking invokes a model that analyzes your initial query and the results fetched by the Knowledge Base search. The model scores and ranks the chunks by similarity to the user query, ensuring the most relevant results are returned.
To use reranking, you must enable at least one Reranking model within the AI Router.
Once your Knowledge Base is populated, you can query it in several ways.
AI Studio
API & SDK
MCP
Test via the Studio
Test your Knowledge Base search directly in the AI Studio using the built-in search panel.
1. **Open Knowledge Settings**: navigate to your Knowledge Base and click Knowledge Settings.
2. **Enter your search query**: type your query in the Search query field in the right panel.
3. **View results**: results appear below showing:
   - Document name (e.g., “Logistics FAQ.docx”)
   - Relevance score for each chunk (e.g., 0.49, 0.48)
   - Chunk content preview
Experiment with different search modes and threshold values to find the optimal configuration for your use case. Lower thresholds return more results but may include less relevant chunks.
Integrate to a Deployment
Attach a Knowledge Base to a Deployment to automatically retrieve relevant chunks on every call.
1. Open the Deployment’s configuration and go to Knowledge Bases.
2. Select Knowledge Base and choose your Knowledge Base.
3. Set the query type:
   - **Last User Message**: the user’s latest message is used as the search query automatically.
   - **Query**: use a predefined query. You can make it dynamic with an input variable such as `{{query}}`.
Reference the retrieved chunks in your prompt with the {{knowledge_base_key}} syntax. If not explicitly referenced, the chunks are appended to the end of the system message.
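For example, a system message that places the retrieved chunks explicitly might look like this, assuming a Knowledge Base whose key is `product-docs` (a hypothetical key):

```text
You are a support assistant. Answer using only the context below.
If the context does not contain the answer, say so.

Context:
{{product-docs}}
```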
Add a Knowledge Base as context to an Agent. Unlike Deployments, the Agent only queries the Knowledge Base when it determines it is necessary, using the query_knowledge_base tool automatically.
In the Agent configuration, go to the Context section and click Add context.
Select your Knowledge Base.
In the Agent’s Instructions, explicitly tell it to use the Knowledge Base. For example:
“First use retrieve_knowledge_bases to see what knowledge sources are available, then use query_knowledge_base to find relevant information before answering.”
The Knowledge Base description must be explicit so the Agent can identify the right source to query.
To add a Knowledge Base in a Prompt, open the Knowledge Base tab in the Configuration screen and select Add a Knowledge Base.
Choose whether the Knowledge Base type is Last User Message or Query. This defines how the Knowledge Base will be queried. Use the `{{key}}` syntax in your prompt, where `key` is the key of your Knowledge Base.

- **Last User Message**: the user message is used as a query to retrieve the relevant chunks.
- **Query**: your predefined query is used to retrieve the relevant chunks. Within a Deployment context, make the query dynamic by using an input variable in the query field.
```bash
curl --location 'https://api.orq.ai/v2/knowledge/KNOWLEDGE_BASE_ID/search' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "query": "What are the benefits of machine learning?"
  }'
```
Filter by Metadata

Pass a `filter_by` object to restrict results to chunks whose metadata matches specified conditions. Filter operators are MongoDB-inspired, without the `$` prefix.
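A hedged example, assuming an `eq` equality operator in the usual MongoDB style (the exact operator names and filter shape may differ; confirm them in the API reference):

```bash
# Hypothetical sketch: restrict search results to one tenant's chunks.
curl --location 'https://api.orq.ai/v2/knowledge/KNOWLEDGE_BASE_ID/search' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "query": "What did the latest invoice include?",
    "filter_by": {
      "client_id": { "eq": "acme-corp" }
    }
  }'
```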
When using a Knowledge Base within Playground, Experiment, Deployment, or Agent, traces are generated containing details of how Knowledge Bases were accessed.
Traces
Logs
To find Traces, go to the Traces tab in the AI Studio.
Retrieval Spans show the following:
Query: the query used to retrieve relevant chunks.
Documents: the retrieved chunks, ordered by relevance score.
To find logs, go to the Logs tab within the module you’re using, then select a log entry to open the detail panel. The right side of the screen shows the Retrievals section, which details the Knowledge Base used and how it was queried.
Query: the query used to retrieve relevant chunks.
Documents: the retrieved chunks, ordered by relevance score.
User Message Augmentation

On the left side of the panel, you can see how the Knowledge Base variable is modified, with the retrieval results highlighted in blue. These highlighted parts are the retrieval results injected into the user message, which the model uses to respond to the user query.
Using the highlighted text, you can verify that the query is correct and that the expected chunks are loaded into the message.
Memory Stores provide persistent storage for agent memories, allowing agents to retain and retrieve information across conversations and sessions. Unlike Knowledge Bases, Memory Stores are entity-scoped: each Memory within a store is tied to a specific entity (a user, session, or any object you define), enabling personalized, per-entity recall.

Only long-term memory is currently supported: stored information persists indefinitely with no automatic expiration.

To use a Memory Store with an Agent, see Connect Memory Stores.
A Memory represents a specific entity within a Memory Store, identified by an entity_id. Each Memory holds Documents: the actual text content embedded for semantic search.
AI Studio
API & SDK
Create an Entity

Once a Memory Store is created, select Add Entity, enter an ID for the entity, and press Save.

View Memories

Select an entity to see all Memory Documents stored for it. Each document shows the date it was recorded. Use date filters to narrow results.

Add a Memory Document

Use Add Memory to manually add a Memory Document to an entity. Fill in the content and press Add Memory.
Memories are best managed dynamically through the API. See the API & SDK tab for programmatic access.
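As a rough sketch of what programmatic access might look like (the endpoint path and payload shape are assumptions; consult the API reference for the actual Memory API):

```bash
# Hypothetical sketch: add a memory document for an entity.
# Endpoint path and payload shape are assumptions; check the API reference.
curl --location 'https://api.orq.ai/v2/memory-stores/MEMORY_STORE_KEY/memories/user_123/documents' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "text": "Prefers email over phone for support follow-ups."
  }'
```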
- **Entity ID strategy**: use consistent, unique identifiers. Prefix by type (e.g., `user_123`, `session_456`) and keep IDs stable across your system.
- **Descriptions**: write exhaustive Memory Store descriptions. Agents use them to identify the correct store to query.
- **Organization**: create separate stores for different contexts (customers, products, sessions). Use descriptive keys.
- **Metadata**: use tags for filtering and categorization, not for storing large text content. Keep data types consistent per field.