curl --request POST \
--url https://api.orq.ai/v2/chunking \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '
{
"text": "The quick brown fox jumps over the lazy dog. This is a sample text that will be chunked into smaller pieces. Each chunk will maintain context while respecting the maximum chunk size.",
"strategy": "semantic",
"chunk_size": 256,
"threshold": 0.8,
"embedding_model": "openai/text-embedding-3-small",
"dimensions": 512,
"mode": "window",
"similarity_window": 1,
"metadata": true
}
'

Example response:

{
"chunks": [
{
"id": "01HQ3K4M5N6P7Q8R9SATBVCWDX",
"text": "The quick brown fox jumps over the lazy dog.",
"index": 0,
"metadata": {
"start_index": 0,
"end_index": 44,
"token_count": 10
}
},
{
"id": "01HQ3K4M5N6P7Q8R9SATBVCWDY",
"text": "This is a sample text that will be chunked into smaller pieces.",
"index": 1,
"metadata": {
"start_index": 45,
"end_index": 108,
"token_count": 12
}
}
]
}

Split large text documents into smaller, manageable chunks using different chunking strategies optimized for RAG (Retrieval-Augmented Generation) workflows. This endpoint supports multiple chunking algorithms, including token-based, sentence-based, recursive, semantic, and specialized strategies.
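The example request uses the semantic strategy with a threshold of 0.8 and a similarity_window of 1. The endpoint's internals are not documented here, but the general idea behind semantic chunking can be sketched as: embed each sentence, compare it to its neighbors within the window, and start a new chunk whenever similarity falls below the threshold. The sketch below is illustrative only; `embed` is a toy stand-in (not openai/text-embedding-3-small), and `semantic_chunks` is a hypothetical name, not part of the API.

```python
import hashlib
import math
import re

def embed(sentence: str, dims: int = 256) -> list[float]:
    # Toy stand-in embedding: hash each word into a fixed-size count vector.
    vec = [0.0] * dims
    for word in re.findall(r"\w+", sentence.lower()):
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dims] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_chunks(text: str, threshold: float = 0.8,
                    similarity_window: int = 1) -> list[str]:
    # Split into sentences, embed each, and cut a new chunk wherever the best
    # similarity to the preceding `similarity_window` sentences drops below
    # `threshold`.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        window = vectors[max(0, i - similarity_window):i]
        best = max(cosine(vectors[i], v) for v in window)
        if best < threshold:
            chunks.append(" ".join(current))
            current = [sentences[i]]
        else:
            current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

With a higher threshold, more boundaries are cut and chunks get smaller; a wider similarity_window lets a sentence attach to a chunk if it resembles any recent sentence, not just the immediately preceding one.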
Authorizations

Authorization (header): Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
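As a sketch, the same request (Bearer header included) can be composed with Python's standard library. Field values are copied from the curl example above; `build_request` is a hypothetical helper, not part of an official SDK.

```python
import json
import urllib.request

API_URL = "https://api.orq.ai/v2/chunking"

def build_request(token: str, payload: dict) -> urllib.request.Request:
    # Compose the POST request with the Bearer token; nothing is sent yet.
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

payload = {
    "text": "The quick brown fox jumps over the lazy dog.",
    "strategy": "semantic",
    "chunk_size": 256,
    "threshold": 0.8,
    "embedding_model": "openai/text-embedding-3-small",
    "metadata": True,
}

# To actually send it (requires a valid token):
# with urllib.request.urlopen(build_request("<token>", payload)) as resp:
#     chunks = json.load(resp)["chunks"]
```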
Body

strategy: The chunking strategy. The token strategy splits text based on token count; best for ensuring chunks fit within LLM context windows and maintaining consistent chunk sizes for embedding models.

text: The text content to be chunked.

metadata: Whether to include metadata for each chunk.

Return format: chunks (with metadata) or texts (plain strings). Available options: chunks, texts.

chunk_size: Maximum tokens per chunk.

Chunk overlap: Number of tokens to overlap between chunks. Required range: x >= 0.

Response 200: Text successfully chunked.
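The interaction between the maximum tokens per chunk and the overlap can be sketched with whitespace-separated words standing in for model tokens (the real endpoint counts tokenizer tokens; `token_chunks` below is a hypothetical illustration, not the service's implementation):

```python
def token_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    # Slide a window of `chunk_size` tokens, stepping by chunk_size - overlap
    # so that consecutive chunks share `overlap` tokens.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# With chunk_size=4 and overlap=1, each chunk starts on the last
# token of the previous one:
print(token_chunks("one two three four five six seven", 4, 1))
# → ['one two three four', 'four five six seven']
```

The overlap preserves context across chunk boundaries, at the cost of some duplicated tokens; an overlap equal to or larger than the chunk size would never advance, which is why the range constraint matters.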