Creating a Knowledge Base
Use the + button in a chosen Project and select Knowledge Base > Internal.
Press Create Knowledge, and the following modal will appear:

With the Knowledge Bases API, it is now possible to create Knowledge Bases with code. To learn more, see Creating a Knowledge Base Programmatically.
Adding a source
You are then taken to the source management page. A source represents a document loaded within your Knowledge Base; its content will be used when referencing and querying the Knowledge Base. Documents need to be loaded ahead of time so that they can be parsed and cut into chunks. Language models then use the loaded information as a source for answering user queries. To load a new source, select the Add Source button. You can add documents in any of the following formats: TXT, PDF, DOCX, CSV, XML.
Chunk Settings and Strategies
Chunks are portions of a source document loaded within a Knowledge Base. When adding a new source to a Knowledge Base, you can decide how the source's information will be chunked. Larger chunks hold more relevant information but use more tokens when sent to a model, increasing generation cost.
Token
Splits text into chunks based on token count. Best for ensuring chunks fit within LLM context windows and maintaining consistent chunk sizes for embedding models.
| Parameter | Description | Default |
|---|---|---|
| chunk_size | Maximum tokens per chunk | 512 |
| chunk_overlap | Number of tokens to overlap between chunks | 0 |
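As an illustration, token-based chunking with overlap can be sketched in a few lines of Python. This is a simplified model only: it uses whitespace splitting as a stand-in for a real tokenizer, with parameter names mirroring the table above.

```python
def chunk_by_tokens(text, chunk_size=512, chunk_overlap=0):
    """Split text into chunks of at most chunk_size tokens, repeating
    chunk_overlap tokens between neighboring chunks."""
    tokens = text.split()  # stand-in for a real tokenizer
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=2`, the text `"a b c d e f"` yields `["a b c d", "c d e f"]`: the last two tokens of each chunk reappear at the start of the next.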
Sentence
Splits text at sentence boundaries while respecting token limits. Ideal for maintaining semantic coherence and readability.
| Parameter | Description | Default |
|---|---|---|
| chunk_size | Maximum tokens per chunk | 512 |
| chunk_overlap | Number of overlapping tokens between chunks | 0 |
| min_sentences_per_chunk | Minimum number of sentences per chunk | 1 |
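A minimal sketch of the sentence strategy, assuming a simple regex sentence splitter and whitespace token counting as stand-ins for the real implementation:

```python
import re

def chunk_by_sentences(text, chunk_size=512, min_sentences_per_chunk=1):
    """Group whole sentences into chunks whose token count stays at or
    below chunk_size, keeping at least min_sentences_per_chunk
    sentences in each chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())  # stand-in token count
        if current and current_len + n > chunk_size and len(current) >= min_sentences_per_chunk:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks only break at sentence boundaries, each chunk stays readable on its own, at the cost of slightly less uniform chunk sizes than the Token strategy.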
Recursive
Recursively splits text using a hierarchy of separators (paragraphs, sentences, words). Versatile general-purpose chunker that preserves document structure.
| Parameter | Description | Default |
|---|---|---|
| chunk_size | Maximum tokens per chunk | 512 |
| separators | Hierarchy of separators to use | `["\n\n", "\n", " ", ""]` |
| min_characters_per_chunk | Minimum characters allowed per chunk | 24 |
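The separator hierarchy can be sketched as follows. This is a simplified model measured in characters rather than tokens; production recursive chunkers also merge small neighboring pieces back up toward chunk_size, which this sketch omits.

```python
def chunk_recursive(text, chunk_size=512, separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator first, recursing with finer
    separators on any piece still longer than chunk_size."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, finer = separators[0], separators[1:]
    parts = list(text) if sep == "" else text.split(sep)
    chunks = []
    for part in parts:
        if not part:
            continue  # drop empty strings from consecutive separators
        if len(part) <= chunk_size:
            chunks.append(part)
        else:
            chunks.extend(chunk_recursive(part, chunk_size, finer))
    return chunks
```

The hierarchy means paragraphs are preferred over lines, lines over words, and only oversized words are ever split mid-token, which is why this strategy tends to preserve document structure.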
Semantic
Groups semantically similar sentences using embeddings. Excellent for maintaining topic coherence and context within chunks.
| Parameter | Description | Default |
|---|---|---|
| chunk_size | Maximum tokens per chunk | 512 |
| embedding_model | Embedding model for similarity (required) | - |
| dimensions | Number of dimensions for embedding output | - |
| threshold | Similarity threshold (0-1) or "auto" | "auto" |
| mode | Chunking mode: "window" or "sentence" | "window" |
| similarity_window | Window size for similarity comparison | 1 |
Agentic
AI-powered intelligent chunking that uses an LLM to determine optimal split points. Best for complex documents requiring intelligent segmentation.
| Parameter | Description | Default |
|---|---|---|
| model | LLM model to use for chunking (required) | - |
| chunk_size | Maximum tokens per chunk | 1024 |
| candidate_size | Size of candidate splits for LLM evaluation | 128 |
| min_characters_per_chunk | Minimum characters allowed per chunk | 24 |
Fast
High-performance SIMD-optimized byte-level chunking. Best for large files (>1MB) where speed and memory efficiency are critical. 2x faster and 3x less memory than token-based chunking.
| Parameter | Description | Default |
|---|---|---|
| target_size | Target chunk size in bytes | 4096 |
| delimiters | Single-byte delimiters to split on (e.g., `"\n.?!"`) | `"\n.?"` |
| pattern | Multi-byte pattern for splitting (e.g., `"▁"` for SentencePiece) | - |
| prefix | Attach delimiter to start of next chunk | false |
| consecutive | Split at start of consecutive delimiter runs | false |
| forward_fallback | Search forward if no delimiter found backward | false |
When to use Fast: large files (>1MB), high-throughput ingestion, memory-constrained environments.

When NOT to use Fast: when you need precise token counts for embedding models, for small documents where speed isn't critical, or when semantic boundaries matter more than byte boundaries.
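The core idea, cutting near a byte budget and backtracking to the nearest delimiter, can be sketched in pure Python. This is only a model of the strategy: the real implementation is SIMD-optimized and also supports the pattern, prefix, consecutive, and forward_fallback options, which are omitted here.

```python
def chunk_fast(data: bytes, target_size=4096, delimiters=b"\n.?"):
    """Cut a byte stream near target_size, searching backward from the
    cut point for the nearest delimiter byte to cut after."""
    chunks, start = [], 0
    while len(data) - start > target_size:
        cut = start + target_size
        pos = cut
        while pos > start and data[pos - 1] not in delimiters:
            pos -= 1  # walk backward looking for a delimiter
        if pos == start:
            pos = cut  # no delimiter found: hard cut at target_size
        chunks.append(data[start:pos])
        start = pos
    chunks.append(data[start:])
    return chunks
```

Working on raw bytes with single-byte delimiters is what makes this approach fast: no tokenization or decoding happens at all, which is also why token counts per chunk are only approximate.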
| Use Case | Recommended Strategy |
|---|---|
| Large files (>1MB) | Fast - 2x faster, 3x less memory |
| RAG with precise tokens | Token or Recursive |
| Semantic search | Semantic |
| Complex document understanding | Agentic |
| General purpose | Recursive |
Default
Automatically sets chunking and preprocessing rules. Users unfamiliar with chunking are recommended to select this option.
Advanced
Maximum Chunk Length: defines the maximum size of each chunk. The larger the size, the more information each chunk contains.
Chunk Overlap: defines the number of characters shared between neighboring chunks. Higher values mean more redundant information between chunks, but also a higher likelihood that relevant information is sent back to the model.
Data Cleanup
You can choose to modify the data loaded within your sources; this is useful for cleaning chunks or anonymizing data. To activate a cleanup, simply toggle on the corresponding option within the Data Cleanup panel.
Summary and Cost Estimation
Once your document has been processed, the following summary will be displayed:
Retrieval Settings
You can configure these options on the Knowledge Settings page. Each option will yield different results, depending on your needs.
Search Methods
Vector Search
Vector search is the fastest method of searching through a database built from your Knowledge Sources. Our systems take the user query and look for the text segments whose vector representations are most similar to it. The search returns the preprocessed chunks from the sources most similar and relevant to the user's query.
Keyword Search
Keyword Search is a different method for retrieving relevant results within a Knowledge Base. In this method, the entire content is indexed, and the system searches for segments containing the words from the user’s query.
Hybrid Search
Hybrid search runs both of the previously mentioned Vector and Keyword searches, then combines the results and returns the most relevant chunks to the model.
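One common way to combine the two result sets is a weighted sum of normalized scores; the sketch below illustrates that idea. Note this is an assumption for illustration only: the exact fusion method used by the platform is not documented here.

```python
def hybrid_search(vector_scores, keyword_scores, alpha=0.5, limit=5):
    """Merge per-chunk scores from vector and keyword search with a
    weighted sum and return the top `limit` chunk ids, best first."""
    combined = {}
    for chunk_id, score in vector_scores.items():
        combined[chunk_id] = alpha * score
    for chunk_id, score in keyword_scores.items():
        combined[chunk_id] = combined.get(chunk_id, 0.0) + (1 - alpha) * score
    return sorted(combined, key=combined.get, reverse=True)[:limit]
```

A chunk that scores moderately on both methods can outrank a chunk that scores highly on only one, which is the main benefit of hybrid retrieval.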
Search Parameters
All previous search types can be configured with the following parameters:
Chunk limit
This parameter sets the maximum number of chunks returned, ranked by similarity to the user's query.
Threshold
This controls the relevance of the results on a scale from 0 to 1; results scoring below the threshold are excluded from retrieval. The closer to 1, the more relevant and narrow the results will be.
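Together, the two parameters act as a filter followed by a cutoff, which can be sketched as:

```python
def apply_retrieval_settings(results, threshold=0.5, chunk_limit=5):
    """Drop chunks scoring below the threshold, then keep only the top
    chunk_limit chunks by score."""
    kept = [r for r in results if r["score"] >= threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:chunk_limit]
```

Raising the threshold narrows results by relevance, while lowering the chunk limit caps how much retrieved text is injected into the prompt.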
Reranking
Reranking invokes a model that analyzes your initial query and the results fetched by the Knowledge Base search. The model scores the similarity of each returned chunk to the user query and ranks the chunks accordingly, ensuring the results are the most relevant for your query.
Knowledge Settings
By choosing the Knowledge Settings button, you can configure the following settings.
Embedding Models
Here, you can configure which LLM to use to query the Knowledge Base. The configuration is similar to any model configuration within Playground, Experiment, Deployment, and Agent, and includes the usual parameters.
Agentic RAG

- Document Grading, which ensures relevant chunks are retrieved.
- Query Refinement, improving the query if needed.
Example
See the screenshot below for how the input query gets refined. The input query "is my suitcase too big?" is reformulated to "luggage size requirements and restrictions for carry-on and checked baggage".

Connecting an External Knowledge Base
To connect to an external Knowledge Base, choose the + button on the desired Project.


| Field | Description | Example |
|---|---|---|
| Key | Unique identifier, alphanumeric with hyphens/underscores | external_kb |
| Description | A short description of the Knowledge Base | External Knowledge Base |
| Name | Display Name | External Knowledge Base Name |
| API URL | URL to search knowledge base, must be HTTPS | https://api.example.org/search |
| API Key | Authentication API key for the API URL above. Orq will use a Bearer Authentication header to call your API. | <API_KEY> |
orq.ai will include the API key in the `Authorization: Bearer <API_KEY>` header when calling your endpoint. API keys are encrypted using workspace-specific keys (AES-256-GCM).
API Payloads
Here are example payloads for the request and response expected from your API.
Request Payload
Response Payload
The API must respond like a standard Knowledge Base search. To learn more about the expected payload, see our Search API.
Example Implementation for an External API
We've created example implementations of the External Knowledge Base API.
Python Implementation
An Example Python Server for External Knowledge Base
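Before diving into the full example server, the core of such an endpoint can be sketched as a plain Python search function over an in-memory corpus. The `text` and `scores.search_score` field names here are assumptions based on the troubleshooting notes below; check the Search API reference for the authoritative payload shape, and note that the scoring (word overlap) is a stand-in for a real vector search.

```python
DOCUMENTS = [
    "Carry-on luggage must fit in the overhead bin.",
    "Checked baggage may weigh up to 23 kg.",
]

def search(query: str, top_k: int = 3):
    """Score each document by word overlap with the query, normalized
    to [0, 1], and return results shaped like a search response."""
    query_words = set(query.lower().split())
    results = []
    for doc in DOCUMENTS:
        overlap = query_words & set(doc.lower().split())
        score = len(overlap) / max(len(query_words), 1)
        results.append({"text": doc, "scores": {"search_score": score}})
    results.sort(key=lambda r: r["scores"]["search_score"], reverse=True)
    return results[:top_k]
```

In the real server, a function like this would sit behind an HTTPS endpoint that validates the `Authorization: Bearer <API_KEY>` header before answering.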
Get the Code
Clone the Python example Server
Install Dependencies
Run the Server
Test the API
The API is running at http://localhost:8000. Dynamic documentation will be running at http://localhost:8000/docs.
Node.js Implementation
An Example Node Server for External Knowledge Base
Get the Code
Clone the Node example Server
Install Dependencies
Run the Server
Test the API
The API is running at http://localhost:8000. Dynamic documentation will be running at http://localhost:8000/doc.
Integrating Vector Database Providers
We support providers like Weaviate and Pinecone, as both platforms expose REST APIs that conform to the expected payload format documented above.
Integration Examples
Common Errors and Troubleshooting
| Scenario | Error Message |
|---|---|
| HTTP instead of HTTPS | "External knowledge base URL must use HTTPS protocol" |
| Local/private IP | "External knowledge base URL cannot point to local network" |
| API unreachable | "Failed to verify external knowledge base connectivity" |
| API timeout (>50s) | “External API request timed out” |
- Verify your API endpoint is publicly accessible via HTTPS
- Check your API logs for incoming requests from orq.ai IP addresses
- Verify your firewall/security groups allow inbound HTTPS traffic
- Verify the API key is correct and has not expired
- Check that your API expects Bearer authentication in the `Authorization` header
- Confirm your API key has the necessary permissions to perform searches
- Verify your API returns the expected response format (see Response Payload above)
- Check that `scores.search_score` values are between 0 and 1
- Test with different `threshold` values (lower threshold = more results)
- If using reranking, ensure both `search_score` and `rerank_score` are provided
- Verify your external vector database has sufficient indexed documents
- Monitor your external API response times
- Consider implementing caching for frequently searched queries
- Optimize your vector database indexes
- Check if your external API is rate limiting requests
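Several items in the checklist above concern malformed response payloads; a small self-check like the following can catch them before connecting. This is a sketch under the assumption (drawn from the checklist) that the response carries a list of results with `scores.search_score` fields; the Search API reference is authoritative.

```python
def validate_search_results(results):
    """Return a list of problems found in a search response: missing
    scores.search_score fields or scores outside [0, 1]."""
    problems = []
    for i, item in enumerate(results):
        score = item.get("scores", {}).get("search_score")
        if score is None:
            problems.append(f"result {i}: missing scores.search_score")
        elif not 0 <= score <= 1:
            problems.append(f"result {i}: search_score {score} outside [0, 1]")
    return problems
```

Running a check like this against your endpoint's output makes the "scores between 0 and 1" and "expected response format" items easy to verify mechanically.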
Configuring your External Knowledge Base
Datasource configuration is not accessible for External Knowledge Bases, as the data is hosted outside of Orq.ai. The following settings remain configurable:
- Agentic RAG, to handle knowledge base search refinement with an agent, through orq.ai’s system.
- Search retrieval parameters, Chunk Limit, Search Threshold
- Reranking Models
Once your External Knowledge Base is connected:
- Use it just like any other Knowledge Base, to learn more see Using a Knowledge Base in a Prompt.
- Your Knowledge Base can also be used with Agents; see Using Knowledge Bases with Agents.
- Your API will be called at runtime when the model needs to perform a search.
Retrieval Logs
When using a Knowledge Base in a Prompt within Playground, Experiment, Deployment, or Agent, logs are generated that transparently contain details of how Knowledge Bases were accessed. To find them, head to the Logs tab within the module you're in, then select a log to open its detail panel. The following panel will open.
User Message Augmentation
On the left side of the panel, you can see how the Knowledge Base variable is modified with the retrieval results, shown in blue. The blue parts of the messages are the retrieval results injected into the user message; they are then used as the data with which the model responds to the user query. Using the blue text, you can verify that the query is correct and that the expected chunks are loaded into the message.