All three solutions store information that an agent can retrieve, but they serve different purposes depending on where the data lives and how it changes.

- **Knowledge Bases** index documents you upload into Orq.ai. The platform handles embeddings, chunking, and retrieval. Use them when you have documents to ingest and want a fully managed RAG pipeline.
- **External Knowledge Bases** connect to a vector database you already operate. Orq.ai calls your API at query time and passes the results to the model. Use them when your data cannot leave your infrastructure, or when you already have an embedding pipeline.
- **Memory Stores** store arbitrary text per entity, such as a user or session. Documents accumulate over time and are retrieved semantically on each interaction. Use them when your agent needs to remember what a specific person said or did in a previous conversation.
A Knowledge Base is a database that provides relevant, specific information for an LLM to retrieve at query time. Knowledge can include domain-specific or business-specific information, ensuring the details surfaced to models are accurate and relevant.
- `key`: the name used to reference the Knowledge Base
- `embedding_model`: formatted as `supplier/model_name`, for example `cohere/embed-english-v3.0`. Find embedding models in the AI Router by filtering for Model Type = Embedding.
- `path`: the Project and folder, formatted as `project/path`, for example `Default/Production`
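As a minimal sketch, creating a Knowledge Base over the API might look like the following. The `POST /v2/knowledge` path is an assumption inferred from the search endpoint shown later; verify the path and field names against the API reference.

```bash
# Hypothetical sketch: create a Knowledge Base via the API.
# Endpoint path and field names are assumptions; check the API reference.
curl --location 'https://api.orq.ai/v2/knowledge' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "key": "logistics-faq",
    "embedding_model": "cohere/embed-english-v3.0",
    "path": "Default/Production"
  }'
```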
A source represents a document loaded within the Knowledge Base. Documents are parsed and split into chunks that models search and retrieve at query time.
Save the `file_id` returned in the response, then create a datasource with the Create a datasource API. Required fields: `knowledge_id` and `display_name`; optionally pass `file_id` to pre-populate the datasource.
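A hedged sketch of the two-step flow (the endpoint paths are assumptions, and `knowledge_id` is supplied in the path here rather than the body; check the API reference):

```bash
# Hypothetical sketch: upload a file, then attach it as a datasource.
# Endpoint paths are assumptions; check the API reference.
curl --location 'https://api.orq.ai/v2/files' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --form 'file=@"./logistics-faq.docx"'

# Use the file_id returned by the upload above.
curl --location 'https://api.orq.ai/v2/knowledge/KNOWLEDGE_BASE_ID/datasources' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "display_name": "Logistics FAQ.docx",
    "file_id": "FILE_ID_FROM_UPLOAD"
  }'
```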
When using the AI Studio, you only have access to the following chunking strategies. For more options, see the API & SDK tab.
Default
Automatically sets chunking and preprocessing rules. Recommended if you're new to chunking.
Advanced
- **Maximum Chunk Length**: defines the maximum size of each chunk. Larger sizes mean more information per chunk.
- **Chunk Overlap**: defines the number of characters overlapping neighboring chunks. Higher values increase redundancy between chunks but improve the likelihood that relevant information is returned to models.
Use the sidebar to preview chunks using the chosen chunking strategy.
Use the Chunking API to prepare content for datasource ingestion before adding chunks manually. Common parameters:
- `text` (required): the text content to chunk
- `strategy` (required): `token`, `sentence`, `recursive`, `semantic`, or `agentic`
- `metadata` (optional, default: `true`): include metadata per chunk (`start_index`, `end_index`, `token_count`)
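As a hedged illustration, a Chunking API call with these parameters might look like this (the `/v2/chunking` path is an assumption; the parameters are the ones listed above):

```bash
# Hypothetical sketch: chunk raw text before manual ingestion.
# Endpoint path is an assumption; check the API reference.
curl --location 'https://api.orq.ai/v2/chunking' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "text": "Your document text goes here.",
    "strategy": "sentence",
    "metadata": true
  }'
```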
Chunk Settings and Strategies

Larger chunks hold more information but increase token use and generation cost.
Token
Splits text into chunks based on token count. Best for ensuring chunks fit within LLM context windows and maintaining consistent chunk sizes for embedding models.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `chunk_overlap` | Number of tokens to overlap between chunks | 0 |
Sentence
Splits text at sentence boundaries while respecting token limits. Ideal for maintaining semantic coherence and readability.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `chunk_overlap` | Number of overlapping tokens between chunks | 0 |
| `min_sentences_per_chunk` | Minimum number of sentences per chunk | 1 |
Recursive
Recursively splits text using a hierarchy of separators (paragraphs, sentences, words). Versatile general-purpose chunker that preserves document structure.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `separators` | Hierarchy of separators to use | `["\n\n", "\n", " ", ""]` |
| `min_characters_per_chunk` | Minimum characters allowed per chunk | 24 |
Semantic
Groups semantically similar sentences using embeddings. Excellent for maintaining topic coherence and context within chunks.
| Parameter | Description | Default |
| --- | --- | --- |
| `chunk_size` | Maximum tokens per chunk | 512 |
| `embedding_model` | Embedding model for similarity (required) | - |
| `dimensions` | Number of dimensions for embedding output | - |
| `threshold` | Similarity threshold (0-1) or `"auto"` | `"auto"` |
| `mode` | Chunking mode: `"window"` or `"sentence"` | `"window"` |
| `similarity_window` | Window size for similarity comparison | 1 |
Agentic
AI-powered intelligent chunking that uses an LLM to determine optimal split points. Best for complex documents requiring intelligent segmentation.
| Parameter | Description | Default |
| --- | --- | --- |
| `model` | LLM model to use for chunking (required) | - |
| `chunk_size` | Maximum tokens per chunk | 1024 |
| `candidate_size` | Size of candidate splits for LLM evaluation | 128 |
| `min_characters_per_chunk` | Minimum characters allowed per chunk | 24 |
Fast
High-performance SIMD-optimized byte-level chunking. Best for large files (>1MB) where speed and memory efficiency are critical. 2x faster and 3x less memory than token-based chunking.
| Parameter | Description | Default |
| --- | --- | --- |
| `target_size` | Target chunk size in bytes | 4096 |
| `delimiters` | Single-byte delimiters to split on (e.g., `"\n.?!"`) | `"\n.?"` |
| `pattern` | Multi-byte pattern for splitting (e.g., `"▁"` for SentencePiece) | - |
| `prefix` | Attach delimiter to start of next chunk | false |
| `consecutive` | Split at start of consecutive delimiter runs | false |
| `forward_fallback` | Search forward if no delimiter found backward | false |
**When to use Fast:** large files (>1MB), high-throughput ingestion, memory-constrained environments.

**When NOT to use Fast:** when you need precise token counts for embedding models, small documents where speed isn't critical, or when semantic boundaries matter more than byte boundaries.
Each chunk in a Knowledge Base can carry a metadata object: a set of key-value pairs that describe the chunk's origin, topic, or any custom attribute relevant to your use case. Metadata lets you store all your content in a single Knowledge Base while still scoping retrieval to exactly the right subset of chunks at query time.

Common use cases:
- **Multi-tenant RAG**: tag chunks by `client_id` to isolate results per customer.
- **Source filtering**: filter by `filetype` or `source` to restrict results to PDFs, support tickets, or a specific data feed.
- **Topic scoping**: tag chunks by `topic` or `category` and filter queries to stay on a single subject.
AI Studio
API & SDK
Open a chunk from the datasource view to access the Edit Chunk panel. The panel has three sections:
Text: the chunk content.
Metadata: a JSON editor pre-filled with the current metadata, or {} if none has been set.
Enabled: toggle to enable or disable the chunk.
Edit the metadata JSON directly and save. The metadata object must be valid JSON with all values as strings, numbers, or booleans. Nested arrays or objects are not supported.
Pass an optional metadata object when creating chunks. Metadata values must be primitive types: strings, numbers, or booleans.
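For example, a valid metadata object might look like the sketch below. The field names are illustrative; note that every value is a primitive (string, number, or boolean), with no nested arrays or objects.

```json
{
  "client_id": "acme-corp",
  "filetype": "pdf",
  "topic": "billing",
  "page": 12,
  "reviewed": true
}
```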
Modify the data loaded within your sources to clean or anonymize it. Toggle on each cleanup option within the Data Cleanup panel.
| Option | Description |
| --- | --- |
| Delete emails | Removes email addresses from chunk content |
| Delete credit cards | Removes credit card numbers from chunk content |
| Delete phone numbers | Removes phone numbers from chunk content |
| Clean bullet points | Normalizes bullet point formatting |
| Clean numbered lists | Normalizes numbered list formatting |
| Clean unicode | Removes or normalizes non-standard unicode characters |
| Clean dashes | Removes or normalizes dash characters |
| Clean whitespaces | Removes excess whitespace from chunk content |
Pass `chunking_cleanup_options` inside `chunking_options` when creating a datasource to clean or anonymize source content before it is chunked and indexed. A sketch of the request fragment follows the table below.
| Option | Description |
| --- | --- |
| `delete_emails` | Removes email addresses from chunk content |
| `delete_credit_cards` | Removes credit card numbers from chunk content |
| `delete_phone_numbers` | Removes phone numbers from chunk content |
| `clean_bullet_points` | Normalizes bullet point formatting |
| `clean_numbered_list` | Normalizes numbered list formatting |
| `clean_unicode` | Removes or normalizes non-standard unicode characters |
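As an illustrative sketch, the relevant fragment of a create-datasource request body might look like this. The surrounding field names and structure are assumptions; the cleanup option names come from the table above.

```json
{
  "display_name": "Support tickets export",
  "file_id": "FILE_ID_FROM_UPLOAD",
  "chunking_options": {
    "strategy": "sentence",
    "chunk_size": 512,
    "chunking_cleanup_options": {
      "delete_emails": true,
      "delete_phone_numbers": true,
      "clean_unicode": true
    }
  }
}
```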
An embedding model is a machine learning tool that transforms complex, high-dimensional data into simpler, numerical values that machines can understand, enabling semantic search.

Configure which embedding model to use to query the Knowledge Base from the Knowledge Settings panel.
Agentic RAG incorporates AI agents into the RAG pipeline to orchestrate its components and perform additional actions beyond simple information retrieval, overcoming the limitations of a non-agentic pipeline.

Enable the Agentic RAG toggle in Knowledge Settings, then select a Model to use. The chosen model drives two actions:
Document Grading: ensures only relevant chunks are retrieved.
Query Refinement: rewrites the query if needed to improve retrieval quality.
Example: Query Refinement
For example, the input query *is my suitcase too big?* is reformulated to *luggage size requirements and restrictions for carry-on and checked baggage*.
Several search modes are available for finding information in Knowledge Bases:
Vector Search
Vector search is the fastest method of searching through a database built from your Knowledge Sources. The system embeds the user query and looks for the text segments whose vector representations are most similar to it. The search returns the preprocessed chunks from the sources most relevant to the user's query.
Keyword Search
Keyword Search retrieves relevant results by indexing the entire content and searching for segments containing the words from the user’s query.
Hybrid Search
Hybrid search uses both Vector and Keyword search, then combines results and returns the most relevant chunks to the model.
Search Settings
Chunk limit
Sets how many of the chunks most similar to the user's question are returned.
Threshold
Controls the relevance of results on a scale from 0 to 1. Results scoring below the threshold are excluded from retrieval. The closer the threshold is to 1, the more relevant and narrow the results will be.

Setting the threshold too high can yield few or no results.
Reranking invokes a model that analyzes your initial query and the results fetched by the Knowledge Base search. The model scores and ranks the chunks by similarity to the user query, ensuring the most relevant results are returned.
To use reranking, you must enable at least one Reranking model within the AI Router.
Once your Knowledge Base is populated, you can query it in several ways.
AI Studio
API & SDK
MCP
Test via the Studio
Test your Knowledge Base search directly in the AI Studio using the built-in search panel.
1. **Open Knowledge Settings**: navigate to your Knowledge Base and click Knowledge Settings.
2. **Enter your search query**: type your query in the Search query field in the right panel.
3. **View results**: results appear below showing:
   - Document name (e.g., “Logistics FAQ.docx”)
   - Relevance score for each chunk (e.g., 0.49, 0.48)
   - Chunk content preview
Experiment with different search modes and threshold values to find the optimal configuration for your use case. Lower thresholds return more results but may include less relevant chunks.
Integrate to a Deployment
Attach a Knowledge Base to a Deployment to automatically retrieve relevant chunks on every call.
1. Open the Deployment’s configuration and go to Knowledge Bases.
2. Select Knowledge Base and choose your Knowledge Base.
3. Set the query type:
   - **Last User Message**: the user’s latest message is used as the search query automatically.
   - **Query**: use a predefined query. You can make it dynamic with an input variable such as `{{query}}`.
Reference the retrieved chunks in your prompt with the {{knowledge_base_key}} syntax. If not explicitly referenced, the chunks are appended to the end of the system message.
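For example, a system message that places the retrieved chunks explicitly might look like this, assuming a Knowledge Base whose key is `product-docs` (a hypothetical key):

```text
You are a support assistant. Answer using only the context below.
If the context does not contain the answer, say so.

Context:
{{product-docs}}
```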
Add a Knowledge Base as context to an Agent. Unlike Deployments, the Agent only queries the Knowledge Base when it determines it is necessary, using the query_knowledge_base tool automatically.
In the Agent configuration, go to the Context section and click Add context.
Select your Knowledge Base.
In the Agent’s Instructions, explicitly tell it to use the Knowledge Base. For example:
“First use retrieve_knowledge_bases to see what knowledge sources are available, then use query_knowledge_base to find relevant information before answering.”
The Knowledge Base description must be explicit so the Agent can identify the right source to query.
To add a Knowledge Base in a Prompt, open the Knowledge Base tab in the Configuration screen and select Add a Knowledge Base.
Choose whether the Knowledge Base type is Last User Message or Query. This defines how the Knowledge Base will be queried. Use the `{{key}}` syntax in your prompt, where `key` is the key of your Knowledge Base.

- **Last User Message**: the user message is used as a query to retrieve the relevant chunks.
- **Query**: your predefined query is used to retrieve the relevant chunks. Within a Deployment context, make the query dynamic by using an input variable in the query field.
```bash
curl --location 'https://api.orq.ai/v2/knowledge/KNOWLEDGE_BASE_ID/search' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "query": "What are the benefits of machine learning?"
  }'
```
Filter by Metadata

Pass a `filter_by` object to restrict results to chunks whose metadata matches specified conditions. Filter operators are MongoDB-inspired, without the `$` prefix.
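A hedged example, assuming an `eq` equality operator in the usual MongoDB style (the exact operator names and filter shape may differ; confirm them in the API reference):

```bash
# Hypothetical sketch: restrict search results to one tenant's chunks.
curl --location 'https://api.orq.ai/v2/knowledge/KNOWLEDGE_BASE_ID/search' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "query": "What did the latest invoice include?",
    "filter_by": {
      "client_id": { "eq": "acme-corp" }
    }
  }'
```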
When using a Knowledge Base within Playground, Experiment, Deployment, or Agent, traces are generated containing details of how Knowledge Bases were accessed.
Traces
Logs
To find Traces, go to the Traces tab in the AI Studio.
Retrieval Spans show the following:
Query: the query used to retrieve relevant chunks.
Documents: the retrieved chunks, ordered by relevance score.
To find logs, go to the Logs tab within the module you’re using, then select a log entry to open the detail panel. The right side of the screen shows the Retrievals section, which details the Knowledge Base used and how it was queried.
Query: the query used to retrieve relevant chunks.
Documents: the retrieved chunks, ordered by relevance score.
User Message Augmentation

On the left side of the panel, you can see how the Knowledge Base variable is modified, with the retrieval results highlighted in blue. These highlighted parts are the retrieval results injected into the user message, which the model uses to respond to the user query.
Using the highlighted text, you can verify that the query is correct and that the expected chunks are loaded into the message.
Memory Stores provide persistent storage for agent memories, allowing agents to retain and retrieve information across conversations and sessions. Unlike Knowledge Bases, Memory Stores are entity-scoped: each Memory within a store is tied to a specific entity (a user, session, or any object you define), enabling personalized, per-entity recall.

Only long-term memory is currently supported: stored information persists indefinitely with no automatic expiration.

To use a Memory Store with an Agent, see Connect Memory Stores.
A Memory represents a specific entity within a Memory Store, identified by an entity_id. Each Memory holds Documents: the actual text content embedded for semantic search.
AI Studio
API & SDK
Create an Entity

Once a Memory Store is created, select Add Entity, enter an ID for the entity, and press Save.

View Memories

Select an entity to see all Memory Documents stored for it. Each document shows the date it was recorded. Use date filters to narrow results.

Add a Memory Document

Use Add Memory to manually add a Memory Document to an entity. Fill in the content and press Add Memory.
Memories are best managed dynamically through the API. See the API & SDK tab for programmatic access.
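As a rough sketch of what programmatic access might look like (the endpoint path and payload shape are assumptions; consult the API reference for the actual Memory API):

```bash
# Hypothetical sketch: add a memory document for an entity.
# Endpoint path and payload shape are assumptions; check the API reference.
curl --location 'https://api.orq.ai/v2/memory-stores/MEMORY_STORE_KEY/memories/user_123/documents' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $ORQ_API_KEY" \
  --data '{
    "text": "Prefers email over phone for support follow-ups."
  }'
```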
- **Entity ID strategy**: use consistent, unique identifiers. Prefix by type (e.g., `user_123`, `session_456`) and keep IDs stable across your system.
- **Descriptions**: write exhaustive Memory Store descriptions. Agents use them to identify the correct store to query.
- **Organization**: create separate stores for different contexts (customers, products, sessions). Use descriptive keys.
- **Metadata**: use tags for filtering and categorization, not for storing large text content. Keep data types consistent per field.