
Objective

A Simple RAG (Retrieval-Augmented Generation) system provides intelligent information retrieval and answer generation by combining your own knowledge base with large language models. This architecture enables applications to provide accurate, contextual responses based on your specific documents and data while maintaining the natural language capabilities of modern LLMs.

Use Case

Simple RAG is ideal for applications that need:
  • Document-Based Q&A: Answer questions based on company documents, manuals, or knowledge repositories.
  • Internal Knowledge Search: Help employees find information from internal wikis, policies, or procedures.
  • Customer Support: Provide accurate answers based on product documentation and support materials.
  • Domain-Specific Information: Reduce hallucinations by grounding responses in verified company data.
  • Contextual Responses: Generate answers that reference specific sources and maintain accuracy.

Prerequisite

Before configuring a Simple RAG, ensure you have:
  • Orq.ai Account: Active workspace in the AI Studio.
  • API Access: Valid API key from Workspace Settings > API Keys.
  • Model Access: At least one text generation model enabled in the AI Gateway, such as gpt-5.4, claude-sonnet-4-6, or gpt-5.4-mini.
  • Embedding Model: At least one embedding model enabled for knowledge base functionality, such as text-embedding-ada-002 or text-embedding-3-small.
  • Source Documents: PDF, TXT, DOCX, CSV, or XML files containing your knowledge base content (max 10MB per file).
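The file requirements in the last bullet can be checked locally before anything is uploaded. The sketch below is a hypothetical pre-flight helper, not part of the Orq.ai SDK: the function names are invented for illustration, and it assumes "10MB" means 10 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Extensions taken from the prerequisites list (PDF, TXT, DOCX, CSV, XML).
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".docx", ".csv", ".xml"}
# Assumption: the 10MB limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 10 * 1024 * 1024


def check_source_file(name: str, size_bytes: int) -> list[str]:
    """Return the problems found for one file; an empty list means it passes."""
    problems = []
    extension = Path(name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_FILE_BYTES}-byte limit")
    return problems


def check_source_dir(directory: str) -> dict[str, list[str]]:
    """Check every regular file in a directory; map file name to its problems."""
    report = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            issues = check_source_file(path.name, path.stat().st_size)
            if issues:
                report[path.name] = issues
    return report
```

Calling check_source_dir on the folder that holds your documents returns an empty dictionary when every file meets both requirements.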

Set up SDK

Choose a programming language and install the corresponding SDK:
pip install orq-ai-sdk
Initialize the SDK as follows:
import os
from orq_ai_sdk import Orq

client = Orq(
    api_key=os.environ.get("ORQ_API_KEY", "__API_KEY__"),
)

Creating a Knowledge Base

Begin by creating a knowledge base. The embedding_model uses the provider/model format, and the key is what the deployment prompt references later (use companyDocs to follow this guide).
curl --request POST \
     --url https://api.orq.ai/v2/knowledge \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <API_KEY>' \
     --header 'content-type: application/json' \
     --data '
{
  "key": "companyDocs",
  "embedding_model": "openai/text-embedding-3-small",
  "path": "Default",
  "description": "Customer service documentation"
}
'
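
The same request can also be issued from Python without extra dependencies. The following is a minimal sketch using only the standard library; it mirrors the URL, headers, and body of the curl command, while the helper names build_create_knowledge_request and send are hypothetical and not part of the Orq.ai SDK.

```python
import json
import urllib.request

API_BASE = "https://api.orq.ai/v2"


def build_create_knowledge_request(api_key, key, embedding_model, path, description):
    """Build (but do not send) the POST request that creates a knowledge base."""
    payload = {
        "key": key,
        "embedding_model": embedding_model,  # provider/model format
        "path": path,
        "description": description,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/knowledge",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
    )


def send(request):
    """Send a prepared request and decode the JSON response.

    Needs network access and a valid API key, so it is not called here.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_create_knowledge_request(
    api_key="<API_KEY>",
    key="companyDocs",
    embedding_model="openai/text-embedding-3-small",
    path="Default",
    description="Customer service documentation",
)
```

Building the request separately from sending it makes the payload easy to inspect; passing request to send performs the actual call.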
Save the knowledge_id from the response. The datasource step below needs it.

Create a Datasource

A datasource is the container the chunks live in inside the knowledge base. Because chunks are supplied directly in the next step rather than uploaded as a file, create the datasource empty: give it a display_name and leave file_id out.

Passing a file_id here would tell Orq.ai to chunk that file automatically. That is the file-based flow. This guide takes control of chunking instead, so the datasource starts empty.
curl --request POST \
     --url https://api.orq.ai/v2/knowledge/<knowledge_id>/datasources \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <API_KEY>' \
     --header 'content-type: application/json' \
     --data '
{
  "display_name": "customer_service_guide.txt"
}
'
Save the datasource ID from the response. The chunk upload step needs it.

Chunk the Text and Add It to the Datasource

This is the core of the pattern, in two parts: chunk the text with the Chunking API, then add the returned chunks to the datasource.

Chunking is the single biggest lever on retrieval quality. Calling the Chunking API yourself puts the strategy and chunk size under your control instead of relying on a default.

The example below uses the token strategy, which splits purely on token count so every chunk is a predictable size, with chunk_overlap carrying a little context across boundaries. The Chunking API also supports sentence, recursive, semantic, agentic, and fast strategies. For the full list of strategies and parameters, see the Chunking API reference.
# 1. Chunk the text.
curl --request POST \
     --url https://api.orq.ai/v2/chunking \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <API_KEY>' \
     --header 'content-type: application/json' \
     --data '
{
  "text": "<your document text>",
  "strategy": "token",
  "chunk_size": 50,
  "chunk_overlap": 20
}
'

# 2. Add the returned chunks to the datasource.
#    Pipe the step 1 response through jq to build the request body,
#    then pass it with --data @chunks.json:
#    curl ... (step 1) | jq '[.chunks[] | {text: .text}]' > chunks.json
curl --request POST \
     --url https://api.orq.ai/v2/knowledge/<knowledge_id>/datasources/<datasource_id>/chunks \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <API_KEY>' \
     --header 'content-type: application/json' \
     --data @chunks.json
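
To make the effect of chunk_size and chunk_overlap concrete, here is a simplified local illustration of fixed-size chunking with overlap. It is a sketch, not the Chunking API's implementation: whitespace-separated words stand in for model tokens, and the function names are hypothetical. The last helper mirrors the jq filter from step 2, assuming the response shape that filter implies (a chunks list whose items have a text field).

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def chunk_text(text, chunk_size, chunk_overlap):
    """Chunk text using whitespace-separated words as a stand-in for tokens."""
    words = text.split()
    return [" ".join(window) for window in chunk_tokens(words, chunk_size, chunk_overlap)]


def to_chunk_payload(chunking_response):
    """Python equivalent of the jq filter: [.chunks[] | {text: .text}]."""
    return [{"text": chunk["text"]} for chunk in chunking_response["chunks"]]


# 110 placeholder words with chunk_size=50 and chunk_overlap=20:
# windows start at words 0, 30, and 60, so each shares 20 words with the previous one.
sample = " ".join(f"w{i}" for i in range(110))
sample_chunks = chunk_text(sample, chunk_size=50, chunk_overlap=20)
```

With these settings each window advances by 30 words, which is why a larger overlap produces more chunks for the same text.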

Search the Knowledge Base

Once the chunks are added, search the knowledge base directly to retrieve the most relevant chunks for a query. Orq.ai embeds the query, finds the most similar chunks, and ranks them by similarity. top_k controls how many chunks are returned, and each match includes the chunk text and relevance scores.
curl --request POST \
     --url https://api.orq.ai/v2/knowledge/<knowledge_id>/search \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <API_KEY>' \
     --header 'content-type: application/json' \
     --data '
{
  "query": "What is the return policy for items bought more than 30 days ago?",
  "top_k": 3
}
'
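
The ranking step described above can be pictured with a small local sketch. This is an illustration only, not Orq.ai's implementation: it assumes cosine similarity as the similarity measure, uses hand-written two-dimensional vectors in place of real embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, top_k):
    """Score every chunk against the query and return the top_k (text, score) pairs."""
    scored = [
        (text, cosine_similarity(query_vector, vector))
        for text, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]


# Toy vectors: real embeddings have hundreds or thousands of dimensions.
toy_chunks = {
    "Returns are accepted within 30 days.": [1.0, 0.0],
    "Standard shipping takes three days.": [0.0, 1.0],
    "Refunds are processed after the item arrives.": [0.8, 0.6],
}
matches = top_k_chunks([1.0, 0.1], toy_chunks, top_k=2)
```

Raising top_k returns more context for the model to draw on, at the cost of including lower-scoring chunks.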
For more information on the Knowledge Base SDK, see SDK Knowledge.

Configuring a RAG Deployment

A RAG Deployment is a standard Deployment with a Knowledge Base attached. For the full deployment walkthrough, see the Simple Deployment cookbook. The RAG-specific steps are below. To create the Deployment:
  • Choose a Project and Folder and select the button.
  • Choose Deployment.
  • Enter the name simpleRAG.
  • Choose a primary Model.
Then configure your prompt messages. Click Add Message and select the System role:
You are a helpful AI assistant that answers questions based on provided context from our company knowledge base.

Instructions:
- Use the retrieved context to answer user questions accurately
- If the context doesn't contain relevant information, say "I don't have enough information in the knowledge base to answer that question"
- Always cite which document or source your answer comes from when possible
- Be concise but comprehensive in your responses
- If asked about something not in the context, direct users to contact support

Context will be provided from the knowledge base: {{companyDocs}}

Answer based on this context:

Adding Knowledge Base to Prompt

  • Click Add Knowledge Base in the settings of the Deployment.
  • Choose the knowledge base key (companyDocs).
The {{companyDocs}} variable in the system prompt must match the Knowledge Base key. Retrieved chunks are injected at that position on each call. If the variable is omitted, the chunks are appended to the end of the system message instead.
Deployment editor with the companyDocs knowledge base attached under the Knowledge Bases section and a system prompt referencing the {{companyDocs}} variable
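
The placement rule for retrieved chunks can be sketched in a few lines. This is a hypothetical illustration of the behavior described above, not Orq.ai's code: the function name is invented, and joining chunks with blank lines is an assumption about formatting.

```python
def inject_context(system_prompt, variable, chunks):
    """Place retrieved chunks at the {{variable}} position, or append them if it is absent."""
    placeholder = "{{" + variable + "}}"
    # Assumption: chunks are joined with blank lines; the real formatting may differ.
    context = "\n\n".join(chunks)
    if placeholder in system_prompt:
        return system_prompt.replace(placeholder, context)
    # No placeholder in the prompt: the chunks go at the end of the system message.
    return system_prompt + "\n\n" + context


retrieved = ["Chunk about returns.", "Chunk about refunds."]

with_variable = inject_context(
    "Context: {{companyDocs}}\n\nAnswer based on this context:",
    "companyDocs",
    retrieved,
)
without_variable = inject_context("You are a helpful assistant.", "companyDocs", retrieved)
```

Keeping the variable in the prompt lets you control where the context sits relative to your instructions, which is why the key and the variable name need to match.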
Test your RAG in the Test tab by asking questions about your uploaded documents.
Learn more about knowledge base configuration in Knowledge Base, and prompt configuration in Knowledge Base in Deployments.
When your Deployment is ready, choose Deploy. To learn more, see Deployment Versioning.

Calling the Deployment

To implement a RAG-powered question answering system:
class RAG:
    def __init__(self, client, deployment_key):
        self.client = client
        self.deployment_key = deployment_key
    
    def ask_question(self, question, include_sources=True):
        """Ask a question and get a RAG-powered response"""
        try:
            # Invoke the RAG deployment
            generation = self.client.deployments.invoke(
                key=self.deployment_key,
                messages=[
                    {
                        "role": "user",
                        "content": question
                    }
                ],
                context={
                    "include_retrievals": include_sources  # Include source chunks
                },
                metadata={
                    "query_type": "rag_question",
                    "user_intent": "information_seeking"
                }
            )
            
            # Extract the response
            answer = generation.choices[0].message.content
            
            # Extract retrieved sources if available
            sources = []
            if hasattr(generation, 'retrievals') and generation.retrievals:
                sources = [
                    {
                        "content": retrieval.content,
                        "source": retrieval.metadata.get("source", "Unknown"),
                        "score": retrieval.score
                    }
                    for retrieval in generation.retrievals
                ]
            
            return {
                "answer": answer,
                "sources": sources,
                "question": question
            }
            
        except Exception as e:
            # Return the same keys as the success path so callers can rely on them
            return {
                "answer": "I'm sorry, I'm experiencing technical difficulties. Please try again later.",
                "sources": [],
                "question": question,
                "error": str(e)
            }

# Initialize and use the RAG system
rag = RAG(client, "simpleRAG")
result = rag.ask_question("What is our company return policy?")

print(f"Answer: {result['answer']}")
if result['sources']:
    print("\nSources:")
    for source in result['sources']:
        print(f"- {source['source']}: {source['content'][:100]}...")
Here is what the output looks like:
❯ python3 rag_system.py
Answer: Based on our company documentation, our return policy allows customers to return items within 30 days of purchase with a valid receipt. Items must be in original condition and packaging. Refunds are processed within 5-7 business days after we receive the returned item.

Sources:
- company_policies.pdf: Return Policy: All items can be returned within 30 days of purchase provided...
- customer_service_guide.pdf: For returns, customers must provide proof of purchase and items must be...

Viewing Traces

Open the Traces tab on the Deployment page to inspect every call made through the RAG application. Click any trace to see the full span detail: the user’s question, the generated response, retrieved document chunks with relevance scores, and performance timings.
Deployment Traces panel showing spans for a RAG call, including retrieved chunks and response
To learn more, see Traces.
You’ve completed the setup for a Simple RAG system. Explore other Common Architecture patterns to see more advanced RAG implementations.