LlamaIndex
Integrate Orq.ai with LlamaIndex using OpenTelemetry
Getting Started
LlamaIndex is a framework for building RAG (Retrieval-Augmented Generation) applications with LLMs. Tracing LlamaIndex with Orq.ai gives you insight into document indexing, retrieval performance, query processing, and LLM interactions, helping you optimize your RAG applications.
Prerequisites
Before you begin, ensure you have:
- An Orq.ai account and API key
- LlamaIndex installed in your project
- Python 3.8+
- OpenAI API key (or other LLM provider credentials)
Install Dependencies
# Core LlamaIndex and OpenTelemetry packages
pip install llama-index opentelemetry-sdk opentelemetry-exporter-otlp
# Instrumentation packages — install the one matching the integration you choose below
# (openlit, traceloop-sdk, logfire, mlflow, or openinference-instrumentation-llama-index)
pip install openlit traceloop-sdk
# Optional: For advanced vector stores and embeddings
pip install llama-index-vector-stores-chroma llama-index-embeddings-openai
Configure Orq.ai
Set up your environment variables to connect to Orq.ai's OpenTelemetry collector:
Unix/Linux/macOS:
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=llamaindex-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Windows (PowerShell):
$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=llamaindex-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
Using .env file:
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=llamaindex-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
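A .env file is not read automatically; load it before any instrumentation is initialized. A minimal sketch, assuming the python-dotenv package is installed (pip install python-dotenv):
from dotenv import load_dotenv

# Read OTEL_* and OPENAI_API_KEY from .env before initializing any tracer
load_dotenv()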
Integrations
Choose your preferred OpenTelemetry framework for collecting traces:
OpenLit
Auto-instrumentation with minimal setup:
import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Initialize OpenLit
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
# Your LlamaIndex code is automatically traced
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of these documents?")
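Hardcoding the API key is fine for a quick test, but in real projects read it from the environment instead; a sketch assuming ORQ_API_KEY is exported:
import os

# Build the OTLP auth header from the environment rather than hardcoding the key
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers=f"Authorization=Bearer {os.environ['ORQ_API_KEY']}"
)
If you exported the OTEL_* variables shown earlier, calling openlit.init() with no arguments should pick them up as well.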
OpenLLMetry
Non-intrusive tracing with decorators:
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
Traceloop.init()
@workflow(name="llamaindex-rag-workflow")
def create_rag_pipeline():
    # Load and index documents
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # Create query engine
    query_engine = index.as_query_engine(similarity_top_k=3)

    # Query the index
    response = query_engine.query("Summarize the key findings from the documents")
    return response.response

result = create_rag_pipeline()
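For finer-grained spans inside a workflow, the Traceloop SDK also offers a task decorator; a short sketch reusing the imports above:
from traceloop.sdk.decorators import task

@task(name="load-documents")
def load_documents(path: str):
    # Each call appears as a child span of the enclosing @workflow span
    return SimpleDirectoryReader(path).load_data()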
Logfire
Pydantic-based observability:
import logfire
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
logfire.configure()
# Instrument the OpenAI client that LlamaIndex calls under the hood
logfire.instrument_openai()

def build_index_with_logfire():
    with logfire.span("document-loading"):
        documents = SimpleDirectoryReader("./data").load_data()
    with logfire.span("index-creation"):
        index = VectorStoreIndex.from_documents(documents)
    with logfire.span("query-execution") as span:
        query_engine = index.as_query_engine()
        response = query_engine.query("What are the main themes?")
        span.set_attribute("response_length", len(response.response))
    return response

result = build_index_with_logfire()
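By default, logfire.configure() sends data to the Logfire platform. To route spans to Orq.ai instead, you can attach a plain OTLP span processor; a sketch, assuming a logfire version that supports the send_to_logfire and additional_span_processors parameters:
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

logfire.configure(
    send_to_logfire=False,  # skip the Logfire backend
    additional_span_processors=[
        # The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT / _HEADERS from the environment
        BatchSpanProcessor(OTLPSpanExporter())
    ],
)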
OpenInference
Arize-compatible tracing with LlamaIndex instrumentation:
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Initialize OpenTelemetry
tracer_provider = trace_sdk.TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(tracer_provider)
# Instrument LlamaIndex
LlamaIndexInstrumentor().instrument()
# Your LlamaIndex code is now automatically traced
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Analyze the document content")
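If you prefer explicit configuration over environment variables, the exporter accepts an endpoint and headers directly. A sketch; we assume here that the full traces path is the Orq.ai endpoint with /v1/traces appended, per the OTLP/HTTP convention:
exporter = OTLPSpanExporter(
    endpoint="https://api.orq.ai/v2/otel/v1/traces",  # assumed full OTLP/HTTP traces path
    headers={"Authorization": "Bearer <ORQ_API_KEY>"},
)
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))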
MLflow
MLOps-focused tracing:
import mlflow
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Enable MLflow tracing
mlflow.llama_index.autolog()
@mlflow.trace
def rag_pipeline_with_mlflow(query: str):
    # Load documents
    documents = SimpleDirectoryReader("./data").load_data()

    # Create index
    index = VectorStoreIndex.from_documents(documents)

    # Query
    query_engine = index.as_query_engine(similarity_top_k=5)
    response = query_engine.query(query)
    return response.response

result = rag_pipeline_with_mlflow("What is the summary of the documents?")
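These traces are logged to the active MLflow experiment, so it helps to name one up front; a minimal sketch with an illustrative experiment name:
# Group traced runs under a named experiment (name is illustrative)
mlflow.set_experiment("llamaindex-rag-tracing")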
Examples
Basic RAG Pipeline
import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Initialize tracing
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
# Configure LlamaIndex settings
Settings.llm = OpenAI(model="gpt-4", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
def basic_rag_example():
    # Load documents
    print("Loading documents...")
    documents = SimpleDirectoryReader("./data").load_data()

    # Create vector index
    print("Creating vector index...")
    index = VectorStoreIndex.from_documents(documents)

    # Create query engine
    query_engine = index.as_query_engine(
        similarity_top_k=3,
        response_mode="compact"
    )

    # Query the index
    queries = [
        "What is the main topic discussed in the documents?",
        "Can you summarize the key findings?",
        "What recommendations are provided?"
    ]
    for query in queries:
        print(f"\nQuery: {query}")
        response = query_engine.query(query)
        print(f"Response: {response.response}")

    return query_engine

engine = basic_rag_example()
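Re-embedding documents on every run is slow and costly, so it usually pays to persist the index to disk. A short sketch using LlamaIndex's storage API (the ./storage path is an arbitrary choice):
from llama_index.core import StorageContext, load_index_from_storage

# After building: write the index (vectors + docstore) to disk
index.storage_context.persist(persist_dir="./storage")

# On later runs: reload without re-embedding
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)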
Advanced RAG with Custom Tools
import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
def advanced_rag_with_agent():
    # Create multiple indexes for different document types
    tech_docs = SimpleDirectoryReader("./tech_docs").load_data()
    business_docs = SimpleDirectoryReader("./business_docs").load_data()
    tech_index = VectorStoreIndex.from_documents(tech_docs)
    business_index = VectorStoreIndex.from_documents(business_docs)

    # Create query engines
    tech_engine = tech_index.as_query_engine(similarity_top_k=3)
    business_engine = business_index.as_query_engine(similarity_top_k=3)

    # Create tools
    tech_tool = QueryEngineTool(
        query_engine=tech_engine,
        metadata=ToolMetadata(
            name="tech_docs",
            description="Technical documentation search tool"
        )
    )
    business_tool = QueryEngineTool(
        query_engine=business_engine,
        metadata=ToolMetadata(
            name="business_docs",
            description="Business documentation search tool"
        )
    )

    # Create agent
    llm = OpenAI(model="gpt-4", temperature=0)
    agent = ReActAgent.from_tools(
        [tech_tool, business_tool],
        llm=llm,
        verbose=True
    )

    # Query the agent
    response = agent.chat(
        "Compare the technical architecture with business requirements"
    )
    return response

result = advanced_rag_with_agent()
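The agent's response also records which tools it invoked, which is handy to cross-check against the trace; a short sketch over the result returned above:
# Inspect which query-engine tools the agent called
for source in result.sources:
    print(source.tool_name, "->", str(source.raw_output)[:200])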
Custom Spans for Performance Monitoring
from opentelemetry import trace
import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.evaluation import FaithfulnessEvaluator
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
tracer = trace.get_tracer("llamaindex")
def rag_pipeline_with_evaluation():
    with tracer.start_as_current_span("rag-pipeline") as pipeline_span:
        with tracer.start_as_current_span("document-loading") as load_span:
            documents = SimpleDirectoryReader("./data").load_data()
            load_span.set_attribute("document_count", len(documents))
            load_span.set_attribute("total_characters", sum(len(doc.text) for doc in documents))

        with tracer.start_as_current_span("index-creation") as index_span:
            index = VectorStoreIndex.from_documents(documents)
            index_span.set_attribute("index_type", "vector")

        with tracer.start_as_current_span("query-execution") as query_span:
            query_engine = index.as_query_engine(similarity_top_k=3)
            query = "What are the main conclusions from the research?"
            response = query_engine.query(query)
            query_span.set_attribute("query_length", len(query))
            query_span.set_attribute("response_length", len(response.response))
            query_span.set_attribute("source_nodes_count", len(response.source_nodes))

        with tracer.start_as_current_span("evaluation") as eval_span:
            evaluator = FaithfulnessEvaluator()
            eval_result = evaluator.evaluate_response(query=query, response=response)
            eval_span.set_attribute("faithfulness_score", eval_result.score)
            eval_span.set_attribute("evaluation_passing", eval_result.passing)

        pipeline_span.set_attribute("pipeline_success", True)
        return response, eval_result

response, evaluation = rag_pipeline_with_evaluation()
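Custom spans can also capture failures, so errors surface in Orq.ai alongside latency. A sketch using the standard OpenTelemetry status API; the query string is a placeholder:
from opentelemetry.trace import StatusCode

with tracer.start_as_current_span("query-execution") as span:
    try:
        response = query_engine.query("...")  # placeholder query
    except Exception as exc:
        span.record_exception(exc)           # attach the exception event to the span
        span.set_status(StatusCode.ERROR)    # mark the span as failed
        raise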
Multi-modal RAG with Image Processing
import openlit
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)
def multimodal_rag_example():
    # Load text and image documents
    documents = SimpleDirectoryReader(
        "./mixed_data",
        required_exts=[".txt", ".pdf", ".jpg", ".png"]
    ).load_data()

    # Create index that handles both text and images
    index = VectorStoreIndex.from_documents(documents)

    # Use a vision-capable model for multi-modal queries
    llm = OpenAI(model="gpt-4o")
    query_engine = index.as_query_engine(
        llm=llm,
        similarity_top_k=5
    )

    # Multi-modal query
    response = query_engine.query(
        "Analyze both the textual data and any charts or images. "
        "What trends can you identify?"
    )
    return response

multimodal_response = multimodal_rag_example()
Next Steps
✅ Verify traces: Check your Orq.ai dashboard to see incoming traces
✅ Add custom attributes: Enhance traces with business-specific metadata
✅ Set up alerts: Configure monitoring for performance degradation
✅ Explore metrics: Use trace data for performance optimization