LlamaIndex

Integrate Orq.ai with LlamaIndex using OpenTelemetry

Getting Started

LlamaIndex is a framework for building RAG (Retrieval-Augmented Generation) applications with LLMs. Tracing LlamaIndex with Orq.ai gives you insight into document indexing, retrieval performance, query processing, and LLM interactions, so you can optimize your RAG pipelines.

Prerequisites

Before you begin, ensure you have:

  • An Orq.ai account and API key
  • LlamaIndex installed in your project
  • Python 3.8+
  • OpenAI API key (or other LLM provider credentials)

Install Dependencies

# Core LlamaIndex and OpenTelemetry packages
pip install llama-index opentelemetry-sdk opentelemetry-exporter-otlp

# Instrumentation packages (install only the one matching your chosen integration below)
pip install openlit traceloop-sdk

# Optional: For advanced vector stores and embeddings
pip install llama-index-vector-stores-chroma llama-index-embeddings-openai

Configure Orq.ai

Set up your environment variables to connect to Orq.ai's OpenTelemetry collector:

Unix/Linux/macOS:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=llamaindex-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Windows (PowerShell):

$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=llamaindex-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"

Using .env file:

OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=llamaindex-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
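
If you keep these settings in a .env file, load it before any tracing or LlamaIndex code runs. A minimal sketch using python-dotenv (installed separately with pip install python-dotenv):

import os

from dotenv import load_dotenv

# Read .env from the current working directory into os.environ
load_dotenv()

# Fail fast if the collector configuration is missing
assert os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"), "OTEL_EXPORTER_OTLP_ENDPOINT is not set"
assert os.getenv("OTEL_EXPORTER_OTLP_HEADERS"), "OTEL_EXPORTER_OTLP_HEADERS is not set"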

Integrations

Choose your preferred OpenTelemetry framework for collecting traces:

OpenLit

Auto-instrumentation with minimal setup:

import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Initialize OpenLit
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

# Your LlamaIndex code is automatically traced
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of these documents?")

OpenLLMetry

Non-intrusive tracing with decorators:

from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Point OpenLLMetry at Orq.ai's collector (it defaults to Traceloop's backend otherwise)
Traceloop.init(
    app_name="llamaindex-rag",
    api_endpoint="https://api.orq.ai/v2/otel",
    headers={"Authorization": "Bearer <ORQ_API_KEY>"},
)

@workflow(name="llamaindex-rag-workflow")
def create_rag_pipeline():
    # Load and index documents
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # Create query engine
    query_engine = index.as_query_engine(similarity_top_k=3)

    # Query the index
    response = query_engine.query("Summarize the key findings from the documents")

    return response.response

result = create_rag_pipeline()
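
OpenLLMetry also ships a @task decorator for tracing sub-steps inside a workflow. A sketch that splits the pipeline above into separately traced tasks (function and span names are illustrative):

from traceloop.sdk.decorators import task, workflow
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

@task(name="load-documents")
def load_documents(path: str):
    return SimpleDirectoryReader(path).load_data()

@task(name="build-index")
def build_index(documents):
    return VectorStoreIndex.from_documents(documents)

@workflow(name="llamaindex-rag-workflow")
def run_pipeline(question: str) -> str:
    index = build_index(load_documents("./data"))
    response = index.as_query_engine(similarity_top_k=3).query(question)
    return response.response

print(run_pipeline("Summarize the key findings from the documents"))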

Logfire

Pydantic-based observability:

import logfire
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

logfire.configure()

# Instrument the OpenAI client that LlamaIndex calls under the hood
logfire.instrument_openai()

def build_index_with_logfire():
    with logfire.span("document-loading"):
        documents = SimpleDirectoryReader("./data").load_data()

    with logfire.span("index-creation"):
        index = VectorStoreIndex.from_documents(documents)

    with logfire.span("query-execution") as span:
        query_engine = index.as_query_engine()
        response = query_engine.query("What are the main themes?")
        span.set_attribute("response_length", len(response.response))

    return response

result = build_index_with_logfire()
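
Note that logfire.configure() sends data to the Logfire platform by default. To route spans to Orq.ai instead, a sketch based on Logfire's support for alternative OTLP backends (assumes the OTEL_* variables from the configuration section are set):

import logfire

# With the Logfire backend disabled, spans are exported through the
# standard OTEL_EXPORTER_OTLP_* environment variables to Orq.ai
logfire.configure(send_to_logfire=False, service_name="llamaindex-app")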

OpenInference

Arize-compatible tracing with LlamaIndex instrumentation:

from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Initialize OpenTelemetry
tracer_provider = trace_sdk.TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# Instrument LlamaIndex with the provider configured above
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

# Your LlamaIndex code is now automatically traced
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("Analyze the document content")

MLflow

MLOps-focused tracing:

import mlflow
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Enable MLflow tracing
mlflow.llama_index.autolog()

@mlflow.trace
def rag_pipeline_with_mlflow(query: str):
    # Load documents
    documents = SimpleDirectoryReader("./data").load_data()

    # Create index
    index = VectorStoreIndex.from_documents(documents)

    # Query
    query_engine = index.as_query_engine(similarity_top_k=5)
    response = query_engine.query(query)

    return response.response

result = rag_pipeline_with_mlflow("What is the summary of the documents?")
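
If you also use an MLflow tracking server, you can group these traces under an experiment before enabling autologging (standard MLflow calls; the URI and experiment name are illustrative):

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # illustrative tracking server
mlflow.set_experiment("llamaindex-rag")

mlflow.llama_index.autolog()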

Examples

Basic RAG Pipeline

import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.settings import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Initialize tracing
openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

# Configure LlamaIndex settings
Settings.llm = OpenAI(model="gpt-4", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

def basic_rag_example():
    # Load documents
    print("Loading documents...")
    documents = SimpleDirectoryReader("./data").load_data()

    # Create vector index
    print("Creating vector index...")
    index = VectorStoreIndex.from_documents(documents)

    # Create query engine
    query_engine = index.as_query_engine(
        similarity_top_k=3,
        response_mode="compact"
    )

    # Query the index
    queries = [
        "What is the main topic discussed in the documents?",
        "Can you summarize the key findings?",
        "What recommendations are provided?"
    ]

    for query in queries:
        print(f"\nQuery: {query}")
        response = query_engine.query(query)
        print(f"Response: {response.response}")

    return query_engine

engine = basic_rag_example()
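
Rebuilding the index on every run re-embeds all documents, which adds latency and cost. A minimal sketch that persists the index to disk and reloads it on subsequent runs (standard LlamaIndex storage APIs; the ./storage path is illustrative):

import os

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

if os.path.exists(PERSIST_DIR):
    # Reload the previously built index instead of re-embedding documents
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
else:
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)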

Advanced RAG with Custom Tools

import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

def advanced_rag_with_agent():
    # Create multiple indexes for different document types
    tech_docs = SimpleDirectoryReader("./tech_docs").load_data()
    business_docs = SimpleDirectoryReader("./business_docs").load_data()

    tech_index = VectorStoreIndex.from_documents(tech_docs)
    business_index = VectorStoreIndex.from_documents(business_docs)

    # Create query engines
    tech_engine = tech_index.as_query_engine(similarity_top_k=3)
    business_engine = business_index.as_query_engine(similarity_top_k=3)

    # Create tools
    tech_tool = QueryEngineTool(
        query_engine=tech_engine,
        metadata=ToolMetadata(
            name="tech_docs",
            description="Technical documentation search tool"
        )
    )

    business_tool = QueryEngineTool(
        query_engine=business_engine,
        metadata=ToolMetadata(
            name="business_docs",
            description="Business documentation search tool"
        )
    )

    # Create agent
    llm = OpenAI(model="gpt-4", temperature=0)
    agent = ReActAgent.from_tools(
        [tech_tool, business_tool],
        llm=llm,
        verbose=True
    )

    # Query the agent
    response = agent.chat(
        "Compare the technical architecture with business requirements"
    )

    return response

result = advanced_rag_with_agent()
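
If you don't need a full agent loop, a RouterQueryEngine can dispatch each query to the right index with a single selection step. A sketch reusing the tools from the example above (one selector option among several):

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Assumes tech_tool and business_tool from advanced_rag_with_agent()
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[tech_tool, business_tool],
)

response = router_engine.query("Which business requirements affect the architecture?")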

Custom Spans for Performance Monitoring

from opentelemetry import trace
import openlit
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.evaluation import FaithfulnessEvaluator

openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

tracer = trace.get_tracer("llamaindex")

def rag_pipeline_with_evaluation():
    with tracer.start_as_current_span("rag-pipeline") as pipeline_span:

        with tracer.start_as_current_span("document-loading") as load_span:
            documents = SimpleDirectoryReader("./data").load_data()
            load_span.set_attribute("document_count", len(documents))
            load_span.set_attribute("total_characters", sum(len(doc.text) for doc in documents))

        with tracer.start_as_current_span("index-creation") as index_span:
            index = VectorStoreIndex.from_documents(documents)
            index_span.set_attribute("index_type", "vector")

        with tracer.start_as_current_span("query-execution") as query_span:
            query_engine = index.as_query_engine(similarity_top_k=3)
            query = "What are the main conclusions from the research?"
            response = query_engine.query(query)

            query_span.set_attribute("query_length", len(query))
            query_span.set_attribute("response_length", len(response.response))
            query_span.set_attribute("source_nodes_count", len(response.source_nodes))

        with tracer.start_as_current_span("evaluation") as eval_span:
            evaluator = FaithfulnessEvaluator()
            eval_result = evaluator.evaluate_response(query=query, response=response)

            eval_span.set_attribute("faithfulness_score", eval_result.score)
            eval_span.set_attribute("evaluation_passing", eval_result.passing)

        pipeline_span.set_attribute("pipeline_success", True)
        return response, eval_result

response, evaluation = rag_pipeline_with_evaluation()
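
When a pipeline step fails, you can record the exception on the active span so the error is visible in the trace (standard OpenTelemetry span APIs):

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("llamaindex")

def safe_query(query_engine, query: str):
    with tracer.start_as_current_span("query-execution") as span:
        try:
            return query_engine.query(query)
        except Exception as exc:
            # Attach the exception event and mark the span as errored
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise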

Multi-modal RAG with Image Processing

import openlit
from llama_index.core import SimpleDirectoryReader
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

openlit.init(
    otlp_endpoint="https://api.orq.ai/v2/otel",
    otlp_headers="Authorization=Bearer <ORQ_API_KEY>"
)

def multimodal_rag_example():
    # Load text and image documents
    documents = SimpleDirectoryReader(
        "./mixed_data",
        required_exts=[".txt", ".pdf", ".jpg", ".png"]
    ).load_data()

    # A plain VectorStoreIndex only embeds text; a multi-modal index embeds
    # text and images into separate stores (this assumes the
    # llama-index-multi-modal-llms-openai package and a CLIP embedding
    # package such as llama-index-embeddings-clip are installed)
    index = MultiModalVectorStoreIndex.from_documents(documents)

    # Use a vision-capable model for multi-modal queries
    llm = OpenAIMultiModal(model="gpt-4o")

    query_engine = index.as_query_engine(
        llm=llm,
        similarity_top_k=5
    )

    # Multi-modal query
    response = query_engine.query(
        "Analyze both the textual data and any charts or images. "
        "What trends can you identify?"
    )

    return response

multimodal_response = multimodal_rag_example()

Next Steps

✅ Verify traces: Check your Orq.ai dashboard to see incoming traces
✅ Add custom attributes: Enhance traces with business-specific metadata
✅ Set up alerts: Configure monitoring for performance degradation
✅ Explore metrics: Use trace data for performance optimization
