AI Gateway

Overview

LlamaIndex is a framework for building RAG (Retrieval-Augmented Generation) applications with LLMs. Routing LlamaIndex's LLM and embedding calls through Orq.ai's AI Gateway adds the tracing, reliability, and cost controls that a RAG application needs to move from experiment to production.

Key Benefits

Orq.ai’s AI Gateway enhances LlamaIndex applications with:

Complete Observability

Track document indexing, retrieval performance, and query processing with detailed traces

Built-in Reliability

Automatic fallbacks, retries, and load balancing for production resilience

Cost Optimization

Real-time cost tracking and spend management across all AI operations

Multi-Provider Access

Access 300+ LLMs and 20+ providers through a single, unified integration

Prerequisites

Before integrating LlamaIndex with Orq.ai, ensure the following are in place:
  • An Orq.ai account and API key
  • Python 3.8 or higher
  • LlamaIndex installed in your project
To set up an API key, see API keys & Endpoints.
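
The examples on this page read the key from the ORQ_API_KEY environment variable. A minimal sketch of a check that fails early when the variable is missing, instead of sending a request with no key; the helper name require_api_key is hypothetical and not part of LlamaIndex or Orq.ai:

```python
import os


def require_api_key(env_var: str = "ORQ_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before configuring LlamaIndex."
        )
    return key
```

The returned value can be passed as api_key in place of os.getenv("ORQ_API_KEY").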

Installation

Install LlamaIndex and required dependencies:
pip install llama-index llama-index-llms-openai-like

Configuration

Configure LlamaIndex to use Orq.ai’s AI Gateway with the OpenAILike class:
Python
from llama_index.llms.openai_like import OpenAILike
import os

# Configure OpenAI-compatible LLM with Orq.ai AI Gateway
llm = OpenAILike(
    model="gpt-4o",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
    is_chat_model=True,
)
Set api_base to https://api.orq.ai/v3/router for every client that should send requests through the gateway.

Basic RAG Example

The following example loads documents from ./data, builds a vector index, and queries it, with both the LLM and the embedding model routed through Orq.ai:
Python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai import OpenAIEmbedding
import os

# Configure LLM with Orq.ai AI Gateway
llm = OpenAILike(
    model="gpt-4o",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
    is_chat_model=True,
)

# Configure embeddings through Orq.ai (required - LlamaIndex defaults to OpenAI)
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
)

# Set as global defaults
Settings.llm = llm
Settings.embed_model = embed_model

# Load documents and create index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of these documents?")
print(response)
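
To see which document chunks a given answer was based on, the response object can be inspected after the query. The sketch below assumes the response exposes a source_nodes list whose items carry a score and a node with get_content(), as LlamaIndex query responses normally do. The helper summarize_sources and the stand-in object are illustrative only:

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars: int = 80) -> list:
    """Return (score, snippet) pairs for each retrieved node on a response."""
    rows = []
    for item in getattr(response, "source_nodes", []):
        text = item.node.get_content()
        # Collapse whitespace so each snippet fits on one line.
        snippet = " ".join(text.split())[:max_chars]
        rows.append((item.score, snippet))
    return rows


# Stand-in shaped like a query response, so the sketch runs without
# LlamaIndex installed and without calling the gateway.
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            score=0.82,
            node=SimpleNamespace(get_content=lambda: "Orq.ai routes\nrequests."),
        )
    ]
)

for score, snippet in summarize_sources(fake_response):
    print(score, snippet)
```

In the RAG example, summarize_sources(response) would be called on the value returned by query_engine.query(...).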

Model Selection

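Orq.ai supports models from 20+ providers. To switch models, change the model string; the examples below use provider-prefixed identifiers such as anthropic/claude-sonnet-4-6: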
Python
from llama_index.llms.openai_like import OpenAILike
import os

# Use Claude
claude_llm = OpenAILike(
    model="anthropic/claude-sonnet-4-6",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
    is_chat_model=True,
)

# Use Gemini
gemini_llm = OpenAILike(
    model="google-ai/gemini-2.5-flash",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
    is_chat_model=True,
)

# Use any other model, for example Llama 3.3 70B via Groq
groq_llm = OpenAILike(
    model="groq/llama-3.3-70b-versatile",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
    is_chat_model=True,
)
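
Because only the model string differs between these instances, the shared arguments can be built in one place. A minimal sketch, assuming the same endpoint and environment variable as above; gateway_kwargs and the short aliases are hypothetical names chosen for illustration:

```python
import os

# Gateway endpoint taken from the examples above.
ORQ_API_BASE = "https://api.orq.ai/v3/router"


def gateway_kwargs(model: str) -> dict:
    """Build the keyword arguments shared by every OpenAILike instance."""
    return {
        "model": model,
        "api_key": os.getenv("ORQ_API_KEY"),
        "api_base": ORQ_API_BASE,
        "is_chat_model": True,
    }


# Model IDs copied from the examples above; the aliases are illustrative.
MODEL_IDS = {
    "claude": "anthropic/claude-sonnet-4-6",
    "gemini": "google-ai/gemini-2.5-flash",
    "groq": "groq/llama-3.3-70b-versatile",
}

configs = {alias: gateway_kwargs(model_id) for alias, model_id in MODEL_IDS.items()}

# With LlamaIndex installed, each config unpacks into the constructor:
#   from llama_index.llms.openai_like import OpenAILike
#   claude_llm = OpenAILike(**configs["claude"])
```

Keeping the endpoint in a single constant means a change to the gateway URL touches one line rather than every model definition.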

Streaming Responses

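To stream a response through Orq.ai as it is generated, create the query engine with streaming=True and iterate over the response's response_gen: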
Python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai import OpenAIEmbedding
import os

# Configure LLM
llm = OpenAILike(
    model="gpt-4o",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
    is_chat_model=True,
)

# Configure embeddings
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("ORQ_API_KEY"),
    api_base="https://api.orq.ai/v3/router",
)

Settings.llm = llm
Settings.embed_model = embed_model

# Create index and query engine
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True)

# Stream response
streaming_response = query_engine.query("Explain the main concepts")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)
print()
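
The loop above prints each chunk and then discards it. If the full answer is also needed afterwards, for logging or caching, the chunks can be collected while they are shown. A small sketch that works on any iterable of strings, so it runs here without a live query; stream_and_collect is a hypothetical helper name:

```python
from typing import Callable, Iterable, Optional


def stream_and_collect(
    chunks: Iterable[str],
    write: Optional[Callable[[str], None]] = None,
) -> str:
    """Emit each chunk as it arrives and return the concatenated text."""
    parts = []
    for text in chunks:
        if write is None:
            # Default: print without a newline, as in the loop above.
            print(text, end="", flush=True)
        else:
            write(text)
        parts.append(text)
    return "".join(parts)


# Stand-in for streaming_response.response_gen.
full_text = stream_and_collect(iter(["Retrieval ", "augmented ", "generation"]))
print()
```

With the streaming example, the call would be stream_and_collect(streaming_response.response_gen).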