AI Proxy Overview

What is the AI Proxy?

Orquesta's AI Proxy is a unified gateway that provides seamless access to 15+ AI providers through a single, standardized API. Built on our advanced llm-adapter-v2 architecture, it eliminates vendor lock-in and simplifies AI integration across your applications.

Who is this for? Developers building AI-powered applications who need reliable, scalable access to multiple AI providers.

What you'll achieve: Unified API access to leading AI providers with automatic failover, intelligent routing, and comprehensive monitoring.

Key Benefits

  • Unified Interface: One API for OpenAI, Anthropic, Google AI, AWS Bedrock, and 10+ more providers
  • Intelligent Routing: Automatic provider selection based on model availability and performance
  • Built-in Reliability: Automatic retries, fallbacks, and error handling with provider-specific optimizations
  • Cost Optimization: Smart routing to minimize costs while maximizing performance
  • Real-time Monitoring: Complete observability with metrics, tracing, and analytics

Architecture Overview

The AI Proxy uses a modular architecture centered around the LLM Manager:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│   Your App      │───▶│   LLM Manager    │───▶│   Provider Layer    │
└─────────────────┘    └──────────────────┘    └─────────────────────┘
                              │                           │
                              ▼                           ▼
                       ┌─────────────────┐    ┌─────────────────────┐
                       │   Hooks System  │    │   15+ AI Providers  │
                       └─────────────────┘    └─────────────────────┘

Core Components

  • LLM Manager: Central orchestrator handling provider selection, request routing, and response processing
  • Provider Abstraction: Unified interface that normalizes differences between providers
  • Hooks System: Extensible lifecycle management for custom logic injection
  • Run Configuration: Flexible execution configurations supporting multiple model types
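
To make these pieces concrete, here is a minimal sketch of how a run configuration might feed the LLM Manager. The LLMManager constructor and its runs option appear in the Lifecycle Hooks example below; the individual fields shown here (model, provider, fallbackModels, params) are illustrative assumptions, not the documented schema.

// Illustrative sketch only: the field names below are assumptions, not the
// documented run-configuration schema. See the API Reference for the real shape.
const runConfiguration = {
  model: 'gpt-4',                                   // primary model to target (assumed field)
  provider: 'openai',                               // provider that should handle the request (assumed field)
  fallbackModels: ['claude-3-5-sonnet-20241022'],   // tried if the primary model fails (assumed field)
  params: { temperature: 0.2, max_tokens: 512 },    // generation settings (assumed field)
};

// The LLM Manager orchestrates provider selection, request routing, and
// response processing for the configured runs (see Lifecycle Hooks below).
const manager = new LLMManager({ runs: [runConfiguration] });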

Supported Providers

Chat & Text Models

  • OpenAI: GPT-4, GPT-3.5-turbo, and all variants
  • OpenAI-Compatible: Perplexity, Groq, NVIDIA, TogetherAI, LiteLLM, Cerebras
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Haiku/Opus
  • Google AI: Gemini 1.5 Pro/Flash, Gemma models
  • AWS Bedrock: Claude, Llama, Titan, and more
  • Azure OpenAI: All GPT models with enterprise features
  • Cohere: Command R+, Command Light
  • ByteDance: Doubao models

Specialized Providers

  • ElevenLabs: Premium text-to-speech and voice cloning
  • Fal: High-performance image generation
  • LeonardoAI: Creative image generation and editing
  • Jina: Advanced embeddings and reranking

Supported Model Types

Type               Description                              Providers
Chat Completions   Conversational AI with message history   All text providers
Text Completions   Single-turn text generation              OpenAI, compatible providers
Embeddings         Vector representations for RAG           OpenAI, Jina, Cohere
Image Generation   AI-powered image creation                Fal, LeonardoAI, OpenAI
Image Editing      Modify existing images                   OpenAI, LeonardoAI
Vision             Analyze images and answer questions      OpenAI, Anthropic, Google
Speech-to-Text     Transcription and translation            OpenAI, AWS
Text-to-Speech     High-quality audio generation            ElevenLabs, OpenAI
Moderation         Content safety and compliance            OpenAI
Reranking          Optimize search results                  Jina, Cohere

Core Features

Intelligent Request Routing

The proxy automatically selects the best provider based on:

  • Model availability and pricing
  • Current provider health and latency
  • Custom routing rules and preferences
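
The simplest way to steer routing is to pass an ordered list of models, exactly as in the Multi-Provider Fallback example further down. A minimal Node.js sketch of the same request (Node 18+ with built-in fetch, run as an ES module; the endpoint and payload mirror the curl examples below):

// Ordered preference list: the proxy tries the first model and falls back to the
// next one if it is unavailable (mirrors the Multi-Provider Fallback curl example).
const response = await fetch('https://api.orq.ai/v2/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.ORQ_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: ['gpt-4', 'claude-3-5-sonnet-20241022', 'gemini-1.5-pro'],
    messages: [{ role: 'user', content: 'Analyze this data' }],
  }),
});
console.log(await response.json());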

Advanced Error Handling

  • Provider-specific error parsing and retry logic
  • Automatic exponential backoff with jitter
  • Rate limit detection using provider headers
  • Graceful degradation and fallback providers
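
These retries happen inside the proxy, so you normally don't implement them yourself. If you also want client-side retries around your own calls, the underlying idea is exponential backoff with full jitter; the helper below is an illustrative sketch, not part of the proxy API:

// Illustrative client-side sketch of exponential backoff with full jitter.
// Not part of the proxy API; the proxy already retries provider calls internally.
async function withRetries(fn, { maxAttempts = 5, baseDelayMs = 250, maxDelayMs = 8000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Double the delay ceiling on each attempt (capped), then pick a random
      // point below it ("full jitter") to avoid synchronized retry bursts.
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await new Promise((resolve) => setTimeout(resolve, Math.random() * ceiling));
    }
  }
}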

Lifecycle Hooks

Hooks let you inject custom logic before and after each provider call:

const manager = new LLMManager({
  runs: [runConfiguration],
  hooks: {
    beforeCall: async (config) => {
      // Modify request before sending
      console.log('Making request to:', config.provider);
    },
    afterCall: async (response) => {
      // Process response before returning
      console.log('Response tokens:', response.usage?.total_tokens);
    }
  }
});

Comprehensive Monitoring

  • Real-time metrics and performance tracking
  • Distributed tracing across provider calls
  • Cost tracking and usage analytics
  • Provider health monitoring and alerting

Quick Start

Basic Chat Completion

curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

Streaming Response

curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
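
With stream set to true, the response arrives incrementally. Assuming the stream uses the common server-sent-events framing with OpenAI-style delta chunks ("data: {json}" lines ending in "data: [DONE]"), which is an assumption here rather than something stated above, it can be consumed from Node.js roughly like this (Node 18+, run as an ES module; check the API Reference for the exact wire format):

// Sketch of reading a streamed chat completion. Assumes "data: {json}" SSE lines
// with OpenAI-style choices[0].delta.content chunks and a final "data: [DONE]";
// for simplicity it also assumes each event arrives within a single chunk.
const response = await fetch('https://api.orq.ai/v2/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.ORQ_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true,
  }),
});

const decoder = new TextDecoder();
for await (const chunk of response.body) {
  for (const line of decoder.decode(chunk, { stream: true }).split('\n')) {
    const data = line.replace(/^data: /, '').trim();
    if (!data || data === '[DONE]') continue;
    const delta = JSON.parse(data).choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}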

Multi-Provider Fallback

curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": ["gpt-4", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"],
    "messages": [{"role": "user", "content": "Analyze this data"}]
  }'

Next Steps

Need Help?

  • API Reference: Complete endpoint documentation
  • SDK Libraries: Official SDKs for Python, Node.js, and more
  • Community: Join our developer community for support and best practices