AI Proxy Overview

What is the AI Proxy?

Orquesta's AI Proxy is a unified gateway that provides seamless access to 15+ AI providers through a single, standardized API. Built on our advanced llm-adapter-v2 architecture, it eliminates vendor lock-in and simplifies AI integration across your applications.

Who is this for? Developers building AI-powered applications who need reliable, scalable access to multiple AI providers.

What you'll achieve: Unified API access to leading AI providers with automatic failover, intelligent routing, and comprehensive monitoring.

Key Benefits

  • Unified Interface: One API for OpenAI, Anthropic, Google AI, AWS Bedrock, and 10+ more providers
  • Intelligent Routing: Automatic provider selection based on model availability and performance
  • Built-in Reliability: Automatic retries, fallbacks, and error handling with provider-specific optimizations
  • Cost Optimization: Smart routing to minimize costs while maximizing performance
  • Real-time Monitoring: Complete observability with metrics, tracing, and analytics

Architecture Overview

The AI Proxy uses a modular architecture centered around the LLM Manager:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│   Your App      │───▶│   LLM Manager    │───▶│   Provider Layer    │
└─────────────────┘    └──────────────────┘    └─────────────────────┘
                              │                           │
                              ▼                           ▼
                       ┌─────────────────┐    ┌─────────────────────┐
                       │   Hooks System  │    │   15+ AI Providers  │
                       └─────────────────┘    └─────────────────────┘

Core Components

  • LLM Manager: Central orchestrator handling provider selection, request routing, and response processing
  • Provider Abstraction: Unified interface that normalizes differences between providers
  • Hooks System: Extensible lifecycle management for custom logic injection
  • Run Configuration: Flexible execution configurations supporting multiple model types
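
To make these pieces concrete, here is a minimal sketch of how a run configuration might feed the LLM Manager. The LLMManager constructor and its runs option appear in the Lifecycle Hooks example below; the individual fields shown here (model, provider, fallbackModels, params) are illustrative assumptions, not the documented schema.

// Illustrative sketch only: the field names below are assumptions, not the
// documented run-configuration schema. See the API Reference for the real shape.
const runConfiguration = {
  model: 'gpt-4',                                   // primary model to target (assumed field)
  provider: 'openai',                               // provider that should handle the request (assumed field)
  fallbackModels: ['claude-3-5-sonnet-20241022'],   // tried if the primary model fails (assumed field)
  params: { temperature: 0.2, max_tokens: 512 },    // generation settings (assumed field)
};

// The LLM Manager orchestrates provider selection, request routing, and
// response processing for the configured runs (see Lifecycle Hooks below).
const manager = new LLMManager({ runs: [runConfiguration] });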

Supported Providers

Chat & Text Models

  • OpenAI: GPT-4, GPT-3.5-turbo, and all variants
  • OpenAI-Compatible: Perplexity, Groq, NVIDIA, TogetherAI, LiteLLM, Cerebras
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Haiku/Opus
  • Google AI: Gemini 1.5 Pro/Flash, Gemma models
  • AWS Bedrock: Claude, Llama, Titan, and more
  • Azure OpenAI: All GPT models with enterprise features
  • Cohere: Command R+, Command Light
  • ByteDance: Doubao models

Specialized Providers

  • ElevenLabs: Premium text-to-speech and voice cloning
  • Fal: High-performance image generation
  • LeonardoAI: Creative image generation and editing
  • Jina: Advanced embeddings and reranking

Supported Model Types

Type               Description                              Providers
Chat Completions   Conversational AI with message history   All text providers
Text Completions   Single-turn text generation              OpenAI, compatible providers
Embeddings         Vector representations for RAG           OpenAI, Jina, Cohere
Image Generation   AI-powered image creation                Fal, LeonardoAI, OpenAI
Image Editing      Modify existing images                   OpenAI, LeonardoAI
Vision             Analyze images and answer questions      OpenAI, Anthropic, Google
Speech-to-Text     Transcription and translation            OpenAI, AWS
Text-to-Speech     High-quality audio generation            ElevenLabs, OpenAI
Moderation         Content safety and compliance            OpenAI
Reranking          Optimize search results                  Jina, Cohere

Core Features

Intelligent Request Routing

The proxy automatically selects the best provider based on:

  • Model availability and pricing
  • Current provider health and latency
  • Custom routing rules and preferences
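
The simplest way to steer routing is to pass an ordered list of models, exactly as in the Multi-Provider Fallback example further down. A minimal Node.js sketch of the same request (Node 18+ with built-in fetch, run as an ES module; the endpoint and payload mirror the curl examples below):

// Ordered preference list: the proxy tries the first model and falls back to the
// next one if it is unavailable (mirrors the Multi-Provider Fallback curl example).
const response = await fetch('https://api.orq.ai/v2/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.ORQ_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: ['gpt-4', 'claude-3-5-sonnet-20241022', 'gemini-1.5-pro'],
    messages: [{ role: 'user', content: 'Analyze this data' }],
  }),
});
console.log(await response.json());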

Advanced Error Handling

  • Provider-specific error parsing and retry logic
  • Automatic exponential backoff with jitter
  • Rate limit detection using provider headers
  • Graceful degradation and fallback providers
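
These retries happen inside the proxy, so you normally don't implement them yourself. If you also want client-side retries around your own calls, the underlying idea is exponential backoff with full jitter; the helper below is an illustrative sketch, not part of the proxy API:

// Illustrative client-side sketch of exponential backoff with full jitter.
// Not part of the proxy API; the proxy already retries provider calls internally.
async function withRetries(fn, { maxAttempts = 5, baseDelayMs = 250, maxDelayMs = 8000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Double the delay ceiling on each attempt (capped), then pick a random
      // point below it ("full jitter") to avoid synchronized retry bursts.
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await new Promise((resolve) => setTimeout(resolve, Math.random() * ceiling));
    }
  }
}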

Lifecycle Hooks

Hooks let you inject custom logic before and after each provider call:

const manager = new LLMManager({
  runs: [runConfiguration],
  hooks: {
    beforeCall: async (config) => {
      // Modify request before sending
      console.log('Making request to:', config.provider);
    },
    afterCall: async (response) => {
      // Process response before returning
      console.log('Response tokens:', response.usage?.total_tokens);
    }
  }
});

Comprehensive Monitoring

  • Real-time metrics and performance tracking
  • Distributed tracing across provider calls
  • Cost tracking and usage analytics
  • Provider health monitoring and alerting

Quick Start

Basic Chat Completion

curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

Streaming Response

curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
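
With stream set to true, the response arrives incrementally. Assuming the stream uses the common server-sent-events framing with OpenAI-style delta chunks ("data: {json}" lines ending in "data: [DONE]"), which is an assumption here rather than something stated above, it can be consumed from Node.js roughly like this (Node 18+, run as an ES module; check the API Reference for the exact wire format):

// Sketch of reading a streamed chat completion. Assumes "data: {json}" SSE lines
// with OpenAI-style choices[0].delta.content chunks and a final "data: [DONE]";
// for simplicity it also assumes each event arrives within a single chunk.
const response = await fetch('https://api.orq.ai/v2/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.ORQ_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true,
  }),
});

const decoder = new TextDecoder();
for await (const chunk of response.body) {
  for (const line of decoder.decode(chunk, { stream: true }).split('\n')) {
    const data = line.replace(/^data: /, '').trim();
    if (!data || data === '[DONE]') continue;
    const delta = JSON.parse(data).choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}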

Multi-Provider Fallback

curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": ["gpt-4", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"],
    "messages": [{"role": "user", "content": "Analyze this data"}]
  }'

Next Steps

Need Help?

  • API Reference: Complete endpoint documentation
  • SDK Libraries: Official SDKs for Python, Node.js, and more
  • Community: Join our developer community for support and best practices