AI Proxy Overview
What is the AI Proxy?
Orquesta's AI Proxy is a unified gateway that provides seamless access to 15+ AI providers through a single, standardized API. Built on our advanced llm-adapter-v2 architecture, it eliminates vendor lock-in and simplifies AI integration across your applications.
Who is this for? Developers building AI-powered applications who need reliable, scalable access to multiple AI providers.
What you'll achieve: Unified API access to leading AI providers with automatic failover, intelligent routing, and comprehensive monitoring.
Key Benefits
- Unified Interface: One API for OpenAI, Anthropic, Google AI, AWS Bedrock, and 10+ more providers
- Intelligent Routing: Automatic provider selection based on model availability and performance
- Built-in Reliability: Automatic retries, fallbacks, and error handling with provider-specific optimizations
- Cost Optimization: Smart routing to minimize costs while maximizing performance
- Real-time Monitoring: Complete observability with metrics, tracing, and analytics
Architecture Overview
The AI Proxy uses a modular architecture centered around the LLM Manager:
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────────┐
│    Your App     │─────▶│   LLM Manager    │─────▶│   Provider Layer    │
└─────────────────┘      └──────────────────┘      └─────────────────────┘
                                  │                          │
                                  ▼                          ▼
                         ┌──────────────────┐      ┌─────────────────────┐
                         │   Hooks System   │      │  15+ AI Providers   │
                         └──────────────────┘      └─────────────────────┘
Core Components
- LLM Manager: Central orchestrator handling provider selection, request routing, and response processing
- Provider Abstraction: Unified interface that normalizes differences between providers
- Hooks System: Extensible lifecycle management for custom logic injection
- Run Configuration: Flexible execution configurations supporting multiple model types (see the sketch below)
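As a rough sketch of how these pieces fit together, a run configuration is passed to the manager at construction time. The fields inside runConfiguration here are illustrative assumptions, not the exact llm-adapter-v2 schema:

// Hypothetical run configuration -- the field names are illustrative
// assumptions, not the exact llm-adapter-v2 schema.
const runConfiguration = {
  type: 'chat',                                    // one of the model types listed below
  model: 'gpt-4',                                  // primary model to try first
  fallbackModels: ['claude-3-5-sonnet-20241022'],  // tried in order on failure
};

// The LLM Manager orchestrates provider selection, routing, and
// response processing for every run configuration it is given.
const manager = new LLMManager({ runs: [runConfiguration] });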
Supported Providers
Chat & Text Models
- OpenAI: GPT-4, GPT-3.5-turbo, and all variants
- OpenAI-Compatible: Perplexity, Groq, NVIDIA, TogetherAI, LiteLLM, Cerebras
- Anthropic: Claude 3.5 Sonnet, Claude 3 Haiku/Opus
- Google AI: Gemini 1.5 Pro/Flash, Gemma models
- AWS Bedrock: Claude, Llama, Titan, and more
- Azure OpenAI: All GPT models with enterprise features
- Cohere: Command R+, Command Light
- ByteDance: Doubao models
Specialized Providers
- ElevenLabs: Premium text-to-speech and voice cloning
- Fal: High-performance image generation
- LeonardoAI: Creative image generation and editing
- Jina: Advanced embeddings and reranking
Supported Model Types
| Type | Description | Streaming | Providers |
|---|---|---|---|
| Chat Completions | Conversational AI with message history | ✅ | All text providers |
| Text Completions | Single-turn text generation | ✅ | OpenAI, compatible providers |
| Embeddings | Vector representations for RAG | ❌ | OpenAI, Jina, Cohere |
| Image Generation | AI-powered image creation | ❌ | Fal, LeonardoAI, OpenAI |
| Image Editing | Modify existing images | ❌ | OpenAI, LeonardoAI |
| Vision | Analyze images and answer questions | ✅ | OpenAI, Anthropic, Google |
| Speech-to-Text | Transcription and translation | ❌ | OpenAI, AWS |
| Text-to-Speech | High-quality audio generation | ❌ | ElevenLabs, OpenAI |
| Moderation | Content safety and compliance | ❌ | OpenAI |
| Reranking | Optimize search results | ❌ | Jina, Cohere |
Core Features
Intelligent Request Routing
Automatically selects the best provider based on:
- Model availability and pricing
- Current provider health and latency
- Custom routing rules and preferences (see the sketch after this list)
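Reduced to a simplified sketch, the selection logic weighs these signals and picks the cheapest healthy option. This is an illustration of the idea only, not the proxy's actual implementation; the candidate fields are hypothetical:

// Simplified illustration of provider scoring -- not the proxy's
// actual implementation. Candidate fields are hypothetical. Lower score wins.
function selectProvider(candidates) {
  return candidates
    .filter((p) => p.healthy && p.supportsRequestedModel)
    .map((p) => ({
      provider: p,
      // Weight latency against price; real routing also honors custom rules.
      score: p.p95LatencyMs * 0.7 + p.costPer1kTokens * 1000 * 0.3,
    }))
    .sort((a, b) => a.score - b.score)[0]?.provider;
}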
Advanced Error Handling
- Provider-specific error parsing and retry logic
- Automatic exponential backoff with jitter (sketched after this list)
- Rate limit detection using provider headers
- Graceful degradation and fallback providers
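Exponential backoff with "full jitter" looks roughly like this. This is a generic sketch of the technique, not the proxy's internal retry code:

// Generic exponential backoff with full jitter -- a sketch of the
// technique, not the proxy's internal retry code.
async function withRetries(fn, { maxAttempts = 5, baseMs = 250 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      // The delay cap grows exponentially; random jitter spreads retries out
      // so concurrent clients don't hammer a recovering provider in lockstep.
      const capMs = baseMs * 2 ** attempt;
      const delayMs = Math.random() * capMs;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}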
Lifecycle Hooks
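The hooks system lets you run custom logic before and after every provider call: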
const manager = new LLMManager({
  runs: [runConfiguration], // run configuration(s) defined elsewhere
  hooks: {
    beforeCall: async (config) => {
      // Modify or inspect the request before it is sent
      console.log('Making request to:', config.provider);
    },
    afterCall: async (response) => {
      // Process the response before it is returned to the caller
      console.log('Response tokens:', response.usage?.total_tokens);
    },
  },
});
Comprehensive Monitoring
- Real-time metrics and performance tracking
- Distributed tracing across provider calls
- Cost tracking and usage analytics (example after this list)
- Provider health monitoring and alerting
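To feed these signals into your own telemetry, the afterCall hook shown above is a natural place to do it. A minimal sketch, assuming the response exposes an OpenAI-style usage object; recordMetric is a placeholder for whatever metrics client you use:

// Minimal sketch: forward token usage to your own metrics pipeline.
// `recordMetric` is a placeholder for your telemetry client, and the
// usage object shape follows the OpenAI-style example above.
const manager = new LLMManager({
  runs: [runConfiguration],
  hooks: {
    afterCall: async (response) => {
      recordMetric('llm.tokens.total', response.usage?.total_tokens ?? 0);
    },
  },
});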
Quick Start
Basic Chat Completion
curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
Streaming Response
curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
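To consume the stream in code, read the response body as server-sent events. A minimal sketch for Node 18+, assuming OpenAI-style "data:" chunks terminated by "data: [DONE]":

// Minimal SSE consumer (Node 18+) -- assumes OpenAI-style "data:" chunks
// terminated by "data: [DONE]".
const res = await fetch('https://api.orq.ai/v2/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.ORQ_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true,
  }),
});

const decoder = new TextDecoder();
for await (const chunk of res.body) {
  // Simplified: assumes each chunk contains complete "data:" lines.
  for (const line of decoder.decode(chunk, { stream: true }).split('\n')) {
    if (!line.startsWith('data: ') || line.includes('[DONE]')) continue;
    const delta = JSON.parse(line.slice(6)).choices[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}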
Multi-Provider Fallback
curl -X POST https://api.orq.ai/v2/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": ["gpt-4", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"],
    "messages": [{"role": "user", "content": "Analyze this data"}]
  }'
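Models listed this way form an ordered fallback chain: if the first entry errors or hits a rate limit, the request falls through to the next.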
Next Steps
- Capabilities: Explore specific features like streaming, tool calling, and vision
- Retries & Fallbacks: Configure robust error handling
- Supported Models: Browse all available models
- Integration Guides: Connect with popular frameworks
Need Help?
- API Reference: Complete endpoint documentation
- SDK Libraries: Official SDKs for Python, Node.js, and more
- Community: Join our developer community for support and best practices