This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Analyze images alongside text for multimodal AI interactions.Supported Formats
| Format | Use Case | Max Size | 
|---|---|---|
| JPEG/JPG | Photos, general images | 20MB | 
| PNG | Screenshots, diagrams | 20MB | 
| GIF | Static images only | 20MB | 
| WebP | Modern web images | 20MB | 
| Base64 | Embedded image data | - | 
| URLs | Public image links | - | 
Image Detail Levels
| Level | Resolution | Speed | Cost | Use Case | 
|---|---|---|---|---|
| "low" | 512x512 | Fast | Low | Quick overview | 
| "high" | Full resolution | Slow | High | Detailed analysis | 
| "auto" | Model decides | Medium | Medium | Balanced (default) | 
Code examples
Image Processing Patterns
Multiple Image Analysis
Image with Structured Output
OCR and Text Extraction
Common Use Cases
Document Processing
UI/UX Analysis
Chart and Graph Analysis
Performance Optimization
Image preprocessing
Batch processing
Error Handling
Best Practices
Image quality
- Use high-resolution images for detailed analysis
- Ensure good lighting and contrast
- Avoid blurry or distorted images
- Compress large files to improve upload speed
Prompt engineering
Cost optimization
- Use detail: "low"for simple analysis
- Resize large images before encoding
- Cache results for repeated analysis
- Batch similar image processing tasks
Troubleshooting
**Image not processing- Check file size (under 20MB)
- Verify supported format (JPEG, PNG, GIF, WebP)
- Ensure valid base64 encoding
- Test with public URL instead of base64
- Increase detail level to “high”
- Improve image quality/resolution
- Use more specific prompts
- Try different model (gpt-4o vs gpt-4o-mini)
- Reduce image size
- Use “low” detail for speed
- Optimize image compression
- Consider async processing for multiple images
Limitations
| Limitation | Details | Workaround | 
|---|---|---|
| File size | 20MB max per image | Compress before upload | 
| Image count | Varies by model (5-16) | Process in batches | 
| Video support | Static images only | Extract frames for analysis | 
| Real-time | Not suitable for live video | Use for screenshots/snapshots | 
| Privacy | Images sent to provider | Use on-premise models if needed |