This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Analyze images alongside text for multimodal AI interactions.Supported Formats
| Format | Use Case | Max Size |
|---|---|---|
| JPEG/JPG | Photos, general images | 20MB |
| PNG | Screenshots, diagrams | 20MB |
| GIF | Static images only | 20MB |
| WebP | Modern web images | 20MB |
| Base64 | Embedded image data | - |
| URLs | Public image links | - |
Image Detail Levels
| Level | Resolution | Speed | Cost | Use Case |
|---|---|---|---|---|
"low" | 512x512 | Fast | Low | Quick overview |
"high" | Full resolution | Slow | High | Detailed analysis |
"auto" | Model decides | Medium | Medium | Balanced (default) |
Code examples
Image Processing Patterns
Multiple Image Analysis
Image with Structured Output
OCR and Text Extraction
Common Use Cases
Document Processing
UI/UX Analysis
Chart and Graph Analysis
Performance Optimization
Image preprocessing
Batch processing
Error Handling
Best Practices
Image quality
- Use high-resolution images for detailed analysis
- Ensure good lighting and contrast
- Avoid blurry or distorted images
- Compress large files to improve upload speed
Prompt engineering
Cost optimization
- Use
detail: "low"for simple analysis - Resize large images before encoding
- Cache results for repeated analysis
- Batch similar image processing tasks
Troubleshooting
**Image not processing- Check file size (under 20MB)
- Verify supported format (JPEG, PNG, GIF, WebP)
- Ensure valid base64 encoding
- Test with public URL instead of base64
- Increase detail level to “high”
- Improve image quality/resolution
- Use more specific prompts
- Try different model (gpt-4o vs gpt-4o-mini)
- Reduce image size
- Use “low” detail for speed
- Optimize image compression
- Consider async processing for multiple images
Limitations
| Limitation | Details | Workaround |
|---|---|---|
| File size | 20MB max per image | Compress before upload |
| Image count | Varies by model (5-16) | Process in batches |
| Video support | Static images only | Extract frames for analysis |
| Real-time | Not suitable for live video | Use for screenshots/snapshots |
| Privacy | Images sent to provider | Use on-premise models if needed |