Vision
This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.
Quick Start
Analyze images alongside text for multimodal AI interactions.
import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Base64 encode image
const imageBuffer = fs.readFileSync("chart.png");
const base64Image = imageBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Extract data from this sales chart and summarize trends",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
            detail: "high", // for detailed analysis
          },
        },
      ],
    },
  ],
});
Supported Formats
| Format | Use Case | Max Size |
|---|---|---|
| JPEG/JPG | Photos, general images | 20MB |
| PNG | Screenshots, diagrams | 20MB |
| GIF | Static images only | 20MB |
| WebP | Modern web images | 20MB |
| Base64 | Embedded image data | - |
| URLs | Public image links | - |
Image Detail Levels
| Level | Resolution | Speed | Cost | Use Case |
|---|---|---|---|---|
| "low" | 512x512 | Fast | Low | Quick overview |
| "high" | Full resolution | Slow | High | Detailed analysis |
| "auto" | Model decides | Medium | Medium | Balanced (default) |
Supported Models
| Provider | Model | Support | Max Images |
|---|---|---|---|
| OpenAI | gpt-4o | ✅ Full | 10 |
| OpenAI | gpt-4o-mini | ✅ Full | 10 |
| OpenAI | gpt-4-turbo | ✅ Full | 10 |
| Anthropic | claude-3-5-sonnet | ✅ Full | 5 |
| Anthropic | claude-3-haiku | ✅ Full | 5 |
| Google | gemini-1.5-pro | ✅ Full | 16 |
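Since the per-request image limit varies by model, a request that works against gemini-1.5-pro can be rejected by a Claude model. A small client-side guard built from the table above can catch this early; the Anthropic and Google model IDs below are assumed to follow the same provider/model naming as the OpenAI examples:

// Per-model image caps, mirroring the table above
const MAX_IMAGES: Record<string, number> = {
  "openai/gpt-4o": 10,
  "openai/gpt-4o-mini": 10,
  "openai/gpt-4-turbo": 10,
  "anthropic/claude-3-5-sonnet": 5, // assumed model ID format
  "anthropic/claude-3-haiku": 5, // assumed model ID format
  "google/gemini-1.5-pro": 16, // assumed model ID format
};

function assertImageCount(model: string, imageCount: number): void {
  const limit = MAX_IMAGES[model];
  if (limit !== undefined && imageCount > limit) {
    throw new Error(`${model} accepts at most ${limit} images per request`);
  }
}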
Code Examples
cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe in detail."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'
Python

from openai import OpenAI
import base64
import os

openai = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy"
)

# Function to encode image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Encode a local image
base64_image = encode_image("chart.png")

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart and extract the key data points"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
TypeScript

import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Function to encode image
function encodeImage(imagePath: string): string {
  const imageBuffer = fs.readFileSync(imagePath);
  return imageBuffer.toString("base64");
}

// Encode a local image
const base64Image = encodeImage("chart.png");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Analyze this chart and extract the key data points",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
          },
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
Image Processing Patterns
Multiple Image Analysis
const analyzeMultipleImages = async (images, prompt) => {
  const content = [
    { type: "text", text: prompt },
    ...images.map((url) => ({
      type: "image_url",
      image_url: { url, detail: "high" },
    })),
  ];
  return await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content }],
  });
};

// Usage
const images = [
  "https://example.com/before.jpg",
  "https://example.com/after.jpg",
];
const comparison = await analyzeMultipleImages(
  images,
  "Compare these before and after photos. What changes do you notice?",
);
Image with Structured Output
from pydantic import BaseModel
from typing import List

class ImageAnalysis(BaseModel):
    objects: List[str]
    text_content: str
    dominant_colors: List[str]
    estimated_age: str
    confidence: float

response = openai.beta.chat.completions.parse(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this image systematically"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    response_format=ImageAnalysis
)

analysis = response.choices[0].message.parsed
print(f"Objects found: {analysis.objects}")
print(f"Text content: {analysis.text_content}")
OCR and Text Extraction
const extractText = async (imageUrl) => {
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Extract all text from this image. Return as plain text, preserving formatting where possible.",
          },
          {
            type: "image_url",
            image_url: {
              url: imageUrl,
              detail: "high", // high detail for better OCR
            },
          },
        ],
      },
    ],
  });
  return response.choices[0].message.content;
};
Common Use Cases
Document Processing
def process_invoice(image_path):
    base64_image = encode_image(image_path)
    response = openai.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract invoice data: company name, date, total amount, line items"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": "high"
                    }
                }
            ]
        }]
    )
    return response.choices[0].message.content
UI/UX Analysis
const analyzeUI = async (screenshotPath) => {
  const base64Image = fs.readFileSync(screenshotPath, "base64");
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Analyze this UI screenshot. Identify usability issues, design inconsistencies, and suggest improvements.",
          },
          {
            type: "image_url",
            image_url: {
              url: `data:image/png;base64,${base64Image}`,
              detail: "high",
            },
          },
        ],
      },
    ],
  });
  return response.choices[0].message.content;
};
Chart and Graph Analysis
def analyze_chart(chart_image):
    response = openai.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart/graph and provide: 1) Data trends, 2) Key insights, 3) Specific values, 4) Recommendations"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": chart_image, "detail": "high"}
                }
            ]
        }]
    )
    return response.choices[0].message.content
Performance Optimization
Image preprocessing
import fs from "fs";
import sharp from "sharp";

// Downscale oversized images before encoding to cut upload time and token cost
const optimizeImage = async (imagePath, maxSize = 2048) => {
  const metadata = await sharp(imagePath).metadata();
  if (metadata.width > maxSize || metadata.height > maxSize) {
    const buffer = await sharp(imagePath)
      .resize(maxSize, maxSize, {
        fit: "inside",
        withoutEnlargement: true,
      })
      .jpeg({ quality: 85 })
      .toBuffer();
    return buffer.toString("base64");
  }
  return fs.readFileSync(imagePath, "base64");
};
Batch processing
import asyncio
import os
from openai import AsyncOpenAI

# An async client is required to await completions concurrently
async_client = AsyncOpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy"
)

async def process_images_batch(image_paths, prompt):
    async def process_single_image(image_path):
        base64_image = encode_image(image_path)
        response = await async_client.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }]
        )
        return {
            "image": image_path,
            "analysis": response.choices[0].message.content
        }

    # Process images concurrently
    tasks = [process_single_image(path) for path in image_paths]
    results = await asyncio.gather(*tasks)
    return results
Error Handling
const safeImageAnalysis = async (imageUrl, prompt) => {
  try {
    // Validate image URL/format before calling the API
    if (!isValidImageUrl(imageUrl)) {
      throw new Error("Invalid image URL or format");
    }
    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            {
              type: "image_url",
              image_url: {
                url: imageUrl,
                detail: "auto",
              },
            },
          ],
        },
      ],
    });
    return response.choices[0].message.content;
  } catch (error) {
    if (error.message.includes("image_parse_error")) {
      return "Unable to process image. Please check format and size.";
    } else if (error.message.includes("content_policy_violation")) {
      return "Image content violates usage policies.";
    } else {
      console.error("Vision API error:", error);
      return "Error processing image. Please try again.";
    }
  }
};

const isValidImageUrl = (url) => {
  const imageExtensions = /\.(jpg|jpeg|png|gif|webp)$/i;
  return (
    url.startsWith("http") ||
    url.startsWith("data:image/") ||
    imageExtensions.test(url)
  );
};
Best Practices
Image quality
- Use high-resolution images for detailed analysis
- Ensure good lighting and contrast
- Avoid blurry or distorted images
- Compress large files to improve upload speed
Prompt engineering
const effectiveVisionPrompts = {
  general: "Describe what you see in this image",
  specific: "Focus on the text in the upper right corner",
  comparative: "Compare the layout of these two screenshots",
  analytical: "Extract all numerical data from this chart",
  instructional: "List step-by-step instructions shown in this diagram",
};
Cost optimization
- Use detail: "low" for simple analysis
- Resize large images before encoding
- Cache results for repeated analysis (see the sketch below)
- Batch similar image processing tasks
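For caching in particular, one workable pattern is to key results on a hash of the image bytes plus the prompt, so repeated analyses of the same image cost nothing. A minimal in-memory sketch reusing the openai client from the setup above; a production setup would likely swap the Map for Redis or similar:

import crypto from "crypto";

const analysisCache = new Map<string, string>();

async function cachedAnalysis(base64Image: string, prompt: string): Promise<string> {
  // Key on image bytes + prompt so changing either busts the cache
  const key = crypto
    .createHash("sha256")
    .update(base64Image)
    .update(prompt)
    .digest("hex");
  const hit = analysisCache.get(key);
  if (hit) return hit;

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Image}`, detail: "low" },
          },
        ],
      },
    ],
  });
  const result = response.choices[0].message.content ?? "";
  analysisCache.set(key, result);
  return result;
}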
Troubleshooting
Image not processing
- Check file size (under 20MB)
- Verify supported format (JPEG, PNG, GIF, WebP)
- Ensure valid base64 encoding
- Test with a public URL instead of base64 (see the pre-flight check below)
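The first three checks above can be automated before any request is sent. A quick pre-flight sketch, using the 20MB limit and format list from the tables earlier on this page:

import fs from "fs";
import path from "path";

const MAX_BYTES = 20 * 1024 * 1024; // 20MB limit from the formats table
const SUPPORTED = new Set([".jpg", ".jpeg", ".png", ".gif", ".webp"]);

// Throws with a specific reason instead of letting the API reject the request
function preflightCheck(imagePath: string): void {
  const ext = path.extname(imagePath).toLowerCase();
  if (!SUPPORTED.has(ext)) {
    throw new Error(`Unsupported format: ${ext || "none"}`);
  }
  const { size } = fs.statSync(imagePath);
  if (size > MAX_BYTES) {
    throw new Error(`File is ${size} bytes; the limit is ${MAX_BYTES}`);
  }
}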
Poor analysis quality
- Increase detail level to "high"
- Improve image quality/resolution
- Use more specific prompts
- Try different model (gpt-4o vs gpt-4o-mini)
Slow performance
- Reduce image size
- Use "low" detail for speed
- Optimize image compression
- Consider async processing for multiple images
Limitations
| Limitation | Details | Workaround |
|---|---|---|
| File size | 20MB max per image | Compress before upload |
| Image count | Varies by model (5-16) | Process in batches |
| Video support | Static images only | Extract frames for analysis |
| Real-time | Not suitable for live video | Use for screenshots/snapshots |
| Privacy | Images sent to provider | Use on-premise models if needed |
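For the video workaround in particular, frames can be pulled out with the ffmpeg CLI and then analyzed like any other static images. A rough sketch, assuming ffmpeg is installed and on the PATH:

import { execFileSync } from "child_process";
import fs from "fs";

// Extract one frame per second as frame_001.png, frame_002.png, ...
function extractFrames(videoPath: string, outDir: string): string[] {
  fs.mkdirSync(outDir, { recursive: true });
  execFileSync("ffmpeg", [
    "-i", videoPath,
    "-vf", "fps=1", // one frame per second; tune for longer videos
    `${outDir}/frame_%03d.png`,
  ]);
  return fs
    .readdirSync(outDir)
    .filter((name) => name.endsWith(".png"))
    .map((name) => `${outDir}/${name}`);
}

Each extracted frame can then go through the same image_url flow shown in the examples above.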
Advanced Features
Vision with streaming
stream = openai.chat.completions.create(
model="openai/gpt-4o",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "Describe this complex image in detail"},
{"type": "image_url", "image_url": {"url": image_url}}
]
}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Vision with function calling
const tools = [
{
type: "function",
function: {
name: "extract_data",
description: "Extract structured data from image",
parameters: {
type: "object",
properties: {
data_type: { type: "string" },
values: { type: "array", items: { type: "string" } },
},
},
},
},
];
const response = await openai.chat.completions.create({
model: "openai/gpt-4o",
messages: [
{
role: "user",
content: [
{ type: "text", text: "Extract table data from this image" },
{ type: "image_url", image_url: { url: imageUrl } },
],
},
],
tools,
});