Vision

This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.

Quick Start

Analyze images alongside text for multimodal AI interactions.

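import OpenAI from "openai";
import fs from "fs";

// Client configured for the AI Gateway endpoint
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});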

// Base64 encode image
const imageBuffer = fs.readFileSync("chart.png");
const base64Image = imageBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Extract data from this sales chart and summarize trends",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
            detail: "high", // For detailed analysis
          },
        },
      ],
    },
  ],
});

Supported Formats

Format	Use Case	Max Size
JPEG/JPG	Photos, general images	20MB
PNG	Screenshots, diagrams	20MB
GIF	Static images only	20MB
WebP	Modern web images	20MB
Base64	Embedded image data	-
URLs	Public image links	-
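
The format and size limits above can be checked locally before a request is sent. The following is a minimal sketch, assuming the 20MB limit applies to the raw file size and is counted in binary megabytes. The names `to_data_url`, `MAX_BYTES`, and `MIME_BY_EXTENSION` are illustrative and not part of any SDK.

```python
import base64
import os

# Assumption: 20MB is measured on the raw file, in binary megabytes.
MAX_BYTES = 20 * 1024 * 1024

# Extensions for the formats listed in the table above (illustrative mapping).
MIME_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_data_url(image_path: str) -> str:
    """Return a base64 data URL for a supported image file, or raise ValueError."""
    extension = os.path.splitext(image_path)[1].lower()
    mime_type = MIME_BY_EXTENSION.get(extension)
    if mime_type is None:
        raise ValueError(f"Unsupported image format: {extension or 'none'}")

    size = os.path.getsize(image_path)
    if size > MAX_BYTES:
        raise ValueError(f"Image is {size} bytes; the limit is {MAX_BYTES}")

    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used directly as the `url` value of an `image_url` content part.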

Image Detail Levels

Level	Resolution	Speed	Cost	Use Case
`"low"`	512x512	Fast	Low	Quick overview
`"high"`	Full resolution	Slow	High	Detailed analysis
`"auto"`	Model decides	Medium	Medium	Balanced (default)
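
To keep the `detail` value consistent across requests, the image content part can be built by a small helper. This is a sketch only; `image_part` and `VALID_DETAIL_LEVELS` are hypothetical names, and the accepted values are the three levels from the table above.

```python
# The three levels listed in the table above.
VALID_DETAIL_LEVELS = {"low", "high", "auto"}


def image_part(url: str, detail: str = "auto") -> dict:
    """Build an image_url content part, rejecting unknown detail levels."""
    if detail not in VALID_DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {sorted(VALID_DETAIL_LEVELS)}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


# Example: a quick overview request uses the cheaper "low" level.
thumbnail_part = image_part("https://example.com/image.jpg", detail="low")
```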

Supported Models

Provider	Model	Support	Max Images
OpenAI	`gpt-4o`	✅ Full	10
OpenAI	`gpt-4o-mini`	✅ Full	10
OpenAI	`gpt-4-turbo`	✅ Full	10
Anthropic	`claude-3-5-sonnet`	✅ Full	5
Anthropic	`claude-3-haiku`	✅ Full	5
Google	`gemini-1.5-pro`	✅ Full	16
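
When a job has more images than a model accepts in one request, the list can be split by the limits in the table above. The sketch below assumes those limits apply per request; `MAX_IMAGES_PER_REQUEST` and `split_into_batches` are illustrative names.

```python
from typing import Dict, List

# Per-request image limits copied from the table above.
MAX_IMAGES_PER_REQUEST: Dict[str, int] = {
    "gpt-4o": 10,
    "gpt-4o-mini": 10,
    "gpt-4-turbo": 10,
    "claude-3-5-sonnet": 5,
    "claude-3-haiku": 5,
    "gemini-1.5-pro": 16,
}


def split_into_batches(model: str, image_urls: List[str]) -> List[List[str]]:
    """Split image URLs into chunks no larger than the model's image limit."""
    # Accept "provider/model" identifiers such as "openai/gpt-4o".
    model_name = model.split("/")[-1]
    limit = MAX_IMAGES_PER_REQUEST.get(model_name)
    if limit is None:
        raise ValueError(f"No image limit known for model: {model}")
    return [image_urls[i:i + limit] for i in range(0, len(image_urls), limit)]
```

Each batch can then be sent as its own request, with one `image_url` part per URL.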

Code examples

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe in detail."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

from openai import OpenAI
import base64
import os

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

# Function to encode image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Encode a local image
base64_image = encode_image("chart.png")

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart and extract the key data points"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Function to encode image
function encodeImage(imagePath: string): string {
  const imageBuffer = fs.readFileSync(imagePath);
  return imageBuffer.toString("base64");
}

// Encode a local image
const base64Image = encodeImage("chart.png");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Analyze this chart and extract the key data points",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
          },
        },
      ],
    },
  ]
});

console.log(response.choices[0].message.content);

Image Processing Patterns

Multiple Image Analysis

const analyzeMultipleImages = async (images, prompt) => {
  const content = [
    { type: "text", text: prompt },
    ...images.map((url) => ({
      type: "image_url",
      image_url: { url, detail: "high" },
    })),
  ];

  return await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content }],
  });
};

// Usage
const images = [
  "https://example.com/before.jpg",
  "https://example.com/after.jpg",
];

const comparison = await analyzeMultipleImages(
  images,
  "Compare these before and after photos. What changes do you notice?",
);

Image with Structured Output

from pydantic import BaseModel
from typing import List

# Placeholder: any public image URL or base64 data URL
image_url = "https://example.com/image.jpg"

class ImageAnalysis(BaseModel):
    objects: List[str]
    text_content: str
    dominant_colors: List[str]
    estimated_age: str
    confidence: float

response = openai.beta.chat.completions.parse(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this image systematically"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    response_format=ImageAnalysis
)

analysis = response.choices[0].message.parsed
print(f"Objects found: {analysis.objects}")
print(f"Text content: {analysis.text_content}")

OCR and Text Extraction

const extractText = async (imageUrl) => {
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Extract all text from this image. Return as plain text, preserving formatting where possible.",
          },
          {
            type: "image_url",
            image_url: {
              url: imageUrl,
              detail: "high", // High detail for better OCR
            },
          },
        ],
      },
    ],
  });

  return response.choices[0].message.content;
};

Common Use Cases

Document Processing

def process_invoice(image_path):
    base64_image = encode_image(image_path)

    response = openai.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract invoice data: company name, date, total amount, line items"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": "high"
                    }
                }
            ]
        }]
    )

    return response.choices[0].message.content

UI/UX Analysis

const analyzeUI = async (screenshotPath) => {
  const base64Image = fs.readFileSync(screenshotPath, "base64");

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Analyze this UI screenshot. Identify usability issues, design inconsistencies, and suggest improvements.",
          },
          {
            type: "image_url",
            image_url: {
              url: `data:image/png;base64,${base64Image}`,
              detail: "high",
            },
          },
        ],
      },
    ],
  });

  return response.choices[0].message.content;
};

Chart and Graph Analysis

def analyze_chart(chart_image):
    response = openai.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart/graph and provide: 1) Data trends, 2) Key insights, 3) Specific values, 4) Recommendations"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": chart_image, "detail": "high"}
                }
            ]
        }]
    )

    return response.choices[0].message.content

Performance Optimization

Image preprocessing

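import fs from "fs";
import sharp from "sharp";

// Note: resized images are re-encoded as JPEG, so use "image/jpeg" in the
// data URL for them; files that are not resized keep their original format.
const optimizeImage = async (imagePath, maxSize = 2048) => {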

  const metadata = await sharp(imagePath).metadata();

  if (metadata.width > maxSize || metadata.height > maxSize) {
    const buffer = await sharp(imagePath)
      .resize(maxSize, maxSize, {
        fit: "inside",
        withoutEnlargement: true,
      })
      .jpeg({ quality: 85 })
      .toBuffer();

    return buffer.toString("base64");
  }

  return fs.readFileSync(imagePath, "base64");
};

Batch processing

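import asyncio
import os
from openai import AsyncOpenAI

# Awaiting requests requires the async client; the synchronous OpenAI client
# returns the response directly and cannot be awaited.
async_client = AsyncOpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy",
)

async def process_images_batch(image_paths, prompt):
    async def process_single_image(image_path):
        base64_image = encode_image(image_path)

        response = await async_client.chat.completions.create(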
            model="openai/gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }]
        )

        return {
            "image": image_path,
            "analysis": response.choices[0].message.content
        }

    # Process images concurrently
    tasks = [process_single_image(path) for path in image_paths]
    results = await asyncio.gather(*tasks)

    return results

Error Handling

const safeImageAnalysis = async (imageUrl, prompt) => {
  try {
    // Validate image URL/format
    if (!isValidImageUrl(imageUrl)) {
      throw new Error("Invalid image URL or format");
    }

    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            {
              type: "image_url",
              image_url: {
                url: imageUrl,
                detail: "auto",
              },
            },
          ],
        },
      ],
    });

    return response.choices[0].message.content;
  } catch (error) {
    if (error.message.includes("image_parse_error")) {
      return "Unable to process image. Please check format and size.";
    } else if (error.message.includes("content_policy_violation")) {
      return "Image content violates usage policies.";
    } else {
      console.error("Vision API error:", error);
      return "Error processing image. Please try again.";
    }
  }
};

const isValidImageUrl = (url) => {
  // Base64 data URLs must declare an image MIME type
  if (url.startsWith("data:image/")) return true;
  // Otherwise require an http(s) URL; bare file paths are not valid here
  return /^https?:\/\//i.test(url);
};

Best Practices

Image quality

Use high-resolution images for detailed analysis
Ensure good lighting and contrast
Avoid blurry or distorted images
Compress large files to improve upload speed

Prompt engineering

const effectiveVisionPrompts = {
  general: "Describe what you see in this image",
  specific: "Focus on the text in the upper right corner",
  comparative: "Compare the layout of these two screenshots",
  analytical: "Extract all numerical data from this chart",
  instructional: "List step-by-step instructions shown in this diagram",
};

Cost optimization

Use detail: "low" for simple analysis
Resize large images before encoding
Cache results for repeated analysis
Batch similar image processing tasks
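
Caching for repeated analysis can be as simple as keying results on the image content and the prompt. The sketch below is one possible approach, not a gateway feature. `cached_analysis` and `cache_key` are hypothetical helpers, and `analyze` stands for whatever function sends the request.

```python
import hashlib
from typing import Callable, Dict

# In-memory cache for illustration; a shared store could be used instead.
_analysis_cache: Dict[str, str] = {}


def cache_key(image_bytes: bytes, prompt: str) -> str:
    """Derive a stable key from the image content and the prompt."""
    image_digest = hashlib.sha256(image_bytes).hexdigest()
    return hashlib.sha256(f"{image_digest}:{prompt}".encode("utf-8")).hexdigest()


def cached_analysis(
    image_bytes: bytes,
    prompt: str,
    analyze: Callable[[bytes, str], str],
) -> str:
    """Call analyze only when this image and prompt pair has not been seen."""
    key = cache_key(image_bytes, prompt)
    if key not in _analysis_cache:
        _analysis_cache[key] = analyze(image_bytes, prompt)
    return _analysis_cache[key]
```

Hashing the bytes rather than the file name means a renamed copy of the same image still hits the cache.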

Troubleshooting

Image not processing

Check file size (under 20MB)
Verify supported format (JPEG, PNG, GIF, WebP)
Ensure valid base64 encoding
Test with a public URL instead of base64
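
A base64 data URL can be checked locally before it is sent. This is a minimal sketch that only verifies the prefix and the base64 alphabet and padding; it does not confirm that the decoded bytes are a real image. `is_valid_image_data_url` is an illustrative name.

```python
import base64
import binascii
import re

# Prefix for the supported formats, followed by the base64 payload.
DATA_URL_PATTERN = re.compile(r"data:image/(jpeg|png|gif|webp);base64,(.+)")


def is_valid_image_data_url(data_url: str) -> bool:
    """Return True if the string is a well-formed base64 image data URL."""
    match = DATA_URL_PATTERN.fullmatch(data_url)
    if match is None:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(match.group(2), validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```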

Poor analysis quality

Increase detail level to "high"
Improve image quality/resolution
Use more specific prompts
Try a different model (for example, gpt-4o instead of gpt-4o-mini)

Slow performance

Reduce image size
Use "low" detail for speed
Optimize image compression
Consider async processing for multiple images
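
When many images are processed asynchronously, capping the number of in-flight requests helps avoid rate limits. The sketch below uses `asyncio.Semaphore`. `analyze_with_limit` is a hypothetical helper, and `analyze_one` stands for any coroutine that sends a single image request.

```python
import asyncio
from typing import Awaitable, Callable, List


async def analyze_with_limit(
    image_paths: List[str],
    analyze_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 4,
) -> List[str]:
    """Run analyze_one for every path with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path: str) -> str:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await analyze_one(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(path) for path in image_paths))
```

The value of `max_concurrent` is an assumption to tune against the provider's rate limits.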

Limitations

Limitation	Details	Workaround
File size	20MB max per image	Compress before upload
Image count	Varies by model (5-16)	Process in batches
Video support	Static images only	Extract frames for analysis
Real-time	Not suitable for live video	Use for screenshots/snapshots
Privacy	Images are sent to the provider	Use on-premises models if needed

Advanced Features

Vision with streaming

stream = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this complex image in detail"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    stream=True
)

for chunk in stream:
    # Some chunks carry no choices (for example, a final usage chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
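
If the complete text is needed after streaming finishes (for logging or caching, say), the deltas can be accumulated as they arrive. This is a sketch; `collect_stream_text` is a hypothetical helper that accepts any iterable of chat completion chunks.

```python
from typing import Any, Iterable


def collect_stream_text(stream: Iterable[Any]) -> str:
    """Join the content deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)
```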

Vision with function calling

const tools = [
  {
    type: "function",
    function: {
      name: "extract_data",
      description: "Extract structured data from image",
      parameters: {
        type: "object",
        properties: {
          data_type: { type: "string" },
          values: { type: "array", items: { type: "string" } },
        },
      },
    },
  },
];

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract table data from this image" },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ],
  tools,
});
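
When the model decides to call the tool, the arguments arrive as a JSON string on each tool call. The sketch below shows one way to read them in Python, assuming the standard chat completions response shape. `extract_tool_arguments` is an illustrative helper, and `extract_data` is the tool name from the example above.

```python
import json
from typing import Any, Dict, List


def extract_tool_arguments(
    response: Any, tool_name: str = "extract_data"
) -> List[Dict[str, Any]]:
    """Return the parsed arguments of every call to tool_name in the response."""
    message = response.choices[0].message
    parsed = []
    # tool_calls is None when the model answered with plain text instead.
    for tool_call in message.tool_calls or []:
        if tool_call.function.name == tool_name:
            parsed.append(json.loads(tool_call.function.arguments))
    return parsed
```

An empty list means the model replied in `message.content` without calling the tool.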