Vision

📖 This page describes features extending the AI Gateway, which provides a unified API for accessing multiple AI providers. To learn more, see AI Gateway.

Quick Start

Analyze images alongside text for multimodal AI interactions.

import OpenAI from "openai";
import fs from "fs";

// Point the OpenAI SDK at the AI Gateway proxy
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Base64 encode image
const imageBuffer = fs.readFileSync("chart.png");
const base64Image = imageBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Extract data from this sales chart and summarize trends",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
            detail: "high", // For detailed analysis
          },
        },
      ],
    },
  ],
});

Supported Formats

| Format | Use Case | Max Size |
|--------|----------|----------|
| JPEG/JPG | Photos, general images | 20MB |
| PNG | Screenshots, diagrams | 20MB |
| GIF | Static images only | 20MB |
| WebP | Modern web images | 20MB |
| Base64 | Embedded image data | - |
| URLs | Public image links | - |
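
When sending base64 data, the data URI prefix must declare the correct MIME type for the format. A minimal helper sketch (the toDataUri name and extension map are illustrative, not part of any SDK):

import fs from "fs";
import path from "path";

// Map common image extensions to their MIME types
const MIME_TYPES: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Build a data URI for a local image file
function toDataUri(imagePath: string): string {
  const mime = MIME_TYPES[path.extname(imagePath).toLowerCase()];
  if (!mime) throw new Error(`Unsupported image format: ${imagePath}`);
  const base64 = fs.readFileSync(imagePath).toString("base64");
  return `data:${mime};base64,${base64}`;
}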

Image Detail Levels

| Level | Resolution | Speed | Cost | Use Case |
|-------|------------|-------|------|----------|
| "low" | 512x512 | Fast | Low | Quick overview |
| "high" | Full resolution | Slow | High | Detailed analysis |
| "auto" | Model decides | Medium | Medium | Balanced (default) |

Supported Models

| Provider | Model | Support | Max Images |
|----------|-------|---------|------------|
| OpenAI | gpt-4o | ✅ Full | 10 |
| OpenAI | gpt-4o-mini | ✅ Full | 10 |
| OpenAI | gpt-4-turbo | ✅ Full | 10 |
| Anthropic | claude-3-5-sonnet | ✅ Full | 5 |
| Anthropic | claude-3-haiku | ✅ Full | 5 |
| Google | gemini-1.5-pro | ✅ Full | 16 |
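
Because the gateway normalizes the request format across providers, switching models is a one-line change. A sketch (the anthropic/claude-3-5-sonnet slug is assumed to follow the provider/model naming shown above; check your workspace's model list):

// Same multimodal payload, different provider behind the gateway
const claudeResponse = await openai.chat.completions.create({
  model: "anthropic/claude-3-5-sonnet", // assumed slug, verify against your model list
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image" },
        {
          type: "image_url",
          image_url: { url: "https://example.com/image.jpg" },
        },
      ],
    },
  ],
});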

Code examples

cURL

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe in detail."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

Python

from openai import OpenAI
import base64
import os

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

# Function to encode image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Encode a local image
base64_image = encode_image("chart.png")

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart and extract the key data points"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

TypeScript

import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Function to encode image
function encodeImage(imagePath: string): string {
  const imageBuffer = fs.readFileSync(imagePath);
  return imageBuffer.toString("base64");
}

// Encode a local image
const base64Image = encodeImage("chart.png");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Analyze this chart and extract the key data points",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
          },
        },
      ],
    },
  ]
});

console.log(response.choices[0].message.content);

Image Processing Patterns

Multiple Image Analysis

const analyzeMultipleImages = async (images, prompt) => {
  const content = [
    { type: "text", text: prompt },
    ...images.map((url) => ({
      type: "image_url",
      image_url: { url, detail: "high" },
    })),
  ];

  return await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content }],
  });
};

// Usage
const images = [
  "https://example.com/before.jpg",
  "https://example.com/after.jpg",
];

const comparison = await analyzeMultipleImages(
  images,
  "Compare these before and after photos. What changes do you notice?",
);

Image with Structured Output

from pydantic import BaseModel
from typing import List

class ImageAnalysis(BaseModel):
    objects: List[str]
    text_content: str
    dominant_colors: List[str]
    estimated_age: str
    confidence: float

response = openai.beta.chat.completions.parse(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this image systematically"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    response_format=ImageAnalysis
)

analysis = response.choices[0].message.parsed
print(f"Objects found: {analysis.objects}")
print(f"Text content: {analysis.text_content}")

OCR and Text Extraction

const extractText = async (imageUrl) => {
  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Extract all text from this image. Return as plain text, preserving formatting where possible.",
          },
          {
            type: "image_url",
            image_url: {
              url: imageUrl,
              detail: "high", // High detail for better OCR
            },
          },
        ],
      },
    ],
  });

  return response.choices[0].message.content;
};

Common Use Cases

Document Processing

def process_invoice(image_path):
    base64_image = encode_image(image_path)

    response = openai.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract invoice data: company name, date, total amount, line items"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": "high"
                    }
                }
            ]
        }]
    )

    return response.choices[0].message.content

UI/UX Analysis

const analyzeUI = async (screenshotPath) => {
  const base64Image = fs.readFileSync(screenshotPath, "base64");

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Analyze this UI screenshot. Identify usability issues, design inconsistencies, and suggest improvements.",
          },
          {
            type: "image_url",
            image_url: {
              url: `data:image/png;base64,${base64Image}`,
              detail: "high",
            },
          },
        ],
      },
    ],
  });

  return response.choices[0].message.content;
};

Chart and Graph Analysis

def analyze_chart(chart_image):
    response = openai.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart/graph and provide: 1) Data trends, 2) Key insights, 3) Specific values, 4) Recommendations"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": chart_image, "detail": "high"}
                }
            ]
        }]
    )

    return response.choices[0].message.content

Performance Optimization

Image preprocessing

const fs = require("fs");
const sharp = require("sharp");

// Resize oversized images before encoding to cut upload time and token cost
const optimizeImage = async (imagePath, maxSize = 2048) => {
  const metadata = await sharp(imagePath).metadata();

  if (metadata.width > maxSize || metadata.height > maxSize) {
    const buffer = await sharp(imagePath)
      .resize(maxSize, maxSize, {
        fit: "inside",
        withoutEnlargement: true,
      })
      .jpeg({ quality: 85 })
      .toBuffer();

    return buffer.toString("base64");
  }

  return fs.readFileSync(imagePath, "base64");
};

Batch processing

import asyncio
import os
from openai import AsyncOpenAI

# The synchronous client cannot be awaited; use AsyncOpenAI for concurrent requests
async_openai = AsyncOpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy"
)

async def process_images_batch(image_paths, prompt):
    async def process_single_image(image_path):
        base64_image = encode_image(image_path)

        response = await async_openai.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }]
        )

        return {
            "image": image_path,
            "analysis": response.choices[0].message.content
        }

    # Process images concurrently
    tasks = [process_single_image(path) for path in image_paths]
    results = await asyncio.gather(*tasks)

    return results

Error Handling

const safeImageAnalysis = async (imageUrl, prompt) => {
  try {
    // Validate image URL/format
    if (!isValidImageUrl(imageUrl)) {
      throw new Error("Invalid image URL or format");
    }

    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            {
              type: "image_url",
              image_url: {
                url: imageUrl,
                detail: "auto",
              },
            },
          ],
        },
      ],
    });

    return response.choices[0].message.content;
  } catch (error) {
    if (error.message.includes("image_parse_error")) {
      return "Unable to process image. Please check format and size.";
    } else if (error.message.includes("content_policy_violation")) {
      return "Image content violates usage policies.";
    } else {
      console.error("Vision API error:", error);
      return "Error processing image. Please try again.";
    }
  }
};

const isValidImageUrl = (url) => {
  const imageExtensions = /\.(jpg|jpeg|png|gif|webp)$/i;
  return (
    url.startsWith("http") ||
    url.startsWith("data:image/") ||
    imageExtensions.test(url)
  );
};

Best Practices

Image quality

  • Use high-resolution images for detailed analysis
  • Ensure good lighting and contrast
  • Avoid blurry or distorted images
  • Compress large files to improve upload speed

Prompt engineering

const effectiveVisionPrompts = {
  general: "Describe what you see in this image",
  specific: "Focus on the text in the upper right corner",
  comparative: "Compare the layout of these two screenshots",
  analytical: "Extract all numerical data from this chart",
  instructional: "List step-by-step instructions shown in this diagram",
};

Cost optimization

  • Use detail: "low" for simple analysis
  • Resize large images before encoding
  • Cache results for repeated analysis (see the sketch after this list)
  • Batch similar image processing tasks
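
A minimal in-memory caching sketch, keyed on a hash of the image bytes plus the prompt (the cachedAnalysis helper and cache shape are illustrative; the openai client is assumed to be configured as in the earlier examples):

import crypto from "crypto";
import fs from "fs";

// Cache analyses keyed by image bytes + prompt so repeated
// requests for the same image cost nothing
const visionCache = new Map<string, string>();

async function cachedAnalysis(imagePath: string, prompt: string): Promise<string> {
  const imageBytes = fs.readFileSync(imagePath);
  const key = crypto
    .createHash("sha256")
    .update(imageBytes)
    .update(prompt)
    .digest("hex");

  const cached = visionCache.get(key);
  if (cached) return cached;

  const response = await openai.chat.completions.create({
    model: "openai/gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          {
            type: "image_url",
            image_url: {
              url: `data:image/png;base64,${imageBytes.toString("base64")}`,
            },
          },
        ],
      },
    ],
  });

  const result = response.choices[0].message.content ?? "";
  visionCache.set(key, result);
  return result;
}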

Troubleshooting

Image not processing

  • Check file size (under 20MB)
  • Verify supported format (JPEG, PNG, GIF, WebP)
  • Ensure valid base64 encoding
  • Test with a public URL instead of base64

Poor analysis quality

  • Increase the detail level to "high"
  • Improve image quality/resolution
  • Use more specific prompts
  • Try a different model (gpt-4o vs gpt-4o-mini)

Slow performance

  • Reduce image size
  • Use "low" detail for speed
  • Optimize image compression
  • Consider async processing for multiple images

Limitations

| Limitation | Details | Workaround |
|------------|---------|------------|
| File size | 20MB max per image | Compress before upload |
| Image count | Varies by model (5-16) | Process in batches |
| Video support | Static images only | Extract frames for analysis |
| Real-time | Not suitable for live video | Use for screenshots/snapshots |
| Privacy | Images sent to provider | Use on-premise models if needed |
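
For the video-support workaround, a sketch that extracts frames with ffmpeg so each can be sent as a static image (assumes ffmpeg is installed and on PATH; the one-frame-per-second rate is an arbitrary choice):

import { execFileSync } from "child_process";
import fs from "fs";

// Extract one frame per second from a video; each resulting PNG
// can then be analyzed with the vision API like any other image
function extractFrames(videoPath: string, outDir: string): string[] {
  fs.mkdirSync(outDir, { recursive: true });
  execFileSync("ffmpeg", [
    "-i", videoPath,
    "-vf", "fps=1", // one frame per second; tune for your use case
    `${outDir}/frame_%03d.png`,
  ]);
  return fs.readdirSync(outDir).map((f) => `${outDir}/${f}`);
}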

Advanced Features

Vision with streaming

stream = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this complex image in detail"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Vision with function calling

const tools = [
  {
    type: "function",
    function: {
      name: "extract_data",
      description: "Extract structured data from image",
      parameters: {
        type: "object",
        properties: {
          data_type: { type: "string" },
          values: { type: "array", items: { type: "string" } },
        },
      },
    },
  },
];

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract table data from this image" },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ],
  tools,
});