PDF Input

This page describes features extending the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.

Quick Start

Send PDF documents directly in chat messages for analysis and content extraction.

import fs from "fs";

const pdfBuffer = fs.readFileSync("contract.pdf");
const pdfBase64 = pdfBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Extract key terms and conditions from this contract",
        },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "contract.pdf",
          },
        },
      ],
    },
  ],
});

Configuration

Parameter        Type     Required  Description
type             "file"   Yes       Content type for file input
file.file_data   string   Yes       Data URI containing the base64-encoded PDF
file.filename    string   Yes       File name, given to the model as context

Format: data:application/pdf;base64,{base64_content}
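
The data URI above can be built with a one-line helper; `toPdfDataUri` is an illustrative name, not part of any SDK:

```javascript
// Build the data URI expected by the file.file_data field.
function toPdfDataUri(pdfBuffer) {
  return `data:application/pdf;base64,${pdfBuffer.toString("base64")}`;
}

const uri = toPdfDataUri(Buffer.from("%PDF-1.7 example"));
console.log(uri.startsWith("data:application/pdf;base64,")); // true
```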

Supported Models

Provider    Model            PDF Support
OpenAI      gpt-4o           ✅ Native
OpenAI      gpt-4o-mini      ✅ Native
OpenAI      gpt-4-turbo      ✅ Native
Anthropic   claude-3-sonnet  ✅ Via conversion
Anthropic   claude-3-haiku   ✅ Via conversion

Use Cases

Scenario            Best Model   Example Prompt
Contract analysis   gpt-4o       "Extract key terms and obligations"
Invoice processing  gpt-4o-mini  "Extract amounts, dates, vendor info"
Research papers     gpt-4o       "Summarize methodology and findings"
Form extraction     gpt-4o-mini  "Convert form data to JSON"
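
For JSON-oriented use cases such as form extraction, models sometimes wrap their reply in a Markdown code fence. A small sketch of a tolerant parser (`parseJsonReply` is our own helper name, not part of the proxy API):

```javascript
// Strip an optional ```json fence before parsing the model's reply.
function parseJsonReply(text) {
  const cleaned = text
    .trim()
    .replace(/^```(?:json)?\s*/, "")
    .replace(/\s*```$/, "");
  return JSON.parse(cleaned);
}

const reply = '```json\n{"vendor": "Acme", "amount": 120.5}\n```';
console.log(parseJsonReply(reply).vendor); // "Acme"
```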

Code Examples

cURL:

curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Please analyze this PDF document and provide a summary"
          },
          {
            "type": "file",
            "file": {
              "file_data": "data:application/pdf;base64,YOUR_BASE64_ENCODED_PDF",
              "filename": "document.pdf"
            }
          }
        ]
      }
    ]
  }'
Python:

from openai import OpenAI
import os
import base64

openai = OpenAI(
  api_key=os.environ.get("ORQ_API_KEY"),
  base_url="https://api.orq.ai/v2/proxy"
)

# Read and encode your PDF file
with open("document.pdf", "rb") as pdf_file:
    pdf_base64 = base64.b64encode(pdf_file.read()).decode('utf-8')

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please analyze this PDF document and provide a summary"
                },
                {
                    "type": "file",
                    "file": {
                        "file_data": f"data:application/pdf;base64,{pdf_base64}",
                        "filename": "document.pdf"
                    }
                }
            ]
        }
    ]
)
JavaScript:

import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Read and encode your PDF file
const pdfBuffer = fs.readFileSync("document.pdf");
const pdfBase64 = pdfBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Please analyze this PDF document and provide a summary",
        },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "document.pdf",
          },
        },
      ],
    },
  ],
});

File Handling

Reading PDF files:

// Node.js
const fs = require('fs');
const pdfBase64 = fs.readFileSync('document.pdf', 'base64');

// Browser (File input) — avoid spreading large arrays into String.fromCharCode,
// which can overflow the call stack; FileReader yields a data URI directly
const fileInput = document.getElementById('pdf-upload');
const file = fileInput.files[0];
const pdfBase64 = await new Promise((resolve, reject) => {
  const reader = new FileReader();
  reader.onload = () => resolve(reader.result.split(',')[1]);
  reader.onerror = reject;
  reader.readAsDataURL(file);
});

// Python
import base64
with open('document.pdf', 'rb') as f:
    pdf_base64 = base64.b64encode(f.read()).decode('utf-8')

Size optimization:

// Check file size before encoding
const maxSize = 20 * 1024 * 1024; // 20MB
if (pdfBuffer.length > maxSize) {
  throw new Error("PDF file too large. Consider compressing first.");
}

Best Practices

File preparation:

  • Compress PDFs to reduce size (under 20MB recommended)
  • Ensure text is selectable (not scanned images)
  • Remove unnecessary pages for focused analysis
  • Use clear, structured layouts for better extraction

Prompt engineering:

// Specific extraction
"Extract all dollar amounts and their associated line items as JSON";

// Structured analysis
"Provide a summary with these sections: Executive Summary, Key Findings, Recommendations";

// Data validation
"Verify if all required fields are present: name, date, signature, amount";
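
Each of these prompts rides alongside the same two-part content structure, so it can help to keep that structure in one place. A minimal sketch (`buildPdfContent` is a hypothetical helper, not part of the proxy API):

```javascript
// Build the [text, file] content array used by every example on this page.
function buildPdfContent(prompt, pdfBase64, filename) {
  return [
    { type: "text", text: prompt },
    {
      type: "file",
      file: {
        file_data: `data:application/pdf;base64,${pdfBase64}`,
        filename,
      },
    },
  ];
}
```

The result is passed directly as a user message's content field.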

Error handling:

const processPDF = async (pdfPath, prompt) => {
  try {
    const pdfBase64 = fs.readFileSync(pdfPath, "base64");

    if (pdfBase64.length > 50000000) { // ~50M base64 chars ≈ 37MB of PDF data
      throw new Error("PDF too large for processing");
    }

    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            {
              type: "file",
              file: {
                file_data: `data:application/pdf;base64,${pdfBase64}`,
                filename: pdfPath.split("/").pop(), // Extract filename from path
              },
            },
          ],
        },
      ],
    });

    return response.choices[0].message.content;
  } catch (error) {
    if (error.message?.includes("context_length_exceeded")) {
      throw new Error("PDF too large. Try splitting into smaller sections.");
    }
    throw error;
  }
};

Troubleshooting

PDF not processing
  • Verify base64 encoding is correct
  • Check file size (under model's context limit)
  • Ensure MIME type is application/pdf
  • Try with a different model
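
To rule out encoding problems quickly, you can check that the decoded bytes begin with the PDF magic number before sending. A sketch (`isLikelyPdf` is an illustrative check, not part of the proxy API):

```javascript
// Every valid PDF starts with the ASCII bytes "%PDF-".
function isLikelyPdf(base64) {
  const head = Buffer.from(base64.slice(0, 8), "base64").toString("latin1");
  return head.startsWith("%PDF-");
}

console.log(isLikelyPdf(Buffer.from("%PDF-1.4 content").toString("base64"))); // true
console.log(isLikelyPdf(Buffer.from("hello world").toString("base64")));      // false
```
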
Poor extraction quality
  • Use higher-quality models (gpt-4o vs gpt-4o-mini)
  • Provide more specific prompts
  • Break complex documents into sections
  • Consider preprocessing scanned PDFs with OCR
Performance issues
  • Compress PDFs before sending
  • Extract only relevant pages
  • Use streaming for large documents
  • Cache results for repeated analysis

Limitations

Limitation         Details                               Workaround
File size          Model context limits                  Split large PDFs
Scanned documents  Quality varies by model               Use OCR preprocessing
Complex layouts    Tables/charts may not extract well    Use structured prompts
Security           Sensitive documents sent to provider  Use on-premise models
Cost               Large files consume more tokens       Optimize file size

Advanced Usage

Batch processing:

const processPDFBatch = async (pdfPaths) => {
  const results = await Promise.allSettled(
    pdfPaths.map((path) => processPDF(path, "Extract key information")),
  );

  return results.map((result, index) => ({
    file: pdfPaths[index],
    success: result.status === "fulfilled",
    data: result.status === "fulfilled" ? result.value : null,
    error: result.status === "rejected" ? result.reason : null,
  }));
};

Progressive analysis:

// Analyze in stages for large documents
const stages = [
  "Identify document type and structure",
  "Extract metadata (author, date, title)",
  "Summarize each section",
  "Extract actionable items",
];

for (const prompt of stages) {
  const result = await processPDF(pdfPath, prompt);
  console.log(`Stage: ${prompt}\nResult: ${result}\n`);
}

Content validation:

const validateExtraction = async (pdfPath, expectedFields) => {
  const prompt = `Extract these fields as JSON: ${expectedFields.join(", ")}`;
  const result = await processPDF(pdfPath, prompt);

  try {
    const data = JSON.parse(result);
    const missing = expectedFields.filter((field) => !data[field]);

    return {
      valid: missing.length === 0,
      missing,
      data,
    };
  } catch (error) {
    return { valid: false, error: "Invalid JSON response" };
  }
};