PDF Input
This page describes features that extend the AI Proxy, which provides a unified API for accessing multiple AI providers. To learn more, see AI Proxy.
Quick Start
Send PDF documents directly in chat messages for analysis and content extraction.
```javascript
import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Read and base64-encode the PDF
const pdfBuffer = fs.readFileSync("contract.pdf");
const pdfBase64 = pdfBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Extract key terms and conditions from this contract",
        },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "contract.pdf",
          },
        },
      ],
    },
  ],
});
```
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| `type` | `"file"` | Yes | Content type for file input |
| `file.file_data` | string | Yes | Data URI with base64-encoded PDF content |
| `file.filename` | string | Yes | Name of the file, provided to the model for context |

Format: `data:application/pdf;base64,{base64_content}`
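
The data URI and surrounding file part can be assembled with a small helper. A minimal sketch (the `pdfFilePart` helper is illustrative, not part of the API):

```javascript
import fs from "fs";
import path from "path";

// Build a `file` content part from a local PDF path
const pdfFilePart = (pdfPath) => ({
  type: "file",
  file: {
    file_data: `data:application/pdf;base64,${fs.readFileSync(pdfPath, "base64")}`,
    filename: path.basename(pdfPath),
  },
});
```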
Supported Models
| Provider | Model | PDF Support |
|---|---|---|
| OpenAI | gpt-4o | ✅ Native |
| OpenAI | gpt-4o-mini | ✅ Native |
| OpenAI | gpt-4-turbo | ✅ Native |
| Anthropic | claude-3-sonnet | ✅ Via conversion |
| Anthropic | claude-3-haiku | ✅ Via conversion |
Use Cases
| Scenario | Best Model | Example Prompt |
|---|---|---|
| Contract analysis | gpt-4o | "Extract key terms and obligations" |
| Invoice processing | gpt-4o-mini | "Extract amounts, dates, vendor info" |
| Research papers | gpt-4o | "Summarize methodology and findings" |
| Form extraction | gpt-4o-mini | "Convert form data to JSON" |
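
For instance, the invoice row above maps directly onto a request (reusing the `openai` client and `pdfBase64` encoding from Quick Start; the file name is illustrative):

```javascript
const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract amounts, dates, vendor info" },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "invoice.pdf",
          },
        },
      ],
    },
  ],
});
```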
Code Examples
cURL:

```bash
curl -X POST https://api.orq.ai/v2/proxy/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Please analyze this PDF document and provide a summary"
          },
          {
            "type": "file",
            "file": {
              "file_data": "data:application/pdf;base64,YOUR_BASE64_ENCODED_PDF",
              "filename": "document.pdf"
            }
          }
        ]
      }
    ]
  }'
```
Python:

```python
from openai import OpenAI
import os
import base64

openai = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v2/proxy"
)

# Read and encode your PDF file
with open("document.pdf", "rb") as pdf_file:
    pdf_base64 = base64.b64encode(pdf_file.read()).decode("utf-8")

response = openai.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please analyze this PDF document and provide a summary"
                },
                {
                    "type": "file",
                    "file": {
                        "file_data": f"data:application/pdf;base64,{pdf_base64}",
                        "filename": "document.pdf"
                    }
                }
            ]
        }
    ]
)
```
Node.js:

```javascript
import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v2/proxy",
});

// Read and encode your PDF file
const pdfBuffer = fs.readFileSync("document.pdf");
const pdfBase64 = pdfBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Please analyze this PDF document and provide a summary",
        },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "document.pdf",
          },
        },
      ],
    },
  ],
});
```
File Handling
Reading PDF files:

```javascript
// Node.js
const fs = require("fs");
const pdfBase64 = fs.readFileSync("document.pdf", "base64");
```

```javascript
// Browser (file input): FileReader yields the data URI directly and avoids
// the stack overflow that String.fromCharCode(...bytes) can hit on large files
const fileInput = document.getElementById("pdf-upload");
const file = fileInput.files[0];
const dataUri = await new Promise((resolve, reject) => {
  const reader = new FileReader();
  reader.onload = () => resolve(reader.result); // "data:application/pdf;base64,..."
  reader.onerror = reject;
  reader.readAsDataURL(file);
});
```

```python
# Python
import base64
with open("document.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")
```
Size optimization:

```javascript
// Check file size before encoding
const maxSize = 20 * 1024 * 1024; // 20MB
if (pdfBuffer.length > maxSize) {
  throw new Error("PDF file too large. Consider compressing first.");
}
```
Best Practices
File preparation:
- Compress PDFs to reduce size (under 20MB recommended)
- Ensure text is selectable rather than scanned images (see the check after this list)
- Remove unnecessary pages for focused analysis
- Use clear, structured layouts for better extraction
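
A quick way to verify that a text layer exists is to extract text locally first. A rough sketch, assuming the third-party pdf-parse package (not part of the proxy); the 50-characters-per-page threshold is an arbitrary heuristic:

```javascript
import fs from "fs";
import pdfParse from "pdf-parse";

// Scanned-image PDFs yield little or no extractable text
const hasTextLayer = async (pdfPath) => {
  const data = await pdfParse(fs.readFileSync(pdfPath));
  return data.text.trim().length / data.numpages > 50;
};
```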
Prompt engineering:

```text
// Specific extraction
"Extract all dollar amounts and their associated line items as JSON"

// Structured analysis
"Provide a summary with these sections: Executive Summary, Key Findings, Recommendations"

// Data validation
"Verify if all required fields are present: name, date, signature, amount"
```
Error handling:

```javascript
// Assumes the `fs` import and `openai` client from the examples above
const processPDF = async (pdfPath, prompt) => {
  try {
    const pdfBase64 = fs.readFileSync(pdfPath, "base64");
    // 50M base64 characters decode to roughly 37MB of binary data
    if (pdfBase64.length > 50_000_000) {
      throw new Error("PDF too large for processing");
    }
    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            {
              type: "file",
              file: {
                file_data: `data:application/pdf;base64,${pdfBase64}`,
                filename: pdfPath.split("/").pop(), // extract filename from path
              },
            },
          ],
        },
      ],
    });
    return response.choices[0].message.content;
  } catch (error) {
    if (error.message?.includes("context_length_exceeded")) {
      throw new Error("PDF too large. Try splitting into smaller sections.");
    }
    throw error;
  }
};
```
Troubleshooting
PDF not processing
- Verify base64 encoding is correct (see the sanity check below)
- Check file size against the model's context limit
- Ensure the MIME type in the data URI is `application/pdf`
- Try a different model
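
A quick local sanity check (a sketch; the helper name is illustrative): a correctly encoded PDF data URI has the right prefix and decodes to bytes starting with the `%PDF-` magic marker.

```javascript
const isLikelyValidPdfDataUri = (uri) => {
  const prefix = "data:application/pdf;base64,";
  if (!uri.startsWith(prefix)) return false;
  // Every valid PDF begins with the bytes "%PDF-"
  const head = Buffer.from(uri.slice(prefix.length, prefix.length + 8), "base64");
  return head.toString("latin1").startsWith("%PDF-");
};
```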
Poor extraction quality
- Use a more capable model (e.g., gpt-4o rather than gpt-4o-mini)
- Provide more specific prompts
- Break complex documents into sections
- Consider preprocessing scanned PDFs with OCR
Performance issues
- Compress PDFs before sending
- Extract only relevant pages
- Use streaming for large documents (see the sketch after this list)
- Cache results for repeated analysis
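
Streaming follows the standard OpenAI pattern; this sketch assumes the proxy passes streamed responses through unchanged.

```javascript
const stream = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  stream: true,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize this document section by section" },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "document.pdf",
          },
        },
      ],
    },
  ],
});

// Print tokens as they arrive instead of waiting for the full response
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```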
Limitations
| Limitation | Details | Workaround |
|---|---|---|
| File size | Model context limits | Split large PDFs (see the sketch below) |
| Scanned documents | Quality varies by model | Use OCR preprocessing |
| Complex layouts | Tables/charts may not extract well | Use structured prompts |
| Security | Sensitive documents are sent to the provider | Use on-premise models |
| Cost | Large files consume more tokens | Optimize file size |
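
Splitting can be done locally before upload. A sketch assuming the third-party pdf-lib package (not part of the proxy):

```javascript
import fs from "fs";
import { PDFDocument } from "pdf-lib";

// Copy the first `pageCount` pages of a PDF into a new document
const extractPages = async (pdfPath, pageCount) => {
  const source = await PDFDocument.load(fs.readFileSync(pdfPath));
  const part = await PDFDocument.create();
  const indices = source.getPageIndices().slice(0, pageCount);
  const pages = await part.copyPages(source, indices);
  pages.forEach((page) => part.addPage(page));
  return Buffer.from(await part.save()); // ready for base64 encoding
};
```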
Advanced Usage
Batch processing:

```javascript
const processPDFBatch = async (pdfPaths) => {
  const results = await Promise.allSettled(
    pdfPaths.map((path) => processPDF(path, "Extract key information")),
  );
  return results.map((result, index) => ({
    file: pdfPaths[index],
    success: result.status === "fulfilled",
    data: result.status === "fulfilled" ? result.value : null,
    error: result.status === "rejected" ? result.reason : null,
  }));
};
```
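
Note that `Promise.allSettled` fires every request at once; for large batches, a chunked variant is gentler on provider rate limits (a sketch; the helper name and chunk size are illustrative):

```javascript
const processPDFBatchLimited = async (pdfPaths, chunkSize = 3) => {
  const results = [];
  for (let i = 0; i < pdfPaths.length; i += chunkSize) {
    const chunk = pdfPaths.slice(i, i + chunkSize);
    // Only `chunkSize` requests are in flight at any time
    results.push(
      ...(await Promise.allSettled(
        chunk.map((path) => processPDF(path, "Extract key information")),
      )),
    );
  }
  return results;
};
```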
Progressive analysis:

```javascript
// Analyze in stages for large documents
const stages = [
  "Identify document type and structure",
  "Extract metadata (author, date, title)",
  "Summarize each section",
  "Extract actionable items",
];

for (const prompt of stages) {
  const result = await processPDF(pdfPath, prompt);
  console.log(`Stage: ${prompt}\nResult: ${result}\n`);
}
```
Content validation:

```javascript
const validateExtraction = async (pdfPath, expectedFields) => {
  const prompt = `Extract these fields as JSON: ${expectedFields.join(", ")}`;
  const result = await processPDF(pdfPath, prompt);
  try {
    // Models often wrap JSON in markdown code fences; strip them first
    const cleaned = result.replace(/^```(?:json)?\s*|\s*```$/g, "").trim();
    const data = JSON.parse(cleaned);
    // Use `in` so present-but-falsy values (0, "") don't count as missing
    const missing = expectedFields.filter((field) => !(field in data));
    return {
      valid: missing.length === 0,
      missing,
      data,
    };
  } catch (error) {
    return { valid: false, error: "Invalid JSON response" };
  }
};
```