## Quick Start
Send PDF documents directly in chat messages for analysis and content extraction.
```javascript
import fs from "fs";
import OpenAI from "openai";

// Point the OpenAI SDK at the orq.ai router (endpoint from the curl example below)
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const pdfBuffer = fs.readFileSync("contract.pdf");
const pdfBase64 = pdfBuffer.toString("base64");

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Extract key terms and conditions from this contract",
        },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "contract.pdf",
          },
        },
      ],
    },
  ],
});
```
## Configuration

| Parameter | Type | Required | Description |
|---|---|---|---|
| `type` | `"file"` | Yes | Content type for file input |
| `file.file_data` | string | Yes | Data URI with base64-encoded PDF content |
| `file.filename` | string | Yes | Name of the file, given to the model for context |

Format: `data:application/pdf;base64,{base64_content}`
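
Put together, a single `file` content part looks like this (the same shape used in the examples below):

```javascript
// One file content part inside a user message's `content` array
const filePart = {
  type: "file",
  file: {
    file_data: `data:application/pdf;base64,${pdfBase64}`,
    filename: "contract.pdf",
  },
};
```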
PDF input support varies by model. See the Supported Models page and check your provider’s documentation for PDF capability.
## Use Cases

| Scenario | Example Prompt |
|---|---|
| Contract analysis | "Extract key terms and obligations" |
| Invoice processing | "Extract amounts, dates, vendor info" |
| Research papers | "Summarize methodology and findings" |
| Form extraction | "Convert form data to JSON" |
## Code Examples

```bash
curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Please analyze this PDF document and provide a summary"
          },
          {
            "type": "file",
            "file": {
              "file_data": "data:application/pdf;base64,YOUR_BASE64_ENCODED_PDF",
              "filename": "document.pdf"
            }
          }
        ]
      }
    ]
  }'
```
## File Handling

**Reading PDF files:**

```javascript
// Node.js
const fs = require('fs');
const pdfBase64 = fs.readFileSync('document.pdf', 'base64');
```

```javascript
// Browser (File input)
// Convert in 32KB chunks: spreading a whole Uint8Array into
// String.fromCharCode overflows the call stack on large files
const fileInput = document.getElementById('pdf-upload');
const file = fileInput.files[0];
const bytes = new Uint8Array(await file.arrayBuffer());
let binary = '';
for (let i = 0; i < bytes.length; i += 0x8000) {
  binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
}
const pdfBase64 = btoa(binary);
```

```python
# Python
import base64

with open('document.pdf', 'rb') as f:
    pdf_base64 = base64.b64encode(f.read()).decode('utf-8')
```
**Size optimization:**

```javascript
// Check file size before encoding
const maxSize = 20 * 1024 * 1024; // 20MB
if (pdfBuffer.length > maxSize) {
  throw new Error("PDF file too large. Consider compressing first.");
}
```
## Best Practices

**File preparation:**

- Compress PDFs to reduce size (under 20MB recommended)
- Ensure text is selectable (not scanned images)
- Remove unnecessary pages for focused analysis (see the sketch below)
- Use clear, structured layouts for better extraction
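
Page removal can be scripted. A minimal sketch, assuming the `pdf-lib` package (any library that can copy pages between documents works the same way):

```javascript
import fs from "fs";
import { PDFDocument } from "pdf-lib"; // assumed dependency: npm install pdf-lib

// Keep only the pages relevant to the analysis before encoding
const src = await PDFDocument.load(fs.readFileSync("contract.pdf"));
const trimmed = await PDFDocument.create();
const pages = await trimmed.copyPages(src, [0, 1]); // e.g. keep the first two pages
pages.forEach((page) => trimmed.addPage(page));
fs.writeFileSync("contract-trimmed.pdf", await trimmed.save());
```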
**Prompt engineering:**

```javascript
// Specific extraction
"Extract all dollar amounts and their associated line items as JSON";

// Structured analysis
"Provide a summary with these sections: Executive Summary, Key Findings, Recommendations";

// Data validation
"Verify if all required fields are present: name, date, signature, amount";
```
**Error handling:**

```javascript
const processPDF = async (pdfPath, prompt) => {
  try {
    const pdfBase64 = fs.readFileSync(pdfPath, "base64");
    if (pdfBase64.length > 50000000) {
      // 50,000,000 base64 chars ≈ 37 MB of binary PDF data
      throw new Error("PDF too large for processing");
    }
    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            {
              type: "file",
              file: {
                file_data: `data:application/pdf;base64,${pdfBase64}`,
                filename: pdfPath.split("/").pop(), // extract filename from path
              },
            },
          ],
        },
      ],
    });
    return response.choices[0].message.content;
  } catch (error) {
    if (error.message.includes("context_length_exceeded")) {
      throw new Error("PDF too large. Try splitting into smaller sections.");
    }
    throw error;
  }
};
```
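
With the client from the Quick Start in scope, usage looks like:

```javascript
const summary = await processPDF("contract.pdf", "Summarize the key obligations");
console.log(summary);
```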
## Troubleshooting

### PDF not processing

- Verify the base64 encoding is correct (see the sanity check below)
- Check the file size against the model's context limit
- Ensure the MIME type in the data URI is `application/pdf`
- Try a different model
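
A quick Node.js sanity check: decode the base64 and confirm the payload starts with the `%PDF-` magic bytes, and double-check that `file_data` carries the full data-URI prefix rather than the raw base64 alone:

```javascript
// A valid PDF always begins with the bytes "%PDF-"
const decoded = Buffer.from(pdfBase64, "base64");
if (decoded.subarray(0, 5).toString("ascii") !== "%PDF-") {
  throw new Error("Encoded payload does not look like a PDF");
}
// file_data must include the prefix: `data:application/pdf;base64,${pdfBase64}`
```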
### Poor extraction quality

- Use a more capable model (e.g., `gpt-4o` rather than `gpt-4o-mini`)
- Provide more specific prompts
- Break complex documents into sections
- Preprocess scanned PDFs with OCR
### Performance issues

- Compress PDFs before sending
- Extract only the relevant pages
- Use streaming for large documents (see the sketch below)
- Cache results for repeated analysis
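
A minimal streaming sketch, assuming the router forwards the OpenAI SDK's `stream` option:

```javascript
// Stream the analysis token-by-token instead of waiting for the full response
const stream = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  stream: true,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize this document section by section" },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "document.pdf",
          },
        },
      ],
    },
  ],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```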
## Limitations

| Limitation | Details | Workaround |
|---|---|---|
| File size | Model context limits | Split large PDFs |
| Scanned documents | Quality varies by model | Use OCR preprocessing |
| Complex layouts | Tables/charts may not extract well | Use structured prompts |
| Security | Sensitive documents sent to provider | Use on-premise models |
| Cost | Large files consume more tokens | Optimize file size |
## Advanced Usage

**Batch processing:**

```javascript
// Process PDFs in parallel; allSettled keeps one failure from aborting the batch
const processPDFBatch = async (pdfPaths) => {
  const results = await Promise.allSettled(
    pdfPaths.map((path) => processPDF(path, "Extract key information")),
  );
  return results.map((result, index) => ({
    file: pdfPaths[index],
    success: result.status === "fulfilled",
    data: result.status === "fulfilled" ? result.value : null,
    error: result.status === "rejected" ? result.reason : null,
  }));
};
```
**Progressive analysis:**

```javascript
// Analyze in stages for large documents
const stages = [
  "Identify document type and structure",
  "Extract metadata (author, date, title)",
  "Summarize each section",
  "Extract actionable items",
];

for (const prompt of stages) {
  const result = await processPDF(pdfPath, prompt);
  console.log(`Stage: ${prompt}\nResult: ${result}\n`);
}
```
**Content validation:**

```javascript
const validateExtraction = async (pdfPath, expectedFields) => {
  const prompt = `Extract these fields as JSON: ${expectedFields.join(", ")}`;
  const result = await processPDF(pdfPath, prompt);
  try {
    const data = JSON.parse(result);
    const missing = expectedFields.filter((field) => !data[field]);
    return {
      valid: missing.length === 0,
      missing,
      data,
    };
  } catch (error) {
    return { valid: false, error: "Invalid JSON response" };
  }
};
```
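
`JSON.parse` fails when the model wraps its answer in prose or Markdown fences. Where the model supports it, requesting JSON output explicitly makes parsing more reliable; a sketch assuming the router passes OpenAI's `response_format` parameter through:

```javascript
// Ask for a bare JSON object (the prompt must mention "JSON" for this mode)
const response = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  response_format: { type: "json_object" },
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract name, date, and amount as a JSON object" },
        {
          type: "file",
          file: {
            file_data: `data:application/pdf;base64,${pdfBase64}`,
            filename: "invoice.pdf",
          },
        },
      ],
    },
  ],
});
const data = JSON.parse(response.choices[0].message.content);
```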