Vision
Analyze images alongside text. Pass image URLs or base64-encoded files in
chat/completions messages.PDF Input
Send PDF documents for extraction and analysis. Supported natively by compatible models.
Image Generation
Generate, edit, and vary images using DALL-E 2, DALL-E 3, and GPT Image 1.
Audio
Convert text to speech, transcribe audio files, and translate audio to English.
Modality compatibility
| Modality | Endpoint | Models |
|---|---|---|
| Vision (image input) | POST /v2/router/chat/completions | GPT-4o, Claude 3.x/4.x, Gemini, and others |
| PDF input | POST /v2/router/chat/completions | Models with native file support |
| Image generation | POST /v2/router/images/generations | openai/dall-e-2, openai/dall-e-3, openai/gpt-image-1 |
| Image editing | POST /v2/router/images/edits | openai/dall-e-2, openai/gpt-image-1 |
| Image variations | POST /v2/router/images/variations | openai/dall-e-2 |
| Text to speech | POST /v2/router/audio/speech | OpenAI TTS, ElevenLabs |
| Transcription | POST /v2/router/audio/transcriptions | openai/whisper-1 and others |
| Translation | POST /v2/router/audio/translations | openai/whisper-1 and others |