POST /v2/gateway/audio/translations
Create translation
curl --request POST \
  --url https://api.orq.ai/v2/gateway/audio/translations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'model=<string>' \
  --form 'prompt=<string>' \
  --form enable_logging=true \
  --form diarize=false \
  --form response_format=json \
  --form tag_audio_events=true \
  --form num_speakers=123 \
  --form timestamps_granularity=word \
  --form temperature=0.5 \
  --form 'orq={
  "fallbacks": [
    {
      "model": "openai/gpt-4o-mini"
    }
  ],
  "retry": {
    "count": 3,
    "on_codes": [
      429,
      500,
      502,
      503,
      504
    ]
  },
  "contact": {
    "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "display_name": "Jane Doe",
    "email": "[email protected]",
    "metadata": [
      {
        "department": "Engineering",
        "role": "Senior Developer"
      }
    ],
    "logo_url": "https://example.com/avatars/jane-doe.jpg",
    "tags": [
      "hr",
      "engineering"
    ]
  },
  "load_balancer": [
    {
      "model": "openai/gpt-4o",
      "weight": 0.7
    },
    {
      "model": "anthropic/claude-3-5-sonnet",
      "weight": 0.3
    }
  ],
  "timeout": {
    "call_timeout": 30000
  }
}' \
  --form file='@example-file'
{
  "text": "<string>"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

multipart/form-data

Translates audio into English.

model
string
required

ID of the model to use

prompt
string

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

enable_logging
boolean
default:true

When enable_logging is set to false, zero retention mode is used. This disables history features like request stitching and is only available to enterprise customers.

diarize
boolean
default:false

Whether to annotate which speaker is currently talking in the uploaded file.

response_format
enum<string>

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

Available options:
json,
text,
srt,
verbose_json,
vtt
tag_audio_events
boolean
default:true

Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.

num_speakers
number

The maximum number of speakers talking in the uploaded file. Helps the model predict who speaks when; the maximum is 32.

timestamps_granularity
enum<string>
default:word

The granularity of the timestamps in the transcription. The word option provides word-level timestamps, while character provides character-level timestamps within each word.

Available options:
none,
word,
character
temperature
number

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

Example:

0.5

orq
object
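
The orq object's fields are not enumerated here, but the request example above exercises fallbacks, retry, load_balancer, and timeout. The same routing config as a Python dict (values copied from that example; the field names come from it, not from a published schema):

```python
# Routing config mirroring the request example: fall back to a cheaper model,
# retry on transient HTTP errors, split traffic across two models, and cap
# the call duration (units for call_timeout are not documented here).
orq_config = {
    "fallbacks": [{"model": "openai/gpt-4o-mini"}],
    "retry": {"count": 3, "on_codes": [429, 500, 502, 503, 504]},
    "load_balancer": [
        {"model": "openai/gpt-4o", "weight": 0.7},
        {"model": "anthropic/claude-3-5-sonnet", "weight": 0.3},
    ],
    "timeout": {"call_timeout": 30000},
}
```

When serialized with json.dumps, this produces the string passed as the orq form field in the curl example.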
file
file

The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

Response

Returns the translated text

text
string
required