curl --request POST \
--url https://api.orq.ai/v2/gateway/audio/transcriptions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: multipart/form-data' \
--form 'model=<string>' \
--form 'prompt=<string>' \
--form enable_logging=true \
--form diarize=false \
--form response_format=json \
--form tag_audio_events=true \
--form num_speakers=2 \
--form timestamps_granularity=word \
--form temperature=0.5 \
--form 'language=<string>' \
--form 'timestamp_granularities[0]=word' \
--form 'timestamp_granularities[1]=segment' \
--form 'orq={
  "fallbacks": [
    { "model": "openai/gpt-4o-mini" }
  ],
  "retry": {
    "count": 3,
    "on_codes": [429, 500, 502, 503, 504]
  },
  "contact": {
    "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "display_name": "Jane Doe",
    "email": "[email protected]",
    "metadata": [
      {
        "department": "Engineering",
        "role": "Senior Developer"
      }
    ],
    "logo_url": "https://example.com/avatars/jane-doe.jpg",
    "tags": ["hr", "engineering"]
  },
  "load_balancer": [
    { "model": "openai/gpt-4o", "weight": 0.7 },
    { "model": "anthropic/claude-3-5-sonnet", "weight": 0.3 }
  ],
  "timeout": {
    "call_timeout": 30000
  }
}' \
--form file='@example-file'

Example response:

{
  "text": "<string>"
}

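All other body fields shown above are optional controls; assuming only the model and the audio file are required, a minimal request reduces to:

curl --request POST \
  --url https://api.orq.ai/v2/gateway/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'model=<string>' \
  --form file='@example-file'
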
Authorization
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Transcribes audio into the input language.

model
ID of the model to use.

prompt
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

enable_logging
When enable_logging is set to false, zero-retention mode is used. This disables history features like request stitching and is only available to enterprise customers.

diarize
Whether to annotate which speaker is currently talking in the uploaded file.

response_format
The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.

tag_audio_events
Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.

num_speakers
The maximum number of speakers talking in the uploaded file. Helps with predicting who speaks when; the maximum is 32.

timestamps_granularity
The granularity of the timestamps in the transcription. word provides word-level timestamps and character provides character-level timestamps per word. Available options: none, word, character.

temperature
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Example: 0.5

language
The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

timestamp_granularities
The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: "word" or "segment". Note: there is no additional latency for segment timestamps, but generating word timestamps incurs additional latency. Example: ["word", "segment"]

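For example, a request that asks for both word- and segment-level timestamps would combine these fields as follows (an abridged sketch of the full example above):

curl --request POST \
  --url https://api.orq.ai/v2/gateway/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --form 'model=<string>' \
  --form response_format=verbose_json \
  --form 'timestamp_granularities[0]=word' \
  --form 'timestamp_granularities[1]=segment' \
  --form file='@example-file'
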
orq.retry
Retry configuration for the request.

orq.contact
Information about the contact making the request. If the contact does not exist, it will be created automatically.

orq.contact.id
Unique identifier for the contact. Example: "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV"

orq.contact.display_name
Display name of the contact. Example: "Jane Doe"

orq.contact.email
Email address of the contact.

orq.contact.logo_url
URL to the contact's avatar or logo. Example: "https://example.com/avatars/jane-doe.jpg"

orq.contact.tags
A list of tags associated with the contact. Example: ["hr", "engineering"]

orq.load_balancer
Array of models with weights for load balancing requests. Example:
[
  { "model": "openai/gpt-4o", "weight": 0.7 },
  { "model": "anthropic/claude-3-5-sonnet", "weight": 0.3 }
]

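As an illustration, an orq payload that only configures routing could combine the load balancer with a fallback model. This is a sketch built from the field shapes above; the page does not specify the exact semantics of weight, which is assumed here to be a relative proportion (roughly 70% of requests to gpt-4o, 30% to claude-3-5-sonnet):

--form 'orq={
  "load_balancer": [
    { "model": "openai/gpt-4o", "weight": 0.7 },
    { "model": "anthropic/claude-3-5-sonnet", "weight": 0.3 }
  ],
  "fallbacks": [
    { "model": "openai/gpt-4o-mini" }
  ]
}'
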
file
The audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

Response
Returns the transcription or verbose transcription.