Router.Audio.Transcriptions
Create a Transcription
Create transcription

    from orq_ai_sdk import Orq
    import os

    with Orq(
        api_key=os.getenv("ORQ_API_KEY", ""),
    ) as orq:
        res = orq.router.audio.transcriptions.create(
            model="Malibu",
            enable_logging=True,
            diarize=False,
            tag_audio_events=True,
            timestamps_granularity="word",
            temperature=0.5,
            timestamp_granularities=["word", "segment"],
            retry={"on_codes": [429, 500, 502, 503, 504]},
            load_balancer={
                "type": "weight_based",
                "models": [
                    {"model": "openai/gpt-4o", "weight": 0.7},
                ],
            },
            timeout={"call_timeout": 30000},
            orq={
                "fallbacks": [
                    {"model": "openai/gpt-4o-mini"},
                ],
                "retry": {"on_codes": [429, 500, 502, 503, 504]},
                "identity": {
                    "id": "contact_01ARZ3NDEKTSV4RRFFQ69G5FAV",
                    "display_name": "Jane Doe",
                    "email": "jane.doe@example.com",
                    "metadata": [
                        {"department": "Engineering", "role": "Senior Developer"},
                    ],
                    "logo_url": "https://example.com/avatars/jane-doe.jpg",
                    "tags": ["hr", "engineering"],
                },
                "load_balancer": {
                    "type": "weight_based",
                    "models": [
                        {"model": "openai/gpt-4o", "weight": 0.7},
                        {"model": "anthropic/claude-3-5-sonnet", "weight": 0.3},
                    ],
                },
                "timeout": {"call_timeout": 30000},
            },
        )

        # Handle response
        print(res)

Parameters
ID of the model to use
Optional text to guide the model’s style or to continue a previous audio segment. The prompt should match the audio language.
When enable_logging is set to false, zero retention mode is used. This disables history features like request stitching and is only available to enterprise customers.
Whether to annotate which speaker is currently talking in the uploaded file.
The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.
The maximum number of speakers talking in the uploaded file. This helps with predicting who speaks when. The maximum is 32.
The granularity of the timestamps in the transcription. `word` provides word-level timestamps and `character` provides character-level timestamps per word.
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: “word” or “segment”. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
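
The constraints described above (the listed response formats, the 0 to 1 temperature range, and the rule that timestamp granularities need `verbose_json`) can be checked before a request is sent. The following is a minimal client-side sketch, not part of the SDK: `check_transcription_options` is a hypothetical helper, and it only encodes the rules stated on this page.

```python
from typing import List, Optional, Sequence

# Values taken from the parameter descriptions on this page.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
TIMESTAMP_GRANULARITIES = {"word", "segment"}


def check_transcription_options(
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
    timestamp_granularities: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems: List[str] = []
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        problems.append(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")
    if timestamp_granularities:
        unknown = sorted(set(timestamp_granularities) - TIMESTAMP_GRANULARITIES)
        if unknown:
            problems.append(f"unsupported timestamp granularities: {unknown}")
        if response_format != "verbose_json":
            problems.append(
                "timestamp_granularities requires response_format='verbose_json'"
            )
    return problems
```

Called with the options from the example request on this page (`timestamp_granularities` set, no `response_format`), this check would report the missing `response_format="verbose_json"`.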
The name to display on the trace. If not specified, the default system name will be used.
Array of fallback models to use if the primary model fails
Retry configuration for the request
Properties of retry
Number of retry attempts (1-5)
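
A retry configuration can be assembled and range-checked before it is passed as the `retry` argument. This is a sketch under assumptions: `make_retry_config` is a hypothetical helper, and the key name `count` for the number of attempts is an assumption, since this page documents the 1-5 range but does not show the key. Only `on_codes` appears in the example request.

```python
from typing import Dict, List, Sequence, Union


def make_retry_config(
    attempts: int, on_codes: Sequence[int]
) -> Dict[str, Union[int, List[int]]]:
    """Hypothetical helper: build a retry dict after validating its values."""
    if not 1 <= attempts <= 5:
        raise ValueError("retry attempts must be between 1 and 5")
    codes = sorted(set(on_codes))  # drop duplicates, keep a stable order
    invalid = [code for code in codes if not 100 <= code <= 599]
    if invalid:
        raise ValueError(f"not HTTP status codes: {invalid}")
    # "count" is an assumed key name; "on_codes" comes from the example request.
    return {"count": attempts, "on_codes": codes}


retry = make_retry_config(3, [429, 500, 502, 503, 504])
```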
Load balancer configuration for the request.
Timeout configuration to apply to the request. If the request exceeds the timeout, it will be retried or fall back to the next model, if configured.
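
The router applies timeouts, retries, and fallbacks on its side. The sketch below only illustrates the order of events described above (retry after a timeout, then move to the next model) as a local simulation. `transcribe_with_fallbacks` and `fake_call` are hypothetical names, and the retry count per model is an illustrative assumption rather than documented behavior.

```python
from typing import Callable, List, Optional, Sequence


def transcribe_with_fallbacks(
    models: Sequence[str],
    call: Callable[[str], str],
    retries_per_model: int = 1,
) -> str:
    """Try each model in order; retry on TimeoutError, then fall back."""
    last_error: Optional[BaseException] = None
    for model in models:
        for _ in range(1 + retries_per_model):
            try:
                return call(model)
            except TimeoutError as exc:
                last_error = exc  # remember the failure, then retry or fall back
    raise RuntimeError("all models timed out") from last_error


attempted: List[str] = []


def fake_call(model: str) -> str:
    """Stand-in for a request: the primary model always times out."""
    attempted.append(model)
    if model == "Malibu":
        raise TimeoutError("call exceeded the timeout")
    return f"transcript from {model}"


result = transcribe_with_fallbacks(["Malibu", "openai/gpt-4o-mini"], fake_call)
```

Here the primary model is tried twice (one attempt plus one retry) before the fallback model answers.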
Properties of orq
The name to display on the trace. If not specified, the default system name will be used.
Array of fallback models to use if the primary model fails
Retry configuration for the request
Properties of retry
Number of retry attempts (1-5)
Information about the identity making the request. If the identity does not exist, it will be created automatically.
Properties of identity
@deprecated Use identity instead. Information about the contact making the request.
Properties of ~~`contact`~~
Array of models with weights for load balancing requests
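
To get a feel for what `weight_based` balancing with weights 0.7 and 0.3 means, the split can be simulated locally. This is an illustration of weighted random selection using the standard library, assuming the weights act as relative selection probabilities. It is not the router's actual implementation, and `pick_model` and `simulate` are hypothetical names.

```python
import random
from collections import Counter
from typing import Dict, List

# Same shape and values as the "load_balancer" entry in the example request.
LOAD_BALANCER = {
    "type": "weight_based",
    "models": [
        {"model": "openai/gpt-4o", "weight": 0.7},
        {"model": "anthropic/claude-3-5-sonnet", "weight": 0.3},
    ],
}


def pick_model(models: List[Dict], rng: random.Random) -> str:
    """Pick one model name with probability proportional to its weight."""
    names = [entry["model"] for entry in models]
    weights = [entry["weight"] for entry in models]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(models: List[Dict], draws: int = 10_000, seed: int = 0) -> Counter:
    """Count how often each model is picked over many simulated requests."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_model(models, rng) for _ in range(draws))


counts = simulate(LOAD_BALANCER["models"])
```

Over many simulated requests, roughly 70% of picks should land on `openai/gpt-4o` and roughly 30% on `anthropic/claude-3-5-sonnet`.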