Skip to content

whisper-large-v3 Adapter

Transcribes speech audio to text across 99 languages using openai/whisper-large-v3.

Model details

Field Value
Model openai/whisper-large-v3
Task transcribe
Domain audio, multilingual
License Apache 2.0

Install

pip install synapse-adapter-sdk
pip install transformers torch

Verified output schema

The transformers automatic-speech-recognition pipeline returns a single dict:

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
result = pipe("audio.mp3")
# {"text": " And so my fellow Americans..."}

The adapter strips leading/trailing whitespace and sets payload.content to the transcription string, replacing the original audio reference. Provenance confidence is fixed at 1.0.

Audio input formats

payload.content carries the audio input. The transformers pipeline accepts:

Format Example
File path string "/data/earnings_call.mp3"
NumPy float32 array at 16 kHz np.array([...], dtype=np.float32)
Dict with array and sampling rate {"array": np.ndarray, "sampling_rate": 16000}

Common audio formats supported: MP3, WAV, FLAC, OGG, M4A.

Supported task types

  • transcribe

Supported domains

  • audio
  • multilingual

Usage example

import time
from transformers import pipeline
from whisper_large_v3_adapter import WhisperLargeV3Adapter

pipe    = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
adapter = WhisperLargeV3Adapter()

# 1. Prepare model input — payload.content holds the audio reference
model_input = adapter.ingress(ir)
# {"audio": "/data/earnings_call.mp3"}

# 2. Run the model (caller's responsibility)
t0 = time.monotonic()
model_output = pipe(model_input["audio"])
latency_ms = int((time.monotonic() - t0) * 1000)
# {"text": " Good morning. Revenue for Q3 came in at..."}

# 3. Convert output back to canonical IR
result_ir = adapter.egress(model_output, ir, latency_ms=latency_ms)

# 4. Access the transcription — original audio reference is REPLACED
transcript = result_ir.payload.content
# "Good morning. Revenue for Q3 came in at..."

Whisper large-v3 automatically detects the spoken language — no language tag is required in the IR. To force a specific language or enable translation to English, pass generate_kwargs to the pipeline (caller's responsibility, outside the adapter contract).

Source

github.com/synapse-ir/adapters