Transcribe Audio with Whisper-Large-v3-Turbo
Audio · Intermediate · 15 min
Tenzro's audio runtime serves automatic speech recognition (ASR) out of the box. Wave 1 supports OpenAI Whisper, Distil-Whisper, Moonshine v2, and NVIDIA Parakeet/Canary. This tutorial uses whisper-large-v3-turbo, the strongest permissively licensed ASR model in the catalog, to transcribe a WAV file via the CLI and JSON-RPC.
1. Download the model
# Whisper-Large-v3-Turbo is permissively licensed (MIT)
tenzro model download whisper-large-v3-turbo
# Output:
# Resolving artifact bundle from HuggingFace Hub...
# Source: openai/whisper-large-v3-turbo (ONNX export)
# License tier: Permissive
# Files: encoder.onnx, decoder.onnx, tokenizer.json, config.json
# SHA-256 verified for all files
# Saved to: ~/.tenzro/models/whisper-large-v3-turbo/
2. Load into the audio runtime
The audio runtime decodes WAV via hound, MP3/FLAC via symphonia, computes a 128-bin mel-spectrogram with realfft, and feeds the encoder/decoder bundle.
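The load output below reports that the runtime expects 16 kHz input. Before submitting a file, you can inspect its header yourself; here is a minimal Python sketch using only the standard-library `wave` module (not part of Tenzro — the tutorial does not say whether the runtime resamples automatically, so treat a mismatched rate as something to fix up front):

```python
import io
import wave

def check_wav(data: bytes, expected_rate: int = 16000) -> dict:
    """Read a WAV header and report what the ASR runtime will see."""
    with wave.open(io.BytesIO(data)) as w:
        info = {
            "sample_rate": w.getframerate(),
            "channels": w.getnchannels(),
            "duration_secs": w.getnframes() / w.getframerate(),
        }
    info["needs_resample"] = info["sample_rate"] != expected_rate
    return info

# Synthesize one second of 16 kHz mono silence so the sketch is self-contained;
# in practice you would pass open("meeting.wav", "rb").read().
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)      # 16-bit PCM
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

print(check_wav(buf.getvalue()))
# → {'sample_rate': 16000, 'channels': 1, 'duration_secs': 1.0, 'needs_resample': False}
```

A file that reports `needs_resample: True` can be converted with any standard audio tool before being handed to `tenzro transcribe`.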
# Load into the audio (ASR) runtime
tenzro audio load whisper-large-v3-turbo
# Output:
# Audio runtime loaded:
# Model: whisper-large-v3-turbo
# Modality: audio (ASR)
# Sample rate: 16000 Hz
# Mel bins: 128
# Bundle: encoder + decoder
3. Transcribe via the CLI
# Transcribe a WAV file (also supports MP3 and FLAC via symphonia)
tenzro transcribe \
--model whisper-large-v3-turbo \
--audio ./meeting.wav \
--language en
# Output:
# Transcription:
# "The settlement layer reconciles every inference call against on-chain
# receipts. Each receipt binds the output to the model weights hash..."
#
# Duration: 42.3s
# Latency: 3.1s (real-time factor: 0.07x)
4. Transcribe via JSON-RPC
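The same call can also be made without curl. A sketch in Python using only the standard library, mirroring this section's request shape (`rpc.tenzro.network`, the `tenzro_transcribe` method, and the parameter names all follow the curl example below):

```python
import base64
import json
from urllib import request

# Stand-in bytes so the sketch runs without a recording;
# in practice: audio_bytes = open("meeting.wav", "rb").read()
audio_bytes = b"\x00\x00" * 16000

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tenzro_transcribe",
    "params": {
        "model_id": "whisper-large-v3-turbo",
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        "language": "en",
    },
}

req = request.Request(
    "https://rpc.tenzro.network",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment to actually send the request:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["result"]["text"])
```

Building the payload with `json.dumps` avoids the shell-quoting gymnastics of embedding a large base64 string in a `-d` argument.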
# Equivalent JSON-RPC call. Audio is base64-encoded raw bytes.
AUDIO_B64=$(base64 -i ./meeting.wav)
curl https://rpc.tenzro.network \
-X POST \
-H "Content-Type: application/json" \
-d "{
\"jsonrpc\": \"2.0\",
\"id\": 1,
\"method\": \"tenzro_transcribe\",
\"params\": {
\"model_id\": \"whisper-large-v3-turbo\",
\"audio_base64\": \"$AUDIO_B64\",
\"language\": \"en\"
}
}" | jq
A typical response:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"model_id": "whisper-large-v3-turbo",
"text": "The settlement layer reconciles every inference call against on-chain receipts...",
"language": "en",
"duration_secs": 42.3,
"latency_ms": 3100
}
}
5. Other ASR models in the catalog
# Other audio models in the Tenzro catalog (all ASR-only in wave 1):
# moonshine-v2-tiny permissive, fastest CPU option
# moonshine-v2-base permissive
# distil-whisper-small.en permissive, English-only
# distil-whisper-medium.en permissive, English-only
# distil-whisper-large-v3 permissive
# whisper-large-v3-turbo permissive (used here)
# parakeet-tdt-0.6b-v3 permissive, NVIDIA encoder/decoder/joiner triple
# canary-1b-flash attribution (CC-BY-4.0 logged)
See also
- Model serving documentation — the audio catalog and bundle layout
- Inference RPC reference — full tenzro_transcribe schema
- Become an AI Model Provider — serve audio models as a paid provider