Tutorial — Multi-modal AI
Transcribe audio with Whisper-Turbo
The audio catalog covers Moonshine v2, Distil-Whisper, Whisper-large-v3-turbo, NVIDIA Parakeet, and Canary-1B-Flash. Whisper-large-v3-turbo (Whisper-Turbo for short) is the strongest general-purpose default.
- Level: Beginner
- Time: ~10 min
- Prerequisites: Tenzro CLI installed, .wav file
- Stack: CLI · JSON-RPC
01
Load the transcription model
Whisper-Turbo balances quality and speed; Moonshine-tiny is a good edge default if you are bandwidth-constrained.
curl -X POST https://rpc.tenzro.network -H 'content-type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"tenzro_loadAudioModel","params":["whisper-large-v3-turbo"]}'
02
Transcribe a wav file
The CLI handles mel-spectrogram preprocessing and BPE detokenization end-to-end.
tenzro transcribe audio.wav --model whisper-large-v3-turbo
03
Pin a language for better accuracy
If you know the language ahead of time, pass it with --language to skip the automatic language-identification step and rule out a wrong guess.
tenzro transcribe interview.wav \
--model whisper-large-v3-turbo \
--language en
04
Call from JSON-RPC
The RPC returns a transcript string plus per-segment timestamps.
curl -s https://rpc.tenzro.network -H 'content-type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"tenzro_transcribe","params":{"model":"whisper-large-v3-turbo","audio_b64":"..."}}'
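The "..." in audio_b64 stands for the encoded audio. Below is a minimal sketch of one way to fill it in from a local file. It assumes the endpoint expects standard base64 of the raw file bytes, which the parameter name suggests but this tutorial does not spell out. The helper names build_transcribe_payload and transcribe_rpc are hypothetical and not part of the Tenzro CLI; the method and parameter names are taken from the call above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Tenzro CLI): print a tenzro_transcribe
# JSON-RPC request for the audio file in $1, using the model ID in $2
# (defaults to whisper-large-v3-turbo). The model ID is not JSON-escaped,
# so this assumes it contains no quotes or backslashes.
build_transcribe_payload() {
  audio_file=$1
  model=${2:-whisper-large-v3-turbo}
  # Open the JSON envelope up to the start of the audio_b64 string.
  printf '{"jsonrpc":"2.0","id":1,"method":"tenzro_transcribe","params":{"model":"%s","audio_b64":"' "$model"
  # Stream the file as base64; tr removes the line wraps some base64 builds insert.
  base64 < "$audio_file" | tr -d '\n'
  # Close the string and the envelope.
  printf '"}}\n'
}

# Hypothetical wrapper: send the request body on stdin (--data-binary @-)
# so a large recording never has to fit on the command line.
transcribe_rpc() {
  build_transcribe_payload "$1" "$2" |
    curl -s https://rpc.tenzro.network -H 'content-type: application/json' --data-binary @-
}
```

With these defined, transcribe_rpc interview.wav would print the raw JSON-RPC response. The response field names are not given here, so the sketch stops short of parsing the transcript or timestamps.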