Tenzro
Tutorial — Multi-modal AI

Transcribe audio with Whisper-Turbo

The audio catalog covers Moonshine v2, Distil-Whisper, Whisper-large-v3-turbo, NVIDIA Parakeet, and Canary-1B-Flash. Whisper-large-v3-turbo (Whisper-Turbo for short) is the strongest general-purpose default, and it is the model used throughout this tutorial.
Level: Beginner
Time: ~10 min
Prerequisites: Tenzro CLI installed, a .wav file
Stack: CLI · JSON-RPC
01

Load the transcription model

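Whisper-Turbo balances transcription quality and speed. If you are running at the edge or are bandwidth-constrained, Moonshine-tiny is a good lighter-weight default. The request below loads Whisper-Turbo by its model ID, whisper-large-v3-turbo.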

curl -X POST https://rpc.tenzro.network -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tenzro_loadAudioModel","params":["whisper-large-v3-turbo"]}'
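If you would rather script this call than type the curl command, the same request can be sketched in Python with only the standard library. The method name, parameter, and endpoint are copied from the command above; the helper names (build_request, extract_result, call_rpc) are hypothetical, and the shape of the value the node returns for tenzro_loadAudioModel is not documented here, so the sketch hands back whatever arrives in "result".

```python
import json
import urllib.request

RPC_URL = "https://rpc.tenzro.network"  # endpoint taken from the curl command above


def build_request(method, params, request_id=1):
    """Return a JSON-RPC 2.0 request body as UTF-8 bytes."""
    envelope = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(envelope).encode("utf-8")


def extract_result(reply):
    """Return the 'result' member, or raise if the reply carries an 'error' object."""
    if "error" in reply:
        err = reply["error"]
        raise RuntimeError(f"RPC error {err.get('code')}: {err.get('message')}")
    return reply["result"]


def call_rpc(method, params, url=RPC_URL, timeout=30.0):
    """POST a single JSON-RPC request and return its result (hypothetical helper)."""
    req = urllib.request.Request(
        url,
        data=build_request(method, params),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_result(json.loads(resp.read().decode("utf-8")))


# Usage (performs a network call, so it is left commented out):
# call_rpc("tenzro_loadAudioModel", ["whisper-large-v3-turbo"])
```

Checking for the "error" member matters because a JSON-RPC server can answer a well-formed HTTP request with an error object in the body, so a successful HTTP status alone does not mean the model loaded.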
02

Transcribe a wav file

The CLI handles mel-spectrogram preprocessing and BPE detokenization end-to-end.

tenzro transcribe audio.wav --model whisper-large-v3-turbo
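Before transcribing, it can help to confirm what is actually in the .wav file. Whisper-family models operate on 16 kHz mono audio; whether the Tenzro CLI resamples other formats for you is not stated in this tutorial, so treat the following as an optional sanity check, not a requirement. It is a minimal sketch using Python's standard wave module, which reads uncompressed PCM files; describe_wav is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Read a PCM WAV header and return its basic properties."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }


# Usage:
# info = describe_wav("audio.wav")
# if info["sample_rate_hz"] != 16000 or info["channels"] != 1:
#     print("Not 16 kHz mono; the CLI may or may not resample this:", info)
```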
03

Pin a language for better accuracy

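If you know the spoken language ahead of time, pass it with --language. This skips the automatic language-identification step, so a misdetected language cannot derail the transcript.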

tenzro transcribe interview.wav \
  --model whisper-large-v3-turbo \
  --language en
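When you transcribe many files, some with a known language and some without, it is convenient to assemble the command programmatically. The sketch below builds the same argument list as the commands above, using only the subcommand and the --model and --language flags shown in this tutorial. The function names are hypothetical, and it assumes the CLI prints the transcript to standard output, which this tutorial does not confirm.

```python
import subprocess


def build_transcribe_command(audio_path, model="whisper-large-v3-turbo", language=None):
    """Return the argv list for `tenzro transcribe`, adding --language only when given."""
    cmd = ["tenzro", "transcribe", audio_path, "--model", model]
    if language:
        cmd += ["--language", language]
    return cmd


def transcribe(audio_path, model="whisper-large-v3-turbo", language=None):
    """Run the CLI and return its stdout (assumed, not confirmed, to hold the transcript)."""
    done = subprocess.run(
        build_transcribe_command(audio_path, model, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return done.stdout


# Usage (requires the Tenzro CLI on PATH, so it is left commented out):
# print(transcribe("interview.wav", language="en"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.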
04

Call from JSON-RPC

The tenzro_transcribe method takes the model ID and the audio payload in the audio_b64 parameter, and returns a transcript string plus per-segment timestamps.

curl -s https://rpc.tenzro.network -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tenzro_transcribe","params":{"model":"whisper-large-v3-turbo","audio_b64":"..."}}'
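The "..." in the request above stands for the encoded audio. A minimal sketch for producing the full request body follows; it assumes audio_b64 expects standard base64 of the raw file bytes, which the parameter name suggests but this tutorial does not spell out. build_transcribe_request is a hypothetical helper name.

```python
import base64
import json


def build_transcribe_request(audio_path, model="whisper-large-v3-turbo", request_id=1):
    """Return a JSON-RPC 2.0 request string for tenzro_transcribe.

    Assumption: audio_b64 is standard base64 of the file's raw bytes.
    """
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tenzro_transcribe",
        "params": {"model": model, "audio_b64": audio_b64},
    })


# Usage: write the body to a file, then send it with curl's -d @file syntax.
# with open("request.json", "w") as out:
#     out.write(build_transcribe_request("audio.wav"))
```

With the body saved as request.json, the curl command above can send it with -d @request.json in place of the inline JSON, which keeps long base64 strings out of your shell history.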