Models.
- STATUS: Testnet
- CRATE: tenzro-model
- STABILITY: Stable
- TYPE: Reference
Default LM catalog
Qwen 3: 0.6B / 1.5B / 4B / 8B (default 0.6B)
Gemma 3: small / medium / large
Gemma 4: E2B / E4B / 12B / 26B-A4B (MoE) / 31B; MTP-enabled targets
Mistral: 7B / Nemo / Small / Medium
Phi 3: mini / small / medium
DeepSeek V3: chat / instruct
Granite: 3B-instruct / 8B-instruct / Granite-H
Multi-modal
Vision: CLIP, SigLIP2, DINOv3
Timeseries: TimesFM 2.5
Text embed: Qwen3-Embedding, EmbeddingGemma, BGE-M3, Arctic
Segmentation: SAM 2, EdgeSAM, MobileSAM
Detection: RF-DETR, D-FINE
Audio: Moonshine v2, Distil-Whisper, Whisper-v3-turbo, Parakeet, Canary
License tiers
Each entry carries a license tier: Permissive, Attribution, CommercialCustom, NonCommercial. Tiered admission is enforced centrally in ModelRegistry::register_model().
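To make the idea of central admission concrete, here is a minimal sketch of a registry that checks the license tier at registration time. The four tier names come from the text above; everything else (the `ModelEntry` and `RegisterError` types, the `admitted` policy list, and the shape of `register_model` itself) is an assumption for illustration and may not match the actual tenzro-model API.

```rust
use std::collections::HashMap;

/// The four license tiers named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LicenseTier {
    Permissive,
    Attribution,
    CommercialCustom,
    NonCommercial,
}

/// Hypothetical catalog entry holding only the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct ModelEntry {
    pub id: String,
    pub tier: LicenseTier,
}

/// Hypothetical error type; the real crate may report rejections differently.
#[derive(Debug, PartialEq, Eq)]
pub enum RegisterError {
    TierNotAdmitted { id: String, tier: LicenseTier },
    Duplicate(String),
}

/// Illustrative stand-in for the registry, not the real implementation.
pub struct ModelRegistry {
    admitted: Vec<LicenseTier>, // assumed operator policy
    models: HashMap<String, ModelEntry>,
}

impl ModelRegistry {
    pub fn new(admitted: &[LicenseTier]) -> Self {
        Self { admitted: admitted.to_vec(), models: HashMap::new() }
    }

    /// Single choke point: every entry passes the tier check here.
    pub fn register_model(&mut self, entry: ModelEntry) -> Result<(), RegisterError> {
        if !self.admitted.contains(&entry.tier) {
            return Err(RegisterError::TierNotAdmitted { id: entry.id, tier: entry.tier });
        }
        if self.models.contains_key(&entry.id) {
            return Err(RegisterError::Duplicate(entry.id));
        }
        self.models.insert(entry.id.clone(), entry);
        Ok(())
    }

    pub fn model_count(&self) -> usize {
        self.models.len()
    }
}
```

Keeping the check inside one method means callers cannot bypass the policy by constructing entries elsewhere; a registry built with only `Permissive` and `Attribution` admitted would reject a `NonCommercial` entry before it is ever stored.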
Multi-Token Prediction (MTP)
Every HfModelEntry in the language catalog declares an optional speculative-decoding pairing: drafter_id (catalog ID of a vocab-matched drafter GGUF), mtp_kind (None | Generic | DraftMtp), and mtp_default_draft_n (recommended --spec-draft-n-max, 1..=6).
- Generic: classical two-model speculative decoding. Any vocab-matched smaller model can be paired as a drafter (e.g. Qwen 3 32B target + Qwen 3 0.6B drafter). llama.cpp flag: --spec-type draft.
- DraftMtp: jointly trained Multi-Token-Prediction head. The drafter is a small auxiliary head trained on the target's hidden state, shipped by Unsloth as a sibling GGUF (e.g. unsloth/gemma-4-12b-it-GGUF/MTP/mtp-gemma-4-12B-it.gguf). Higher accept rates than generic speculative decoding because the drafter mirrors the target's distribution. Unsloth measures 1.5–2.2× throughput on Gemma 4. llama.cpp flag: --spec-type draft-mtp.
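The pairing fields and the two flag values can be sketched as a small Rust type. The field names (`drafter_id`, `mtp_kind`, `mtp_default_draft_n`), the variant names, the 1..=6 range, and the `--spec-type` / `--spec-draft-n-max` flags are taken from the text above; the field types, the `spec_args` helper, and the clamping behavior are assumptions made for illustration, not the crate's actual definition. How the drafter GGUF path is passed to llama.cpp is left out.

```rust
/// Speculative-decoding pairing kinds listed in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MtpKind {
    None,
    Generic,
    DraftMtp,
}

/// Illustrative subset of a catalog entry; field types are assumed.
#[derive(Debug, Clone)]
pub struct HfModelEntry {
    pub id: String,
    pub drafter_id: Option<String>,
    pub mtp_kind: MtpKind,
    pub mtp_default_draft_n: u8,
}

impl HfModelEntry {
    /// Hypothetical helper: build the speculative-decoding arguments.
    /// Returns `None` when the entry declares no pairing.
    pub fn spec_args(&self, draft_n: Option<u8>) -> Option<Vec<String>> {
        let spec_type = match self.mtp_kind {
            MtpKind::None => return None,
            MtpKind::Generic => "draft",
            MtpKind::DraftMtp => "draft-mtp",
        };
        if self.drafter_id.is_none() {
            return None; // a kind without a drafter is not a usable pairing
        }
        // Fall back to the catalog default and keep the value inside 1..=6.
        let n = draft_n.unwrap_or(self.mtp_default_draft_n).clamp(1, 6);
        Some(vec![
            "--spec-type".to_string(),
            spec_type.to_string(),
            "--spec-draft-n-max".to_string(),
            n.to_string(),
        ])
    }
}
```

For an entry with `mtp_kind: MtpKind::DraftMtp` and a default of 2, calling `spec_args(None)` yields the `draft-mtp` flag pair with a draft count of 2, while an out-of-range request such as 9 is clamped to 6 in this sketch rather than rejected.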
Gemma 4 E2B / 12B / 31B all declare DraftMtp with draft_n: 2 as a starting point. Set draft_n in your tenzro_chat request to opt in. The runtime currently returns a structured MtpUnavailable error when MTP is requested but the in-process llama-cpp-2 binding lacks the speculative API. Operators who serve Gemma 4 MTP GGUFs through raw llama-cli, outside the in-process runtime, still get the throughput uplift on their hardware.
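A caller that opts in with draft_n may want to handle the unavailable case without failing the whole request. The sketch below shows one possible client-side pattern: try with MTP, and retry without it when the structured error comes back. Only the `MtpUnavailable` name and the `draft_n` field come from the text; `ChatRequest`, `ChatError`, `Runtime`, `plan`, and `plan_with_fallback` are hypothetical stand-ins, and the real error payload may differ.

```rust
/// Hypothetical request shape; only `draft_n` is named in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChatRequest {
    pub model: String,
    pub draft_n: Option<u8>,
}

/// Hypothetical structured error; the payload fields are assumed.
#[derive(Debug, PartialEq, Eq)]
pub enum ChatError {
    MtpUnavailable { model: String, requested_draft_n: u8 },
}

/// Stand-in for the in-process runtime and its binding capability.
pub struct Runtime {
    pub has_speculative_api: bool,
}

impl Runtime {
    /// Decide how a request would be served, or report why it cannot be.
    pub fn plan(&self, req: &ChatRequest) -> Result<String, ChatError> {
        match req.draft_n {
            Some(n) if !self.has_speculative_api => Err(ChatError::MtpUnavailable {
                model: req.model.clone(),
                requested_draft_n: n,
            }),
            Some(n) => Ok(format!("{} with MTP, draft_n={}", req.model, n)),
            None => Ok(format!("{} without MTP", req.model)),
        }
    }
}

/// Retry without speculative decoding when MTP is unavailable.
pub fn plan_with_fallback(rt: &Runtime, req: ChatRequest) -> Result<String, ChatError> {
    match rt.plan(&req) {
        Err(ChatError::MtpUnavailable { .. }) => {
            rt.plan(&ChatRequest { draft_n: None, ..req })
        }
        other => other,
    }
}
```

Because the error is a distinct variant rather than a generic failure string, the caller can match on it and degrade to plain decoding, which keeps the request working while the binding lacks the speculative API.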
List
tenzro model list
tenzro model info gemma4-12b
tenzro chat gemma4-12b --draft-n 2