Tenzro
AI

Models.

Model registry and catalog. Default language models: Qwen 3, Gemma 3, Gemma 4, Mistral, Phi 3, DeepSeek V3, Granite.
STATUS: Testnet
CRATE: tenzro-model
STABILITY: Stable
TYPE: Reference
01

Default LM catalog

Qwen 3        0.6B / 1.7B / 4B / 8B (default 0.6B)
Gemma 3       small / medium / large
Gemma 4       E2B / E4B / 12B / 26B-A4B (MoE) / 31B   MTP-enabled targets
Mistral       7B / Nemo / Small / Medium
Phi 3         mini / small / medium
DeepSeek V3   chat / instruct
Granite       3B-instruct / 8B-instruct / Granite-H
02

Multi-modal

Vision        CLIP, SigLIP2, DINOv3
Timeseries    TimesFM 2.5
Text embed    Qwen3-Embedding, EmbeddingGemma, BGE-M3, Arctic
Segmentation  SAM 2, EdgeSAM, MobileSAM
Detection     RF-DETR, D-FINE
Audio         Moonshine v2, Distil-Whisper, Whisper-v3-turbo,
              Parakeet, Canary
03

License tiers

Each entry carries a license tier: Permissive, Attribution, CommercialCustom, NonCommercial. Tiered admission is enforced centrally in ModelRegistry::register_model().
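The sketch below shows one way tier-gated admission could be centralized in a single method. Only the four tier names and the `ModelRegistry::register_model()` entry point come from this page. The `ModelEntry` struct, the `allowed` policy set, the `AdmissionError` type, and the `count()` helper are hypothetical illustrations and should not be read as the tenzro-model API.

```rust
use std::collections::{HashMap, HashSet};

/// The four license tiers named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum LicenseTier {
    Permissive,
    Attribution,
    CommercialCustom,
    NonCommercial,
}

/// Hypothetical catalog entry; the real entry type is not shown on this page.
#[derive(Debug, Clone)]
pub struct ModelEntry {
    pub id: String,
    pub tier: LicenseTier,
}

/// Hypothetical error returned when admission is refused.
#[derive(Debug, PartialEq, Eq)]
pub enum AdmissionError {
    TierNotAllowed { id: String, tier: LicenseTier },
    Duplicate { id: String },
}

/// Hypothetical registry. `allowed` is an assumed operator policy.
pub struct ModelRegistry {
    allowed: HashSet<LicenseTier>,
    models: HashMap<String, ModelEntry>,
}

impl ModelRegistry {
    pub fn new(allowed: impl IntoIterator<Item = LicenseTier>) -> Self {
        Self {
            allowed: allowed.into_iter().collect(),
            models: HashMap::new(),
        }
    }

    /// Single choke point: every entry passes the tier check here,
    /// so no caller can bypass admission.
    pub fn register_model(&mut self, entry: ModelEntry) -> Result<(), AdmissionError> {
        if !self.allowed.contains(&entry.tier) {
            return Err(AdmissionError::TierNotAllowed { id: entry.id, tier: entry.tier });
        }
        if self.models.contains_key(&entry.id) {
            return Err(AdmissionError::Duplicate { id: entry.id });
        }
        self.models.insert(entry.id.clone(), entry);
        Ok(())
    }

    /// Number of admitted models.
    pub fn count(&self) -> usize {
        self.models.len()
    }
}
```

With a policy that allows only Permissive and Attribution, registering a NonCommercial entry in this sketch returns `AdmissionError::TierNotAllowed` and leaves the registry unchanged.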

04

Multi-Token Prediction (MTP)

Every HfModelEntry in the language catalog declares an optional speculative-decoding pairing: drafter_id (catalog ID of a vocab-matched drafter GGUF), mtp_kind (None | Generic | DraftMtp), and mtp_default_draft_n (recommended --spec-draft-n-max, 1..=6).

  • Generic — classical two-model speculative decoding. Any vocab-matched smaller model can be paired as a drafter (e.g. Qwen 3 32B target + Qwen 3 0.6B drafter). llama.cpp flag: --spec-type draft.
  • DraftMtp — jointly-trained Multi-Token-Prediction head. The drafter is a small auxiliary head trained on the target's hidden state, shipped by Unsloth as a sibling GGUF (e.g. unsloth/gemma-4-12b-it-GGUF/MTP/mtp-gemma-4-12B-it.gguf). Higher accept rates than generic speculative decoding because the drafter mirrors the target's distribution. Unsloth measures 1.5–2.2× throughput on Gemma 4. llama.cpp flag: --spec-type draft-mtp.
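The pairing fields and the two kinds above can be sketched as a small Rust type. The field names, the three `mtp_kind` variants, the 1..=6 range, and the `--spec-type` / `--spec-draft-n-max` flag strings are taken from this page. The field types, the `id` field, and the `spec_type()` and `spec_args()` helpers are assumptions made for illustration, and the real `HfModelEntry` may differ.

```rust
/// Speculative-decoding pairing kinds listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MtpKind {
    None,
    Generic,
    DraftMtp,
}

impl MtpKind {
    /// Value for the `--spec-type` flag as quoted on this page.
    /// (Hypothetical helper.)
    pub fn spec_type(self) -> Option<&'static str> {
        match self {
            MtpKind::None => None,
            MtpKind::Generic => Some("draft"),
            MtpKind::DraftMtp => Some("draft-mtp"),
        }
    }
}

/// Hypothetical subset of `HfModelEntry`: only the three MTP field
/// names come from this page; their types are assumed.
#[derive(Debug, Clone)]
pub struct HfModelEntry {
    pub id: String,
    pub drafter_id: Option<String>,
    pub mtp_kind: MtpKind,
    pub mtp_default_draft_n: Option<u8>,
}

impl HfModelEntry {
    /// Builds the speculative-decoding arguments for this entry.
    /// Returns `None` when the entry declares no pairing, when no
    /// draft count is available, or when the count is outside 1..=6.
    pub fn spec_args(&self, requested_draft_n: Option<u8>) -> Option<Vec<String>> {
        let spec_type = self.mtp_kind.spec_type()?;
        // A per-request value overrides the catalog default.
        let n = requested_draft_n.or(self.mtp_default_draft_n)?;
        if !(1..=6).contains(&n) {
            return None;
        }
        Some(vec![
            "--spec-type".to_string(),
            spec_type.to_string(),
            "--spec-draft-n-max".to_string(),
            n.to_string(),
        ])
    }
}
```

Keeping the flag mapping on the enum means a new pairing kind needs only one extra match arm, and the range check lives next to the code that emits the flag.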

Gemma 4 E2B / 12B / 31B all declare DraftMtp with draft_n: 2 as a starting point. Set draft_n in your tenzro_chat request to opt in. The runtime currently returns a structured MtpUnavailable error when MTP is requested but the in-process llama-cpp-2 binding lacks the speculative API. Operators who serve Gemma 4 MTP GGUFs through raw llama-cli, outside the in-process runtime, still get the throughput uplift on their hardware.
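The opt-in behavior described above could be handled as follows. This is a sketch under stated assumptions: only the `MtpUnavailable` name and the `draft_n` request field appear on this page. The `ChatError` enum, its fields, the `RuntimeCaps` probe, and `resolve_draft_n()` are hypothetical and do not describe the runtime's actual types.

```rust
use std::fmt;

/// Hypothetical error type; only the `MtpUnavailable` name is from this page.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ChatError {
    MtpUnavailable {
        model_id: String,
        requested_draft_n: u8,
        reason: String,
    },
}

impl fmt::Display for ChatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ChatError::MtpUnavailable { model_id, requested_draft_n, reason } => write!(
                f,
                "MTP unavailable for {} (draft_n={}): {}",
                model_id, requested_draft_n, reason
            ),
        }
    }
}

impl std::error::Error for ChatError {}

/// Hypothetical capability flags for the in-process binding.
pub struct RuntimeCaps {
    pub speculative_api: bool,
}

/// Resolves the draft count for one request.
/// - No `draft_n`: MTP is opt-in, so decode normally.
/// - `draft_n` set and the binding supports speculation: use it.
/// - `draft_n` set but unsupported: return a structured error
///   instead of silently ignoring the request.
pub fn resolve_draft_n(
    caps: &RuntimeCaps,
    model_id: &str,
    draft_n: Option<u8>,
) -> Result<Option<u8>, ChatError> {
    match draft_n {
        None => Ok(None),
        Some(n) if caps.speculative_api => Ok(Some(n)),
        Some(n) => Err(ChatError::MtpUnavailable {
            model_id: model_id.to_string(),
            requested_draft_n: n,
            reason: "in-process binding lacks the speculative API".to_string(),
        }),
    }
}
```

Returning an error rather than falling back to normal decoding lets a client tell "MTP ran" apart from "MTP was requested but skipped", and it can then retry without draft_n if it chooses.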

05

List

tenzro model list
tenzro model info gemma4-12b
tenzro chat gemma4-12b --draft-n 2