Multi-modal.
- STATUS
- Shipped
- CRATE
- tenzro-model
- ROUTER
- InferenceRouter (modality-aware)
- LICENSES
- Permissive / Attribution / CommercialCustom / NonCommercial
Modality dispatch
InferencePayload is a tagged union (Chat | Forecast | VisionEmbed | VisionSimilarity | TextEmbed | Segment | Detect | Transcribe | VideoEmbed). InferenceRouter::route() reads model.modality from the registry and dispatches the typed payload to the correct runtime handle.
Seven runtimes
- TimeseriesRuntime — TimesFM 2.5 (
[1, context_len] → [1, horizon]or quantile shapes). - VisionRuntime — CLIP ViT-B/32 + L/14, SigLIP2 base/large/so400m, DINOv3 vits16/vitb16/vitl16.
- TextEmbeddingRuntime — Qwen3-Embedding 0.6B/4B/8B, EmbeddingGemma-300M (Matryoshka 768/512/256/128), BGE-M3, Snowflake Arctic Embed L v2.0.
- SegmentationRuntime — SAM 2 base/large, EdgeSAM, MobileSAM. Two-pass encoder/decoder; SAM-1 vs SAM-2 ABI dispatch.
- DetectionRuntime — RF-DETR (n/s/m/b/l/2xl, 90-class COCO) + D-FINE (n/s/m/l/x, 80-class). NMS-free.
- AudioRuntime — ASR-only catalog: Moonshine v2, Distil-Whisper, Whisper-large-v3-turbo, Parakeet-TDT-0.6B-v3, Canary-1B-Flash. Moonshine and Parakeet ship as concrete ORT-backed transcribers; the remaining Whisper / Distil-Whisper / Canary backends share the same
Transcribertrait. - VideoRuntime —
VisionFallbackVideoEncoderwraps any registered image encoder and uses the systemffmpegCLI to extractnum_framesevenly-spaced frames. Native video catalog is empty (no permissive ONNX-shippable encoder-only video model in 2026).
License-tier gating
Every catalog entry carries a license_tier. ModelRegistry::register_model() enforces it centrally: NonCommercial entries refuse to load without --accept-non-commercial; CommercialCustom (DINOv3, SAM, Gemma terms) require explicit --accept-license <id> per family. Permissive and Attribution load without prompts.
Artifact downloader
HfArtifactDownloader replaces the older single-file downloader. ArtifactSpec::SingleFile { filename, extension } handles GGUF and single-file ONNX. ArtifactSpec::Bundle { files, dir_name } handles multi-file ONNX (encoder + decoder + joiner, e.g. Parakeet). Atomic finalization is tmp-dir-rename.
Cross-surface coverage
Each modality has matching JSON-RPC, MCP, A2A, CLI, and SDK paths. Example for vision:
# JSON-RPC
tenzro_listVisionCatalog | tenzro_listVisionModels |
tenzro_loadVisionModel | tenzro_unloadVisionModel |
tenzro_imageEmbed | tenzro_imageTextSimilarity
# CLI
tenzro embed-image --model siglip2-base --path ./cat.png
# A2A skill: vision-embed
# MCP tools: vision_embed, vision_similarityModality flagged on events
TenzroEvent::ModelRegistered carries the modality so subscribers (UI, indexers, marketplace) can filter without an extra lookup.