Multi-modal.

Seven ONNX-backed runtimes share one inference router: forecast, vision, text-embedding, segmentation, detection, audio ASR, and video. Modality dispatch happens at the router. Each modality has its own independent provider pool, so pricing, latency, and reputation strategies apply per modality.

STATUS: Testnet
CRATE: tenzro-model
ROUTER: InferenceRouter (modality-aware)
LICENSES: Permissive / Attribution / CommercialCustom / NonCommercial

Modality dispatch

Seven runtimes

TimeseriesRuntime — TimesFM 2.5 ([1, context_len] → [1, horizon] or quantile shapes).
VisionRuntime — CLIP ViT-B/32 + L/14, SigLIP2 base/large/so400m, DINOv3 vits16/vitb16/vitl16.
TextEmbeddingRuntime — Qwen3-Embedding 0.6B/4B/8B, EmbeddingGemma-300M (Matryoshka 768/512/256/128), BGE-M3, Snowflake Arctic Embed L v2.0, ModernBERT-embed base/large (8192-context RoPE encoder).
SegmentationRuntime — SAM 2 base/large, EdgeSAM, MobileSAM. Two-pass encoder/decoder; SAM-1 vs SAM-2 ABI dispatch.
DetectionRuntime — RF-DETR (n/s/m/b/l/2xl, 90-class COCO) + D-FINE (n/s/m/l/x, 80-class). NMS-free.
AudioRuntime — ASR-only catalog: Moonshine v2, Distil-Whisper, Whisper-large-v3-turbo, Parakeet-TDT-0.6B-v3, Canary-1B-Flash. Each is an ORT-backed Transcriber: encoder plus autoregressive decoder with KV-cache, per-family mel preprocessing, RNN-T joint decoding for Parakeet and Conformer AED for Canary. Reachable over JSON-RPC and over POST /v1/audio/transcriptions.
VideoRuntime — VisionFallbackVideoEncoder wraps any registered image encoder and uses the system ffmpeg CLI to extract num_frames evenly-spaced frames. Native video catalog is empty (no permissive ONNX-shippable encoder-only video model in 2026).
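
As a rough illustration of modality-aware dispatch with independent provider pools, the sketch below keeps one pool per modality and picks a provider by price, latency, or reputation. It is not the InferenceRouter implementation: the Provider fields, the strategy names, and the route() signature are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    FORECAST = "forecast"
    VISION = "vision"
    TEXT_EMBEDDING = "text-embedding"
    SEGMENTATION = "segmentation"
    DETECTION = "detection"
    AUDIO_ASR = "audio-asr"
    VIDEO = "video"


@dataclass
class Provider:
    # All three fields are hypothetical; the real provider record is not shown in this page.
    name: str
    price: float
    latency_ms: float
    reputation: float


# Each strategy maps a provider to a sort key; the lowest key wins.
STRATEGIES = {
    "price": lambda p: p.price,
    "latency": lambda p: p.latency_ms,
    "reputation": lambda p: -p.reputation,  # higher reputation is better
}


@dataclass
class Router:
    # One independent pool per modality, so strategies never compare across modalities.
    pools: dict = field(default_factory=lambda: {m: [] for m in Modality})

    def register(self, modality, provider):
        self.pools[modality].append(provider)

    def route(self, modality, strategy="price"):
        pool = self.pools[modality]
        if not pool:
            raise LookupError(f"no provider registered for {modality.value}")
        return min(pool, key=STRATEGIES[strategy])


router = Router()
router.register(Modality.VISION, Provider("a", price=2.0, latency_ms=40.0, reputation=0.7))
router.register(Modality.VISION, Provider("b", price=3.0, latency_ms=15.0, reputation=0.9))
router.register(Modality.AUDIO_ASR, Provider("c", price=1.0, latency_ms=90.0, reputation=0.5))
```

Because the audio provider sits in its own pool, its lower price never competes with the vision providers.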

License-tier gating

Every catalog entry carries a license_tier. ModelRegistry::register_model() enforces it centrally: NonCommercial entries refuse to load without --accept-non-commercial, and CommercialCustom entries (DINOv3, SAM, Gemma terms) require an explicit --accept-license <id> per family. Permissive and Attribution entries load without prompts.
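
The gating rule can be sketched as a single check that returns the flag a caller still has to pass, or nothing when the entry may load. This is a minimal sketch of the rule as described, not the code inside ModelRegistry::register_model(); the function name and the family id used in the usage note are hypothetical.

```python
from enum import Enum


class LicenseTier(Enum):
    PERMISSIVE = "Permissive"
    ATTRIBUTION = "Attribution"
    COMMERCIAL_CUSTOM = "CommercialCustom"
    NON_COMMERCIAL = "NonCommercial"


def missing_acceptance(tier, family, accept_non_commercial=False, accepted_licenses=()):
    """Return the CLI flag that is still required, or None if the entry may load."""
    if tier is LicenseTier.NON_COMMERCIAL and not accept_non_commercial:
        return "--accept-non-commercial"
    if tier is LicenseTier.COMMERCIAL_CUSTOM and family not in accepted_licenses:
        # Acceptance is per family, so accepting one family does not unlock another.
        return f"--accept-license {family}"
    return None
```

For example, missing_acceptance(LicenseTier.COMMERCIAL_CUSTOM, "dinov3") reports that --accept-license dinov3 is still needed, where "dinov3" stands in for whatever id the catalog actually uses.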

Artifact downloader

HfArtifactDownloader replaces the older single-file downloader. ArtifactSpec::SingleFile { filename, extension } handles GGUF and single-file ONNX. ArtifactSpec::Bundle { files, dir_name } handles multi-file ONNX (encoder + decoder + joiner, e.g. Parakeet). Atomic finalization is tmp-dir-rename.
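
Tmp-dir-rename finalization means a bundle directory only becomes visible under its final name once every file in it is fully written. The sketch below shows the general pattern, assuming the temporary directory lives on the same filesystem as the destination; finalize_bundle and the file names in the usage note are hypothetical and do not reflect the HfArtifactDownloader API.

```python
import os
import shutil
import tempfile
from pathlib import Path


def finalize_bundle(models_dir: Path, dir_name: str, files: dict) -> Path:
    """Write every file into a temp dir, then rename the dir into place in one step."""
    models_dir.mkdir(parents=True, exist_ok=True)
    final = models_dir / dir_name
    if final.exists():
        return final  # already finalized by an earlier run
    # Create the temp dir next to the target so the rename stays on one filesystem.
    tmp = Path(tempfile.mkdtemp(prefix=f".{dir_name}.", dir=models_dir))
    try:
        for name, data in files.items():
            (tmp / name).write_bytes(data)
        os.rename(tmp, final)  # readers see either no bundle or the complete one
    except BaseException:
        shutil.rmtree(tmp, ignore_errors=True)  # never leave a partial bundle behind
        raise
    return final
```

A Bundle with an encoder, decoder, and joiner would pass three entries in files, for instance {"encoder.onnx": ..., "decoder.onnx": ..., "joiner.onnx": ...}.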

Execution providers

All ONNX runtimes share one session builder that registers hardware execution providers before falling back to CPU. The onnx-tensorrt, onnx-cuda, and onnx-coreml cargo features compile in the corresponding providers; the default registration priority is TensorRT → CUDA → CoreML, restricted to whichever features are compiled in. The TENZRO_ONNX_EP environment variable overrides the priority as a comma-separated list drawn from tensorrt, cuda, coreml, cpu. A provider that fails to register logs a warning and falls through to the next, so a GPU-featured binary on a machine without the matching driver still serves on CPU.

A CUDA container image variant (Dockerfile.cuda) packages the GPU-featured binary with the CUDA runtime libraries on both x86_64 and arm64. No prebuilt arm64 (aarch64) GPU build of ONNX Runtime exists, so on that architecture the image compiles ONNX Runtime with the CUDA execution provider in a separate build stage rather than serving these modalities from CPU.
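
The priority rules can be sketched as a small resolver that turns the compiled-in features and an optional TENZRO_ONNX_EP value into a registration order. The source does not say how unknown names are handled or whether CPU is appended when the override omits it; the sketch assumes unknown names are skipped and CPU is always the last resort. The function name is hypothetical.

```python
DEFAULT_PRIORITY = ["tensorrt", "cuda", "coreml"]
KNOWN = {"tensorrt", "cuda", "coreml", "cpu"}


def resolve_providers(compiled_in, env_value=None):
    """Return the order in which execution providers would be tried."""
    if env_value:
        requested = [p.strip().lower() for p in env_value.split(",") if p.strip()]
    else:
        requested = list(DEFAULT_PRIORITY)
    order = []
    for name in requested:
        if name not in KNOWN:
            continue  # assumption: unrecognized names are ignored
        if name != "cpu" and name not in compiled_in:
            continue  # provider was not compiled into this binary
        if name not in order:
            order.append(name)
    if "cpu" not in order:
        order.append("cpu")  # assumption: CPU is always the final fallback
    return order
```

With only the CUDA feature compiled in and no override, the order is cuda then cpu; setting the override to "cuda, tensorrt" on a binary with both features puts CUDA ahead of TensorRT.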

Serving embeddings — local or network

A node can serve embeddings two ways. Local-first: download a catalog encoder onto the node's own disk and serve it from there. Network: route the request to a remote provider already serving that model. Both paths use the same tenzro_textEmbed call — the router picks a local runtime handle when the model is loaded here, and a provider otherwise.

tenzro_loadTextEmbeddingModel with just { model_id } fetches three things from HuggingFace as a co-located bundle into the persistent models directory: the ONNX graph, its model.onnx_data external-data sidecar when the export ships one (Qwen3-Embedding, EmbeddingGemma, and BGE-M3 do; ModernBERT-embed is self-contained), and the tokenizer. It then registers the encoder. The pooling family and dimensions are read from the catalog entry. To serve a self-hosted file already on disk, pass explicit path, tokenizer_path, and family instead.

# Local-first: download + serve a catalog encoder on this node
tenzro embed-text load --model qwen3-embedding-0.6b
tenzro embed-text list

# Embed strings (local runtime if loaded here, else a network provider)
tenzro embed-text run --model qwen3-embedding-0.6b \
  --input "the quick brown fox" --normalize

# JSON-RPC
tenzro_listTextEmbeddingCatalog | tenzro_listTextEmbeddingModels |
  tenzro_loadTextEmbeddingModel | tenzro_unloadTextEmbeddingModel |
  tenzro_textEmbed

# A2A skill: text-embed
# MCP tool: text_embed
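
The two ways of calling tenzro_loadTextEmbeddingModel differ only in their parameters. The sketch below builds both request bodies as JSON-RPC 2.0 envelopes. The field names model_id, path, tokenizer_path, and family come from the text above; wrapping the parameter object in a one-element params array, and requiring the three explicit fields together, are assumptions rather than documented behavior.

```python
import json


def load_text_embedding_request(model_id, path=None, tokenizer_path=None,
                                family=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for tenzro_loadTextEmbeddingModel."""
    params = {"model_id": model_id}
    explicit = {"path": path, "tokenizer_path": tokenizer_path, "family": family}
    given = {k: v for k, v in explicit.items() if v is not None}
    if given and len(given) != len(explicit):
        # Assumption: a self-hosted load needs all three explicit fields.
        raise ValueError("path, tokenizer_path and family must be passed together")
    params.update(given)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tenzro_loadTextEmbeddingModel",
        "params": [params],  # assumption: a single positional object
    }


# Catalog download: only the model id is needed.
catalog_body = json.dumps(load_text_embedding_request("qwen3-embedding-0.6b"))
```

For a self-hosted file, the same helper takes the three explicit fields, with values that depend on the local layout and on the family names the node accepts.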

OpenAI-compatible endpoints

Three modalities are reachable in the OpenAI wire shape as well as over tenzro_* JSON-RPC, so an existing OpenAI SDK client works unchanged by pointing its base URL at the node.

POST /v1/embeddings — input accepts a single string or an array; dimensions requests Matryoshka truncation on models that support it (EmbeddingGemma, Qwen3-Embedding). Alongside /v1/chat/completions, this endpoint is HTTP 402-gated when a payment gate is configured, and open otherwise.

curl https://rpc.tenzro.xyz/v1/embeddings \
  -H 'content-type: application/json' \
  -d '{
    "model": "qwen3-embedding-0.6b",
    "input": ["the quick brown fox", "lorem ipsum"],
    "dimensions": 512
  }'

# → { "object": "list",
#     "data": [ { "object": "embedding", "index": 0, "embedding": [...] }, ... ],
#     "model": "qwen3-embedding-0.6b",
#     "usage": { "prompt_tokens": 0, "total_tokens": 0 } }
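
Matryoshka truncation, as requested through dimensions, keeps the leading components of the full embedding. A client can apply the same operation to a full-size vector it already holds. The sketch below follows the usual convention of re-normalizing to unit length after truncating; whether the server re-normalizes is not stated here, so treat that step as an assumption.

```python
import math


def truncate_embedding(vec, dimensions, normalize=True):
    """Keep the first `dimensions` components and optionally rescale to unit length."""
    if not 0 < dimensions <= len(vec):
        raise ValueError("dimensions must be between 1 and len(vec)")
    head = list(vec[:dimensions])
    if not normalize:
        return head
    norm = math.sqrt(sum(x * x for x in head))
    # A zero vector cannot be normalized; return it unchanged.
    return head if norm == 0.0 else [x / norm for x in head]
```

Truncating [3.0, 4.0, 12.0] to two dimensions keeps [3.0, 4.0] and rescales it by its length of 5.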

POST /v1/audio/transcriptions — multipart/form-data over any transcriber loaded into the audio runtime. response_format selects json, text, verbose_json, srt, or vtt; the last three render per-segment time ranges, so requesting any of them makes the runtime emit segment timestamps whether or not timestamp_granularities asked for them. The body limit on this route is 128 MB — a chat-sized JSON limit would reject an ordinary audio file.

curl https://rpc.tenzro.xyz/v1/audio/transcriptions \
  -F file=@./interview.wav \
  -F model=parakeet-tdt-0.6b-v3 \
  -F response_format=verbose_json

# → { "task": "transcribe", "language": "en", "duration": 41.8,
#     "text": "…",
#     "segments": [ { "id": 0, "start": 0.0, "end": 3.2, "text": "…" }, ... ] }
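
The srt and vtt formats need per-segment time ranges because each cue is a numbered block with a start and end time. As an illustration, the sketch below renders SRT on the client from the segments array of a verbose_json response. It assumes segments carry start, end, and text as in the sample above, and it is not the server's renderer.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form SRT uses."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as SRT cues."""
    cues = []
    for n, seg in enumerate(segments, start=1):  # SRT cue numbers start at 1
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{n}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(cues)  # blank line between cues
```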

POST /v1/images/generations — text-to-image over the media-generation job queue. The route posts a job, waits for a worker to reach a terminal status under a bounded deadline, then returns the rendered bytes base64-encoded alongside a tenzro receipt block. requester_did and requester_address are required: the queue binds every job to the identity that posted it, which owns the price ceiling, is the only party that can cancel, and is who settlement charges.

curl https://rpc.tenzro.xyz/v1/images/generations \
  -H 'content-type: application/json' \
  -d '{
    "model": "flux-schnell",
    "prompt": "a plaster studio room at dawn",
    "size": "1024x1024",
    "requester_did": "did:tenzro:machine:…",
    "requester_address": "0x4b2c…"
  }'

# → { "created": 1780560000,
#     "data": [ { "b64_json": "iVBORw0KGgo…", "revised_prompt": null } ],
#     "tenzro": { "job_id": "mgen_7c1f…", "output_hash": "0x9a3e…",
#                 "seed_used": 1024, "price_paid": "820000000000000", ... } }
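
On the client side, a response of this shape can be split into decoded image bytes and the receipt. The sketch below assumes every data item carries b64_json and that price_paid is always a decimal integer string, as in the sample. The function name is hypothetical, and no unit is assumed for the price.

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # first eight bytes of any PNG file


def parse_image_response(response):
    """Return (list of decoded image bytes, receipt dict) from a response body."""
    images = [base64.b64decode(item["b64_json"]) for item in response["data"]]
    receipt = dict(response.get("tenzro", {}))
    if "price_paid" in receipt:
        # Sent as a string; assumption: it always parses as an integer.
        receipt["price_paid"] = int(receipt["price_paid"])
    return images, receipt
```

Checking that images[0] starts with PNG_MAGIC is a cheap sanity test before writing the bytes to disk, assuming the model returns PNG output.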

See OpenAI-compatible API for the full field tables and error codes on all three routes.

Cross-surface coverage

Each modality has matching JSON-RPC, MCP, A2A, CLI, and SDK paths. Example for vision:

# JSON-RPC
tenzro_listVisionCatalog | tenzro_listVisionModels |
  tenzro_loadVisionModel | tenzro_unloadVisionModel |
  tenzro_imageEmbed | tenzro_imageTextSimilarity

# CLI
tenzro embed-image catalog
tenzro embed-image load --model img --path /models/siglip2.onnx \
  --catalog-id siglip2-base-224
tenzro embed-image run --model img --image ./cat.png

# A2A skill: vision-embed
# MCP tools: vision_embed, vision_similarity

Modality flagged on events

TenzroEvent::ModelRegistered carries the modality so subscribers (UI, indexers, marketplace) can filter without an extra lookup.
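
A subscriber can then filter on the event alone. The sketch below assumes events arrive as JSON objects with a type and a modality field; that wire shape, the field names, and the modality strings are hypothetical, since only the Rust variant name is given here.

```python
def filter_by_modality(events, wanted):
    """Keep ModelRegistered events whose modality is in `wanted`."""
    return [
        e for e in events
        if e.get("type") == "ModelRegistered" and e.get("modality") in wanted
    ]


events = [
    {"type": "ModelRegistered", "model_id": "siglip2-base-224", "modality": "vision"},
    {"type": "ModelRegistered", "model_id": "parakeet-tdt-0.6b-v3", "modality": "audio-asr"},
    {"type": "SomethingElse", "modality": "vision"},
]
vision_only = filter_by_modality(events, {"vision"})
```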
