One runtime. Any model. Any infrastructure.
Open model serving across every modality the network supports.
Eight runtimes for eight kinds of intelligence.
Chat
GGUF llama.cpp runtime serving Qwen 3, Gemma 3, Mistral, Phi 3, DeepSeek V3, Granite — OpenAI-compatible streaming, KV-cache reuse.
Forecast
ONNX time-series runtime for TimesFM-class 200M foundation models: maps a context window to a forecast horizon, with optional quantile output.
Vision embed
ONNX image-encoder runtime for CLIP ViT-B/32 and L/14, SigLIP2 base/large/so400m, DINOv3 vits/vitb/vitl.
Text embed
ONNX text-encoder runtime for Qwen3-Embedding (0.6B/4B/8B), EmbeddingGemma-300M, BGE-M3, Snowflake Arctic Embed L v2.0.
Segmentation
Two-pass ONNX runtime for SAM 2 (base/large), EdgeSAM, MobileSAM — point/box-promptable mask prediction.
Detection
ONNX object-detection runtime for RF-DETR (n/s/m/b/l/2xl, 90-class COCO) and D-FINE (n/s/m/l/x, 80-class COCO).
Audio (ASR)
Speech-to-text runtime for Moonshine v2, Distil-Whisper, Whisper-v3-turbo, NVIDIA Parakeet-TDT-v3, Canary-1B-Flash.
Video embed
Video encoder runtime — frame extraction via ffmpeg, mean-pooled image embeddings, optional L2-normalization.
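The pooling step described for video embeddings can be sketched as follows. This is a minimal illustration, assuming each extracted frame has already been passed through an image encoder and arrives as a plain list of floats. The function name `pool_frame_embeddings` and its `l2_normalize` flag are hypothetical and are not taken from the runtime itself.

```python
import math
from typing import List, Sequence


def pool_frame_embeddings(
    frames: Sequence[Sequence[float]],
    l2_normalize: bool = True,
) -> List[float]:
    """Mean-pool per-frame embeddings into one video embedding.

    Hypothetical helper: the real runtime's interface is not documented here.
    """
    if not frames:
        raise ValueError("need at least one frame embedding")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frame embeddings must share one dimension")

    # Element-wise mean across frames.
    pooled = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

    if l2_normalize:
        # Scale to unit length; leave an all-zero vector untouched.
        norm = math.sqrt(sum(x * x for x in pooled))
        if norm > 0.0:
            pooled = [x / norm for x in pooled]
    return pooled
```

With L2 normalization enabled, the dot product of two pooled vectors equals their cosine similarity, which is usually why the option is offered.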
From request to receipt.
- 01 Provider registers: A provider stakes TNZO and publishes its hardware profile, model catalog, pricing, and availability schedule. The network health-checks the provider.
- 02 Client requests: A client submits an inference request typed by modality: Chat, Forecast, VisionEmbed, TextEmbed, Segment, Detect, Transcribe, or VideoEmbed.
- 03 Router dispatches: InferenceRouter selects a provider by strategy (price, latency, reputation, or weighted) and routes the request to the correct runtime.
- 04 Provider serves: The runtime executes the inference, returns the result, and emits a usage record. TEE attestation is optional for confidential execution.
- 05 Payment settles: Settlement in TNZO via per-call payment, micropayment channel, or batched settlement. Provider reputation is updated on success or failure with asymmetric weighting.
- 06 Receipt anchors: An optional on-chain ZK commitment binds the inference to the chain, verifiable by anyone and durable forever.
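The dispatch step above can be sketched as a small selection function. This is an illustration under stated assumptions, not the InferenceRouter implementation: the `Provider` fields, the `select_provider` name, the default weights, and the weighted scoring formula are all hypothetical. Only the four strategy names and the modality names come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, Tuple


class Strategy(Enum):
    PRICE = "price"
    LATENCY = "latency"
    REPUTATION = "reputation"
    WEIGHTED = "weighted"


@dataclass(frozen=True)
class Provider:
    name: str
    modalities: FrozenSet[str]  # e.g. {"Chat", "Detect"}
    price: float                # assumed unit: TNZO per call
    latency_ms: float           # assumed: recent typical latency
    reputation: float           # assumed: score in [0, 1]

    def __post_init__(self) -> None:
        # The weighted score divides by these, so they must be positive.
        if self.price <= 0 or self.latency_ms <= 0:
            raise ValueError("price and latency_ms must be positive")


def select_provider(
    providers: Iterable[Provider],
    modality: str,
    strategy: Strategy,
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),
) -> Provider:
    """Pick one provider that serves `modality` under `strategy`."""
    candidates = [p for p in providers if modality in p.modalities]
    if not candidates:
        raise LookupError(f"no provider serves {modality}")

    if strategy is Strategy.PRICE:
        return min(candidates, key=lambda p: p.price)
    if strategy is Strategy.LATENCY:
        return min(candidates, key=lambda p: p.latency_ms)
    if strategy is Strategy.REPUTATION:
        return max(candidates, key=lambda p: p.reputation)

    # WEIGHTED (assumed formula): price and latency are scored relative
    # to the best candidate so that all three terms fall in (0, 1].
    best_price = min(p.price for p in candidates)
    best_latency = min(p.latency_ms for p in candidates)
    w_price, w_latency, w_rep = weights

    def score(p: Provider) -> float:
        return (
            w_price * best_price / p.price
            + w_latency * best_latency / p.latency_ms
            + w_rep * p.reputation
        )

    return max(candidates, key=score)
```

Filtering by modality before scoring keeps a cheap provider that cannot serve the request from ever winning the selection.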
- Modalities: Chat, Forecast, Vision, TextEmbed, Segment, Detect, Audio, Video
- Runtimes: llama.cpp (GGUF), ONNX Runtime, optional CUDA / Metal acceleration
- Routing strategies: Price, Latency, Reputation, Weighted
- Catalogs: License-tier gated (Permissive, Attribution, CommercialCustom, NonCommercial)
- Payment: Per-call, per-token (micropayment channels), or batched in TNZO
- Verification: Optional TEE attestation; optional Plonky3 commitment for inference proof
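The reputation score behind the Reputation routing strategy is described as updated with asymmetric weighting, though the text does not say in which direction or by how much. The sketch below assumes that a failure costs more than a success earns. The function name `update_reputation`, the [0, 1] score range, and the `reward` and `penalty` rates are hypothetical values chosen for illustration.

```python
def update_reputation(
    score: float,
    success: bool,
    reward: float = 0.02,   # assumed: small gain per successful call
    penalty: float = 0.10,  # assumed: larger loss per failed call
) -> float:
    """Return the new reputation score, clamped to [0, 1]."""
    if success:
        # Move a small fraction of the remaining distance toward 1.
        score += reward * (1.0 - score)
    else:
        # Lose a larger fraction of the current score.
        score -= penalty * score
    return min(1.0, max(0.0, score))
```

Under these assumed rates, a provider at 0.5 rises to 0.51 after one success and drops to 0.45 after one failure, so several successful calls are needed to offset a single failure.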