Model Serving
Tenzro lets any node serve AI models and earn TNZO for inference. A single binary serves seven modalities: chat (GGUF via llama.cpp), plus timeseries forecasting, vision encoders, text embeddings, segmentation, object detection, and audio ASR (ONNX via ONNX Runtime, ORT). Video embeddings are also exposed, currently as a scaffold that falls back to per-frame vision pooling. Providers register their model endpoints on-chain and set per-token pricing, and the InferenceRouter dispatches typed payloads to the right runtime based on each model's declared modality.
Provider Flow
Earning flow: a user sends an inference request, the InferenceRouter picks your node, the model generates a response, the user pays per token, and TNZO settles to your wallet through a micropayment channel.
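To make the per-token billing concrete, here is a minimal sketch of the cost arithmetic for a single request. It is not Tenzro's settlement code: the micro-TNZO accounting unit, the function names, and the idea that every billable token is charged one flat price are assumptions made for illustration. Amounts are kept as integers to avoid floating-point rounding.

```typescript
// Illustrative accounting unit (an assumption): 1 TNZO = 1,000,000 micro-TNZO.
const MICRO_PER_TNZO = 1000000;

// Parse a decimal TNZO string such as "0.0001" into whole micro-TNZO.
// Rejects anything with more than 6 fractional digits.
function toMicro(tnzo: string): number {
  const match = /^(\d+)(?:\.(\d{1,6}))?$/.exec(tnzo);
  if (match === null) {
    throw new Error("invalid TNZO amount: " + tnzo);
  }
  const whole = match[1] || "0";
  let frac = match[2] || "";
  while (frac.length < 6) {
    frac += "0"; // right-pad so "0001" becomes "000100"
  }
  return parseInt(whole, 10) * MICRO_PER_TNZO + parseInt(frac, 10);
}

// Format micro-TNZO back into a decimal TNZO string without trailing zeros.
function formatMicro(micro: number): string {
  const whole = Math.floor(micro / MICRO_PER_TNZO);
  let frac = String(micro % MICRO_PER_TNZO);
  while (frac.length < 6) {
    frac = "0" + frac; // left-pad so 20000 becomes "020000"
  }
  frac = frac.replace(/0+$/, "");
  return frac.length > 0 ? whole + "." + frac : String(whole);
}

// Cost of one request: flat per-token price times the billable token count.
function inferenceCost(pricePerToken: string, billableTokens: number): number {
  if (billableTokens < 0 || Math.floor(billableTokens) !== billableTokens) {
    throw new Error("token count must be a non-negative integer");
  }
  return toMicro(pricePerToken) * billableTokens;
}

const costMicro = inferenceCost("0.0001", 200);
console.log(formatMicro(costMicro) + " TNZO"); // prints "0.02 TNZO"
```

At the example price of 0.0001 TNZO per token used later in this guide, a 200-token response costs 0.02 TNZO.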
Step 1: Download a Model
The node downloads GGUF models directly from HuggingFace via HTTP streaming. Each model is saved as a single flat file at ~/.tenzro/models/<model-id>.gguf, not in a per-model subdirectory. Progress is reported as a percentage during the download:
# Download a model from HuggingFace
tenzro model download unsloth/gemma-3-270m-it-GGUF
# Download a specific quantization
tenzro model download TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
--filename mistral-7b-instruct-v0.2.Q4_K_M.gguf
# List downloaded models
tenzro model list --local
# Check model integrity (SHA-256 verification)
tenzro model info unsloth/gemma-3-270m-it-GGUF
Step 2: Serve the Model
# Serve a model locally
tenzro model serve unsloth/gemma-3-270m-it-GGUF \
--port 8080 \
--ctx-size 4096 \
--gpu-layers 35
# The serve command:
# 1. Starts local inference server with the GGUF file
# 2. Calls tenzro_serveModel RPC to register the endpoint
# 3. Exposes OpenAI-compatible API at http://localhost:8080/v1
# Serve on the network (registers endpoint on-chain)
tenzro model serve unsloth/gemma-3-270m-it-GGUF \
--remote # Registers endpoint via tenzro_serveModel RPC
# Stop serving (waits for in-flight requests before unloading)
tenzro model stop unsloth/gemma-3-270m-it-GGUF
Step 3: Register as Provider
# Register as a model provider (requires staking)
tenzro provider register --role model-provider
# Stake TNZO (required for provider registration)
tenzro stake deposit --amount 1000 --role model-provider
# Set pricing for your model
tenzro provider pricing set \
--model gemma3-270m \
--per-token 0.0001 # TNZO per token
# Show current pricing
tenzro provider pricing show
# Set availability schedule
tenzro provider schedule set \
--timezone UTC \
--hours 0-24 # 24/7 availability
# Check provider status
tenzro provider status
Step 4: Chat and Earn
# Users can now chat with your model
tenzro chat --model gemma3-270m
# The chat command:
# 1. Tries local inference server first
# 2. Falls back to tenzro_chat RPC (routes to your provider)
# 3. Interactive REPL with /history and /load session management
# Or via RPC (for apps) — uses chat template for proper formatting
curl -X POST https://rpc.tenzro.network \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tenzro_chat",
"params": [{
"model_id": "gemma3-270m",
"message": "Hello!",
"max_tokens": 200
}],
"id": 1
}'
Multi-Modal Catalogs
Beyond chat, the node ships verified ONNX catalogs for six additional modalities; video is currently a scaffold (see the table below). Each catalog entry resolves a HuggingFace repo and filename, runtime parameters (input shape, embedding dimension, normalization preset), and a license tier. The CLI download and serve commands auto-detect the modality from the registry: chat models route through tenzro_chat, and the others through their dedicated RPC methods.
| Modality | Runtime | Sample models | Inference RPC |
|---|---|---|---|
| Chat | llama.cpp (GGUF) | Gemma 3, Qwen 3, Mistral, DeepSeek | tenzro_chat |
| Forecast | TimeseriesRuntime | Chronos-2, Chronos-Bolt, TimesFM 2.5, Granite-TTM-r2 | tenzro_forecast |
| Vision | VisionRuntime | CLIP, SigLIP2, DINOv3, DINOv2 | tenzro_imageEmbed, tenzro_imageTextSimilarity |
| Text embedding | TextEmbeddingRuntime | Qwen3-Embedding, EmbeddingGemma-300M, BGE-M3, Snowflake Arctic | tenzro_textEmbed |
| Segmentation | SegmentationRuntime | SAM 3 / 3.1, SAM 2, EdgeSAM, MobileSAM | tenzro_segment |
| Detection | DetectionRuntime | RF-DETR (n/s/m/b/l/2xl), D-FINE (n/s/m/l/x) | tenzro_detect |
| Audio (ASR) | AudioRuntime | Moonshine v2, Distil-Whisper, Whisper-v3-turbo, Parakeet-TDT-v3, Canary-1B-Flash | tenzro_transcribe |
| Video | VideoRuntime (scaffold) | Frame-extraction + per-frame vision pooling fallback | tenzro_videoEmbed |
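The table can be read as a lookup from a model's declared modality to the RPC methods that accept it. The sketch below encodes that lookup. It is not the InferenceRouter implementation: the modality identifiers, the CatalogEntry type, and the resolveRpcMethod function are hypothetical names, while the RPC method names come from the table.

```typescript
// Hypothetical modality identifiers; the registry's real labels may differ.
type Modality =
  | "chat" | "forecast" | "vision" | "text_embedding"
  | "segmentation" | "detection" | "audio" | "video";

// RPC methods per modality, copied from the catalog table.
const RPC_METHODS: Record<Modality, string[]> = {
  chat: ["tenzro_chat"],
  forecast: ["tenzro_forecast"],
  vision: ["tenzro_imageEmbed", "tenzro_imageTextSimilarity"],
  text_embedding: ["tenzro_textEmbed"],
  segmentation: ["tenzro_segment"],
  detection: ["tenzro_detect"],
  audio: ["tenzro_transcribe"],
  video: ["tenzro_videoEmbed"],
};

interface CatalogEntry {
  modelId: string;
  modality: Modality;
}

// Return the RPC method to call for a model. If the caller names a method,
// check that it is valid for the model's modality; otherwise use the first one.
function resolveRpcMethod(entry: CatalogEntry, requested?: string): string {
  const allowed = RPC_METHODS[entry.modality];
  if (requested === undefined) {
    const first = allowed[0];
    if (first === undefined) {
      throw new Error("no RPC method for modality " + entry.modality);
    }
    return first;
  }
  if (allowed.indexOf(requested) === -1) {
    throw new Error(requested + " does not accept " + entry.modality + " models");
  }
  return requested;
}

console.log(resolveRpcMethod({ modelId: "chronos-bolt-base", modality: "forecast" }));
// prints "tenzro_forecast"
```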
License tiers are enforced centrally by ModelRegistry::register_model():
| Tier | Licenses | Behavior |
|---|---|---|
| Permissive | Apache, MIT, BSD | Loads by default |
| Attribution | CC-BY-4.0 (Parakeet, Canary) | Logs attribution on first load |
| CommercialCustom | DINOv3, SAM, Gemma terms | Requires --accept-license <id> per family |
| NonCommercial | CC-BY-NC, OpenRAIL-M | Refuses to load without --accept-non-commercial |
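The tier rules amount to a small gate evaluated before a model loads. Below is a minimal sketch, assuming the ids passed to --accept-license are matched against a model family id. The LoadFlags shape, the family ids, and the function name are hypothetical; the real check lives in ModelRegistry::register_model().

```typescript
type LicenseTier = "Permissive" | "Attribution" | "CommercialCustom" | "NonCommercial";

// Hypothetical view of the relevant CLI flags.
interface LoadFlags {
  acceptedLicenses: string[];   // ids passed via --accept-license <id>
  acceptNonCommercial: boolean; // true when --accept-non-commercial is set
}

interface LicenseDecision {
  allowed: boolean;
  note: string; // attribution notice or the flag the operator still needs
}

function checkLicense(tier: LicenseTier, family: string, flags: LoadFlags): LicenseDecision {
  if (tier === "Permissive") {
    return { allowed: true, note: "" };
  }
  if (tier === "Attribution") {
    // Loads, but the caller is expected to log attribution on first load.
    return { allowed: true, note: "attribution required for " + family };
  }
  if (tier === "CommercialCustom") {
    if (flags.acceptedLicenses.indexOf(family) !== -1) {
      return { allowed: true, note: "" };
    }
    return { allowed: false, note: "pass --accept-license " + family };
  }
  // NonCommercial: refuse unless explicitly accepted.
  if (flags.acceptNonCommercial) {
    return { allowed: true, note: "" };
  }
  return { allowed: false, note: "pass --accept-non-commercial" };
}

const noFlags: LoadFlags = { acceptedLicenses: [], acceptNonCommercial: false };
console.log(checkLicense("CommercialCustom", "dinov3", noFlags).note);
// prints "pass --accept-license dinov3"
```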
Example: serve a forecast model end-to-end
# Browse the timeseries catalog
tenzro forecast catalog
# Download a Chronos-Bolt model (modality auto-detected from registry)
tenzro download chronos-bolt-base
# Serve it (loads into TimeseriesRuntime, registers the endpoint on-chain)
tenzro serve chronos-bolt-base
# Run a forecast against your local node
tenzro forecast --model chronos-bolt-base --series 1,2,3,4,5,6 --horizon 12
# Or via JSON-RPC against the testnet
curl -X POST https://rpc.tenzro.network \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tenzro_forecast",
"params": [{
"model_id": "chronos-bolt-base",
"history": [42.1, 43.0, 41.8, 44.2, 45.0],
"horizon": 16
}],
"id": 1
}'
The same shape applies to vision, text-embedding, segmentation, detection, audio, and video. See the CLI Reference and API Reference for the per-modality command and RPC surface.
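The request envelope is the same for every inference RPC; only the method name and the payload object change. A small typed helper can capture that. The helper name and the TypeScript interfaces below are illustrative, and only the tenzro_chat and tenzro_forecast payload fields shown in this guide are modeled. Payloads for the other modalities are left to the API Reference.

```typescript
// JSON-RPC 2.0 envelope with a single positional params object,
// matching the curl examples in this guide.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  method: string;
  params: P[];
  id: number;
}

// Payload fields taken from the tenzro_chat example.
interface ChatParams {
  model_id: string;
  message: string;
  max_tokens: number;
}

// Payload fields taken from the tenzro_forecast example.
interface ForecastParams {
  model_id: string;
  history: number[];
  horizon: number;
}

// Wrap a typed payload in the envelope. Hypothetical helper, not part of @tenzro/sdk.
function buildRequest<P>(method: string, payload: P, id: number): JsonRpcRequest<P> {
  return { jsonrpc: "2.0", method: method, params: [payload], id: id };
}

const chatRequest = buildRequest<ChatParams>(
  "tenzro_chat",
  { model_id: "gemma3-270m", message: "Hello!", max_tokens: 200 },
  1
);

const forecastRequest = buildRequest<ForecastParams>(
  "tenzro_forecast",
  { model_id: "chronos-bolt-base", history: [42.1, 43.0, 41.8, 44.2, 45.0], horizon: 16 },
  2
);

// Serialized body, ready to POST with Content-Type: application/json.
const forecastBody = JSON.stringify(forecastRequest);
console.log(forecastBody);
```

The serialized string is what the curl examples pass with -d, so it can be sent to the RPC URL with any HTTP client.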
Inference Routing Strategies
The InferenceRouter selects the best provider for each request using configurable strategies:
| Strategy | Description |
|---|---|
| Price | Route to the cheapest provider |
| Latency | Route to the fastest provider (lowest measured latency) |
| Reputation | Route to the highest-rated provider |
| Weighted | Balanced routing across all factors |
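A minimal sketch of how the four strategies could rank a pool of providers follows. The ProviderInfo fields, the 0 to 1 reputation scale, and in particular the Weighted formula (an equal-weight average of each factor scaled by the pool maximum) are assumptions; the guide does not specify how the InferenceRouter balances the factors.

```typescript
type RoutingStrategy = "Price" | "Latency" | "Reputation" | "Weighted";

// Hypothetical per-provider metrics used for ranking.
interface ProviderInfo {
  address: string;
  pricePerToken: number; // TNZO per token
  latencyMs: number;     // measured latency
  reputation: number;    // higher is better (assumed 0..1 scale)
}

function maxOf(values: number[]): number {
  return values.reduce((a, b) => Math.max(a, b), 0);
}

// Scale a value by the pool maximum, guarding against division by zero.
function ratio(value: number, max: number): number {
  return max > 0 ? value / max : 0;
}

// Lower routing cost is better under every strategy.
function routingCost(p: ProviderInfo, strategy: RoutingStrategy, pool: ProviderInfo[]): number {
  if (strategy === "Price") return p.pricePerToken;
  if (strategy === "Latency") return p.latencyMs;
  if (strategy === "Reputation") return -p.reputation;
  // Weighted (assumed formula): average of scaled price, scaled latency,
  // and the scaled reputation shortfall.
  const price = ratio(p.pricePerToken, maxOf(pool.map((q) => q.pricePerToken)));
  const latency = ratio(p.latencyMs, maxOf(pool.map((q) => q.latencyMs)));
  const rep = ratio(p.reputation, maxOf(pool.map((q) => q.reputation)));
  return (price + latency + (1 - rep)) / 3;
}

function pickProvider(pool: ProviderInfo[], strategy: RoutingStrategy): ProviderInfo {
  if (pool.length === 0) {
    throw new Error("no providers available");
  }
  return pool.reduce((best, p) =>
    routingCost(p, strategy, pool) < routingCost(best, strategy, pool) ? p : best
  );
}

// Placeholder addresses and made-up metrics, for illustration only.
const examplePool: ProviderInfo[] = [
  { address: "0xA", pricePerToken: 0.0001, latencyMs: 300, reputation: 0.9 },
  { address: "0xB", pricePerToken: 0.00005, latencyMs: 500, reputation: 0.6 },
  { address: "0xC", pricePerToken: 0.0002, latencyMs: 120, reputation: 0.95 },
];

console.log(pickProvider(examplePool, "Price").address);    // prints "0xB"
console.log(pickProvider(examplePool, "Weighted").address); // prints "0xA"
```

In this made-up pool, the cheapest provider (0xB) and the fastest one (0xC) each win a single-factor strategy, while the Weighted sketch picks 0xA because it is not the worst on any factor.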
Provider Health Monitoring
The ProviderManager runs background health checks on all registered providers. A provider that fails its health checks is temporarily removed from the routing pool (the circuit breaker pattern). Health metrics include response time, error rate, and availability. The tenzro_providerStats RPC reports a provider's uptime and average latency.
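The circuit breaker behaviour can be sketched as a small state machine. The failure threshold, the cooldown, and the choice to re-admit a provider for trial traffic once the cooldown elapses are assumptions; the ProviderManager's actual thresholds and recovery policy are not specified here.

```typescript
// Per-provider breaker state.
interface BreakerState {
  consecutiveFailures: number;
  openedAt: number | null; // ms timestamp when the provider was removed, or null
}

// Hypothetical tuning knobs.
interface BreakerConfig {
  failureThreshold: number; // failed checks in a row before removal
  cooldownMs: number;       // how long the provider stays out of the pool
}

function newBreaker(): BreakerState {
  return { consecutiveFailures: 0, openedAt: null };
}

// Fold one health-check result into the state. A success resets the breaker;
// reaching the threshold (or failing again while removed) restarts the cooldown.
function recordHealthCheck(
  state: BreakerState,
  ok: boolean,
  now: number,
  cfg: BreakerConfig
): BreakerState {
  if (ok) {
    return newBreaker();
  }
  const failures = state.consecutiveFailures + 1;
  const openedAt = failures >= cfg.failureThreshold ? now : state.openedAt;
  return { consecutiveFailures: failures, openedAt: openedAt };
}

// A provider is routable while the breaker is closed, or again once the
// cooldown has elapsed (trial traffic; the next check decides what happens).
function isRoutable(state: BreakerState, now: number, cfg: BreakerConfig): boolean {
  if (state.openedAt === null) {
    return true;
  }
  return now - state.openedAt >= cfg.cooldownMs;
}

const breakerCfg: BreakerConfig = { failureThreshold: 3, cooldownMs: 30000 };
let breaker = newBreaker();
breaker = recordHealthCheck(breaker, false, 0, breakerCfg);
breaker = recordHealthCheck(breaker, false, 1000, breakerCfg);
breaker = recordHealthCheck(breaker, false, 2000, breakerCfg); // third failure trips it
console.log(isRoutable(breaker, 2500, breakerCfg));  // prints false
console.log(isRoutable(breaker, 32000, breakerCfg)); // prints true
```

Passing the clock in as a parameter keeps the state transitions pure, which makes the removal and re-admission timing easy to test.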
# Check your provider stats
curl -X POST https://rpc.tenzro.network \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tenzro_providerStats",
"params": ["0xYourAddress..."],
"id": 1
}'
# Response:
# {
# "result": {
# "models_served": 2,
# "total_inferences": 15420,
# "total_earned": "154200000000000000000",
# "uptime_percent": 99.7,
# "avg_latency_ms": 245,
# "staked": "10000000000000000000000"
# }
# }
Model Endpoints
# List all model endpoints on the network
tenzro model endpoints
# Get details for a specific endpoint
tenzro model endpoint --model gemma3-270m
# Via RPC
curl -X POST https://rpc.tenzro.network \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tenzro_listModelEndpoints",
"params": [],
"id": 1
}'
SDK Usage
import { ProviderClient } from "@tenzro/sdk";
const provider = new ProviderClient({
rpcUrl: "https://rpc.tenzro.network",
walletKey: process.env.PROVIDER_KEY,
});
// Register and serve
await provider.register({ role: "model_provider", stake: "10000" });
await provider.serveModel({
modelId: "gemma3-270m",
endpoint: "http://localhost:8080/v1",
pricing: { perToken: "0.0001" },
});
// Set availability
await provider.setSchedule({
timezone: "UTC",
hours: { start: 0, end: 24 },
});