Model Serving

Tenzro lets any node serve AI models and earn TNZO for inference. A single binary serves seven modalities: chat (GGUF via llama.cpp) and, via ONNX Runtime (ORT), timeseries forecasting, vision encoders, text embeddings, segmentation, object detection, and audio ASR. An eighth, video embeddings, currently ships as a scaffold runtime. Providers register their model endpoints on-chain and set per-token pricing, and the InferenceRouter dispatches typed payloads to the right runtime based on each model's declared modality.

Provider Flow

Earning flow: a user sends an inference request, the router picks your node, the model generates a response, the user pays per token, and TNZO settles to your wallet through a micropayment channel.
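The per-token settlement in this flow is plain integer arithmetic. The sketch below assumes TNZO has 18 decimal places, which the base-unit strings in the provider stats example on this page suggest but this guide does not state. `parseTnzo`, `formatTnzo`, and `earnings` are hypothetical helpers for illustration, not part of the Tenzro SDK.

```typescript
// Assumption: 1 TNZO = 10^18 base units (not confirmed by this guide).
const DECIMALS = 18;
const ONE_TNZO = BigInt("1000000000000000000");

// Convert a non-negative decimal string such as "0.0001" into base units.
function parseTnzo(amount: string): bigint {
  const [whole, frac = ""] = amount.split(".");
  if (frac.length > DECIMALS) throw new Error("too many decimal places");
  return BigInt(whole) * ONE_TNZO + BigInt(frac.padEnd(DECIMALS, "0"));
}

// Render non-negative base units as a decimal string, trimming trailing zeros.
function formatTnzo(baseUnits: bigint): string {
  const whole = baseUnits / ONE_TNZO;
  const frac = (baseUnits % ONE_TNZO)
    .toString()
    .padStart(DECIMALS, "0")
    .replace(/0+$/, "");
  return frac.length > 0 ? `${whole}.${frac}` : whole.toString();
}

// Earnings for one request: tokens generated times the per-token price.
function earnings(tokens: number, perTokenPrice: string): bigint {
  return BigInt(tokens) * parseTnzo(perTokenPrice);
}
```

Working in integer base units avoids the rounding drift that floating-point prices such as 0.0001 would accumulate over many requests.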

Step 1: Download a Model

The node downloads GGUF models directly from HuggingFace via HTTP streaming. Each model is saved as a single flat file at ~/.tenzro/models/<model-id>.gguf (no subdirectories). Download progress is reported as percentage updates:

# Download a model from HuggingFace
tenzro model download unsloth/gemma-3-270m-it-GGUF

# Download a specific quantization
tenzro model download TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  --filename mistral-7b-instruct-v0.2.Q4_K_M.gguf

# List downloaded models
tenzro model list --local

# Check model integrity (SHA-256 verification)
tenzro model info unsloth/gemma-3-270m-it-GGUF
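
To double-check a download independently of the CLI, a SHA-256 digest can be computed with Node's built-in `crypto` module. This is a minimal sketch: `sha256Hex` and `verifyFile` are hypothetical helper names, and the expected digest has to come from a source you trust (for example, the file's page on HuggingFace). How the CLI itself stores or reports digests is not described in this guide.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare a file on disk against an expected hex digest (case-insensitive).
// Reads the whole file into memory, so it suits small models only.
function verifyFile(path: string, expectedHex: string): boolean {
  return sha256Hex(readFileSync(path)) === expectedHex.toLowerCase();
}
```

For multi-gigabyte GGUF files, feeding a read stream into the hash object in chunks would be the better choice; the synchronous read keeps the sketch short.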

Step 2: Serve the Model

# Serve a model locally
tenzro model serve unsloth/gemma-3-270m-it-GGUF \
  --port 8080 \
  --ctx-size 4096 \
  --gpu-layers 35

# The serve command:
# 1. Starts a local inference server with the GGUF file
# 2. Calls tenzro_serveModel RPC to register the endpoint
# 3. Exposes OpenAI-compatible API at http://localhost:8080/v1

# Serve on the network (registers endpoint on-chain)
tenzro model serve unsloth/gemma-3-270m-it-GGUF \
  --remote  # Registers endpoint via tenzro_serveModel RPC

# Stop serving (waits for in-flight requests before unloading)
tenzro model stop unsloth/gemma-3-270m-it-GGUF
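
Because the local server is described as OpenAI-compatible, a client can presumably talk to it with the standard chat completions request shape. The sketch below only builds the request; it assumes the server implements the `/chat/completions` route under `/v1` and accepts the model id shown, neither of which this guide confirms. `buildChatCompletion` is a hypothetical helper.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) an OpenAI-style chat completions request.
function buildChatCompletion(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  maxTokens = 200,
): PreparedRequest {
  return {
    // Strip trailing slashes so both ".../v1" and ".../v1/" work.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, max_tokens: maxTokens }),
    },
  };
}

// Usage (requires a running local server):
//   const { url, init } = buildChatCompletion(
//     "http://localhost:8080/v1", "gemma3-270m",
//     [{ role: "user", content: "Hello!" }]);
//   const response = await fetch(url, init);
```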

Step 3: Register as Provider

# Register as a model provider (requires staking)
tenzro provider register --role model-provider

# Stake TNZO (required for provider registration)
tenzro stake deposit --amount 1000 --role model-provider

# Set pricing for your model
tenzro provider pricing set \
  --model gemma3-270m \
  --per-token 0.0001  # TNZO per token

# Show current pricing
tenzro provider pricing show

# Set availability schedule
tenzro provider schedule set \
  --timezone UTC \
  --hours 0-24  # 24/7 availability

# Check provider status
tenzro provider status

Step 4: Chat and Earn

# Users can now chat with your model
tenzro chat --model gemma3-270m

# The chat command:
# 1. Tries local inference server first
# 2. Falls back to tenzro_chat RPC (routes to your provider)
# 3. Interactive REPL with /history and /load session management

# Or via RPC (for apps); the request uses the model's chat template for proper formatting
curl -X POST https://rpc.tenzro.network \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tenzro_chat",
    "params": [{
      "model_id": "gemma3-270m",
      "message": "Hello!",
      "max_tokens": 200
    }],
    "id": 1
  }'
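
The curl call above can be wrapped in a small typed helper for application code. This is a sketch under assumptions: the envelope follows JSON-RPC 2.0, but the shape of the `tenzro_chat` result is not documented here, so the result type is left generic. `rpcRequest` and `rpcCall` are hypothetical names, not SDK functions.

```typescript
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  method: string;
  params: P[];
  id: number;
}

let nextId = 1;

// Build a JSON-RPC 2.0 envelope with a single positional params object,
// matching the curl example's layout.
function rpcRequest<P>(method: string, params: P): JsonRpcRequest<P> {
  return { jsonrpc: "2.0", method, params: [params], id: nextId++ };
}

// Send the request and unwrap `result`, throwing on HTTP or RPC errors.
async function rpcCall<P, R>(url: string, method: string, params: P): Promise<R> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(rpcRequest(method, params)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const payload = (await res.json()) as {
    result?: R;
    error?: { code: number; message: string };
  };
  if (payload.error) {
    throw new Error(`RPC ${payload.error.code}: ${payload.error.message}`);
  }
  return payload.result as R;
}

// Usage (network call, result shape unknown):
//   const reply = await rpcCall("https://rpc.tenzro.network", "tenzro_chat",
//     { model_id: "gemma3-270m", message: "Hello!", max_tokens: 200 });
```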

Multi-Modal Catalogs

Beyond chat, the node ships verified ONNX catalogs for six additional modalities (video is currently a scaffold that falls back to per-frame vision pooling). Each catalog entry specifies a HuggingFace repo and filename, runtime parameters (input shape, embedding dimension, normalization preset), and a license tier. The CLI download and serve commands auto-detect the modality from the registry; chat models route through tenzro_chat, and the rest route through their dedicated RPC surfaces.

| Modality | Runtime | Sample models | Inference RPC |
| --- | --- | --- | --- |
| Chat | llama.cpp (GGUF) | Gemma 3, Qwen 3, Mistral, DeepSeek | tenzro_chat |
| Forecast | TimeseriesRuntime | Chronos-2, Chronos-Bolt, TimesFM 2.5, Granite-TTM-r2 | tenzro_forecast |
| Vision | VisionRuntime | CLIP, SigLIP2, DINOv3, DINOv2 | tenzro_imageEmbed, tenzro_imageTextSimilarity |
| Text embedding | TextEmbeddingRuntime | Qwen3-Embedding, EmbeddingGemma-300M, BGE-M3, Snowflake Arctic | tenzro_textEmbed |
| Segmentation | SegmentationRuntime | SAM 3 / 3.1, SAM 2, EdgeSAM, MobileSAM | tenzro_segment |
| Detection | DetectionRuntime | RF-DETR (n/s/m/b/l/2xl), D-FINE (n/s/m/l/x) | tenzro_detect |
| Audio (ASR) | AudioRuntime | Moonshine v2, Distil-Whisper, Whisper-v3-turbo, Parakeet-TDT-v3, Canary-1B-Flash | tenzro_transcribe |
| Video | VideoRuntime (scaffold) | Frame-extraction + per-frame vision pooling fallback | tenzro_videoEmbed |
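
A client that handles several modalities may want this mapping as data. The sketch below copies the RPC method names from the table; the modality keys (`"text_embedding"` and so on) are labels invented for this example and may not match the identifiers the registry uses.

```typescript
type Modality =
  | "chat" | "forecast" | "vision" | "text_embedding"
  | "segmentation" | "detection" | "audio" | "video";

// RPC methods per modality, taken from the catalog table.
const RPC_BY_MODALITY: Record<Modality, string[]> = {
  chat: ["tenzro_chat"],
  forecast: ["tenzro_forecast"],
  vision: ["tenzro_imageEmbed", "tenzro_imageTextSimilarity"],
  text_embedding: ["tenzro_textEmbed"],
  segmentation: ["tenzro_segment"],
  detection: ["tenzro_detect"],
  audio: ["tenzro_transcribe"],
  video: ["tenzro_videoEmbed"],
};

// True when `method` is one of the RPCs listed for `modality`.
function supportsMethod(modality: Modality, method: string): boolean {
  return RPC_BY_MODALITY[modality].indexOf(method) >= 0;
}

// Reverse lookup: which modality does an RPC method belong to?
function modalityOf(method: string): Modality | undefined {
  const keys = Object.keys(RPC_BY_MODALITY) as Modality[];
  for (const key of keys) {
    if (supportsMethod(key, method)) return key;
  }
  return undefined;
}
```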

License tiers are enforced centrally by ModelRegistry::register_model():

Permissive (Apache/MIT/BSD): loads by default.
Attribution (CC-BY-4.0; Parakeet, Canary): logs attribution on first load.
CommercialCustom (DINOv3, SAM, Gemma terms): requires --accept-license <id> per family.
NonCommercial (CC-BY-NC, OpenRAIL-M): refuses to load without --accept-non-commercial.
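
The tier rules can be read as a small decision function. The sketch below is an illustration of the rules as stated, not the actual ModelRegistry implementation: `checkLicense`, `LoadFlags`, and the family identifiers are hypothetical, and the "first load only" aspect of attribution logging is left out.

```typescript
type LicenseTier = "Permissive" | "Attribution" | "CommercialCustom" | "NonCommercial";

// Stand-ins for the CLI flags --accept-license <id> and --accept-non-commercial.
interface LoadFlags {
  acceptedLicenses: string[];
  acceptNonCommercial: boolean;
}

type Decision =
  | { allowed: true; notice?: string }
  | { allowed: false; reason: string };

// Decide whether a model family may load under the given flags.
function checkLicense(tier: LicenseTier, family: string, flags: LoadFlags): Decision {
  switch (tier) {
    case "Permissive":
      return { allowed: true };
    case "Attribution":
      return { allowed: true, notice: `attribution required for ${family}` };
    case "CommercialCustom":
      return flags.acceptedLicenses.indexOf(family) >= 0
        ? { allowed: true }
        : { allowed: false, reason: `pass --accept-license ${family}` };
    case "NonCommercial":
      return flags.acceptNonCommercial
        ? { allowed: true }
        : { allowed: false, reason: "pass --accept-non-commercial" };
  }
}
```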

Example: serve a forecast model end-to-end

# Browse the timeseries catalog
tenzro forecast catalog

# Download a Chronos-Bolt model (modality auto-detected from registry)
tenzro download chronos-bolt-base

# Serve it (loads into TimeseriesRuntime, registers the endpoint on-chain)
tenzro serve chronos-bolt-base

# Run a forecast against your local node
tenzro forecast --model chronos-bolt-base --series 1,2,3,4,5,6 --horizon 12

# Or via JSON-RPC against the testnet
curl -X POST https://rpc.tenzro.network \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tenzro_forecast",
    "params": [{
      "model_id": "chronos-bolt-base",
      "history": [42.1, 43.0, 41.8, 44.2, 45.0],
      "horizon": 16
    }],
    "id": 1
  }'

The same shape applies to vision, text-embedding, segmentation, detection, audio, and video — see the CLI Reference and API Reference for the per-modality command and RPC surface.

Inference Routing Strategies

The InferenceRouter selects the best provider for each request using configurable strategies:

| Strategy | Description |
| --- | --- |
| Price | Route to the cheapest provider |
| Latency | Route to the fastest provider (lowest measured latency) |
| Reputation | Route to the highest-rated provider |
| Weighted | Balanced routing across all factors |
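
One way to picture these strategies is as different cost functions over the same provider pool. The sketch below is not the InferenceRouter's actual algorithm: the `Provider` fields, the min-max normalization, and the equal weights used for the weighted strategy are all assumptions made for illustration.

```typescript
interface Provider {
  id: string;
  pricePerToken: number; // TNZO per token
  latencyMs: number;     // measured latency
  reputation: number;    // higher is better, e.g. 0..1
}

type Strategy = "price" | "latency" | "reputation" | "weighted";

// Scale a value into 0..1 relative to the pool; 0 when all values are equal.
function normalize(value: number, min: number, max: number): number {
  return max === min ? 0 : (value - min) / (max - min);
}

// Pick the provider with the lowest cost under the chosen strategy.
function pickProvider(pool: Provider[], strategy: Strategy): Provider {
  if (pool.length === 0) throw new Error("no providers available");
  const prices = pool.map((p) => p.pricePerToken);
  const latencies = pool.map((p) => p.latencyMs);
  const reputations = pool.map((p) => p.reputation);

  const cost = (p: Provider): number => {
    switch (strategy) {
      case "price":
        return p.pricePerToken;
      case "latency":
        return p.latencyMs;
      case "reputation":
        return -p.reputation; // negate so higher reputation costs less
      case "weighted":
        // Assumed equal weighting of the three normalized factors.
        return (
          normalize(p.pricePerToken, Math.min(...prices), Math.max(...prices)) +
          normalize(p.latencyMs, Math.min(...latencies), Math.max(...latencies)) +
          (1 - normalize(p.reputation, Math.min(...reputations), Math.max(...reputations)))
        ) / 3;
    }
  };

  return pool.reduce((best, p) => (cost(p) < cost(best) ? p : best));
}
```

Normalizing before combining keeps a factor with large raw values, such as latency in milliseconds, from dominating the weighted score.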

Provider Health Monitoring

The ProviderManager runs background health checks on all registered providers. Providers that fail health checks are temporarily removed from the routing pool (circuit breaker pattern). Health metrics include response time, error rate, and availability:

# Check your provider stats
curl -X POST https://rpc.tenzro.network \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tenzro_providerStats",
    "params": ["0xYourAddress..."],
    "id": 1
  }'

# Response:
# {
#   "result": {
#     "models_served": 2,
#     "total_inferences": 15420,
#     "total_earned": "154200000000000000000",
#     "uptime_percent": 99.7,
#     "avg_latency_ms": 245,
#     "staked": "10000000000000000000000"
#   }
# }
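
The circuit breaker behaviour described in this section can be sketched as a small state machine. The failure threshold and cooldown below are assumed values, and the `CircuitBreaker` class is hypothetical; the ProviderManager's real thresholds and recovery policy are not documented here.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  // Defaults are illustrative assumptions, not Tenzro's actual settings.
  constructor(failureThreshold = 3, cooldownMs = 30000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Record the outcome of one health check at time `now` (milliseconds).
  record(ok: boolean, now: number): void {
    if (ok) {
      // A successful check closes the breaker and resets the count.
      this.failures = 0;
      this.openedAt = null;
      return;
    }
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      // Too many consecutive failures: open (or re-open) the breaker.
      this.openedAt = now;
    }
  }

  // Routable unless the breaker is open and still inside its cooldown.
  isRoutable(now: number): boolean {
    return this.openedAt === null || now - this.openedAt >= this.cooldownMs;
  }
}
```

After the cooldown the provider becomes routable again on a trial basis; a further failed check re-opens the breaker immediately because the failure count was never reset.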

Model Endpoints

# List all model endpoints on the network
tenzro model endpoints

# Get details for a specific endpoint
tenzro model endpoint --model gemma3-270m

# Via RPC
curl -X POST https://rpc.tenzro.network \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tenzro_listModelEndpoints",
    "params": [],
    "id": 1
  }'

SDK Usage

import { ProviderClient } from "@tenzro/sdk";

const provider = new ProviderClient({
  rpcUrl: "https://rpc.tenzro.network",
  walletKey: process.env.PROVIDER_KEY,
});

// Register and serve
await provider.register({ role: "model_provider", stake: "10000" });
await provider.serveModel({
  modelId: "gemma3-270m",
  endpoint: "http://localhost:8080/v1",
  pricing: { perToken: "0.0001" },
});

// Set availability
await provider.setSchedule({
  timezone: "UTC",
  hours: { start: 0, end: 24 },
});

Related Documentation

Models — Available models and registry
Inference — Making inference requests
Streaming Inference — Real-time token streaming
Micropayments — Per-token billing channels
MicroNode — Getting started as a provider