Become an AI Model Provider
Become an AI model provider on the Tenzro Network. Download a GGUF model from HuggingFace, serve it with the local inference server, register as a provider, set per-token pricing, and earn TNZO for every inference request you handle. This tutorial covers the full provider lifecycle from hardware detection to revenue monitoring.
What You'll Do
- Detect hardware capabilities and get optimal model recommendations
- Download a GGUF model from HuggingFace with SHA-256 verification
- Serve the model locally via the local inference server with GPU offloading
- Register as a provider and stake TNZO
- Set per-token pricing (input and output rates)
- Advertise the endpoint to the network
- Monitor health, load, and earnings
Step 1: Check Hardware Profile
Start by checking your hardware capabilities. The CLI detects CPU, GPU, memory, and TEE support, then recommends the largest model you can serve efficiently:
# Check hardware profile for optimal configuration
tenzro hardware
# Output:
# Hardware Profile:
# CPU: Apple M2 Max (12 cores)
# Memory: 64 GB
# GPU: Apple M2 Max (38 cores, 64 GB shared)
# Disk: 1.8 TB available
# TEE: None detected (simulation mode)
#
# Recommended configuration:
# Max model size: 30B parameters (Q4_K_M)
# GPU layers: all (unified memory)
# Context length: 8192
# Concurrent reqs: 4
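Before downloading anything, you can sanity-check the recommendation with a back-of-the-envelope memory estimate. The sketch below is not part of the Tenzro CLI; the ~0.6 bytes-per-parameter figure for Q4_K_M weights and the flat allowance for KV cache and runtime overhead are rough assumptions.

// Rough memory estimate for serving a Q4_K_M-quantized GGUF model.
// Assumptions: ~0.6 bytes per parameter for Q4_K_M weights, plus a flat
// 2 GB allowance for KV cache and runtime overhead. Real usage depends on
// context length, batch size, and the runtime itself.
fn estimated_memory_gb(params_billions: f64) -> f64 {
    let weights_gb = params_billions * 0.6;
    let kv_and_overhead_gb = 2.0;
    weights_gb + kv_and_overhead_gb
}

fn main() {
    for params in [0.8, 8.0, 30.0, 70.0] {
        println!("{params:>5.1}B params -> ~{:.1} GB", estimated_memory_gb(params));
    }
    // On a 64 GB machine, a 30B model at Q4_K_M (~20 GB) leaves plenty of
    // headroom, which is consistent with the recommendation above.
}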
Step 2: Download a Model
Download a GGUF-quantized model from HuggingFace. The CLI uses the hf-hub crate and verifies the download with SHA-256 hashing:
# Download a GGUF model from HuggingFace
tenzro model download unsloth/Qwen3.5-0.8B-GGUF --quantization Q4_K_M
# Expected output:
# Downloading Qwen3.5-0.8B-GGUF (Q4_K_M)...
# Source: huggingface.co/unsloth/Qwen3.5-0.8B-GGUF
# File: qwen3.5-0.8b.Q4_K_M.gguf (508 MB)
# Progress: [========================================] 100%
# SHA-256: a1b2c3d4...ef56 (verified)
# Saved to: ~/.tenzro/models/qwen3.5-0.8b.Q4_K_M.gguf
# List locally downloaded models
tenzro model list --local
# Output:
# Local Models:
# qwen3.5-0.8b.Q4_K_M 508 MB verified
# gemma3-270m.Q4_K_M 241 MB verified
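If you ever want to re-check a file by hand, the verification the CLI performs is a plain SHA-256 over the downloaded bytes. The sketch below is an independent re-implementation using the sha2 crate, not Tenzro code; compare the result against the digest published on the model's HuggingFace file listing.

// Recompute the SHA-256 of a downloaded GGUF file (standalone sketch,
// requires the `sha2` crate; not part of the Tenzro CLI).
use sha2::{Digest, Sha256};
use std::{fs::File, io::{self, Read}};

fn sha256_hex(path: &str) -> io::Result<String> {
    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 { break; }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize().iter().map(|b| format!("{b:02x}")).collect())
}

fn main() -> io::Result<()> {
    // Adjust to your models directory (~ is not expanded by Rust).
    let path = "/home/you/.tenzro/models/qwen3.5-0.8b.Q4_K_M.gguf";
    let expected = "a1b2c3d4..."; // full digest from the HuggingFace file listing
    let actual = sha256_hex(path)?;
    println!("{}", if actual == expected { "verified" } else { "MISMATCH" });
    Ok(())
}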
Step 3: Serve the Model
Start serving the model with the local inference server. The --gpu-layers flag controls how many transformer layers are offloaded to the GPU:
# Serve the model locally
tenzro model serve qwen3.5-0.8b.Q4_K_M \
--port 8081 \
--context-length 4096 \
--threads 8 \
--gpu-layers 35
# Output:
# Starting inference server...
# Model: qwen3.5-0.8b.Q4_K_M
# Port: 8081
# Context: 4096 tokens
# GPU layers: 35/35 offloaded
# Status: ready
# Endpoint: http://localhost:8081/v1/chat/completions
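Before registering anything on-chain, it is worth hitting the local endpoint directly. The sketch below assumes the /v1/chat/completions route accepts a standard OpenAI-style request body, which the path suggests but this tutorial does not guarantee; adjust the payload if your server version differs.

// Local smoke test against the server started above.
// Assumes an OpenAI-style chat completions payload (an assumption based on
// the endpoint path). Requires tokio, serde_json, and reqwest with the
// `json` feature.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "qwen3.5-0.8b",
        "messages": [{ "role": "user", "content": "Say hello in five words." }],
        "max_tokens": 32
    });

    let text = reqwest::Client::new()
        .post("http://localhost:8081/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .text()
        .await?;

    println!("{text}");
    Ok(())
}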
Step 4: Register as a Provider
Register on the network and stake TNZO. The stake serves as a quality guarantee; providers with higher stakes receive more inference traffic:
# Register as a model provider on the network
tenzro provider register \
--role model-provider \
--stake 1000
# Output:
# Provider registered successfully
# Provider ID: prov-7f3a9c1e
# Role: ModelProvider
# Stake: 1000 TNZO
# Status: Active
Step 5: Set Pricing
Set per-token pricing for your model. Input tokens and output tokens have separate rates since generation is more compute-intensive than prompt processing:
# Set pricing for your model (TNZO per 1000 tokens)
tenzro provider pricing set \
--model qwen3.5-0.8b \
--input-price 0.05 \
--output-price 0.15
# Output:
# Pricing updated:
# Model: qwen3.5-0.8b
# Input: 0.05 TNZO / 1K tokens
# Output: 0.15 TNZO / 1K tokens
# Show current pricing
tenzro provider pricing show
# Output:
# Provider Pricing:
# qwen3.5-0.8b
# Input: 0.05 TNZO / 1K tokens
# Output: 0.15 TNZO / 1K tokens
Step 6: Advertise to the Network
Use the --advertise flag to register your endpoint with the model registry. This makes your model discoverable by users and agents:
# Serve the model on the network (registers endpoint with registry)
tenzro model serve qwen3.5-0.8b.Q4_K_M \
--port 8081 \
--context-length 4096 \
--threads 8 \
--gpu-layers 35 \
--advertise # makes endpoint visible to the network
# Output:
# Model endpoint registered on network
# Model: qwen3.5-0.8b
# Endpoint: https://your-ip:8081/v1/chat/completions
# MCP endpoint: https://your-ip:8081/mcp
# Status: serving
Step 7: Set Availability Schedule
Configure when your provider is active. Useful for home setups where you want to serve during off-peak hours:
# Set provider availability schedule
tenzro schedule set \
--weekdays 09:00-21:00 \
--weekends 10:00-18:00
# Output:
# Schedule updated:
# Mon-Fri: 09:00 - 21:00 UTC
# Sat-Sun: 10:00 - 18:00 UTC
# Enable/disable serving
tenzro schedule enable
tenzro schedule disable
Step 8: Handle Inference Requests (Rust SDK)
The Rust SDK covers the same flow programmatically: register as a provider, then serve the model (pricing stays as set in Step 5). The node handles request routing, payment verification, and settlement automatically:
use tenzro_sdk::{TenzroClient, config::SdkConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = SdkConfig::testnet();
    let client = TenzroClient::connect(config).await?;

    // Register as provider programmatically
    let provider = client.provider();
    let tx_hash = provider.register(
        vec!["qwen3.5-0.8b".to_string()],
        1000,
    ).await?;
    println!("Provider registration tx: {}", tx_hash);

    // Serve model and handle requests
    provider.serve_model("qwen3.5-0.8b.Q4_K_M").await?;

    // The node handles incoming inference requests automatically.
    // Each request triggers:
    //   1. Payment verification (MPP or x402)
    //   2. Inference execution via the local inference server
    //   3. Token counting and billing
    //   4. Settlement on-chain
    println!("Serving model. Press Ctrl+C to stop.");
    tokio::signal::ctrl_c().await?;

    // Stop serving
    provider.stop_model("qwen3.5-0.8b").await?;
    Ok(())
}
Payment flow. When a user sends an inference request, the node verifies payment via MPP (session-based streaming) or x402 (stateless one-shot). Tokens are counted after generation, the cost is calculated at your pricing rate, and settlement happens on-chain. The 0.5% network fee is deducted automatically.
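To make the billing concrete, here is the per-request arithmetic implied by the pricing from Step 5. Where exactly rounding happens and at which point the 0.5% fee is deducted are assumptions for illustration; the node's settlement logic is authoritative.

// Per-request cost arithmetic implied by the Step 5 pricing. Rounding and
// the exact point where the 0.5% network fee is applied are assumptions.
const INPUT_PRICE_PER_1K: f64 = 0.05;  // TNZO per 1K input tokens
const OUTPUT_PRICE_PER_1K: f64 = 0.15; // TNZO per 1K output tokens
const NETWORK_FEE: f64 = 0.005;        // 0.5% deducted automatically

fn request_cost(input_tokens: u64, output_tokens: u64) -> (f64, f64) {
    let gross = input_tokens as f64 / 1000.0 * INPUT_PRICE_PER_1K
        + output_tokens as f64 / 1000.0 * OUTPUT_PRICE_PER_1K;
    let provider_net = gross * (1.0 - NETWORK_FEE);
    (gross, provider_net)
}

fn main() {
    // A short prompt with a paragraph-length answer:
    let (gross, net) = request_cost(12, 87);
    println!("user pays ~{gross:.4} TNZO, provider receives ~{net:.4} TNZO");
}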
Step 9: Test with Chat
# Test your model with the interactive chat
tenzro chat --model qwen3.5-0.8b
# Or send a single message via CLI
tenzro model inference \
--model qwen3.5-0.8b \
--prompt "What is the Tenzro Network?"
# Output:
# The Tenzro Network is a decentralized protocol designed for the AI age...
#
# Usage: 12 input tokens, 87 output tokens
# Cost: 0.0139 TNZO
Step 10: Monitor Health and Earnings
# Check provider status and earnings
tenzro provider status
# Output:
# Provider Status:
# Provider ID: prov-7f3a9c1e
# Role: ModelProvider
# Stake: 1000 TNZO
# Status: Active
# Uptime: 4h 23m
# Models Serving: 1
#
# Model: qwen3.5-0.8b
# Total Requests: 847
# Input Tokens: 1,234,567
# Output Tokens: 456,789
# Revenue: 91.35 TNZO
# Avg Latency: 234ms
# Error Rate: 0.2%
# Check endpoint health
tenzro model endpoints
# Output:
# Model Endpoints:
# qwen3.5-0.8b
# API: https://your-ip:8081/v1/chat/completions [healthy]
# MCP: https://your-ip:8081/mcp [healthy]
# Load: 43% (847 req/4h)
Beyond LLMs: Serving Multi-Modal Models
The same provider node can serve seven modalities, not just chat. Each modality has its own ONNX runtime, its own catalog, and its own RPC method. The InferenceRouter reads the model's modality field from the registry and dispatches incoming requests to the correct runtime (see the dispatch sketch after the list below); pricing, latency, and reputation strategies apply per-modality with independent provider pools.
- Chat (LLM): tenzro_chat — covered above
- Forecast: tenzro_forecast — Chronos-2, Chronos-Bolt small/base, TimesFM 2.5, Granite-TTM-r2
- Vision: tenzro_visionEmbed / tenzro_visionSimilarity / tenzro_visionClassify — CLIP, SigLIP2, DINOv3 / DINOv2
- Text embeddings: tenzro_textEmbed — Qwen3-Embedding, EmbeddingGemma, BGE-M3, Snowflake Arctic
- Segmentation: tenzro_segment — SAM 3 / 3.1, SAM 2, EdgeSAM, MobileSAM
- Detection: tenzro_detect — RF-DETR (nano-2xl), D-FINE (n-x)
- Audio (ASR): tenzro_transcribe — Moonshine v2, Distil-Whisper, Whisper-Turbo, Parakeet-TDT, Canary
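Conceptually, the dispatch is a straight mapping from the registry's modality tag to a runtime and its RPC method, as sketched below. The RPC names are the ones listed above; the enum and function names are illustrative, not the actual InferenceRouter types.

// Modality-based dispatch, sketched. The RPC method names are from the list
// above; the type and function names are illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Modality {
    Chat,
    Forecast,
    Vision,
    TextEmbed,
    Segment,
    Detect,
    Transcribe,
}

fn rpc_method(modality: Modality) -> &'static str {
    match modality {
        Modality::Chat => "tenzro_chat",
        Modality::Forecast => "tenzro_forecast",
        Modality::Vision => "tenzro_visionEmbed", // also Similarity / Classify
        Modality::TextEmbed => "tenzro_textEmbed",
        Modality::Segment => "tenzro_segment",
        Modality::Detect => "tenzro_detect",
        Modality::Transcribe => "tenzro_transcribe",
    }
}

fn main() {
    // A registry entry tagged with the forecast modality routes like this:
    println!("dispatching via {}", rpc_method(Modality::Forecast));
}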
A node serves a non-LLM model with the same workflow as above — tenzro model download, tenzro model serve --advertise, and the registry handles routing. Multi-file bundles (encoder + decoder, or encoder + decoder + joiner for Parakeet) are downloaded atomically by HfArtifactDownloader in Bundle mode; single-file ONNX exports use SingleFile mode. Both verify SHA-256 per file and finalize via tmp-dir-rename.
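The download-then-publish pattern is worth spelling out: every artifact of a bundle lands in a temporary directory, each file is checksummed, and only then is the directory renamed into place, so a half-finished bundle is never visible to the serving path. The sketch below illustrates that pattern generically; it is not the HfArtifactDownloader implementation, and the verification closure stands in for a real per-file SHA-256 check like the one shown in Step 2.

// Generic tmp-dir-then-rename sketch: verify every file of a bundle in a
// temporary directory, then atomically rename it into its final location.
// Illustrative only; not the HfArtifactDownloader implementation.
use std::{fs, io, path::Path};

fn finalize_bundle<F>(tmp: &Path, dst: &Path, files: &[&str], verify: F) -> io::Result<()>
where
    F: Fn(&Path) -> io::Result<bool>, // e.g. a per-file SHA-256 check
{
    for &name in files {
        if !verify(&tmp.join(name))? {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("checksum mismatch: {name}"),
            ));
        }
    }
    // Either the whole bundle appears at `dst`, or none of it does.
    fs::rename(tmp, dst)
}

fn main() -> io::Result<()> {
    // Toy run: write three placeholder bundle files, then publish them.
    let (tmp, dst) = (Path::new("parakeet-tdt.tmp"), Path::new("parakeet-tdt"));
    fs::create_dir_all(tmp)?;
    for name in ["encoder.onnx", "decoder.onnx", "joiner.onnx"] {
        fs::write(tmp.join(name), b"placeholder")?;
    }
    finalize_bundle(tmp, dst, &["encoder.onnx", "decoder.onnx", "joiner.onnx"], |_p: &Path| Ok(true))?;
    println!("bundle published at {}", dst.display());
    Ok(())
}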
License-tier gating. Models are tagged Permissive (default), Attribution (CC-BY-4.0 logged), CommercialCustom (DINOv3, SAM family, Gemma family — require --accept-license <id>), or NonCommercial (refused without --accept-non-commercial). The registry refuses to load license-gated models without the matching flag. Acknowledgments are recorded per-family in CF_MODELS so the gate clears across restarts.
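Reduced to its essentials, the gate is a check at model-load time that maps the license tier to the flag it requires. The tier names below mirror the tags above; the struct and function names are illustrative, not the registry's actual API.

// License-tier gate, sketched. Tier names mirror the registry tags above;
// struct and function names are illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum LicenseTier {
    Permissive,       // default: loads with no flag
    Attribution,      // e.g. CC-BY-4.0: loads, attribution is logged
    CommercialCustom, // requires --accept-license <id>
    NonCommercial,    // requires --accept-non-commercial
}

struct LoadRequest {
    accepted_license_ids: Vec<String>, // from --accept-license or prior acknowledgments
    accept_non_commercial: bool,       // from --accept-non-commercial
}

fn license_gate(tier: LicenseTier, family: &str, req: &LoadRequest) -> Result<(), String> {
    match tier {
        LicenseTier::Permissive => Ok(()),
        LicenseTier::Attribution => {
            println!("attribution logged for {family}");
            Ok(())
        }
        LicenseTier::CommercialCustom if req.accepted_license_ids.iter().any(|id| id.as_str() == family) => Ok(()),
        LicenseTier::CommercialCustom => Err(format!("refused: pass --accept-license {family}")),
        LicenseTier::NonCommercial if req.accept_non_commercial => Ok(()),
        LicenseTier::NonCommercial => Err("refused: pass --accept-non-commercial".to_string()),
    }
}

fn main() {
    let req = LoadRequest { accepted_license_ids: vec![], accept_non_commercial: false };
    // The SAM family is CommercialCustom, so this load is refused:
    println!("{:?}", license_gate(LicenseTier::CommercialCustom, "sam", &req));
}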
For per-modality walkthroughs see: Forecast with Chronos-Bolt, Embed images with DINOv3, Embed text with Qwen3-Embedding, Transcribe with Whisper-Turbo, Segment with SAM 2, and Detect with RF-DETR.
Client-Side Usage (TypeScript)
Users consume your model using the TypeScript or Rust SDK. Here is how a client sends a chat completion request:
import { TenzroClient, TESTNET_CONFIG } from "tenzro-sdk";
const client = new TenzroClient(TESTNET_CONFIG);
// Request inference from a provider on the network
const response = await client.inference.request(
  "qwen3.5-0.8b",
  "Explain zero-knowledge proofs in one paragraph.",
  256,
);
console.log(response.output);
console.log(`Tokens: ${response.tokens_used}`);
console.log(`Cost: ${response.cost} TNZO`);
What You Learned
- Hardware detection — profiling your machine for optimal model selection
- Model downloading — HuggingFace GGUF downloads with SHA-256 verification
- Local model serving — GPU-offloaded inference via the local inference server with configurable context and threading
- Provider registration — staking TNZO and setting per-token pricing
- Network advertising — making endpoints discoverable via the model registry
- Payment verification — automatic MPP/x402 payment handling per request
- Monitoring — tracking request volume, latency, error rates, and revenue
Next Steps
- See the TEE Confidential Computing tutorial to serve models inside a secure enclave
- See the Join Testnet as Provider tutorial for the quickstart version
- Read the Run a Validator Node tutorial to also validate blocks