
Become an AI Model Provider

AI Models · Intermediate · 30 min

Become an AI model provider on the Tenzro Network. Download a GGUF model from HuggingFace, serve it with the local inference server, register as a provider, set per-token pricing, and earn TNZO for every inference request you handle. This tutorial covers the full provider lifecycle from hardware detection to revenue monitoring.

What You'll Do

  • Detect hardware capabilities and get optimal model recommendations
  • Download a GGUF model from HuggingFace with SHA-256 verification
  • Serve the model locally via the local inference server with GPU offloading
  • Register as a provider and stake TNZO
  • Set per-token pricing (input and output rates)
  • Advertise the endpoint to the network
  • Monitor health, load, and earnings

Step 1: Check Hardware Profile

Start by checking your hardware capabilities. The CLI detects CPU, GPU, memory, and TEE support, then recommends the largest model you can serve efficiently:

# Check hardware profile for optimal configuration
tenzro hardware

# Output:
# Hardware Profile:
#   CPU:    Apple M2 Max (12 cores)
#   Memory: 64 GB
#   GPU:    Apple M2 Max (38 cores, 64 GB shared)
#   Disk:   1.8 TB available
#   TEE:    None detected (simulation mode)
#
# Recommended configuration:
#   Max model size:   30B parameters (Q4_K_M)
#   GPU layers:       all (unified memory)
#   Context length:   8192
#   Concurrent reqs:  4
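
The recommendation above is easy to sanity-check. This is a back-of-envelope sketch, not the CLI's actual heuristic: Q4_K_M quantization averages roughly 4.5 bits per weight, so usable memory divided by that footprint bounds the model size.

```rust
// Rough sketch (not the CLI's real logic): estimate the largest Q4_K_M
// model that fits in memory, reserving headroom for KV cache and the OS.
fn max_params_billions(memory_gb: f64, headroom_gb: f64) -> f64 {
    let usable_bytes = (memory_gb - headroom_gb) * 1e9;
    let bytes_per_param = 4.5 / 8.0; // Q4_K_M ≈ 4.5 bits per weight
    usable_bytes / bytes_per_param / 1e9 // billions of parameters
}

fn main() {
    // 64 GB unified memory minus 16 GB headroom bounds raw weights at ~85B,
    // so serving a 30B model with a full context window is comfortable.
    println!("max ≈ {:.0}B params", max_params_billions(64.0, 16.0));
}
```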

Step 2: Download a Model

Download a GGUF-quantized model from HuggingFace. The CLI uses the hf-hub crate and verifies the downloaded file against its SHA-256 checksum:

# Download a GGUF model from HuggingFace
tenzro model download unsloth/Qwen3.5-0.8B-GGUF --quantization Q4_K_M

# Expected output:
# Downloading Qwen3.5-0.8B-GGUF (Q4_K_M)...
#   Source: huggingface.co/unsloth/Qwen3.5-0.8B-GGUF
#   File: qwen3.5-0.8b.Q4_K_M.gguf (508 MB)
#   Progress: [========================================] 100%
#   SHA-256: a1b2c3d4...ef56 (verified)
#   Saved to: ~/.tenzro/models/qwen3.5-0.8b.Q4_K_M.gguf

# List locally downloaded models
tenzro model list --local

# Output:
# Local Models:
#   qwen3.5-0.8b.Q4_K_M       508 MB    verified
#   gemma3-270m.Q4_K_M        241 MB    verified

Step 3: Serve the Model

Start serving the model with the local inference server. The --gpu-layers flag controls how many transformer layers are offloaded to the GPU:

# Serve the model locally
tenzro model serve qwen3.5-0.8b.Q4_K_M \
  --port 8081 \
  --context-length 4096 \
  --threads 8 \
  --gpu-layers 35

# Output:
# Starting inference server...
#   Model: qwen3.5-0.8b.Q4_K_M
#   Port: 8081
#   Context: 4096 tokens
#   GPU layers: 35/35 offloaded
#   Status: ready
#   Endpoint: http://localhost:8081/v1/chat/completions
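
The --gpu-layers trade-off reduces to simple arithmetic: offload as many layers as fit in free VRAM. The per-layer and VRAM figures below are illustrative assumptions, not measured values.

```rust
// Sketch of the offload decision: put as many layers on the GPU as fit.
// The 14 MB/layer and 2 GB figures are illustrative, not measured.
fn offload_layers(total_layers: u32, layer_mb: u32, free_vram_mb: u32) -> u32 {
    (free_vram_mb / layer_mb).min(total_layers)
}

fn main() {
    // A 35-layer model at ~14 MB/layer fits entirely in 2 GB of VRAM.
    println!("offload {} of 35 layers", offload_layers(35, 14, 2048)); // prints "offload 35 of 35 layers"
}
```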

Step 4: Register as a Provider

Register on the network and stake TNZO. The stake serves as a quality guarantee — providers with higher stakes receive more inference traffic:

# Register as a model provider on the network
tenzro provider register \
  --role model-provider \
  --stake 1000

# Output:
# Provider registered successfully
#   Provider ID:  prov-7f3a9c1e
#   Role:         ModelProvider
#   Stake:        1000 TNZO
#   Status:       Active
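
The link between stake and traffic can be illustrated with weighted selection. This is a sketch, not the network's documented scheduler, and the second provider ID below is made up.

```rust
// Illustrative stake-weighted pick: a provider wins with probability
// proportional to its stake. `r` is a uniform draw in [0, total_stake).
fn pick_by_stake<'a>(providers: &[(&'a str, u64)], r: u64) -> &'a str {
    let mut acc = 0u64;
    for (id, stake) in providers {
        acc += stake;
        if r < acc {
            return id;
        }
    }
    providers.last().expect("non-empty pool").0
}

fn main() {
    // A provider staking 3000 TNZO is picked 3x as often as one staking 1000.
    let pool = [("prov-7f3a9c1e", 1000u64), ("prov-b2d4e6f8", 3000)];
    println!("{}", pick_by_stake(&pool, 2500)); // prints "prov-b2d4e6f8"
}
```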

Step 5: Set Pricing

Set per-token pricing for your model. Input tokens and output tokens have separate rates since generation is more compute-intensive than prompt processing:

# Set pricing for your model (TNZO per 1000 tokens)
tenzro provider pricing set \
  --model qwen3.5-0.8b \
  --input-price 0.05 \
  --output-price 0.15

# Output:
# Pricing updated:
#   Model: qwen3.5-0.8b
#   Input:  0.05 TNZO / 1K tokens
#   Output: 0.15 TNZO / 1K tokens

# Show current pricing
tenzro provider pricing show

# Output:
# Provider Pricing:
#   qwen3.5-0.8b
#     Input:  0.05 TNZO / 1K tokens
#     Output: 0.15 TNZO / 1K tokens
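
At these rates, the billed amount for a request is linear in the token counts. A sketch of the arithmetic (raw usage cost only, before any network fee):

```rust
// Per-token billing at separate input/output rates, quoted per 1K tokens.
fn cost_tnzo(input_tokens: u64, output_tokens: u64, in_per_1k: f64, out_per_1k: f64) -> f64 {
    input_tokens as f64 / 1000.0 * in_per_1k + output_tokens as f64 / 1000.0 * out_per_1k
}

fn main() {
    // 12 input + 87 output tokens at 0.05 / 0.15 TNZO per 1K tokens
    println!("{:.5} TNZO", cost_tnzo(12, 87, 0.05, 0.15)); // prints "0.01365 TNZO"
}
```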

Step 6: Advertise to the Network

Use the --advertise flag to register your endpoint with the model registry. This makes your model discoverable by users and agents:

# Serve the model on the network (registers endpoint with registry)
tenzro model serve qwen3.5-0.8b.Q4_K_M \
  --port 8081 \
  --context-length 4096 \
  --threads 8 \
  --gpu-layers 35 \
  --advertise  # makes endpoint visible to the network

# Output:
# Model endpoint registered on network
#   Model: qwen3.5-0.8b
#   Endpoint: https://your-ip:8081/v1/chat/completions
#   MCP endpoint: https://your-ip:8081/mcp
#   Status: serving

Step 7: Set Availability Schedule

Configure when your provider is active. Useful for home setups where you want to serve during off-peak hours:

# Set provider availability schedule
tenzro schedule set \
  --weekdays 09:00-21:00 \
  --weekends 10:00-18:00

# Output:
# Schedule updated:
#   Mon-Fri: 09:00 - 21:00 UTC
#   Sat-Sun: 10:00 - 18:00 UTC

# Enable/disable serving
tenzro schedule enable
tenzro schedule disable
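
Under the hood the check reduces to a window comparison. This sketch assumes minutes-since-midnight UTC and days numbered 0 = Monday through 6 = Sunday; the CLI's internal representation is not documented.

```rust
// Sketch of the availability check: pick the weekday or weekend window,
// then test whether the current UTC time falls inside it.
fn is_serving(day: u8, minute_utc: u16, weekday: (u16, u16), weekend: (u16, u16)) -> bool {
    let (start, end) = if day < 5 { weekday } else { weekend };
    minute_utc >= start && minute_utc < end
}

fn main() {
    let weekday = (9 * 60, 21 * 60);  // 09:00-21:00
    let weekend = (10 * 60, 18 * 60); // 10:00-18:00
    // Wednesday 10:00 UTC is inside the weekday window.
    println!("{}", is_serving(2, 10 * 60, weekday, weekend)); // prints "true"
}
```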

Step 8: Handle Inference Requests (Rust SDK)

The Rust SDK covers the full programmatic flow for registering, pricing, and serving. The node handles request routing, payment verification, and settlement automatically:

use tenzro_sdk::{TenzroClient, config::SdkConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = SdkConfig::testnet();
    let client = TenzroClient::connect(config).await?;

    // Register as provider programmatically
    let provider = client.provider();
    let tx_hash = provider.register(
        vec!["qwen3.5-0.8b".to_string()],
        1000
    ).await?;
    println!("Provider registration tx: {}", tx_hash);

    // Serve model and handle requests
    provider.serve_model("qwen3.5-0.8b.Q4_K_M").await?;

    // The node handles incoming inference requests automatically.
    // Each request triggers:
    //   1. Payment verification (MPP or x402)
    //   2. Inference execution via the local inference server
    //   3. Token counting and billing
    //   4. Settlement on-chain

    println!("Serving model. Press Ctrl+C to stop.");
    tokio::signal::ctrl_c().await?;

    // Stop serving
    provider.stop_model("qwen3.5-0.8b").await?;
    Ok(())
}

Payment flow. When a user sends an inference request, the node verifies payment via MPP (session-based streaming) or x402 (stateless one-shot). Tokens are counted after generation, the cost is calculated at your pricing rate, and settlement happens on-chain. The 0.5% network fee is deducted automatically.
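
The settlement split is simple arithmetic. A sketch, assuming the 0.5% fee comes out of the billed amount as described above; on-chain rounding rules are not specified here.

```rust
// Split a billed amount into the provider's net revenue and the network fee.
fn settle(billed_tnzo: f64, network_fee_rate: f64) -> (f64, f64) {
    let fee = billed_tnzo * network_fee_rate;
    (billed_tnzo - fee, fee)
}

fn main() {
    // 91.35 TNZO billed at a 0.5% network fee
    let (provider_net, fee) = settle(91.35, 0.005);
    println!("provider: {:.4} TNZO, network: {:.4} TNZO", provider_net, fee);
}
```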

Step 9: Test with Chat

# Test your model with the interactive chat
tenzro chat --model qwen3.5-0.8b

# Or send a single message via CLI
tenzro model inference \
  --model qwen3.5-0.8b \
  --prompt "What is the Tenzro Network?"

# Output:
# The Tenzro Network is a decentralized protocol designed for the AI age...
#
# Usage: 12 input tokens, 87 output tokens
# Cost:  0.0139 TNZO

Step 10: Monitor Health and Earnings

# Check provider status and earnings
tenzro provider status

# Output:
# Provider Status:
#   Provider ID:     prov-7f3a9c1e
#   Role:            ModelProvider
#   Stake:           1000 TNZO
#   Status:          Active
#   Uptime:          4h 23m
#   Models Serving:  1
#
#   Model: qwen3.5-0.8b
#     Total Requests:   847
#     Input Tokens:     1,234,567
#     Output Tokens:    456,789
#     Revenue:          91.35 TNZO
#     Avg Latency:      234ms
#     Error Rate:       0.2%

# Check endpoint health
tenzro model endpoints

# Output:
# Model Endpoints:
#   qwen3.5-0.8b
#     API:    https://your-ip:8081/v1/chat/completions  [healthy]
#     MCP:    https://your-ip:8081/mcp                   [healthy]
#     Load:   43% (847 req/4h)

Beyond LLMs: Serving Multi-Modal Models

The same provider node can serve seven modalities, not just chat. Each modality has its own ONNX runtime, its own catalog, and its own RPC method. The InferenceRouter reads the model's modality field from the registry and dispatches incoming requests to the correct runtime; pricing, latency, and reputation strategies apply per modality, with independent provider pools.
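
A minimal sketch of that dispatch, with illustrative enum variants and runtime names rather than the actual InferenceRouter types:

```rust
// The router reads the registry's modality tag and picks a runtime.
#[derive(Debug, PartialEq)]
enum Modality {
    Chat,
    Embedding,
    Transcription,
    Forecast,
    Segmentation,
    Detection,
}

fn runtime_for(modality: &Modality) -> &'static str {
    match modality {
        Modality::Chat => "llama-runtime",
        // Every non-LLM modality is served by its own ONNX runtime instance.
        _ => "onnx-runtime",
    }
}

fn main() {
    println!("{}", runtime_for(&Modality::Transcription)); // prints "onnx-runtime"
}
```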

A node serves a non-LLM model with the same workflow as above — tenzro model download, tenzro model serve --advertise, and the registry handles routing. Multi-file bundles (encoder + decoder, or encoder + decoder + joiner for Parakeet) are downloaded atomically by HfArtifactDownloader in Bundle mode; single-file ONNX exports use SingleFile mode. Both verify SHA-256 per file and finalize via tmp-dir-rename.
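
The tmp-dir-rename finalize can be sketched with the standard library alone. The real HfArtifactDownloader also verifies each file's SHA-256 before renaming, which is omitted here; function and file names are illustrative.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Write every bundle file into a scratch directory, then rename the whole
// directory into place in one step, so a reader of the final path never
// observes a half-written bundle.
fn finalize_bundle(files: &[(&str, &[u8])], final_dir: &Path) -> std::io::Result<()> {
    let tmp = final_dir.with_extension("tmp");
    fs::create_dir_all(&tmp)?;
    for (name, bytes) in files {
        fs::File::create(tmp.join(name))?.write_all(bytes)?;
    }
    fs::rename(&tmp, final_dir) // atomic on the same filesystem
}

fn main() -> std::io::Result<()> {
    let dest = std::env::temp_dir().join("bundle-demo");
    let _ = fs::remove_dir_all(&dest);
    let files: &[(&str, &[u8])] = &[("encoder.onnx", b"stub"), ("decoder.onnx", b"stub")];
    finalize_bundle(files, &dest)?;
    println!("bundle finalized at {}", dest.display());
    Ok(())
}
```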

License-tier gating. Models are tagged Permissive (default), Attribution (CC-BY-4.0 logged), CommercialCustom (DINOv3, SAM family, Gemma family — require --accept-license <id>), or NonCommercial (refused without --accept-non-commercial). The registry refuses to load license-gated models without the matching flag. Acknowledgments are recorded per-family in CF_MODELS so the gate clears across restarts.
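
The gate amounts to a match over the tier plus the acknowledgment flags. Enum and function names here are illustrative, not the registry's actual types:

```rust
// Permissive and Attribution load freely (Attribution just logs the credit);
// gated tiers require the matching acknowledgment flag.
#[derive(Debug)]
enum LicenseTier {
    Permissive,
    Attribution,
    CommercialCustom,
    NonCommercial,
}

fn may_load(tier: &LicenseTier, accepted_license: bool, accepted_non_commercial: bool) -> bool {
    match tier {
        LicenseTier::Permissive | LicenseTier::Attribution => true,
        LicenseTier::CommercialCustom => accepted_license, // --accept-license <id>
        LicenseTier::NonCommercial => accepted_non_commercial, // --accept-non-commercial
    }
}

fn main() {
    // A CommercialCustom model (e.g. the Gemma family) is refused
    // until --accept-license has been acknowledged.
    println!("{}", may_load(&LicenseTier::CommercialCustom, false, false)); // prints "false"
}
```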

For per-modality walkthroughs see: Forecast with Chronos-Bolt, Embed images with DINOv3, Embed text with Qwen3-Embedding, Transcribe with Whisper-Turbo, Segment with SAM 2, and Detect with RF-DETR.

Client-Side Usage (TypeScript)

Users consume your model using the TypeScript or Rust SDK. Here is how a client sends a chat completion request:

import { TenzroClient, TESTNET_CONFIG } from "tenzro-sdk";

const client = new TenzroClient(TESTNET_CONFIG);

// Request inference from a provider on the network
const response = await client.inference.request(
  "qwen3.5-0.8b",
  "Explain zero-knowledge proofs in one paragraph.",
  256,
);

console.log(response.output);
console.log(`Tokens: ${response.tokens_used}`);
console.log(`Cost: ${response.cost} TNZO`);

What You Learned

  • How to size a model to your hardware and download a verified GGUF
  • How to serve, register, stake, price, and advertise a model as a provider
  • How per-token billing, the 0.5% network fee, and on-chain settlement work
  • How to schedule availability and monitor health, load, and earnings

Next Steps