Run a Tenzro Train trainer node
This tutorial walks the trainer flow end to end: install the Python reference trainer, enroll into a posted run, run inner SGD steps locally, and submit outer gradients to the syncer. Phase 1 is timeseries-first on the Open tier — any GPU (or even CPU for small fragments) works; no TEE required.
Architecture: Rust protocol + Python trainer
Tenzro Train splits cleanly into two layers. The Rust protocol crate (tenzro-training) handles coordination, aggregation, and on-chain settlement. The Python reference trainer (integrations/trainer/) wraps PyTorch FSDP2 + Hivemind + safetensors and owns the inner training loop. Communication is JSON-RPC plus the gossip topic tenzro/training/1.0.0.
You run both: the Rust node binds to the network and routes RPC calls; the Python trainer talks to the node and runs the actual SGD steps on your GPU.
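The split above can be made concrete with a minimal sketch of the trainer side: every trainer action is a JSON-RPC 2.0 call against the node's RPC address. This shows only the envelope shape, not the transport, signing, or gossip layers that the real trainer handles.

```python
import json

def make_rpc_request(method: str, params: dict, req_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 envelope the Python trainer sends to the
    Rust node's RPC endpoint (transport itself omitted)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    })

# Envelope for the gradient-submission method covered later on this page.
req = make_rpc_request("tenzro_training_submitOuterGradient", {
    "task_id": "train-2026-04-25-timesfm-200m",
})
```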
Prerequisites
- A registered did:tenzro:machine:... identity (auto-created when you join the network).
- Enough TNZO to post the trainer stake (example: 100 TNZO).
- A GPU with 16+ GB of VRAM for the TimesFM 200M reference run. Smaller models work on consumer hardware.
- Python 3.11+, Rust toolchain (for building the node), CUDA 12+.
1. Install the Python reference trainer
# Clone the repo and install the reference trainer
git clone https://github.com/tenzro/tenzro-network.git
cd tenzro-network/integrations/trainer
# Python 3.11+ recommended; CUDA-enabled PyTorch for GPU runs
pip install -e .[timeseries]
# Verify the trainer can talk to the local node
tenzro-trainer --version
The [timeseries] extra pulls in PyTorch, Hivemind, safetensors, and the TimesFM / Chronos-Bolt / Granite-TTM adapters from the Phase 1 model catalog.
2. Start a Tenzro node
The Python trainer dispatches every gradient through tenzro_training_* RPC calls. Run a local node so you control the keypair signing those calls. A light-client role is sufficient; you don't need to validate consensus.
# In a separate terminal — start a Tenzro node so the trainer
# has somewhere to send gradients and receive aggregation results.
cargo run --release --bin tenzro-node -- \
--role light-client \
--data-dir /var/lib/tenzro \
--listen-addr /ip4/0.0.0.0/tcp/9000 \
--rpc-addr 127.0.0.1:8545 \
--boot-nodes /dnsaddr/rpc.tenzro.network/tcp/9000
3. Find a run to join
List runs that are still in awaiting_enrollment and pick one whose architecture matches your hardware. Confirm the tier is Open (Phase 1 only ships Open) and that any min_throughput requirement is achievable on your GPU.
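The same filtering can be done programmatically against the list-runs JSON. The run shapes and field names below (status, tier, modality, min_throughput) are assumptions for illustration; check the actual list-runs output for the real schema.

```python
# Hypothetical list-runs output; field names are assumptions.
runs = [
    {"task_id": "train-2026-04-25-timesfm-200m", "status": "awaiting_enrollment",
     "tier": "Open", "modality": "Timeseries", "min_throughput": None},
    {"task_id": "train-2026-05-01-chronos-base", "status": "running",
     "tier": "Open", "modality": "Timeseries", "min_throughput": 4.0},
]

def joinable(run: dict, modality: str = "Timeseries") -> bool:
    """Keep runs still accepting trainers on the Open tier in the right modality."""
    return (run["status"] == "awaiting_enrollment"
            and run["tier"] == "Open"
            and run["modality"] == modality)

candidates = [r["task_id"] for r in runs if joinable(r)]
```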
# List runs that are awaiting trainers
tenzro train list-runs --rpc http://127.0.0.1:8545
# Get full task spec for one — confirm tier=Open, modality=Timeseries,
# and that you have hardware that meets min_throughput (if set).
tenzro train get-run \
--task-id train-2026-04-25-timesfm-200m \
--rpc http://127.0.0.1:8545 \
--format json | jq '.task_spec.architecture, .task_spec.min_throughput'
4. Enroll
Enrollment posts a stake bond, registers your DID with the syncer, and returns your shard assignments. Stake is slashed if you submit malicious gradients (caught by the aggregation rule) or go offline for too many consecutive rounds.
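The --stake value in the enroll command is denominated in attoTNZO (10^-18 TNZO), the same unit the reward figures later on this page use. A quick conversion helper:

```python
ATTO_PER_TNZO = 10 ** 18  # balances and stakes are quoted in attoTNZO

def tnzo_to_atto(amount_tnzo: int) -> int:
    """Convert whole TNZO to the atto units the CLI expects for --stake."""
    return amount_tnzo * ATTO_PER_TNZO

stake = tnzo_to_atto(100)  # the 100 TNZO example stake
```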
tenzro train enroll-trainer \
--task-id train-2026-04-25-timesfm-200m \
--did did:tenzro:machine:abc123... \
--stake 100000000000000000000 \
--rpc http://127.0.0.1:8545
# Returns:
# {
# "task_id": "train-2026-04-25-timesfm-200m",
# "trainer_did": "did:tenzro:machine:abc123...",
# "shard_assignments": [0, 3, 7],
# "status": "enrolled"
# }
shard_assignments tells you which fragments you're responsible for. The syncer balances assignments across enrolled trainers and adjusts on eviction or grace-window timeouts.
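To build intuition for what the syncer does, here is a toy round-robin balancer. This is purely illustrative; the syncer's actual assignment policy is not documented here.

```python
def assign_shards(trainers: list[str], num_fragments: int) -> dict[str, list[int]]:
    """Illustrative round-robin shard balancing across enrolled trainers.
    The real syncer policy (and its rebalancing on eviction) may differ."""
    assignments: dict[str, list[int]] = {t: [] for t in trainers}
    for frag in range(num_fragments):
        assignments[trainers[frag % len(trainers)]].append(frag)
    return assignments

out = assign_shards(["did:a", "did:b", "did:c"], 8)
# out["did:a"] == [0, 3, 6]
```

When a trainer is evicted or times out past the grace window, its fragments would be redistributed the same way across the remaining trainers.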
5. Configure the trainer
The trainer config wires together your DID, the local node, the task ID, and the inner-loop hyperparameters. Inner-loop values must match the task spec — mismatched inner_steps will produce gradients the syncer rejects.
# trainer.yaml
node_rpc: "http://127.0.0.1:8545"
trainer_did: "did:tenzro:machine:abc123..."
trainer_keypair: "/etc/tenzro/trainer.ed25519"
task_id: "train-2026-04-25-timesfm-200m"
# Inner training loop (Phase 1 reference)
inner:
optimizer: "adamw"
lr: 3.0e-4
weight_decay: 0.01
beta1: 0.9
beta2: 0.95
inner_steps: 24 # H — must match task spec
batch_size: 64
# Hardware
device: "cuda:0"
dtype: "bf16"
gradient_checkpointing: true
# Networking
gossip_listen: "/ip4/0.0.0.0/tcp/9100"
hivemind_initial_peers: []  # populated from syncer announce
6. Run the trainer
The reference trainer enters a continuous loop: pull weights, run H inner steps, submit outer gradient, wait for syncer aggregation, repeat. It writes structured logs and exposes a Prometheus endpoint at :9101 by default.
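The numerical core of that loop can be sketched with toy scalars. This is not the real FSDP2/Hivemind implementation; it only shows the shape of steps 2 and 3: run H inner SGD steps locally, then report the delta between the pulled weights and the updated weights as the outer gradient.

```python
def inner_steps(weights, grads_fn, lr, H):
    """Run H local SGD steps and return the updated weights."""
    w = list(weights)
    for _ in range(H):
        g = grads_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def outer_gradient(w_start, w_end):
    """DiLoCo-style outer gradient: delta between the weights pulled
    from the syncer and the weights after H inner steps."""
    return [ws - we for ws, we in zip(w_start, w_end)]

w0 = [1.0, -2.0]
grads = lambda w: [2 * wi for wi in w]  # toy quadratic loss
wH = inner_steps(w0, grads, lr=0.1, H=24)
delta = outer_gradient(w0, wH)  # this is what gets submitted per round
```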
# Start the inner training loop. The trainer:
# 1. Pulls the latest aggregated weights from the syncer
# 2. Runs H=24 inner SGD steps on its assigned shards
# 3. Computes the outer gradient delta
# 4. Submits to tenzro_training_submitOuterGradient
# 5. Repeats until the run finishes or it's evicted
tenzro-trainer run --config trainer.yaml
If you want to drive submission yourself (custom inner loop, alternate framework, or testing), here's the underlying RPC call:
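The curl example that follows shows the wire format; assembling the params in your own code might look like the sketch below. Using sha256 for payload_hash is an assumption here, and the signature (produced over the payload with your trainer keypair) is left as a placeholder; consult the RPC reference for the actual hash and signature schemes.

```python
import base64
import hashlib

def build_submit_params(task_id: str, trainer_did: str, round_no: int,
                        fragment_index: int, tensor_bytes: bytes) -> dict:
    """Assemble params for tenzro_training_submitOuterGradient.
    sha256 is assumed for payload_hash; signing is omitted."""
    return {
        "task_id": task_id,
        "trainer_did": trainer_did,
        "round": round_no,
        "fragment_index": fragment_index,
        "gradient_payload": base64.b64encode(tensor_bytes).decode(),
        "payload_hash": "0x" + hashlib.sha256(tensor_bytes).hexdigest(),
        "signature": "0x...",  # sign with your ed25519 trainer keypair
    }
```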
# If you're driving submission from your own code, the underlying
# RPC is straightforward — gradients are submitted per-fragment, per-round.
curl http://127.0.0.1:8545 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tenzro_training_submitOuterGradient",
"params": {
"task_id": "train-2026-04-25-timesfm-200m",
"trainer_did": "did:tenzro:machine:abc123...",
"round": 12,
"fragment_index": 3,
"gradient_payload": "<base64 safetensors>",
"payload_hash": "0x...",
"signature": "0x..."
}
}' | jq
7. Watch your acceptance rate
Submitted gradients can be rejected for three reasons: a missed grace window τ (your GPU is too slow), aggregation filtering (Phase 2 onward; the Phase 1 Mean rule accepts everything), or signature/format errors. Acceptance above ~95% is healthy.
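The rate itself is just accepted over submitted, computed here from the trainer_metrics fields shown in the example output on this page:

```python
def acceptance_rate(metrics: dict) -> float:
    """accepted / submitted from a trainer_metrics entry."""
    return metrics["accepted"] / metrics["submitted"]

rate = acceptance_rate({"submitted": 384, "accepted": 376, "missed_rounds": 2})
# about 0.979, above the ~95% healthy threshold
```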
# Watch your acceptance rate. Below 50% across many rounds is a
# signal that your hardware is too slow for the grace window τ, or
# that your gradients are being filtered by aggregation.
tenzro train get-run \
--task-id train-2026-04-25-timesfm-200m \
--rpc http://127.0.0.1:8545 \
--format json | jq '.trainer_metrics["did:tenzro:machine:abc123..."]'
# {
# "submitted": 384,
# "accepted": 376,
# "missed_rounds": 2,
# "current_reward": "84150000000000000000" // attoTNZO accrued
# }
Rewards
Reward share is proportional to the count of accepted outer gradients across all rounds, weighted by fragment size. Final payouts are atomic — when the run seals, the syncer disburses reward_pool across enrolled trainers and refunds your stake (minus any slashing penalties).
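As an illustration of the proportional split (the exact fragment-size weighting formula lives in the protocol, not here), a minimal sketch in integer attoTNZO:

```python
def reward_shares(reward_pool: int, accepted_weighted: dict[str, int]) -> dict[str, int]:
    """Split reward_pool proportionally to each trainer's accepted
    outer gradients, pre-weighted by fragment size (illustrative).
    Integer division keeps payouts in whole attoTNZO."""
    total = sum(accepted_weighted.values())
    return {did: reward_pool * w // total for did, w in accepted_weighted.items()}

pool = 1_000 * 10 ** 18  # a 1000 TNZO reward pool in attoTNZO
shares = reward_shares(pool, {"did:a": 300, "did:b": 100})
# did:a accepted 3x the weighted gradients, so it takes 3/4 of the pool
```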
Next steps
- Post a training task — the sponsor side: write the spec, escrow TNZO, fetch the receipt.
- Tenzro Train docs — full RPC reference, trust tiers, aggregation rules.
- Tenzro Train whitepaper — Decoupled DiLoCo design, fragment-wise quorum aggregation, fraud-proof challenges.