Tenzro Train.

Decentralized training splits cleanly into a Rust protocol layer (aggregation, sync rounds, on-chain commitments) and a Python reference trainer (PyTorch FSDP2 + Hivemind + safetensors). This is the same architectural split adopted by production decentralized-training protocols.

STATUS: Phase 1+ live
PROTOCOL CRATE: tenzro-training
PYTHON PKG: integrations/trainer
TIER: Open / Verified / Confidential

Two-layer architecture

tenzro-training (Rust) owns the protocol: OuterGradient, Fragment, SyncRound, aggregation rules (Mean / LoraAlternating / TrimmedMean / CoordinateMedian / Krum), the OuterOptimizer (Nesterov SGD), the syncer state machine, TrainingTaskSpec, TrainingReceipt, gossip topic handling, on-chain commitments, fraud-proof verification, RPC, and CLI. No tensor library lives in the Rust workspace.
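
To make the aggregation rules concrete, here is a minimal pure-Python sketch of three of them (Mean, TrimmedMean, CoordinateMedian) applied to plain lists of floats. The function names, the `trim` parameter, and the list representation are illustrative assumptions, not the tenzro-training API; the real crate aggregates OuterGradient fragments in Rust.

```python
from statistics import median


def mean_rule(grads):
    """Coordinate-wise mean over all contributors."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]


def trimmed_mean_rule(grads, trim):
    """Per coordinate, drop the `trim` smallest and `trim` largest values, then average."""
    if 2 * trim >= len(grads):
        raise ValueError("trim removes every contributor")
    out = []
    for col in zip(*grads):
        kept = sorted(col)[trim:len(col) - trim]
        out.append(sum(kept) / len(kept))
    return out


def coordinate_median_rule(grads):
    """Coordinate-wise median over all contributors."""
    return [median(col) for col in zip(*grads)]


# Three honest contributors plus one poisoned outer gradient (hypothetical values).
honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
poisoned = honest + [[100.0, -100.0]]

print(mean_rule(poisoned))               # pulled far away by the outlier
print(trimmed_mean_rule(poisoned, 1))    # stays near the honest values
print(coordinate_median_rule(poisoned))  # stays near the honest values
```

The comparison suggests why the robust rules matter once contributors are not fully trusted: a single outlier moves the plain mean arbitrarily far, while the trimmed and median rules stay close to the honest cluster.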

integrations/trainer (Python) wraps PyTorch FSDP2 + Hivemind + safetensors. Per-modality inner training loops use the leading Python library: transformers for language, gluonts for timeseries, timm for vision. The trainer talks to the Rust syncer over JSON-RPC and the gossip topics.

Why Python for the inner loop

As of 2026, production decentralized-training runs use Python and PyTorch for the inner training engine. Rust ML frameworks exist (Candle, Burn, tch-rs), but no production decentralized-training project picks them: the PyTorch ecosystem (FSDP2, DTensor, torch.compile, Hivemind) and the long tail of per-architecture implementations (TimesFM, Chronos, Qwen, Gemma, Mistral, ViT, DINOv3, …) are irreplaceable for training. Rust shines at the protocol and orchestration layer.

Witness committee + idempotent finalize

Phase 2c runs training-side multi-syncer coordination as a k-of-N witness committee with idempotent on-chain finalize and a no-quorum-cert carry-forward. This follows the established production pattern for decentralized-training coordination.

SyncerState::finalize_round is idempotent: redundant submissions from concurrent witnesses for the same (round, state_root) return Ok. Conflicting state_roots return TrainingError::ConflictingFinalize for fork detection. When the committee cannot assemble a quorum within grace_window_ms, SyncerState::build_nec_sync_round produces a no-endorsement-cert sync round and the run advances to round+1 carrying forward the prior state_root.
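
The finalize semantics described above can be sketched in a few lines of Python. This is an illustrative model only: the class name `SyncerStateSketch`, the `genesis_root` argument, the returned dictionary shape, and the choice to record the carried-forward root as the round's finalized root are all assumptions, and the real logic lives in the Rust crate.

```python
class ConflictingFinalize(Exception):
    """Stand-in for TrainingError::ConflictingFinalize."""


class SyncerStateSketch:
    def __init__(self, genesis_root):
        self.genesis_root = genesis_root
        self.finalized = {}  # round number -> finalized state_root

    def finalize_round(self, round_id, state_root):
        existing = self.finalized.get(round_id)
        if existing is None:
            self.finalized[round_id] = state_root
            return "ok"
        if existing == state_root:
            # Redundant submission from a concurrent witness: accept silently.
            return "ok"
        # Same round, different root: surface it so a fork can be detected.
        raise ConflictingFinalize(f"round {round_id}: {existing} != {state_root}")

    def build_nec_sync_round(self, round_id):
        # No quorum within the grace window: carry the prior state_root forward.
        prior_root = self.finalized.get(round_id - 1, self.genesis_root)
        self.finalize_round(round_id, prior_root)
        return {
            "round": round_id,
            "state_root": prior_root,
            "endorsed": False,
            "next_round": round_id + 1,
        }


syncer = SyncerStateSketch(genesis_root="root-0")
syncer.finalize_round(1, "root-1")
syncer.finalize_round(1, "root-1")     # idempotent repeat, still "ok"
nec = syncer.build_nec_sync_round(2)   # carries "root-1" into round 2
```

Keying idempotency on the (round, state_root) pair means witnesses never need to coordinate on who submits first; only a genuinely different root for the same round is treated as an error.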

Tier-gated aggregation

validate_aggregation_rule_for_tier() admits aggregation rules per trust tier:

Open — Mean and LoraAlternating (the alternating-freeze rule for LoRA/QLoRA adapter runs)
Verified — Mean, LoraAlternating, TrimmedMean, CoordinateMedian, Krum
Confidential — same set, with TEE-sealed shards
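
The tier table above amounts to a small lookup. The following Python sketch mirrors it; the string encoding of tiers and rules and the `ValueError` are assumptions made for illustration, since the actual validator is a Rust function whose signature is not shown here.

```python
ROBUST_SET = {"Mean", "LoraAlternating", "TrimmedMean", "CoordinateMedian", "Krum"}

ALLOWED_RULES = {
    "Open": {"Mean", "LoraAlternating"},
    "Verified": ROBUST_SET,
    "Confidential": ROBUST_SET,  # same set; shards are additionally TEE-sealed
}


def validate_aggregation_rule_for_tier(tier, rule):
    """Raise ValueError if `rule` is not admitted for `tier` (illustrative only)."""
    allowed = ALLOWED_RULES.get(tier)
    if allowed is None:
        raise ValueError(f"unknown tier: {tier}")
    if rule not in allowed:
        raise ValueError(f"rule {rule} is not admitted for tier {tier}")


validate_aggregation_rule_for_tier("Verified", "Krum")  # passes
try:
    validate_aggregation_rule_for_tier("Open", "Krum")
except ValueError as err:
    print(err)
```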

Confidential sealed shards

SealedDatasetManifest carries SealedShardEnvelope rows: shard_ciphertext_hash, shard_ciphertext_bytes, wrapped_data_key, wrap_alg = "hpke-x25519-hkdf-sha256-aes-256-gcm", enclave_pubkey, enclave_measurements_hex, created_at. validate_confidential_enrollment() enforces attestation ↔ enclave_pubkey ↔ enclave_measurements_hex parity at enroll time.
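
As a rough illustration of the parity check, the sketch below models an envelope row with the field names listed above and compares it against an attestation. The field types, the `AttestationSketch` class and its two fields, and the case-insensitive hex comparison are assumptions; the real attestation format and validation rules are defined by the Rust crate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedShardEnvelope:
    shard_ciphertext_hash: str
    shard_ciphertext_bytes: int   # assumed here to be the ciphertext size
    wrapped_data_key: str
    wrap_alg: str
    enclave_pubkey: str
    enclave_measurements_hex: str
    created_at: int               # assumed Unix timestamp


@dataclass(frozen=True)
class AttestationSketch:
    """Hypothetical, simplified view of a TEE attestation."""
    enclave_pubkey: str
    measurements_hex: str


def validate_confidential_enrollment(attestation, envelopes):
    """Reject enrollment unless every envelope is bound to the attested enclave."""
    for env in envelopes:
        if env.enclave_pubkey != attestation.enclave_pubkey:
            raise ValueError("envelope wrapped to a different enclave key")
        if env.enclave_measurements_hex.lower() != attestation.measurements_hex.lower():
            raise ValueError("envelope bound to different enclave measurements")


envelope = SealedShardEnvelope(
    shard_ciphertext_hash="ab" * 32,
    shard_ciphertext_bytes=4096,
    wrapped_data_key="(opaque)",
    wrap_alg="hpke-x25519-hkdf-sha256-aes-256-gcm",
    enclave_pubkey="pk-enclave-1",
    enclave_measurements_hex="DEADBEEF",
    created_at=0,
)
validate_confidential_enrollment(AttestationSketch("pk-enclave-1", "deadbeef"), [envelope])
```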

RPC + CLI surface

# JSON-RPC namespace: tenzro_training_*
postTask | listRuns | getRun | getReceipt
enrollTrainer | submitOuterGradient | finalizeRound
installSealedManifest | getSealedManifest

# CLI
tenzro train post-task ...
tenzro train enroll-trainer ...
tenzro train submit-gradient ...
tenzro train finalize-round ...
tenzro train install-sealed-manifest ...
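
For orientation, a call in this namespace would be wrapped in a standard JSON-RPC 2.0 envelope. The Python sketch below only builds the request body; the `run_id` parameter name and the single-object `params` shape are assumptions, as the parameter schemas are not listed here.

```python
import json
from itertools import count

_request_ids = count(1)


def training_rpc_request(method, params):
    """Build a JSON-RPC 2.0 request body for a tenzro_training_* method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": f"tenzro_training_{method}",
        "params": [params],  # assumed shape: one positional object
    })


# Hypothetical parameter name; consult the RPC reference for the real schema.
body = training_rpc_request("getRun", {"run_id": "example-run"})
print(body)
```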

Reference adapters

The Python reference trainer includes three per-modality adapters:

timeseries: TimesFM-class 200M
language: Qwen 3 0.6B by default; any catalog-member LM family is swappable via architecture.metadata.hf_repo (Qwen 2/3/3.5/3.6, Gemma 3/4, Mistral, Phi 3, DeepSeek V3, Granite, Granite-H)
vision: timm ViT-B/16 by default, matching the inference-side DINOv3/SigLIP2/CLIP-B/16 family; swappable via architecture.metadata.timm_model

Activation commitments (Open tier)

The Open tier has no TEE attestation; trust comes from stake bonding plus a TOPLOC-class commitment scheme. Every Open-tier OuterGradient carries an ActivationCommitment: the per-inner-step loss trajectory and the top-k probes (largest-magnitude coordinates) of the flattened fragment delta. The commitment hash is bound into the gradient's Ed25519 signature, so a trainer cannot swap the commitment after signing.
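
A commitment of this shape can be sketched as follows. The tie-breaking rule, the dictionary layout, and the SHA-256-over-canonical-JSON hash are assumptions chosen for illustration; the actual serialization and hash are defined by the protocol crate, and the Ed25519 signing step is omitted.

```python
import hashlib
import json


def top_k_probes(delta, k):
    """Return (index, value) pairs for the k largest-magnitude coordinates."""
    # Sort by descending magnitude; break ties by ascending index (assumed rule).
    order = sorted(range(len(delta)), key=lambda i: (-abs(delta[i]), i))
    return [(i, delta[i]) for i in order[:k]]


def build_activation_commitment(loss_trajectory, delta, k):
    """Bundle the loss trajectory and probes, then hash the canonical encoding."""
    body = {
        "loss_trajectory": list(loss_trajectory),
        "probes": top_k_probes(delta, k),
    }
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return {**body, "hash": hashlib.sha256(encoded).hexdigest()}


flattened_delta = [0.1, -0.9, 0.5, 0.0, 0.7]  # hypothetical fragment delta
commitment = build_activation_commitment([2.0, 1.5, 1.2], flattened_delta, k=3)
print(commitment["probes"])  # [(1, -0.9), (4, 0.7), (2, 0.5)]
```

Because the hash covers both the trajectory and the probes, signing the hash alongside the gradient ties the two together: changing either after the fact yields a different digest.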

Verification runs in two layers.

Structural, accept-time: the syncer validates fail-closed on submission, checking probe count, trajectory length equal to the task's inner_steps, finite values, and descending-magnitude probe order with unique indices. A violation is slashable, because it requires deviating from the spec the trainer enrolled under.

Fuzzy, challenge-time: a challenger re-executes the inner loop from the same checkpoint and shard, rebuilds the commitment, and compares with tolerance bands (bounded relative loss drift, bounded probe-index churn, bounded probe-value drift) sized for floating-point nondeterminism across GPU architectures and kernel schedules. A fabricated gradient lands far outside the bands; a failed challenge evicts and slashes.
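
Both verification layers can be sketched in Python. The commitment layout (a dictionary with `loss_trajectory` and `probes`), the function names, and every tolerance value below are assumptions for illustration; the protocol's actual bands are not stated here.

```python
import math


def validate_structure(commitment, inner_steps, probe_count):
    """Accept-time check: fail closed on any structural violation."""
    losses, probes = commitment["loss_trajectory"], commitment["probes"]
    if len(losses) != inner_steps or len(probes) != probe_count:
        return False
    values = list(losses) + [v for _, v in probes]
    if not all(math.isfinite(v) for v in values):
        return False
    indices = [i for i, _ in probes]
    if len(set(indices)) != len(indices):
        return False
    magnitudes = [abs(v) for _, v in probes]
    return all(a >= b for a, b in zip(magnitudes, magnitudes[1:]))


def within_tolerance(claimed, replayed, loss_rtol=0.05, max_index_churn=0.25, value_rtol=0.10):
    """Challenge-time check: compare a claimed commitment with a re-execution."""
    for c, r in zip(claimed["loss_trajectory"], replayed["loss_trajectory"]):
        if abs(c - r) > loss_rtol * max(abs(r), 1e-12):
            return False
    claimed_probes, replayed_probes = dict(claimed["probes"]), dict(replayed["probes"])
    shared = claimed_probes.keys() & replayed_probes.keys()
    churn = 1.0 - len(shared) / max(len(replayed_probes), 1)
    if churn > max_index_churn:
        return False
    return all(
        abs(claimed_probes[i] - replayed_probes[i]) <= value_rtol * max(abs(replayed_probes[i]), 1e-12)
        for i in shared
    )


claimed = {"loss_trajectory": [2.0, 1.5, 1.2], "probes": [(1, -0.9), (4, 0.7), (2, 0.5)]}
replayed = {"loss_trajectory": [2.01, 1.49, 1.21], "probes": [(1, -0.91), (4, 0.69), (2, 0.5)]}
fabricated = {"loss_trajectory": [0.1, 0.1, 0.1], "probes": [(7, 3.0), (8, 2.0), (9, 1.0)]}

print(validate_structure(claimed, inner_steps=3, probe_count=3))  # True
print(within_tolerance(claimed, replayed))                        # True
print(within_tolerance(fabricated, replayed))                     # False
```

Note that the fabricated commitment is structurally valid; only the replay comparison exposes it, which is why the two layers are complementary.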

RL post-training (GRPO)

TrainingTaskSpec.objective selects the inner loop: Supervised (default H-step SGD) or RlPostTraining — a GRPO loop with group_size, kl_coeff, clip_epsilon, max_new_tokens, temperature, and a sponsor-referenced reward callable (reward_ref = "py:<module>:<callable>"). Per step the trainer samples a rollout group from one shard prompt, scores it with the reward, computes group-relative advantages, and takes one optimizer step on the clipped surrogate with a k3 KL penalty against the sampling-time policy. No value model, no frozen reference copy — the same pattern as prime-rl and TRL's GRPOTrainer.
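
The per-step arithmetic can be illustrated with scalars. The sketch below uses the common GRPO formulation (mean and standard-deviation normalization within the group, PPO-style clipping, and the k3 estimator exp(d) - d - 1 with d = logp_old - logp_new); whether the reference trainer normalizes by standard deviation, and the sequence-level rather than per-token treatment, are simplifying assumptions. Parameter names follow the task-spec fields above.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def k3_kl(logp_new, logp_old):
    """k3 estimator of the KL penalty against the sampling-time policy (always >= 0)."""
    d = logp_old - logp_new
    return math.exp(d) - d - 1.0


def grpo_loss(logp_new, logp_old, advantage, clip_epsilon, kl_coeff):
    """Negative clipped surrogate plus the KL penalty, for one sampled sequence."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_epsilon), 1.0 + clip_epsilon)
    surrogate = min(ratio * advantage, clipped * advantage)
    return -surrogate + kl_coeff * k3_kl(logp_new, logp_old)


rewards = [1.0, 0.0, 0.0, 1.0]                   # one rollout group, hypothetical scores
advantages = group_relative_advantages(rewards)  # roughly [1, -1, -1, 1]
loss = grpo_loss(logp_new=-1.0, logp_old=-1.0, advantage=advantages[0],
                 clip_epsilon=0.2, kl_coeff=0.1)
```

Because advantages are computed relative to the group, no learned value model is needed, and because the KL term is measured against the policy that produced the samples, no separate frozen reference copy has to be kept in memory.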

RL admits Language modality only; hyperparameters are validated at tenzro_training_postTask. The outer-gradient contract is unchanged — fragment partitioning, quantization, Open-tier activation commitments, and submission work verbatim on the RL delta.
