
Inference

The inference system provides intelligent request routing to AI model providers. It implements four routing strategies (cheapest price, lowest latency, highest reputation, and weighted), handles failover, calculates costs, and integrates with TEE attestation for verifiable inference results.

Inference Request Lifecycle

An inference request flows through the Tenzro Network from user to model provider and back. The router handles provider selection, request forwarding, result verification, and settlement.

```
// Example inference request (conceptual)
{
  "model_id": "llama-3-70b",
  "input": "Explain quantum computing",
  "parameters": {
    "max_tokens": 500,
    "temperature": 0.7
  },
  "strategy": "HighestReputation"
}

// Response includes:
// - output: generated text
// - input_tokens: 3
// - output_tokens: 487
// - cost: 0.0525 TNZO
// - provider_id: provider address
// - attestation: optional TEE proof
```

The request includes the model ID, input text, optional parameters (temperature, max_tokens, etc.), requester address, and timestamp. The response contains the model output, token counts, cost, and optional attestation proof.

Routing Strategies

The inference router supports four routing strategies, each optimizing for different objectives. Users can specify their preferred strategy per request.

```
Routing Strategies:
- Cheapest:          Minimize cost
- LowestLatency:     Minimize response time
- HighestReputation: Maximize quality
- Weighted:          Balance all factors

Examples:
strategy: "Cheapest"          → selects lowest-cost provider
strategy: "LowestLatency"     → selects fastest provider
strategy: "HighestReputation" → selects most reliable provider
strategy: "Weighted"          → balances price, latency, and reputation
```

Cheapest Strategy

Selects the provider with the lowest total cost (input tokens + output tokens). Ideal for batch processing and non-latency-sensitive workloads. Filters active providers, calculates estimated cost for each, and selects the minimum.
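As a sketch, this strategy reduces to filtering active providers and taking a minimum over estimated cost. The `Provider` shape and the prices below are illustrative assumptions, not the network's actual data structures:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    id: str
    active: bool
    input_price: float   # TNZO per 1M input tokens
    output_price: float  # TNZO per 1M output tokens

def estimate_cost(p: Provider, in_tokens: int, out_tokens: int) -> float:
    # Scale the per-1M-token rates down to the expected token counts
    return (in_tokens / 1_000_000) * p.input_price \
         + (out_tokens / 1_000_000) * p.output_price

def select_cheapest(providers: list[Provider],
                    in_tokens: int, out_tokens: int) -> Provider:
    # Filter to active providers, then pick the minimum estimated cost
    active = [p for p in providers if p.active]
    return min(active, key=lambda p: estimate_cost(p, in_tokens, out_tokens))

providers = [
    Provider("0xaaaa", True, 15.0, 60.0),
    Provider("0xbbbb", True, 10.0, 80.0),
    Provider("0xcccc", False, 1.0, 1.0),  # inactive, excluded from routing
]
best = select_cheapest(providers, 1500, 500)
```

For a 1500-input / 500-output request, the first provider's 0.0525 TNZO beats the second's 0.055 TNZO despite its higher input rate.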

Lowest Latency Strategy

Selects the provider with the lowest average latency based on historical metrics. Ideal for interactive applications like chatbots. Tracks average response time per provider and routes to the fastest.
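A minimal sketch of per-provider latency tracking. The exponential moving average is an assumption; the text says only that an average response time is kept per provider:

```python
class LatencyTracker:
    """Track a rolling average of response latency per provider
    (EMA here is illustrative; any rolling average would do)."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.avg_ms: dict[str, float] = {}

    def record(self, provider_id: str, latency_ms: float) -> None:
        prev = self.avg_ms.get(provider_id)
        if prev is None:
            self.avg_ms[provider_id] = latency_ms
        else:
            # Blend the new observation into the running average
            self.avg_ms[provider_id] = (1 - self.alpha) * prev + self.alpha * latency_ms

    def fastest(self, candidates: list[str]) -> str:
        # Providers with no history sort last
        return min(candidates, key=lambda pid: self.avg_ms.get(pid, float("inf")))

tracker = LatencyTracker()
tracker.record("0xaaaa", 300)
tracker.record("0xbbbb", 120)
tracker.record("0xbbbb", 140)  # EMA: 0.8 * 120 + 0.2 * 140 = 124
fastest = tracker.fastest(["0xaaaa", "0xbbbb"])
```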

Highest Reputation Strategy

Selects the provider with the highest reputation score. Reputation is calculated from success rate, uptime, and user feedback. Ideal for critical applications requiring maximum reliability.
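The source names the inputs to the score but not how they are combined; as one plausible sketch, a weighted sum with illustrative 0.5/0.3/0.2 weights (not the network's actual parameters):

```python
def reputation_score(success_rate: float, uptime: float, feedback: float,
                     weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Combine success rate, uptime, and user feedback (each in [0, 1])
    into a single reputation score. Weights are hypothetical."""
    w_success, w_uptime, w_feedback = weights
    return w_success * success_rate + w_uptime * uptime + w_feedback * feedback

reliable = reputation_score(success_rate=0.99, uptime=0.999, feedback=0.95)
flaky = reputation_score(success_rate=0.90, uptime=0.95, feedback=0.80)
# The HighestReputation strategy would route to the provider with the larger score
```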

Weighted Strategy

Balances price, latency, and reputation using configurable weights. This is the most flexible strategy for production workloads. Each factor is normalized to 0-1 range and combined using weighted sum.

```
// Weighted strategy example
{
  "weights": {
    "price": 0.2,       // 20% weight on cost
    "latency": 0.6,     // 60% weight on speed
    "reputation": 0.2   // 20% weight on reliability
  }
}

// Each provider gets a score:
// score = (price_weight × price_score) +
//         (latency_weight × latency_score) +
//         (reputation_weight × reputation_score)
```
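The normalize-and-combine step can be sketched in Python. The ranges and provider numbers below are hypothetical; lower price and latency are inverted so that a higher score is always better:

```python
def normalize(value: float, lo: float, hi: float) -> float:
    # Clamp-normalize a raw value into [0, 1]
    if hi == lo:
        return 1.0
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

def weighted_score(price: float, latency_ms: float, reputation: float,
                   price_range: tuple, latency_range: tuple,
                   weights: dict) -> float:
    # Lower price and latency are better, so invert those normalized values;
    # reputation is assumed to already lie in [0, 1]
    price_score = 1.0 - normalize(price, *price_range)
    latency_score = 1.0 - normalize(latency_ms, *latency_range)
    return (weights["price"] * price_score
            + weights["latency"] * latency_score
            + weights["reputation"] * reputation)

w = {"price": 0.2, "latency": 0.6, "reputation": 0.2}  # weights from the example above
fast = weighted_score(price=0.06, latency_ms=120, reputation=0.90,
                      price_range=(0.04, 0.08), latency_range=(100, 400), weights=w)
cheap = weighted_score(price=0.04, latency_ms=350, reputation=0.95,
                       price_range=(0.04, 0.08), latency_range=(100, 400), weights=w)
# With latency weighted at 0.6, the fast provider wins despite its higher price
```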

Cost Calculation

Inference costs are calculated based on input and output token counts. Providers set per-token pricing in TNZO, and the router calculates the total cost before forwarding the request.

```
// Provider pricing example
Input price:  15 TNZO per 1M tokens
Output price: 60 TNZO per 1M tokens

// Cost calculation for 1500 input, 500 output tokens:
Input cost:  (1500 / 1,000,000) × 15 TNZO = 0.0225 TNZO
Output cost: (500 / 1,000,000) × 60 TNZO  = 0.03 TNZO
Total cost:  0.0525 TNZO
```

Cost estimation uses historical token counts for the model. The actual cost is calculated after inference based on the real token counts. Any overpayment is refunded via micropayment channels.
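The same arithmetic, plus the estimate-then-refund step, as runnable Python. The prices match the worked example; the 600-token output estimate is a hypothetical historical average:

```python
PRICE_IN = 15.0   # TNZO per 1M input tokens
PRICE_OUT = 60.0  # TNZO per 1M output tokens

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    # Per-token billing: scale the per-1M rates down to the actual counts
    return (input_tokens / 1_000_000) * PRICE_IN \
         + (output_tokens / 1_000_000) * PRICE_OUT

# Estimate using a historical output-length guess, settle on actual counts
estimated = inference_cost(1500, 600)  # assumes ~600 output tokens (hypothetical)
actual = inference_cost(1500, 500)     # real counts after inference
refund = estimated - actual            # overpayment returned via the channel
```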

Payment and Settlement

Inference payments use micropayment channels for per-token billing. Users open a channel with prepaid TNZO, and the provider deducts costs for each inference request.

```
// Micropayment channel example
1. Open channel with 100 TNZO prepayment
2. Each inference deducts from channel balance (e.g., 0.0525 TNZO)
3. Check remaining balance: 99.9475 TNZO
4. Close channel and settle final balance on-chain

Benefits:
- No on-chain transaction per inference
- Low-latency payments
- Batch settlement when closing channel
```

Micropayment channels enable high-frequency inference requests without on-chain transaction overhead. Channels are settled on-chain when closed or when balance runs low.
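A toy in-memory model of that channel lifecycle. Real channels exchange signed balance updates and settle on-chain when closed; this sketch tracks only the balance accounting:

```python
class MicropaymentChannel:
    """Illustrative off-chain channel ledger (signatures and on-chain
    settlement omitted)."""

    def __init__(self, prepaid_tnzo: float):
        self.balance = prepaid_tnzo
        self.closed = False

    def deduct(self, cost: float) -> None:
        # One deduction per inference request; no on-chain transaction
        if self.closed:
            raise RuntimeError("channel is closed")
        if cost > self.balance:
            raise ValueError("insufficient channel balance")
        self.balance -= cost

    def close(self) -> float:
        # Final balance is settled on-chain; the remainder returns to the user
        self.closed = True
        return self.balance

ch = MicropaymentChannel(100.0)   # open with 100 TNZO prepayment
ch.deduct(0.0525)                 # one inference request
remaining = ch.close()            # 99.9475 TNZO settled back
```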

TEE-Attested Inference

Providers running in Trusted Execution Environments (TEE) can attest their inference results. This provides cryptographic proof that the output was computed securely without tampering.

```
// TEE-attested inference flow
1. User requests inference with require_tee: true
2. Provider runs model in trusted execution environment
3. TEE generates cryptographic attestation binding input → output
4. Response includes attestation proof
5. User verifies attestation to confirm secure execution

Attestation proves:
- Inference ran in genuine TEE hardware
- Output was computed from stated input
- No tampering with model or execution
```

TEE attestation binds the inference output to the specific model and provider. Users can verify that the output came from the claimed model running in a secure enclave, preventing model substitution attacks.
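To illustrate just the binding property (not real TEE attestation, which uses hardware-signed quotes from the enclave), an HMAC stands in for the enclave's signing key in this sketch:

```python
import hashlib
import hmac

def attest(output: str, model_hash: str, provider_id: str, tee_key: bytes) -> str:
    # Bind the output to a specific model and provider. A real TEE signs
    # this inside the enclave with a hardware-backed key; HMAC is a stand-in.
    msg = "|".join([model_hash, provider_id,
                    hashlib.sha256(output.encode()).hexdigest()])
    return hmac.new(tee_key, msg.encode(), hashlib.sha256).hexdigest()

def verify(output: str, model_hash: str, provider_id: str,
           tee_key: bytes, attestation: str) -> bool:
    expected = attest(output, model_hash, provider_id, tee_key)
    return hmac.compare_digest(expected, attestation)

key = b"enclave-key"  # stands in for attested hardware key material
proof = attest("Paris.", "0x1234", "0xabcd", key)
ok = verify("Paris.", "0x1234", "0xabcd", key, proof)
tampered = verify("Berlin.", "0x1234", "0xabcd", key, proof)  # fails
```

Because the model hash is part of the signed message, substituting a different model invalidates the proof, which is the substitution-attack defense described above.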

Zero-Knowledge Proof Verification

In addition to TEE attestation, providers can generate zero-knowledge proofs of correct inference execution. This enables verification without revealing the model weights.

```
// Zero-knowledge proof flow
1. Provider generates ZK proof of inference:
   proof(model_hash, input_hash, output_hash)
2. Proof confirms output was computed using registered model
3. Model weights remain private (zero-knowledge property)
4. User verifies proof without trusting provider hardware

Benefits over TEE:
- No hardware trust required
- Cryptographic verification
- Model weights never revealed
```

ZK proofs enable verification without trusting the provider hardware. The proof confirms that the output was computed using the registered model weights without revealing those weights.

Failover and Retry Logic

The inference router implements automatic failover when a provider is unavailable. If the primary provider fails, the router selects a backup provider using the same strategy.

```
// Failover configuration
max_retries: 3
timeout: 30 seconds
failover_enabled: true

// Automatic failover sequence:
1. Try primary provider (selected by routing strategy)
2. On failure, mark provider temporarily unavailable (circuit breaker)
3. Select backup provider using same strategy
4. Retry up to max_retries
5. Return error if all providers fail

// Circuit breaker prevents cascading failures
```

Failover integrates with the circuit breaker pattern. Providers that fail consecutively are temporarily removed from the routing pool, preventing cascading failures.
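A minimal circuit breaker along the lines described: consecutive failures open the circuit, and the provider re-enters the pool after a cooldown. The threshold and cooldown values are assumptions:

```python
class CircuitBreaker:
    """Remove a provider from the routing pool after N consecutive
    failures; re-admit it once the cooldown elapses."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 60.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures: dict[str, int] = {}    # provider -> consecutive failures
        self.opened_at: dict[str, float] = {} # provider -> time circuit opened

    def available(self, provider_id: str, now: float) -> bool:
        opened = self.opened_at.get(provider_id)
        if opened is None:
            return True
        if now - opened >= self.cooldown_s:
            # Cooldown elapsed: let the provider try again
            del self.opened_at[provider_id]
            self.failures[provider_id] = 0
            return True
        return False

    def record(self, provider_id: str, ok: bool, now: float) -> None:
        if ok:
            self.failures[provider_id] = 0  # success resets the streak
            return
        count = self.failures.get(provider_id, 0) + 1
        self.failures[provider_id] = count
        if count >= self.threshold:
            self.opened_at[provider_id] = now  # open the circuit

cb = CircuitBreaker(threshold=3, cooldown_s=60.0)
for _ in range(3):
    cb.record("0xaaaa", ok=False, now=0.0)
blocked = cb.available("0xaaaa", now=10.0)    # still cooling off
recovered = cb.available("0xaaaa", now=61.0)  # back in the pool
```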

Gossipsub Inference Protocol

Inference requests can be broadcast via the tenzro/inference/1.0.0 gossipsub topic. Multiple providers can compete to fulfill the request, with the fastest response winning.

```
// Broadcast inference request to network
{
  "type": "inference_request",
  "request_id": "req_123",
  "model_id": "0x1234...",
  "input": "Summarize this document: ...",
  "max_tokens": 200,
  "max_cost": 100000000000000000,  // 0.1 TNZO
  "requester": "0x5678...",
  "timestamp": 1711234567
}

// Provider responds via gossipsub
{
  "type": "inference_response",
  "request_id": "req_123",
  "output": "This document describes...",
  "cost": 75000000000000000,  // 0.075 TNZO
  "provider": "0xabcd...",
  "attestation": "0x...",
  "timestamp": 1711234570
}
```

Gossipsub-based inference enables decentralized request routing without a central router. Providers compete on speed and price, with the requester accepting the first valid response.
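The requester-side acceptance rule can be sketched as follows; the field names follow the example messages above, and responses are assumed to arrive in receipt order:

```python
def accept_first_valid(responses: list[dict], request_id: str, max_cost: int):
    """Accept the first response that matches the request and stays
    within the advertised max_cost cap (first valid response wins)."""
    for resp in responses:
        if resp["request_id"] != request_id:
            continue  # response belongs to a different request
        if resp["cost"] > max_cost:
            continue  # exceeds the requester's max_cost
        return resp
    return None

# Responses in receipt order: the first matches another request,
# so the second (within the 0.1 TNZO cap) wins
responses = [
    {"request_id": "req_999", "cost": 10, "provider": "0x1111..."},
    {"request_id": "req_123", "cost": 75_000_000_000_000_000, "provider": "0xabcd..."},
]
winner = accept_first_valid(responses, "req_123",
                            max_cost=100_000_000_000_000_000)  # 0.1 TNZO
```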

RPC and MCP Integration

The tenzro-node RPC server exposes inference routing via JSON-RPC. The MCP server on port 3001 also provides inference capabilities for AI agents.

```
// JSON-RPC inference request
curl -X POST https://rpc.tenzro.network \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tenzro_inferenceRequest",
    "params": [{
      "model_id": "0x1234...",
      "input": "What is the capital of France?",
      "strategy": "Cheapest"
    }],
    "id": 1
  }'

// Response
{
  "jsonrpc": "2.0",
  "result": {
    "output": "The capital of France is Paris.",
    "input_tokens": 8,
    "output_tokens": 8,
    "cost": "0.000024",
    "provider_id": "0x5678..."
  },
  "id": 1
}
```

The desktop app uses these RPC methods to provide a ChatGPT-like interface. The Inference page displays available models, routing strategy selection, and inference history.

Chat Interface

The desktop app and web interface provide a chat-style UI for multi-turn conversations. The chat history is maintained client-side and included in subsequent inference requests.

```
// Multi-turn conversation example
Turn 1:
  Input:  "What is quantum computing?"
  Output: "Quantum computing uses quantum bits (qubits)..."

Turn 2 (includes history):
  Input:  "User: What is quantum computing?
           Assistant: Quantum computing uses quantum bits...
           User: Give an example"
  Output: "For example, Shor's algorithm can factor large numbers..."

// History is client-side to minimize costs. Only necessary context included.
```

Chat history management is client-side to minimize costs. Only the necessary context is included in each request, and older turns are pruned when the context window fills.
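A sketch of that newest-first pruning. Whitespace token counting stands in for a real tokenizer, and the `User:`/`Assistant:` formatting follows the conversation example above:

```python
def build_context(history: list[dict], new_input: str, max_tokens: int) -> str:
    """Assemble the prompt from the newest turns that fit the context
    window, pruning the oldest turns first."""
    def count_tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    parts = [f"User: {new_input}"]
    budget = max_tokens - count_tokens(parts[0])
    for turn in reversed(history):  # walk newest to oldest
        line = f"User: {turn['input']}\nAssistant: {turn['output']}"
        cost = count_tokens(line)
        if cost > budget:
            break  # older turns are pruned once the window fills
        parts.insert(0, line)
        budget -= cost
    return "\n".join(parts)

history = [
    {"input": "What is quantum computing?",
     "output": "Quantum computing uses qubits..."},
    {"input": "Give an example",
     "output": "Shor's algorithm factors integers."},
]
# A tight 20-token window keeps only the newest turn plus the new input
ctx = build_context(history, "How fast is it?", max_tokens=20)
```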

Inference History and Analytics

The wallet and desktop app track inference history for cost analysis and usage monitoring. Users can view total spend, token counts, and provider distribution.

```
// Query inference history
curl -X POST https://rpc.tenzro.network \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tenzro_getInferenceHistory",
    "params": ["0x5678..."],  // user address
    "id": 1
  }'

// Response
{
  "jsonrpc": "2.0",
  "result": [
    {
      "timestamp": 1711234567,
      "model_id": "0x1234...",
      "provider_id": "0xabcd...",
      "input_tokens": 1500,
      "output_tokens": 500,
      "cost": "0.0525",
      "strategy": "Cheapest"
    }
  ],
  "id": 1
}
```

Analytics help users optimize their inference costs by identifying expensive models, high-volume periods, and routing strategy effectiveness.
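A client-side aggregation sketch over records shaped like the history response above (the second record is hypothetical padding for the example):

```python
from collections import defaultdict

def summarize(history: list[dict]) -> dict:
    """Aggregate total spend, token volume, and per-provider cost
    distribution from inference history records."""
    total_cost = 0.0
    total_tokens = 0
    by_provider: dict[str, float] = defaultdict(float)
    for rec in history:
        cost = float(rec["cost"])  # costs arrive as decimal strings
        total_cost += cost
        total_tokens += rec["input_tokens"] + rec["output_tokens"]
        by_provider[rec["provider_id"]] += cost
    return {
        "total_cost": total_cost,
        "total_tokens": total_tokens,
        "by_provider": dict(by_provider),
    }

history = [
    {"cost": "0.0525", "input_tokens": 1500, "output_tokens": 500,
     "provider_id": "0xabcd..."},
    {"cost": "0.0300", "input_tokens": 800, "output_tokens": 300,
     "provider_id": "0xefef..."},  # hypothetical second record
]
stats = summarize(history)
```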