Inference
The inference system provides intelligent request routing to AI model providers. It implements multiple routing strategies (price, latency, reputation, weighted), handles failover, calculates costs, and integrates with TEE attestation for verifiable inference results.
Inference Request Lifecycle
An inference request flows through the Tenzro Network from user to model provider and back. The router handles provider selection, request forwarding, result verification, and settlement.
The request includes the model ID, input text, optional parameters (temperature, max_tokens, etc.), requester address, and timestamp. The response contains the model output, token counts, cost, and optional attestation proof.
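The request/response shapes described above can be sketched as plain data classes. Field names here are illustrative assumptions, not the actual wire format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InferenceRequest:
    # Illustrative field names; the real wire format may differ.
    model_id: str
    input: str
    requester: str            # requester address
    timestamp: int
    temperature: float = 1.0  # optional parameter
    max_tokens: int = 256     # optional parameter

@dataclass
class InferenceResponse:
    output: str
    input_tokens: int
    output_tokens: int
    cost: float                        # in TNZO
    attestation: Optional[str] = None  # optional TEE attestation proof
```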
Routing Strategies
The inference router supports four routing strategies, each optimizing for different objectives. Users can specify their preferred strategy per request.
Cheapest Strategy
Selects the provider with the lowest total estimated cost (input-token cost plus output-token cost). Ideal for batch processing and other non-latency-sensitive workloads. The router filters to active providers, calculates the estimated cost for each, and selects the minimum.
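A minimal sketch of the cheapest strategy, assuming hypothetical provider fields for per-token pricing:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    # Illustrative fields; real provider records may carry more state.
    id: str
    active: bool
    price_per_input_token: float   # TNZO per input token
    price_per_output_token: float  # TNZO per output token

def estimated_cost(p: Provider, in_tokens: int, out_tokens: int) -> float:
    return in_tokens * p.price_per_input_token + out_tokens * p.price_per_output_token

def cheapest(providers: list[Provider], in_tokens: int, out_tokens: int) -> Provider:
    # Filter to active providers, then pick the minimum estimated cost.
    active = [p for p in providers if p.active]
    if not active:
        raise RuntimeError("no active providers")
    return min(active, key=lambda p: estimated_cost(p, in_tokens, out_tokens))
```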
Lowest Latency Strategy
Selects the provider with the lowest average latency based on historical metrics. Ideal for interactive applications like chatbots. Tracks average response time per provider and routes to the fastest.
Highest Reputation Strategy
Selects the provider with the highest reputation score. Reputation is calculated from success rate, uptime, and user feedback. Ideal for critical applications requiring maximum reliability.
Weighted Strategy
Balances price, latency, and reputation using configurable weights. This is the most flexible strategy for production workloads. Each factor is normalized to 0-1 range and combined using weighted sum.
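The normalize-then-combine step can be sketched as follows. The weights and metric names are illustrative defaults, not the router's actual configuration:

```python
def normalize(values: list[float]) -> list[float]:
    # Scale raw metrics into the 0-1 range (min -> 0, max -> 1).
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def weighted_select(providers, w_price=0.4, w_latency=0.3, w_reputation=0.3):
    # providers: dicts with hypothetical keys "id", "cost", "latency_ms", "reputation".
    costs = normalize([p["cost"] for p in providers])        # lower is better
    lats = normalize([p["latency_ms"] for p in providers])   # lower is better
    reps = normalize([p["reputation"] for p in providers])   # higher is better
    scores = [
        w_price * (1 - c) + w_latency * (1 - l) + w_reputation * r
        for c, l, r in zip(costs, lats, reps)
    ]
    best = max(range(len(providers)), key=scores.__getitem__)
    return providers[best]
```

Inverting the normalized cost and latency (`1 - c`, `1 - l`) makes every factor point the same direction, so a higher combined score is always better.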
Cost Calculation
Inference costs are calculated based on input and output token counts. Providers set per-token pricing in TNZO, and the router calculates the total cost before forwarding the request.
Cost estimation uses historical token counts for the model. The actual cost is calculated after inference based on the real token counts. Any overpayment is refunded via micropayment channels.
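The estimate-then-settle flow might look like this sketch, where the function names and parameters are assumptions for illustration:

```python
def quote(price_in: float, price_out: float,
          est_in_tokens: int, est_out_tokens: int) -> float:
    # Upfront estimate, e.g. from historical token counts for this model.
    return est_in_tokens * price_in + est_out_tokens * price_out

def settle(price_in: float, price_out: float,
           actual_in: int, actual_out: int, prepaid: float):
    # Actual cost after inference; any overpayment is refunded via the channel.
    actual = actual_in * price_in + actual_out * price_out
    refund = max(prepaid - actual, 0.0)
    return actual, refund
```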
Payment and Settlement
Inference payments use micropayment channels for per-token billing. Users open a channel with prepaid TNZO, and the provider deducts costs for each inference request.
Micropayment channels enable high-frequency inference requests without on-chain transaction overhead. Channels are settled on-chain when closed or when balance runs low.
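The open/deduct/close lifecycle can be illustrated with a toy in-memory channel (the real channels are cryptographic constructs settled on-chain; this only shows the accounting):

```python
class MicropaymentChannel:
    # Toy off-chain channel: deposit once, deduct per request, settle on close.
    def __init__(self, deposit: float):
        self.deposit = deposit  # prepaid TNZO
        self.spent = 0.0
        self.closed = False

    def deduct(self, cost: float) -> None:
        if self.closed:
            raise RuntimeError("channel closed")
        if self.spent + cost > self.deposit:
            raise RuntimeError("insufficient channel balance")
        self.spent += cost

    def close(self):
        # Settlement: the provider receives `spent`; the requester
        # is refunded the remainder of the deposit.
        self.closed = True
        return self.spent, self.deposit - self.spent
```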
TEE-Attested Inference
Providers running in a Trusted Execution Environment (TEE) can attest their inference results. This provides cryptographic proof that the output was computed securely, without tampering.
TEE attestation binds the inference output to the specific model and provider. Users can verify that the output came from the claimed model running in a secure enclave, preventing model substitution attacks.
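The binding idea can be sketched as follows. A real TEE signs the binding with a hardware-backed key inside the enclave; here an HMAC stands in for that signature purely for illustration:

```python
import hashlib
import hmac

def attest(enclave_key: bytes, model_id: str, provider_id: str, output: str) -> str:
    # Bind the output to the specific model and provider. In a real TEE the
    # signature comes from an attested hardware key, not a shared secret.
    msg = f"{model_id}|{provider_id}|{output}".encode()
    return hmac.new(enclave_key, msg, hashlib.sha256).hexdigest()

def verify(enclave_key: bytes, model_id: str, provider_id: str,
           output: str, proof: str) -> bool:
    # A substituted model changes the bound message, so the proof fails.
    expected = attest(enclave_key, model_id, provider_id, output)
    return hmac.compare_digest(expected, proof)
```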
Zero-Knowledge Proof Verification
In addition to TEE attestation, providers can generate zero-knowledge proofs of correct inference execution. This enables verification without revealing the model weights.
ZK proofs enable verification without trusting the provider hardware. The proof confirms that the output was computed using the registered model weights without revealing those weights.
Failover and Retry Logic
The inference router implements automatic failover when a provider is unavailable. If the primary provider fails, the router selects a backup provider using the same strategy.
Failover integrates with the circuit breaker pattern. Providers that fail consecutively are temporarily removed from the routing pool, preventing cascading failures.
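Failover plus circuit breaking might be sketched like this, with a hypothetical consecutive-failure threshold:

```python
class CircuitBreaker:
    # Remove a provider from the pool after `threshold` consecutive failures.
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures: dict[str, int] = {}

    def record_success(self, provider_id: str) -> None:
        self.failures[provider_id] = 0

    def record_failure(self, provider_id: str) -> None:
        self.failures[provider_id] = self.failures.get(provider_id, 0) + 1

    def available(self, provider_id: str) -> bool:
        return self.failures.get(provider_id, 0) < self.threshold

def route_with_failover(ranked, call, breaker):
    # `ranked`: provider ids ordered by the chosen strategy.
    # `call`: runs inference against one provider, raising on failure.
    for pid in ranked:
        if not breaker.available(pid):
            continue  # tripped: skip without even attempting the call
        try:
            result = call(pid)
            breaker.record_success(pid)
            return pid, result
        except Exception:
            breaker.record_failure(pid)
    raise RuntimeError("all providers failed or circuit-broken")
```

A production breaker would also re-admit providers after a cooldown period; that half-open state is omitted here for brevity.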
Gossipsub Inference Protocol
Inference requests can be broadcast via the tenzro/inference/1.0.0 gossipsub topic. Multiple providers can compete to fulfill the request, with the fastest response winning.
Gossipsub-based inference enables decentralized request routing without a central router. Providers compete on speed and price, with the requester accepting the first valid response.
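The accept-first-valid-response pattern can be simulated with concurrent tasks. The provider simulation and validation hook below are illustrative; the real protocol runs over libp2p gossipsub:

```python
import asyncio

async def provider_response(pid: str, delay: float, output: str):
    # Simulated provider answering a broadcast request.
    await asyncio.sleep(delay)
    return pid, output

async def first_valid(coros, valid):
    # Accept the first response that passes validation; cancel the rest.
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        for fut in asyncio.as_completed(tasks):
            pid, output = await fut
            if valid(output):
                return pid, output
        raise RuntimeError("no valid response")
    finally:
        for t in tasks:
            t.cancel()
```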
RPC and MCP Integration
The tenzro-node RPC server exposes inference routing via JSON-RPC. The MCP server on port 3001 also provides inference capabilities for AI agents.
The desktop app uses these RPC methods to provide a ChatGPT-like interface. The Inference page displays available models, routing strategy selection, and inference history.
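The shape of a JSON-RPC 2.0 request to the node might look like the sketch below. The method name `inference_route` and the parameter keys are hypothetical placeholders, not the node's actual RPC schema:

```python
import json

# Hypothetical method and parameter names; consult the tenzro-node
# RPC documentation for the actual schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "inference_route",
    "params": {
        "model_id": "llama-3-8b",
        "input": "Hello",
        "strategy": "weighted",   # cheapest | lowest-latency | reputation | weighted
        "max_tokens": 256,
    },
}
payload = json.dumps(request)
```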
Chat Interface
The desktop app and web interface provide a chat-style UI for multi-turn conversations. The chat history is maintained client-side and included in subsequent inference requests.
Chat history management is client-side to minimize costs. Only the necessary context is included in each request, and older turns are pruned when the context window fills.
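Pruning oldest turns first can be sketched as below; the 4-characters-per-token heuristic is a stand-in for the model's real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token. A real client
    # would use the model's own tokenizer.
    return max(1, len(text) // 4)

def prune_history(turns: list[str], budget: int) -> list[str]:
    # Keep the most recent turns that fit the token budget; prune oldest first.
    kept, used = [], 0
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))
```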
Inference History and Analytics
The wallet and desktop app track inference history for cost analysis and usage monitoring. Users can view total spend, token counts, and provider distribution.
Analytics help users optimize their inference costs by identifying expensive models, high-volume periods, and routing strategy effectiveness.
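A simple aggregation over inference history records illustrates the totals and provider distribution described above; the record keys are illustrative:

```python
from collections import defaultdict

def summarize(history):
    # history: records with hypothetical keys "provider", "cost", "tokens".
    total_spend = sum(r["cost"] for r in history)
    total_tokens = sum(r["tokens"] for r in history)
    by_provider = defaultdict(float)
    for r in history:
        by_provider[r["provider"]] += r["cost"]
    return {
        "total_spend": total_spend,
        "total_tokens": total_tokens,
        "provider_distribution": dict(by_provider),
    }
```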