
Streaming Inference

Tenzro supports real-time streaming inference, where AI model responses are delivered token by token via Server-Sent Events (SSE). This enables responsive chat interfaces and real-time AI applications. Streaming is paired with micropayment channels, so providers are paid per token as the response is generated and you are charged only for tokens actually delivered.

SSE Streaming Flow

TypeScript SSE Client

import { TenzroClient } from "@tenzro/sdk";

const client = new TenzroClient("https://rpc.tenzro.network");

// Stream inference response token by token
const stream = await client.inference.stream({
  model: "qwen3.5-0.8b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a poem about decentralized AI." },
  ],
  maxTokens: 500,
  temperature: 0.7,
});

// Process tokens as they arrive
for await (const chunk of stream) {
  if (chunk.type === "token") {
    process.stdout.write(chunk.content);
  } else if (chunk.type === "done") {
    console.log("\n\nTokens used:", chunk.usage.totalTokens);
    console.log("Cost:", chunk.usage.cost, "TNZO");
  } else if (chunk.type === "error") {
    console.error("Stream error:", chunk.message);
  }
}

curl Example

# Stream inference via RPC
curl -N -X POST https://rpc.tenzro.network \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tenzro_chat",
    "params": [{
      "model": "qwen3.5-0.8b",
      "messages": [
        {"role": "user", "content": "Hello, Tenzro!"}
      ],
      "stream": true,
      "max_tokens": 200
    }],
    "id": 1
  }'

# Response is streamed as SSE events:
# data: {"token": "Hello"}
# data: {"token": "!"}
# data: {"token": " How"}
# data: {"token": " can"}
# data: {"token": " I"}
# data: {"token": " help"}
# data: [DONE]
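The raw SSE events above can also be consumed without the SDK. The sketch below is a minimal parser for that event format: each line is either `data: <json>` carrying a token or the `data: [DONE]` terminator. The `parseSSELine` helper is illustrative, not part of the Tenzro SDK.

```typescript
// Minimal parser for the raw SSE lines shown above (no SDK).
type SSEToken = { token: string } | "done";

function parseSSELine(line: string): SSEToken | null {
  // Ignore blank lines and SSE comments
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return "done";
  return JSON.parse(payload) as { token: string };
}

// Reassemble a response from sample event lines
const lines = [
  'data: {"token": "Hello"}',
  'data: {"token": "!"}',
  "data: [DONE]",
];
let text = "";
for (const line of lines) {
  const ev = parseSSELine(line);
  if (ev === "done") break;
  if (ev) text += ev.token;
}
console.log(text); // "Hello!"
```

A real client would apply the same parsing to each line of the response body as chunks arrive over the connection.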

Rust Streaming Client

use futures::StreamExt;
use tenzro_sdk::{ChatMessage, StreamEvent, TenzroClient};

let client = TenzroClient::new("https://rpc.tenzro.network")?;

// Create a streaming inference request
let mut stream = client.inference().stream(
    "qwen3.5-0.8b",
    vec![
        ChatMessage::system("You are a helpful assistant."),
        ChatMessage::user("Explain zero-knowledge proofs."),
    ],
).await?;

// Process tokens as they arrive
while let Some(event) = stream.next().await {
    match event? {
        StreamEvent::Token(text) => print!("{}", text),
        StreamEvent::Done(usage) => {
            println!("\nTokens: {}", usage.total_tokens);
            println!("Cost: {} TNZO", usage.cost);
        }
        StreamEvent::Error(e) => eprintln!("Error: {}", e),
    }
}


Per-Token Billing

Streaming inference is billed per token through micropayment channels. A channel is opened at the start of the stream and settled when the stream completes. You pay only for tokens actually generated: if you cancel the stream early, you are charged just for what was delivered, and the unused deposit is refunded.

// Open a micropayment channel for streaming
const channel = await client.settlement.openChannel({
  provider: "0xModelProvider...",
  deposit: "1000000000000000000", // 1 TNZO max budget
  tokenRate: "100000000000000",   // 0.0001 TNZO per token
});

// Stream with per-token billing
const stream = await client.inference.stream({
  model: "qwen3.5-0.8b",
  messages: [{ role: "user", content: "Write a long essay." }],
  channel: channel.id,
});

// Cancel early if needed — only charged for tokens received
await stream.cancel();

// Close channel and settle remaining balance
const receipt = await client.settlement.closeChannel(channel.id);
console.log("Total charged:", receipt.totalCharged, "TNZO");
console.log("Refunded:", receipt.refunded, "TNZO");
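Because the channel deposit caps total spend, a client can compute its token budget up front and cancel before exhausting it. The sketch below derives the budget from the deposit and per-token rate shown when opening the channel (base-unit strings, so `BigInt` math applies); the `maxAffordableTokens` helper is illustrative, not an SDK function.

```typescript
// How many tokens the channel deposit can cover at the agreed rate.
// Amounts are base-unit (wei-style) strings, as in openChannel above.
function maxAffordableTokens(deposit: string, tokenRate: string): bigint {
  return BigInt(deposit) / BigInt(tokenRate);
}

// With the values above: 1 TNZO deposit at 0.0001 TNZO per token
const budgetTokens = maxAffordableTokens(
  "1000000000000000000",
  "100000000000000",
);
console.log(budgetTokens); // 10000n

// During streaming, cancel before the budget is exceeded:
// let received = 0n;
// for await (const chunk of stream) {
//   if (chunk.type === "token" && ++received >= budgetTokens) {
//     await stream.cancel(); // charged only for tokens received
//     break;
//   }
// }
```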

A2A Streaming

The A2A protocol server also supports streaming task updates via SSE at the /a2a/stream endpoint:

# Stream agent task updates via A2A protocol
curl -N -X POST https://a2a.tenzro.network/a2a/stream \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tasks/send",
    "params": {
      "task": {
        "skill": "inference",
        "input": "Analyze this dataset"
      }
    },
    "id": 1
  }'
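Since `EventSource` cannot issue POST requests, the A2A stream has to be consumed with `fetch` and incremental decoding. The sketch below buffers partial SSE events across reads; `drainSSE` and `streamTask` are illustrative helpers, not part of any Tenzro SDK, and the logged payload shape depends on the A2A event schema.

```typescript
// Split a raw SSE chunk buffer into complete "data:" payloads,
// returning any trailing partial event for the next read.
function drainSSE(buffer: string): { payloads: string[]; rest: string } {
  const events = buffer.split("\n\n"); // blank line ends an SSE event
  const rest = events.pop() ?? "";
  const payloads: string[] = [];
  for (const event of events) {
    for (const line of event.split("\n")) {
      if (line.startsWith("data: ")) payloads.push(line.slice(6));
    }
  }
  return { payloads, rest };
}

async function streamTask(body: object): Promise<void> {
  const res = await fetch("https://a2a.tenzro.network/a2a/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    const drained = drainSSE(buffer + value);
    buffer = drained.rest;
    for (const payload of drained.payloads) {
      console.log("task update:", payload);
    }
  }
}
```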

Related Documentation

Inference — Non-streaming inference requests
Micropayments — Channel-based per-token billing
A2A Protocol — Agent-to-agent streaming
Models — Available models and pricing