Streaming Inference
Tenzro supports real-time streaming inference: AI model responses are delivered token by token via Server-Sent Events (SSE), enabling responsive chat interfaces and other interactive AI applications. Streaming is paired with micropayment channels, so providers are paid per token as the response is generated and no upfront payment is required.
SSE Streaming Flow
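At a high level, the client sends one HTTP request and the node writes `data:` events to the open connection until a final `[DONE]` sentinel, as shown in the curl example below. The SDK parses these events for you; purely as an illustration of what happens under the hood, here is a minimal SSE payload parser (the `parseSseData` helper is hypothetical, not part of the SDK):

```typescript
// Minimal SSE parser sketch: extracts the payload of each `data:` line.
// Hypothetical helper for illustration only; the SDK handles this internally.
function parseSseData(raw: string): string[] {
  return raw
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length));
}

// Example: reassemble tokens from a raw SSE response body.
const body = 'data: {"token": "Hello"}\ndata: {"token": "!"}\ndata: [DONE]\n';
const tokens = parseSseData(body)
  .filter((payload) => payload !== "[DONE]")
  .map((payload) => JSON.parse(payload).token as string);
console.log(tokens.join("")); // "Hello!"
```

A real client would additionally buffer partial lines across network chunks; this sketch assumes the whole body is already available.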
TypeScript SSE Client
import { TenzroClient } from "@tenzro/sdk";
const client = new TenzroClient("https://rpc.tenzro.network");
// Stream inference response token by token
const stream = await client.inference.stream({
model: "qwen3.5-0.8b",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Write a poem about decentralized AI." },
],
maxTokens: 500,
temperature: 0.7,
});
// Process tokens as they arrive
for await (const chunk of stream) {
if (chunk.type === "token") {
process.stdout.write(chunk.content);
} else if (chunk.type === "done") {
console.log("\n\nTokens used:", chunk.usage.totalTokens);
console.log("Cost:", chunk.usage.cost, "TNZO");
} else if (chunk.type === "error") {
console.error("Stream error:", chunk.message);
}
}

curl Example
# Stream inference via RPC
curl -N -X POST https://rpc.tenzro.network \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tenzro_chat",
"params": [{
"model": "qwen3.5-0.8b",
"messages": [
{"role": "user", "content": "Hello, Tenzro!"}
],
"stream": true,
"max_tokens": 200
}],
"id": 1
}'
# Response is streamed as SSE events:
# data: {"token": "Hello"}
# data: {"token": "!"}
# data: {"token": " How"}
# data: {"token": " can"}
# data: {"token": " I"}
# data: {"token": " help"}
# data: [DONE]

Rust Streaming Client
use tenzro_sdk::{ChatMessage, StreamEvent, TenzroClient};
use futures::StreamExt;
let client = TenzroClient::new("https://rpc.tenzro.network")?;
// Create a streaming inference request
let mut stream = client.inference().stream(
"qwen3.5-0.8b",
vec![
ChatMessage::system("You are a helpful assistant."),
ChatMessage::user("Explain zero-knowledge proofs."),
],
).await?;
// Process tokens as they arrive
while let Some(event) = stream.next().await {
match event? {
StreamEvent::Token(text) => print!("{}", text),
StreamEvent::Done(usage) => {
println!("\nTokens: {}", usage.total_tokens);
println!("Cost: {} TNZO", usage.cost);
}
StreamEvent::Error(e) => eprintln!("Error: {}", e),
}
}

Per-Token Billing
Streaming inference is billed per token through micropayment channels. A channel is opened at the start of the stream and settled when the stream completes, so you pay only for tokens actually generated; if you cancel the stream early, you are charged only for what was delivered.
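Because billing is channel-based, the deposit and per-token rate fully determine the budget. A quick sketch of the arithmetic, using the same example values as the channel opened below (amounts are in the token's smallest unit, assumed here to be 18 decimals as in the example):

```typescript
// Budget check: how many tokens can a channel deposit cover at a given rate?
const deposit = 1_000_000_000_000_000_000n; // 1 TNZO max budget
const tokenRate = 100_000_000_000_000n; // 0.0001 TNZO per token
const maxTokens = deposit / tokenRate;
console.log(`Channel budget covers ${maxTokens} tokens`); // 10000

// Cost if the stream is cancelled after 500 tokens: only delivered
// tokens are charged; the rest of the deposit is refunded at settlement.
const charged = 500n * tokenRate;
const refunded = deposit - charged;
console.log(`Charged: ${charged}, refunded: ${refunded} (base units)`);
```

Note this is illustrative arithmetic, not an SDK call; the actual charged and refunded amounts come from the settlement receipt shown below.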
// Open a micropayment channel for streaming
const channel = await client.settlement.openChannel({
provider: "0xModelProvider...",
deposit: "1000000000000000000", // 1 TNZO max budget
tokenRate: "100000000000000", // 0.0001 TNZO per token
});
// Stream with per-token billing
const stream = await client.inference.stream({
model: "qwen3.5-0.8b",
messages: [{ role: "user", content: "Write a long essay." }],
channel: channel.id,
});
// Cancel early if needed — only charged for tokens received
await stream.cancel();
// Close channel and settle remaining balance
const receipt = await client.settlement.closeChannel(channel.id);
console.log("Total charged:", receipt.totalCharged, "TNZO");
console.log("Refunded:", receipt.refunded, "TNZO");

A2A Streaming
The A2A protocol server also supports streaming task updates via SSE at the /a2a/stream endpoint:
# Stream agent task updates via A2A protocol
curl -N -X POST https://a2a.tenzro.network/a2a/stream \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tasks/send",
"params": {
"task": {
"skill": "inference",
"input": "Analyze this dataset"
}
},
"id": 1
}'

Related Documentation
Inference — Non-streaming inference requests
Micropayments — Channel-based per-token billing
A2A Protocol — Agent-to-agent streaming
Models — Available models and pricing