
Embed Text with Qwen3-Embedding

Text Embeddings · Intermediate · 15 min

Tenzro's text-embedding runtime serves four model families: qwen3-embedding (0.6B / 4B / 8B), embeddinggemma-300m (Matryoshka 768 / 512 / 256 / 128), bge-m3, and snowflake-arctic-embed-l-v2.0. This tutorial walks through embedding text with the smallest, permissively licensed entry, qwen3-embedding-0.6b, for retrieval and semantic search.

1. Download the model

# Qwen3-Embedding-0.6B is permissively licensed (Apache-2.0)
tenzro model download qwen3-embedding-0.6b

# Output:
# Resolving artifact bundle from HuggingFace Hub...
#   Source: Qwen/Qwen3-Embedding-0.6B (ONNX export)
#   License tier: Permissive
#   Files: model.onnx, tokenizer.json, config.json
#   SHA-256 verified for all files
#   Saved to: ~/.tenzro/models/qwen3-embedding-0.6b/

2. Load into the runtime

# Load into the text-embedding runtime
tenzro text-embedding load qwen3-embedding-0.6b

# Output:
# Text-embedding runtime loaded:
#   Model: qwen3-embedding-0.6b
#   Modality: text-embedding
#   Output dim: 1024
#   Max sequence length: 8192
#   Tokenizer: loaded from tokenizer.json

3. Embed text via the CLI

# Embed a single passage
tenzro embed-text \
  --model qwen3-embedding-0.6b \
  --text "Tenzro is a settlement layer purpose-built for the AI age."

# Output:
# Embedding (dim=1024):
#   [0.0231, -0.0118, 0.0492, ..., -0.0067]
#   tokens: 14
#   latency_ms: 22

Repeat --text to batch passages in a single forward pass:

# Embed multiple passages in one call (batched)
tenzro embed-text \
  --model qwen3-embedding-0.6b \
  --text "Validators secure the network." \
  --text "Providers serve AI models for TNZO." \
  --text "Agents pay for inference per token."

# Output:
# Batch embeddings (3 x 1024):
#   passage[0]: [0.0103, ...]
#   passage[1]: [-0.0421, ...]
#   passage[2]: [0.0298, ...]
#   total_tokens: 18
#   latency_ms: 47
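With batched embeddings in hand, retrieval reduces to a nearest-neighbor search over cosine similarity. A minimal pure-Python sketch; the 4-dimensional vectors are illustrative stand-ins for the real 1024-dimensional model output, not values returned by the runtime:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for the three passage embeddings above.
passages = {
    "validators": [0.1, 0.9, 0.0, 0.2],
    "providers":  [0.8, 0.1, 0.3, 0.0],
    "agents":     [0.2, 0.3, 0.9, 0.1],
}
query = [0.75, 0.15, 0.35, 0.05]  # embedding of a hypothetical search query

# Rank passages by similarity to the query embedding.
ranked = sorted(passages, key=lambda k: cosine(query, passages[k]), reverse=True)
print(ranked[0])  # providers ranks closest to this query
```

In production you would store the real 1024-dimensional vectors in a vector index rather than scanning a dict, but the ranking math is the same.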

4. Embed via JSON-RPC

# Equivalent JSON-RPC call against the public testnet
curl https://rpc.tenzro.network \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tenzro_textEmbed",
    "params": {
      "model_id": "qwen3-embedding-0.6b",
      "texts": [
        "Tenzro is a settlement layer purpose-built for the AI age.",
        "Validators secure the network.",
        "Providers serve AI models for TNZO."
      ]
    }
  }' | jq

A typical response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "model_id": "qwen3-embedding-0.6b",
    "embeddings": [
      [0.0231, -0.0118, 0.0492, "..."],
      [0.0103, 0.0211, -0.0344, "..."],
      [-0.0421, 0.0156, 0.0287, "..."]
    ],
    "dim": 1024,
    "total_tokens": 31,
    "latency_ms": 47
  }
}
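The same call is easy to script. A minimal sketch using only the Python standard library; the envelope mirrors the curl example above (method tenzro_textEmbed, params model_id and texts), and the helper names are illustrative, not part of any Tenzro SDK:

```python
import json
import urllib.request

RPC_URL = "https://rpc.tenzro.network"

def build_request(model_id, texts, req_id=1):
    # JSON-RPC 2.0 envelope, shaped exactly like the curl example above.
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tenzro_textEmbed",
        "params": {"model_id": model_id, "texts": texts},
    }

def embed(model_id, texts):
    # POST the envelope and unwrap the JSON-RPC result field.
    payload = json.dumps(build_request(model_id, texts)).encode()
    request = urllib.request.Request(
        RPC_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["result"]

# result = embed("qwen3-embedding-0.6b", ["Validators secure the network."])
# result["dim"] should match the runtime's output dim (1024)
```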

5. Matryoshka truncation

For storage-sensitive deployments, EmbeddingGemma-300M supports Matryoshka representation learning — truncate to 768, 512, 256, or 128 dimensions and re-normalize without losing the bulk of recall:

# EmbeddingGemma-300M supports Matryoshka truncation: pick 768 / 512 / 256 / 128
tenzro embed-text \
  --model embeddinggemma-300m \
  --text "Tenzro Hub indexes skills, tools, and templates." \
  --truncate-dim 256

# Note: embeddinggemma-300m carries the Commercial-Custom license tier; pass
# --accept-license gemma at download time.
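Client-side, Matryoshka truncation is just slicing and re-normalizing, so you can also apply it after the fact to stored vectors. A pure-Python sketch, with a toy 8-dimensional vector standing in for a 768-dimensional EmbeddingGemma output:

```python
import math

def truncate_and_renormalize(vec, dim):
    # Matryoshka: keep the first `dim` components, then rescale to unit
    # L2 norm so cosine similarity stays well-defined on shorter vectors.
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, -0.3, 0.2, 0.1, 0.05, -0.02, 0.01, 0.004]
short = truncate_and_renormalize(full, 4)
print(len(short))                           # 4
print(round(sum(x * x for x in short), 6))  # 1.0 (unit norm restored)
```

Halving or quartering the stored dimension cuts index size proportionally, which is why Matryoshka-trained models trade a small recall loss for large storage savings.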

See also