Tutorial — Multi-modal AI
Embed images with DINOv3
The vision runtime exposes DINOv3, SigLIP2, and CLIP families. Use DINOv3 ViT-B/16 for self-supervised image embeddings that work well for similarity, retrieval, and clustering.
- Level: Beginner
- Time: ~10 min
- Prerequisites: Tenzro CLI installed, sample image
- Stack: CLI · JSON-RPC
01
Load DINOv3 on a provider
DINOv3 ships under Meta's commercial-custom terms — accept the license once at load time.
curl -X POST https://rpc.tenzro.network -H 'content-type: application/json' -d '{"jsonrpc":"2.0","id":1,"method":"tenzro_loadVisionModel","params":["dinov3-vitb16"]}'
02
Embed a single image
The vision runtime handles PNG/JPEG/WebP decoding and ImageNet-style normalization automatically, so you only need to supply the base64-encoded image.
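If you would rather build the request from a script than paste base64 into a shell command, the following is a minimal Python sketch. It assumes the positional parameter order used by the curl call in this step (`[model, base64_image]`); the helper name `build_embed_request` is hypothetical and not part of any Tenzro SDK.

```python
import base64
import json


def build_embed_request(image_bytes: bytes,
                        model: str = "dinov3-vitb16",
                        request_id: int = 1) -> str:
    """Return a JSON-RPC 2.0 request body for tenzro_imageEmbed.

    Assumption: params are positional, [model, base64_image],
    mirroring the curl example in this step.
    """
    # Base64-encode the raw file bytes (PNG, JPEG, or WebP) as ASCII text.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tenzro_imageEmbed",
        "params": [model, image_b64],
    }
    return json.dumps(payload)
```

To use it, read the image in binary mode (for example `open("sample.png", "rb").read()`, where the file name is a placeholder) and send the returned string as the POST body with `content-type: application/json`.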
# Image embedding ships as a JSON-RPC method; no CLI wrapper yet.
curl -X POST https://rpc.tenzro.network -H 'content-type: application/json' -d '{"jsonrpc":"2.0","id":1,"method":"tenzro_imageEmbed","params":["dinov3-vitb16","<base64-image>"]}'
03
Compute similarity between two images
Cosine similarity over L2-normalized embeddings is the standard retrieval metric.
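For the client-side computation, a minimal Python sketch of cosine similarity follows. It assumes each embedding is already available as a plain sequence of floats; how the vector is laid out in the RPC response is not specified here, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: the dot product of the two L2-normalized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))
```

The result lies in [-1, 1]: values near 1 mean the embeddings point in nearly the same direction, and values near 0 mean they are close to orthogonal. If the embeddings are already L2-normalized, the normalization step is redundant and a plain dot product gives the same number.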
# image embedding is RPC-only — pass base64 images to tenzro_imageEmbed
# and run cosine similarity over the returned vectors client-side.
tenzro vision similarity --left a.vec --right b.vec
04
Call from JSON-RPC
When integrating with a backend, send the base64-encoded image bytes in the request so that embedding runs server-side.
curl -s https://rpc.tenzro.network -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tenzro_imageEmbed","params":{"model":"dinov3-vitb16","image_b64":"..."}}'
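From a Python backend, the same call might look like the sketch below, which uses only the standard library. It assumes the named-parameter form shown in this step (`model`, `image_b64`) and relies on the JSON-RPC 2.0 convention that a response carries either `result` or `error`. The shape of `result` is not documented here, so it is returned as-is. The function names are hypothetical.

```python
import json
import urllib.request

RPC_URL = "https://rpc.tenzro.network"


def unwrap_result(response: dict):
    """Return `result`, or raise if the JSON-RPC response carries an `error`."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(
            f"JSON-RPC error {err.get('code')}: {err.get('message')}"
        )
    return response["result"]


def embed_image(image_b64: str, model: str = "dinov3-vitb16"):
    """POST a tenzro_imageEmbed request and return the raw `result` value.

    Assumption: the method accepts named params {"model", "image_b64"}.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tenzro_imageEmbed",
        "params": {"model": model, "image_b64": image_b64},
    }
    req = urllib.request.Request(
        RPC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return unwrap_result(json.load(resp))
```

Keeping the error check in its own function makes it easy to exercise without network access, and lets other RPC methods reuse the same handling.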