DigiNews

Tech Watch by Johan Denoyer


Zero-Copy GPU Inference from WebAssembly on Apple Silicon

Quality: 8/10 Relevance: 9/10

Summary

The article demonstrates zero-copy GPU inference on Apple Silicon: WebAssembly linear memory is shared directly with the GPU, which the unified memory architecture (UMA) makes possible. It walks through a three-link chain (mmap, Metal, Wasmtime) and presents measurements showing that the path is truly zero-copy, with latency comparable to the copy path. It then explores stateful AI inference with Llama 3.2 1B and portable KV-cache snapshots via safetensors, and outlines Driftwood's goals for stateful actors and model portability.
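
To make "zero-copy" concrete, here is a minimal sketch of the idea behind the first link of the chain only. It is not the article's code and touches neither Metal nor Wasmtime: an anonymous, page-aligned mmap region is exposed through two views that alias the same bytes, so data written through one is visible through the other without a copy. The names `make_shared_region`, `guest_view`, and `device_view` are hypothetical stand-ins for the Wasm guest's linear memory and the GPU buffer.

```python
import mmap
import struct

PAGE = mmap.PAGESIZE  # OS page size; the region length is a multiple of it


def make_shared_region(n_pages: int) -> mmap.mmap:
    """Allocate an anonymous, page-aligned memory region (hypothetical helper)."""
    return mmap.mmap(-1, n_pages * PAGE)


region = make_shared_region(4)

# Two views over the same bytes. In this sketch they only stand in for
# "the Wasm guest's memory" and "the GPU buffer"; no real GPU is involved.
guest_view = memoryview(region)
device_view = memoryview(region)

# The "guest" writes four little-endian float32 values at offset 0.
struct.pack_into("<4f", guest_view, 0, 1.0, 2.0, 3.0, 4.0)

# The "device" reads them back through its own view, with no copy in between.
values = struct.unpack_from("<4f", device_view, 0)
```

Both views are backed by the same mapping, which is the property the article's measurements are described as verifying for the real mmap, Metal, and Wasmtime chain.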
