How an inference provider can prove they're not serving a quantized model
Summary
Tinfoil introduces Modelwrap, a cryptographic approach for proving that an inference server runs exactly the committed model weights. It combines a Merkle-tree commitment, dm-verity runtime verification, and hardware enclaves to bind the data read at runtime to the original weights, addressing concerns about quantized or tampered models in both public and private deployments. The article covers the architecture, building blocks, verification flow, and performance considerations.
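To make the Merkle-tree commitment idea concrete, here is a minimal sketch of how a single root hash can commit to every block of a weights file. It is not Modelwrap's actual format or dm-verity's on-disk layout: the 4096-byte block size, the use of SHA-256, the rule of duplicating the last node on odd levels, and the names `merkle_root`, `weights`, and `committed` are all assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Return a SHA-256 Merkle root over fixed-size blocks of `data`."""
    # Split the data into blocks; an empty input is treated as one empty block.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    # Leaves are the hashes of the individual blocks.
    level = [hashlib.sha256(block).digest() for block in blocks]
    # Hash pairs of nodes until a single root remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# A 16 KiB stand-in for a model weights file (hypothetical data).
weights = bytes(range(256)) * 64
committed = merkle_root(weights)

# Flipping a single bit in any block changes the root.
tampered = bytearray(weights)
tampered[5000] ^= 1
assert merkle_root(bytes(tampered)) != committed
```

Because the root depends on every block, a server that swaps in different weights, such as a quantized copy, cannot reproduce the committed root. A block-level verifier can also check each block against the tree as it is read rather than rehashing the whole file.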