Hyper-DERP: C++/io_uring DERP relay - Same throughput as Tailscale's derper, half the cores
Summary
Hyper-DERP is a C++ DERP relay built on io_uring. It is reported to match or exceed the throughput of Tailscale's derper on half the cores, a claim backed by a multi-VM benchmarking suite and an analysis of TLS offload and kernel interactions. The piece covers the architecture choices, the per-core shard design, the kTLS cache cliff, and planned enhancements such as a UDP fast path and NIC TLS offload. It serves as a performance-focused case study in user-space relays and NAT-traversal networks.