LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
Summary
LoGeR introduces a long-context 3D reconstruction method that processes video streams in chunks using a hybrid memory module. It combines Sliding Window Attention (SWA) for local precision with Test-Time Training (TTT) for global consistency, which reduces drift on sequences of up to 19,000 frames without post-hoc optimization. The work demonstrates strong results on KITTI and on long VBR sequences, pointing to a scalable approach for large-scale reconstruction.
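To make the hybrid-memory idea concrete, here is a minimal sketch of chunked processing that pairs a local sliding-window attention with a global fast-weight memory updated at test time. This is not LoGeR's implementation. The summary does not specify the layer design, so several choices here are assumptions: queries, keys, and values are the raw token vectors, the TTT memory is a single linear map trained by one gradient step per chunk on the reconstruction loss 0.5·||W k − v||², and the local and global outputs are mixed by simple addition. All names (`TTTMemory`, `sliding_window_attention`, `process_stream`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, x):
    return [dot(row, x) for row in M]

def sliding_window_attention(tokens, window):
    """Causal softmax attention: token i attends only to tokens max(0, i-window+1)..i.
    Hypothetical simplification: q, k, v are the token vectors themselves."""
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        ctx = tokens[max(0, i - window + 1): i + 1]
        scores = [dot(q, k) / math.sqrt(d) for k in ctx]
        m = max(scores)  # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, ctx)) / z for j in range(d)])
    return out

class TTTMemory:
    """Linear fast-weight memory (hypothetical TTT-style layer).
    Each update takes one gradient step on mean 0.5*||W k - v||^2 over a chunk."""
    def __init__(self, dim, lr=0.1):
        self.W = [[0.0] * dim for _ in range(dim)]
        self.lr = lr

    def read(self, q):
        return matvec(self.W, q)

    def update(self, keys, values):
        d = len(self.W)
        grad = [[0.0] * d for _ in range(d)]
        for k, v in zip(keys, values):
            err = [p - t for p, t in zip(matvec(self.W, k), v)]  # W k - v
            for r in range(d):
                for c in range(d):
                    grad[r][c] += err[r] * k[c]  # d/dW = (W k - v) k^T
        n = len(keys)
        for r in range(d):
            for c in range(d):
                self.W[r][c] -= self.lr * grad[r][c] / n

def process_stream(frames, chunk_size, window, lr=0.1):
    """Process frame tokens chunk by chunk with local SWA plus a global TTT memory."""
    memory = TTTMemory(len(frames[0]), lr)
    cache = []  # last (window - 1) tokens, so local attention spans chunk boundaries
    outputs = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        ctx = cache + chunk
        local = sliding_window_attention(ctx, window)[len(cache):]
        for tok, loc in zip(chunk, local):
            glob = memory.read(tok)  # memory holds only earlier chunks (causal)
            outputs.append([a + b for a, b in zip(loc, glob)])
        memory.update(chunk, chunk)  # hypothetical self-supervised target: the tokens
        cache = ctx[-(window - 1):] if window > 1 else []
    return outputs
```

The design point the sketch illustrates is the split of responsibilities: the attention cost per token is bounded by `window` regardless of stream length, while the fixed-size matrix `W` carries information from arbitrarily distant chunks, so memory stays constant as the sequence grows.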