LoRA and Weight Decay
Summary
LoRA does not perfectly emulate full finetuning: weight decay applied to the adapter matrices shrinks the low-rank update, which pulls the effective weights back toward the original frozen weights rather than toward zero. The post analyzes why LoRA therefore solves a different optimization problem, derives the gradient dynamics, and proposes a concrete modification to weight decay that aligns LoRA more closely with full finetuning when that is desired. It also discusses practical considerations for momentum-based optimizers and provides code-oriented guidance.
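To make the central claim concrete, here is a minimal scalar sketch, not the post's own code. It assumes decoupled (AdamW-style) weight decay, a task gradient of exactly zero, and a single frozen weight `w0` with scalar adapter factors `a` and `b`, so that decay is the only force acting. The function names and the values of `lr`, `wd`, and `steps` are hypothetical choices made for illustration.

```python
def decay_only_full(w0, lr, wd, steps):
    """Full finetuning: decoupled weight decay acts on the weight itself."""
    w = w0
    for _ in range(steps):
        w -= lr * wd * w  # shrinks w toward zero
    return w


def decay_only_lora(w0, a, b, lr, wd, steps):
    """LoRA: w0 is frozen, decay acts only on the adapter factors a and b."""
    for _ in range(steps):
        a -= lr * wd * a  # shrinks a toward zero
        b -= lr * wd * b  # shrinks b toward zero
    return w0 + b * a  # effective weight seen by the model


w0 = 2.0  # illustrative pretrained weight
full = decay_only_full(w0, lr=0.1, wd=0.1, steps=2000)
lora = decay_only_lora(w0, a=0.5, b=0.5, lr=0.1, wd=0.1, steps=2000)

print(f"full finetuning effective weight: {full:.6f}")  # ends near 0
print(f"LoRA effective weight:            {lora:.6f}")  # ends near w0
```

With no task gradient, the fully finetuned weight decays to roughly zero, while the LoRA effective weight returns to `w0` because only the product `b * a` is shrunk. In a real training run the task gradient competes with this pull, but the direction of the regularization bias is the same.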