LoRA and Weight Decay
Summary
The piece analyzes LoRA finetuning and how weight decay interacts with the adapter matrices. It shows that LoRA does not simply approximate full finetuning, because its objective is biased toward the initial frozen weights. It then derives a corrected weight-decay approach for LoRA, gives concrete update equations and code snippets (including in the context of Optax and AdamW), and discusses momentum considerations and the practical implications for ML practitioners.
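To make the bias toward the frozen weights concrete: AdamW-style decoupled weight decay shrinks each trained parameter as p ← p − lr·wd·p. In full finetuning the trained parameter is W itself, so decay pulls W toward zero. In LoRA the trained parameters are the factors B and A, so decay pulls the product BA toward zero and the effective weight W0 + BA toward the frozen W0. The minimal NumPy sketch below applies only the decay term (no loss gradient) to both parameterizations. It illustrates the uncorrected behavior, not the piece's corrected update. The function name `decay_only`, the shapes, the nonzero starting values of B and A, and the hyperparameters are hypothetical choices for illustration.

```python
import numpy as np


def decay_only(steps: int, lr: float, wd: float, seed: int = 0):
    """Apply only AdamW-style decoupled weight decay (no loss gradient) to a
    full-finetuning weight and to LoRA factors, then report distances.

    Hypothetical illustration: shapes, rank, and init scales are arbitrary.
    Returns (||W_full||, ||W_full - W0||, ||W_lora - W0||, ||W0||).
    """
    rng = np.random.default_rng(seed)
    d, r = 8, 2
    W0 = rng.normal(size=(d, d))       # frozen pretrained weight

    # Full finetuning: W itself is the trained parameter, starting at W0.
    W_full = W0.copy()

    # LoRA: only the factors are trained. LoRA usually initializes B = 0;
    # both are nonzero here (as if after some training) so the decay is visible.
    B = 0.1 * rng.normal(size=(d, r))
    A = 0.1 * rng.normal(size=(r, d))

    for _ in range(steps):
        # Decoupled weight decay: p <- p - lr * wd * p, applied per parameter.
        W_full -= lr * wd * W_full
        B -= lr * wd * B
        A -= lr * wd * A

    W_lora = W0 + B @ A                # effective LoRA weight
    return (
        float(np.linalg.norm(W_full)),
        float(np.linalg.norm(W_full - W0)),
        float(np.linalg.norm(W_lora - W0)),
        float(np.linalg.norm(W0)),
    )


if __name__ == "__main__":
    full_norm, full_to_w0, lora_to_w0, w0_norm = decay_only(steps=2000, lr=0.1, wd=0.1)
    print(f"||W0||          = {w0_norm:.3f}")
    print(f"||W_full||      = {full_norm:.3e}  (full finetuning decays toward 0)")
    print(f"||W_full - W0|| = {full_to_w0:.3f}")
    print(f"||W_lora - W0|| = {lora_to_w0:.3e}  (LoRA decays toward W0)")
```

With lr·wd = 0.01 for 2000 steps, each decayed parameter shrinks by a factor of about 0.99^2000 ≈ 2e-9. Full finetuning therefore ends near the zero matrix, far from W0, while the LoRA weight ends at W0 up to floating-point precision.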