Your Language Model Secretly Contains Personality Subnetworks
Summary
This arXiv paper argues that language models contain persona-specific subnetworks embedded in their parameters. It introduces a training-free masking approach to identify and isolate subnetworks corresponding to different personas and discusses how to derive opposing persona subnetworks to enable controllable personalization.