DigiNews

Tech Watch by Johan Denoyer


Language models transmit behavioural traits through hidden signals in data

Quality: 8/10 Relevance: 9/10

Summary

A study published in Nature describes subliminal learning in language model distillation: a teacher model can pass its behavioural traits on to a student model even when the training data is semantically unrelated to those traits. The authors provide a theoretical proof and broad experiments on numbers, code, and chain-of-thought (CoT) data across multiple model families and cross-model setups, and discuss the implications for AI safety, model provenance, and future safety evaluations.

🚀 Service built by Johan Denoyer