Seeing in Pangram Space
Summary
Pangram Labs presents an interpretability study of Pangram 3.3.2 that analyzes the model's internal activations and embedding space to understand how AI-generated and human-authored texts are represented across layers. The work highlights that the model's internals encode detectable patterns beyond the final detection score, including clustering by model family and the effects of humanizers.
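
To make the idea of comparing representations across layers concrete, here is a minimal sketch of one possible layer-wise measurement. It is not Pangram's method and uses no Pangram API: the activations are synthetic Gaussian vectors, and the names `separation_score`, `synthetic_layer`, and the per-layer `offsets` are all hypothetical choices made for illustration. The score is simply the distance between the two class centroids divided by the average within-class spread.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mean_dist_to(vectors, center):
    """Average Euclidean distance from each vector to a given center."""
    return sum(math.dist(v, center) for v in vectors) / len(vectors)


def separation_score(human, ai):
    """Centroid distance between the two classes, scaled by their mean spread."""
    c_h, c_a = centroid(human), centroid(ai)
    spread = (mean_dist_to(human, c_h) + mean_dist_to(ai, c_a)) / 2
    return math.dist(c_h, c_a) / spread


def synthetic_layer(rng, n, dim, offset):
    """Stand-in for real activations: n Gaussian vectors centered at `offset`."""
    return [[rng.gauss(offset, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
# Hypothetical assumption: the two classes drift further apart in later layers.
offsets = [0.0, 0.5, 1.0, 2.0]
scores = []
for layer, offset in enumerate(offsets):
    human = synthetic_layer(rng, 200, 16, 0.0)
    ai = synthetic_layer(rng, 200, 16, offset)
    scores.append(separation_score(human, ai))
    print(f"layer {layer}: separation = {scores[-1]:.2f}")
```

With real data, the synthetic vectors would be replaced by pooled per-layer activations for labeled human and AI texts. The same score could also be computed between groups of texts from different generator families to look for the kind of clustering the study describes.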