Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT’s Representations
Louis Jalouzot1, Robin Sobczyk2, Bastien Lhopitallier3, Jeanne Salle4, Nur Lan5, Emmanuel Chemla6, Yair Lakretz7; 1École Normale Supérieure de Lyon, 2École Normale Supérieure – PSL, 3École Normale Supérieure, 4MATS, 5École Normale Supérieure, 6Earth Species Project, 7École Normale Supérieure de Paris
Presenter: Robin Sobczyk
We introduce Metric-Learning Encoding Models (MLEMs), a new framework that learns a feature-based metric explaining the geometry of neural representations. Applying MLEMs to BERT, we track linguistic features (e.g., tense, subject number) and find that each has a distinct importance profile across layers. Within a given layer, the ranking of feature importances corresponds to a hierarchical geometry of the representations. A univariate variant of the model reveals striking spontaneous disentanglement: in all layers, distinct groups of neurons specialize in encoding single, specific linguistic features. MLEMs are also more robust than popular decoding methods, making them a powerful tool for analyzing representations in artificial and biological neural systems.
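To make the core idea concrete, here is a minimal sketch of a metric-learning encoding model, under the assumption that it can be reduced to fitting non-negative per-feature weights so that a weighted metric over feature mismatches approximates pairwise distances between representations. The synthetic data, feature set, and non-negative least-squares fit are illustrative assumptions, not the authors' implementation.

```python
# Minimal MLEM-style sketch (assumed simplification, not the authors' code):
# fit non-negative per-feature weights so a weighted metric over feature
# mismatches approximates pairwise distances between representations.
# Synthetic data below stands in for real BERT hidden states.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Stimuli described by binary linguistic features (e.g., tense, subject number).
n_stimuli, n_features = 60, 4
F = rng.integers(0, 2, size=(n_stimuli, n_features))

# Placeholder representations: each feature shifts activity along its own
# direction, with a feature-specific scale playing the role of ground truth.
true_scale = np.array([3.0, 1.5, 0.5, 0.1])
directions = rng.standard_normal((n_features, 128))
X = (F * true_scale) @ directions + 0.05 * rng.standard_normal((n_stimuli, 128))

# Targets: squared pairwise distances between representations.
i, j = np.triu_indices(n_stimuli, k=1)
sq_dist = np.sum((X[i] - X[j]) ** 2, axis=1)

# Predictors: one mismatch indicator per linguistic feature for each pair.
mismatch = np.abs(F[i] - F[j]).astype(float)

# Non-negative least squares yields one weight per feature; their ranking
# is the layer's feature-importance profile in this toy setting.
weights, _ = nnls(mismatch, sq_dist)
print("feature importances:", np.round(np.sqrt(weights), 2))
```

Fitting squared distances keeps the model linear in the mismatch indicators; with roughly orthogonal feature directions, the square roots of the recovered weights are proportional to each feature's contribution to the representational geometry.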
Topic Area: Language & Communication
Extended Abstract: Full Text PDF