Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall

Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT’s Representations

Louis Jalouzot1, Robin Sobczyk2, Bastien Lhopitallier3, Jeanne Salle4, Nur Lan5, Emmanuel Chemla6, Yair Lakretz7; 1Ecole Normale Supérieure de Lyon, 2Ecole Normale Supérieure – PSL, 3Ecole Normale Superieure, 4MATS, 5École Normale Supérieure, 6Earth Species Project, 7Ecole Normale Supérieure de Paris

Presenter: Robin Sobczyk

We introduce Metric-Learning Encoding Models (MLEMs), a new framework that learns a feature-based metric explaining the geometry of neural representations. Applying MLEMs to BERT, we track various linguistic features (e.g., tense, subject number) and find distinct importance profiles across layers. Within a given layer, the ranking of feature importances corresponds to a hierarchical geometry of the representations. A univariate variant of our model reveals remarkable spontaneous disentanglement: in every layer, distinct groups of neurons specialize in encoding single, specific linguistic features. MLEMs are more robust than popular decoding methods, offering a powerful tool for analyzing representations in artificial and biological neural systems.
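To make the metric-learning step concrete, here is a minimal sketch, assuming (as an illustration, not the authors' exact implementation) that an MLEM fits non-negative per-feature weights so that a weighted sum of feature mismatches predicts squared pairwise distances between a layer's representations. The random stand-in data, feature count, and the scikit-learn Ridge fit are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data: N stimuli, each with a binary linguistic-feature vector
# (e.g., tense, subject number) and a layer representation (stand-in for BERT).
rng = np.random.default_rng(0)
N, n_feats, dim = 200, 6, 768
F = rng.integers(0, 2, size=(N, n_feats))   # linguistic features per stimulus
X = rng.normal(size=(N, dim))               # stand-in for layer activations

# Pairwise feature mismatches and neural distances over all stimulus pairs.
ii, jj = np.triu_indices(N, k=1)
D_feat = (F[ii] != F[jj]).astype(float)     # 1 where a feature value differs
d_neural = np.linalg.norm(X[ii] - X[jj], axis=1)

# Fit non-negative weights: squared neural distance modeled as a weighted
# sum of feature mismatches. The learned weights give a per-feature
# "importance profile" for this layer.
model = Ridge(alpha=1.0, positive=True, fit_intercept=False)
model.fit(D_feat, d_neural ** 2)
for k, w in enumerate(model.coef_):
    print(f"feature_{k}: weight = {w:.3f}")
```

Repeating this fit on each layer's representations and comparing the learned weight vectors would yield the per-layer importance profiles described in the abstract.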

Topic Area: Language & Communication

Extended Abstract: Full Text PDF