Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Understanding Diverse Reasoning Procedures in Foundation Models via Mechanistic Interpretability
Mohanna Hoveyda¹, Jasmin Kareem², Roxana Petcu³, Angela van Sprang³, Ana Lucic³; ¹Radboud University, ²Eindhoven University of Technology, ³University of Amsterdam
Presenter: Mohanna Hoveyda
Foundation models perform impressively on tasks that appear to require a wide range of reasoning abilities. However, they struggle to generalize under distribution shift and have difficulty with reasoning problems that are trivial for humans. These inconsistencies raise a critical question: which internal mechanisms, if any, underlie the successes and failures of these models on reasoning tasks? Although numerous benchmarks have been proposed to probe reasoning capabilities, our understanding of the mechanisms responsible for such reasoning-like behavior remains limited. We hypothesize that distinct reasoning procedures are supported by specialized, possibly modular, computational pathways in large-scale models. Mechanistic interpretability (MI) offers a promising set of tools for identifying and analyzing such pathways. However, most existing work is conducted in isolation: it evaluates a single model on a single reasoning task, often in a single modality. To address this gap, we first lay out a high-level taxonomy of reasoning processes and then systematically analyze how MI has been used to investigate diverse reasoning processes in various foundation models along three main axes: (i) reasoning type, (ii) MI technique, and (iii) modality. We aim to develop a broader understanding of (A) whether different reasoning processes share computational mechanisms or are supported by distinct subsystems, and (B) whether such mechanisms are consistent across modalities other than text, such as vision.
Topic Area: Methods & Computational Tools
Extended Abstract: Full Text PDF