Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall

Understanding Diverse Reasoning Procedures in Foundation Models via Mechanistic Interpretability

Mohanna Hoveyda¹, Jasmin Kareem², Roxana Petcu³, Angela van Sprang³, Ana Lucic³; ¹Radboud University, ²Eindhoven University of Technology, ³University of Amsterdam

Presenter: Mohanna Hoveyda

Foundation models exhibit impressive performance on tasks that appear to require a wide range of reasoning abilities. However, they generalize poorly under distribution shifts and fail on reasoning problems that are trivial for humans. These inconsistencies raise a critical question: which internal mechanisms, if any, underlie the successes and failures of these models in reasoning tasks? While numerous benchmarks have been proposed to probe reasoning capabilities, our understanding of the mechanisms responsible for such reasoning-like behavior remains limited. We hypothesize that distinct reasoning procedures are supported by specialized, possibly modular, computational pathways in large-scale models. Mechanistic interpretability (MI) offers a promising set of tools to identify and analyze such pathways. However, most existing work operates in isolation, evaluating a particular model on a particular reasoning task, often in a single modality. To address this gap, we first lay out a high-level taxonomy of reasoning processes and then conduct a systematic analysis of how MI has been used to investigate diverse reasoning processes in various foundation models, along three main axes: (i) reasoning type, (ii) MI technique, and (iii) modality. We aim to develop a broader understanding of whether (A) different reasoning processes share computational mechanisms or are supported by distinct subsystems, and whether (B) such mechanisms are consistent across modalities beyond text, such as vision.

Topic Area: Methods & Computational Tools