Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall
Encoding Brain Regions with Sentiment-Relevant Circuits in LLMs
Nursulu Sagimbayeva¹, Dota Tianai Dong²; ¹Universität des Saarlandes, ²Max Planck Institute for Psycholinguistics
Presenter: Nursulu Sagimbayeva
Large language models (LLMs) generate representations that effectively predict brain responses to natural language, yet the specific circuits within LLMs that drive this alignment remain largely unexplored. Here, we apply techniques from mechanistic interpretability (MI) to identify LLM circuits (i.e., attention heads) that are causally relevant to sentiment processing and assess their impact on LLM–brain alignment. Our results show that removing sentiment-related attention heads leads to a greater decrease in alignment with language-processing brain regions than random head removal does, although this difference does not reach statistical significance. Ongoing work aims to improve LLM circuit identification in naturalistic settings, enabling more precise mapping of circuits to plausible brain mechanisms and ultimately providing deeper insight into LLM–brain alignment.
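The comparison described above (targeted head removal versus random head removal) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: `score_fn` is a hypothetical stand-in for an LLM–brain alignment score computed with a given set of attention heads ablated, `toy_score` and its per-head weights are invented toy data, and the one-sided empirical p-value is just one plausible way to compare against a random-removal baseline. The abstract does not state which statistical test was used.

```python
import random


def ablation_drop(score_fn, heads):
    """Decrease in alignment when `heads` are removed, relative to the intact model."""
    return score_fn(frozenset()) - score_fn(frozenset(heads))


def compare_to_random(score_fn, all_heads, target_heads, n_random=1000, seed=0):
    """Compare the drop from ablating `target_heads` with drops from random head sets.

    Returns the targeted drop, the list of random-baseline drops, and a
    one-sided empirical p-value (with the usual +1 correction).
    """
    rng = random.Random(seed)
    target_drop = ablation_drop(score_fn, target_heads)
    k = len(target_heads)
    # Baseline: ablate k heads chosen uniformly at random, repeated n_random times.
    random_drops = [
        ablation_drop(score_fn, rng.sample(all_heads, k)) for _ in range(n_random)
    ]
    n_at_least = sum(d >= target_drop for d in random_drops)
    p_value = (n_at_least + 1) / (n_random + 1)
    return target_drop, random_drops, p_value


# Toy setup (hypothetical): 4 layers x 4 heads, identified as (layer, head).
all_heads = [(layer, head) for layer in range(4) for head in range(4)]
sentiment_heads = [(1, 2), (3, 0)]

# Invented contributions: the two "sentiment" heads matter more than the rest.
toy_weight = {h: 0.01 for h in all_heads}
for h in sentiment_heads:
    toy_weight[h] = 0.05


def toy_score(ablated):
    """Stand-in for an alignment score; a real one would come from an encoding model."""
    return 0.5 - sum(toy_weight[h] for h in ablated)


target_drop, random_drops, p_value = compare_to_random(
    toy_score, all_heads, sentiment_heads, n_random=1000, seed=0
)
print(f"targeted drop: {target_drop:.3f}")
print(f"mean random drop: {sum(random_drops) / len(random_drops):.3f}")
print(f"empirical p-value: {p_value:.3f}")
```

In a real analysis, `score_fn` would rerun the model with the chosen heads ablated and refit or re-evaluate the brain encoding model, so the number of random baselines would be limited by compute rather than set freely.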
Topic Area: Language & Communication