Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall

Encoding Brain Regions with Sentiment-Relevant Circuits in LLMs

Nursulu Sagimbayeva¹, Dota Tianai Dong²; ¹Universität des Saarlandes, ²Max Planck Institute for Psycholinguistics

Presenter: Nursulu Sagimbayeva

Large language models (LLMs) generate representations that effectively predict brain responses to natural language, yet the specific circuits within LLMs that drive this alignment remain largely unexplored. Here, we apply techniques from mechanistic interpretability (MI) to identify LLM circuits (i.e., attention heads) that are causally relevant to sentiment processing and assess their impact on LLM–brain alignment. Our results show that removing sentiment-related attention heads leads to a larger decrease in alignment with language-processing brain regions than removing random heads, although this difference does not reach statistical significance. Ongoing work aims to improve LLM circuit identification in naturalistic settings, enabling more precise mapping of circuits to plausible brain mechanisms and ultimately providing deeper insights into LLM–brain alignment.
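
The comparison between targeted and random head removal can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`ablate_heads`, `alignment_score`, `ablation_comparison`), the zero-ablation scheme, the ridge-regression encoding model, the empirical p-value, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def ablate_heads(head_outputs, heads):
    """Zero-ablate the given heads. head_outputs: (n_samples, n_heads, d_head)."""
    out = head_outputs.copy()
    out[:, list(heads), :] = 0.0
    return out

def alignment_score(head_outputs, brain, n_train, ridge=1.0):
    """Fit a ridge encoding model on the first n_train samples and return
    the mean held-out Pearson r across brain targets (assumed metric)."""
    X = head_outputs.reshape(head_outputs.shape[0], -1)
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = brain[:n_train], brain[n_train:]
    W = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(X.shape[1]), X_tr.T @ y_tr)
    pred = X_te @ W
    pc, yc = pred - pred.mean(0), y_te - y_te.mean(0)
    denom = np.linalg.norm(pc, axis=0) * np.linalg.norm(yc, axis=0) + 1e-12
    return float(((pc * yc).sum(0) / denom).mean())

def ablation_comparison(head_outputs, brain, target_heads, n_train,
                        n_random=50, seed=0):
    """Compare the alignment drop from ablating target heads with the drops
    from ablating equally sized random head sets."""
    rng = np.random.default_rng(seed)
    base = alignment_score(head_outputs, brain, n_train)
    target_drop = base - alignment_score(
        ablate_heads(head_outputs, target_heads), brain, n_train)
    n_heads, k = head_outputs.shape[1], len(target_heads)
    random_drops = np.array([
        base - alignment_score(
            ablate_heads(head_outputs, rng.choice(n_heads, size=k, replace=False)),
            brain, n_train)
        for _ in range(n_random)
    ])
    # Empirical one-sided p-value: share of random drops at least as large.
    p_value = (1 + np.sum(random_drops >= target_drop)) / (1 + n_random)
    return target_drop, random_drops, float(p_value)

# Synthetic demo (hypothetical data): only heads 0 and 1 drive the "brain" signal.
rng = np.random.default_rng(0)
heads = rng.normal(size=(200, 12, 4))
weights = rng.normal(size=(8, 5))
brain = heads[:, :2, :].reshape(200, -1) @ weights + 0.1 * rng.normal(size=(200, 5))

target_drop, random_drops, p_value = ablation_comparison(
    heads, brain, target_heads=[0, 1], n_train=150)
print(f"target drop: {target_drop:.3f}, "
      f"mean random drop: {random_drops.mean():.3f}, p = {p_value:.3f}")
```

In this toy setup the encoding model is refit after each ablation; keeping the weights fixed and ablating only at test time is an equally plausible design, and the abstract does not say which the authors use.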

Topic Area: Language & Communication

Extended Abstract: Full Text PDF