Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Not all induction heads are created equal
Tankred Saanum1, Eric Schulz2; 1Max Planck Institute for Biological Cybernetics, 2Max Planck Institute for Biological Cybernetics
Presenter: Tankred Saanum
Induction heads are specialized Transformer attention heads thought to be central to in-context learning. These heads work by having a token $x$ attend to the tokens that \emph{succeeded} $x$ earlier in the sequence, allowing Transformers to predict repetitive structure in the input prompt. When large-scale Transformer models are trained on text corpora, multiple such heads emerge. In this paper we show that induction heads in a Large Language Model (Qwen2.5-1.5B) exhibit diverse and context-dependent strategies for choosing among multiple candidate successor tokens. Some heads prefer to attend to the very \emph{last} successor token, others to the very first. Some heads even "learn" to incorporate second-order contextual cues (e.g. what tokens preceded $x$) and attend to the successors that are actually predictive of future tokens. Overall, our findings show that induction heads are more sophisticated than previously believed, implementing context-dependent computations to predict future tokens based on patterns observed in the past.
Topic Area: Language & Communication
Extended Abstract: Full Text PDF
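The snippet below is a minimal, hypothetical sketch (not the authors' code) of the kind of probe the abstract describes: it inspects which successor an attention head favors when a repeated token has several candidate successors in the prompt. It assumes the Hugging Face transformers library and PyTorch; the prompt, layer, and head indices are placeholders, and a real analysis would first identify induction heads (e.g. by their prefix-matching behavior) before comparing attention weights.

```python
# Minimal sketch: probe which successor candidate a given head attends to.
# Layer/head indices and the prompt are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B"  # model discussed in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    attn_implementation="eager",   # eager attention exposes the weight matrices
    torch_dtype=torch.float32,
)
model.eval()

# "A" is followed first by "B" and later by "C", so an induction head reading
# the final "A" has two successor candidates to choose between.
prompt = "A B x y A C x y A"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, n_heads, seq, seq) tensor per layer.
layer, head = 12, 3                      # hypothetical candidate induction head
attn = out.attentions[layer][0, head]    # (seq, seq) attention pattern

query_pos = inputs["input_ids"].shape[1] - 1   # position of the final "A"
row = attn[query_pos]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for pos, (tok, w) in enumerate(zip(tokens, row.tolist())):
    print(f"{pos:2d} {tok!r:>10} attn={w:.3f}")
# Comparing the weight placed on the first successor ("B") versus the later
# one ("C") indicates whether this head prefers the first or the last successor.
```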