Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Not all induction heads are created equal
Tankred Saanum1, Eric Schulz2; 1Max Planck Institute for Biological Cybernetics, 2Max Planck Institute for Biological Cybernetics
Presenter: Tankred Saanum
Induction heads are specialized Transformer attention heads thought to be central to in-context learning. These heads work by having a token $x$ attend to the tokens that \emph{succeeded} $x$ earlier in the sequence, allowing Transformers to predict repetitive structure in the input prompt. When large-scale Transformer models are trained on text corpora, multiple such heads emerge. In this paper we show that induction heads in a Large Language Model (Qwen2.5-1.5B) exhibit diverse and context-dependent strategies for choosing among multiple candidate successor tokens. Some heads prefer to attend to the very \emph{last} successor token, others to the very first. Some heads even "learn" to incorporate second-order contextual cues (e.g. what tokens preceded $x$) and attend to the successors that are actually predictive of future tokens. Overall, our findings show that induction heads are more sophisticated than previously believed, implementing context-dependent computations to predict future tokens based on patterns observed in the past.
Topic Area: Language & Communication
Extended Abstract: Full Text PDF
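The snippet below is a minimal, hypothetical sketch (not the authors' code) of the kind of probe the abstract describes: it inspects which successor an attention head favors when a repeated token has several candidate successors in the prompt. It assumes the Hugging Face transformers library and PyTorch; the prompt, layer, and head indices are placeholders, and a real analysis would first identify induction heads (e.g. by their prefix-matching behavior) before comparing attention weights.

```python
# Minimal sketch: probe which successor candidate a given head attends to.
# Layer/head indices and the prompt are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B"  # model discussed in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    attn_implementation="eager",   # eager attention exposes the weight matrices
    torch_dtype=torch.float32,
)
model.eval()

# "A" is followed first by "B" and later by "C", so an induction head reading
# the final "A" has two successor candidates to choose between.
prompt = "A B x y A C x y A"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, n_heads, seq, seq) tensor per layer.
layer, head = 12, 3                      # hypothetical candidate induction head
attn = out.attentions[layer][0, head]    # (seq, seq) attention pattern

query_pos = inputs["input_ids"].shape[1] - 1   # position of the final "A"
row = attn[query_pos]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for pos, (tok, w) in enumerate(zip(tokens, row.tolist())):
    print(f"{pos:2d} {tok!r:>10} attn={w:.3f}")
# Comparing the weight placed on the first successor ("B") versus the later
# one ("C") indicates whether this head prefers the first or the last successor.
```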