
Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall

How Does an LLM Process Conflicting Information In-Context?

Ivan Andre Naranjo Coronel¹, Can Demircan¹, Eric Schulz²; ¹Helmholtz Zentrum München, ²Max Planck Institute for Biological Cybernetics

Presenter: Ivan Andre Naranjo Coronel

Large language models (LLMs) acquire knowledge from vast training datasets during the pretraining phase. Although prior research has examined how these models store knowledge, how they distinguish between accurate and false information presented in context remains largely unexplored. In this paper, we present LLMs with correct and false information in context and prompt them to discriminate between the two. To identify which model components carry out this ability, we perform activation patching. We quantify in detail how much different model components contribute to this behavior. Furthermore, we analyze how prompt order and content affect our patching results. Overall, we reveal which model components separate factual from false information. We intend to extend this study by investigating how these results hold up under different conditions.
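For readers unfamiliar with the method, below is a minimal sketch of layer-wise activation patching in Python, assuming the TransformerLens library and GPT-2 as a stand-in model. The abstract does not state the paper's actual models, prompts, or patched components, so the prompt pair, the `resid_post` hook point, and the patched position here are illustrative assumptions only.

```python
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Hypothetical prompt pair: identical structure, conflicting in-context facts.
# The two prompts must tokenize to the same length for position-wise patching.
clean_prompt = "Fact: Paris is the capital of France. The capital of France is"
corrupt_prompt = "Fact: Rome is the capital of France. The capital of France is"

clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)
assert clean_tokens.shape == corrupt_tokens.shape

# Cache all activations from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

answer = model.to_single_token(" Paris")

def answer_logit(logits: torch.Tensor) -> float:
    # Logit of the correct completion at the final position.
    return logits[0, -1, answer].item()

# Patch the clean residual stream into the corrupted run at the final token,
# one layer at a time, and measure how much each layer restores the answer.
for layer in range(model.cfg.n_layers):
    hook_name = utils.get_act_name("resid_post", layer)

    def patch_hook(activation, hook):
        activation[:, -1, :] = clean_cache[hook.name][:, -1, :]
        return activation

    patched_logits = model.run_with_hooks(
        corrupt_tokens, fwd_hooks=[(hook_name, patch_hook)]
    )
    print(f"layer {layer:2d}: answer logit = {answer_logit(patched_logits):.3f}")
```

Patching at a single hook point and position in this way localizes where the decisive information enters the residual stream: layers whose patched activations recover the correct answer are the ones carrying the factual-versus-false distinction.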

Topic Area: Language & Communication

Extended Abstract: Full Text PDF