Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall

Glimpse prediction fosters graph-oriented scene representations aligned with the ventral visual cortex

Sushrut Thorat1, Adrien Doerig2, Alexander Kroner1, Carmen Amme1, Tim C Kietzmann1; 1University of Osnabrück, 2Freie Universität Berlin

Presenter: Sushrut Thorat

Understanding how the visual system responds to natural scenes remains a central challenge in vision science. Prior research shows that the ventral visual cortex (VVC) encodes objects, textures, and the spatial and semantic relationships between them, forming a structured scene representation akin to a scene graph. However, inferring such representations from images in an interpretable, image-computable way remains an open problem. We propose glimpse prediction, i.e., predicting the upcoming visual input given an eye movement (saccadic efference copy), as a training objective that encourages graph-like representations to emerge in artificial neural networks. A recurrent neural network trained with this objective learns the spatial covariance between glimpses, both across scenes seen during training (in-weight learning) and within novel scenes (in-context learning). Importantly, the model's internal representations align closely with VVC responses to natural scenes (Natural Scenes Dataset), despite the model never observing a full scene or receiving explicit semantic labels. Glimpse prediction thus offers a principled route to building graph-oriented representations that mirror those in the human ventral visual stream. By combining interpretable concepts from cognitive neuroscience with image-computable neuroconnectionist models, this work advances toward a comprehensive understanding of the visual system's response to natural scenes.
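The abstract does not specify the architecture, so the following is only a minimal sketch of what a glimpse-prediction objective could look like: a recurrent network that receives the current glimpse plus the efference copy of the upcoming saccade, and is trained to predict the pixels of the next glimpse. The PyTorch framing, the GRU recurrence, the linear encoder and decoder, the glimpse and saccade dimensions, the pixel-space MSE loss, and the class name GlimpsePredictor are all illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class GlimpsePredictor(nn.Module):
    """Illustrative glimpse-prediction RNN (hypothetical, not the authors' model).

    At each step the network sees the current glimpse and an efference
    copy of the upcoming saccade, and must predict the next glimpse.
    """

    def __init__(self, glimpse_dim=32 * 32 * 3, saccade_dim=2, hidden_dim=512):
        super().__init__()
        self.encoder = nn.Linear(glimpse_dim, hidden_dim)            # glimpse -> embedding
        self.rnn = nn.GRUCell(hidden_dim + saccade_dim, hidden_dim)  # recurrent core
        self.decoder = nn.Linear(hidden_dim, glimpse_dim)            # state -> predicted glimpse

    def forward(self, glimpses, saccades):
        # glimpses: (batch, steps, glimpse_dim); saccades: (batch, steps, saccade_dim)
        batch, steps, _ = glimpses.shape
        h = glimpses.new_zeros(batch, self.rnn.hidden_size)
        preds = []
        for t in range(steps - 1):
            # Condition the recurrent state on the current glimpse and the
            # efference copy of the saccade that leads to glimpse t + 1.
            x = torch.cat([self.encoder(glimpses[:, t]), saccades[:, t]], dim=-1)
            h = self.rnn(x, h)
            preds.append(self.decoder(h))  # prediction of glimpse t + 1
        return torch.stack(preds, dim=1)

# Toy usage: pixel-space prediction error on the upcoming glimpse.
model = GlimpsePredictor()
glimpses = torch.rand(8, 5, 32 * 32 * 3)  # 5 random "glimpses" per scene
saccades = torch.randn(8, 5, 2)           # (dx, dy) of each upcoming saccade
loss = nn.functional.mse_loss(model(glimpses, saccades), glimpses[:, 1:])
loss.backward()
```

Under these assumptions, the key ingredient is that the saccade vector is fed into the recurrence before each prediction is made, so the network can only reduce its prediction error by learning how scene content covaries across locations, which is the mechanism the abstract credits for the emergence of graph-like representations.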

Topic Area: Object Recognition & Visual Attention

Extended Abstract: Full Text PDF