Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Can Scene Graph Properties Explain the Neural Encoding Performance of Vision Transformers?
Helena Balabin, Rik Vandenberghe, Marie-Francine Moens; KU Leuven
Presenter: Helena Balabin
Neural encoding models allow for the exploration of hypotheses about cognitive processes by linking brain activations to representations derived from large language or image models. However, such representations often remain poorly understood, limiting the interpretability of neural encoding models. Therefore, we set out to examine the effect of scene graph properties on image model representations and on neural encoding performance with respect to functional magnetic resonance imaging (fMRI) data from the Natural Scenes Dataset (NSD). Specifically, we used the overlap between the NSD and the Visual Genome to characterize each image by the number of relationships, the number of objects, and the depth of the accompanying scene graph annotations. We found that the relationship and depth measures could be decoded more accurately than object counts, both from fMRI activations and from image embeddings, aligning with an affordance-based account of scene perception.
Topic Area: Object Recognition & Visual Attention
Extended Abstract: Full Text PDF