
Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall

Ventral Stream Responses to Inanimate Objects are Equally Aligned with AlexNet (2012) and Modern Deep Neural Networks

Leilany Torres Diaz1, George A. Alvarez1; 1Harvard University

Presenter: Leilany Torres Diaz

The field of deep learning is continuously developing novel neural network architectures, from residual connections in CNNs (He et al., 2016) to vision transformers with self-attention mechanisms (Dosovitskiy et al., 2020). While these advances appear to increase models' visual capabilities, it remains unclear whether modern architectures are becoming more brain-aligned. To address this question, we examined an Inanimate Objects neuroimaging dataset with reliable neural responses to 72 inanimate objects in early visual cortex (EarlyV), posterior occipito-temporal cortex (pOTC), and anterior occipito-temporal cortex (aOTC). We compared alignment between model feature spaces and voxel space using classic, unweighted representational similarity analysis. We included top-performing models on the Brain-Score platform (Schrimpf et al., 2018) and a leading visual foundation model (DINOv2; Oquab et al., 2023). We found that an AlexNet baseline model (Krizhevsky et al., 2012) matches or exceeds these models in alignment with each brain region, with substantial reliable variance in aOTC remaining unexplained by any model. These results suggest that progress in deep neural network development is missing key aspects of high-level visual representation in the human brain.
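The classic, unweighted RSA used here can be sketched as follows: build a representational dissimilarity matrix (RDM) over the 72 stimuli for a model layer and for a brain region, then correlate the two RDMs. This is a minimal illustration with random placeholder data, not the authors' pipeline; the array shapes, distance metric, and Spearman comparison are common RSA conventions assumed for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Placeholder data standing in for real responses to 72 objects:
# rows are stimuli, columns are model units / voxels (shapes are illustrative).
rng = np.random.default_rng(0)
model_features = rng.standard_normal((72, 512))   # 72 stimuli x 512 model units
voxel_responses = rng.standard_normal((72, 100))  # 72 stimuli x 100 voxels

# Build each RDM as the condensed vector of pairwise correlation distances
# between stimulus response patterns (72 * 71 / 2 = 2556 pairs).
model_rdm = pdist(model_features, metric="correlation")
brain_rdm = pdist(voxel_responses, metric="correlation")

# Unweighted RSA: a single rank correlation between the two RDMs,
# with no fitting of per-feature weights.
alignment, _ = spearmanr(model_rdm, brain_rdm)
print(alignment)
```

Because the comparison is unweighted, a model cannot be rescued by reweighting its features to fit the brain data; its off-the-shelf representational geometry must already match the region's.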

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF