
Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall

Topographic Vision Transformers

Yash Shah1, Daniel LK Yamins1; 1Stanford University

Presenter: Yash Shah

Functional organization in the form of topographic maps is a hallmark of many cortical systems and is believed to arise from biophysical efficiency constraints, such as the minimization of neuronal wiring length. Recently, Margalit et al. (2024) introduced the TDANN, a topographic convolutional neural network (CNN) that recapitulates gross ventral-stream topography while minimizing feedforward wiring length. However, standard CNNs lack mechanisms for the within-layer long-range interactions that are well documented in primate visual cortex. Here we leverage a vision transformer (ViT), which learns to behave locally like a CNN through training yet possesses long-range interactions via self-attention, to learn topographic organization. We find that a topographic ViT reproduces key topographic motifs, maintains high object-categorization performance, and exhibits reduced inter- and intra-layer wiring length. We thus introduce a new class of topographic models that can express hypotheses about the roles of local versus long-range cortical interactions in the brain.
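To make the wiring-length idea concrete, the sketch below shows one common way such topographic objectives are formulated (in the spirit of TDANN-style spatial losses, though not necessarily the authors' exact formulation): each unit in a layer is assigned a 2D "cortical" position, and an auxiliary loss pushes pairwise response correlations to fall off with cortical distance, so that functionally similar units end up spatially clustered. The function name, the inverse-distance target profile, and all parameters here are illustrative assumptions.

```python
import numpy as np

def spatial_loss(responses, positions, eps=1e-8):
    """Hypothetical TDANN-style spatial correlation loss (illustrative only).

    responses: (n_stimuli, n_units) array of unit activations.
    positions: (n_units, 2) array of assigned 2D cortical coordinates.
    Returns a scalar penalizing mismatch between response correlations
    and an (assumed) inverse-distance target profile.
    """
    # Normalize responses so that r.T @ r gives pairwise Pearson correlations.
    r = responses - responses.mean(axis=0)
    r = r / (np.linalg.norm(r, axis=0) + eps)
    corr = r.T @ r  # (n_units, n_units) response correlations

    # Pairwise cortical distances between assigned unit positions.
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    target = 1.0 / (1.0 + d)  # assumed profile: correlation decays with distance

    # Mean squared mismatch over unique unit pairs (upper triangle).
    iu = np.triu_indices(positions.shape[0], k=1)
    return float(np.mean((corr[iu] - target[iu]) ** 2))
```

In a topographic model this term would be added to the task loss (e.g. `total = task_loss + alpha * spatial_loss(...)`) for each layer's units, whether those units come from CNN feature maps or, as here, from ViT token channels.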

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF