
Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall

Neural networks and brains share the gist but not the details

Yingqi Rong¹, Colin Conwell¹, Michael Bonner¹; ¹Johns Hopkins University

Presenter: Michael Bonner

Artificial neural networks (ANNs) excel at complex visual tasks and exhibit notable alignment with neural representations in the ventral visual stream. Despite these advances, debate continues over whether ANNs fully capture the intricacies of the visual cortex. In this work, we explore test-time augmentation (TTA) as a novel strategy for enhancing model-brain similarity without modifying model architectures. Traditionally employed to improve prediction accuracy by averaging predictions over augmented inputs, TTA is leveraged here to generate feature maps that more accurately predict neural responses to visual stimuli in the high-level ventral visual stream. We demonstrate that TTA consistently improves the prediction of neural activations across diverse architectures, including AlexNet, ResNet50, Vision Transformers, and robustified models, irrespective of their training datasets. Remarkably, features averaged over semantically similar but structurally varied augmentations, including those generated via diffusion models conditioned on text, can outperform representations derived from the original images. These results suggest that the conceptual gist, rather than detailed visual properties, underpins the enhanced model-brain alignment facilitated by TTA. We discuss potential mechanisms driving this effect and outline future directions for dissecting the interplay between augmented representations and neural processing in the visual cortex.
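To make the core idea concrete, the sketch below illustrates TTA feature averaging in PyTorch. This is a minimal illustration under assumed details, not the authors' pipeline: the backbone (ResNet50), the specific augmentations, the number of views (n_views), and the helper name tta_features are all hypothetical choices for exposition. The averaged features would then serve as regressors (e.g., in a ridge regression) for voxel-wise neural responses.

```python
import torch
from torch import nn
from torchvision import transforms as T
from torchvision.models import resnet50, ResNet50_Weights

# Backbone: ResNet50 up to (and including) global average pooling,
# dropping the classification head so it emits feature vectors.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()
backbone = nn.Sequential(*list(model.children())[:-1])

# Stochastic augmentations applied at test time (illustrative choices).
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def tta_features(image, n_views: int = 16) -> torch.Tensor:
    """Average backbone features over n_views augmented copies of one
    PIL image, yielding a single TTA representation.
    (Hypothetical helper; n_views is an assumed hyperparameter.)"""
    views = torch.stack([augment(image) for _ in range(n_views)])
    feats = backbone(views).flatten(1)   # (n_views, 2048)
    return feats.mean(dim=0)             # averaged TTA feature vector
```

The same averaging scheme extends to semantically matched views from other sources, such as diffusion-generated variants of a stimulus, by stacking those images in place of the stochastic crops.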

Topic Area: Visual Processing & Computational Vision

Proceedings: Full Text on OpenReview