Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Straightening of Natural Visual Sequences in Video DNNs: the Role of Locality and Temporal Coherence
Anne W. Zonneveld¹, Pascal Mettes¹, Iris Groen¹; ¹University of Amsterdam
Presenter: Anne W. Zonneveld
Predictions about future states of the world play an important role in guiding human behavior. In vision, internal representations are straightened relative to input space, supporting linear extrapolation and predictability. While DNNs are promising models of human visual processing, standard image-trained architectures lack this property. We assessed straightening, quantified as a reduction in curvature in feature vs. pixel space, across 19 different video DNNs, including both CNNs and Transformers. Straightening occurred in late CNN layers, but was absent in Transformers. Critically, models with global attention lacked temporal coherence in feature space, a prerequisite for straightening. These findings suggest that temporally localized processing, like 3D convolutions, enables brain-like invariant representations, whereas Transformers, despite their strong performance, may rely on mechanisms that diverge from biological vision.
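As a rough illustration of the straightening measure mentioned in the abstract, the sketch below computes curvature as the mean angle between successive displacement vectors of a sequence, and straightening as the drop in that curvature from pixel space to feature space. The abstract does not spell out the exact formula, so this definition is an assumption; the function names `mean_curvature` and `straightening` are hypothetical, and the feature array stands in for activations from any layer of a video DNN.

```python
import numpy as np


def mean_curvature(sequence):
    """Mean angle, in degrees, between successive displacement vectors.

    `sequence` has shape (T, ...) with T >= 3; each time step is flattened
    to a vector. This definition is an assumption, not taken from the abstract.
    """
    x = np.asarray(sequence, dtype=float)
    x = x.reshape(x.shape[0], -1)
    if x.shape[0] < 3:
        raise ValueError("need at least 3 time steps to measure curvature")
    diffs = np.diff(x, axis=0)                       # displacement per step
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    units = diffs / np.clip(norms, 1e-12, None)      # unit directions
    cosines = np.sum(units[:-1] * units[1:], axis=1)
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    return float(angles.mean())


def straightening(pixel_frames, features):
    """Curvature reduction from pixel space to feature space.

    Positive values mean the feature trajectory is straighter than the
    pixel trajectory (assumed sign convention).
    """
    return mean_curvature(pixel_frames) - mean_curvature(features)


# Toy example: a right-angle zigzag in "pixel" space and a straight line
# in "feature" space. Both arrays are made up for illustration.
zigzag = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
line = np.outer(np.arange(4.0), np.array([1.0, 2.0, 3.0]))
gain = straightening(zigzag, line)
```

In practice, `pixel_frames` would be the raw video frames and `features` the per-frame activations of one layer, with the measure repeated per layer to see where straightening emerges.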
Topic Area: Visual Processing & Computational Vision