Poster Presentation

Poster A142 in Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall

Straightening of Natural Visual Sequences in Video DNNs: the Role of Locality and Temporal Coherence

Anne W. Zonneveld¹, Pascal Mettes¹, Iris Groen¹; ¹University of Amsterdam

Presenter: Anne W. Zonneveld

Predictions about future states of the world play an important role in guiding human behavior. In vision, internal representations are straightened relative to input space, supporting linear extrapolation and predictability. While DNNs are promising models of human visual processing, standard image-trained architectures lack this property. We assessed straightening, quantified as a reduction in curvature in feature vs. pixel space, across 19 different video DNNs, including both CNNs and Transformers. Straightening occurred in late CNN layers but was absent in Transformers. Critically, models with global attention lacked temporal coherence in feature space, a prerequisite for straightening. These findings suggest that temporally localized processing, such as 3D convolutions, enables brain-like invariant representations, whereas Transformers, despite their strong performance, may rely on mechanisms that diverge from biological vision.
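The abstract does not spell out the curvature formula, so the following is only a minimal sketch under one common assumption: curvature is taken as the mean angle between successive displacement vectors of a frame sequence, and straightening as the pixel-space curvature minus the feature-space curvature. The function names `mean_curvature` and `straightening` are hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
import math


def mean_curvature(sequence):
    """Mean angle, in degrees, between successive displacement vectors.

    `sequence` is a list of equal-length vectors, one per video frame
    (flattened pixels or flattened layer activations). This definition
    is an assumption, not necessarily the one used for the poster.
    """
    # Displacement between each pair of consecutive frames.
    diffs = [[b - a for a, b in zip(x, y)]
             for x, y in zip(sequence, sequence[1:])]
    angles = []
    for u, v in zip(diffs, diffs[1:]):
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(a * a for a in v))
        if norm_u == 0.0 or norm_v == 0.0:
            continue  # identical consecutive frames: angle undefined, skip
        cos = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        angles.append(math.degrees(math.acos(cos)))
    if not angles:
        raise ValueError("need at least three distinct frames")
    return sum(angles) / len(angles)


def straightening(pixel_sequence, feature_sequence):
    """Positive values mean the feature trajectory is straighter."""
    return mean_curvature(pixel_sequence) - mean_curvature(feature_sequence)


if __name__ == "__main__":
    # Toy data: a zigzag "pixel" trajectory and a straight "feature" one.
    pixels = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    features = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(straightening(pixels, features))
```

Under this definition, a perfectly straight trajectory has a curvature of zero, so a layer whose value is clearly positive would count as straightening its input sequence.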

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF