Contributed Talk Session: Thursday, August 14, 10:00 – 11:00 am, Room C1.03
Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall

Incorporating foveal sampling and integration to model 3D shape inferences

Stephanie Fu1, tyler bonnen2, Trevor Darrell3; 1University of California, Berkeley, 2Electrical Engineering & Computer Science Department, University of California, Berkeley, 3Electrical Engineering & Computer Science Department

Presenter: Stephanie Fu

Human vision is inherently sequential. This is largely because of foveal constraints on the retina, which demand that we shift our gaze to collect high-resolution information from across the environment. Within the domain of visual object perception, the neural substrates that support these sequential visual inferences have been well characterized: ventral temporal cortex (VTC) rapidly extracts visual features at each spatial location, while medial temporal cortex (MTC) integrates over the sequential outputs of VTC. This neurocomputational motif is absent in contemporary deep learning models of human vision. Not surprisingly, contemporary models approximate the rapid visual inferences that depend on VTC, but not those behaviors that depend on MTC (e.g., novel 3D shape inference). Here we develop a modeling framework that embodies the sequential sampling/integration strategy emblematic of human vision. Given an image, this model first determines relevant locations to attend to, sequentially processes these locations as ‘foveated’ inputs using a VTC-like model, then integrates over these sequential visual features within an MTC-like model. We report preliminary results on the design choices that lead to stable model optimization and on the resulting model behaviors.
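The three-stage pipeline described in the abstract (select fixation locations, encode each foveated crop with a VTC-like model, integrate the resulting feature sequence with an MTC-like model) can be sketched in a minimal, illustrative form. Everything below is an assumption for illustration, not the authors' implementation: the variance-based saliency heuristic, the single linear-ReLU layer standing in for the VTC-like encoder, and the one-step recurrent update standing in for the MTC-like integrator are all hypothetical stand-ins.

```python
import numpy as np

def select_fixations(image, n_fixations=3, patch=8):
    # Hypothetical saliency heuristic: rank non-overlapping patches
    # by local pixel variance and fixate the top-k.
    H, W = image.shape
    scores = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            scores.append((image[y:y + patch, x:x + patch].var(), y, x))
    scores.sort(reverse=True)
    return [(y, x) for _, y, x in scores[:n_fixations]]

def vtc_features(foveated_patch, W_v):
    # Stand-in for the VTC-like encoder: one linear map + ReLU
    # applied independently to each foveated crop.
    return np.maximum(0.0, W_v @ foveated_patch.ravel())

def mtc_integrate(feature_seq, W_m):
    # Stand-in for the MTC-like integrator: a simple recurrent
    # accumulation over the sequence of per-fixation features.
    state = np.zeros(W_m.shape[0])
    for f in feature_seq:
        state = np.tanh(W_m @ np.concatenate([state, f]))
    return state

def sequential_shape_inference(image, W_v, W_m, n_fixations=3, patch=8):
    # Full pipeline: fixate -> foveated VTC-like encoding -> MTC-like integration.
    feats = []
    for (y, x) in select_fixations(image, n_fixations, patch):
        foveated = image[y:y + patch, x:x + patch]  # 'foveated' crop
        feats.append(vtc_features(foveated, W_v))
    return mtc_integrate(feats, W_m)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
d = 16                                       # feature dimensionality (arbitrary)
W_v = rng.standard_normal((d, 64)) * 0.1     # patch=8 -> 64 pixels per crop
W_m = rng.standard_normal((d, 2 * d)) * 0.1  # input: [state, features]
rep = sequential_shape_inference(img, W_v, W_m)
print(rep.shape)  # (16,)
```

In a trained version of such a model, the fixation policy, encoder, and integrator would each be learned modules; the fixed heuristics here only make the sampling/integration control flow concrete.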

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF