
Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall

Foveated sensing in KNN-convolutional neural networks based on isotropic cortical magnification

Nicholas Blauch¹, Talia Konkle¹; ¹Harvard University

Presenter: Nicholas Blauch

Human vision prioritizes the center of gaze through spatially variant retinal sampling, leading to magnification of the fovea in cortical visual maps. In contrast, deep neural network models (DNNs) typically operate on spatially uniform inputs, limiting their use in understanding the active and foveated nature of human vision. Prior work exploring foveated sampling in DNNs has introduced anisotropy in attempting to wrangle retinal samples into a grid-like representation, sacrificing faithful cortical retinotopy and creating undesirably warped receptive field shapes that depend on eccentricity. Here, we offer an alternative approach by adapting the model architecture to enable realistic foveated sensing. First, we develop a spatially variant input sensor derived from the assumption of isotropic cortical magnification. Second, as this produces a curved sensor manifold, we devise a novel method for hierarchical convolutional processing that defines receptive fields as k-nearest neighborhoods on the sensor manifold. This approach allows us to build hierarchical KNN convolutional neural networks (KNN-CNNs) closely matched to their CNN counterparts. Architecturally, these models have more realistic cortical retinotopy and desirable receptive field properties, such as increasing size and approximately constant shape as a function of eccentricity. Training foveated KNN-CNNs end-to-end on a natural image categorization task, we find that they outperform non-foveated CNNs when retinal resources are limited relative to the native image resolution. Moreover, their performance improves with multiple fixations that encode different parts of the image at high resolution. Broadly, this model class offers a more biologically aligned sampling of the visual world, enabling future computational work to model the active and spatial nature of human vision and to build more neurally mappable models.
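To make the two architectural ideas concrete, the sketch below illustrates (a) a sensor whose sample locations are denser near the fovea, derived from an isotropic cortical magnification factor, and (b) a KNN-convolution step that aggregates each sensor point's k nearest neighbors. This is a minimal illustration, not the authors' implementation: the magnification formula M(E) = 1/(E + e2), the random sampling scheme, the rank-indexed filter weights, and the use of retinal rather than cortical coordinates for the neighbor search are all simplifying assumptions (see the extended abstract PDF for the actual method).

```python
# Hypothetical sketch of a foveated sensor + KNN convolution (NumPy/SciPy).
# Assumptions: M(E) = 1/(E + e2) magnification, random angular placement,
# and KNN computed in retinal coordinates rather than on the cortical manifold.

import numpy as np
from scipy.spatial import cKDTree


def foveated_sensor(n_points=4096, max_ecc=10.0, e2=1.0, seed=0):
    """Sample retinal (x, y) locations whose density falls off with eccentricity,
    consistent with an isotropic magnification factor M(E) = 1/(E + e2)."""
    rng = np.random.default_rng(seed)
    # Cortical distance D(E) = integral of M(e) de = log(1 + E/e2).
    # Sampling uniformly in D and inverting yields foveally dense eccentricities.
    d_max = np.log(1.0 + max_ecc / e2)
    d = rng.uniform(0.0, d_max, n_points)
    ecc = e2 * (np.exp(d) - 1.0)                      # invert D(E)
    theta = rng.uniform(0.0, 2 * np.pi, n_points)
    return np.stack([ecc * np.cos(theta), ecc * np.sin(theta)], axis=1)


def knn_conv(values, sensor_xy, weights, k=9):
    """One KNN-convolution layer: gather each sensor point's k nearest neighbors
    (itself included) and mix channels with a weight matrix per neighbor rank.

    values:  (n_points, c_in)   activations at each sensor location
    weights: (k, c_in, c_out)   filter with one matrix per neighbor rank
    """
    _, idx = cKDTree(sensor_xy).query(sensor_xy, k=k)  # (n_points, k) neighbor indices
    gathered = values[idx]                              # (n_points, k, c_in)
    # Sum over neighbor rank and input channels -> (n_points, c_out)
    return np.einsum("pki,kio->po", gathered, weights)


if __name__ == "__main__":
    xy = foveated_sensor()
    feats = np.random.randn(xy.shape[0], 3)             # e.g. RGB samples at sensor points
    w = np.random.randn(9, 3, 16) * 0.1
    out = knn_conv(feats, xy, w, k=9)
    print(out.shape)                                     # (4096, 16)
```

Because the sensor is denser near the fovea, a fixed k spans a small retinal area at the center and a larger one in the periphery, giving receptive fields that grow with eccentricity while keeping roughly constant shape, which is the property the abstract highlights.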

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF