
Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall

Getting into Shape: The Impact of Early Visual Development on Object Recognition

Zejin Lu1, Sushrut Thorat2, Radoslaw Martin Cichy1, Tim C Kietzmann3; 1Freie Universität Berlin, 2University of Osnabrück, 3Universität Osnabrück

Presenter: Zejin Lu

A prolonged period of immaturity is a key feature distinguishing humans from artificial neural networks (ANNs) and many other animals. For example, various aspects of the visual experience of babies are rather poor and only slowly improve over time. In stark contrast, AI vision models are presented with mature, adult-like input from the start. Here we ask whether there is a computational advantage to this developmental trajectory, and to what extent it would affect artificial vision models. Indeed, recent studies indicate that injecting several fixed levels of blur into training can improve robustness and shape bias, both longstanding challenges for vision models in object recognition. However, a large gap to human visual robustness remains for all publicly available vision systems. To further explore the possibilities of a human-adjusted visual diet, we introduce a visual training trajectory that simulates the progressive developmental visual diet (DVD) of humans, spanning from newborns to 25-year-old adults. Three aspects of vision are considered: visual acuity, chromatic sensitivity, and contrast sensitivity. We demonstrate that training on vision tasks with DVD yields models with near-human-level shape bias, better alignment with robust human perception under signal deterioration, and markedly improved robustness to a variety of adversarial attacks. Importantly, the DVD improvements are observed on regular vision models, such as ResNet, trained on regular vision tasks, such as object classification on ecoset or ILSVRC, thus enabling robust visual inference outside of large-scale, large-data, multimodal models. DVD thereby offers a promising approach to bridging the gap between artificial neural networks and human visual systems.
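The abstract does not state how acuity, chromatic sensitivity, and contrast sensitivity are scheduled over development, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a hypothetical linear schedule over a training `progress` value in [0, 1], approximates acuity with Gaussian blur, chromatic sensitivity with a saturation factor, and contrast sensitivity with a single global contrast factor (a crude stand-in, since real contrast sensitivity depends on spatial frequency). All function names, parameters, and constants (`dvd_params`, `apply_dvd`, `max_sigma`, the 0.2 contrast floor) are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel; sigma <= 0 yields the identity kernel."""
    if sigma <= 0:
        return np.array([1.0])
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur for an H x W x C float image with edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    # Filter along the vertical axis, then along the horizontal axis.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, tmp)


def dvd_params(progress: float, max_sigma: float = 4.0) -> dict:
    """Hypothetical linear schedule (an assumption, not taken from the abstract).

    progress = 0.0 mimics the poorest input, progress = 1.0 mature input.
    """
    p = float(np.clip(progress, 0.0, 1.0))
    return {
        "sigma": max_sigma * (1.0 - p),   # blur shrinks as acuity matures
        "saturation": p,                  # colour fades in from grayscale
        "contrast": 0.2 + 0.8 * p,        # global contrast rises to full
    }


def apply_dvd(img: np.ndarray, progress: float) -> np.ndarray:
    """Degrade an RGB image with values in [0, 1] according to the schedule."""
    params = dvd_params(progress)
    out = blur(img, params["sigma"])
    # Reduce chromatic content by blending toward the per-pixel gray value.
    gray = out.mean(axis=2, keepdims=True)
    out = gray + params["saturation"] * (out - gray)
    # Reduce contrast by shrinking deviations from the global mean.
    mean = out.mean()
    out = mean + params["contrast"] * (out - mean)
    return np.clip(out, 0.0, 1.0)
```

In a training loop, such a transform would be applied to each batch with `progress` tied to the current epoch or step, so that early batches are blurry, desaturated, and low in contrast while late batches are left unchanged.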

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF