Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall
Learning visual cognition from children's visual experiences
Khai Loong Aw, Wanhee Lee, Klemen Kotar, Rahul Mysore Venkatesh, Honglin Chen, Michael Frank, Daniel LK Yamins; Stanford University
Presenter: Khai Loong Aw
Even in infancy, children display sophisticated visual reasoning abilities, prompting long-standing debate over whether these abilities are innate or learned. To study this question, recent work has trained computational models on egocentric video datasets recorded from children, e.g., BabyView. However, these studies focused on perceptual tasks rather than more sophisticated visual reasoning, and used labels to fine-tune model readout layers, which limits the strength of their claims. In this work, we apply the recently developed Local Random Access Sequence (LRAS) framework, which progressively trains a series of self-supervised models. We train LRAS on 800 hours (approximately 0.2 child-years) of BabyView data. Our models successfully perform a range of 3D perceptual tasks involving objects, depth, and scenes, as well as cognitive tasks such as simulating future object motion and viewpoint changes, and physical reasoning about object cohesion, solidity, agent-object motion, and multi-object interactions. Notably, our models perform these tasks in a unified, zero-shot manner, providing stronger evidence for the learning-based hypothesis. Overall, we establish a computational proof-of-concept that visual cognitive abilities can emerge from developmentally realistic experience through statistical learning with minimal innate priors.
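As a rough reading of the reported scale (a back-of-the-envelope sketch only; it assumes roughly 12 waking hours of visual experience per day, a convention the abstract does not state):

\[
\frac{800\ \text{hours}}{12\ \text{hours/day} \times 365\ \text{days/year}} \approx 0.18 \approx 0.2\ \text{child-years}
\]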
Topic Area: Visual Processing & Computational Vision
Extended Abstract: Full Text PDF