Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall
Learning visual cognition from children's visual experiences
Khai Loong Aw, Wanhee Lee, Klemen Kotar, Rahul Mysore Venkatesh, Honglin Chen, Michael Frank, Daniel LK Yamins; Stanford University
Presenter: Khai Loong Aw
Even in infancy, children display sophisticated visual reasoning abilities, prompting long-standing debate over whether these abilities are innate or learned. To study this question, recent work has trained computational models on egocentric video datasets recorded from children, e.g., BabyView. However, these studies focused on perceptual tasks rather than more sophisticated visual reasoning, and used labels to fine-tune model readout layers, which limits the strength of their claims. In this work, we apply the recently developed Local Random Access Sequence (LRAS) framework, which progressively trains a series of self-supervised models. We train LRAS on 800 hours (approximately 0.2 child-years) of BabyView data. Our models successfully perform a range of 3D perceptual tasks involving objects, depth, and scenes, as well as cognitive tasks such as simulating future object motion and viewpoint changes, and physical reasoning about object cohesion, solidity, agent-object motion, and multi-object interactions. Notably, our models perform these tasks in a unified, zero-shot manner, providing stronger evidence for the learning-based hypothesis. Overall, we establish a computational proof-of-concept that visual cognitive abilities can emerge from developmentally realistic experience through statistical learning with minimal innate priors.
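As a rough reading of the reported scale (a back-of-the-envelope sketch only; it assumes roughly 12 waking hours of visual experience per day, a convention the abstract does not state):

\[
\frac{800\ \text{hours}}{12\ \text{hours/day} \times 365\ \text{days/year}} \approx 0.18 \approx 0.2\ \text{child-years}
\]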
Topic Area: Visual Processing & Computational Vision
Extended Abstract: Full Text PDF