Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall

Quantifying infants’ everyday experiences with objects in a large corpus of egocentric videos

Jane Yang1, Tarun Sepuri1, Alvin Wei Ming Tan2, Michael Frank2, Bria Lorelle Long1; 1University of California, San Diego, 2Stanford University

Presenter: Jane Yang

While modern vision-language models are typically trained on millions of curated photographs, infants learn visual categories and the words that refer to them from very different training data. Here, we investigate which objects infants actually encounter in their everyday environments, and how often they encounter them. We use a large corpus of egocentric videos taken from the infant perspective (868 hours of video from N = 31 participants), applying and validating a recent object detection model (YOLOE) to detect a set of categories that are frequently named in children's early vocabulary. We find that infants' visual experience is dominated by a small set of objects, with differences in individual children's home environments driving variability. We also find that young children tend to learn words earlier for more frequently encountered categories. These results suggest that visual experience scaffolds young children's early category and language learning, and they highlight that ecologically valid computational models of category learning must be able to accommodate skewed input distributions.
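The core frequency-counting step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the per-frame detection lists and category labels are hypothetical placeholders standing in for real YOLOE output over video frames.

```python
from collections import Counter


def category_frequencies(frame_detections):
    """Aggregate per-frame detected category labels into a relative-frequency
    distribution over categories, ordered most frequent first."""
    counts = Counter()
    for labels in frame_detections:
        counts.update(labels)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}


# Toy detections for a handful of egocentric frames (hypothetical labels,
# standing in for detector output on sampled video frames).
frames = [
    ["hand", "bottle"],
    ["hand", "toy"],
    ["hand"],
    ["bottle", "hand"],
    ["book"],
]

freqs = category_frequencies(frames)
print(freqs)  # a skewed distribution dominated by a few categories
```

Even this toy input yields the kind of skewed distribution the abstract describes: a small set of categories ("hand" here) accounts for most detections, with a long tail of rarely seen objects.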

Topic Area: Object Recognition & Visual Attention

Extended Abstract: Full Text PDF