Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall
Examining the precision of infants’ visual concepts by leveraging vision-language models and automated gaze coding
Tarun Sepuri¹, Martin Zettersten¹, Bria Lorelle Long¹; ¹University of California, San Diego
Presenter: Tarun Sepuri
Infants rapidly develop knowledge about the meanings of words in the first few years of life. Previous work has examined this word knowledge by measuring how much infants look at a named target image over a distractor. Here, we examine the specificity of that knowledge by manipulating the similarity of the target and distractor. Using automated gaze annotation and online data collection, we measured looking behavior in 91 infants aged 14 to 24 months. Using a vision-language model to quantify target-distractor image and text similarity, we find that infants' looking behavior is shaped by the high-level visual similarity of competitors: infants' looking to the target image was inversely correlated with image similarity but not with visual saliency. Our findings demonstrate how multimodal models can be used to systematically examine the content of infants' early visual representations.
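The abstract does not specify which vision-language model or code was used; the sketch below is a minimal illustration of one way to quantify target-distractor similarity, assuming a CLIP-style model from the Hugging Face transformers library. The file paths and word labels are hypothetical placeholders, not items from the study.

```python
# Hedged sketch: target-distractor similarity via a CLIP-style vision-language model.
# The specific model ("openai/clip-vit-base-patch32"), image paths, and labels are
# assumptions for illustration only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_similarity(target_path: str, distractor_path: str) -> float:
    """Cosine similarity between target and distractor image embeddings."""
    images = [Image.open(target_path), Image.open(distractor_path)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])

def text_similarity(target_label: str, distractor_label: str) -> float:
    """Cosine similarity between target and distractor label embeddings."""
    inputs = processor(text=[target_label, distractor_label],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])

# Hypothetical usage: a visually similar pair (e.g., "dog" vs. "cat") should yield
# a higher similarity score than a dissimilar pair (e.g., "dog" vs. "car"),
# predicting weaker looking to the named target in the higher-similarity condition.
print(image_similarity("dog.jpg", "cat.jpg"))
print(text_similarity("dog", "cat"))
```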
Topic Area: Visual Processing & Computational Vision
Extended Abstract: Full Text PDF