
Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall

Examining the precision of infants’ visual concepts by leveraging vision-language models and automated gaze coding

Tarun Sepuri1, Martin Zettersten1, Bria Lorelle Long1; 1University of California, San Diego

Presenter: Tarun Sepuri

Infants rapidly develop knowledge about the meanings of words in the first few years of life. Previous work has examined this word knowledge by measuring how much infants look at a named target image relative to a distractor. Here, we examine the specificity of that knowledge by manipulating the similarity between the target and the distractor. Using automated gaze annotation and online data collection, we measured looking behavior in 91 infants aged 14 to 24 months. Quantifying target-distractor image and text similarity with a vision-language model, we find that infants' looking behavior is shaped by the high-level visual similarity of competitors: looking to the target image was inversely correlated with target-distractor image similarity but not with visual saliency. Our findings demonstrate how multimodal models can be used to systematically examine the content of infants' early visual representations.
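
The abstract does not name the specific vision-language model used; the following is a minimal sketch, assuming a CLIP-style model from the Hugging Face transformers library, of how target-distractor image and text similarity could be quantified as cosine similarity between embeddings. The model name, file paths, and helper functions are illustrative assumptions, not details reported in the study.

# Minimal sketch (assumption): quantify target-distractor similarity with a
# CLIP-style vision-language model. Model name and stimulus paths are illustrative.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def image_similarity(target_path: str, distractor_path: str) -> float:
    """Cosine similarity between image embeddings of target and distractor."""
    images = [Image.open(target_path).convert("RGB"),
              Image.open(distractor_path).convert("RGB")]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = F.normalize(feats, dim=-1)
    return float(feats[0] @ feats[1])

def text_similarity(target_label: str, distractor_label: str) -> float:
    """Cosine similarity between text embeddings of the two object labels."""
    inputs = processor(text=[target_label, distractor_label],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = F.normalize(feats, dim=-1)
    return float(feats[0] @ feats[1])

# Hypothetical usage: higher target-distractor image similarity would be
# expected to predict lower looking to the named target.
# print(image_similarity("dog.jpg", "cat.jpg"), text_similarity("dog", "cat"))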

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF