Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall

A Large-Scale Study of Social Scene Judgments Reveals Alignment with Deep Neural Networks and Social-Affective Features

Kathy Garcia1, Leyla Isik1; 1Johns Hopkins University

Presenter: Kathy Garcia

Similarity judgments offer a window into the mental representations people use to make sense of objects and events, yet most prior work has focused on static images, leaving dynamic scene understanding underexplored. We introduce a novel large-scale dataset of 56,000+ odd-one-out similarity judgments for 250 three-second videos of everyday naturalistic social actions. Using representational similarity analysis (RSA), we compare human similarity judgments to behavioral ratings of social-visual features, fMRI responses, and embeddings from pretrained deep neural networks (DNNs). Language model embeddings of human video annotations explained the most unique variance in behavior, followed by social-affective features and visual DNNs. Neural activity in high-level social perception regions (EBA, LOC, STS, FFA) mirrored the behavioral similarity structure, whereas early visual and scene-selective areas did not. Variance partitioning showed that behavioral and model-derived features captured both overlapping and complementary structure, and their combination reached the split-half reliability ceiling of the data. These results highlight that current AI models, especially language models encoding semantic information, closely approximate human similarity judgments. Together, these findings and the dataset reveal the nature of social event representations and offer a framework for evaluating model-brain-behavior alignment in dynamic social perception.
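The core RSA comparison described above can be sketched in a few lines: build a representational dissimilarity matrix (RDM) from each feature space, then rank-correlate the two dissimilarity structures. This is a minimal illustration, not the authors' pipeline; the 250-video count comes from the abstract, but the feature matrices here are random placeholders standing in for the study's judgment-derived, rating-based, fMRI, and DNN embeddings.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one row per video, one column per feature.
# In the study these would be behavioral judgment embeddings, social-affective
# ratings, fMRI response patterns, or DNN/language-model embeddings.
n_videos = 250
behavior = rng.standard_normal((n_videos, 20))
model = behavior + 0.5 * rng.standard_normal((n_videos, 20))  # a correlated model space

# RSA step 1: condensed RDMs -- one pairwise dissimilarity per video pair.
rdm_behavior = pdist(behavior, metric="correlation")
rdm_model = pdist(model, metric="correlation")

# RSA step 2: compare the two dissimilarity structures with a rank correlation,
# so only the relative ordering of pairwise distances matters.
rho, p = spearmanr(rdm_behavior, rdm_model)
print(f"RSA correlation (Spearman rho) = {rho:.2f}")
```

Variance partitioning extends this by regressing the behavioral RDM on several model RDMs jointly and in subsets, attributing shared versus unique explained variance to each feature space.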

Topic Area: Reward, Value & Social Decision Making

Extended Abstract: Full Text PDF