Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall

Relational Information Predicts Human Behavior and Neural Responses to Complex Social Scenes

Wenshuo Qin1, Manasi Malik1, Leyla Isik1; 1Johns Hopkins University

Presenter: Wenshuo Qin

Understanding social scenes depends on tracking relational visual information, which is prioritized behaviorally and represented in the superior temporal sulcus (STS). However, computational models often overlook these cues. Here, we evaluate two social interaction recognition models, SocialGNN and RNN Edge, that explicitly incorporate relational signals (gaze direction or physical contact) and compare their predictions to human behavioral and neural responses. SocialGNN organizes video frames into a graph structure, with nodes representing faces and objects and edges encoding relational signals. RNN Edge is simpler, processing only relational information over time without node features. We found that both models strongly predicted human behavioral ratings of social interactions and performed comparably to state-of-the-art AI models despite using far less training data and simpler architectures. Both models also predicted STS responses of people watching social interaction videos better than a matched visual model trained without relational cues. These findings underscore the value of integrating relational cues into computational models of social vision.
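
The sketch below illustrates the general idea described in the abstract: each video frame is represented as a graph whose nodes carry face/object features and whose edges carry relational signals (e.g., gaze or contact), with frame-level graph embeddings then processed over time by a recurrent network. All layer sizes, names, and the specific message-passing and pooling choices are illustrative assumptions, not the authors' SocialGNN or RNN Edge implementations.

    # Minimal, hypothetical sketch of a relational video model in the spirit of the
    # abstract. Every architectural detail here is an assumption for illustration.
    import torch
    import torch.nn as nn

    class FrameGraphLayer(nn.Module):
        """One round of edge-conditioned message passing over a single frame graph."""
        def __init__(self, node_dim: int, edge_dim: int, hidden_dim: int):
            super().__init__()
            # Messages combine sender node features with the relational edge feature.
            self.message = nn.Linear(node_dim + edge_dim, hidden_dim)
            self.update = nn.GRUCell(hidden_dim, node_dim)

        def forward(self, nodes, edge_index, edge_attr):
            # nodes: (N, node_dim); edge_index: (2, E) as (src, dst); edge_attr: (E, edge_dim)
            src, dst = edge_index
            msgs = torch.relu(self.message(torch.cat([nodes[src], edge_attr], dim=-1)))
            # Sum-aggregate incoming messages per destination node.
            agg = torch.zeros(nodes.size(0), msgs.size(-1), device=nodes.device)
            agg.index_add_(0, dst, msgs)
            return self.update(agg, nodes)

    class RelationalVideoModel(nn.Module):
        """Graph layer per frame, then an RNN over mean-pooled frame embeddings."""
        def __init__(self, node_dim=16, edge_dim=2, hidden_dim=32, num_outputs=4):
            super().__init__()
            self.gnn = FrameGraphLayer(node_dim, edge_dim, hidden_dim)
            self.rnn = nn.GRU(node_dim, hidden_dim, batch_first=True)
            self.readout = nn.Linear(hidden_dim, num_outputs)

        def forward(self, frames):
            # frames: list of (nodes, edge_index, edge_attr) tuples, one per video frame.
            pooled = []
            for nodes, edge_index, edge_attr in frames:
                updated = self.gnn(nodes, edge_index, edge_attr)
                pooled.append(updated.mean(dim=0))      # mean-pool nodes per frame
            seq = torch.stack(pooled).unsqueeze(0)       # (1, T, node_dim)
            _, h = self.rnn(seq)
            return self.readout(h[-1])                   # e.g., social-interaction ratings

    # Toy usage: two frames, three nodes each, edges carrying 2-d gaze/contact features.
    frames = []
    for _ in range(2):
        nodes = torch.randn(3, 16)
        edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])
        edge_attr = torch.randn(3, 2)
        frames.append((nodes, edge_index, edge_attr))
    print(RelationalVideoModel()(frames).shape)  # torch.Size([1, 4])

An edge-only variant in the spirit of RNN Edge could drop the node features entirely and feed the per-frame relational edge features directly into the recurrent network.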

Topic Area: Reward, Value & Social Decision Making

Extended Abstract: Full Text PDF