Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall
Relational Information Predicts Human Behavior and Neural Responses to Complex Social Scenes
Wenshuo Qin¹, Manasi Malik¹, Leyla Isik¹; ¹Johns Hopkins University
Presenter: Wenshuo Qin
Understanding social scenes depends on tracking relational visual information, which is prioritized behaviorally and represented in the superior temporal sulcus (STS). However, computational models often overlook these cues. Here, we evaluate two social interaction recognition models, SocialGNN and RNN Edge, that explicitly incorporate relational signals (gaze direction or physical contact) and compare their predictions to human behavioral and neural responses. SocialGNN organizes video frames into a graph structure, with nodes representing faces and objects and edges encoding relational signals. RNN Edge is simpler, processing only relational information over time without node features. We found that both models strongly predicted human behavioral ratings of social interactions and were comparable to state-of-the-art AI models despite far less training data and simpler architectures. Both models also predicted STS responses of people watching social interaction videos better than a matched visual model trained without relational cues. These findings underscore the value of integrating relational cues into computational models of social vision.
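To make the graph-over-frames idea concrete, here is a minimal sketch of how relational edge information could be represented and processed over time. All names (FrameGraph, EdgeRNN), feature dimensions, and the GRU-based readout are illustrative assumptions, not the authors' SocialGNN or RNN Edge implementations.

```python
# Illustrative sketch only: FrameGraph and EdgeRNN are hypothetical names,
# and the feature sizes are assumptions, not the authors' architecture.
from dataclasses import dataclass
from typing import List
import torch
import torch.nn as nn


@dataclass
class FrameGraph:
    """One video frame as a graph: nodes are faces/objects, edges carry
    relational signals such as gaze direction or physical contact."""
    node_feats: torch.Tensor   # (num_nodes, node_dim) appearance features
    edge_index: torch.Tensor   # (2, num_edges) sender/receiver node indices
    edge_feats: torch.Tensor   # (num_edges, edge_dim) gaze / contact signals


class EdgeRNN(nn.Module):
    """Edge-only analogue: ignores node features and runs a GRU over the
    per-frame relational (edge) information to predict a rating."""
    def __init__(self, edge_dim: int = 2, hidden: int = 32, num_ratings: int = 1):
        super().__init__()
        self.rnn = nn.GRU(edge_dim, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, num_ratings)

    def forward(self, frames: List[FrameGraph]) -> torch.Tensor:
        # Pool edge features within each frame, then process the sequence.
        per_frame = torch.stack([g.edge_feats.mean(dim=0) for g in frames])
        _, h = self.rnn(per_frame.unsqueeze(0))   # (1, T, edge_dim) sequence
        return self.readout(h[-1]).squeeze(0)     # predicted rating(s)


# Toy usage: 10 frames, 3 nodes, 2 relational edges (e.g., gaze, contact).
frames = [
    FrameGraph(
        node_feats=torch.randn(3, 8),
        edge_index=torch.tensor([[0, 1], [1, 2]]),
        edge_feats=torch.rand(2, 2),
    )
    for _ in range(10)
]
print(EdgeRNN()(frames))
```

A graph-based variant in the spirit of SocialGNN would additionally propagate node (face/object) features along these edges before the temporal model; the sketch above corresponds only to the simpler edge-only case.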
Topic Area: Reward, Value & Social Decision Making
Extended Abstract: Full Text PDF