Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Evaluating Model-to-Human Alignment on Image Occlusion
Ehsan Tousi1, Haider Al-Tahan2, Farzad Shayanfar3, Marieke Mur1; 1University of Western Ontario, 2Georgia Institute of Technology, 3Shahid Beheshti University of Medical Sciences
Presenter: Ehsan Tousi
Object recognition under occlusion is a major challenge for both artificial and biological visual systems. We compared human performance with that of deep neural networks on a rapid object categorization task using systematically occluded natural images. Vision Language Models (VLMs) not only matched but exceeded human performance on heavily occluded images. In contrast, most vision-only models showed steep performance drops. Confusion matrix analyses revealed that VLMs make semantically meaningful errors similar to those of humans, whereas vision-only models show systematic biases toward specific categories regardless of input. VLMs' integration of linguistic context appears to enable more human-like inference of occluded object parts, suggesting a more object-centric approach than that of traditional pixel-based models. These results highlight the importance of multi-modal integration in developing human-aligned visual recognition systems that remain robust under challenging viewing conditions.
Topic Area: Object Recognition & Visual Attention
Extended Abstract: Full Text PDF
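The confusion-matrix comparison described in the abstract can be illustrated with a minimal sketch: one common way to quantify whether a model makes "human-like" errors is to correlate the off-diagonal (error) cells of the model's and humans' confusion matrices, masking the diagonal so correct responses do not dominate the score. The function name and the toy counts below are illustrative assumptions, not the study's actual analysis pipeline or data.

```python
import numpy as np

def error_similarity(cm_a, cm_b):
    """Correlate the off-diagonal (error) cells of two confusion matrices.

    Rows = true category, columns = predicted category. The diagonal
    (correct responses) is masked so the score reflects only the
    pattern of confusions between categories.
    """
    cm_a = np.asarray(cm_a, dtype=float)
    cm_b = np.asarray(cm_b, dtype=float)
    mask = ~np.eye(cm_a.shape[0], dtype=bool)  # keep off-diagonal cells only
    return np.corrcoef(cm_a[mask], cm_b[mask])[0, 1]

# Toy 3-category matrices (hypothetical counts, not the study's data):
human = np.array([[50, 8, 2],
                  [6, 52, 2],
                  [3, 5, 52]])
vlm = np.array([[48, 9, 2],
                [7, 50, 3],
                [4, 6, 50]])
print(error_similarity(human, vlm))
```

A high correlation indicates the model confuses the same category pairs that humans do (semantically meaningful errors), while a model with a systematic bias toward a few output categories would produce an error pattern largely uncorrelated with the human one.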