Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Evaluating Model-to-Human Alignment on Image Occlusion
Ehsan Tousi1, Haider Al-Tahan2, Farzad Shayanfar3, Marieke Mur1; 1University of Western Ontario, 2Georgia Institute of Technology, 3Shahid Beheshti University of Medical Sciences
Presenter: Ehsan Tousi
Object recognition under occlusion is a major challenge for both artificial and biological visual systems. We compared human performance with that of deep neural networks on a rapid object categorization task using systematically occluded natural images. Vision Language Models (VLMs) not only matched but exceeded human performance on heavily occluded images. In contrast, most vision-only models showed steep performance drops. Confusion matrix analyses revealed that VLMs make semantically meaningful errors similar to those of humans, whereas vision-only models show systematic biases toward specific categories regardless of input. VLMs' integration of linguistic context appears to enable more human-like inference of occluded object parts, suggesting a more object-centric approach than that of traditional pixel-based models. These results highlight the importance of multi-modal integration in developing human-aligned visual recognition systems that remain robust under challenging viewing conditions.
Topic Area: Object Recognition & Visual Attention
Extended Abstract: Full Text PDF
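The confusion-matrix comparison described in the abstract can be illustrated with a minimal sketch: one common way to quantify whether a model makes "human-like" errors is to correlate the off-diagonal (error) cells of the model's and humans' confusion matrices, masking the diagonal so correct responses do not dominate the score. The function name and the toy counts below are illustrative assumptions, not the study's actual analysis pipeline or data.

```python
import numpy as np

def error_similarity(cm_a, cm_b):
    """Correlate the off-diagonal (error) cells of two confusion matrices.

    Rows = true category, columns = predicted category. The diagonal
    (correct responses) is masked so the score reflects only the
    pattern of confusions between categories.
    """
    cm_a = np.asarray(cm_a, dtype=float)
    cm_b = np.asarray(cm_b, dtype=float)
    mask = ~np.eye(cm_a.shape[0], dtype=bool)  # keep off-diagonal cells only
    return np.corrcoef(cm_a[mask], cm_b[mask])[0, 1]

# Toy 3-category matrices (hypothetical counts, not the study's data):
human = np.array([[50, 8, 2],
                  [6, 52, 2],
                  [3, 5, 52]])
vlm = np.array([[48, 9, 2],
                [7, 50, 3],
                [4, 6, 50]])
print(error_similarity(human, vlm))
```

A high correlation indicates the model confuses the same category pairs that humans do (semantically meaningful errors), while a model with a systematic bias toward a few output categories would produce an error pattern largely uncorrelated with the human one.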