
Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall

Contrast Sensitivity Function of Multimodal Vision-Language Models

Pablo Hernández-Cámara1, Alexandra Gomez-Villa2, Jose Manuel Jaén-Lorites3, Jorge Vila-Tomás1, Jesus Malo4, Valero Laparra5; 1Universidad de Valencia, 2Universitat Autònoma de Barcelona, 3Universidad Politécnica de Valencia, 4Universitat de Valencia, 5Universitat de València

Presenter: Alexandra Gomez-Villa

Assessing the alignment of multimodal vision-language models (VLMs) with human perception is essential to understand how they perceive low-level visual features. A key characteristic of human vision is the contrast sensitivity function (CSF), which describes how sensitivity varies with spatial frequency at low contrast. Here, we introduce a novel behavioral psychophysics-inspired method to estimate the CSF of chat-based VLMs by directly prompting them to judge pattern visibility at different contrasts for each frequency. This methodology is closer to real psychophysical experiments than previously reported approaches. Using band-pass filtered noise images and a diverse set of prompts, we assess model responses across multiple architectures. We find that while some models approximate the human-like CSF in shape or magnitude, none fully replicate both. Notably, prompt phrasing has a large effect on the responses, raising concerns about prompt stability. Our results provide a new framework for probing visual sensitivity in multimodal models and reveal key gaps between their visual representations and human perception.
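The procedure described above (band-pass filtered noise stimuli shown at several contrasts per spatial frequency, with a visibility judgment per stimulus) can be illustrated with a minimal sketch. The stimulus parameters, filter shape, contrast range, and the `ask_model` wrapper below are illustrative assumptions, not the authors' exact protocol or prompts; `ask_model` stands in for whatever chat-based VLM API is being probed.

```python
import numpy as np

def bandpass_noise(size=256, peak_freq_cpd=4.0, contrast=0.1,
                   pixels_per_degree=32, bandwidth_octaves=1.0, seed=0):
    """Band-pass filtered noise image in [0, 1], spectrum centred on
    peak_freq_cpd cycles/degree, amplitude scaled by the requested contrast
    around a mid-grey (0.5) background. Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size, size))
    # Frequency grid in cycles per degree (assumed viewing geometry).
    f = np.fft.fftfreq(size, d=1.0 / pixels_per_degree)
    fx, fy = np.meshgrid(f, f)
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Log-Gaussian band-pass filter around the target frequency.
    log_ratio = np.log2(np.maximum(radius, 1e-6) / peak_freq_cpd)
    filt = np.exp(-(log_ratio ** 2) / (2 * (bandwidth_octaves / 2) ** 2))
    filt[radius == 0] = 0.0  # remove the DC component
    filtered = np.real(np.fft.ifft2(np.fft.fft2(noise) * filt))
    filtered /= np.abs(filtered).max() + 1e-12  # normalise to [-1, 1]
    return 0.5 + 0.5 * contrast * filtered

def estimate_threshold(ask_model, freq_cpd,
                       contrasts=np.geomspace(0.002, 0.5, 12)):
    """Sweep contrast from low to high and return the first contrast at
    which the model reports the pattern as visible (a crude threshold).
    ask_model(image, prompt) -> bool is a hypothetical VLM wrapper."""
    prompt = "Do you see any pattern in this image? Answer yes or no."
    for c in contrasts:
        image = bandpass_noise(peak_freq_cpd=freq_cpd, contrast=c)
        if ask_model(image, prompt):
            return c
    return np.nan  # pattern never reported visible in the tested range

# Sensitivity is the inverse of the contrast threshold at each frequency:
# csf = {f: 1.0 / estimate_threshold(ask_model, f)
#        for f in [0.5, 1, 2, 4, 8, 16]}
```

Repeating the sweep with differently phrased prompts would expose the prompt-sensitivity issue the abstract reports, since the estimated thresholds may shift with the wording of the visibility question.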

Topic Area: Visual Processing & Computational Vision

Extended Abstract: Full Text PDF