Poster Session A: Tuesday, August 12, 1:30 – 4:30 pm, de Brug & E‑Hall
Modeling human visual neural representations by combining vision deep neural networks and large language models
Boyan Rong¹, Emrah Düzel, Radoslaw Martin Cichy¹; ¹Freie Universität Berlin
Presenter: Boyan Rong
Human visual perception involves transforming low-level visual features into high-level semantic representations. While deep neural networks (DNNs) trained on object recognition have been promising models for predicting hierarchical visual processing, they often fail to capture higher-level semantic representations. In contrast, large language models (LLMs) encode rich semantic information that aligns with later stages of visual processing. Here we investigated whether combining vision DNN features with semantic embeddings from LLMs can better account for the neural dynamics of visual perception in electroencephalography (EEG) data. We demonstrated that their combination significantly improved the prediction of neural responses compared to either model alone. This approach also outperformed multimodal modeling, and model comparison showed that the observed improvement reflected the capture of complex information rather than any single factor.
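The abstract does not specify the encoding procedure, so the following is only a minimal sketch of one plausible implementation of the combined approach: a ridge-regression encoding model (scikit-learn's RidgeCV is an assumption, not the authors' stated method) fit on concatenated vision DNN features and LLM embeddings to predict EEG responses. All variable names (dnn_feats, llm_feats, eeg), array shapes, and the randomly generated stand-in data are illustrative.

```python
# Sketch: compare encoding performance of DNN features, LLM embeddings,
# and their concatenation for predicting EEG responses.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_images = 1000          # number of stimuli
n_dnn, n_llm = 512, 768  # hypothetical feature dimensionalities
n_channels = 64          # EEG channels at one time point

# Stand-in data; in practice these would be extracted model features
# and preprocessed EEG responses per stimulus.
dnn_feats = rng.standard_normal((n_images, n_dnn))
llm_feats = rng.standard_normal((n_images, n_llm))
eeg = rng.standard_normal((n_images, n_channels))

def fit_and_score(X, y):
    """Fit a cross-validated ridge encoding model; return held-out R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, y_tr)
    return model.score(X_te, y_te)

# Each feature space alone versus the combined feature space.
print("DNN only:", fit_and_score(dnn_feats, eeg))
print("LLM only:", fit_and_score(llm_feats, eeg))
print("Combined:", fit_and_score(np.hstack([dnn_feats, llm_feats]), eeg))
```

On real data, the combined model outperforming both single-feature models (as the abstract reports) would indicate that the DNN and LLM feature spaces carry complementary variance; with the random stand-ins here, all three scores will hover near zero.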
Topic Area: Visual Processing & Computational Vision
Extended Abstract: Full Text PDF