Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall
MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery
Reese Kneeland1, Cesar Torrico2, Tong Chen3, Jordyn Antonio Ojeda4, Shubh Khanna5, Jonathan Xu1, Paul Steven Scotti6, Thomas Naselaris7; 1Alljoined, 2MedARC, 3University of Sydney, 4University of Minnesota - Twin Cities, 5Stanford University, 6Princeton University, 7University of Minnesota, Minneapolis
Presenter: Reese Kneeland
To be practically useful, vision decoding models trained on perceived images must generalize effectively to mental images, i.e., internally generated visual representations. Analysis of the NSD-Imagery dataset revealed that achieving state-of-the-art (SOTA) results on seen images does not guarantee similar performance on mental images. To address this gap, we developed **MIRAGE**, a model explicitly designed to generalize from visually perceived data to mental imagery decoding. **MIRAGE** employs a robust ridge regression approach, utilizes multi-modal conditioning with text and low-dimensional image embeddings, and leverages the Stable Cascade diffusion model. Extensive human evaluations and image metrics establish **MIRAGE** as the SOTA method for mental image reconstruction on NSD-Imagery. Our ablation studies show that mental imagery decoding benefits from simple architectures robust to low signal-to-noise conditions, explicit low-level guidance, multi-modal semantic features, and lower-dimensional embeddings than typical vision decoders use. These findings highlight the potential of existing visual datasets for training models capable of effective mental imagery decoding.
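The decoding stage described above reduces, at its core, to a regularized linear map from voxel activity to conditioning embeddings. Below is a minimal sketch of such a ridge-regression decoder in Python with scikit-learn; the array shapes, regularization strength, and variable names are illustrative assumptions, not the authors' released implementation. The predicted embeddings would then serve as conditioning inputs to a generative model such as Stable Cascade.

```python
# Minimal sketch of a ridge-regression decoding stage of the kind described
# in the abstract: map fMRI voxel patterns to low-dimensional image/text
# embeddings that would then condition a diffusion model.
# All shapes, names, and the alpha value are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical training data: fMRI betas for perceived images
# (n_trials x n_voxels) and target embeddings (n_trials x embed_dim).
n_trials, n_voxels, embed_dim = 800, 4000, 256
X_train = rng.standard_normal((n_trials, n_voxels))
Y_train = rng.standard_normal((n_trials, embed_dim))

# A single linear map with strong L2 regularization stays robust in the
# low signal-to-noise regime of mental-imagery trials.
decoder = Ridge(alpha=1e4, fit_intercept=True)
decoder.fit(X_train, Y_train)

# At test time, decode embeddings from held-out (e.g., imagery) trials;
# these predictions would be passed to the diffusion model's conditioning
# inputs to reconstruct the image.
X_imagery = rng.standard_normal((10, n_voxels))
pred_embeddings = decoder.predict(X_imagery)  # shape: (10, embed_dim)
print(pred_embeddings.shape)
```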
Topic Area: Visual Processing & Computational Vision
Extended Abstract: Full Text PDF