Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall
MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery
Reese Kneeland1, Cesar Torrico2, Tong Chen3, Jordyn Antonio Ojeda4, Shubh Khanna5, Jonathan Xu1, Paul Steven Scotti6, Thomas Naselaris7; 1Alljoined, 2MedARC, 3University of Sydney, 4University of Minnesota - Twin Cities, 5Stanford University, 6Princeton University, 7University of Minnesota, Minneapolis
Presenter: Reese Kneeland
To be practically useful, vision decoding models trained on perceived images must generalize effectively to mental images, i.e., internally generated visual representations. Analysis of the NSD-Imagery dataset revealed that achieving state-of-the-art (SOTA) results on seen images does not guarantee similar performance on mental images. To address this gap, we developed **MIRAGE**, a model explicitly designed to generalize from visually perceived data to mental imagery decoding. **MIRAGE** employs a robust ridge regression approach, utilizes multi-modal conditioning with text and low-dimensional image embeddings, and leverages the Stable Cascade diffusion model. Extensive human evaluations and image metrics establish **MIRAGE** as the SOTA method for mental image reconstruction on NSD-Imagery. Our ablation studies show that mental imagery decoding benefits from simple architectures robust to low signal-to-noise conditions, explicit low-level guidance, multi-modal semantic features, and lower-dimensional embeddings than typical vision decoders use. These findings highlight the potential of existing visual datasets for training models capable of effective mental imagery decoding.
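The decoding stage described above reduces, at its core, to a regularized linear map from voxel activity to conditioning embeddings. Below is a minimal sketch of such a ridge-regression decoder in Python with scikit-learn; the array shapes, regularization strength, and variable names are illustrative assumptions, not the authors' released implementation. The predicted embeddings would then serve as conditioning inputs to a generative model such as Stable Cascade.

```python
# Minimal sketch of a ridge-regression decoding stage of the kind described
# in the abstract: map fMRI voxel patterns to low-dimensional image/text
# embeddings that would then condition a diffusion model.
# All shapes, names, and the alpha value are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical training data: fMRI betas for perceived images
# (n_trials x n_voxels) and target embeddings (n_trials x embed_dim).
n_trials, n_voxels, embed_dim = 800, 4000, 256
X_train = rng.standard_normal((n_trials, n_voxels))
Y_train = rng.standard_normal((n_trials, embed_dim))

# A single linear map with strong L2 regularization stays robust in the
# low signal-to-noise regime of mental-imagery trials.
decoder = Ridge(alpha=1e4, fit_intercept=True)
decoder.fit(X_train, Y_train)

# At test time, decode embeddings from held-out (e.g., imagery) trials;
# these predictions would be passed to the diffusion model's conditioning
# inputs to reconstruct the image.
X_imagery = rng.standard_normal((10, n_voxels))
pred_embeddings = decoder.predict(X_imagery)  # shape: (10, embed_dim)
print(pred_embeddings.shape)
```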
Topic Area: Visual Processing & Computational Vision
Extended Abstract: Full Text PDF