
Poster Session C: Friday, August 15, 2:00 – 5:00 pm, de Brug & E‑Hall

From Sound to Source: Human and Model Recognition in Auditory Scenes

Sagarika Alavilli1, Josh McDermott2; 1Harvard University, 2Massachusetts Institute of Technology

Presenter: Sagarika Alavilli

Our ability to recognize sound sources in the world is critical to daily life, yet it remains poorly documented and poorly understood in computational terms. We developed large-scale behavioral benchmarks of human environmental sound recognition, built signal-computable models of sound recognition, and used the benchmarks to compare the models to humans. The behavioral tests measured how recognition ability varied with source category, with different types of audio distortion, and with the presence of concurrent sound sources, all of which influenced human recognition performance. Artificial neural network models trained to classify sounds in multi-source scenes reached near-human accuracy and qualitatively matched human patterns of performance in many (but not all) conditions. By contrast, traditional models of the cochlea and auditory cortex matched human performance less closely. The results suggest that many aspects of human sound recognition emerge in systems optimized for the problem of real-world recognition. The benchmark results clarify the factors that constrain human recognition, setting the stage for future explorations of auditory scene perception involving salience, attention, and memory.
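The model-human comparison described above implies a simple quantitative recipe: score humans and models on the same set of benchmark conditions, then compare both absolute accuracy and the condition-wise pattern of performance. The following is a minimal illustrative sketch, not the authors' code; the accuracy values, the mean-absolute-gap measure, and the Pearson-correlation pattern metric are all assumptions chosen for illustration.

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical per-condition recognition accuracies (proportion correct),
    # one entry per benchmark condition (e.g., a source category crossed with
    # a distortion type or with concurrent sources). Values are invented for
    # illustration only.
    human_acc = np.array([0.92, 0.85, 0.61, 0.44, 0.78, 0.53])
    model_acc = np.array([0.90, 0.81, 0.66, 0.39, 0.74, 0.58])

    # Absolute gap: how far the model sits from human-level accuracy overall.
    gap = np.mean(np.abs(model_acc - human_acc))

    # Pattern similarity: does the model rise and fall across conditions the
    # way humans do, regardless of its absolute accuracy level?
    r, p = pearsonr(human_acc, model_acc)

    print(f"mean absolute accuracy gap: {gap:.3f}")
    print(f"condition-wise correlation: r = {r:.2f} (p = {p:.3g})")

Separating the two measures matters: a model can match humans' average accuracy while failing to reproduce which conditions humans find hard, and the abstract's claim of a qualitative match concerns the latter, pattern-level comparison.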

Topic Area: Memory, Spatial Cognition & Skill Learning

Extended Abstract: Full Text PDF