
Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall

A Model of Continuous Speech Recognition Reveals the Role of Context in Human Phoneme Perception

Gasser Elbanna1, Josh McDermott2; 1Harvard University, 2Massachusetts Institute of Technology

Presenter: Gasser Elbanna

Humans successfully transform acoustic signals into meaning despite great variability in the speech signal. The underlying mechanisms that enable such robust perception remain unclear, in part due to the absence of models that replicate human performance and that could be used to test mechanistic hypotheses. We built an artificial neural network model of continuous speech perception, optimized to recognize sequences of sub-lexical units from cochlear representations of acoustic signals. We then developed non-word recognition benchmarks to evaluate human and model speech perception. The model closely matched human performance and replicated human-like patterns of phoneme recognizability and confusions. However, human-model similarity depended on recurrent processing, suggesting that human recognition depends critically on bidirectional integration of information in the speech signal. The model and benchmarks set the stage for future investigations into the neural and perceptual mechanisms underlying speech.
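The bidirectional integration the abstract highlights can be illustrated with a toy sketch (this is an illustration of the general idea, not the authors' model): each time frame of a feature sequence is summarized by combining a forward recurrent pass (context from earlier frames) with a backward pass (context from later frames). The frame values, weights, and update rule below are all hypothetical toy choices.

```python
import math

def step(h, x, w_h=0.5, w_x=0.5):
    """One toy recurrent update: squash a mix of prior state and input."""
    return math.tanh(w_h * h + w_x * x)

def bidirectional_states(frames):
    """Per-frame (forward, backward) state pairs over a 1-D sequence."""
    fwd, h = [], 0.0
    for x in frames:              # left-to-right: past context
        h = step(h, x)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(frames):    # right-to-left: future context
        h = step(h, x)
        bwd.append(h)
    bwd.reverse()
    # Each frame's representation now reflects both earlier and later input.
    return list(zip(fwd, bwd))

# Mock 1-D "cochlear" feature per time frame (toy values).
frames = [0.1, 0.9, -0.4, 0.3]
states = bidirectional_states(frames)
print(len(states))  # one (forward, backward) pair per frame
```

A purely feedforward recognizer would see only a fixed local window around each frame; the point suggested by the abstract is that matching human phoneme perception required this kind of access to context on both sides of a frame.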

Topic Area: Language & Communication

Extended Abstract: Full Text PDF