Poster Session B: Wednesday, August 13, 1:00 – 4:00 pm, de Brug & E‑Hall
A Model of Continuous Speech Recognition Reveals the Role of Context in Human Phoneme Perception
Gasser Elbanna¹, Josh McDermott²; ¹Harvard University, ²Massachusetts Institute of Technology
Presenter: Gasser Elbanna
Humans successfully transform acoustic signals into meaning despite great variability in the speech signal. The underlying mechanisms that enable such robust perception remain unclear, in part due to the absence of models that replicate human performance and that could be used to test mechanistic hypotheses. We built an artificial neural network model of continuous speech perception, optimized to recognize sequences of sub-lexical units from cochlear representations of acoustic signals. We then developed non-word recognition benchmarks to evaluate human and model speech perception. The model closely matched human performance and replicated human-like patterns of phoneme recognizability and confusions. However, human–model similarity depended on recurrent processing, suggesting that human recognition depends critically on bidirectional integration of information in the speech signal. The model and benchmark set the stage for future investigations into the neural and perceptual mechanisms underlying speech.
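The abstract describes a pipeline mapping cochlear representations to sub-lexical units via recurrent, bidirectionally integrating processing. As a rough illustration of that idea (not the authors' model: the dimensions, architecture, and names below are hypothetical), a toy bidirectional Elman RNN can produce frame-wise phoneme posteriors from a cochleagram-like input, combining a forward pass over past context with a backward pass over future context:

```python
import numpy as np

# Hypothetical dimensions -- the abstract does not specify any of these.
N_FREQ = 40      # cochlear frequency channels per frame
N_HIDDEN = 32    # recurrent hidden units per direction
N_PHONEMES = 39  # e.g., an ARPAbet-style sub-lexical inventory
T = 50           # number of time frames

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class BiRNNPhonemeTagger:
    """Toy bidirectional Elman RNN: frame-wise phoneme posteriors
    from a cochleagram-like input. Illustration only -- untrained,
    and far simpler than the model in the abstract."""

    def __init__(self, n_in, n_hid, n_out, rng):
        s = 0.1
        self.Wf = s * rng.standard_normal((n_hid, n_in))   # forward input weights
        self.Uf = s * rng.standard_normal((n_hid, n_hid))  # forward recurrent weights
        self.Wb = s * rng.standard_normal((n_hid, n_in))   # backward input weights
        self.Ub = s * rng.standard_normal((n_hid, n_hid))  # backward recurrent weights
        self.Wo = s * rng.standard_normal((n_out, 2 * n_hid))  # output projection

    def forward(self, x):
        # x: (T, n_in) sequence of cochlear frames
        n_frames, n_hid = x.shape[0], self.Uf.shape[0]
        hf = np.zeros((n_frames, n_hid))
        hb = np.zeros((n_frames, n_hid))
        h = np.zeros(n_hid)
        for t in range(n_frames):             # forward sweep: past context
            h = np.tanh(self.Wf @ x[t] + self.Uf @ h)
            hf[t] = h
        h = np.zeros(n_hid)
        for t in reversed(range(n_frames)):   # backward sweep: future context
            h = np.tanh(self.Wb @ x[t] + self.Ub @ h)
            hb[t] = h
        # Bidirectional integration: concatenate both directions per frame
        logits = np.concatenate([hf, hb], axis=1) @ self.Wo.T
        return softmax(logits, axis=1)        # (T, N_PHONEMES) posteriors

model = BiRNNPhonemeTagger(N_FREQ, N_HIDDEN, N_PHONEMES, rng)
cochleagram = rng.standard_normal((T, N_FREQ))  # stand-in for real cochlear output
posteriors = model.forward(cochleagram)
print(posteriors.shape)  # one phoneme distribution per frame
```

Ablating the backward sweep (using only `hf`) turns this into a purely causal recognizer, which is the kind of contrast the abstract's finding about recurrent, bidirectional integration suggests matters for matching human perception.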
Topic Area: Language & Communication
Extended Abstract: Full Text PDF