Plenary Speakers

Pascal Belin

Aix-Marseille University (AMU) · Institute of Neurosciences of Timone

On the Cerebral Processing of Voice Information and Its Evolution

The human voice is the most important sound category in our auditory environment, not only because it carries speech but also because it is an “auditory face” that we are expert at decoding. Neuroimaging studies have identified Temporal Voice Areas (TVAs) in the human auditory cortex, key nodes of a cerebral network of cortical and subcortical areas involved in processing voice information. But are the TVAs uniquely human? Comparative fMRI reveals that macaque monkeys also possess TVAs that are not only analogous but also functionally homologous to the human TVAs in categorizing conspecific vocalizations apart from other sounds. This indicates a long evolutionary history of the vocal brain.

Yiya Chen

Leiden University Centre for Linguistics

Perceptual Learning of Tone

Mark Gibson

Universidad de Navarra

Listening to speech in noise: a psychoacoustic, computational and neurological approach

Our previous psychoacoustic work showed general confusion in discriminating the Spanish rounded back vowels [o, u] in noise across different populations: native monolingual Spanish-speaking adults, native monolingual Spanish-speaking children aged 6-12, and native monolingual Spanish-speaking children aged 6-12 with cochlear implants. The noise conditions varied the number of speakers in the background babble (1-16), the signal-to-noise ratio (0, -6, and -12 dB), and their interaction. We attributed this confusion to the fact that tongue height, detectable through F1, is obscured by F3 (lip rounding), and that in the absence of visual input that would let a listener discriminate mid and high vowels through a control parameter such as lip aperture (or jaw angle), listeners have notable difficulty discerning vowel categories. In the present work, we are training a series of Random Forest models in an unsupervised learning environment, in addition to K-means clustering, on visual (video) and audio (acoustic) data, with parameters specified for two noise conditions (mimicking our psychoacoustic tests), to test whether integrating visual and auditory information computationally increases perception accuracy. Results from the models seem to indicate that access to a visual stimulus increases discrimination accuracy (by decreasing entropy) in noise conditions, though not equally for all empirically tested populations. Further studies are planned in which we use EEG and psychoacoustic tests with auditory and visual stimuli to test the validity of the Random Forest models against empirical data and to better understand how visual and auditory information interact in the discrimination of different phonological contrasts in different populations.
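
As an illustration of the signal-to-noise ratios mentioned in this abstract (0, -6, and -12 dB), the following minimal sketch shows one common way to scale a noise track so that a mixture reaches a target SNR. It is not the authors' stimulus-preparation procedure: the function names and the toy sine-wave "signal" and "noise" are assumptions made purely for the example, and SNR is taken here as 10·log10 of the ratio of mean signal power to mean noise power.

```python
import math

def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)

def measured_snr_db(signal, noise):
    """SNR in dB, defined as 10*log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

def scale_noise_to_snr(signal, noise, target_db):
    """Return a scaled copy of `noise` so the signal-to-noise ratio equals target_db."""
    target_noise_power = mean_power(signal) / (10 ** (target_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * v for v in noise]

# Toy stand-ins (hypothetical): a 5-cycle sine as "speech", a 13-cycle sine as "babble".
signal = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noise = [0.3 * math.sin(2 * math.pi * 13 * n / 100 + 1.0) for n in range(100)]

for target_db in (0, -6, -12):
    scaled = scale_noise_to_snr(signal, noise, target_db)
    mixture = [s + n for s, n in zip(signal, scaled)]  # what a listener would hear
    print(target_db, round(measured_snr_db(signal, scaled), 6))
```

At 0 dB the noise carries the same mean power as the signal; at -12 dB the noise power is roughly 16 times the signal power, which gives a sense of how demanding the hardest listening condition is.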