Plenary speakers

Pascal Belin

Aix-Marseille University (AMU) · Institute of Neurosciences of Timone

On the Cerebral Processing of Voice Information and Its Evolution

The human voice is the most important sound category in our auditory environment, not only because it carries speech but also because it is an “auditory face” that we are expert at decoding. Neuroimaging studies have identified Temporal Voice Areas (TVAs) in the human auditory cortex, key nodes of a cerebral network of cortical and subcortical areas involved in processing voice information. But are the TVAs uniquely human? Comparative fMRI reveals that macaque monkeys also possess TVAs that are not only analogous but also functionally homologous to the human TVAs, categorizing conspecific vocalizations apart from other sounds. This indicates a long evolutionary history of the vocal brain.

Yiya Chen

Leiden University Centre for Linguistics

Perceptual Learning of Tone

A hallmark of speech is the presence of abundant variation in the acoustic realizations of sounds and words, both within and across speakers. Yet listeners show a remarkable ability to adapt quickly and comprehend speech fluently. It is commonly recognized that perceptual learning of speech, or phonetic recalibration, plays a crucial role in facilitating this adaptive perception. What has remained open is how exactly our auditory perceptual system dynamically adjusts its mapping of incoming acoustic stimuli to sound categories, and how generalizable and stable such recalibration remains. Existing research on perceptual learning of speech has drawn evidence primarily from segmental studies.

In this talk, I will address these questions by discussing three sets of results on the perceptual learning of lexical tone, a suprasegmental feature of speech cued mainly by pitch variation to distinguish word meanings. I will show that tonal perceptual learning (adaptation to a novel artificial accent) can be observed across lexical items, tonal contexts, and even tonal contours that are phonetically similar but phonologically distinct. This confirms the automaticity and generalization of tonal perceptual learning in real time. Despite this immediate and robust recalibration of tone perception, no lasting learning effect was observed during subsequent (sleep-induced) consolidation. These new insights into the perceptual learning of tone suggest that while our perceptual system flexibly adapts to sensory input in real time, the constancy of phonetic categories is ensured by a consolidation mechanism that selectively stabilizes informative cues and maximizes the utility of memories for robust and functional variation. Our findings complement existing research on the perceptual learning of speech segments and help to constrain theories of the adaptive perceptual processing of speech.

Mark Gibson

Universidad de Navarra

Listening to Speech in Noise: A Psychoacoustic, Computational and Neurological Approach

Our previous psychoacoustic work showed general confusion in discriminating the Spanish rounded back vowels [o, u] in noise (background babble composed of 1-16 speakers, signal-to-noise ratios of 0, -6, and -12 decibels, henceforth dB, and their interaction) across different populations: native monolingual Spanish-speaking adults, native monolingual Spanish-speaking children aged 6-12, and native monolingual Spanish-speaking children with cochlear implants aged 6-12. We attributed this confusion to the fact that tongue height, detectable through F1, is obscured by F3 (lip rounding), and that in the absence of visual input from which a listener could recover a control parameter such as lip aperture (or jaw angle) to separate mid and high vowels, listeners have notable difficulty discerning the vowel categories. For the present work, we are training a series of Random Forest models in an unsupervised learning environment, in addition to K-means clustering, on visual (video) and audio (acoustic) data, with parameters specified for two noise conditions (mimicking our psychoacoustic tests), to test whether integrating visual and auditory information computationally increases perception accuracy. Results from the models suggest that access to a visual stimulus increases discrimination accuracy (by decreasing entropy) in noise conditions, though not equally for all of the populations tested empirically. Further studies are planned in which we will use EEG and psychoacoustic tests with auditory and visual stimuli to validate the Random Forest models against empirical data and to better understand how visual and auditory information interact when listeners discriminate different phonological contrasts in different populations.
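
The minimal sketch below illustrates the general logic of such a comparison, assuming scikit-learn as the modelling toolkit (the abstract does not name the software used). It contrasts audio-only with audio-plus-visual feature sets using a Random Forest classifier and K-means clustering, and includes a helper showing how stimuli could be mixed at the 0, -6, and -12 dB conditions. The function name mix_at_snr, the formant and lip-aperture values, and the synthetic data are invented for illustration; this is not the authors' corpus, feature set, or pipeline.

```python
# Hypothetical sketch: audio-only vs. audio-visual vowel discrimination under noise.
# All feature names, dimensions, and values are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def mix_at_snr(signal: np.ndarray, babble: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale the babble so the mixture reaches the requested SNR (e.g. 0, -6, -12 dB).
    Not wired into the toy feature demo below; shown only for the stimulus setup."""
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(babble ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * babble

# Toy tokens of [o] and [u]: acoustic features (F1, F2, F3) plus one visual
# feature (lip aperture). Real inputs would be measured from audio and video.
n = 200
labels = rng.integers(0, 2, n)                                 # 0 = [o], 1 = [u]
f1 = np.where(labels == 0, 450, 320) + rng.normal(0, 60, n)    # noise blurs F1
f2 = np.where(labels == 0, 900, 800) + rng.normal(0, 80, n)
f3 = 2500 + rng.normal(0, 100, n)                              # rounding keeps F3 similar
lip = np.where(labels == 0, 0.6, 0.3) + rng.normal(0, 0.05, n)

audio_only = np.column_stack([f1, f2, f3])
audio_visual = np.column_stack([f1, f2, f3, lip])

for name, X in [("audio only", audio_only), ("audio + visual", audio_visual)]:
    # Supervised check: Random Forest cross-validated accuracy as a proxy for discriminability.
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    acc = cross_val_score(rf, X, labels, cv=5).mean()
    # Unsupervised check: do K-means clusters recover the two vowel categories?
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    agreement = max(np.mean(clusters == labels), np.mean(clusters != labels))
    print(f"{name:>14}: RF accuracy = {acc:.2f}, K-means agreement = {agreement:.2f}")
```

In this toy setup, adding the lip-aperture feature typically separates the two vowel classes far more cleanly than the formants alone, mirroring in a purely schematic way the hypothesis that visual information compensates for the acoustic confusability of [o] and [u] in noise.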