
"The perceptual organization of speech: Contributions of general and speech-specific factors"

A summary of key results, papers, and posters for EPSRC Research Grant EP/F016484/1 (Brian Roberts & Peter J. Bailey) is presented here:

Purpose of the project

Spoken communication is a fundamental human activity. However, it is uncommon in everyday life for us to hear the speech of a single talker in the absence of other background sounds, and so our auditory system is faced with the challenge of grouping together those sound elements that come from one source and segregating them from those arising from other sources. Without a solution to this “auditory scene analysis” (ASA) problem, our perceptions of speech and other sounds would not correspond to the events producing them. The fact that we can focus our attention on one person speaking in the presence of other talkers indicates that our auditory system is generally successful at grouping together the sound elements from a source in a complex auditory scene, and segregating them from other sounds, but our understanding of how this is achieved remains limited.

Most ASA research has focused on relatively simple sounds and has identified a number of general principles for the grouping of sound elements. However, these principles often seem inadequate to explain the perceptual grouping of speech, because speech has acoustic properties that are diverse and rapidly changing. Also, speech is a highly familiar stimulus, and so our auditory system has had the opportunity to learn about speech-specific properties that may assist in the successful perceptual grouping of speech. This project’s aim was to explore how much of our ability to segregate a talker’s speech from a sound mixture depends on general-purpose grouping principles, applicable to all sounds, and how much depends on speech-specific principles.

Published papers (pre-prints)

Included here are downloadable pre-prints of published papers arising from this project:

(1) Roberts, B., Summers, R.J., and Bailey, P.J. (2010). “The perceptual organization of sine-wave speech under competitive conditions,” Journal of the Acoustical Society of America, 128, 804-817. Pre-print

(2) Summers, R.J., Bailey, P.J., and Roberts, B. (2010). “Effects of differences in fundamental frequency on across-formant grouping in speech perception,” Journal of the Acoustical Society of America, 128, 3667-3677. Pre-print

(3) Roberts, B., Summers, R.J., and Bailey, P.J. (2011). “The intelligibility of noise-vocoded speech: Spectral information available from across-channel comparison of amplitude envelopes,” Proceedings of the Royal Society of London Series B: Biological Sciences, 278, 1595-1600. Pre-print

Papers under submission and in preparation

Included here are pre-prints of submitted papers, and titles and abstracts of papers in preparation. In addition, at least two papers are anticipated to arise from Marcin Stachurski's PhD research; see also "Poster presentations (verbal transformation effect)".

(4) Summers, R.J., Bailey, P.J., and Roberts, B. (submitted). "Effects of the rate of formant-frequency variation on the grouping of formants in speech perception," submitted to the Journal of the Association for Research in Otolaryngology (JARO). Pre-print

(5) Roberts, B., Summers, R.J., and Bailey, P.J. (in preparation). "The role of formant-frequency contours in the perceptual grouping of speech formants: Evidence against speech-specific constraints." Abstract

Poster presentations (formant-competitor paradigm)

Included here are posters containing material from the project that has not yet appeared in published journal articles. See also "Papers under submission and in preparation".

(1) Poster by Roberts, Summers, and Bailey (presented in September 2010). The perceptual organization of noise-vocoded speech under competitive conditions

(2) Poster by Roberts, Summers, and Bailey (presented in May 2011). The role of formant-frequency contours in the perceptual grouping of speech formants

Poster presentations (verbal transformation effect)

Included here are posters containing material from the part of the project relating directly to Marcin Stachurski's PhD research. None of this material has yet appeared in published journal articles. Marcin is currently writing up his thesis.

(1) Poster by Stachurski, Summers, and Roberts (presented in September 2009). Grouping and the Verbal Transformation Effect - The influence of fundamental frequency, ear of presentation, and interaural time-difference cues

(2) Poster by Stachurski, Summers, and Roberts (presented in May 2011). Grouping and the Verbal Transformation Effect - The influence of formant transitions

Project outcomes summary

Our approach was to generate artificial speech-like stimuli with precisely controlled properties, particularly the spectral prominences called formants. Formants arise from resonances in the air-filled cavities of the talker’s vocal tract, so they carry information about its shape. Variation in the frequency and amplitude of a formant is an inevitable consequence of changes in the size of its associated cavity as the tongue, lips, and jaw move during speech production. Hence, knowledge of formant frequencies and how they change over time is of great benefit to listeners trying to understand a spoken message, and choosing the right set of formants from a mixture is critical for intelligibility. Simplified versions of target sentences were synthesised and then mixed with carefully designed “competitors” offering alternative grouping possibilities for the formants in the target sentence. The impact of these competitors on listeners’ recognition of the target sentence in the mixture was measured as the properties of the competitors were manipulated.
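Paper (1) above concerns sine-wave speech, in which each formant is replaced by a single sinusoid that follows that formant's frequency and amplitude contours. The sketch below illustrates the idea for one formant and a competitor derived from it. It is a hypothetical illustration, not the project's stimulus-generation code: the sample rate, the contour values, and the choice to build the competitor by inverting and scaling the target's contour on a log-frequency axis about its geometric mean are all assumptions. The `depth` parameter loosely corresponds to the modulation-depth manipulation mentioned in the findings; modulation rate is not modelled here.

```python
import numpy as np

FS = 22050  # sample rate in Hz (assumed for illustration)


def sine_formant(freq_hz, amp, fs=FS):
    """Sine-wave analogue of one formant: a sinusoid whose instantaneous
    frequency follows freq_hz sample by sample, scaled by amp."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    amp = np.broadcast_to(np.asarray(amp, dtype=float), freq_hz.shape)
    # Approximate the integral of instantaneous frequency by a cumulative
    # sum to obtain the phase in radians.
    phase = 2.0 * np.pi * np.cumsum(freq_hz) / fs
    return amp * np.sin(phase)


def competitor_contour(freq_hz, depth=1.0, invert=True):
    """Hypothetical competitor contour: excursions of the target contour on a
    log-frequency axis, taken about its geometric mean, are scaled by depth
    and (optionally) mirrored. The geometric mean is preserved."""
    log_f = np.log(np.asarray(freq_hz, dtype=float))
    centre = log_f.mean()  # log of the geometric mean
    excursion = (log_f - centre) * depth
    if invert:
        excursion = -excursion
    return np.exp(centre + excursion)


# Usage: a one-second F2-like glide from 1200 to 1800 Hz, plus a competitor
# whose excursions are inverted and doubled in depth (values illustrative).
f2 = np.geomspace(1200.0, 1800.0, FS)
target = sine_formant(f2, 0.5)
competitor = sine_formant(competitor_contour(f2, depth=2.0), 0.5)
mixture = target + competitor
```

Keeping the geometric mean fixed means the competitor occupies the same general frequency region as the target while its pattern of change differs, which is one simple way to vary contour properties independently of average frequency.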

The key findings of the project are:

(a) Modulation of the formant-frequency contour, but not of the amplitude contour, is critical for across-formant grouping.

(b) The ability of listeners to reject a competitor formant declines as either the rate or the depth of modulation of its frequency contour increases relative to that of the target sentence.

(c) The impact of a competitor does not depend on whether its pattern of variation in formant frequency is plausibly speech-like.

(d) The ability of listeners to reject a competitor increases as the pitch difference between target and competitor formants increases.

(e) Formant-frequency variation conveys information important for speech intelligibility even in contexts often regarded as conveying information about speech-sound identity mainly through other cues.

In summary, the results of this project have shown that our ability to segregate a talker’s speech from a sound mixture depends heavily on general-purpose grouping principles, and rather less on speech-specific principles than some researchers have suggested. The results also suggest approaches by which engineers and computer scientists might improve the performance of devices such as hearing aids and automatic speech recognizers when they are operating in noisy environments.

Last updated 23 August 2011
