Comprehension of Synthesized Speech
While Driving and in the Lab

Lai, J., Tsimhoni, O., and Green, P.

April 2001

In this era of increasing mobility in the workforce, much attention is being devoted to the development of pervasive computing applications. Since pervasive computing devices in most cases lack a usable keyboard and have limited display capability, both speech recognition technology and speech synthesis are being used in the design of these solutions. While extensive research has been conducted in the past on the perception of synthesized speech, many of these studies are out of date and were focused primarily on segmental intelligibility at the word or sentence level. Even though segmental intelligibility is a critical component in the overall acceptability of a speech synthesizer, there is a significant difference between the task of recognizing words and sentences and the task of understanding the intended meaning of the message or spoken passage.

This paper reports on two studies that measured comprehension levels for synthetic speech, focusing on longer passages of text between 100 and 500 words. Both studies collected quantitative and qualitative data on the comprehensibility of synthetic speech, examining how comprehension performance was affected by the nature of the message and by the listening conditions. The goal in varying the message type was to understand whether certain types of messages are better suited for use with synthesized speech than others. The listening conditions varied dramatically from one study to the other: the first study was conducted in a lab, and the second in a driving simulator. In both cases, comprehension of messages delivered with a recorded human voice (natural speech) was used as a baseline for each subject.

In the first study, five commercially available text-to-speech (TTS) engines were used for the delivery of messages. A total of 78 subjects listened to passages of text ranging from short e-mail messages to longer, more complicated news articles. Half the subjects were allowed to take notes while listening; the other half were not. Following passage presentation, subjects answered multiple-choice questions testing comprehension. In all cases the subjects listened to the passages on a telephone. Listening conditions were close to ideal, since each subject was alone in a quiet lab with no distractions. Findings from this research are presented and implications are discussed.

In the second study, 24 new subjects were presented with messages while driving in a simulator. Each subject listened to synthesized speech and natural speech, both while driving and while the car was parked. As in the first study, comprehension was measured with multiple-choice questions following passage presentation. Driving performance was also measured. The IBM TTS engine for embedded platforms was used to generate the synthesized speech.
