InterCaption - Automated Captioning

Automated Captioning

INTRODUCTION
Today we have the capability of incorporating captioning into all visual communications. As discussed in the previous diagram, virtually every household today has caption decoding capabilities, but few people have caption encoding capabilities. We suggest that giving people the power of captioning, in parallel with the growing integration of visual communications, would be a powerful addition to all basic communications. We present the following diagram as one viable example of a visual communication system with caption encoding capabilities.

Diagram 2: Automated Captioning System


ASSUMPTIONS
This diagram is based upon the following assumptions:

In the near future, computers will be part of daily life in all homes.
Voice recognition (trained to individual users) will be part of these computers.
Computers will become integrated into the primary communication systems in homes (i.e., current telephone systems will merge with video and computer technologies).

SYSTEM TOUR
We have added a computer and a caption encoder to the previously discussed visual communication system. Instead of passing the video and audio directly into the modulator, the audio is split off and fed into a personal computer. The computer runs speech recognition software that converts the audio signal into an ASCII text stream. This ASCII text is then routed, along with the video from the camera, into the caption encoder, which encodes the captions into the video stream. The modulator then handles this new video stream exactly as it would a stream without encoded captioning information.
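The signal flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual software: the names `caption_pipeline` and `CaptionedSegment`, the 32-character row width, and the idea of pairing each video segment with one audio chunk are all hypothetical, and the recognizer is passed in as a stand-in function because the real speech recognition program is a separate product.

```python
import textwrap
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CaptionedSegment:
    video: bytes   # video from the camera, passed through unchanged
    caption: str   # caption text to be encoded alongside this video


def caption_pipeline(
    video_segments: Iterable[bytes],
    audio_chunks: Iterable[bytes],
    recognizer: Callable[[bytes], str],
    row_width: int = 32,  # assumed caption row width, not from the original text
) -> List[CaptionedSegment]:
    """Split the audio off to a recognizer and pair its text with the video."""
    captioned = []
    for video, audio in zip(video_segments, audio_chunks):
        # Speech recognition step: audio in, text out (stand-in function).
        text = recognizer(audio)
        # Force the text into plain ASCII, replacing anything else with '?'.
        ascii_text = text.encode("ascii", errors="replace").decode("ascii")
        # Break the text into rows short enough to display as captions.
        rows = textwrap.wrap(ascii_text, width=row_width)
        # Caption encoder step: the text travels with the video from here on.
        captioned.append(CaptionedSegment(video, "\n".join(rows)))
    return captioned
```

In this sketch the modulator would receive each `CaptionedSegment` and treat it like any other video, which mirrors the point that nothing downstream of the caption encoder needs to change.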


COMPONENT DETAILS

Computers
The computer in this system runs voice recognition software. The Dragon Systems product NaturallySpeaking is an example of a consumer-level program that can manage continuous speech recognition. Its minimum system requirements are a 133 MHz processor, Windows 95/98/NT, a 16-bit sound card, 60 MB of hard disk space, and 32 MB (95/98) or 48 MB (NT) of RAM. (There is presently no Macintosh version.) Many computers now, and all computers in the future, could easily meet these requirements, making this a viable option for homes.

Speech Recognition
Currently, few people use speech recognition software on their home systems, yet robust programs are available at the consumer level. The following bullets characterize this speech recognition software:

The software needs to be trained to a specific user. This is not a major constraint if you consider that each user will most likely have a primary workstation that he/she will use.
Users need to enunciate clearly and thus cannot mumble or slur their words.
The Dragon Systems product NaturallySpeaking is an example of a consumer-level program that is promoted as "continuous speech general dictation software." We spoke with a user who had worked with an earlier product, DragonDictate, for many years and is now using NaturallySpeaking. She explained that the system needed significant training, but that she was then able to speak reasonably quickly. She offered the analogy that she could speak more quickly than if she were talking in a meeting where she was aware that someone was trying to take notes on what she said.


Caption Encoder
Caption encoders are used in television broadcasting and video production. Captioning can be performed in one of two ways:

"Open Captions" are overlaid directly onto the video stream and cannot be suppressed by users. They are analogous to subtitles in a movie.
"Closed Captions" are transmitted by encoding text into one of several channels in the Vertical Blanking Interval. (The Vertical Blanking Interval, also known as the VBI, is the part of a broadcast video signal that carries no picture information.) Characters can then be generated by the decoder on the user's end if desired.

Readers interested in additional information on caption encoders can visit these sites: Ultech (midrange) and Norpak (high-end/broadcast quality).
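To make the closed-caption encoding step more concrete, here is a minimal Python sketch of how caption text might be packed for transmission. It assumes the line 21 scheme used for NTSC closed captions, in which each video frame carries two bytes, each made of 7 data bits plus an odd-parity bit; that scheme is not spelled out in the text above, so treat it as an assumption. The function names are hypothetical, and the sketch omits control codes and any differences between the caption character set and plain ASCII.

```python
from typing import List, Tuple


def with_odd_parity(code: int) -> int:
    """Set bit 7 when needed so the byte has an odd number of 1 bits."""
    if not 0 <= code < 0x80:
        raise ValueError("caption data must fit in 7 bits")
    ones = bin(code).count("1")
    return code | 0x80 if ones % 2 == 0 else code


def encode_caption_pairs(text: str) -> List[Tuple[int, int]]:
    """Pack ASCII text into two-byte pairs, one pair per video frame."""
    data = text.encode("ascii")
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length message with a null byte
    return [
        (with_odd_parity(first), with_odd_parity(second))
        for first, second in zip(data[::2], data[1::2])
    ]


def decode_caption_pairs(pairs: List[Tuple[int, int]]) -> str:
    """Decoder side: drop the parity bit and rebuild the text."""
    codes = [byte & 0x7F for pair in pairs for byte in pair]
    return bytes(codes).decode("ascii").rstrip("\x00")
```

Because only two characters travel with each frame in this scheme, caption text arrives at a steady, limited rate, which is one reason the recognized speech must be reduced to a compact text stream before it reaches the encoder.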

"InterCaption: Facilitating Communication Through Captioning"
© K. Acker, T. Lytle, J. Porvin (1998)
Back to VisCom 98 Projects Homepage