Automated Captioning
INTRODUCTION
Today we have the capability of incorporating captioning into all
visual communications. As discussed in the previous diagram,
virtually everyone today has caption decoding capabilities in
their households, but few people have caption encoding
capabilities. We suggest that giving people the
power of captioning, in parallel with the growing integration of
visual communications, would be a powerful addition to
all basic communications. We present the following diagram
as one viable example of a visual communication system with
caption encoding capabilities.
Diagram 2: Automated Captioning System
ASSUMPTIONS
This diagram is based upon the following assumptions:
- In the near future, computers will be a part of
daily life in all homes.
- Voice recognition (trained to individual users) will
be part of these computers.
- Computers will become integrated into the primary
communication systems in homes (i.e., current
telephone systems will merge with video and computer
technologies).
SYSTEM TOUR
We have added a computer and a caption encoder to the
previously discussed visual communication system.
Instead of passing the video and audio directly into
the modulator, the audio is split off and fed into a
personal computer. The computer will be running
speech recognition software which will convert
the audio signal into an ASCII text stream. This ASCII
text is then routed, along with the video from the camera, into the
caption encoder, which encodes the captions into the video stream. The modulator then treats
this new video stream exactly as it would a video stream without encoded captions.
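The signal flow described above can be sketched in a few lines of Python. This is only an illustration of the routing, not an implementation of any real product: the names Frame, CaptionedFrame, to_ascii and caption_pipeline are hypothetical, and the speech recognizer is passed in as a plain function because the actual recognition software is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Frame:
    """One captured moment: a video image plus the audio recorded with it."""
    video: bytes
    audio: bytes


@dataclass
class CaptionedFrame:
    """Encoder output: the video with caption text attached to it."""
    video: bytes
    caption: str


def to_ascii(text: str) -> str:
    """Return text as ASCII, replacing any non-ASCII character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def caption_pipeline(
    frames: Iterable[Frame],
    recognize: Callable[[bytes], str],
) -> Iterator[CaptionedFrame]:
    """Split off the audio, recognize it, and rejoin the text with the video.

    `recognize` stands in for the speech recognition software (hypothetical
    interface: audio bytes in, recognized text out).
    """
    for frame in frames:
        # The audio is split off and fed to the computer.
        text = to_ascii(recognize(frame.audio))
        # The ASCII text and the camera video both go to the caption encoder.
        yield CaptionedFrame(video=frame.video, caption=text)
```

The output of caption_pipeline is what the modulator would receive. Because the recognizer is an argument, the same routing can be exercised with a stand-in function before any real speech recognition software is connected.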
COMPONENT DETAILS
Computers
The computer in this system runs voice recognition software. The
Dragon Systems product,
NaturallySpeaking, is an example of a consumer-level program that
can manage continuous speech recognition. The minimum system
requirements include a 133 MHz processor, Windows 95/98/NT, a 16-bit
sound card, 60 MB of hard disk space, and 32 MB (95/98) or 48 MB (NT)
of RAM. (There is presently no Macintosh version.) Many computers
now, and all computers in the future, could easily meet these
requirements and thus make this a viable option for homes.
Speech Recognition
Currently, few people use speech recognition software on their home
systems, yet there are robust programs available at the consumer
level. The following points characterize this speech
recognition software:
- The software needs to be trained to a specific user. This is
not a major constraint if you consider that each user will most
likely have a primary workstation that he/she will use.
- Users need to enunciate clearly and thus cannot mumble or slur
their words.
- The Dragon Systems
product, NaturallySpeaking, is an example of a consumer-level program
that is promoted as "continuous speech general dictation software."
We spoke with a user who had worked with an earlier product,
DragonDictate, for many years and is now using NaturallySpeaking.
She explained that the system needed significant training, but that
she was then able to speak reasonably quickly. She offered the analogy
that she could speak more quickly than if she were talking in a meeting
where she was aware that someone was trying to take notes on what she
said.
Caption Encoder
Caption encoders are used in television broadcasting and video production.
Captioning can be performed in one of the following two manners:
- "Open Captions" are directly overlaid onto the video stream,
and cannot be suppressed by users. They are analogous to
subtitles in a movie.
- "Closed Captions" are transmitted by encoding text into one
of several channels in the Vertical Blanking Interval. (The
Vertical Blanking Interval, also known as the VBI, is extra
bandwidth which is not used in transmission of broadcast
video.) Characters can then be generated by the decoder on
the user's end if desired.
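To make the closed-caption case more concrete, here is a minimal sketch of how text might be packed for transmission. It assumes the line-21 caption format used for NTSC broadcast, in which each video field carries two bytes, each holding a 7-bit character code plus an odd parity bit; the text above does not say which format a given encoder uses. The function names are hypothetical, and the sketch handles plain printable text only. Real caption data also carries control codes for positioning and display style, and a few character codes in the caption character set differ from ASCII.

```python
from typing import List, Tuple


def with_odd_parity(code: int) -> int:
    """Set bit 7 so that the byte contains an odd number of 1 bits."""
    if not 0 <= code < 0x80:
        raise ValueError("expected a 7-bit character code")
    ones = bin(code).count("1")
    return code if ones % 2 == 1 else code | 0x80


def encode_caption_pairs(text: str) -> List[Tuple[int, int]]:
    """Pack text into two-byte pairs, one pair per video field."""
    codes = [with_odd_parity(ord(ch)) for ch in text]
    if len(codes) % 2 == 1:
        # Pad the last pair with a null character (0x00 plus parity = 0x80).
        codes.append(with_odd_parity(0x00))
    return [(codes[i], codes[i + 1]) for i in range(0, len(codes), 2)]


def decode_caption_pairs(pairs: List[Tuple[int, int]]) -> str:
    """Check parity, strip the parity bit, and drop null padding."""
    chars = []
    for pair in pairs:
        for byte in pair:
            if bin(byte).count("1") % 2 != 1:
                raise ValueError("parity error in caption byte")
            code = byte & 0x7F
            if code:
                chars.append(chr(code))
    return "".join(chars)
```

The decoder function mirrors what happens on the user's end: the bytes arrive with the video, and the characters are regenerated only if the viewer has captions switched on.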
Readers interested in additional information on caption
encoders can visit these sites:
Ultech (midrange) and
Norpak (high-end/broadcast quality).
"InterCaption: Facilitating Communication Through Captioning"
© K. Acker, T. Lytle, J. Porvin (1998)