Have you ever thought that the captions on television are generated by lightning-quick typists, spelling out each word one keystroke at a time? Or maybe you thought they were generated by automatic speech recognition programs?
Welcome to the world of respeaking, a much faster alternative to conventional typing, and a captioning method gaining popularity in broadcast applications as well as in other workplace and education contexts.
What is respeaking?
Respeaking, as carried out by a respeaker, is the process of repeating what is heard into voice recognition software, which is trained to that specific individual’s voice and pronunciation. This software uses the audio input from the respeaker to generate the caption text.
Opinions vary on which style of diction is most effective – some respeakers swear by a robotic tone of voice, while others take a more conversational approach. Either way, respeakers need to enunciate clearly, as human errors and software misrecognitions are part of the nature of the beast.
One thing people often don’t realise about respeaking is that punctuation and other formatting also need to be added so that the output makes sense to the end user, which means that words like “comma” and “full stop” have to be said out loud – respeakers have been known to let slip the odd verbalised punctuation mark in everyday conversation!
Additionally, to best replicate the hearing experience for a deaf or hard-of-hearing user, respeakers use voice commands (often known as ‘macros’) for special formatting, such as indicating a change of speaker or song lyrics. With all these extra words to say, respeakers often need to speak at a faster rate than the speaker they are captioning.
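To make the idea of spoken punctuation and macros concrete, here is a minimal Python sketch of how dictated words might be turned into caption text. Everything in it is an assumption for illustration: the command vocabulary, the `>> ` speaker marker and the `render_caption` function are hypothetical, and real respeaking software will have its own commands and behaviour.

```python
# Hypothetical command vocabulary: real respeaking software defines its own.
SPEAKER_MARKER = ">> "  # assumed marker for a change of speaker
COMMANDS = {
    ("comma",): ",",
    ("full", "stop"): ".",
    ("question", "mark"): "?",
    ("new", "speaker"): SPEAKER_MARKER,
}
SENTENCE_END = {".", "?"}


def render_caption(dictated: str) -> str:
    """Turn dictated words, including spoken commands, into caption text."""
    words = dictated.split()
    text = ""
    capitalise = True  # the first word of a caption starts a sentence
    i = 0
    while i < len(words):
        for size in (2, 1):  # prefer two-word commands such as "full stop"
            key = tuple(w.lower() for w in words[i:i + size])
            if len(key) == size and key in COMMANDS:
                symbol = COMMANDS[key]
                if symbol == SPEAKER_MARKER:
                    # A change of speaker starts on a new line.
                    text += ("\n" if text else "") + symbol
                    capitalise = True
                else:
                    # Punctuation attaches directly to the previous word.
                    text += symbol
                    capitalise = capitalise or symbol in SENTENCE_END
                i += size
                break
        else:
            # Ordinary word: add a space unless the text is empty
            # or already ends in one (e.g. after a speaker marker).
            if text and not text.endswith(" "):
                text += " "
            word = words[i]
            text += word[:1].upper() + word[1:] if capitalise else word
            capitalise = False
            i += 1
    return text
```

For example, dictating `hello comma how are you question mark new speaker fine full stop` into this sketch produces "Hello, how are you?" on one line and ">> Fine." on the next. Note that the respeaker said twelve words to produce five words of caption.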
Respeaking presents a number of cognitive challenges for an individual – they have to hold previous sentences in memory while listening to the next one, analyse what’s been said and insert punctuation as required, paraphrase if necessary and then actually speak it all out to produce captions.
At the same time, they need to be monitoring the software output to identify and correct any errors – and, in the case of live broadcast television, move the captions around the screen so they don’t obscure any speakers’ mouths, graphics or any other onscreen information.
All up, it’s a challenging but rewarding pursuit which requires a good head for multitasking – and a love of language certainly doesn’t hurt!
What types of work do respeakers do?
Respeakers typically work remotely rather than at the venue being captioned, which allows them to caption a variety of jobs from anywhere in the world at short notice, as long as they have an audio source. They can be used for any captioning job, whether closed or open captions, real-time or otherwise.
The most visible and widely used example is captioning for broadcast television. Taking the example of a news bulletin, respeakers have the ability to send out blocked captions of the newsreaders’ scripts using additional captioning software, and can use respeaking for any live or ad-libbed portions. Respeakers can also be used for live sports broadcasts, be it anything from a stately tennis match to fast-paced horse racing.
However, respeakers also do a lot of work in workplace and classroom contexts – for example, increasing access to lectures and meetings for deaf and hard-of-hearing students and employees. Working remotely means that they are able to caption unobtrusively and discreetly, opening up access where, say, a student might have been anxious about having a note-taker present in their classes.
Live captioning is also often required at events and conferences, where a screen might display the captions behind a presenter at a podium. As these events tend to be of higher production value, with the audio sent through a mixing desk or streamed online, it’s typically a simple matter for respeakers to receive the audio from the event and provide live captioning.
How long do respeakers caption for?
Respeakers’ voices are essentially their livelihoods, so it’s important to protect them with responsible guidelines. They are trained in maintaining good vocal cord health, including warm-up exercises to get their voices ready ahead of shifts – respeaking song lyrics along to the radio is one many of our captioners like!
Respeakers typically work in pairs, swapping over every 15 minutes or so for a maximum of two hours. However, the length of time for which they can caption also depends on the content, as the toll taken on a respeaker’s voice can vary wildly.
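As a rough illustration of that rotation, the Python sketch below lays out who is captioning when. The 15-minute turn and two-hour maximum come from the paragraph above; the `rotation` function and the captioner names are made up for the example, and real rosters will flex with the content.

```python
def rotation(pair, session_minutes=120, turn_minutes=15):
    """Return (start, end, captioner) turns for a pair swapping at fixed intervals."""
    schedule = []
    start = 0
    turn = 0
    while start < session_minutes:
        # The last turn is cut short if the session ends mid-turn.
        end = min(start + turn_minutes, session_minutes)
        schedule.append((start, end, pair[turn % 2]))
        start = end
        turn += 1
    return schedule


for start, end, captioner in rotation(("Alex", "Sam")):
    print(f"{start:3d}-{end:3d} min: {captioner} live, partner on standby")
```

With the default values, each captioner is live for four turns, or one hour of the two-hour session.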
For example, in the case of broadcast sport, tennis or rhythmic gymnastics might be captioned by the same respeakers for hours at a time, as commentators tend to limit their chatter to periods of less onscreen action. This is in contrast to faster-paced sports like football, where gameplay is much more continuous and commentators are constantly chattering away (and, indeed, often over the top of each other). Captioners focus on communicating the commentary of most value, so play of the ball, which is visible on screen and less useful due to the inherent delay in broadcast captioning, can often be skipped over, easing the task of captioning such fast-paced sports.
How does respeaking differ from other types of captioning?
The other main providers of live captioning are stenographers (also known as palantypists in the UK), perhaps most familiar in court reporting contexts, where they sit in courtrooms and transcribe court proceedings verbatim. Stenography can be used for both transcription and live captioning applications, and in the case of the latter, it is known as stenocaptioning.
Text is output through the use of shorthand or stenotype machines. These machines, which look like little typewriters with a screen, have keys which correspond to phonetic sounds. Pressing multiple keys at the same time produces combinations of phonemes – these correspond to words, and allow the stenographer to create captions at a much faster speed than with regular keyboards.
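The chording idea can be sketched in a few lines of Python. This is a toy model only: the two-entry `CHORDS` dictionary and the `translate` function are invented for illustration, whereas real stenotype dictionaries are vastly larger and follow a dedicated key layout.

```python
# Illustrative chord dictionary only; not a real steno theory.
CHORDS = {
    frozenset({"K", "A", "T"}): "cat",
    frozenset({"S", "A", "T"}): "sat",
}


def translate(strokes):
    """Map each stroke (a set of keys pressed together) to a word."""
    # Keys in a stroke are pressed simultaneously, so their order is irrelevant;
    # unknown chords are flagged rather than guessed.
    return " ".join(CHORDS.get(frozenset(stroke), "[?]") for stroke in strokes)


print(translate([{"K", "A", "T"}, {"T", "A", "S"}]))
```

Each stroke yields a whole word at once, which is where the speed advantage over typing letter by letter comes from.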
Stenographers hence do not need to work in pairs, and don’t face the same vocal maintenance concerns that respeakers do. However, the paired working nature of respeakers also allows for benefits that are not available to solo captioners – for example, the standby captioner is able to research unfamiliar terms and correct mistakes or misrecognitions by the live captioner.
Preparation is very important for both respeakers and stenocaptioners, as it lets them add unusual names or terminology to their dictionaries, but the large amount of jargon and rapidly coined neologisms across various captioning contexts means it’s easy for things to slip through the cracks! Both types of captioner have ways to handle unfamiliar terms that aren’t yet in their dictionaries – stenocaptioners can spell them out phonetically, while respeakers can manually type them.
Additionally, a common misconception is that respeaking is far less accurate than other captioning methods. At Ai-Media, both stenographers and respeakers are used for live broadcast captioning, and in the latest audit carried out by an independent monitor, our average live caption quality score was 99.5%. This score is an evaluation of content selected at random, according to the international NER scale of broadcast caption quality, and this level has been maintained for two years.
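For readers curious about how an NER score is worked out, the model is commonly described as (N − E − R) ÷ N × 100, where N is the number of words in the captions, E is the total penalty for edition errors (meaning lost or changed by the captioner) and R is the total penalty for recognition errors (misrecognitions by the software). The Python sketch below applies that formula; the `ner_score` function and the figures in the example are invented for illustration and are not taken from the audit mentioned above.

```python
def ner_score(n_words, edition_penalty, recognition_penalty):
    """Return an NER-style accuracy percentage: (N - E - R) / N * 100."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 100 * (n_words - edition_penalty - recognition_penalty) / n_words


# Made-up example: 1,000 words with 2 penalty points of edition errors
# and 3 penalty points of recognition errors.
print(ner_score(1000, 2.0, 3.0))
```

In this made-up case, five penalty points across 1,000 words gives a score of 99.5.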
What does it take to become a respeaker?
There aren’t any particular qualities that are required upfront to be a respeaker – people from all walks of life can be trained to respeak, although it is thought the natural aptitude for the skill varies from person to person. Being a word nerd certainly helps, though, as you’ll be thinking about grammar, spelling and punctuation for many more of your waking hours!
Like any other skill, learning to respeak requires continual training and practice. While getting the initial hang of the technique and software might seem straightforward, getting up to an appropriate captioning standard is no mean feat. As speech recognition software becomes more sophisticated and new versions are released, these advances can go some way to correcting sloppy diction – but of course they’re no substitute for diligent training and upskilling.
At the end of a session, respeakers review their output and optimise their software to address any mistakes that came up. This might involve retraining their pronunciation of certain words so the software can better recognise them, or adding new names or terms to their dictionary. This also ensures that the software is as up-to-date as possible with current affairs, and is essential to keep up with the rapid rate of change of today’s society and the language that describes it.
Ultimately, all live captioning, be it stenography or respeaking, aims to increase access for individuals who are deaf or hard of hearing. This goal is always kept front of mind by all captioners, and so a commitment to accessibility is the one essential for anybody thinking about becoming a respeaker.