Captions for Live Streaming

This post is part of our educational series for those new to live captioning.

In this post, we discuss how captions greatly increase the accessibility of your content and examine how accuracy and cost should be considered when selecting a live captioning solution.

Promoting Accessibility

Captions are on-screen transcriptions for viewers who are deaf or hard of hearing, as well as for viewers who aren’t proficient in the language of the live video content. They allow viewers to follow what the talent or presenter is saying. Offering captions with live streams has become essential for:

Enterprises building a supportive and inclusive organization.
Companies in heavily regulated industries, such as pharmaceuticals and healthcare.
Government organisations (such as counties and educational institutions) that must comply with accessibility and equality legislation.

Beyond accessibility, viewers in noisy environments, or in situations where they cannot play sound out loud, often prefer to watch the live stream muted and read the captions.

Example of Live Captions displayed on the StreamShark player

The two most important considerations when evaluating live captioning solutions are accuracy and cost. Both depend directly on how the captions are generated: by human captioners, or by computers using Automatic Speech Recognition (ASR).

Accuracy of Live Captioning Solutions

When live captions are required for a high-profile global live stream with VIPs, accuracy is critical and we recommend human-generated captions. Human captioners excel at understanding speech (accents, speech variations, technical language, context) and transcribing what they hear quickly and accurately.

StreamShark Enterprise customers use professional captioning services such as Ai-Media or EEG Video’s Falcon service. Ai-Media hires and trains its own captioners, while EEG Video’s Falcon service draws on EEG iCap, a global network of caption partners that includes Ai-Media.

Computer-generated (ASR) live captions are less accurate, which may frustrate some viewers, but they are suitable for use cases such as transcribing everyday meetings, where the user can review and correct the transcript afterwards. Machine-learning-based solutions can be improved with custom dictionaries containing industry-specific terminology, which are used to train the captioning model. With continued use over time, machine-learning-based ASR models may reach an accuracy of 96% to 99%.

EEG Video also offers Lexi, an automatic captioning service with over 90% accuracy in English, Spanish and French. Epiphan Video, one of our encoder partners, recently released LiveScrypt, a dedicated automatic transcription device that uses ASR technology and supports English and major European languages.

Providers measure caption accuracy with different models, so their accuracy figures are not always directly comparable.

Word Error Rate (WER Model)

The most common model for measuring caption accuracy has been the Word Error Rate (WER) model, calculated as follows:

$$\mathrm{WER} = \frac{S + D + I}{N}$$

Where,

S (substitutions) is the number of times a word gets replaced by a different word (for example, “twist” is transcribed as “wrist”)
I (insertions) is the number of words added that weren’t said (for example, “go-getter” becomes “go get her”)
D (deletions) is the number of words omitted from the transcript (for example, “let it go” becomes “let go”)
N is the total number of words actually spoken (the reference transcript)

Accuracy is then reported as 1 − WER, usually expressed as a percentage.
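To make WER = (S + D + I) / N concrete, here is a minimal Python sketch that aligns what was said with what was captioned using word-level edit distance (a Levenshtein alignment) and counts substitutions, deletions and insertions. It assumes simple lowercasing and whitespace tokenization with no punctuation handling; real scoring tools normalize text more carefully. The function names are illustrative and do not come from any captioning vendor's API.

```python
from typing import List, Tuple


def word_error_counts(reference: str, hypothesis: str) -> Tuple[int, int, int, int]:
    """Align two transcripts word by word and return (S, D, I, N).

    S = substitutions, D = deletions, I = insertions,
    N = number of words in the reference (what was actually said).
    Assumes lowercase + whitespace tokenization only.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    n, m = len(ref), len(hyp)

    # cell[i][j] holds (total_errors, S, D, I) for the best alignment
    # of the first i reference words with the first j caption words.
    cell = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)  # every caption word inserted

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                cell[i][j] = cell[i - 1][j - 1]  # match: no new error
                continue
            t, s, d, ins = cell[i - 1][j - 1]
            substitute = (t + 1, s + 1, d, ins)
            t, s, d, ins = cell[i - 1][j]
            delete = (t + 1, s, d + 1, ins)
            t, s, d, ins = cell[i][j - 1]
            insert = (t + 1, s, d, ins + 1)
            # Fewest total errors wins; ties fall back to tuple order.
            cell[i][j] = min(substitute, delete, insert)

    _, s, d, ins = cell[n][m]
    return s, d, ins, n


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (S + D + I) / N."""
    s, d, ins, n = word_error_counts(reference, hypothesis)
    if n == 0:
        raise ValueError("reference transcript is empty")
    return (s + d + ins) / n


if __name__ == "__main__":
    spoken, captioned = "let it go", "let go"
    print(word_error_counts(spoken, captioned))  # (0, 1, 0, 3)
    print(f"WER = {wer(spoken, captioned):.3f}, accuracy = {1 - wer(spoken, captioned):.1%}")
```

Because insertions are counted against N, WER can exceed 1 (100%): with this alignment, the single spoken word “go-getter” captioned as “go get her” scores one substitution and two insertions, a WER of 3.0.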

While the WER model is popular, it doesn’t reflect the impact caption errors have on viewers, especially those who are deaf or hard of hearing. A captioned piece may score highly yet contain major errors that change the meaning of some sentences, while another piece with a lower score may contain only errors that leave the meaning intact.

NER Model

The NER model is an alternative to the WER model and was proposed by Prof. Pablo Romero-Fresco and Juan Martinez, a respeaking consultant. ‘Respeaking’ is the term for a captioner repeating the dialogue of a TV program or other medium into a microphone, which is then turned into captions by speech recognition software. NER accuracy is calculated as follows:

$$\text{Accuracy} = \frac{N - E - R}{N} \times 100$$

Where,

N is the number of words in the captions
Edition errors (E) are words that have been spoken but do not appear in the captions, or words that have been added to the captions but have not been spoken (for example, “let it go” becomes “let go” or “this is chaos” becomes “this is a chaos”)
Recognition errors (R) are incorrect words appearing in the captions (for example, “twist” is transcribed as “wrist”)

The NER model recognizes that not all errors cause the same comprehension problems: each edition and recognition error is weighted by severity (minor, standard or serious), so the score reflects the impact of caption errors on viewers. This measurement process is already used for public television broadcasts in several European countries, such as Italy and Switzerland, as well as in Australia. Ai-Media uses the NER model, and its human-generated captions have been externally audited at up to 99.6% accuracy.
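The NER formula, Accuracy = (N − E − R) / N × 100, can be sketched in a few lines of Python. This sketch assumes the severity weights commonly used with the NER model (0.25 for a minor error, 0.5 for a standard error, 1 for a serious error); deciding which category each error falls into is a human judgment that the code does not attempt. `NER_WEIGHTS`, `weighted_errors` and `ner_accuracy` are hypothetical names for illustration.

```python
from typing import Iterable

# Severity weights assumed from the NER model: minor errors barely affect
# comprehension, serious errors change or lose the meaning.
NER_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}


def weighted_errors(severities: Iterable[str]) -> float:
    """Sum the weights of a list of error severity labels."""
    return sum(NER_WEIGHTS[sev] for sev in severities)


def ner_accuracy(n_words: int, edition: Iterable[str], recognition: Iterable[str]) -> float:
    """NER accuracy (%) = (N - E - R) / N * 100.

    n_words: number of words in the captions (N).
    edition / recognition: severity labels of each edition and recognition error.
    """
    if n_words <= 0:
        raise ValueError("captions must contain at least one word")
    e = weighted_errors(edition)
    r = weighted_errors(recognition)
    return (n_words - e - r) / n_words * 100


if __name__ == "__main__":
    # Three errors in a 200-word caption either way, but different severity.
    print(ner_accuracy(200, ["minor", "minor"], ["minor"]))        # meaning preserved
    print(ner_accuracy(200, ["serious", "standard"], ["serious"]))  # meaning changed
```

Both example captions contain three errors, so an unweighted error count would treat them the same; the weighting gives roughly 99.6% for the mostly harmless errors and about 98.8% for the errors that change the meaning.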

Cost of Live Captioning Solutions

Different vendors offer different pricing models, including:

Flat-rate charges for booking a human captioner for a half day or full day
Flat-rate pricing based on 10-minute slots ($/ten minutes)
Flat-rate hourly pricing ($/hour)
Tiered pricing based on live stream duration and your preferred quality level ($/minute)
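Because each pricing model rounds and scales differently, it can help to price the same event under each one. The Python sketch below assumes a 4-hour half day and an 8-hour full day, billing per *started* 10-minute slot or hour, and per-minute rates chosen by quality tier; these rules and every rate in the example are hypothetical placeholders, not real vendor prices, so substitute the figures from your own quotes.

```python
import math

# Hypothetical booking lengths; check the terms in your vendor's quote.
HALF_DAY_MIN, FULL_DAY_MIN = 240, 480


def booking_cost(duration_min: float, half_day_rate: float, full_day_rate: float) -> float:
    """Flat rate for booking a human captioner for a half or full day."""
    if duration_min <= HALF_DAY_MIN:
        return half_day_rate
    if duration_min <= FULL_DAY_MIN:
        return full_day_rate
    raise ValueError("event is longer than a full-day booking")


def slot_cost(duration_min: float, rate_per_slot: float, slot_min: int = 10) -> float:
    """Flat rate per started 10-minute slot (assumed rounding rule)."""
    return math.ceil(duration_min / slot_min) * rate_per_slot


def hourly_cost(duration_min: float, rate_per_hour: float) -> float:
    """Flat rate per started hour (assumed rounding rule)."""
    return math.ceil(duration_min / 60) * rate_per_hour


def tiered_cost(duration_min: float, tier: str, per_minute_rates: dict) -> float:
    """Per-minute rate picked by quality tier, e.g. 'asr' or 'human'."""
    return duration_min * per_minute_rates[tier]


if __name__ == "__main__":
    minutes = 95  # a 95-minute live stream
    rates = {"asr": 0.2, "human": 3.0}  # placeholder $/minute figures
    print("half/full-day booking:", booking_cost(minutes, 500, 900))
    print("10-minute slots:", slot_cost(minutes, 20))
    print("hourly:", hourly_cost(minutes, 120))
    print("tiered (human):", tiered_cost(minutes, "human", rates))
    print("tiered (ASR):", tiered_cost(minutes, "asr", rates))
```

The rounding rule matters most for events that run just past a boundary: a 95-minute stream is billed as 10 slots or 2 full hours in this sketch, while per-minute pricing charges only for the minutes used.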

If cost is a key concern and the budget is limited, we recommend computer-generated captions. ASR captions are usually 10 to 20 times cheaper than human-generated captions, with the trade-off being lower accuracy.

In summary, there is growing demand for live captioning of events and meetings to increase the accessibility of content. Human captioning services offer higher accuracy than machine-learning-based captioning services but are typically more expensive.

To learn more about live captioning, please visit the following posts:

608 vs 708 Closed Captions for Live Streams

References:

Ai-Media, ‘Should You Use Computer-Generated or Human-Generated Captions?’, Ai-Media, https://www.Ai-Media.tv/should-you-use-computer-generated-or-human-generated-captions, (accessed 02 October 2020)

Ai-Media, ‘External Captioning Quality Audit’, Ai-Media, https://www.Ai-Media.tv/external-captioning-quality-audit, (accessed 02 October 2020)

Allison Koo, ‘How to Calculate Word Error Rate’, Rev, https://www.rev.ai/blog/how-to-calculate-word-error-rate, (accessed 02 October 2020)

EEG Enterprises, ‘EEG iCap’, EEG Video, https://eegent.com/icap, (accessed 18 October 2020)

EEG Enterprises, ‘Lexi™ Automatic Captioning Service for Live Video’, EEG Video, https://eegent.com/products/QAM44XW07EVXHHZS/lexiTM-automatic-captioning, (accessed 18 October 2020)

Epiphan Systems Inc, ‘Epiphan LiveScrypt’, Epiphan Video, https://www.epiphan.com/products/livescrypt/, (accessed 05 September 2020)

Romero-Fresco P., Pérez J.M. (2015) Accuracy Rate in Live Subtitling: The NER Model. In: Piñero R.B., Cintas J.D. (eds) Audiovisual Translation in a Global Context. Palgrave Studies in Translating and Interpreting. Palgrave Macmillan, London. https://doi.org/10.1057/9781137552891_3
