Streaming Forum: Captioning for Streaming Video Still a "Wild West"

Captioning video on demand “is a wild west,” at least in the U.S., where the service is mandated by law and is therefore a key topic for content owners and their partners to get their heads around.

Delivering a measured analysis of the issues, Telestream product manager Kevin Louden captioned his own session, "Practicalities of Putting Captions on IP-Delivered Video Content" with the question: "Can anyone see your subtitles?"

Louden began by pointing out that there are legal, moral, and business reasons to make sure your content is captioned.

“The 21st Century Communications Act in the U.S. mandates that content previously broadcast or intended for broadcast have captions to it,” he explained. “This comes into effect in stages between now and 2016.

“Even if you don't do it by law [no other region of the world has quite the same legislation] some people say it's simply the right thing to do, and from a business perspective you can broaden audiences for your content by reaching out to multiple language groups.”

So how is it done? Just as there are many video and audio formats for streaming and progressive download, there are many caption file formats for video on demand, the main ones being W3C TT/DFXP and WebVTT/SRT.

The former is an open standard which carries a lot of information about position, font size, color, and so on for a rich presentation of the captions, and is “potentially very complicated,” he said.
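To give a feel for the kind of presentation detail such a file can carry, here is a minimal Python sketch that builds a small timed-text document using the W3C TTML namespaces. The function name `build_ttml` and the cue tuple layout are assumptions made for illustration, and the output is deliberately bare (no head, styles, or regions), so it should not be read as a conformant DFXP or SMPTE profile.

```python
import xml.etree.ElementTree as ET

# W3C TTML namespaces: the core vocabulary and the styling attributes.
TT_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
XML_NS = "http://www.w3.org/XML/1998/namespace"

ET.register_namespace("", TT_NS)
ET.register_namespace("tts", TTS_NS)


def build_ttml(cues, lang="en"):
    """Build a bare-bones TTML document (hypothetical helper).

    cues: iterable of (begin, end, text, color) tuples, with times
    written as hh:mm:ss.mmm clock times.
    """
    tt = ET.Element(f"{{{TT_NS}}}tt")
    tt.set(f"{{{XML_NS}}}lang", lang)
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    for begin, end, text, color in cues:
        p = ET.SubElement(div, f"{{{TT_NS}}}p")
        p.set("begin", begin)
        p.set("end", end)
        # Presentation travels with the cue as styling attributes.
        p.set(f"{{{TTS_NS}}}color", color)
        p.set(f"{{{TTS_NS}}}textAlign", "center")
        p.text = text
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    print(build_ttml([
        ("00:00:01.000", "00:00:04.000", "Can anyone see your subtitles?", "yellow"),
    ]))
```

Even this stripped-down version shows why the format can grow complicated: every cue may carry its own color, alignment, and (in fuller documents) region and font information.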

WebVTT/SRT, on the other hand, is a text-based format native to HTML5 video tags, “very simple in its current iteration” but with little or no control of presentation features in the file.
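The two text formats are close enough that a basic conversion is largely mechanical: a WebVTT file starts with a `WEBVTT` header line, and its timestamps use a period rather than a comma before the milliseconds. The Python sketch below illustrates that under simple assumptions. The function name `srt_to_webvtt` is hypothetical, the input is assumed to be well-formed SRT with `hh:mm:ss,mmm` timings, and any inline SRT markup is passed through untouched.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_webvtt(srt_text):
    """Convert well-formed SRT text to WebVTT (timing and text only)."""
    lines = srt_text.replace("\r\n", "\n").strip().split("\n")
    # Swap the comma before the milliseconds for a period on timing lines;
    # the numeric SRT counters are kept, since WebVTT allows cue identifiers.
    converted = [SRT_TIMING.sub(r"\1.\2 --> \3.\4", line) for line in lines]
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


SAMPLE_SRT = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "Can anyone see your subtitles?\n"
    "\n"
    "2\n"
    "00:00:04,500 --> 00:00:06,000\n"
    "[applause]\n"
)

if __name__ == "__main__":
    print(srt_to_webvtt(SAMPLE_SRT))
```

Note what is missing: neither the input nor the output says anything about position, font, or color, which is the trade-off Louden describes.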

“This is what people cobbled together before there were any standards in place, and because of that there are a lot of entrenched workflows,” Louden said.

To smooth the multiplicity of formats, two leading standards bodies are attempting to create a universal file interchange format, as a sort of mezzanine or master level.

SMPTE 2052, being proposed in the U.S., is an XML-based timed text format which emerged from the Act so that content owners or their partner organisations could create deliverable formats from broadcast content for IP distribution in all its streaming media end-user forms.

In Europe, EBU-TT is a similar proposition and a subset of the TTML format, for use as a universal handoff.

For organisations wanting to generate captioning information for linear video on their own websites there are several options. JW Player, for example, has built-in support for WebVTT, SRT, and DFXP, while Flow Player supports W3C TT and SRT.

Numerous video encoding tools, perhaps already in situ at a facility, contain subtitling and captioning capabilities for translating between formats.

Alternatively, one can employ graphical overlays or physically burn the subtitles onto the picture, a practice which is still remarkably common, reported Louden. “You don't need special players or sidecar files, but obviously there's not much flexibility.”

Engaging a third-party service provider is a useful way of delegating the problem but, says Louden, “in theory you hand over your master SMPTE TT or EBU TT safe harbour file as the interchange format, but the reality is that people are used to their own existing profiles and will request an SRT, WebVTT format since this is the way it's always been done.”

Turning to adaptive bitrate provision, Louden noted that the main ABR formats cater for different captioning files.

The HLS specification for iOS devices contains a means of embedding CEA-608 captions in a video's MPEG headers, while Smooth Streaming and HTTP Dynamic Streaming both support the sidecar formats DFXP and TTML (useful for repurposing linear and non-linear VoD). Where MPEG-DASH fits into this equation is up in the air.

Louden pointed out a couple of bumps in the road for anyone looking to caption their content, which included taking care of rights, especially when repurposing legacy broadcast content.

“If you sent the work out to a caption house then beware that many of them work on individual negotiations, so while you may have a licence to broadcast that information you may not have the web rights for it,” he advised.

“Also be careful editing content," he said. "Any retiming of the content will have a knock-on to the timecode-synced caption information. You have to be sure when you do your format translation that the captions are retimed too, perhaps manually.”
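As a rough illustration of the simplest retiming case, the Python sketch below shifts every cue in a WebVTT file by a fixed offset, for example after a few seconds have been trimmed from the head of a programme. The names `retime_webvtt` and `shift_timestamp` are hypothetical, timestamps are assumed to be in the full `hh:mm:ss.mmm` form, and times that would go negative are clamped to zero. An edit that removes or reorders material mid-programme needs per-cue handling, which is where the manual work Louden mentions comes in.

```python
import re

# Full-form WebVTT timestamp, e.g. "00:01:23.456" (assumed input form).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms, clamped at zero."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    total = (hours * 3600 + minutes * 60 + seconds) * 1000 + millis
    total = max(total + offset_ms, 0)
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def retime_webvtt(vtt_text, offset_ms):
    """Shift every cue timing line by offset_ms; caption text is untouched."""
    out = []
    for line in vtt_text.split("\n"):
        if "-->" in line:  # only timing lines contain the arrow
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "WEBVTT\n\n00:00:05.000 --> 00:00:08.000\nHello\n"
    # Two seconds were cut from the start, so pull the captions earlier.
    print(retime_webvtt(sample, -2000))
```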

Delegates in the room agreed on the need for a universal captioning standard, but no one really believed that one could be agreed on or made to work in practice, given the commercial pressures among competing vendors.

By way of addendum, Louden noted the small differences in terminology between the two continents.

“In the U.S. 'captions' display text and other sound information for the hearing impaired," he said. "Subtitles are translations to different languages, whereas in Europe both of these things are commonly referred to as the same thing—a subtitle.”

To view Louden's full presentation, watch the video below.
