Encoding & Transcoding 2018: Part 1
"Great floods begin with the pitter-patter of tiny raindrops, and so it is with video on the web. The tiny, jerky videos that we see on websites today will soon become a flood of full-frame, full-motion video integrated into many, if not most, corporate and entertainment websites…" - Streaming Media Magazine, August 2000
The key takeaway from this year's International Broadcasting Convention (IBC) in Amsterdam was that the flood of streaming video is finally flowing.
With billions of HD-capable, streaming-enabled devices in the hands and homes of consumers, and high-speed broadband a fact of life in most parts of the world, over-the-top (OTT) video streaming is now understood, even by traditional over-the-air broadcasters, as the path to the video future. Advertisers are following the trend, too: Magna’s US Ad Spend Forecast estimates that OTT annual ad spend will grow 40% this year to reach $2 billion by the end of 2018.
Buffering-related video freezes have become increasingly rare, and talk at this year’s IBC in September was of more nuanced challenges, like how to reduce latency or how to use per-title or content-aware encoding (CAE) to minimize bitrates, thus saving millions of dollars in bandwidth and infrastructure costs.
Video engineering teams continue to improve the consumer's quality of experience (QoE) while minimizing required resources. Morten Rasmussen, Cisco senior product manager for video compression, cites an example: "A few years ago, we were not able to do even one Ultra HD encode of HEVC on a single server. Today, we can do two HEVC encodes of Ultra HD on an off-the-shelf, non-accelerated pure CPU."
These teams are being challenged to support a growing variety of streaming video use cases, including:
- The proliferation of 24/7 linear channels
- Sports, music, and other events drawing massive audiences
- Vast VOD offerings from Netflix, Amazon Prime, Hulu, and other providers
- eSports events drawing millions of viewers
- 8K UHD programming in Japan and elsewhere
- Video content in emerging markets with 2G or 3G networks
While the streaming landscape has changed in dramatic ways, some current challenges, such as codec uncertainty, look familiar. Before we examine how some vendors are addressing those challenges, here’s a quick refresher on the technologies that define encoding, transcoding, packaging, and delivery in late 2018.
Adaptive bitrate encoding (ABR), through which encoders generate a "ladder" of resolutions at various bitrates from which device decoders "choose" the bitrate that delivers the optimal QoE, has become a standard in our multiscreen world.
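To make the "ladder" idea concrete, here is a minimal Python sketch of how a player might choose a rung. The ladder values, the `Rung` type, the `choose_rung` function, and the 0.8 safety factor are all illustrative assumptions, not any vendor's actual ladder or player logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # encoded video bitrate for this rung


# Hypothetical H.264 ladder; real ladders vary by service and by content.
LADDER = [
    Rung(234, 145),
    Rung(360, 365),
    Rung(432, 730),
    Rung(540, 2000),
    Rung(720, 3000),
    Rung(1080, 6000),
]


def choose_rung(ladder, throughput_kbps, safety=0.8):
    """Return the highest-bitrate rung that fits inside a safety fraction
    of the measured throughput, or the lowest rung if none fits."""
    budget = throughput_kbps * safety
    fitting = [r for r in ladder if r.bitrate_kbps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_kbps)
    return min(ladder, key=lambda r: r.bitrate_kbps)
```

A player measuring about 4 Mbps would land on the 720p rung here. Production players typically weigh buffer level and screen size as well, which this sketch leaves out.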
Today, almost all encoding companies generate the majority of their encoded output as H.264 ABR files.
According to the 2018 Bitmovin format report, and anecdotally at IBC, HEVC (High Efficiency Video Coding), or H.265, accounts for less than 10 percent of encoded files. HEVC can achieve the same QoE with smaller files, which cost less to deliver and store. Thus, the deployment of HEVC is largely driven by potentially significant savings on bandwidth and infrastructure costs. HEVC (or a similarly efficient codec) is also required to generate reasonably sized 4K and 8K UHD (Ultra High Definition) files.
However, lingering uncertainty about future royalties for HEVC has limited adoption and has precluded some companies from using the codec at all. For many companies, it's not that HEVC royalties are too high. It's that they are too uncertain.
AV1 is a codec developed by the Alliance for Open Media in part to alleviate the royalty problem faced by HEVC. It is still early days for AV1, and while many companies at IBC are developing AV1 codecs and are bullish on AV1’s potential, the codec’s slow performance to date has dampened enthusiasm.
“For AV1, you need hardware decoding in the [phones]. Software playback just won’t work - it would drain the battery. And hardware decode isn’t expected until 2020,” says Dror Gill, Beamr's chief technology officer. “The second thing about AV1 is encode performance. Today, if you try to encode an AV1 stream, it’s 100 times slower than encoding to HEVC in software. Some people say that in a few years it will only be 5-10 times more complex. That's still too complex. Why would I spend 10 times as much to encode AV1 when I get maybe 20 or 25 percent gain in compression efficiency?”
VP9, developed by Google, is enabled in billions of devices. Every Android phone and Chrome browser offers support for VP9. With much of VP10's technology being incorporated into AV1, some think of VP9 as "AV0."
“Look for AV1 to have meaningful market share in 2021 or 2022. Up until that point, all the reasons why people might consider AV1—compression gains and the lack of royalties—are available today with VP9,” says Oliver Gunasekara, NGCodec's CEO.
Although few people are currently using VP9, NGCodec sees opportunity for the codec using FPGA (Field-Programmable Gate Array) accelerated appliances. Gunasekara notes, "Until we came along, there was no live VP9 encoder. The software encoders for live performed worse than H.264."
Shawn Hsu is Product Manager at Twitch, the leading live eSports and video game streaming service that was bought by Amazon in 2014 for $970 million. He, too, is bullish on VP9 for the short term.
“We believe VP9 provides better quality at the same bitrate compared to H.264, and this will help us deliver better quality content to our users," says Hsu. "VP9 is also widely supported on browsers such as Chrome and Firefox. A large percentage of Twitch users are on browsers on desktops ... VP9 is not supported today, but we’re going to be launching it soon ... We need something that we can use until AV1 is ready to go [in the next] two to three years."
VVC (Versatile Video Coding) will likely be the next generation after HEVC, which means its deployment will be quite a way down the road. While VVC may include some IP that could allow it to achieve better compression than AV1, the thorny question of royalties remains.
“It’s okay to pay royalties as long as you know how much you need to pay and when. With H.264, it was very clear how much you need to pay, there was one body collecting all the royalties, and this became the world’s most prominent video codec," says Beamr's Gill. "The same can happen with VVC if they get their act together before releasing the standard.”
Contribution files are high-quality, low-compression files that enter the workflow in the early stages, before content is transcoded into ABR files for distribution. JPEG2000 is one of the few true royalty-free standards, and this is one of the reasons it is ubiquitous for contribution encoding. Also, as Comprimato founder and CEO Jiri Matela notes, "Cinematographers are obsessed about quality, and JPEG2000 can preserve quality very well."
A few years away are two new formats: JPEG XS (a production and contribution format) and High-Throughput JPEG2000 (HT-J2K). With HT-J2K, throughput gains on the order of 10x or more have been touted relative to J2K. Still unknown is whether JPEG XS will be royalty-free. Comprimato’s Matela anticipates, “It will be another codec battle [between JPEG XS and HT-J2K], but in the contribution space.”
Until CMAF (Common Media Application Format), encoders have had to produce files conforming to at least two streaming formats, Apple's HLS and the newer MPEG-DASH. Most encoders also support Microsoft's Smooth Streaming protocol. (Adobe's HDS is pretty much out of the picture.)
Many of the presenters at IBC were hopeful that CMAF would not only minimize the number of streaming formats they'd need to create, but would also help to standardize DRM encryption protocols. Ultimately, as Encoding.com president Jeff Malkin notes, "CMAF is helpful to our clients in minimizing the number of output packages [they need to create and deliver]."
Per-title, content-aware, or content-adaptive encoding (CAE) was a major topic of conversation at IBC, but is really just another way of saying, "make it high quality using as low a bitrate as possible." In reality, this means using low bitrates when there is little movement or detail in the content, and then upping the bitrate when there is a lot of motion or detail, like in sporting events on grass.
For video on demand (VOD), that can be accomplished by analyzing the content in one pass, recording which scenes or segments require more bitrate to achieve high quality, and programming the encoding process accordingly.
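The analyze-then-encode approach for VOD can be sketched in a few lines of Python. Everything here is a simplifying assumption for illustration: the per-frame "activity" score between 0 and 1, the function names `analyze` and `plan_segment_bitrates`, the linear mapping, and the 1,000 to 6,000 kbps range. Commercial per-title encoders use far richer quality models.

```python
def analyze(frame_activity, frames_per_segment):
    """First pass: average a per-frame activity score (0..1) over each
    segment to get one complexity score per segment."""
    scores = []
    for start in range(0, len(frame_activity), frames_per_segment):
        chunk = frame_activity[start:start + frames_per_segment]
        scores.append(sum(chunk) / len(chunk))
    return scores


def plan_segment_bitrates(complexities, min_kbps=1000, max_kbps=6000):
    """Second pass: map each complexity score linearly onto a target
    bitrate between min_kbps and max_kbps (hypothetical bounds)."""
    plan = []
    for score in complexities:
        clamped = min(1.0, max(0.0, score))  # guard against out-of-range scores
        plan.append(round(min_kbps + clamped * (max_kbps - min_kbps)))
    return plan
```

A static talking-head segment would score near 0 and be planned at the low end of the range, while a fast pan across a grass pitch would score near 1 and get the high end.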
CAE for live is a different animal, and very difficult to achieve, given that whatever you're doing on the encoding side, the show goes on in real time. Virtually every encoding company in the live space is seeking to innovate its way to delivering quality live content at the lowest possible bitrates. Some look 50 frames ahead to inform their encoding decisions and live with a little extra latency. Others analyze 90-120 seconds in advance and then gradually adjust their encoding algorithms. Yet other encoders analyze frame-by-frame and adjust their encoding algorithms from that analysis in real time.
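The lookahead trade-off is easy to see in a small Python sketch: each frame is held back until a fixed number of later frames have arrived, so the encoder knows what is coming but the stream is delayed by that many frames. The function names and the numeric per-frame activity scores are hypothetical, and no vendor's actual algorithm is implied.

```python
from collections import deque


def lookahead_encode(frame_activity, lookahead=50):
    """Yield (frame_score, average_upcoming_score) pairs. Each frame is
    released only after `lookahead` later frames have been seen."""
    window = deque()
    for score in frame_activity:
        window.append(score)
        if len(window) > lookahead:
            current = window.popleft()
            yield current, sum(window) / len(window)
    # End of stream: flush the frames still held in the window.
    while window:
        current = window.popleft()
        upcoming = list(window) or [current]
        yield current, sum(upcoming) / len(upcoming)


def added_latency_seconds(lookahead, fps):
    """Delay introduced by holding `lookahead` frames at a given frame rate."""
    return lookahead / fps
```

At 25 frames per second, a 50-frame lookahead adds two seconds before the first frame can leave the encoder, which is the "little extra latency" these vendors accept.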
Live CAE is where some of the most intense head-scratching in the streaming industry is occurring, and where some of the most significant financial savings (on CDN bandwidth costs) might be found.
If OTT's primary competition is cable and over-the-air broadcasting, then latency is OTT's most pressing issue. The most frequently told (and perhaps apocryphal) story at IBC was of the family watching the World Cup OTT and hearing their neighbors watching over-the-air cheer a goal 30 seconds before it showed up OTT. OTT will have truly arrived when it is the OTT family that is cheering first.
"An ABR normal delivery today is about 40 seconds. We’re looking at how we can get that down to 6 seconds, otherwise we’re not really competing, streaming-wise, with the cable and the satellite," says Cisco's Rasmussen.
Some compression engineers see potential for reducing latency through chunked transfer encoding, a streaming data transfer mechanism available in HTTP/1.1.
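Chunked transfer encoding matters for latency because the sender does not need to know the total length of a response before it starts transmitting. Each piece is prefixed with its size in hexadecimal, and a zero-length chunk marks the end. The Python sketch below shows only that wire framing; the function names are made up for illustration, and a real server or CDN would handle this inside its HTTP stack.

```python
def encode_chunked(pieces):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    for piece in pieces:
        if piece:  # skip empty pieces: a zero-length chunk ends the body
            yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk, no trailers


def decode_chunked(body):
    """Reassemble the payload from a complete chunked body (no trailers)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # Ignore any chunk extensions after ';' in the size line.
        size = int(body[pos:eol].split(b";")[0], 16)
        if size == 0:
            return bytes(out)
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
```

In a streaming workflow, the pieces could be parts of a media segment pushed out as the encoder produces them, rather than waiting for the whole segment to finish.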
UHD is a technology driven in large part by the TV manufacturers. 4K TVs are widely available in the U.S. and Europe, but encoding companies at IBC said interest in 8K UHD so far is limited to Japan and some VR applications. Many, if not most, companies felt that HDR (High Dynamic Range), such as Dolby Vision, at 1080p provided as good a QoE as UHD, or better, without the elevated bandwidth requirements.
AI and Machine Learning
"Machine learning" and "AI" were among the most popular buzzwords at IBC, with a variety of suggestions as to how they might most effectively be employed.
“We use AI to estimate the required CPU capacity for certain processing tasks," notes Cisco's Rasmussen. "For example, when you need to do an encode of an HD, how big a CPU do you need? Or, if I have a given CPU capacity, how many services can I process on it?" Minimizing wasted CPU capacity on a massive scale can translate into serious money.
“Machine learning plays a significant role in recommendation engines," says Tony Jones, MediaKind's principal technologist. "For example, if many people who watch movie A also watch movie B, then that can assist a recommendation engine to provide much more likely content to be chosen by a subscriber. That fits into our TV platform business."
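Jones's "people who watch movie A also watch movie B" example can be illustrated with plain co-occurrence counting. This Python sketch is a deliberately simple stand-in that counts rather than learns; it is not MediaKind's method, and the function names and sample viewing histories are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def co_watch_counts(histories):
    """For each title, count how often every other title appears in the
    same viewer's history."""
    counts = {}
    for watched in histories:
        for a, b in permutations(set(watched), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, title, k=3):
    """Return up to k titles most often co-watched with `title`."""
    return [t for t, _ in counts.get(title, Counter()).most_common(k)]


# Hypothetical viewing histories, one list per subscriber.
histories = [["A", "B"], ["A", "B"], ["A", "C"]]
counts = co_watch_counts(histories)
```

With these histories, `recommend(counts, "A")` ranks B ahead of C, because two of the three viewers of A also watched B. Production engines add weighting, recency, and learned models on top of signals like this one.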
“We can also use [machine learning] to learn how to clean up artifacts from incoming video,” adds Jones. “The machine learning ‘learns’ how to make the filters that, for example, recognize edges. It’s all based on training data. If you’ve got the right training data, you can do something.”