Buyers' Guide to Content- and Context-Aware Encoding 2018
A new generation of encoders looks at the context of content to deliver better video playback and higher efficiency. Here's what publishers need to know about CAE.

This Buyers' Guide article is not about choosing encoders, but rather about understanding a new breed of encoding solutions that leverage computer vision and machine learning both to reduce the number of encodes necessary for smooth playback and to make the overall encoding process more efficient.

This approach is typically abbreviated CAE, which can stand for either content-aware encoding or context-aware encoding. And, while the concept of CAE has been 20 years in the making, it seems to have finally gained enough traction—and demonstrated enough real-world benefit, at least in initial testing—to warrant closer attention from those considering various encoding options for 2018.

What Does CAE Replace?

The short answer is tons of guesswork and numerous encoding cycles. Here’s why:

From the beginning of streaming, there has been an understanding that not all content needed to be encoded the same way: whether it’s sports content versus talking-head content, or high-action content using a handheld camera versus serene scenery footage captured from a locked-down camera on a tripod, content types are as varied as the cameras used to acquire them.

In the early days, these differences were addressed using various codecs: Indeo was a good general video codec, Apple and Intel both had specific animation codecs, and MPEG-1 and MPEG-2 were good for full-motion action. Encoding experts needed to choose from a dozen or so codecs, deciding which struck the best balance between quality and speed while also keeping in mind the intended playback bandwidth and key software or hardware players.

Even when the dust settled and the industry sorted out the last few proprietary codecs from Microsoft and Real, settling on the jointly developed, MPEG- and ITU-approved Advanced Video Coding (AVC) standard, also known as MPEG-4 Part 10 or H.264, a single-codec solution at a set bitrate wasn't necessarily the right choice if one wanted to avoid buffering on intermittent cellular data or even Wi-Fi networks.

The temporary solution adopted by the industry hinged on encoding content at varying bitrates. This is generally known as the bitrate ladder, with each higher rung being a higher resolution at a progressively higher data rate.
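For illustration, a conventional one-size-fits-all bitrate ladder can be sketched as a simple data structure; the resolutions and bitrates below are hypothetical values loosely modeled on common practice, not any vendor's recommended ladder:

```python
# A hypothetical fixed bitrate ladder: each rung pairs a resolution
# with a progressively higher target bitrate (in kbps).
BITRATE_LADDER = [
    {"resolution": (640, 360),   "bitrate_kbps": 800},
    {"resolution": (960, 540),   "bitrate_kbps": 1500},
    {"resolution": (1280, 720),  "bitrate_kbps": 3000},
    {"resolution": (1920, 1080), "bitrate_kbps": 6000},
]

def best_rung(ladder, available_kbps):
    """Pick the highest rung whose bitrate fits the available bandwidth,
    falling back to the lowest rung when nothing fits."""
    affordable = [r for r in ladder if r["bitrate_kbps"] <= available_kbps]
    return affordable[-1] if affordable else ladder[0]
```

Every title gets the same ladder regardless of its content, which is exactly the inefficiency discussed below.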

Playback of each rung on the bitrate ladder is determined based on near-real-time feedback from the end user’s device, which sends back confirmation each time a portion of the stream (known interchangeably as a “segment” or “chunk”) is received. If the confirmation takes too long, the media server assumes the cause is network congestion and steps down one or more rungs on the bitrate ladder, sending lower-bandwidth segments for the next portion of the stream.
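The step-down heuristic just described can be sketched as follows. The 80 percent safety threshold is an assumption chosen for illustration; production players use more sophisticated throughput- and buffer-based logic:

```python
def next_rung(current_index, download_seconds, segment_seconds, safety=0.8):
    """Step down one rung when a segment arrived too slowly.

    If fetching the segment consumed more than `safety` of its playback
    duration, assume network congestion and drop a rung (clamped at the
    bottom of the ladder); otherwise stay on the current rung.
    """
    if download_seconds > safety * segment_seconds:
        return max(current_index - 1, 0)
    return current_index
```

A 4-second segment that takes 5 seconds to download triggers a step down; one that arrives in 1 second does not.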

This approach, though, has led to over-encoding. The number of rungs that need to be on the bitrate ladder is always up for debate. Some content types need more rungs to ensure that the visual difference between segments for two different rungs is minimized for the end user’s viewing experience.

How Does CAE Work?

A natural step in the progression toward better encoding approaches was to identify the overall type of content of a particular title (be it an episode in a TV series or a stand-alone movie) and then choose a set of encoding parameters on a per-title basis.

That’s all well and good, but it’s only a small step on the journey to encoding nirvana. Why? Because everyone’s aware that even a single content title contains multiple types of content: action, talking heads, environmental shots, and everything in between.

A few innovative souls in the industry tried a radical approach a few years ago—using multiple codecs within a single title, with the best codec chosen for each shot or series of shots, known as a scene.

The problem with scene-based encoding isn’t just that the end users’ players need to be able to switch seamlessly between codecs—a herculean feat that proved to be the rather quick undoing of the radical approach mentioned above—but also that it takes an inordinate amount of manual labor to choose the best codec for each scene.

CAE, on the other hand, uses machine learning to compare content against known parameters for a given device and/or media player type. These parameters, coupled with the anticipated bandwidth and optional data regarding the average bandwidth across an over-the-top (OTT) operator’s viewing footprint, allow the CAE approach to take a significant amount of the guesswork out of recommending the bitrate ladder.

In some instances, there will be fewer rungs on the bitrate ladder (e.g., some ladders may have wider “spaces” between the data-rate rungs, since the content does not vary perceptibly across wide bandwidth ranges), but in other instances there may be less “space” between data-rate rungs, when even a few hundred kilobits per second may reveal perceptual visual differences between rungs.
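A toy sketch of that idea, assuming a per-title complexity score between 0 and 1; the rung counts, bitrate bounds, and geometric spacing are illustrative assumptions, not any vendor's actual CAE algorithm:

```python
def cae_ladder(complexity, base_kbps=400, top_kbps=6000):
    """Illustrative content-aware rung placement.

    Low-complexity titles get fewer, widely spaced rungs; high-complexity
    titles get more, closely spaced rungs, since for demanding content even
    a few hundred kbps can produce a perceptible quality difference.
    """
    # 3 rungs for the simplest content, up to 7 for the most complex.
    n_rungs = 3 + round(complexity * 4)
    # Geometric spacing between the base and top bitrates.
    ratio = (top_kbps / base_kbps) ** (1 / (n_rungs - 1))
    return [round(base_kbps * ratio ** i) for i in range(n_rungs)]
```

A serene, locked-down-camera title (complexity near 0) yields three widely spaced rungs; a high-action sports title (complexity near 1) yields seven closely spaced ones.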
