December 31, 2010
By Jan Ozer Contributing Editor

Adaptive Streaming in the Field

VBR or CBR?
My knee-jerk reaction here was to use constant bitrate encoding because the consistent stream should be easier to distribute, particularly in an adaptive setting where the viewer is often retrieving 2- or 3-second chunks of video at a time. This was certainly the view of Microsoft’s Zambelli, who said, “The reason I prefer CBR for adaptive streaming is because it exhibits less oscillation in data rates, making fragment sizes more consistent and consequently making it easier for client heuristics to accurately estimate bandwidth. VBR could work, but one has to be careful with setting average/peak/buffer values in order not to cause issues on playback.”

In practice, all live Olympic streams were CBR with a 5-second VBV buffer, while on-demand streams were VBR constrained to 10% over the target bitrate. Sunday Night Football, which is all live, uses the same CBR configuration as the Olympics.
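To make the arithmetic behind these settings concrete, here is a minimal Python sketch. It assumes the usual convention that a VBV buffer described in seconds holds that many seconds of video at the stream's bitrate, and it uses a hypothetical 1500Kbps target, since the actual Olympic stream rates aren't listed here. The function names are illustrative only and don't correspond to any encoder's settings.

```python
def vbv_buffer_kbits(bitrate_kbps: int, buffer_seconds: int) -> int:
    """Size in kilobits of a VBV buffer holding buffer_seconds of video."""
    return bitrate_kbps * buffer_seconds


def constrained_peak_kbps(target_kbps: int, percent_over: int) -> int:
    """Peak rate for VBR constrained to percent_over percent above the target."""
    return target_kbps * (100 + percent_over) // 100


# Hypothetical target bitrate in Kbps; not an actual Olympics stream rate.
target = 1500

print(vbv_buffer_kbits(target, 5))        # 5-second VBV buffer, in kilobits
print(constrained_peak_kbps(target, 10))  # peak for VBR held to 10% over target
```

For that hypothetical stream, the 5-second buffer works out to 7,500 kilobits and the 10% constraint to a 1650Kbps peak.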

Deutsche Welle is exclusively CBR, while Harvard is CBR except for the highest quality stream. Indiana University used constrained VBR, capping the peak rate of the 1500Kbps stream at 2Mbps, the 750Kbps stream at 1Mbps, and the 250Kbps stream at 500Kbps (see Table 2).

Ozer Adaptive Table 2

According to Goldstein, MTV Networks constrains the peak data rate to be no more than two times the average and monitors the buffer in the player to make sure that it can tolerate a brief data spike. Though this obviously works well for MTV Networks, Harvard’s Bouthillier cautions that many of the encoding tools that he’s tested don’t honor VBR constraints, which is why he opts for CBR in all but the highest data rate connections.
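One way to compare these constrained VBR configurations is by their peak-to-average ratio. The sketch below, in Python, computes that ratio for the Indiana University figures quoted above and checks them against a two-times cap like the one Goldstein describes. Applying the MTV Networks rule to Indiana's numbers is purely an illustration, not something either organization reported doing, and the function names are hypothetical.

```python
# (average Kbps, peak Kbps) pairs for Indiana University, as quoted in the text.
INDIANA = [(1500, 2000), (750, 1000), (250, 500)]


def peak_to_average(average_kbps: int, peak_kbps: int) -> float:
    """Ratio of the peak data rate to the average (target) data rate."""
    return peak_kbps / average_kbps


def within_cap(average_kbps: int, peak_kbps: int, max_ratio: float = 2.0) -> bool:
    """True if the peak is no more than max_ratio times the average."""
    return peak_kbps <= max_ratio * average_kbps


for avg, peak in INDIANA:
    ratio = peak_to_average(avg, peak)
    print(f"{avg}Kbps average, {peak}Kbps peak: ratio {ratio:.2f}, "
          f"within 2x cap: {within_cap(avg, peak)}")
```

The two higher streams come out at roughly 1.33 times the average, while the 250Kbps stream sits exactly at the two-times limit.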

What’s the Keyframe Interval?
For single-file streaming, I recommend a keyframe interval of 10 seconds with keyframes inserted at scene changes. With adaptive streaming, the rules change, as shown by the results in Table 3.

Ozer Adaptive Table 3

Checking our technical resources, Adobe and Microsoft recommend using a single keyframe interval for all files produced for an adaptive streaming package. Apple does the same, as you can see in Table 4, where the keyframe stays at a consistent 3 seconds, irrespective of the file frame rate.

Ozer Adaptive Table 4
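Many encoding tools ask for the keyframe interval in frames rather than seconds, so holding the interval at a fixed number of seconds means a different frame count for each frame rate. Here is a minimal Python sketch of that conversion. The frame rates are hypothetical examples rather than values taken from Table 4, and the function name is illustrative.

```python
def keyframe_interval_frames(interval_seconds: float, frame_rate: float) -> int:
    """Keyframe interval in frames for a given interval in seconds."""
    return round(interval_seconds * frame_rate)


# Hypothetical frame rates; the interval is held at 3 seconds for all of them.
for fps in (10, 15, 29.97):
    print(f"{fps} fps -> keyframe every {keyframe_interval_frames(3, fps)} frames")
```

At 10 fps, a 3-second interval is 30 frames; at 29.97 fps, it rounds to 90 frames.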

Though Turner didn’t provide stream configurations for this year’s PGA, it did supply last year’s HTTP Live Streaming configurations for an article that I wrote last year, included as Table 5. As you can see, it also changed the keyframe setting for the lowest quality stream from 2 to 3 seconds. (For more on Turner’s streaming of the PGA tournament, see “Streaming Spotlight: Peter Scott.”)

Ozer Adaptive Table 5

I’d be remiss if I didn’t mention another valuable resource for those attempting to stream adaptively, Bouthillier’s “How to Do Dynamic Streaming With Flash Media Server.” In his article, Bouthillier stated:

Keyframe Interval: When the server gets a request to switch to a different stream, it will try to make a smooth switch by lining up keyframes in the two files. If your keyframes are too far apart, the server can force a switch, but it won't necessarily be a graceful one. The best bet is to set a fixed keyframe interval which is the same for all streams to guarantee that there are enough keyframes to facilitate stream switching.

Clearly, for Silverlight and Flash, choose a relatively short keyframe setting such as 2 seconds and apply it consistently across all streams. When working with HTTP Live Streaming, I’d follow the recommendations in the Apple Technical Note in the absence of a very good reason not to.

There are several other points about keyframes. First, you should disable keyframes at scene changes if it’s an option in your encoding tool. You want keyframes at regular intervals and at the same location in all files, and disabling this option helps ensure that this will happen.
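If you can export the keyframe timestamps for each encoded file, checking that they line up is straightforward. The Python sketch below assumes you already have those timestamps as lists of seconds. How you extract them depends on your tools and isn't shown, and the stream names and values are made up for illustration.

```python
from typing import Dict, List


def keyframes_aligned(keyframe_times: Dict[str, List[float]],
                      tolerance: float = 0.001) -> bool:
    """True if every stream has keyframes at the same timestamps.

    keyframe_times maps a stream name to its keyframe timestamps in seconds.
    """
    streams = list(keyframe_times.values())
    if not streams:
        return True
    reference = streams[0]
    for times in streams[1:]:
        if len(times) != len(reference):
            return False
        if any(abs(a - b) > tolerance for a, b in zip(times, reference)):
            return False
    return True


# Hypothetical data: fixed 2-second keyframes in both streams.
regular = {"high": [0.0, 2.0, 4.0, 6.0], "low": [0.0, 2.0, 4.0, 6.0]}
# Hypothetical data: a scene-change keyframe at 3.4 seconds in one stream only.
scene_cut = {"high": [0.0, 2.0, 3.4, 4.0, 6.0], "low": [0.0, 2.0, 4.0, 6.0]}

print(keyframes_aligned(regular))    # True
print(keyframes_aligned(scene_cut))  # False
```

The second case shows how a single extra keyframe inserted at a scene change is enough to break alignment between two files.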

Second, for those encoding tools that allow you to select which keyframes will be IDR keyframes (Telestream Episode comes to mind), make all keyframes IDR keyframes. I don’t have the space (or time) to define an IDR keyframe, so just humor me on this one.

Finally, for technologies that divide files into chunks or segments (such as Apple’s HTTP Live Streaming), the keyframe interval should either equal the segment duration (e.g., 2 seconds for both) or divide evenly into it. For example, in its white paper on encoding for HTTP Live Streaming, Akamai recommends a segment duration of 10 seconds and keyframe interval of 5 seconds. That way, each segment starts on a keyframe.
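The segment rule comes down to a remainder check: the segment duration must be a whole multiple of the keyframe interval. Here is a minimal Python sketch, assuming both values are whole seconds and that keyframes fall at a fixed interval starting from zero. The function name is illustrative only.

```python
def segments_start_on_keyframes(segment_seconds: int, keyframe_seconds: int) -> bool:
    """True if every segment boundary coincides with a keyframe.

    Assumes fixed-interval keyframes starting at time zero.
    """
    return segment_seconds % keyframe_seconds == 0


print(segments_start_on_keyframes(10, 5))  # Akamai's recommendation: True
print(segments_start_on_keyframes(2, 2))   # same interval for both: True
print(segments_start_on_keyframes(10, 3))  # hypothetical mismatch: False
```

With a 10-second segment and a 3-second keyframe interval, the second segment would begin at 10 seconds while the nearest keyframes sit at 9 and 12 seconds.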

Audio Data Rates
The common wisdom regarding audio is best summarized by this quote from Adobe’s Dynamic Streaming article: “Keep the audio bitrates and sampling rates the same to provide a seamless switch between them as the stream plays. Switching between two streams with the same sampling and bitrates will allow seamless transition. Switching between incompatible bitrates or sampling rates is possible, but could result in a slight audible ‘pop’ sound at the time of the transition.”

In the same vein, Apple recommends using a single 40Kbps bitrate in all adaptive encodings, while Expression Encoder 4, Microsoft’s primary Smooth Streaming encoder, limits you to a single audio configuration for all files in the group.

Not surprisingly, most of our respondents—including Microsoft, MTV, and Deutsche Welle—did use consistent audio parameters, though two respondents, Indiana and Harvard, varied the data rate of the stream with no reported deleterious effects. More specifically, Indiana used 128Kbps audio for its two highest quality streams and 96Kbps for the lowest quality stream, while Harvard implemented three levels—32, 64, and 96Kbps—on its five on-demand streams and in the live streams used 32, 64, and 128Kbps (see Table 6).

Ozer Adaptive Table 6
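If you do vary audio across streams, it helps to know which switches cross an audio boundary so you can focus your listening tests there. The Python sketch below lists adjacent stream pairs whose audio bitrates differ, using the Indiana University video and audio rates quoted in this article. It compares bitrates only, since sampling rates aren't given here, and the function name is illustrative.

```python
from typing import List, Tuple


def audio_changing_switches(ladder: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return adjacent video-bitrate pairs whose audio bitrates differ.

    ladder is a list of (video Kbps, audio Kbps) pairs, in any order.
    """
    ordered = sorted(ladder, reverse=True)
    return [(hi[0], lo[0])
            for hi, lo in zip(ordered, ordered[1:])
            if hi[1] != lo[1]]


# Indiana University: 128Kbps audio on the two highest streams, 96Kbps on the lowest.
INDIANA = [(1500, 128), (750, 128), (250, 96)]

print(audio_changing_switches(INDIANA))  # [(750, 250)]
```

For Indiana's configuration, only the switch between the 750Kbps and 250Kbps streams changes the audio bitrate, so that is the transition to listen to most closely.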

As you can see in Table 5, when producing the 2009 PGA Championship, Turner Broadcasting used 16Kbps audio with its 126Kbps video stream and 40Kbps with its two higher quality streams, again with no report of popping noises upon stream switching. Getting back to Table 6, MTV produces audio using two different parameters but doesn’t switch between them adaptively.

Potential popping aside, varying audio quality to match video quality in the adaptive streams provides the best possible viewing experience. Several of our respondents are successfully using this strategy now; if you want to go this route, give it a shot, but be sure to test on a range of relevant viewer platforms.
