Achieving Zero Latency for Video and Audio Is a Zero-Sum Game


“SDVoE end-to-end latency is around 100 microseconds or 0.1 milliseconds,” says Kennington. Noting how SDVoE rivals the speed of HDBaseT while also allowing content to be packetized and delivered as IP across lower-cost Ethernet switches, he adds, “SDVoE is built the way it is because that is what is required to match the video performance of a matrix switch.”

Given the advances in FPGA encoding for H.264 (AVC) and H.265 (HEVC), some in the streaming industry might argue that frame-by-frame (I-frame-only) AVC or HEVC could serve these zero-frame-latency use cases, but professional AV integrators see standard streaming video codecs as falling short of the requirements.

“The SDVoE compression codec, when enabled, adds five lines of latency,” according to Kennington. “At 4K UHD, 60Hz that’s 7.5 microseconds, which blows away even I-frame only AVC/HEVC, etc.”

Kennington is correct in this regard: the MPEG codecs have always been geared toward delivering bandwidth savings across multiple frames, whereas codecs built for zero-frame latency must encode video in well under 16 ms (16,000 microseconds)—less than the time a single 60fps frame is on screen.
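To put that budget in perspective, here is a minimal back-of-the-envelope sketch in Python. The 2,250-total-lines figure is an assumption based on standard CTA-861 4K60 timing, not a number drawn from the SDVoE specification; the point is simply that any codec buffering whole frames starts a full frame period behind, while a line-based codec holds only a handful of lines in flight.

FPS = 60
TOTAL_LINES_4K60 = 2250  # 2,160 active lines plus blanking (assumed CTA-861 4K60 timing)

frame_period_us = 1_000_000 / FPS                      # ~16,667 microseconds per frame
line_period_us = frame_period_us / TOTAL_LINES_4K60    # ~7.4 microseconds per line

print(f"One frame at 60fps lasts about {frame_period_us:,.0f} microseconds")
print(f"One 4K60 line lasts about {line_period_us:.1f} microseconds")
# A codec that buffers whole frames (as GOP-based MPEG codecs do) starts at least
# one frame period behind; a codec that buffers only a few lines stays in the
# tens of microseconds.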

Ryohei Iwasaki, executive director of IDK Corp. (HQ) and CEO of IDK America, a manufacturer of professional AV video gear, further explains why there’s room for more than just the standards-based MPEG codecs in the marketplace: “We are not comparing between SDVoE and H.264/265 since IDK’s thinking is that usage and purpose of those codecs are different.

“We decided to go with a 10Gbps AV solution since [a] 4K signal has 18G[bps],” Iwasaki continues, referring to the fact that an uncompressed 4K60 8-bit video signal is in the 14Gbps range but rises to roughly 18Gbps once you account for the 8b/10b word-bit conversion HDMI requires to transmit a 4K60 signal across an HDMI cable.
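The arithmetic behind those two figures is straightforward. The sketch below assumes standard CTA-861 4K60 8-bit timing (a 594MHz pixel clock feeding three TMDS data channels) and HDMI’s 8b/10b character coding; the constants are my assumptions from the HDMI 2.0 spec, not figures supplied by IDK or the SDVoE Alliance.

PIXEL_CLOCK_HZ = 594_000_000   # assumed CTA-861 4K60 timing: 4400 x 2250 total pixels per frame
TMDS_CHANNELS = 3              # HDMI carries pixel data over three TMDS channels
BITS_PER_CHANNEL = 8           # 8-bit color

data_rate_gbps = PIXEL_CLOCK_HZ * TMDS_CHANNELS * BITS_PER_CHANNEL / 1e9
wire_rate_gbps = data_rate_gbps * 10 / 8    # 8b/10b coding expands every 8 bits to 10 on the wire

print(f"4K60 8-bit TMDS data rate: about {data_rate_gbps:.2f} Gbps")   # ~14.26 Gbps
print(f"Rate after 8b/10b coding:  about {wire_rate_gbps:.2f} Gbps")   # ~17.82 Gbps, i.e. the 18Gbps figure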

“We tested many other codecs’ functionalities and scalabilities for the future,” Iwasaki says, “and IDK thought that SDVoE is the one to adapt for now as it satisfies most of pro AV customers’ requirements.”

Ethernet switch speeds, by contrast, are quoted without any line-coding overhead—a 1Gbps Ethernet port (1000BASE-X, for example) uses 8b/10b coding and actually signals at 1.25Gbaud on the wire, but is marketed as 1Gbps because that is its usable payload rate—meaning the compression needed to squeeze a 4K60 8-bit signal onto a 10Gbps link using the SDVoE approach is fairly light (about 1.4:1).
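Put the two numbers side by side and the 1.4:1 figure falls out. A minimal sketch, assuming the comparison is between the ~14.26Gbps TMDS data rate computed above and the full 10Gbps payload of a 10G Ethernet port:

TMDS_DATA_RATE_GBPS = 14.256     # 4K60 8-bit data rate from the calculation above
ETHERNET_PAYLOAD_GBPS = 10.0     # usable data rate of a 10GbE port

required_ratio = TMDS_DATA_RATE_GBPS / ETHERNET_PAYLOAD_GBPS
print(f"Compression needed to fit 4K60 onto 10GbE: about {required_ratio:.2f}:1")   # ~1.43:1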

Kennington says that the SDVoE Alliance also considered other codecs as it developed the SDVoE FPGA and 10G PHY package. “When the groundwork for what became SDVoE was laid, we did investigate the existing codecs [including the] MPEGs and JPEGs and others. What we found is that they all made too many compromises in the name of bandwidth savings.”

As Kennington explains, “The JPEG-style codecs try to make the same compromise we do: reduce compression efficiency in exchange for better latency and/or image quality. But we find they simply don’t go far enough.”

Kennington then puts a stake through the heart of the JPEG-based codec option for these high-resolution, zero-frame latency use cases by pointing out, “The original DCT [discrete cosine transform]-based JPEG suffers from ringing and block artifacts. And,” he continues, “wavelet-based JPEG 2000 has its own problems, especially with high-res computer graphics and certain color transitions, where luma is relatively constant and chroma is changing.”

These issues with luminance and chrominance handling are inherent to the underlying transform approaches. DCT in particular could kindly be considered long in the tooth, since it dates back almost 30 years to the advent of JPEG still-image compression.

Kennington also notes that, at least from a peak signal-to-noise ratio (PSNR) quality-metric standpoint, the SDVoE solution fares better than JPEG. “Our codec scores for PSNR are often much better than JPEG.” He gives this example: “[O]n the yacht club image we scored a 57dB, compared to 45.5dB for the highest-quality JPEG example shown.”
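For readers outside the compression world, PSNR is computed as 10·log10(peak²/MSE) against a reference image. The sketch below is a generic implementation, not SDVoE’s measurement tool, and the image arrays are hypothetical stand-ins for test material such as the yacht-club image.

import numpy as np

def psnr(reference: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images, i.e. lossless
    return 10.0 * np.log10(peak ** 2 / mse)

# Every ~3dB of PSNR corresponds to roughly a halving of mean squared error, so the
# gap between 57dB and 45.5dB is roughly a 14x difference in error energy.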

For traditional AV integrations that require flexibility in choosing which video source to send to one or more output monitors, the primary cost comes from an expensive matrix switch to manage incoming and outgoing video signals. AV-over-IP solutions such as SDVoE allow multicast transmission from the encoder, replacing the matrix switch with a less-costly 10Gb Ethernet switch. (Image courtesy of SDVoE Alliance.)

While H.264 and H.265 don’t necessarily suffer the same fate as JPEG, they do share similarities that may make them less than ideal for use as high-resolution I-frame codecs for the AV-over-IP integration market.

“Standardized MPEG codecs can be tuned to reduce latency, but that’s coming at the expense of image fidelity, and vice versa,” Kennington says.

Bandwidth Is Cheap

While the concept of using a 10Gbps Ethernet switch to live stream 4K60 8-bit or even 10-bit content might sound like overkill, Kennington explains the reasoning for using the codec triangle. “In pro AV, we simply don’t require the kind of bandwidth savings that interframe compression is optimized for.” He goes on to note that most AV-over-IP solutions run at a full 1Gbps or even 10Gbps, versus the standard 2.5Mbps or 6Mbps of a streaming video delivery from Netflix.

Referring to the “fairly light” compression for 4K60 content (essentially a 1.4:1 compression ratio), Kennington also provides an answer to a question I’d had about video at data rates below 10Gbps: “SDVoE’s codec doesn’t even use compression unless it is required. Since a 1080p60 8-bit stream is only 3 gigabits per second, we transmit that without any loss at native data rate. Same for 4K30 at 6Gbps. We only compress signals above [the] 10Gbps raw data rate, like 4K60. And we only compress by the minimum amount required to fit into the 10G Ethernet pipe.”
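The raw data rates Kennington cites follow directly from resolution, frame rate, and bit depth. Here is a quick sketch of that arithmetic, counting active pixels only at 8 bits per color channel (blanking intervals and transport overhead are ignored here):

def raw_rate_gbps(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> float:
    """Uncompressed video data rate in Gbps, counting active pixels only."""
    return width * height * fps * bits_per_pixel / 1e9

print(f"1080p60 8-bit: about {raw_rate_gbps(1920, 1080, 60):.1f} Gbps")  # ~3.0 Gbps, fits 10GbE uncompressed
print(f"4K30 8-bit:    about {raw_rate_gbps(3840, 2160, 30):.1f} Gbps")  # ~6.0 Gbps, fits 10GbE uncompressed
print(f"4K60 8-bit:    about {raw_rate_gbps(3840, 2160, 60):.1f} Gbps")  # ~11.9 Gbps, needs light compression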

That naturally raises the question of why the pro AV market has settled on a 10G switch for video streaming. After all, a 10G switch is still much more expensive than a 1G switch. Kennington believes it comes down to AV integrators being able to visualize the “trade-off between image quality, latency, and bandwidth.”

The cheapest part of the overall equation for an AV integration, at least one that sits within a single physical location, such as a school or college campus, is the bandwidth. This, Kennington explains, is where AV differs from long-distance streaming: “[I]n pro AV, the latency requirements are basically fixed on a per-use-case basis. Image quality demands are going up—higher resolutions, higher frame rates, higher color bit depths—but bandwidth is unique since bandwidth on an Ethernet switch is cheap and getting cheaper. So use it!”

Kennington agrees that other approaches to moving content across an Ethernet network are valid, adding, “Far be it from me to say Netflix isn’t successful!” But he notes that these approaches “create latency penalties and compromise image quality in ways that the pro AV market cannot easily accept.”

A Middle Ground?

IDK’s Iwasaki notes there is a need for a middle ground between the very high data rate of the SDVoE codec and the typical live-streaming requirement of sending a stream from one city or continent to another: “Some customers need to stream the video longer distance, for example from Japan to the US. In that case, the customer needs to minimize bandwidth using [another] codec like H.264/265. IDK is also preparing a unit which can bridge SDVoE and H.264 for this purpose.”

Iwasaki adds that the bridge unit is still a concept, and that—to avoid concatenation issues and maintain proper color space—the SDVoE video would be decoded back to baseband video and then re-encoded in H.264 for standard streaming delivery.

“At this year’s InfoComm,” Iwasaki says, “we are going to have a prototype concept encoder which can capture [and] stream out image[s] from our receiver unit and [be] control[led] from our management system. These concepts help people who want to integrate a real-time solution and a real outgoing streaming signal together. The only current way to do it requires decoding the signal to baseband once [between encoding in H.264 and SDVoE]. Maybe the SDVoE Alliance will provide direct re-encode capability in the future.”

Iwasaki also points out that recording a presentation in the SDVoE codec is not yet possible, and the SDVoE Alliance’s Kennington confirms that the SDVoE codec is only for use in live-transmission scenarios. That’s where a standards-based codec like H.264 or H.265 would come into play.

“If a customer wants to have recording or network streaming capability for these signals,” Iwasaki posits, “H.264/265 will be used since it can reduce the bandwidth of the signal using high compression.”

Latency won’t be a concern for recorded content, according to Iwasaki, but the loss of video quality from an MPEG-based video codec will still be apparent in high-resolution content.

A New Stool?

Kennington also suggests what might be a new three-legged stool for the streaming industry to begin measuring itself against for proper balance in encoding and delivery: “Latency, price, and power consumption loom large over this discussion of quality and bandwidth.”

Getting to zero-frame latencies requires an extraordinary amount of computational power, and Kennington notes that existing standards-based MPEG codecs have price and power consumption issues beyond just the fundamental quality and latency questions.

“The computational complexity of those algorithms is also much, much higher, which has implications on cost and power consumption,” Kennington says, “especially in a real-time encoder. The only chip I’m aware of for live HEVC encode is from Socionext, costs over $1,000, and consumes over 35 watts, where[as] our partner manufacturers’ endpoints sell for $1,000 to $2,000.” While he doesn’t want to speak to complete details in this forum, he does say, “[W]e’re more than 85% better on price and power than that.”

As we close, here’s a reminder that the AV and streaming industries are on parallel paths. In many ways, the two industries are separated only by a slightly different language and differing approaches around specific live-use cases.

The AV industry has a bit of a flawed understanding of what typical streaming—especially the classic video-on-demand premium asset that’s encoded into thousands of 2–10 second HLS segments—entails in terms of latencies. Walk the show floor at an AV industry event like InfoComm, and you’ll often hear a zero-latency proponent talk about on-demand encoding that requires racks of servers and up to a week of encoding time to “get it right.”

Yet the AV industry fairly questions the efficacy of H.264 and H.265 because both are based on DCT and therefore introduce a number of problems that find the codecs stepping on their own feet when trying to compete in the zero-frame latency dance-off.

Is it time for a new approach to codecs, with a single codec handling zero-frame latency for local delivery as well as scalability for very-low-latency remote delivery? The answer is a definitive, “Yes,” and we, as the streaming industry, would do well to step up our game in driving down latency, price, and power consumption in this new era of IP video delivery.

[This article appears in the Autumn 2019 issue of Streaming Media Europe Magazine as "Zero-Sum Game."]
