CDN77's Juraj Kacaba Talks Low-Latency Streaming and the Edge
Juraj Kacaba, Head of Client Implementations at CDN77, sits down with Tim Siglin, Founding Executive Director, Help Me Stream Research Foundation, and Contributing Editor, Streaming Media, to discuss low-latency streaming and edge delivery in this interview from Streaming Media East 2023.
Siglin begins the discussion by mentioning that a presentation Kacaba gave at Content Delivery Summit caught his attention. In it, Kacaba spoke about ways to lower latency without destabilising the stream. “One of the problems we've had in the industry for a number of years is HTTP delivery is great from one standpoint, but when you do fragmented MP4, you have to wait for...the GOPs and all that kind of thing,” Siglin says. “I wrote early papers around Smooth Streaming, so that's always been a classic problem. And one of the issues that I've talked about years back was that we were painting ourselves into a corner where we had to wait for 30 seconds to do delivery. So you talked about ways to potentially solve that problem.”
“Usually, what we do in these use cases is work with our clients to understand what they need,” Kacaba says. “From the beginning, we've been like, ‘Okay, so we need to stick to HTTP because that's the most commonly used format.’ And when we want to use this solution for things like social media or popular streaming apps, we need a solution that's supported on all of the common iPhones and other devices. That's why we chose HLS or MPEG-DASH. Then we ask: what's the normal way to lower latency? It's to manipulate the size of the chunk itself. But there's only a certain threshold--you can cut it down only so far before it becomes unstable.”
“People have tried to take it below that threshold, and it results in really bad video,” Siglin says.
Kacaba says that CDN77’s approach is, “Let's generate the chunk and look at how you can treat the chunk on the edge. So, more and more, you learn. Let's say you have a chunk of five seconds. Because it's still a chunk, it’s a piece of static content that you deliver on the CDN like any other static content. But what you can do is cut the chunk into even smaller HTTP chunks and push those to the server, even before the client [requests it].”
“So you’re pre-populating,” Siglin says.
“Yes,” Kacaba says. “We already push it to the edge. Every small HTTP chunk also carries information about the next HTTP chunk, so there's a constant HTTP push coming from the origin. We do this to ensure that whenever the player requests the chunk, we already have at least part of it cached and ready to deliver. And of course you need to fit all of this into the four or five seconds, depending on the desired latency.”
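A toy model may help picture the edge-side idea Kacaba describes. The sketch below is not CDN77's implementation. It assumes a finished segment can be cut into fixed-size byte ranges (real low-latency CMAF delivery cuts along CMAF chunk boundaries instead), and the names SubChunk, split_segment and EdgeCache are hypothetical. An in-memory push() call stands in for whatever origin-to-edge transport is actually used. Each piece carries a pointer to the next one, so when a player asks for the segment, the edge can answer with whatever contiguous prefix has already arrived.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubChunk:
    segment: str               # parent segment name, e.g. "seg42.m4s" (hypothetical)
    index: int                 # position of this piece inside the segment
    data: bytes
    next_index: Optional[int]  # points at the next piece; None marks the last one


def split_segment(name: str, payload: bytes, size: int) -> list[SubChunk]:
    """Cut one finished segment into fixed-size pieces, each linked to the next."""
    pieces = [payload[i:i + size] for i in range(0, len(payload), size)]
    return [
        SubChunk(name, i, piece, i + 1 if i + 1 < len(pieces) else None)
        for i, piece in enumerate(pieces)
    ]


@dataclass
class EdgeCache:
    store: dict[str, dict[int, SubChunk]] = field(default_factory=dict)

    def push(self, chunk: SubChunk) -> None:
        # Origin-initiated: the edge stores the piece before any player asks for it.
        self.store.setdefault(chunk.segment, {})[chunk.index] = chunk

    def serve(self, segment: str) -> bytes:
        # Follow the next_index links and return the contiguous prefix cached so far.
        parts = self.store.get(segment, {})
        out = b""
        i: Optional[int] = 0
        while i is not None and i in parts:
            out += parts[i].data
            i = parts[i].next_index
        return out


edge = EdgeCache()
chunks = split_segment("seg42.m4s", b"x" * 10, size=4)  # pieces of 4, 4 and 2 bytes
for c in chunks[:2]:   # the origin has pushed only the first two pieces so far
    edge.push(c)
print(len(edge.serve("seg42.m4s")))  # prints 8
```

In a real server the remaining pieces would be streamed to the player as they arrive; the toy only returns the part already cached, which is enough to show why the player no longer has to wait for the whole five-second segment to reach the edge.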
“A couple of Streaming Media shows back, we had Roger Pantos come and talk about [Apple's] low-latency HLS,” Siglin says. “Is what you're describing similar to that, or can you essentially use standard HLS or standard DASH segmentation?”
Kacaba says, “One of the things we wanted to focus on is using purely what's market standard: standardised HLS or MPEG-DASH, packaged as CMAF. When you want to deploy these kinds of solutions in the market, you need to ensure support on the player side and on the end-user side, but you also need to ensure that the solution is standardised and easy to deploy for your clients, who are the providers of these apps. That is the way for us to go.”
“So one of the issues with low-latency HLS was that you'd essentially have to convert everything over to low-latency HLS,” Siglin says. “What I sort of hear you describing--and correct me if I'm wrong--is that if you had a device that couldn't receive portions of the chunk, you could send it the full chunk, while the devices that are aware of partial chunks have the ability to repack it.”
“Yes, it is like that,” Kacaba says. “And that's also why we like to use the standardised format. There's a really straight pathway. Even when the device is not capable of low-latency streaming, you can simply fall back to the normal, standardised and supported format.”
“Now talk about [request] coalescing a little bit,” Siglin says. “I don't think a lot of people are familiar with that, and that was something you alluded to.”
“So, coalescing is the kind of thing that has been around…for Varnish-based CDNs for quite a while,” Kacaba says. “The main and first issue for us is that we work with NGINX, and request coalescing essentially doesn't exist in the normal NGINX setup. So that was the first big obstacle we needed to get through, and we needed to code it from scratch. We managed to do it, so now we can utilise it on NGINX as well. That was one of the first steps we actually took: asking what's the market standard, what is used, what we can utilise, and how we can tweak it on our side. The reason we try to use all the market standards is so you can have your own encoders and your own stack, and we can just fit our CDN into it and help you lower the latency on the network or on your side. We can also offer you our own built-in encoding and video processing solution. So the aim is really to be as open to implementation as possible. You can implement just the CDN side to lower your latency while still handling the content you already produce on your side. Or, if you are open to it, you can use our own encoders and have more ways to tweak and fine-tune the setup for a specific use case.”
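Request coalescing itself is a general technique: when many viewers ask an edge for the same not-yet-cached segment at the same moment, only one request goes upstream and everyone else waits on its result. The asyncio sketch below illustrates the idea under simplifying assumptions. It is not CDN77's NGINX code, and the Coalescer class and the slow_origin stand-in are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


class Coalescer:
    """Collapse concurrent misses for the same key into one origin fetch (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], Awaitable[bytes]]):
        self._fetch = fetch                        # coroutine function: key -> bytes from origin
        self._inflight: dict[str, asyncio.Task] = {}
        self.origin_calls = 0                      # how many requests actually hit the origin

    async def get(self, key: str) -> bytes:
        task = self._inflight.get(key)
        if task is None:
            # First requester for this key: start the single upstream fetch.
            self.origin_calls += 1
            task = asyncio.create_task(self._fetch(key))
            self._inflight[key] = task
            # Forget the entry once the fetch finishes so a later miss refetches.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # Every requester awaits the same task; shield stops one cancelled
        # viewer from cancelling the shared fetch for everyone else.
        return await asyncio.shield(task)


async def demo() -> tuple[list[bytes], int]:
    async def slow_origin(key: str) -> bytes:
        await asyncio.sleep(0.05)                  # simulated origin round trip
        return f"payload for {key}".encode()

    edge = Coalescer(slow_origin)
    # 100 viewers request the same segment at the same moment.
    results = await asyncio.gather(*(edge.get("seg42.m4s") for _ in range(100)))
    return list(results), edge.origin_calls


results, calls = asyncio.run(demo())
print(calls)         # prints 1
print(len(results))  # prints 100
```

The design choice worth noting is that the in-flight table holds the fetch itself rather than a finished result, so waiters are attached to a request that is still in progress. A production edge would additionally stream bytes to waiters as they arrive instead of handing over the whole object at the end.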
“Are you primarily focused on live delivery?” Siglin asks.
“Well, it's sort of half and half now, because one of our main use cases is social media,” Kacaba says. “And what we see right now on social media is more and more live content, with more social media apps investing in live streaming as a whole.”
“And because you're doing social media, of course, one of the inherent issues there is that you're delivering mostly to mobile devices,” Siglin says. “As opposed to mains-powered devices. One thing that’s always intrigued me is that we've tried to lower latency with both HLS and DASH, and when you're talking about these deltas, you're also delivering on an intermittent network. So what happens if you miss one of the deltas in the middle? How does the system recover? Does it have to wait the full segment length to get the next set of packets?”
“The way we treat it is, when you have request coalescing, of course it is fully redundant. We like to be one step ahead,” Kacaba says. “So we utilise our private backbone to get the chunks where they need to be. And then of course you have multiple layers of cache, so even if a connection at any of these steps goes down and recovers, it's still really close to the next step. In this way you have a more redundant stack.”
“So essentially you've verified that it's made it out to the edges of your path before you then release,” Siglin says.
“Exactly,” Kacaba says. “With HTTP push you can actually verify you have the segments where they need to be before they're served to the customer.”
“And how much extra time does that add?” Siglin asks. “Because obviously one of the things with TCP/IP is that the handshake confirmation takes a little while.”
“When you're not working in the ultra-low-latency, sub-second space, a few milliseconds here and there is exactly the buffer you need for these types of steps,” Kacaba says.
“And you're staying longer than the windowing issue?” Siglin asks.
“Even with WebRTC, you still have this tiny buffer, but you always need to have a buffer,” Kacaba says.
“That's something we've learned over the decades: play it from the buffer,” Siglin says. “Don't try to get it in and go with it right away, especially if they get backwards to each other.”
Kacaba says, “And you mentioned mobile devices, which I think is a really good point. More and more people now consume content on mobile devices and over mobile networks, so jitter is quite an issue. Then you start implementing different solutions in the real world, and at some point you ask yourself, ‘Okay, so how low does latency truly need to be?’ You see how end users actually use the service, and the range of four to six seconds is usually the ideal space for social media. One reason is that the content producer--some type of person on social media--of course wants to talk in the chat, but the platform also needs some buffer to moderate the chat and keep the content safe.”
“It's the classic US broadcast dump button where you make sure something [inappropriate] is not done [live],” Siglin says.
Learn more about low-latency streaming and the edge at Streaming Media Connect 2023.