How DAZN and TAG Approach Failover, Sync, and Stream Resiliency
For high-stakes, large-scale streams, redundancy is critical to smooth, reliable delivery. That means careful monitoring, sound decision-making, seamless network switching (also known as hitless failover), and streams that are precisely synced so the switch is invisible to the end user. In this clip from Streaming Media Connect 2023, Mark de Jong, Chairman, CDN Alliance, discusses the challenges of ensuring stream resiliency and strategies for maintaining it with Bob Hannent, Principal Architect: Technology Operations, DAZN, and Michael Demb, VP, Product Strategy, TAG VS.
De Jong says, “If you look at the workflow up to the CDN in general, how do you really ensure redundancy [and] failover, [ensuring that] you make the right decision to fail over at the right time and also don't switch back and forth, which I have also seen happening on different occasions?” He asks Hannent, “Can you talk a little bit about how you do this on your end?”
“Automatic failover is something that we shy away from,” Hannent says. “It is something that you can have, but as you said, you've got to be really careful about oscillation. You don't want to end up in a flapping situation where you are going backwards and forwards. But if you build resilience into your design – for instance, [if] you have highly resilient connections – you don't get that. When we're talking about flipping sources, you want to avoid that. But up to that point, you've made sure that even if there is a failure, it's resilient in itself. It doesn't have an impact.

“But I come from a traditional broadcasting background. I used to work at the BBC and other places where we could have – even in the analog era – failures that you couldn't even see, less than one frame of glitch, because things were timed and carefully synchronised. We don't really have that much in the cloud world. We do a lot with timing to ensure that it is synchronous throughout, but as we've moved into the cloud, the headend vendors have moved away from supporting synchronous headends in diverse geographic locations. That message is getting through now to some of the vendors – that we would like synchronous headends in the cloud – which then comes back to the other problem of how you avoid coupling, so that a failure in one location doesn't cause a failure in another location. But if you are synchronising, they're inherently coupled. And as you get that high geographic distance, how do you ensure that synchronisation is real? How do you ensure that the definition of time is the same? Make sure that your sources of time are reliable and that you are measuring [them] accurately, because the worst thing you could have is a stream switching between two headends and time jumping backwards and forwards.”
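Hannent's caution about oscillation is commonly addressed with flap damping: only fail over after several consecutive bad health checks, and then hold the new source for a minimum dwell time before switching back. The following is a minimal sketch of that idea; the class, thresholds, and source names are illustrative assumptions, not anything DAZN has described:

```python
import time

class FailoverController:
    """Sketch of damped failover between a primary and backup source.

    Hypothetical illustration of the 'avoid flapping' principle:
    require several consecutive bad health checks before switching,
    then hold the new source for a minimum period before switching
    back. All thresholds are illustrative defaults.
    """

    def __init__(self, fail_threshold=3, hold_down_secs=300, clock=time.monotonic):
        self.fail_threshold = fail_threshold      # consecutive failures required
        self.hold_down_secs = hold_down_secs      # minimum dwell time on a source
        self.clock = clock                        # injectable clock for testing
        self.active = "primary"
        self.consecutive_failures = 0
        # Backdate the last switch so the initial failover is not blocked.
        self.last_switch = clock() - hold_down_secs

    def report_health(self, healthy: bool) -> str:
        """Feed one health-check result for the active source; return the active source."""
        if healthy:
            self.consecutive_failures = 0
            return self.active
        self.consecutive_failures += 1
        in_hold_down = (self.clock() - self.last_switch) < self.hold_down_secs
        if self.consecutive_failures >= self.fail_threshold and not in_hold_down:
            self.active = "backup" if self.active == "primary" else "primary"
            self.consecutive_failures = 0
            self.last_switch = self.clock()
        return self.active
```

Injecting the clock makes the hold-down behaviour easy to simulate; in production the health signal would come from stream monitoring rather than a bare boolean.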
De Jong asks Demb, “What's your take on this from your perspective? And you do this with multiple customers.”
“I agree with Bob's statements,” Demb says. “Automation and switching, especially CDN switching, is something maybe from the not-too-distant future. If AI/ML tools become smarter than humans and they can make the right decisions when it's really needed, then we'll see [more automation]. Redundancy is like a spare tire in a car. You don't have a second car with you – you can't afford to have fully redundant everything – but you have certain spare parts in your car. So if one tire goes flat, you just switch to another tire, because tires are the ones that usually go flat; other parts may break, but not that often. So you need to identify which parts are more prone to fail. So, how do you ensure that everything stays in sync and switches seamlessly?
“Bob mentioned time synchronisation. We're working with a number of partners to help them enable better time synchronisation technologies, especially in the cloud. On-prem, inside the same data center, we have very well-timed synchronisation mechanisms today. PTP [Precision Time Protocol] doesn't work in the cloud today, so we are working on a project with another vendor to enable PTP distribution inside cloud environments. And again, it brings me to the point: it's constant monitoring, knowing that something went wrong. It's looking at all the aspects of quality, starting with the content quality, so you can make a switch between the video sources if something happens. So there's the content quality itself – video, audio, metadata – then the quality of streaming, the quality of delivery of service, whichever we want to call it. And, of course, the quality of experience. You can collect that in real time with different tools and know what exact experience your customers are having. So monitoring tells you something, and it alerts you immediately – it is like a sensor in a car. Everybody hates the yellow engine light, but everybody needs it, because if something fails, you want to know about it. There are several different technologies in place to help with seamless switching, and yes, you need to have all the streams time synchronised and have the right timestamps in the streams.”
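Demb's closing point – that seamless switching depends on synchronised timestamps across streams – can be illustrated with a simple drift check between two redundant sources. The function names, the 90 kHz MPEG-TS timebase, and the 20 ms tolerance below are illustrative assumptions, not TAG's product behaviour:

```python
# Hypothetical sketch: before allowing a seamless switch between two
# redundant headends, compare the presentation timestamps (PTS) of
# their most recent frames and flag drift beyond a tolerance.
# Note: real MPEG-TS PTS values wrap at 2**33 ticks; this sketch
# ignores wraparound for clarity.

MPEG_TS_HZ = 90_000  # MPEG-TS PTS timebase, ticks per second

def pts_drift_ms(pts_a: int, pts_b: int, timebase_hz: int = MPEG_TS_HZ) -> float:
    """Absolute drift between two PTS values, in milliseconds."""
    return abs(pts_a - pts_b) / timebase_hz * 1000.0

def switch_is_seamless(pts_a: int, pts_b: int, tolerance_ms: float = 20.0) -> bool:
    """True if the two sources are close enough in time for an invisible switch.

    20 ms is half a frame at 25 fps -- an illustrative default, not a standard.
    """
    return pts_drift_ms(pts_a, pts_b) <= tolerance_ms
```

In practice such a check would sit inside the monitoring layer Demb describes, gating the switch rather than merely reporting the drift.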
Watch full sessions from Streaming Media Connect November 2023. We'll be back in person for Streaming Media NYC on May 20-22, 2024.