Struggling With Capacity and Congestion in the Digital Video Age
For the past 20 years in the streaming industry, and in particular the content delivery network sub-sector, pundits and vendors have often posited that overcoming internet congestion/capacity is a key gating factor to the success of its services and models.
I produced some research several years ago highlighting that in any year when telecoms’ capacity was increased, the following year (allowing for fiscal drag) often yielded significant merger and acquisition activity in and around the CDN ecosystems. The increased capacity reduced the scarcity of supply, and this, in turn, reduced the price point that could be achieved. Consequently, it has always been in CDNs’ interest to promote this idea that the internet is highly congested—it’s simple supply-and-demand economics. (See Figure 1 for graphic representation.)
Figure 1: Lit fibre tracked against closures and acquisitions in the CDN market. The left vertical axis shows the number of closures and mergers and acquisitions; the right shows the amount of lit fibre in Gbps. (© Dom Robinson - id3as.co.uk - with permission)
As broadband opened up on-demand delivery at high quality to consumers, there was further consolidation as telcos lit up more IP and the CDN-software layer adapted to ensure a high-quality live service could be offered.
Even today, live video delivery is still maturing. While small live audiences can be offered fantastic, reliable quality, as those audiences head toward millions of concurrent viewers, there is no live service delivery platform that can, with certainty, meet the demand and expectations of all those viewers. This is due less to capacity than to the fact that IP networks are by nature “best effort” networks: it is impossible to deterministically offer a real service level agreement by any means other than significant over-provisioning of capacity.
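To put a rough number on what over-provisioning means for a large live audience, the Python sketch below multiplies a concurrent audience by a per-viewer bitrate and adds a headroom factor. The function name, the 5 Mbps bitrate, and the 50% headroom are illustrative assumptions, not figures from any particular platform.

```python
def required_capacity_tbps(concurrent_viewers, bitrate_mbps, headroom=0.0):
    """Estimate aggregate delivery capacity in Tbps.

    concurrent_viewers: number of simultaneous streams (assumed unicast).
    bitrate_mbps: assumed average bitrate per viewer, in megabits per second.
    headroom: fractional over-provisioning, e.g. 0.5 means 50% spare capacity.
    """
    if concurrent_viewers < 0 or bitrate_mbps < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    demand_mbps = concurrent_viewers * bitrate_mbps
    provisioned_mbps = demand_mbps * (1.0 + headroom)
    return provisioned_mbps / 1_000_000  # 1 Tbps = 1,000,000 Mbps


if __name__ == "__main__":
    # Hypothetical event: 1 million viewers at 5 Mbps each.
    print(required_capacity_tbps(1_000_000, 5.0))                # raw demand
    print(required_capacity_tbps(1_000_000, 5.0, headroom=0.5))  # with 50% spare
```

Under those assumptions, a million viewers need about 5 Tbps of delivery capacity before any headroom is added, and 7.5 Tbps with 50% spare, which is why a guarantee for audiences of that size is a cost question as much as a technical one.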
It is worth noting at this point that codec innovation is showing diminishing returns. Yes, incremental improvements do happen, but they are fewer and further between, now taking several years rather than months to significantly improve compression. And with time, the steps are getting smaller.
Distributed infrastructure as a concept has evolved too. Where the CDNs (which were arguably the first large “clouds”) used to market themselves on the high volumes of highly distributed points of presence (PoPs) they offered, now just a few data centres are so incredibly well-connected, they can normally meet any latency and capacity demands placed on them, certainly for the majority of content, and can do so with orders of magnitude fewer servers and PoPs than they used to. Where Akamai always used to market its advantage as the fact that it had thousands of PoPs, today we see a huge amount of content delivered from public cloud hosting centres with a few dozen locations, and with a delivery capability that surely must be “good enough,” since large consumer audiences are adopting such services in herds.
There are, of course, issues when it comes to shipping vast quantities of high-bandwidth video from a remote data centre into the operator network, and so operators have also partnered with CDNs or have rolled their own internal CDNs to ensure that peering and transit/interconnect links do not become saturated and that meeting their own customers’ demand remains under the operator’s own control.
There is still little doubt that no CDN, public cloud, or operator is quite ready for the traditional broadcast networks to be turned off and for all that demand to be routed through their services. Yes, platforms such as YouTube and Netflix deliver vastly more content than any one traditional broadcaster does. However, these platforms distribute that content to hundreds, if not thousands, of operator networks, with those operators usually hosting media/application serving software internally that helps with scaling. And although the subscriber/last-mile network operators typically do have the telecoms capacity to connect their own core to their edge, few are yet in a position to scale the application/media serving to each and every user. That would seem to be a logical place to sell in “edge” media servers, as so many CDNs are doing.
So there are some challenges to go, but these are challenges that are largely cost-/benefit-driven rather than technical.
With microservice architecture, many operators are adopting technical strategies which allow them to flex their infrastructure more effectively than ever before. They are no longer wasteful—few servers are left on if they are not in use—and each server can increasingly be used for any task, so cloud computing’s “spin it up/spin it down” thinking has permeated with great effect to offer much better efficiency than ever.
In the meantime, CDNs are working to position their software products and services as something that can be offered as a managed service, a platform, or a licensed product that the operators can deploy themselves. Despite this, there are still some CDNs and technology players out there who like to market with fear. They want to tell their potential customers either that the internet is so badly congested it is about to fall over, or that there is no way the customer could even think about delivering high-quality content at scale because there just isn’t available bandwidth.
Worse, they like to sell the idea that the internet’s capacity is somehow limited and there is an inflection point coming where there will be too many viewers demanding high-quality content for the internet to handle it all (“and so CDN product X is the only way you can reach those customers ...”).
That’s simply wrong, and those who promote the notion are lying to the market and are going to come undone soon.
In fact, if a CDN is selling that idea to you, then they are probably beginners in the space, since they most likely have little or no idea about telecoms. If you have one of these salesmen approach you, try a few questions about their peering, interconnects, and transit links, and ask them why their capacity is better than that provided by Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
In the last 2 years, interconnect between continents rose from around 200Tbps to around 400Tbps worldwide. So broadly speaking, the internet’s lit capacity roughly doubled in the 2 years from 2016 to 2018 (see Figure 2). Check it out on Telegeography.
Figure 2. According to Telegeography, the interconnect between continents nearly doubled in the last two years, rising from around 200Tbps in 2016 to nearly 400Tbps in 2018.
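As a quick check on what that doubling implies year on year, the sketch below computes the compound annual growth rate from the two endpoints quoted in the caption. The function name is hypothetical, and the calculation assumes smooth compound growth between 2016 and 2018, which the source figures do not confirm.

```python
def annual_growth_rate(start_capacity, end_capacity, years):
    """Compound annual growth rate implied by two capacity readings.

    Assumes growth compounds evenly over the period (a simplification).
    """
    if start_capacity <= 0 or end_capacity <= 0 or years <= 0:
        raise ValueError("capacities and years must be positive")
    return (end_capacity / start_capacity) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Approximate intercontinental interconnect: 200 Tbps (2016), 400 Tbps (2018).
    rate = annual_growth_rate(200.0, 400.0, 2)
    print(f"Implied annual growth: {rate:.1%}")
```

On those round numbers the implied growth is a little over 41% a year, which is hard to square with the idea of a fixed supply of capacity about to run out.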
However, AWS, Azure, and GCP have their own networks and are very well-interconnected with one another at strategic locations based on customer demand and current traffic flows. This has relieved some of the capacity pressures on the existing internet infrastructure, but not truly at the edge or last mile, which is where the real concerns are. The coming of 5G may or may not result in relieving some of these concerns.
Meanwhile, the private interconnects (not measured by Telegeography) between the largest network operators such as Facebook, Apple, Microsoft, Google, Amazon, and Netflix (sometimes called FAMGAN) are now typically larger than the connections between those individual companies and the internet.
I spoke with Microsoft chief technical advisor Dave Crowley at length about this topic, and he mentioned that companies like Microsoft, Google, Amazon, and others directly connect with one another in order to exchange traffic to better manage the customer experience and overall customer satisfaction. This enables better traffic flow control and reduces congestion and costs, while increasing throughput and overall customer engagement and limiting the need to rely on the public internet.
And note that all this is just the “lit” IP capacity, which is usually the extent of understanding that many sales guys in the CDN industry really have. The underlying dark fibre that those links are connected to can potentially carry between 8Tbps and 13Tbps per strand (with technology increasing that capacity frequently) to almost any other point on the planet with similar connectivity. And there are tens of strands in some fibre bundles crossing the oceans.
So you can see that the telecoms’ networks are far from operating IP services at or near capacity from the point of view of the dark fibre that the telcos own.
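The headroom argument can be made concrete with some back-of-envelope arithmetic. The sketch below multiplies the per-strand range quoted above (read here as terabits per second) by a strand count; the function name and the 24-strand bundle are hypothetical, chosen only to stand in for "tens of strands".

```python
def bundle_capacity_range_tbps(strands, per_strand_low_tbps=8.0,
                               per_strand_high_tbps=13.0):
    """Return (low, high) potential capacity in Tbps for one fibre bundle.

    The per-strand defaults are the range quoted in the article;
    the strand count is supplied by the caller and is an assumption.
    """
    if strands < 0:
        raise ValueError("strands must be non-negative")
    return strands * per_strand_low_tbps, strands * per_strand_high_tbps


if __name__ == "__main__":
    # Hypothetical bundle with 24 strands.
    low, high = bundle_capacity_range_tbps(24)
    print(f"Potential capacity: {low:.0f} to {high:.0f} Tbps")
```

With the hypothetical 24 strands, a single bundle could potentially carry 192 to 312 Tbps, the same order of magnitude as the roughly 400 Tbps of lit intercontinental interconnect cited earlier.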
While it is a fact that the telcos’ lit IP capacity may at times be congested, it is also a fact that it is much easier for a telco to light more IP than it is to branch into the ever-evolving world of codecs and streaming standards way up in the application/software layer. Telcos do fibre, and operate in a world of layers 1 to 3 (physical, data link, and network) for a living. These lower layers evolve much more slowly than the application layer (layer 7 in the OSI model), so when they have to make CapEx investments, the familiar layers (1–3) are safe territory for them.
But all that carries Dense Wavelength Division Multiplexing (DWDM) is not gold: what is in practice actually a limiting factor for them is the power supply to the fibre’s DWDM repeaters, to the routers, and, further up the stack, to the servers and the consumer devices creating that demand.