’Round the Horn at NAB 2024: Videon, Telestream, Phenix, Ateme, V-Nova, Twelve Labs, Norsk, and Dolby

Any NAB report is like the story of the blind man and the elephant: what you experience is what you touch, representing a fraction of the whole and perhaps not even a good sample. That being said, here’s what I touched during the show. Many of these experiences are accompanied by video that I shot of the interviews.

Videon LiveEdge Node and Max

One of my first stops at the show was at the Videon booth to see the LiveEdge Node and Max (Figure 1) as demonstrated by chief product officer Lionel Bringuier.

Videon LiveEdge Max

Figure 1. Videon’s LiveEdge Max delivers more than twice the performance of Node, offers a confidence monitor, and, like Node, accepts Docker containers.

Briefly, Node and Max are compact edge live encoders with the specs shown in Table 1. Node is the established product while Max is the new product with more than double the capacity plus an onboard confidence monitor.

Table 1. LiveEdge Node vs. LiveEdge Max

| Feature | LiveEdge Node | LiveEdge Max |
| --- | --- | --- |
| Inputs | 1 x 3G-SDI or HDMI | Single or dual 12G-SDI 4Kp60 inputs (with 16-channel audio) or HDMI |
| Outputs | 4Kp30/1080p60 | Dual 4Kp60 |
| Codecs | H.264/HEVC | H.264/HEVC |
| Resolution | Up to 4Kp30; commonly used for 1080p60 | Up to dual 4Kp60 |
| Power | Power over Ethernet (PoE) | Power over Ethernet (PoE+) |
| Confidence monitor | Not on the device; available in the cloud | Yes, both on the front panel of the device and in the cloud |
| Cloud management | API for device and fleet management via the cloud | API for device and fleet management via the cloud |
| Additional features | Docker container support for third-party applications | Enhanced processing power, Docker container support |

The LiveEdge products include an API for individual device management and a cloud API for overseeing fleets of devices remotely. This dual API system is particularly useful for operations involving multiple devices across various locations, such as stadiums or event venues. Fleet management is facilitated through a cloud platform, which does not process media but offers tools for remote device supervision and control, enhancing efficiency and reducing the need for on-site management.
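To make the fleet-management idea concrete, here's a minimal Python sketch of polling a fleet of encoders over a cloud REST API. The base URL, endpoint path, and field names are hypothetical placeholders, not Videon's actual LiveEdge Cloud API; consult Videon's documentation for the real interface.

```python
# Hypothetical sketch of polling an edge-encoder fleet via a cloud REST API.
# The base URL, endpoint path, and field names are illustrative placeholders,
# not Videon's actual LiveEdge Cloud API.
import requests

API_BASE = "https://cloud.example.com/api/v1"   # placeholder
API_TOKEN = "YOUR_API_TOKEN"                    # placeholder credential

def list_devices_needing_attention():
    """Return the IDs of fleet devices that are not currently streaming."""
    resp = requests.get(
        f"{API_BASE}/devices",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    devices = resp.json()  # assumed to be a list of device records
    return [d["id"] for d in devices if d.get("status") != "streaming"]

if __name__ == "__main__":
    for device_id in list_devices_needing_attention():
        print(f"Device {device_id} needs attention")
```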

There are many live transcoders for event shooters, and most have cloud platforms. What distinguishes LiveEdge devices is their support for Docker containers, which allows them to integrate third-party applications directly into the hardware. Videon has a marketplace for such applications, which includes DRM from EZDRM, watermarking from Synamedia, error correction from Zixi, and LCEVC encoding from V-Nova. This lets users customize device functionality to suit specific needs and streamlines workflows by allowing direct on-device processing.
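Videon deploys marketplace applications through its own cloud tooling, but the underlying model is standard Docker. Purely to illustrate the container model, here's a sketch using the Docker SDK for Python to run a hypothetical third-party agent alongside an encoder; the image name and environment variables are made up.

```python
# Conceptual sketch only: Videon deploys marketplace containers through its own
# cloud tooling, but the underlying model is standard Docker. This shows how a
# generic third-party container might run alongside an encoder; the image name
# and environment variables are made-up placeholders.
import docker

client = docker.from_env()

container = client.containers.run(
    "example/watermarking-agent:latest",  # hypothetical marketplace image
    detach=True,
    restart_policy={"Name": "unless-stopped"},
    environment={"INPUT_URL": "srt://127.0.0.1:9000"},  # placeholder config
)
print(f"Started container {container.short_id}")
```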

Telestream Vantage: AI-Driven Workflow Creation

My next stop was the Telestream booth for a quick demo of the AI-generated workflows in Telestream's Vantage Workflow Designer by John Maniccia, Director of Sales Engineering and Support. As you may know, Vantage is workflow-driven, so users can easily create different workflows with branching to deliver different outcomes based on file characteristics. For example, Vantage can detect whether a file is 1080p or 4K and assign it to a different encoding ladder based upon that determination.  
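Conceptually, that kind of branching is easy to express. Here's a minimal Python sketch, using ffprobe rather than anything Vantage-specific, that routes a source to a different ladder based on its resolution:

```python
# Minimal sketch of resolution-based branching, the kind of decision a Vantage
# workflow makes automatically. Uses ffprobe (from the FFmpeg suite); this is
# not Vantage's implementation, just the underlying idea.
import json
import subprocess

def video_height(path: str) -> int:
    """Return the height of the first video stream, as reported by ffprobe."""
    out = subprocess.check_output([
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=height",
        "-of", "json", path,
    ])
    return json.loads(out)["streams"][0]["height"]

def choose_ladder(path: str) -> str:
    """Route 4K sources to a UHD ladder, everything else to an HD ladder."""
    return "uhd_ladder" if video_height(path) >= 2160 else "hd_ladder"

print(choose_ladder("mezzanine.mov"))
```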

In the past, you built workflows via drag and drop; a completed workflow is shown in Figure 2. What's new is the ability to type the desired result in English and have Vantage build the workflow for you. On the upper right in Figure 2, you can see the text that generated the workflow shown in the main panel.

Telestream Vantage

Figure 2. Vantage will build workflows from plain English commands. (Click the image to see it at full size.)

Given what we've all learned about generative AI over the last 18 months, this is more evolution than revolution, but it takes us one step closer to the day when you won't have to be a compression expert to create transcoding workflows. Good for management, bad for compression experts, but inevitable.

There are still several missing pieces, like how you should configure the ladder for mobile vs. the living room, or how to choose among various codecs and HDR and DRM technologies. Still, that level of automated operation will almost certainly be included in products like Vantage or AWS Elemental MediaConvert within a year or two. Telestream gives us a first glance at what that might look like.

Phenix: Low Latency and Ring Around the Collar

The last time I heard from Phenix Real-Time Solutions was an email pitch to participate in low-latency trials performed while viewing the 2024 Super Bowl. I declined, but when I ran into Phenix COO Kyle Bank at the show, I couldn't resist asking about the results. As shown in Figure 3, the latency figures are shocking: Paramount+ delivered the lowest latency but was still 43 seconds behind real time. The report also found that the drift, or range of lag experienced by viewers, went from a low of 28 seconds to an astonishing high of 134 seconds. To be clear, this means that viewers watching on the same service were as much as 134 seconds apart.

Phenix Super Bowl 2024

Figure 3. Average lag behind real time for streaming services at Super Bowl 2024

Interestingly, Kyle mentioned that the 2024 latency results were actually worse than 2023's, so it doesn't look like the identified services or their customers care much about latency. This led to a discussion of whether low latency is the Ring Around the Collar of the streaming world: a made-up problem designed to sell solutions that none of the major services seem to feel are necessary. That's especially true if you don't have close neighbors who can spoil your experience with a cheer two minutes before the score or interception appears on your smart TV.

Kyle politely explained that while Boomers may watch an event via a single screen, most younger generations watch with an eye on social media. So even if you don't share a wall with a sports fan with a faster service, posts on X can serve as a similar spoiler.

This prompted a conversation about deficits in WebRTC-based services that limit their attractiveness for traditional broadcasts. Kyle shared that Phenix has integrated server-side ad insertion and adaptive bitrate support into its WebRTC-based platform, addressing two of the major shortcomings. He also mentioned that Phenix has served audiences as large as 500,000 viewers and can serve at least 1 million at latencies under 0.5 seconds.

That said, like most low-latency platforms, Phenix primarily serves the sports betting and gaming sectors, webinar platforms, and social media applications that integrate live content and influencers to drive user engagement. Still, it's good to see that Phenix—and presumably similar services—are advancing their low-latency technologies to serve an ever-broader range of viewers.

Ateme MV-HEVC and V-Nova PresenZ

Early in the show, I stopped by the Ateme booth, where I saw a demonstration of MV-HEVC, an extension of HEVC designed for encoding multiview video content like 3D video. Specifically, MV-HEVC allows for efficient coding of multiple simultaneous camera views, using inter-layer prediction to improve compression by exploiting redundancies between the views.

The demo ran on the Apple Vision Pro and was impressive, with excellent video quality. In the headset, the video image seemed to hover a few feet away from me. When I turned to the left or right, the video cut off after about 180 degrees, and it reached an edge when I looked too far up or down, so it wasn't a full 360-degree experience.

I don't recall if the video responded to my movements; for example, if I took two steps into the video, I don't recall if it moved with me or presented a different perspective, like you would see if you walked two steps into a room. I also don't know if the experience was a function of the demo Ateme was showing or a limitation of MV-HEVC or the Apple Vision Pro.

I do know that when I experienced V-Nova's PresenZ technology (Figure 4), which boasts 6 degrees of freedom (6DoF), it was mind-jarringly different. In the robot fight scene that I viewed, I flinched when debris flew towards my head and the combatants tumbled around me. If I took two steps into a room, I could see around a corner and view what had previously been hidden by a wall. I could turn a full 360 degrees and look as high and low as I wanted without reaching the edge of the video, though the quality was a bit soft, like 720p video scaled to 1080p. Noticeable to a compression geek, but not distracting.

V-Nova Presenz

Figure 4. The robot fight scene I experienced with V-Nova’s PresenZ technology.

V-Nova's Tristan Salomé offered a detailed explanation of these technologies. He highlighted that while the Apple Vision Pro creates an impeccable stereoscopic view by tracking the viewer's eye movements, the VR technology I experienced on the device did not support changes in viewer perspective relative to the content—akin to viewing on a standard 3D TV. In contrast, PresenZ reacts when a viewer moves their head in any direction (up, down, forward, backward, or side to side), enriching the sense of immersion and presence in the virtual environment by mimicking real-life interactions more closely.

Producing films for PresenZ involves using computer-generated imagery (CGI) or capturing scenes with multiple cameras positioned around the subject. These methods create a volumetric, or 3D, portrayal of the scene that users can interact with in a VR setting. Tristan pointed out the significant computational demands and sophisticated encoding required to manage the extensive data involved in creating these immersive experiences, which is why V-Nova acquired the PresenZ technology: to marry it with its LCEVC codec.

It's hard to see how a technology like PresenZ scales, though that's an issue with all AR/VR. It's also uncertain if most viewers, who have long enjoyed movies from a static seat or recliner, will find a more immersive experience appealing. Still, of everything I saw at NAB, PresenZ was the most striking.

Twelve Labs: Automated Deep Metadata Extraction

For many publishers, metadata is the key to unlocking the value of archived content, but manually created metadata is expensive, time-consuming to produce, and ultimately incomplete. What if there were a way to automatically generate extensive metadata that would let you find and retrieve footage using a wide array of prompts?

That's what Twelve Labs has done. I spoke with Anthony Giuliani, the company's Head of Operations, who explained that Twelve Labs' technology uses advanced multimodal video-understanding models to achieve a deep understanding of video content, similar to human cognition, without relying on traditional metadata (Figure 5).

Twelve Labs

Figure 5. Twelve Labs’s AI understands videos like humans.

Instead, the system creates video embeddings, akin to text embeddings in large language models, which facilitate dynamic interaction with video content. This technology allows users to search, classify, and perform other tasks with video data efficiently, complementing any existing metadata. Unlike text-based metadata, the technology harnesses various modalities within a video, including sound, speech, OCR, and visual elements, to enrich the video understanding process.
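To make the embedding idea concrete: like text embeddings, video embeddings are vectors that can be compared by similarity. The sketch below is a generic illustration of embedding search using random vectors, not Twelve Labs' internals.

```python
# Generic illustration of embedding search, not Twelve Labs' internals: clips
# and the query are represented as vectors, and search is just ranking clips
# by cosine similarity to the query.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings for three indexed clips and one query (random here).
rng = np.random.default_rng(0)
clip_embeddings = {f"clip_{i}": rng.normal(size=512) for i in range(3)}
query_embedding = rng.normal(size=512)

ranked = sorted(
    clip_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
for clip_id, emb in ranked:
    print(clip_id, round(cosine_similarity(query_embedding, emb), 3))
```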

As an example, Giuliani asked me to think of a scene where the protagonist had to choose between a red pill and a blue pill. If you've seen The Matrix, you'll instantly flash to the scene where Keanu Reeves has to make that choice. Giuliani explained that this demonstrates how the human mind can instantly recall specific cinematic moments without needing to sift through every watched movie or rely on tagged metadata.

Twelve Labs' technology mimics this human-like recall by creating video embeddings, allowing dynamic interaction with video content. This enables users to quickly and efficiently pull up specific scenes from vast video databases, akin to how one might instantly remember and visualize the iconic Matrix scene.

Twelve Labs offers this technology primarily through an API, making it accessible to developers and enterprises looking to integrate advanced video understanding into their applications. The pricing model is consumption-based, charging per minute of video indexed, with options for indexing on private or public clouds or on-premises. This flexible and scalable approach allows a wide range of users, from individual developers in the playground environment with up to ten free hours to large enterprises, which may require extensive, customized usage.
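As a rough sketch of what an API-driven search might look like, here's a Python example posting a natural-language query. The URL, header, and field names are approximations for illustration only; check Twelve Labs' documentation for the actual interface.

```python
# Hedged sketch of a search call against a video-understanding API of this
# kind. The URL, header, and field names are approximations for illustration,
# not a verified copy of Twelve Labs' interface.
import requests

API_URL = "https://api.twelvelabs.io/v1.2/search"   # approximate
API_KEY = "YOUR_API_KEY"
INDEX_ID = "YOUR_INDEX_ID"

resp = requests.post(
    API_URL,
    headers={"x-api-key": API_KEY},
    json={
        "index_id": INDEX_ID,
        "query_text": "character choosing between a red pill and a blue pill",
        "search_options": ["visual", "conversation"],
    },
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("data", []):
    print(hit)
```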

Currently, the platform serves diverse clients, including major names like the NFL, who utilize the technology to enhance their video content management and retrieval, particularly for managing large archives and post-production processes. The potential applications of this technology are vast, ranging from media and entertainment to security and beyond, indicating a significant advancement in how we can interact with and understand video content at a granular level.

Norsk: No Code/Low Code Media Workflows

I next chatted with Adrian Roe from id3as/Norsk, who introduced their new product, Norsk Studio, at NAB. Norsk Studio builds upon the Norsk SDK launched at Streaming Media East in May 2023, providing a graphical interface that enables users to drag, drop and connect pre-built components into a publishing workflow with no coding required.

Studio comes with multiple pre-built inputs, processes, and outputs, ranging from simple ten-line scripts to more complex modules, facilitating customized media workflows that can adapt to the specific needs of any project. Customers can build new reusable components using the Norsk SDK, which supports various programming languages; Adrian explained that most customers prefer TypeScript due to its expressiveness and the availability of skilled developers. Adrian also discussed Norsk's deployment options, noting that both SDK- and Studio-created programs can run on-premises or in the cloud.

Finally, Adrian shared that Norsk had won the IABM BaM award in the Produce category (Figure 6), which “celebrates outstanding technological innovations that deliver real business and creative benefits.”

Dolby Professional: Hybrik Cloud Media Processing

Dolby Hybrik is a cloud media processing service that has long prioritized the ability to build QC into encoding workflows. At NAB, I spoke with David Trescot, Hybrik co-founder, who showed me multiple QC-related innovations, several of which were enabled by AI.

Some of the most useful additions relate to captions, a staple of most premium content. For example, Dolby added a dialogue enhancement capability that separates dialogue from background music. The dialogue can then be transcribed, and if the video doesn't have captions, Hybrik can create them. Hybrik can also compare the transcribed captions to the actual captions in the package to verify that they belong to that video and are in the correct language, and it can verify all language tracks in the master. From a pure audio-mixing perspective, once the dialogue and background are separated, you can remix them to make the dialogue more distinct.
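To illustrate the caption-verification idea in the simplest possible terms (this is not Hybrik's implementation), you can compare the transcribed dialogue against the packaged caption text and flag the track when the similarity is suspiciously low:

```python
# Generic illustration of the caption-verification idea (not Hybrik's
# implementation): compare transcribed dialogue against the captions shipped
# in the package and flag the track if the similarity is suspiciously low.
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def captions_match(transcript: str, caption_text: str, threshold: float = 0.6) -> bool:
    """Return True if the packaged captions plausibly belong to this video."""
    ratio = SequenceMatcher(None, normalize(transcript), normalize(caption_text)).ratio()
    return ratio >= threshold

transcript = "Welcome back to the show. Tonight we look at edge encoding."
captions = "welcome back to the show tonight we look at edge encoding"
print(captions_match(transcript, captions))  # True
```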

Hybrik also added a useful GUI to the QA function so you can visually examine the video and listen to the audio at the locations of reported problems (Figure 7). For example, on the upper left of the timeline you see a spike in the Blockiness measure that warrants a look, as well as black detection on the upper right. For audio, you see an emergency alert signal on the bottom middle and silence detection on the far right. Absent the GUI, you'd have to download and play the content in your player of choice, which is cumbersome. Now, you can drag the playhead directly to the problem and assess it.

Dolby Hybrik

Figure 7. Hybrik's new GUI for QA. Click the image to see it at full resolution.

The technology behind the player, called Resource-Agnostic Swarm Processing (RASP), is as interesting as the player itself. Here's why: most cloud infrastructures can't play media files, particularly files stored in high-quality mezzanine formats like ProRes.

So, imagine you have your masters in the cloud in ProRes or a similar format and must perform some QC function or visual inspection. Your only options would be to download the file or to transcode it to a friendlier format and inspect that, though you'd still need a frame-accurate player. Transcoding may mean transcoding the entire file, which is expensive, and then you must either store the transcoded file, which adds to your monthly cost, or delete it and risk having to create it again for a later task.

RASP is a cloud media operating system that streamlines these operations by transcoding assets in small chunks only when necessary for the specific operation. In Figure 7, to sample the blocky region at the clip's start, the operator would drag the playhead over and click Play, and RASP would transcode the required video on the fly. These operations are transparent to the user, whose experience is similar to working with files stored locally. RASP is a natural fit for any application involving media stored in the cloud and will be available on a cost-per-minute basis from Dolby.
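The chunk-on-demand idea is easy to illustrate with FFmpeg. The sketch below is conceptual only, not Dolby's RASP implementation: when the operator scrubs to a reported problem, transcode just a few seconds around that point to a playable proxy instead of the whole master.

```python
# Conceptual sketch of chunk-on-demand transcoding, not Dolby's RASP
# implementation: when the operator scrubs to a problem area, transcode only a
# few seconds around that point to a playable proxy instead of the whole file.
import subprocess

def make_preview_chunk(src: str, start_sec: float, duration_sec: float, dst: str) -> None:
    """Transcode a short window of a mezzanine file (e.g., ProRes) to H.264."""
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", str(start_sec),        # seek to the reported problem
        "-t", str(duration_sec),      # only transcode this window
        "-i", src,
        "-c:v", "libx264", "-preset", "veryfast",
        "-c:a", "aac",
        dst,
    ], check=True)

make_preview_chunk("master_prores.mov", start_sec=12.0, duration_sec=6.0,
                   dst="preview_chunk.mp4")
```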
