February 15, 2018
By Ankur Patel Director, Live Video Systems and Technology, Paramount+

AI and Machine Learning Push Video Quality to New Heights

[Interested in how AI and machine learning are revolutionizing video? Join us at the Streaming Forum in London on 27 February, where we'll feature presentations from Amazon, IBM, and more that focus on AI and machine learning.]

Since the first television broadcast in 1928, video technology has moved us forward many light years, from analog standard-definition (SD) black-and-white television to over-the-top (OTT) digital high-definition (HD) streaming to hundreds of connected devices. According to Cisco’s latest Visual Networking Index, video traffic will represent 82% of all internet traffic by 2021, up from 73% in 2016. What’s more, Cisco CEO Chuck Robbins predicts that a million devices will be added to the network per hour in 2020. The foremost challenge for OTT video streaming is to provide the highest possible quality of experience (QoE) and quality of service (QoS).

According to a paper published by Professor Ramesh K. Sitaraman of the University of Massachusetts, Amherst, viewers begin to abandon a video after a 2-second delay, with 6% disappearing per second thereafter. Buffering and pixelation can create a negative user experience and cause revenue loss from digital advertisements. Adaptive bitrate (ABR) streaming has been adopted to minimize buffering by switching the bitrate as bandwidth fluctuations warrant. ABR solves part of the OTT streaming challenge, but it can’t completely eliminate rebuffering and pixelation on mobile handheld devices, given how quickly a mobile user’s location and connectivity change. Additional remedies are required to make rebuffering a thing of the past. Trick-play operations such as fast-forward and rewind add further complexity and often cause playback to freeze, again creating a negative user experience.
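To make the ABR idea concrete, here is a minimal sketch of how a player might choose the next segment's bitrate. It is not any particular player's algorithm: the function name `pick_bitrate`, the 0.8 safety factor, and the 5-second low-buffer threshold are all illustrative assumptions. It combines a rate-based rule (stay under measured throughput) with a simple buffer-based guard (drop to the lowest rung when the buffer is nearly empty).

```python
from typing import Sequence


def pick_bitrate(ladder_kbps: Sequence[int],
                 throughput_kbps: float,
                 buffer_s: float,
                 safety: float = 0.8,        # assumed headroom factor
                 low_buffer_s: float = 5.0   # assumed panic threshold
                 ) -> int:
    """Pick the bitrate (kbps) for the next segment.

    Hypothetical hybrid rule: if the playback buffer is nearly empty,
    fall back to the lowest rung; otherwise take the highest rung that
    fits within a safety margin of the measured throughput.
    """
    rungs = sorted(ladder_kbps)
    if buffer_s < low_buffer_s:
        return rungs[0]
    budget = throughput_kbps * safety
    fitting = [r for r in rungs if r <= budget]
    return fitting[-1] if fitting else rungs[0]


ladder = [400, 1200, 2500, 5000]
print(pick_bitrate(ladder, throughput_kbps=4000, buffer_s=20))  # healthy buffer
print(pick_bitrate(ladder, throughput_kbps=4000, buffer_s=2))   # buffer almost empty
```

A fixed rule like this reacts only after throughput or buffer levels have already changed, which is one reason it struggles with fast-changing mobile connectivity.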

The answers to these challenges lie in the newer concepts of artificial intelligence (AI) and machine learning. MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed Pensieve, a neural-network-based AI system that uses machine learning to choose among existing algorithms, such as rate-based and buffer-based algorithms, depending on network conditions. Pensieve predicts connectivity issues ahead of time and proactively adjusts the streaming resolution to build up enough playback buffer for a buffer-free user experience. In practice, this approach won’t eliminate buffering entirely, but it will help reduce it and bring us one step closer to buffer-free video streaming. Field experiments using Pensieve resulted in up to 30% less rebuffering and increased key QoE metrics by up to 25%. However, there will always be room for further improvement as more comprehensive data becomes available to train the network.
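A learning-based system needs a numeric QoE target to optimize. The sketch below shows one form commonly used in ABR research: reward delivered bitrate, penalize stall time, and penalize bitrate switches. The function name `session_qoe` and the penalty weights are illustrative assumptions, not the values used in the MIT experiments.

```python
from typing import Sequence


def session_qoe(bitrates_mbps: Sequence[float],
                rebuffer_s: Sequence[float],
                rebuffer_penalty: float = 4.0,  # assumed weight per stalled second
                smooth_penalty: float = 1.0     # assumed weight per Mbps of switching
                ) -> float:
    """Score a playback session from per-segment bitrates and stall times.

    Higher is better: total bitrate, minus weighted stall time,
    minus weighted magnitude of bitrate changes between segments.
    """
    quality = sum(bitrates_mbps)
    stalls = rebuffer_penalty * sum(rebuffer_s)
    switches = smooth_penalty * sum(
        abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:])
    )
    return quality - stalls - switches


# Same bitrates, but the second session stalls for half a second.
smooth = session_qoe([1.0, 2.0, 2.0], [0.0, 0.0, 0.0])
stalled = session_qoe([1.0, 2.0, 2.0], [0.0, 0.5, 0.0])
print(smooth, stalled)
```

With a score like this, a trained policy can be compared against fixed rate-based or buffer-based rules on the same network traces.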

Video streaming can also benefit from advances in machine learning technology. YouTube and Netflix employ machine learning to optimize encoding parameters dynamically. This not only increases user QoE and QoS but also reduces the number of bits required for the same quality. Encoding optimization using machine learning can also cut costs by using less bandwidth, and it reduces the cost of the engineering resources previously devoted to manual optimization. In the case of YouTube, neural networks (NN) are used to dynamically predict the video encode quantization levels (QL) that will produce the target bitrate, achieving the performance of dual-pass encoding in a single pass. As a result, this also reduces overall video latency and encoding costs.
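To illustrate what predicting a quantization level for a target bitrate means, here is a deliberately simple stand-in for the learned predictor. It is not YouTube's method. It assumes the rough rule of thumb for H.264-style codecs that bitrate halves for every increase of 6 in the quantization parameter (QP), and it assumes some model has already estimated the bitrate the clip would produce at a reference QP. The function name `qp_for_target` and its defaults are hypothetical.

```python
import math


def qp_for_target(bitrate_at_ref_kbps: float,
                  target_kbps: float,
                  qp_ref: int = 26,   # assumed reference QP
                  qp_min: int = 0,
                  qp_max: int = 51) -> int:
    """Estimate the QP expected to hit target_kbps in one pass.

    Assumption: bitrate ~ halves per +6 QP, so
    qp = qp_ref + 6 * log2(bitrate_at_ref / target).
    The result is clamped to the valid QP range.
    """
    qp = qp_ref + 6.0 * math.log2(bitrate_at_ref_kbps / target_kbps)
    return max(qp_min, min(qp_max, round(qp)))


# A clip estimated at 4000 kbps for QP 26, with a 2000 kbps target:
print(qp_for_target(4000, 2000))
```

A neural network would replace both the fixed halving rule and the reference estimate with predictions learned from content features, which is what lets a single pass approach two-pass accuracy.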

The availability of connected devices, from handheld mobile phones to large-screen TVs, has created myriad challenges, since different screen sizes can make a huge difference in perceptual video quality. Static encoding models are not cost-effective, because they do not take screen size and scene complexity into account. Machine learning algorithms can be used to achieve “content-aware” encoding based on the perceptual quality of the video: They can decide the encoding parameters based on the screen size and the perceptual quality targeted for that specific screen size. For example, achieving the same perceptual quality on two different screen sizes can require far fewer bits for one screen than for the other. Machine learning can help us do this on the fly, reducing bandwidth consumption and saving costs.
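The selection step of content-aware encoding can be sketched as follows, assuming a model has already predicted a perceptual quality score for each candidate bitrate on each screen class. The function name `cheapest_rung` and all scores below are made-up illustrative values, not measurements.

```python
from typing import Mapping


def cheapest_rung(quality_by_bitrate: Mapping[int, float],
                  target_quality: float) -> int:
    """Return the lowest bitrate (kbps) whose predicted quality meets the target.

    If no rung reaches the target, fall back to the highest bitrate.
    """
    meeting = [b for b, q in quality_by_bitrate.items() if q >= target_quality]
    return min(meeting) if meeting else max(quality_by_bitrate)


# Hypothetical predicted quality scores (0-100) for the same title.
phone = {800: 82.0, 1500: 90.0, 3000: 95.0}
tv = {800: 60.0, 1500: 75.0, 3000: 88.0, 6000: 94.0}

print(cheapest_rung(phone, target_quality=85.0))  # small screen
print(cheapest_rung(tv, target_quality=85.0))     # large screen
```

In this toy data the phone reaches the target at a lower bitrate than the TV, which is the bandwidth saving the paragraph above describes.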

AI and machine learning can provide effective solutions to the longstanding challenges of dynamically detecting lip sync and closed caption (CC) text synchronization issues, which otherwise require active eyeballs to detect or the use of invasive methods such as inserting a watermark or fingerprint in the baseband video (SDI) and audio. In experiments performed by the University of Oxford’s Department of Computer Science, an artificial intelligence system called LipNet identified words with 93.4% accuracy, compared to just 52.3% for human professionals. A similar test performed by the Google DeepMind project revealed that the AI easily outperformed professional lip-readers who attempted to decipher 200 random clips from the data set. The AI successfully deciphered 46.8% of all words, versus 12.4% for a professional lip-reader. Products are emerging in the market that use AI and machine learning to detect lip sync and CC text synchronization issues. One such product is LipSync from MulticoreWare Inc., which uses AI and deep learning to track the movement of the lips to measure video-audio synchronization.
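Once a model has turned lip movement into a per-frame signal, measuring sync comes down to aligning two time series. The sketch below is a simplified alignment step, not the deep learning used by the products and research above. It assumes a per-frame mouth-openness value (for example, from a face-tracking model) and a per-frame audio energy value are already available; the names `best_lag` and `offset_ms` are hypothetical.

```python
from typing import Sequence


def best_lag(mouth: Sequence[float], audio: Sequence[float], max_lag: int) -> int:
    """Return the frame lag at which audio best matches mouth movement.

    Uses a plain (unnormalized) cross-correlation over lags in
    [-max_lag, max_lag]. A positive result means audio trails video.
    """
    def score(lag: int) -> float:
        return sum(
            mouth[i] * audio[i + lag]
            for i in range(len(mouth))
            if 0 <= i + lag < len(audio)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def offset_ms(lag_frames: int, fps: float) -> float:
    """Convert a lag in frames to milliseconds."""
    return lag_frames * 1000.0 / fps


# Toy signals: audio is the mouth signal delayed by two frames.
mouth = [0, 0, 1, 0, 0, 1, 1, 0, 0, 0]
audio = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]
lag = best_lag(mouth, audio, max_lag=5)
print(lag, offset_ms(lag, fps=25.0))
```

Real signals are noisy, so a production detector would normalize the correlation and average over many windows before flagging a sync error.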

As we step into the world of artificial intelligence, new concepts and theories are emerging to optimize content generation, preparation, delivery, security, and presentation. For example, the implementation of deep neural networks has had a huge positive impact on YouTube’s video recommendation system. Even more promising is the next generation of highly intuitive networks based on artificial intelligence and machine learning, which will have a huge positive impact on OTT video streaming, shaping its adoption and growth and enhancing content security.
