InterDigital: Building the Mirror World
The company that developed many of the core technologies underpinning the revolution in mobile connectivity is turning its attention to spatial computing.
InterDigital—one of the world's largest pure research, innovation, and licensing companies—has identified a significant opportunity in developing the standards that will guide future interactions between the physical world and digital augmentation.
"We are creating the mirror world," announced InterDigital CTO Henry Tirri. "The future impact of information technology depends on building platforms which merge physical objects and digital bits. The elementary technologies for this are augmented reality, artificial intelligence, and visual communication."
One example is avatars, which the company believes will form the basis of online communication within a few years.
"We are creating visual bodies for the AI code. It is like The Matrix," said Tirri. "We are developing technology to compress the neural network and we will push this to MPEG standards."
InterDigital has been a pioneer in wireless for four decades, with 9,800 patents and more than 30,000 contributions to key global standards, including 2G, 3G, 4G, and IEEE 802-related products and networks, as well as 3GPP's 5G efforts.
"We are an R&D company that monetises our innovation by licensing," Tirri explained to Streaming Media on a visit to the company's lab in Rennes, France. "There have been few if any industries which have been driven by unified standards like wireless. It was a natural expansion for us to look at video, AI, and computer vision and fundamentally the huge growth area where the physical and digital are linked."
The Delaware-headquartered company acquired many of these specialties in its $15.8 million purchase last year of Technicolor's Research and Innovation unit.
The deal, which followed the $150 million swoop for Technicolor's patent business in 2018, is part of a strategic reorientation toward visual computing.
Another reason for its focus on video is the convergence of mobile networks with immersive media.
"There is no longer a frontier between video and wireless," observed Patrick Van de Wille, Chief Communications Officer. "The strain on wireless networks is primarily about video being 80% of the traffic. The 3GPP and the DVB are talking a lot about this convergence."
Video was a logical industry for InterDigital to move into, he explained. "It is dominated by certain standards, it has an appetite for deep long-term research, and it starts off being extremely expensive and complicated."
InterDigital employs 350 engineers in R&D centres in Philadelphia, New York, London, Rennes, and Montreal, plus a dedicated AI team in Palo Alto. It has a 300-strong team of non-engineers, including 40 patent experts, to make its business model watertight.
The company does not make products, which it says makes it non-competitive with almost any other company, research body, or potential partner. This also considerably lengthens its horizons. It invests in research for IP that may only become commercialised as part of a standard in a decade's time. For example, its LTE patents date back to 2001, with research beginning two years prior. It demoed its first 5G solution in 2012.
In Rennes, Streaming Media was invited to view a variety of recent activity built on the Technicolor R&I assets.
This includes work on compressing video point clouds, in particular the capture and coding of metadata alongside the HEVC encoding of volumetric video. The work feeds into the point cloud compression (PCC) standard being developed within the MPEG-I working group.
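The core idea behind video-based point cloud coding is to project 3D points onto 2D images (a depth map plus a colour map) that a standard video codec such as HEVC can then compress. The sketch below illustrates only that projection step, under our own simplified assumptions; the actual MPEG standard additionally uses patch segmentation, occupancy maps, and side metadata, none of which are shown here.

```python
import numpy as np

# Illustrative sketch (not InterDigital's implementation): orthographic
# projection of a coloured point cloud along the +z axis onto a depth
# map and a colour map, each of which could be fed to a video encoder.

def project_to_maps(points, colors, grid=64):
    """Project (x, y, z) points onto a grid x grid depth/colour image."""
    depth = np.full((grid, grid), np.inf)            # nearest-z per pixel
    color = np.zeros((grid, grid, 3), dtype=np.uint8)
    for (x, y, z), c in zip(points, colors):
        u, v = int(x), int(y)
        if 0 <= u < grid and 0 <= v < grid and z < depth[v, u]:
            depth[v, u] = z                          # keep nearest point
            color[v, u] = c
    return depth, color

# Three sample points; two occlude each other at the same pixel.
pts = np.array([[10, 20, 5.0], [10, 20, 3.0], [40, 8, 7.0]])
cols = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8)
depth, color = project_to_maps(pts, cols)
```

Decoding reverses the mapping: each occupied pixel's coordinates and depth value reconstruct a 3D point, with the colour map supplying its attribute.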
It showed volumetric content displayed on VR headsets and on pseudo-holographic displays like those from Looking Glass. Video was also shown on flat 2D panels with a three-dimensional effect that shifts in sync with the viewer's position. The multiple views are synthesised on a PC and linked to the viewer's position, captured with an IR camera.
It's still far from real time. One second of HD footage (30 frames) from a 16-camera rig takes up about 1GB of data, and processing runs about 100 times slower than real time.
InterDigital uses a 16-camera HD rig to collect images for volumetric video.
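As a back-of-envelope check on the figures above, the raw data rate of such a rig can be estimated as follows. The frame size, frame rate, and pixel format here are our assumptions; the article does not state the capture format.

```python
# Rough data rate for a 16-camera HD volumetric capture rig.
# Assumptions (ours, not InterDigital's): 1920x1080 frames, 30 fps,
# 8-bit RGB, no compression.

CAMERAS = 16
FPS = 30
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3  # 8-bit R, G, B

raw_bytes_per_second = CAMERAS * FPS * WIDTH * HEIGHT * BYTES_PER_PIXEL
print(f"{raw_bytes_per_second / 1e9:.2f} GB per second of capture")

# Roughly 3 GB/s raw under these assumptions, so the ~1 GB/s the
# article quotes would imply some subsampling or light compression
# at the capture stage.
```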
InterDigital has built another lightfield rig using sixteen 4K cameras to test content. Work is ongoing to interpolate views between the lenses, with a possible demonstration at IBC2020.
The company is exploring means of automating the creation of digital humans. A person's face and upper body can be volumetrically captured and rendered as an animated character within 30 minutes. Performance capture data from actors could be added to finesse characters for film and TV entertainment, but away from postproduction, InterDigital envisions a future in which we will all communicate with each other online via CG avatars.
"We will have a digital double of ourselves interacting in society, but there is no standard and there needs to be," said Gaël Seydoux, Director, Immersive Lab, InterDigital R&I. "We are preparing the ground work."
Tirri likened the emergence of the digital human to the augmentation of our senses. "We already have audio interaction with voice assistants and AR for a visual supersense. The least developed sense is haptics, but the goal, over a span of ten years or more, is to basically interact with objects that don't exist."
The algorithms that will animate the avatars are being trained on large data sets of human facial expressions.
Data privacy is a concern. "Blockchain could be embedded in the process to ensure data authentication and security," said Seydoux.
While Microsoft HoloLens and Magic Leap have a strong lead, InterDigital is exploring ways in which AR content could be embedded in live or recorded video streams for advertising or narrative applications. The demo showed a tablet (smartglasses would work, too) playing the same video as a main screen TV, but with the room view on the tablet augmented with CG objects—a training shoe, an astronaut—giving a mixed reality dimension to the streamed content.
InterDigital's mixed reality demo, showing a tablet playing the same image as on a TV screen, but augmented with computer-generated objects.
Another demo showed the real-time removal of objects or persons from a real-world environment. So-called diminished reality captures the real-world environment first, then replaces objects with substitutes that are colour corrected and lit to fit the scene. Applications include postproduction pre-visualisation, interior design, and protecting privacy during telepresence.
It has partnered with gaming company Blacknut to develop a cloud gaming application that enables player interaction without any dedicated hardware device (unlike Google Stadia). This showcases a proposed AI hub that will sit in the smart home as the core for additional applications in e-health (such as detecting when a person falls and requires medical attention) or surveillance (detecting unwanted persons in a room).
"A main area of focus is home connectivity, supporting user mobility at high throughput and low latency with possible QoS constraints across a large number of devices," explained Laurent Depersin, Director, Home Lab, InterDigital R&I. "Key technologies include local processing capabilities (CPU, video processing, model inference and training) and storage, and innovative AI solutions to steer traffic toward the best connectivity in a multi-radio environment."
[Photos by Adrian Pennington]