Generative AI and the Future of Media Tech
It would be hard to miss all of the ongoing conversations concerning generative AI and how it will impact media technologies. So far, the conventional wisdom is that it promises to enable novices to complete repetitive tasks more easily and efficiently, to create better monetization opportunities, to foster new tool development, and—for better or worse—to drive disruption in many different ways.
The intelligence that generative AI models like ChatGPT have been trained on is vast, and, therefore, instead of counting on what a single human can produce, it can provide as many permutations as a machine drawing on the entire span of its learning can deliver. Ask the right questions, and you will be rewarded—maybe.
Incorporating generative AI into our workflows has the potential to impact almost everything in media technologies, and I’ll examine several of the possibilities in this article. Starting off with low-hanging fruit, I’ll cover production, then move on to monetization, search, the creation of new tools, and more. Many of the experts I spoke with expect the immediate by-products of integrating generative AI into our work to be an increase in productivity and a decrease in time spent on rote tasks. They also see the potential for some very creative problem-solving, while at the same time acknowledging a number of ethical questions that I won’t attempt to fully explore in this piece.
“In media entertainment, I tend to look at things through three lenses,” says Anil Jain, global managing director of strategic consumer industries at Google Cloud. “The first is improving content creation, production, and management. The second is enhancing and personalizing audience experiences, and the third is improving monetization.”
Ultimately, Jain contends, monetization is “going to be the one thing everyone cares about because of the opportunity to streamline internal processes and address operational efficiencies, which is probably where the biggest impact will be in the short term. When you think about monetization with generative AI, then you start looking at the art of the possible.”
It all starts with the data. Traditional AI systems recognize patterns based on training, then make a prediction. Generative AI uses that data to create some kind of output that’s new.
“Generative AI creates things based on a lot of data,” says Jonas Ribeiro, digital products, platform, and ad tech manager at Globo. “We need to create models with this data so we can create things in the M & E industry.” This could include creating scripts or summaries for editors or images, audio, or video. “Basically,” Ribeiro explains, “you need a lot of data and a lot of models.”
Both public and private data contribute to these models, Ribeiro says. “For the major initiatives, we are using open data, but we need to have a more cautious approach because the internet can influence information”—and as we all know, the internet abounds in inaccurate and suspect information. “We have a lot of people check the outcome. Not everyone can afford to have private data, but for some specific workloads that I can’t disclose, we are trying to use the private data.”
Opinions abound when it comes to language models. One executive I spoke with talked about customers not needing their own training for their particular products, because they can supply data within a search query. Others aren’t so convinced they want to put data out on the open internet.
“To me, it’s a garbage in, garbage out type of situation,” says Steve Vonder Haar, senior analyst at IntelliVid Research. “If you’re going to take information from the web, you’re not going to really have a trusted source of information from which to draw. The real future for AI—at least in the business sense—is going to be in the development of limited datasets that are used to inform decision making within a specific corporate network.”
Generative AI-driven analytics seems to be taking shape as a sort of business analysis tool on steroids. First Tube is a live-streaming platform that leverages AI-driven analytics. The company started with the idea that it wanted to simulate a project’s success in order to tweak delivery. “We are using generative services to create mock campaigns that we can then pull into our analytics platform,” notes David Clevinger, First Tube’s VP for products. “We’ve used generative AI to create test data that mimics the way a customer would see campaign elements and the outcomes of a campaign.”
Clevinger says this could be tied to identifying the best platform for posting content; what kind of content drives better engagement—for example, driving brand awareness or getting people to sign up for a sweepstakes—which social platform is better for that specific live stream; what kinds of measurements can be delivered in views, clicks, impressions, or social media comments; or even evaluating the ROI based on the result.
First Tube is planning to build its analytics platform in-house to fully deliver on this promise. “I’m never going to hand that to a third party,” Clevinger says. “But the intention then is to say this approach worked well for this brand in the past. How can we leverage the findings there to turn that into a campaign and now ask the generative AI service to draft a media plan based on what worked well last time?”
The next step for leveraging First Tube’s in-house platform, Clevinger states, is to use generative AI “to do optimization at the vertical level. What is the best tactic or the best kind of campaign or the best parameters around a campaign? Some of our workflow pieces have historically been in spreadsheets or disparate databases,” Clevinger concedes. “What we’re trying to do is build a more comprehensive, robust analytics platform for our customers based on performance metrics.”
Another organization that decided to put some of its production requirements into the hands of generative AI is Barrett-Jackson Auction Co. “Productivity-wise, we have to write tens of thousands of car descriptions every month for our listing service,” says Darcy Lorincz, Barrett-Jackson’s CTO. The company incorporated the automotive information it owns on every car sold in the past 50 years into its own language model. Barrett-Jackson shares some of this information because it wants people to know the results of auctions, but, otherwise, this is the company’s own proprietary data model.
“Training your own model isn’t for everybody, and that’s why these open models are great,” Lorincz explains. “Now we can generate that editorial in seconds. We still need people to do some moderation, but as the machines learn more and more, there’s less effort for us so we can scale,” he continues. This allows the company to say, “We want this car to be in this background with this person talking about it, and 5 minutes later, we have a meaningful 2-minute video describing something that would’ve taken a production team a monstrous amount of research.”
Automated captioning is a feature that has become increasingly common in streaming, VOD, and videoconferencing. Not everyone is enamored with the captioning results AI produces. Thierry Fautier, managing director of Your Media Transformation, says, “I have a friend using Google speech-to-text for captioning at a French broadcaster. Does it work? No. In a lab with English speakers, it gets a certain percentage of recognition right. Then you move that to a French environment with noise in the room and with very strict regulation requirements, and it doesn’t work.”
In the live-streaming world, there’s a lot of use of AI for captions. “One of our big things is, if you have a political, legal, pharmaceutical, or healthcare client, there is no way you want to use AI for your captions because you are only going to lose at some point,” says Corey Behnke, lead producer and co-founder of LiveX. “Oversight is the key thing with AI. … I actually believe that we’re going to have more demand for producer oversight than we’ve ever had before in live streaming.”
I’ve seen live software demos that have a very high accuracy level, and I use systems in the course of work all the time that don’t. Later in this article, I’ll discuss an interesting use case that is somewhat based on the same technology, but for a totally different outcome.
“In the world of generative AI, you could actually increase the value of every impression, because now, the ad created is actually created specifically for you at the right moment based on all the context you’ve shared so that the CPM is much higher,” says Google Cloud’s Jain. This sentiment is echoed by others as something that will have immediate appeal.
While many people speak to me about targeted advertising, the attendant costs need as much clarification as the technical capabilities required to deliver targeted results. “I think you can make much more creative ads for a much lower cost,” says Fautier. Creating ads targeted at different groups now immediately runs up against budget limitations. “If I can automate the 10 different subcategories now, you can give a dedicated ad for a dedicated group. You don’t deal with more than 15 groups in general …, so you do 15 ads, and you’re done.”
Another area under consideration is digital product placement within content. “We identified some opportunities of putting a bottle on the table that could be water, beer, or soda,” says Globo’s Ribeiro. This would provide an opportunity to target a much wider audience. “We are not there right now, but we are studying it.”
All of the advertising data required to deliver useful analytics does exist: where it ran, how it ran, what it ran against, who it was delivered to, what errors occurred, what CPM was paid for it, and so forth. The problem is that these different pieces of information are currently sitting in different systems. “On the DSP side, there are disparate systems for combining CRM, delivery, and campaign creative datasets to see if an ad was better served on Crackle or the desktop site for NBC News,” according to C.J. Leonard, global media and ad tech consultant at MAD Leo Consulting. “For one person to sit there and look at an impression log from each of those systems and try to tie all of this together is far from practical.”
Generative AI can be used to clean up this data in ways that humans never could. “With these different datasets brought together,” Leonard says, “we should be able to speak to better outcomes. Instead of putting my finger in the air and saying, ‘Based on my gut …’ I want to be able to say, ‘Based on my gut and this model that is out there …’”
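To make the idea of tying these systems together concrete, here is a minimal Python sketch. It is not Leonard's method and involves no generative AI; it only shows the underlying join of a delivery log with campaign data on a shared key. Every field name, platform row, and number is a hypothetical placeholder.

```python
# Hypothetical delivery-log rows, as if exported from an ad server.
delivery_log = [
    {"campaign_id": "c1", "platform": "NBC News desktop", "impressions": 2000, "clicks": 20},
    {"campaign_id": "c1", "platform": "Crackle", "impressions": 1000, "clicks": 30},
]

# Hypothetical campaign records, as if held in a separate CRM system.
campaign_table = {"c1": {"brand": "ExampleCo", "cpm_usd": 12.0}}


def summarize(delivery_rows, campaigns):
    """Join delivery rows to campaign records and rank platforms by click-through rate."""
    summary = []
    for row in delivery_rows:
        campaign = campaigns.get(row["campaign_id"])
        if campaign is None:
            continue  # skip rows that cannot be matched to a known campaign
        summary.append({
            "brand": campaign["brand"],
            "platform": row["platform"],
            "ctr": row["clicks"] / row["impressions"],
            # CPM is the price per 1,000 impressions.
            "spend_usd": row["impressions"] / 1000 * campaign["cpm_usd"],
        })
    return sorted(summary, key=lambda r: r["ctr"], reverse=True)


report = summarize(delivery_log, campaign_table)
for line in report:
    print(line)
```

Once the records share a key, a question such as where an ad was better served becomes a simple sort. The hard part in practice is agreeing on that shared key across systems.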
The most common theme around applications for generative AI thus far in the media world has been how to use these tools to reduce the time it takes in postproduction to finish content. But it could be just as useful at other stages in the workflow. How do we use it in initial ideation work? How do we use it to summarize content? And if it does play a meaningful role in these areas, what does that mean for the humans who traditionally did those jobs?
“Our customers are a lot more cautious about pure generative AI usage literally trying to do the same job that the creative would have done,” says Shailendra Mathur, VP of architecture and technology at Avid. It’s easy to see how this would make a lot of people uncomfortable, whether they are producers, editors, animators, writers, or actors.
“One of the philosophies that we believe in is creative assistance,” notes Mathur. “It’s automating the mundane.” There are so many mundane tasks in the post workflow that can be very repetitive and time consuming, he explains, such as logging the metadata, manual content checking, searching for specific B-roll, and doing research for a script. Another idea is to offload less skilled work. “We have a labor and skills shortage in the industry today, so part of it is actually leveraging some of the AI models as well as some of the automation that results from it to drive what we could not fill with humans’ skills.”
However, this automation only goes so far. “[ChatGPT] can only guess at what you want,” says Mathur. “You need to know what you’re asking for, and you can’t blame ChatGPT for giving a wrong answer when you didn’t ask for the right things. If you’re asking the system to perform a job, you always need to be there to double check.”
While various levels of metadata search have been available previously, using generative AI means associated content can be surfaced that normally wouldn’t be found, Mathur explains. Large language models used in generative AI are based on a method of representation called semantic embeddings. “Embedding space models are used to convert text, video, or audio objects into a vector database,” Mathur says. This database can identify things using object data as well as semantic information.
“When we look at the semantic embeddings’ core technology underneath, this is where the association of multiple pieces of audio, video, etc., all come together,” according to Mathur. “You can say, ‘Is this written in this language?’ ‘Or [show me] a picture or audio about Nadine,’” and the system would return a list of every media object related to my name.
The result is that predictions about tens of thousands of image labels not observed during training are possible. This opens up source libraries to much speedier access and far greater detail than ever before. Avid has a research and advanced development lab showing many other concepts under consideration.
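The embedding-based search Mathur describes can be illustrated with a toy Python sketch. This is not Avid's implementation. The file names and three-number vectors are invented for illustration, and a real system would obtain its vectors from a trained embedding model and store them in a vector database.

```python
import math

# Hypothetical, hand-made "embeddings". Real embeddings come from a trained
# model and typically have hundreds of dimensions rather than three.
MEDIA_LIBRARY = {
    "interview_nadine.mp4": [0.9, 0.1, 0.0],
    "nadine_headshot.jpg": [0.8, 0.2, 0.1],
    "stadium_broll.mov": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, library, top_k=2):
    """Rank media objects by similarity to the query and return the top_k names."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in library.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Stand-in for the embedding of a query such as "a picture or audio about Nadine".
QUERY = [1.0, 0.0, 0.0]
results = search(QUERY, MEDIA_LIBRARY)
print(results)
```

Because video, image, and audio objects all live in the same vector space, one query can surface related items across media types without relying on matching metadata tags.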
With all of the excitement surrounding ChatGPT in 2023, says Google Cloud’s Jain, “Everyone experienced the paradigm shift to direct to consumer. But now with generative AI, there’s a bright light that’s shining on the potential for the disruption on the upstream content creation and production side as well.”
“We’re running multiple FAST channels out of our facilities,” says Tulix CEO George Bokuchava. “Why not think about dynamic edge generation? Imagine you have a [brand], and you have a slot in a live stream. You can have an AI-generated ad dynamically inserted based on market conditions and whatever is going on in the world. We just need to be open-minded and think about things completely from a new angle. This is absolutely doable.”
“I think if you look at publishers, they have this mix of excitement and fear,” notes Jain. “On the fear side, generative AI is going to reduce the amount of time that audience members spend on publisher sites, because it’s either summarized somewhere, or it reduces the need for an individual to dig deep into what journalists produce.”
“The excitement is that can we actually create more kinds of content experiences for a consumer because we can summarize information,” Jain continues, “and we can build greater community because we can add conversational AI on top of our content to make the experience more of a dialogue, more dynamic and interactive.”
Using voice could also have a very exciting outcome for any and all software; consider conversational interfaces or using speech to control how a software product will work. “You’re actually conversing with the applications in your own natural language,” says Mathur.
Looking at the area of content creation, it’s easy to see how efficiencies are created in production. Google built an OTT proof of concept that uses Bard to help viewers pick content to watch. This functions in place of traditional streaming navigation. A viewer can interact to find content, based on whatever conversation inputs they have. It’s essentially a recommendation engine with a memory and the ability to parse interactions.
And then, if users want (and have granted permissions), they can have fries with that too (or any other type of food they can order through a delivery service) by continuing their conversation with “the voice of OTT POC.” This is an interesting approach to the overwhelming consumer content dilemma.
Lorincz says Barrett-Jackson is also looking at how to bring more interactivity to its offerings. “Now we’re doing video backgrounds with avatars that actually talk without being scripted,” he notes. “We also have a customer service robot—not an avatar—and I can ask it any question about our company, and it can answer me in real time.”
I’ve encountered many people who asked ChatGPT to write something for them and were absolutely thrilled with the results. So, to test the idea that ChatGPT is useful for creative assistance—and in the interest of research—I decided I needed to try this. I attempted to have ChatGPT write a paragraph on the same topic as this article in my style.
A few questions come to mind. Is there enough data to generate and impersonate my voice? Do I really write like this? Where are the direct quotes from real people that accompany all of my articles, including this one? I provided the same exact research to ChatGPT that I drew on here, and out popped the following:
Title: Revolutionizing Media Operations Through AI: Bridging Creativity and Efficiency
In an era where technological advances shape industries at an unprecedented pace, the media sector stands as a remarkable testament to the transformative potential of Artificial Intelligence [AI]. Over the past few years, the media industry has become a crucible for AI innovations, ushering in a new era of creativity and operational efficiency. From streaming platforms to advertising campaigns, the convergence of AI and media operations is yielding remarkable outcomes.
I suppose the ultimate question is this: Can you tell this paragraph was generated by AI, or could it pass for being written by a human (if not necessarily one particular human)? And for how long will we continue to value that distinction?
“A lot of the conversations I’ve had over the last few months have led me to believe that we’re going to see an enhanced premium placed on trust and authenticity,” says Google Cloud’s Jain. “In a world where so much more content can be created with far less toil, people are going to want to know: Is this AI-generated, or is this something a human put together?”
This article has been fully researched and written by a human.