Industry Perspectives: Building a Smarter Server

[Editor's Note: Industry Perspectives is a regular feature in which vendors in the streaming media space explore issues and trends on which they can offer a unique perspective. The articles reflect the opinions of the authors only, and we print them as a means of provoking thought and starting discussion.]

Since the Industrial Revolution, we’ve used machines to automate processing and improve workflow productivity. Today, however, scaling infrastructure only gets you so far. There comes a point where adding more machines no longer speeds processing, whether you’re transferring data or processing video. Furthermore, most of these machines have been designed to accomplish one thing, like the assembly-line robots at an auto plant. Despite their complexity, they are ultimately single-purpose tools, programmed to perform identical tasks repetitively in an unchanging environment.

Digital media today inherently includes a wide range of video types with varying resolutions, formats, and image content. This makes any static workflow either highly prone to failure or dependent on human intervention for decision-making. As revenue from conventional delivery mechanisms drops, speed to monetization is becoming ever more important, and the popularity of online and mobile video is dramatically increasing the amount of encoding required.

It is becoming clear that the current transcoding paradigm will not be able to keep up with rapidly increasing demand. We must evolve workflows to the point where they can not only process more data faster but also make intelligent choices about how to prioritize, anticipate, and expedite jobs. While we continually see advancements in transcoding server capabilities, at their core these servers remain robots. The future lies in a smarter server that enables the processing of more content with greater accuracy.

High-level approaches have already been implemented to improve the media processing workflow. Platform players have long been devoting resources to improve this pipeline. Complete solutions like these often focus on the overall high-level management layer, but this isn’t necessarily the best way to solve the problem for the rest of us. Other approaches for keeping up with the rising demands for online content have created a niche of prohibitively expensive systems that are only applicable to a small group of very large media properties.

A better overall solution is to make the individual components of the system smarter, not just develop a more intelligent overall architecture. In turn, these individual components will be useful to a wider range of content creators, providing them with necessary workflow improvements.

Elements of smarter servers are already appearing, including features like load balancing and grid encoding. But we still aren’t using these encoding systems to their full potential. Instead, we remain focused on tailoring system design to match human workflows; the resulting solutions depend on manual intervention at far too many stages. Fundamentally, next-generation systems should focus on reducing human touch and increasing system autonomy.

By empowering our servers—and as a result, the whole farm—with intelligence, we gain the capacity improvements we need to stay ahead of the ever-rising demands for encoding resources. At this point, there are still many unanswered questions, but one question we can address is: What makes a smarter server?

More Intelligent
Many of the most time-consuming tasks in video compression are those that involve human operators. Relying on individuals as gatekeepers for large volumes of content is impractical. By building systems with the right analysis tools and tying business rules to the results of that analysis, many manual tasks can suddenly be offloaded to the server.

Simple tasks like file verification are a start (for example, "Is the file the right length?" or "Does it have audio?"). To offload more significant responsibilities, sophisticated analysis and judgment are required, such as inserting advertising based on the video content. In a perfect world, the computer systems we use for encoding would be able to recognize all of the content flowing into them without human intervention.
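
As a concrete illustration, here is a minimal sketch of that kind of automated verification, assuming ffprobe is available on the server. The expected duration and tolerance are placeholder business rules, not values from the article.

```python
# Minimal sketch of automated file verification, assuming ffprobe is on the PATH.
# The expected duration and the tolerance threshold are illustrative only.
import json
import subprocess

def probe(path):
    """Return ffprobe's JSON description of a media file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

def verify(path, expected_duration, tolerance=1.0):
    """Apply simple business rules: correct length, audio present."""
    info = probe(path)
    duration = float(info["format"]["duration"])
    has_audio = any(s.get("codec_type") == "audio" for s in info["streams"])
    problems = []
    if abs(duration - expected_duration) > tolerance:
        problems.append(f"duration {duration:.1f}s != expected {expected_duration:.1f}s")
    if not has_audio:
        problems.append("no audio stream found")
    return problems  # an empty list means the file passes without human review

# Example: flag the file for an operator only when a rule fails.
# for issue in verify("master.mov", expected_duration=1800.0):
#     print("REJECT:", issue)
```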

More Intuitive
Intuition is the system’s ability to make judgment calls based on the results from its own analysis, as well as to anticipate issues that should generate alerts. Intuition is an extension of intelligence, as it requires the system to learn heuristically. In part, this is a business-rule environment, but with the ability for the system to adapt or change rules based on past experience.
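
One way to picture this kind of heuristic adaptation is a business rule whose threshold drifts based on operator feedback. The sketch below is purely illustrative; the PSNR floor, the override rate, and the adjustment steps are assumptions, not a prescribed design.

```python
# A minimal sketch of a business rule that adapts from past outcomes.
# All names, thresholds, and the adjustment policy are hypothetical.
from collections import deque

class AdaptiveQualityRule:
    """Reject encodes below a PSNR floor, nudging the floor from operator feedback."""
    def __init__(self, floor_db=35.0, history_size=100):
        self.floor_db = floor_db
        self.overrides = deque(maxlen=history_size)  # True when an operator overruled us

    def check(self, measured_psnr_db):
        return measured_psnr_db >= self.floor_db

    def record_feedback(self, operator_overrode):
        """Learn heuristically: if operators keep overriding rejections, relax the floor."""
        self.overrides.append(operator_overrode)
        if len(self.overrides) == self.overrides.maxlen:
            override_rate = sum(self.overrides) / len(self.overrides)
            if override_rate > 0.2:      # rejected files were usually fine after all
                self.floor_db -= 0.5
            elif override_rate < 0.05:   # rejections almost never questioned
                self.floor_db += 0.5
            self.overrides.clear()
```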

Today, the high volumes of content that online properties like YouTube and Hulu process make it impractical to individually massage quality settings. In these instances, a best-practices "cookie-cutter" approach must be taken, targeting middle-of-the-road settings that sacrifice quality on aggressive, high-motion content and waste bits on less demanding material. A more intuitive process would let the servers choose some of the settings required to encode the content, based on how similar content has been processed in the past.
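
A rough sketch of what history-driven settings could look like follows; the content categories, the quality signal, and the bitrate numbers are hypothetical, chosen only to illustrate learning from previously processed, similar content.

```python
# A minimal sketch of history-driven encode settings. The categories, the
# quality signal, and the default bitrates are illustrative assumptions.
from statistics import median

class SettingsAdvisor:
    """Suggest a bitrate from the history of similar, previously encoded content."""
    def __init__(self):
        self.history = {}  # category -> bitrates (kbps) that met the quality target

    def record(self, category, bitrate_kbps, met_quality_target):
        if met_quality_target:
            self.history.setdefault(category, []).append(bitrate_kbps)

    def suggest_bitrate(self, category, default_kbps=2500):
        """Use the median of past successful encodes instead of a one-size-fits-all preset."""
        past = self.history.get(category)
        return median(past) if past else default_kbps

# Example: talking-head news clips end up cheaper than high-motion sports.
advisor = SettingsAdvisor()
advisor.record("news", 1800, met_quality_target=True)
advisor.record("sports", 4200, met_quality_target=True)
print(advisor.suggest_bitrate("news"))       # learned setting
print(advisor.suggest_bitrate("animation"))  # falls back to the default
```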

Faster
Computationally speaking, none of the analysis steps or business rules described above is fast. The processing cycles required for so many analysis steps risk slowing the content’s delivery to market to unacceptable levels. The challenge is further exacerbated by the amount of high-definition video becoming common on the internet. Time to market will always be one of the primary concerns for any video distributor, and therefore dramatic speed improvements must be a core competency of any next-generation server.

Trying to speed up these tasks with older, proprietary hardware might buy a few months of improvement, but such hardware is often prohibitively expensive and becomes obsolete in short order.

Currently, the fastest solution is achieved by balancing the workload between off-the-shelf graphics processing units (GPUs), supplied by companies such as NVIDIA, and central processing units (CPUs), supplied by Intel and AMD. Massively parallel GPU architectures are ideal for computationally intensive tasks like encoding, image processing, and video analysis. Companies can now take advantage of off-the-shelf hardware to dramatically raise the bar on performance, while still retaining all the flexibility and future-proofing provided by software-only solutions.
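
The sketch below illustrates one way such a split might be orchestrated, assuming a fixed number of GPUs and using placeholder functions in place of real encode and analysis calls (a real system would invoke a GPU-accelerated encoder through a vendor SDK and a CPU-based analysis library).

```python
# A minimal sketch of splitting work between GPUs and CPU cores.
# The task functions are placeholders, not real encoder or analysis APIs.
from concurrent.futures import ThreadPoolExecutor
import os
import queue

NUM_GPUS = 2  # assumption: two off-the-shelf GPUs in the server
gpu_ids = queue.Queue()
for gpu in range(NUM_GPUS):
    gpu_ids.put(gpu)

def encode_on_gpu(job):
    """Check out a GPU, run the massively parallel encode there, then return it."""
    gpu = gpu_ids.get()
    try:
        return f"encoded {job} on GPU {gpu}"   # placeholder for the real encode call
    finally:
        gpu_ids.put(gpu)

def analyze_on_cpu(job):
    """Run lighter, branchy work (verification, rule checks) on a CPU core."""
    return f"analyzed {job} on CPU"            # placeholder for the real analysis call

jobs = ["clip_a", "clip_b", "clip_c", "clip_d"]
with ThreadPoolExecutor(max_workers=NUM_GPUS) as gpu_pool, \
     ThreadPoolExecutor(max_workers=os.cpu_count()) as cpu_pool:
    analyses = [cpu_pool.submit(analyze_on_cpu, j) for j in jobs]
    encodes = [gpu_pool.submit(encode_on_gpu, j) for j in jobs]
    for future in analyses + encodes:
        print(future.result())
```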

In Summary
The biggest benefit of smarter servers is that these ideas translate to a variety of video workflows, ranging from large media and entertainment companies to smaller educational or post-production customers. The business rules associated with a specific market vary, but not the core features. All have a consistent ultimate goal: to reduce the manual intervention factor. Smarter servers trade person-hours for computer-minutes.

The more companies work to solve these challenges, the faster we will evolve the intelligent servers that can satisfy the global demand for online and mobile media. Not only will we be able to offload an enormous amount of tedious work, reduce infrastructure costs, and decrease time to revenue, but we will also create higher-quality video experiences and provide greater global access to this growing medium. The smarter server is an idea whose time has come.
