Buyers' Guide to VOD Encoders
This video-on-demand buyers' guide is meant to walk newbies through what an encoder is and how to choose one, rather than help a serious buyer choose a vendor or approach in any of the categories covered. If you're new to the market, you'll learn a bit about who's who and what's what; if you've been in the streaming business for a while, you'll probably not get a lot out of this. As always, the group of companies mentioned is representative rather than exhaustive.
Let's start with some data points, courtesy of Bitmovin's 2020 Video Developer Report, which incorporates responses from 792 participants located in more than 80 countries. Figure 1 shows the collective answer to the question, "Where do you encode your video?" and includes data from both the 2020 report and the 2019 report. Totals for each year exceed 100% because many respondents used multiple methods.
Figure 1. Responses to the question, "Where do you encode your video?" from Bitmovin’s 2020 Video Developer Report
The two software encoder categories (on-prem and cloud) total 84%, making software encoders the largest category by far. A later question reveals that of those who used a software encoder, 51% used a commercial encoder to build their own software encoding facilities, while 41% used an open source encoder like FFmpeg.
Working down the list, 32% of respondents used a cloud encoding service, while 13% used managed on-prem encoding services. I'll discuss all of these approaches in this article. Presumably, most of the hardware encoders are targeted toward live applications, so I will give hardware encoders only a brief mention.
There are several concepts you need to understand before you choose an encoder or encoding approach. The first is adaptive bitrate (ABR) streaming, which comprises technologies that enable you to deliver to viewers who are watching on different devices via different connection speeds. Common technologies include Dynamic Adaptive Streaming over HTTP (DASH) and Apple's HTTP Live Streaming (HLS).
All ABR technologies encode files into what's called an encoding ladder, which typically includes five to seven files customised for different viewers. Figure 2 shows Apple's recommended encoding ladder from the HLS Authoring Specification. At the top are lower-resolution, lower-bitrate files for those who are watching on mobile phones, while the bottom shows high-resolution, high-bitrate files for viewers who are watching on smart TVs over high-bandwidth connections.
Figure 2. Apple’s recommended H.264 ladder from the HLS Authoring Specification
To distribute via ABR, you need to produce the files in the encoding ladder. You also need metadata files that help the player choose the best rung in the encoding ladder. These metadata files can also add captions and DRM protection to the videos.
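To make the idea of a ladder and its rungs concrete, here is a minimal Python sketch of how a player might pick a rung for a measured connection speed. The resolutions and bitrates are made-up illustrative values, not Apple's recommendations, and the `choose_rung` function and its 80% safety margin are my own assumptions; real players use far more elaborate logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    width: int
    height: int
    video_kbps: int


# Hypothetical ladder for illustration only; not taken from any specification.
LADDER = [
    Rung(416, 234, 150),
    Rung(640, 360, 700),
    Rung(960, 540, 2000),
    Rung(1280, 720, 4500),
    Rung(1920, 1080, 7800),
]


def choose_rung(ladder, measured_kbps, safety=0.8):
    """Return the highest-bitrate rung that fits within a fraction of the measured bandwidth."""
    budget = measured_kbps * safety
    affordable = [r for r in ladder if r.video_kbps <= budget]
    if not affordable:
        # Nothing fits, so fall back to the lowest-bitrate rung.
        return min(ladder, key=lambda r: r.video_kbps)
    return max(affordable, key=lambda r: r.video_kbps)


if __name__ == "__main__":
    for kbps in (100, 3000, 20000):
        rung = choose_rung(LADDER, kbps)
        print(kbps, "kbps ->", f"{rung.width}x{rung.height} at {rung.video_kbps} kbps")
```

The point is simply that the player needs a list of the available rungs and their bitrates before it can choose, which is the information the metadata files supply.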
Creating the encoding ladder is called encoding; creating the metadata files that pull the audio, video, captions, and DRM together is called packaging. Sometimes, packaging involves chunking the original files in the encoding ladder into shorter segments for easier distribution; sometimes, it doesn't.
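As a rough illustration of what a packager's metadata looks like, the Python sketch below writes a bare-bones HLS master playlist that lists each rung's bandwidth and resolution. The variant values and file paths are hypothetical, and the `master_playlist` helper is my own; a real packager would also declare codecs, audio renditions, captions, and DRM signalling, all of which are omitted here.

```python
def master_playlist(variants):
    """Build a minimal HLS master playlist from (bandwidth_bps, width, height, uri) tuples."""
    lines = ["#EXTM3U"]
    for bandwidth, width, height, uri in variants:
        # One EXT-X-STREAM-INF tag describes a rung; the next line is its media playlist URI.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


# Hypothetical rungs and paths, for illustration only.
VARIANTS = [
    (800_000, 640, 360, "360p/index.m3u8"),
    (2_200_000, 960, 540, "540p/index.m3u8"),
    (5_000_000, 1280, 720, "720p/index.m3u8"),
]

if __name__ == "__main__":
    print(master_playlist(VARIANTS), end="")
```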
Some desktop tools, like Adobe Media Encoder (AME), are encoders but not packagers. That's all you need if you're using an online video platform (OVP) like Brightcove or Kaltura, or even YouTube, to deliver your videos. All of these services ingest a single, high-quality file; transcode it into the encoding ladder; and package for the ABR technologies they deploy. However, if your goal is to produce content you can deliver directly to your viewers via HLS or DASH, you'll need both an encoder and a packager or a tool that does both.
Two other concepts to understand are static packaging and dynamic packaging. With static packaging, you create the encoding ladder and necessary packaging and upload all of the files to the origin server for distribution. With dynamic packaging, you create your encoding ladder, upload the files to the origin server, and use a server like the Wowza Streaming Engine or Softvelum Nimble Streamer to package the content in real time as needed to match the ABR technology that's compatible with each viewer's device.
Interestingly, the Bitmovin report tells us that 37.6% of respondents used dynamic packaging. To go dynamic, you need an encoder but not a packager. AME would again be fine; just encode to multiple outputs and upload the files to your origin server, where the dynamic packager can do the rest.
Long story short, before you choose your approach, you have to understand whether you need an encoder or an encoder and packager.
Desktop encoders are software programs you install on local Windows or Mac computers and include the aforementioned AME, as well as HandBrake and Apple's Compressor. You can throw Avid Media Composer's export function into this group as well. Of the four, Compressor is the only tool that can package to an ABR format, obviously Apple's HLS, with captions but no DRM. The rest can output one or multiple files in different formats.
While AME can't package, it does have a watch folder function to enable simple automation; anyone with access to that folder on a network can drop a file in, and AME will launch and encode the file to whatever presets you have selected. If the presets constitute a full ABR ladder, you'd be good to go with a system that uses dynamic packaging. With Compressor, you can combine multiple Macs into an encoding workgroup. With HandBrake, you can easily convert a folder or multiple files using a single output preset, but like AME, there's no packaging function.
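For readers who haven't seen one, a watch folder boils down to polling a directory and handing anything new to the encoder. The Python sketch below shows only that detection step. It is not how AME implements the feature; the `find_new_files` name and the list of extensions are my own assumptions.

```python
from pathlib import Path

# Assumed set of source extensions; adjust to whatever your workflow ingests.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mxf"}


def find_new_files(watch_dir, already_seen):
    """Return video files in watch_dir not seen before, recording them in already_seen."""
    new_files = []
    for path in sorted(Path(watch_dir).iterdir()):
        is_video = path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
        if is_video and path.name not in already_seen:
            already_seen.add(path.name)
            new_files.append(path)
    return new_files
```

In practice, you'd call this every few seconds in a loop and pass each returned path to your encoder. A production version would also need to confirm that a file has finished copying before encoding starts.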
If all you need is HLS packaging without DRM, Compressor should work for modest production volumes. If you're distributing via an OVP or YouTube or Facebook, any of the desktop encoders should do. If you want full-service encoding and packaging to multiple ABR formats with DRM, you need to look elsewhere.
If this describes you, start by making a list of all of the required features of your encoder/packager, including ingest format support, output codec/ABR support, supported HDR formats, DRM requirements, and captioning requirements, as well as expected volumes. Consider the specific processing that your use case requires. For example, transcoding a simple MP4 file with two-channel audio to an HLS/DASH ladder is pretty simple. However, if you're working with Interoperable Master Format (IMF) files and need to map audio tracks for specific outputs while creating captions in multiple languages, you'll require a much more capable system or service provider.
If you're considering third-party software, you should know where you want to install the software; if you're considering a cloud service, you should know whether you want to deploy using the service or launch the software on your own hardware. In all cases, you'll need to know expected day-to-day volumes and think about the available options should demand spike for any reason.
Enterprise encoders are programs that you license and install on-prem or in a private or public cloud and perform a full range of encoding and packaging functions. Buyers in this category obviously want to own and control their own encoding experience, wherever they deploy it, as compared to using a third-party service.
Most products in this class can support all relevant input files and output in multiple codecs and ABR formats, with captions and DRM support, while providing a range of high-end features like ad insertion, watermarking, and audio loudness management. Most offer both a graphical user interface and application programming interface (API) for automated interaction with media asset management programs and other programs in the encoding and distribution workflow.
One potential differentiator is the deployment model. Can you install the software where you want to use it? How does pricing work in the different environments? Beyond that, how many licences will you need to handle both day-to-day encoding chores and the requisite level of redundancy? How many computers will you need to acquire to support your anticipated operation?
Another differentiator is the concept of workflow control over the encoding process. Systems with workflow capabilities can examine files and/or file metadata upon ingest and make encoding decisions like choosing the preset or removing potentially faulty files from the encoding pipeline and notifying a technician. This functionality can be delivered via a user interface or scripting and helps make the operation more flexible and robust.
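Here's a toy Python sketch of the kind of ingest-time decision described above: look at a file's metadata, then either pick a preset or pull the file out of the pipeline. The metadata keys, preset names, and thresholds are all hypothetical, and commercial systems expose this logic through their own interfaces or scripting languages.

```python
def route_file(meta):
    """Return an (action, detail) pair for an ingested file described by a metadata dict."""
    # Files with no usable video are pulled from the pipeline for a technician to review.
    if meta.get("duration_s", 0) <= 0 or not meta.get("video_codec"):
        return ("quarantine", "missing video stream or zero duration")
    # Otherwise choose a preset (hypothetical names) based on the source height.
    height = meta.get("height", 0)
    if height >= 2160:
        return ("encode", "ladder_uhd")
    if height >= 720:
        return ("encode", "ladder_hd")
    return ("encode", "ladder_sd")


if __name__ == "__main__":
    print(route_file({"duration_s": 95.0, "video_codec": "prores", "height": 1080}))
    print(route_file({"duration_s": 0, "video_codec": "h264", "height": 1080}))
```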
Yet another differentiator is per-title capabilities, or the ability to customise the encoding ladder depending on the complexity of the video being encoded. Implementations vary, but every legitimate product in this category should offer this option.
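To give a feel for what "customising the ladder" means, the deliberately simplified Python sketch below scales a base ladder's bitrates by a complexity score. Everything here is an assumption for illustration: the base values, the notion of a single score where 1.0 means average content, and the clamping limits. Real per-title implementations typically analyse the video itself and may also change resolutions or the number of rungs.

```python
# Hypothetical base ladder as (height, video_kbps) pairs.
BASE_LADDER = [(234, 150), (360, 700), (540, 2000), (720, 4500), (1080, 7800)]


def per_title_ladder(base, complexity, min_scale=0.5, max_scale=1.5):
    """Scale each rung's bitrate by a complexity score, clamped to a sane range."""
    scale = min(max(complexity, min_scale), max_scale)
    return [(height, round(kbps * scale)) for height, kbps in base]


if __name__ == "__main__":
    print("simple animation:", per_title_ladder(BASE_LADDER, 0.6))
    print("high-motion sports:", per_title_ladder(BASE_LADDER, 1.3))
```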
Scalability is another consideration. What are your options if your company acquires a third-party library and needs to get it online as quickly as possible? Some vendors offer hardware acceleration, which is an expensive option for a temporary need but might make sense if day-to-day encoding demands increase. Does the company offer daily or monthly licences, or is there a sister cloud service that can handle your overage using the same presets as you use internally?
Don't make your encoder selection in a vacuum. If you'll be acquiring software for other functions, such as live streaming, advertising insertion, streaming file origin, or packaging, think about the benefits of acquiring two or more of these capabilities from a single vendor, and/or understand how the encoder you're considering will interface with products from other vendors.
Cloud encoding is typically provided as software as a service (SaaS), in which you upload your files to the service, choose your encoding options, and direct the service where to send the finished files. The primary benefits of SaaS cloud encoding as compared to on-prem software deployments are lower capital expenditures for the hardware and software, reduced operating costs related to housing and powering the encoding farm, built-in system redundancy, and the elimination of software update costs. As compared to third-party software installed in the cloud, you don't have to buy, install, or maintain the third-party software.
Of course, with only 32% of the Bitmovin survey respondents saying that they use a cloud encoding service, choosing a cloud service as compared to buying or developing your own encoder can't be a slam dunk. Viewed from a distance, it appears that the SaaS-versus-own decision is more philosophical than economic.
Cloud encoding services come in several flavours: compression-only services like Coconut; companies that offer encoding as well as other services, like Bitmovin; encoding workflow vendors such as Encoding.com and Dolby's Hybrik; and companies like Amazon and Microsoft that offer encoding as a component of an overall storage, encoding, and delivery workflow.
Choose a class of vendors that can deliver the range of services you're looking for and match your desired deployment model. For example, Bitmovin and Encoding.com both allow you to install their software on-prem or on external private clouds, but not all vendors do.
Consider how you want to interface with the system. Most cloud services support API-driven operation, but not all provide user interfaces (UI) for getting started or for nontechnical users. In particular, AWS Elemental MediaConvert has both a highly usable UI and capable API, making the service appropriate for all technical levels.
Pricing is one of the biggest differentiators. Most vendors charge by the output minute, but some, like Encoding.com, let you rent a managed cloud instance by the month for unlimited processing at one set price. For Hybrik, Dolby charges a flat fee per month based on the number of AWS instances that you can use its software on.
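When comparing per-minute and flat-rate plans, the useful number is the break-even volume. The Python sketch below works through that arithmetic with invented prices; no vendor's actual rates are implied, and the function names are my own. Note that services that bill by the output minute count each rung separately, so a 10-minute source encoded to a six-rung ladder is 60 output minutes.

```python
def output_minutes(source_minutes, rungs):
    """Output minutes billed when every source minute is encoded to each rung."""
    return source_minutes * rungs


def per_minute_cost(total_output_minutes, price_per_minute):
    """Monthly bill under per-output-minute pricing."""
    return total_output_minutes * price_per_minute


def break_even_minutes(flat_monthly_fee, price_per_minute):
    """Output minutes per month above which the flat fee becomes the cheaper option."""
    return flat_monthly_fee / price_per_minute


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    PRICE_PER_MINUTE = 0.02
    FLAT_FEE = 2000.0
    minutes = output_minutes(source_minutes=10_000, rungs=6)
    print("output minutes:", minutes)
    print("per-minute bill:", per_minute_cost(minutes, PRICE_PER_MINUTE))
    print("break-even output minutes:", break_even_minutes(FLAT_FEE, PRICE_PER_MINUTE))
```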
Building Your Own
As mentioned earlier, 41% of those who responded to Bitmovin's survey said that they used an open source encoder like FFmpeg. What we don't know is how many use FFmpeg casually as compared to those who build and host their encoding farm using FFmpeg, usually in combination with packagers like Bento4 or MP4Box.
In my view, two types of companies should consider building their own encoding facilities. At the top end are companies like Netflix, YouTube, and others, for which the ability to encode at high quality, high capacity, or both delivers a clear, competitive advantage. These companies have innovated on the encoding front and need to continue doing so, and they can do that best if they control the entire pipeline.
At the other end are small companies with relatively straightforward needs, in which anyone with a little time on their hands can create a script for encoding and packaging files for distribution (see "How to Automate FFmpeg and Bento4 With Bash Scripts"). Otherwise, for high-volume and/or complex needs, you're almost always better off going with a commercial software program or cloud encoder.
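To show roughly how small such a script can be, here is a Python sketch that builds the command lines for an FFmpeg encode of each rung, followed by Bento4's mp4fragment and mp4dash for packaging. It only assembles the commands; it doesn't run them. The ladder values, file names, and the 48-frame keyframe interval (two seconds at an assumed 24 fps) are illustrative assumptions, and you should check every option against the versions of FFmpeg and Bento4 you have installed.

```python
# Hypothetical ladder as (height, video_kbps) pairs.
LADDER = [(360, 700), (540, 2000), (720, 4500)]


def build_commands(source, out_dir, ladder=LADDER):
    """Return the encode, fragment, and package commands as argument lists."""
    commands = []
    fragmented = []
    for height, kbps in ladder:
        encoded = f"{out_dir}/video_{height}p.mp4"
        frag = f"{out_dir}/video_{height}p_frag.mp4"
        commands.append([
            "ffmpeg", "-y", "-i", source,
            "-c:v", "libx264", "-b:v", f"{kbps}k",
            "-vf", f"scale=-2:{height}",
            # A fixed keyframe interval keeps segments aligned across rungs.
            "-g", "48", "-keyint_min", "48", "-sc_threshold", "0",
            "-c:a", "aac", "-b:a", "128k",
            encoded,
        ])
        # Bento4 packaging works on fragmented MP4 files.
        commands.append(["mp4fragment", encoded, frag])
        fragmented.append(frag)
    # One packaging pass over all rungs, writing DASH and HLS manifests.
    commands.append(["mp4dash", "--hls", "-o", f"{out_dir}/packaged", *fragmented])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("source.mp4", "out"):
        print(" ".join(cmd))
```

To actually run the pipeline, you could pass each list to `subprocess.run(cmd, check=True)`. A production script would usually encode the audio once rather than in every rung and would add error handling and logging.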