Hardware-Based Transcoding Solutions Roundup: Testing Performance
Table 3 contains the overall BD-Rate computation for PSNR (not VMAF), which shows that by this metric, Intel Quick Sync enjoyed a slight advantage over NVIDIA, again with x264 medium very close and x264 veryfast trailing significantly. For those who are not familiar with BD-Rate computation, reading the top line horizontally shows that Intel Quick Sync can produce the same quality as NVIDIA, x264 medium, and x264 veryfast, with a data rate reduction of 0.46%, 2.12%, and 16.07%, respectively. Positive numbers indicate that higher data rates will be needed to produce the same quality.
Table 3. H.264 BD-Rate computations for PSNR (not VMAF)
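To make the BD-Rate percentages more concrete, here is a minimal Python sketch that converts them into an equivalent data rate. The 3,000 kbps reference stream and the function name are assumptions chosen for illustration, and the savings are written as negative numbers to match the sign convention described above. BD-Rate is an average over the tested quality range, so the result at any single data rate is only an approximation.

```python
def equivalent_bitrate_kbps(reference_kbps: float, bd_rate_percent: float) -> float:
    """Data rate the test codec needs to match the reference codec's quality.

    bd_rate_percent is signed: negative means the test codec needs less data,
    positive means it needs more.
    """
    return reference_kbps * (1.0 + bd_rate_percent / 100.0)


# Savings quoted in the text for Intel Quick Sync, expressed as negative BD-Rates.
quick_sync_vs = {"NVIDIA": -0.46, "x264 medium": -2.12, "x264 veryfast": -16.07}

REFERENCE_KBPS = 3000  # hypothetical reference stream, not a figure from Table 3

equivalents = {
    codec: round(equivalent_bitrate_kbps(REFERENCE_KBPS, bd_rate), 1)
    for codec, bd_rate in quick_sync_vs.items()
}

for codec, kbps in equivalents.items():
    print(f"{codec} at {REFERENCE_KBPS} kbps ~ Intel Quick Sync at {kbps} kbps")
```

Read this way, the 16.07% figure means a 3,000 kbps x264 veryfast stream could be matched by Quick Sync at roughly 2,518 kbps, while the gap to NVIDIA amounts to only about 14 kbps.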
Figure 5 shows the overall subjective ratings gathered by Subjectify.us, with 218 participants choosing the higher quality of 3Mbps versions of the four test clips in round-robin comparisons. These results confirm the overall objective findings that show the two hardware codecs slightly ahead of x264 medium and significantly ahead of x264 veryfast.
Figure 5. Subjective H.264 results from Subjectify.us
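For readers unfamiliar with round-robin comparisons, the sketch below shows how four encoders pair off and how votes could be tallied into a simple win share. The votes are invented purely to exercise the code, and the win-share tally is an assumed simplification; the article does not describe how Subjectify.us actually computes its scores.

```python
from collections import Counter
from itertools import combinations

ENCODERS = ["Intel Quick Sync", "NVIDIA", "x264 medium", "x264 veryfast"]

# Every encoder meets every other encoder once per clip: 6 pairings for 4 encoders.
pairs = list(combinations(ENCODERS, 2))


def win_shares(votes):
    """Fraction of comparisons each encoder won.

    votes is an iterable of (winner, loser) tuples, one per viewer decision.
    """
    wins = Counter()
    appearances = Counter()
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {encoder: wins[encoder] / appearances[encoder] for encoder in appearances}


# Hypothetical votes, NOT data from the study.
hypothetical_votes = [
    ("NVIDIA", "x264 veryfast"),
    ("Intel Quick Sync", "x264 veryfast"),
    ("x264 medium", "x264 veryfast"),
    ("NVIDIA", "x264 medium"),
    ("Intel Quick Sync", "NVIDIA"),
    ("x264 medium", "Intel Quick Sync"),
]

shares = win_shares(hypothetical_votes)
print(len(pairs), "pairings per clip")
for encoder, share in shares.items():
    print(f"{encoder}: {share:.2f}")
```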
Note that there was a great deal of variation in the subjective results on a clip-by-clip basis. For example, NVIDIA enjoyed a significant advantage over Intel Quick Sync in the Football and Meridian test clips, which Intel Quick Sync reversed with a substantial lead in the GTAV clip, where both hardware codecs ranked behind the x264 medium clip. These roughly followed the objective scores, but not completely. If your content is gaming or animation, you should definitely run your own tests with your own videos.
Note that for these H.264 clips, we did not use any tuning mechanisms for the objective benchmarks, because according to Intel, there was no way to disable adaptive quantization or otherwise tune the Intel Quick Sync clips for VMAF or PSNR. As you’ll read, we did tune for the HEVC objective benchmarks and did not tune for the HEVC clips tested by Subjectify.us.
Overall, the quality difference among Intel Quick Sync, NVIDIA, and x264 medium wasn’t much of a differentiator. For publishers pushing huge stream counts, the data rate stability of the hardware codecs will prove very attractive. But if this doesn’t matter to you, it really comes down to the cheapest option.
Again, for HEVC, we tested Intel’s SVT-HEVC codec, NGCodec’s FPGA-based codec, and x265 using two presets: medium and veryfast. Although not technically a hardware-based codec, Intel’s SVT line of codecs has been designed to run extremely efficiently on Intel Xeon Scalable processors and Intel Xeon D processors. The SVT-HEVC codec has 10 presets, which delivered the performance and quality shown in Figure 6 for the Football clip encoded at 3Mbps. Intel recommended that we test using preset 6, so we did.
Figure 6. SVT-HEVC’s quality and performance by preset
The SVT-HEVC codec has three tuning modes—0 to optimize for visual quality, 1 to optimize for PSNR/SSIM, and 2 to optimize for VMAF. Note that the default is 1, so if you don’t specify tune 0 for your encode, you won’t get optimal visual quality. We used tune 0 for the subjective tests and tune 2 for both VMAF and PSNR. This yielded the following command line, which Intel provided (showing tune 0). For all HEVC encodes, we boosted the buffer size to twice the target data rate to provide a bit more wiggle room for encoding quality.
ffmpeg -SVTnew -i input.mp4 -c:v libsvt_hevc -tune 0 -rc 1 -preset 6 -b:v 5M -maxrate 5M -bufsize 10M -g 120 output.mp4
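The buffer rule described above (buffer size set to twice the target data rate) is easy to apply to other targets. The helper below is a hypothetical convenience, not part of Intel's script; it only assembles the three rate-control switches that appear in the command line above and assumes whole-number Mbps targets.

```python
def rate_control_args(target_mbps: int, buffer_multiple: int = 2) -> list[str]:
    """Return ffmpeg rate-control switches with bufsize = multiple x target rate."""
    target = f"{target_mbps}M"
    return [
        "-b:v", target,
        "-maxrate", target,
        "-bufsize", f"{target_mbps * buffer_multiple}M",
    ]


# The 5 Mbps target from the command line above, then the 3 Mbps subjective-test rate.
print(" ".join(rate_control_args(5)))
print(" ".join(rate_control_args(3)))
```

For the 5 Mbps target, this reproduces the -b:v 5M -maxrate 5M -bufsize 10M portion of the command.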
We tested the Intel SVT-HEVC encoder on a C5.9xlarge system equipped with an Intel Xeon Platinum 8000 series (Skylake-SP) processor that produced two simultaneous encodes of the full encoding ladder using preset 6 tune 0. Spot pricing on the system was $0.3466 per hour, yielding a cost per ladder of around $0.1733 per hour. On the same system, encoding with the x265 veryfast preset failed to produce a single encoding ladder at the requisite 55 fps. There are certainly larger computers that could produce our encoding ladder using the veryfast preset in real time, but they would definitely be pricey.
We used the script shown below for the x265 encodes, switching between the veryfast and medium presets and tuning for PSNR for the objective testing and not tuning for files produced for the subjective trials:
ffmpeg -re -i input.mp4 -c:v libx265 -preset veryfast -x265-params keyint=120:bitrate=5000:vbv-maxrate=5000k:vbv-bufsize=12000 -tune psnr -pix_fmt yuv420p output.mp4
NGCodec provided the script below for our testing. Its HEVC codec doesn’t have presets and automatically produces constant bitrate (CBR) streams. We used the -aq-mode 0 switch to disable adaptive quantization for our objective tests and removed the switch to prepare the 3Mbps files for subjective testing:
ffmpeg -y -re -i input.mp4 -c:v NGC265 -b:v 5M -g 0 -idr-period 120 -aq-mode 0 output.mp4
We tested on an FPGA-based cloud computer (AS-f1.2fx8c) hosted by Altered Silicon, which featured two FPGA cards and cost $2.21 per hour, including the NGCodec software. We were able to create one stream for the entire card, but NGCodec claims that by the time you read this article, you should be able to produce up to two complete ladders per FPGA, for a cost of about 54 cents per ladder per hour. If you consider the NGCodec system, you should verify this performance up front.
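The cost-per-ladder figures for both HEVC systems come from the same division: hourly price over the number of simultaneous ladders. The sketch below reruns that arithmetic with the prices quoted in this article. The function name is illustrative, and the NGCodec line assumes one FPGA per card and NGCodec's claimed two ladders per FPGA, which we did not verify.

```python
def cost_per_ladder(hourly_price_usd: float, simultaneous_ladders: int) -> float:
    """Hourly cost of one encoding ladder on an instance running several at once."""
    if simultaneous_ladders < 1:
        raise ValueError("the instance must produce at least one ladder")
    return hourly_price_usd / simultaneous_ladders


# SVT-HEVC: C5.9xlarge spot price, two simultaneous ladders (measured).
svt_hevc = cost_per_ladder(0.3466, 2)

# NGCodec: two FPGA cards x two claimed ladders per FPGA (assumes one FPGA per card).
ngcodec_claimed = cost_per_ladder(2.21, 2 * 2)

print(f"SVT-HEVC: ${svt_hevc:.4f} per ladder per hour")
print(f"NGCodec (claimed): ${ngcodec_claimed:.4f} per ladder per hour")
```

With these inputs, the NGCodec result is about 55 cents, close to the roughly 54 cents quoted above. If the claimed density does not hold, the per-ladder cost rises in direct proportion.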
Figure 7 shows the data rate variability of the four encodes, with NGCodec noticeably tighter than Intel and the two x265 encodes.
Figure 7. Data rate variability of the HEVC streams
Table 4 presents the numbers supporting these figures, with NGCodec showing a much lower standard deviation than any of the other three technologies, confirming the tighter pattern. At least in terms of the tightness of the data rate pattern, SVT-HEVC performs more like a software codec than a hardware codec.
Table 4. Stream variability of the HEVC encoded streams
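If you want to run a similar variability check on your own encodes, a minimal sketch follows. It assumes you have already extracted per-second data rates in kbps from each stream; the article does not specify the measurement interval behind Table 4, and both sample series are invented to illustrate a tight and a loose pattern around the same average.

```python
import statistics


def variability(per_second_kbps):
    """Summarize how tightly a stream holds its data rate."""
    mean = statistics.mean(per_second_kbps)
    stdev = statistics.pstdev(per_second_kbps)  # population standard deviation
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # standard deviation relative to the mean
        "peak": max(per_second_kbps),
    }


# Hypothetical samples, NOT measurements from our tests.
tight = variability([3000, 3010, 2990, 3005, 2995])
loose = variability([3000, 3600, 2400, 3300, 2700])

for name, stats in (("tight", tight), ("loose", loose)):
    print(f"{name}: mean={stats['mean']:.0f} kbps, stdev={stats['stdev']:.1f}, peak={stats['peak']}")
```

Both series average 3,000 kbps, but the standard deviation separates them clearly, which is the same distinction Table 4 draws between NGCodec and the other three encodes.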
Figure 8 shows the overall rate-distortion curve for the four clips using the VMAF metric, which has x265 medium in first place, followed by NGCodec, x265 veryfast, and SVT-HEVC in that order. Again, since our fastest test system couldn’t even produce a single x265 encoding ladder using the veryfast preset, the x265 medium stream isn’t a viable choice for most producers.
Figure 8. VMAF rate-distortion curve for the four HEVC codecs
Table 5 shows PSNR (not VMAF) BD-Rate computations for the four HEVC codecs in the same order. SVT-HEVC might have performed slightly better had we encoded with tune 1 (PSNR/SSIM) rather than tune 2 (VMAF), but the difference would most likely not have changed the distribution order. If you prefer PSNR over VMAF, you should run your test encodes using tune 1.
Table 5. HEVC BD-Rate computations for PSNR (not VMAF)
Figure 9 shows the average subjective results from Subjectify.us for all of the tested clips, which rated NGCodec the highest by a substantial margin, followed by x265 medium and then SVT-HEVC. Again, the results varied widely by the clip. For example, SVT-HEVC actually rated highest in the Meridian clip, followed very closely by NGCodec and Intel, with x265 medium tied for the lead with NGCodec in the DinnerScene clip.
Figure 9. Subjective HEVC results from Subjectify.us
In the HEVC trials, NGCodec delivered better quality than SVT-HEVC and a tighter distribution pattern, with a reasonable cost per ladder per hour, assuming NGCodec’s performance claims stand up. Intel’s SVT-HEVC technology is relatively new, so it will likely improve over time and is definitely worth checking out for video-on-demand testing since its performance is so tunable.
Overall, while subjective tests produced similar results to our objective benchmarks in the H.264 trials, they varied significantly for HEVC. Although I personally trust objective metrics for intra-codec configuration decisions like choosing a preset or keyframe interval, I’m less confident in the accuracy of objective metrics when comparing different encoders or codecs. Our Subjectify.us costs were well under $500 and well worth the expense. Note that Intel and NGCodec split this cost, which is greatly appreciated.
This series of tests represents our first extensive venture into hardware-based transcoding. We appreciate the assistance from all codec vendors, as well as Softvelum and Subjectify.us, and couldn’t have produced this article without it. However, given the sheer number of technologies, configurations, and datapoints measured, it’s likely (if not certain) that some errors exist, for which the author takes sole responsibility. Please check the online version of this article for comments and (sigh) corrections before making any technology decisions or starting your own test series.
[This article appears in the Autumn 2019 issue of Streaming Media Europe Magazine as "What About the Hardware?"]