How Does VVC Measure Up Right Now?
For this review, I took a first look at VVC (Versatile Video Coding). Specifically, I compared Fraunhofer HHI's implementation of the VVC codec (Versatile Video Encoder; VVenC) against the Alliance for Open Media's (AOMedia) aomenc codec and the x264 and x265 codecs in FFmpeg. I found VVenC to be simple to use and much faster than expected. Its output quality was the best in the review.
As a caveat, I should mention that as with all compression technologies, there will be multiple VVC codecs; it's unclear how VVenC will ultimately compare in terms of speed and quality. I bring this up to remind readers that I'm not comparing VVC, AV1, HEVC, and H.264 in this review; rather, I'm comparing specific implementations of these codecs.
We know from various Moscow State University reports that x265 isn't the best-performing HEVC codec, and my recent AV1 comparison shows that aomenc isn't the top AV1 codec or the fastest. Still, x265 and aomenc (and x264, for that matter) are among the most accessible codecs and the most likely to be used in actual commercial implementations.
With that boilerplate out of the way, let's get to the review.
Let's start with a quick description of VVenC. All standards-based technologies are implemented as a "test model," which in the case of VVC is called the VVC test model (VTM). According to a Fraunhofer HHI document titled "Open Optimized VVC Encoder (VVenC) and Decoder (VVdeC) Implementations," which was submitted to the relevant standards body (the Joint Video Experts Team), "The VVC test model (VTM) serves as a common reference implementation, i.e. a test bed for evaluation and verification of proposed technologies during standardization. While VTM used to be the only publicly available encoder and decoder implementation of the VVC standard, it is aimed at correctness, completeness and readability and should not serve as a real-world example of a VVC encoder and decoder" (emphasis added).
Most of the time, when researchers compare codecs, they use the test model, since this represents the ultimate possible quality, albeit in the slowest possible encoding time. While these tests are necessary for theoretical performance comparisons, they mean little to practitioners who actually use the codec, since encoding efficiency and, therefore, cost are often as important as quality.
Fraunhofer HHI's goal for VVenC was different. The same document says, "The Fraunhofer Versatile Video Encoder (VVenC) and Decoder (VVdeC) development was initiated to provide publicly available, fast and efficient VVC software implementations. The VVenC and VVdeC software is based on VTM, with optimizations including software redesign to mitigate performance bottlenecks, extensive single instruction, multiple data (SIMD) optimizations and basic multi-threading support to exploit parallelization. VVenC further contains improved encoder search algorithms and supports real-world encoder features, including frame-level rate control and perceptually optimized encoding."
Codec vs. Encoder
While VVenC is an encoder in the sense that it can input a raw YUV file and output a VVC-encoded file, its encoding features are nowhere near as complete as those of aomenc or FFmpeg. I tested the AV1 codecs (and most codecs) with the two-pass variable bitrate encoding technique used for the vast majority of OTT video-on-demand (VOD) encodes, but VVenC wasn't ready for that.
A Fraunhofer HHI staffer communicated to me, "VVenC currently only supports very basic 1-pass RC [rate control], which is experimental and so far suboptimal. Fixed QP [quantization parameters] encoding produces the best results in v0.1.0.1. The 1-pass RC also only takes care of hitting the target rate, no other constraints are checked (like max rate). We are currently working on a 2-pass RC implementation which we will be releasing soonish. The preliminary results are promising."
QP, as detailed in the blog post "Understanding Rate Control Modes (x264, x265, vpx)," "controls the amount of compression for every macroblock in a frame. Large values mean that there will be higher quantization, more compression, and lower quality. Lower values mean the opposite." Regarding using fixed QP encoding, which my Fraunhofer HHI contact was indicating, the post continues, "It is suggested not [to] use this mode! Setting a fixed QP means that the resulting bitrate will be varying strongly depending on each scene's complexity, and it will result in rather inefficient encodes for your input video."
I mention this to clarify what I'm actually testing here and how. In the case of the AV1 codecs, I was testing codecs/encoders producing ready-to-deploy bitstreams. Here, I'm testing the performance of a codec implementation using an incomplete encoder and an encoding technique that few producers will ever use in production. That's OK, obviously, since outside the test lab, there's nowhere to send VVC-encoded bitstreams to. But this is why I tested all codecs using QP encoding (or the equivalent).
Implementing Fixed QP Testing
When you test codecs using two-pass variable bitrate, you choose your bitrates and start encoding. In these cases, my preference is to find a range of bitrates that produce Video Multimethod Assessment Fusion (VMAF) values spanning from about 85 at the low end to about 95 at the high end, since this is the quality most producers seem to shoot for with the highest-quality stream in their encoding ladder. Of course, the data rate necessary to achieve this target will vary from clip to clip, which means you have to use different data rate targets for each clip.
To choose QP levels in this case, I decided to focus on x265 as the target and test how much aomenc and VVenC could reduce the bitrate while maintaining the same quality. I also included x264 as a point of reference. To implement this technique, for each test clip, I identified four QP values for x265 that produced VMAF scores ranging from about 83 VMAF points to 93. Then I tested at multiple QP values with the other codecs to find the QP values that achieved close to the same data rate.
This worked well except that the data rates for each rung didn't match as closely as they do when you're targeting a specific data rate. As you can see in Table 1, with VVC, x265, and x264, some or all of the rungs used consecutive QP values, so I had to use those values; there was no way to get closer to the target. This doesn't really matter from a codec-comparison perspective, as the rate-distortion curves and BD-Rate comparisons automatically take this into account. However, when I compared the files subjectively, the data rates didn't match as closely as I would have liked.
Table 1. Finding the QP values for the individual codecs
Note that I applied this targeting technique to all test clips except Crowd Run, which I added last as a torture test for the codecs (more on this clip later). Here, I chose a peak rate of around 9Mbps and logical downward steps to around 4Mbps, which produced VMAF values ranging from 59 to 79 for the HEVC encodes.
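The rung-matching step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the process I actually ran; the function name and every bitrate figure below are hypothetical placeholders, not measurements from this review.

```python
def closest_qp(target_kbps, measured):
    """Return the QP whose measured bitrate is nearest the target.

    measured maps each trial QP to the bitrate (kbps) it produced.
    """
    return min(measured, key=lambda qp: abs(measured[qp] - target_kbps))


# Hypothetical x265 ladder: the bitrate (kbps) of each of the four rungs.
x265_rungs_kbps = [4000, 3200, 2600, 2100]

# Hypothetical trial encodes for another codec: QP -> bitrate (kbps).
trial_kbps = {29: 4300, 30: 3850, 31: 3300, 32: 2900, 33: 2450, 34: 2150}

# Pick one QP per rung for the other codec.
matched = [closest_qp(rate, trial_kbps) for rate in x265_rungs_kbps]
print(matched)
```

Because QP moves in whole steps, the nearest available bitrate can still miss the target by several percent, which is the mismatch noted above.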
Encoding With VVenC
To make VVenC accessible to a range of users, Fraunhofer HHI adopted a two-tier approach. You can use the "standard encoder" via a simple command line and essentially adjust two items: preset and perceptual optimization. Or you can use an expert mode that's based on the VTM that you access via a configuration file called in a command-line argument. I happily chose the former.
There are four presets: faster, fast, medium, and slow. To identify the preset that most producers would use, I encoded two files to all four presets and measured encoding time, VMAF quality, and VMAF low-frame quality, which often varies significantly from preset to preset. Then I expressed each score as a percentage of the slow preset's result and created the chart shown in Figure 1.
To explain, the faster preset encodes in about 11.57% of the slow preset's encoding time while delivering 98.35% of its average VMAF value and 98.52% of its low-frame value. If you're a producer in a hurry, faster is for you. I thought that most producers, however, would swallow the 2.5x encoding time of the medium preset to achieve slightly higher quality, so I tested with that preset. Unless you're distributing in Netflix-level quantities, however, you're probably not going to opt for the slow preset.
Consider that in its release notes, Fraunhofer HHI documented VVenC's performance/quality trade-off to help its users choose the best preset for their encoding requirements, and the results roughly approximate those shown in Figure 1. I've never seen any codec vendor produce this type of obviously useful analysis, and this is just another positive indicator of Fraunhofer HHI's efforts to make VVenC highly usable and market-ready.
Figure 1. The Medium preset seemed like the one most producers would use.
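For readers who want to build a similar preset chart, here is a minimal Python sketch of the normalization, assuming the slow preset is the 100% reference. The function name and all of the timing and VMAF figures are hypothetical placeholders, not the numbers behind Figure 1.

```python
def percent_of_reference(values, reference="slow"):
    """Express each preset's value as a percentage of the reference preset."""
    ref = values[reference]
    return {name: round(100.0 * value / ref, 2) for name, value in values.items()}


# Hypothetical measurements for one clip (placeholders only).
encode_seconds = {"faster": 40, "fast": 60, "medium": 100, "slow": 320}
average_vmaf = {"faster": 90.0, "fast": 90.8, "medium": 91.5, "slow": 92.0}

time_pct = percent_of_reference(encode_seconds)
vmaf_pct = percent_of_reference(average_vmaf)

for preset in encode_seconds:
    print(preset, time_pct[preset], vmaf_pct[preset])
```

Plotting the time and quality percentages side by side for each preset makes the trade-off easy to see at a glance.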
Regarding perceptual quality optimization ("qpa" in the command string), this seems like the tuning mechanisms deployed in most codecs. Fraunhofer HHI recommended that I set this parameter to 0 when measuring for VMAF, so that's what I did in the command string. The only other performance-related option on the command line was the number of threads. Here, I went with four, which was Fraunhofer HHI's recommendation.
Here's the command string used to produce the Football clip at a QP value of 31:
vvencapp -i Football.yuv -s 1920x1080 -c yuv420 -r 30 --preset medium --qp 31 --qpa 0 -ip 64 -t 4 -o Football_vvc_qp31.266
The only funky item was the ip value, which is GOP size. Here, Fraunhofer HHI requires a multiple of 16, which meant 48 for 24 fps files, 64 for 30 fps files, and 128 for 60 fps files.
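The three GOP values above are consistent with rounding a roughly 2-second GOP up to the next multiple of 16 frames. That rule is my inference from the values, not something Fraunhofer HHI documented for me, so treat the following Python as a sketch under that assumption. The helper names are hypothetical; the flags are the ones in the command string above.

```python
import math


def gop_size(fps, seconds=2, multiple=16):
    """Round a GOP of about `seconds` up to the next multiple of `multiple` frames.

    The 2-second target and the round-up rule are assumptions inferred
    from the 48/64/128 values quoted in the text.
    """
    return math.ceil(fps * seconds / multiple) * multiple


def vvenc_args(source, fps, qp, size="1920x1080", threads=4):
    """Assemble the vvencapp argument list using the flags shown above."""
    stem = source.rsplit(".", 1)[0]
    return [
        "vvencapp", "-i", source, "-s", size, "-c", "yuv420",
        "-r", str(fps), "--preset", "medium", "--qp", str(qp),
        "--qpa", "0", "-ip", str(gop_size(fps)), "-t", str(threads),
        "-o", f"{stem}_vvc_qp{qp}.266",
    ]


for rate in (24, 30, 60):
    print(rate, gop_size(rate))

print(" ".join(vvenc_args("Football.yuv", 30, 31)))
```

Returning a list rather than a single string means the arguments could be handed to Python's subprocess.run without any shell quoting.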
Regarding AV1, x265, and x264, I used the same presets, tuning, and other configurations that I used in the AV1 comparison article, other than changing to a single-pass QP encode and changing to four threads. Please refer to that article for details on these command strings.
Here's the command string for aomenc (Football clip to QP=48):
aomenc.exe Football.y4m --width=1920 --height=1080 --fps=30000/1000 --passes=1 --auto-alt-ref=1 --row-mt=1 --lag-in-frames=25 --end-usage=q --cq-level=48 --threads=4 --cpu-used=3 --kf-min-dist=60 --kf-max-dist=60 --tile-columns=1 --tile-rows=0 -o Football_AV1_CQ48.mkv
Here's the command string for x265 (QP 32):
ffmpeg -y -i Football.mp4 -c:v libx265 -qp 32 -preset veryslow -threads 4 -tune ssim -x265-params keyint=60:min-keyint=60:scenecut=0:open-gop=0 Football_x265_qp32.mp4
Here's the command string for x264 (QP 32):
ffmpeg -y -i Football.mp4 -c:v libx264 -qp 32 -preset veryslow -threads 4 -g 60 -keyint_min 60 -sc_threshold 0 -tune psnr Football.mp4_x264_cq_32.mp4
For the record, I encoded with the Fraunhofer HHI vvencapp.exe version 0.1.0.1 and the aomenc v2.0.0 encoder. I encoded x264 and x265 using FFmpeg version git-2020-08-09-6e951d0. I produced all encodes on an HP Z840 workstation with two 3.1-GHz E5-2687 Xeon CPUs and 32GB of RAM running Windows 7 Professional.
I tested with five 10-second test clips representing a range of movies, sports, animation, and gaming content, with the torturous Crowd Run thrown in to measure pure compression performance. Here are the clips:
- Crowd Run—The well-known test clip of the start of a road race, encoded from 3.75Mbps to 9Mbps
- Elektra—A slow-motion, talking-head sequence from the Jennifer Garner movie, encoded from 200Kbps to 1Mbps
- EuroTruckSimulator2—A snippet from the challenging Twitch test clip, encoded between 2Mbps and 7Mbps
- Football—The Harmonic test clip of a college bowl game filmed at the Dallas Cowboys' stadium, encoded from 2Mbps to 4Mbps
- Sintel—A snippet from the well-known animation, encoded between 1,200Kbps and 2,800Kbps
When I detailed my test plan to Fraunhofer HHI, my contacts pointed out that the company was planning to add some encoding tools to the codec that would improve the performance when encoding computer-generated content. As you'll see, VVenC performed quite well on the simple animation and so-so on the computer-gaming clip. If you're interested in these market segments, stay tuned for updates from Fraunhofer HHI that should improve performance.
To test encoding time, I encoded two test clips, Crowd Run and EuroTruckSimulator2, to the Clip 3 parameters (see Table 1). I then encoded the 2-minute Football clip used in the decoding tests and averaged the time for all three encodes. Table 2 shows the results.
Table 2. Encoding times for three files totaling 2:20 duration
Regarding VVenC, Fraunhofer HHI's VVC codec proved only about 2x slower than HEVC, the technology it was designed to replace. For perspective, the requirements document for VVC states, "Encoding complexity of approximately 10 times or more than that of HEVC is acceptable for many applications." It's also worth noting that when AOMedia launched AV1, encoding times were 45,000x real time. Any way you look at it, in terms of encoding time, Fraunhofer HHI's VVC codec is way ahead of expectations.
The same requirements document defined the quality goal as a "bit rate reduction of between 30% and 50% with the same perceptual quality as HEVC Main Profile." In its press release for VVenC, Fraunhofer HHI claims that "H.266/VVC reduces the bit rate by 50% (relative to its predecessor H.265/High Efficiency Video Coding) while preserving visual quality."
How did VVC do? Table 3 shows BD-Rate computations for all five test clips. If you read the first line, you see that VVC produced the same overall quality as HEVC, with a bitrate reduction of 39%. I tested all 1080p files; you would expect better performance with 4K and 8K files, perhaps all the way up to the 50% mark.
Table 3. BD-Rate VMAF statistics
You also see that VVC produced the same quality as AV1 at an 11% bitrate reduction. Interestingly, for three of the clips, Crowd Run, Elektra, and EuroTruckSimulator2, AV1 and VVC were within 3 percentage points, with AV1 delivering higher quality in Crowd Run (-1.92%) and Elektra (-1.96%). VVC pulled away in the Football and Sintel clips, with BD-Rate advantages of -17.8% and -26.28%, respectively. It's tough to draw genre-specific conclusions from a single clip, but VVenC's performance with Sintel, combined with the advanced tools for computer-generated content that Fraunhofer HHI promised to deliver, bodes well for those who are encoding animated content.
As for aomenc versus x265, AV1 produced the same quality as HEVC, with a bitrate reduction of 28%. This compares to the 20.85% advantage that aomenc showed over x265 in my AV1 comparison. Figure 2 illustrates the rate-distortion curves, with AV1 and VVC competing in the relevant range between 85 and 94 VMAF points and showing a clear advantage over both x265 and x264.
Figure 2. Rate-distortion curve average for the five test clips
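A BD-Rate figure summarizes the horizontal gap between two rate-distortion curves like these: the average bitrate difference at equal quality over the range where the curves overlap. The Python below is a simplified sketch of that idea using piecewise-linear interpolation in the log-bitrate domain. It is not the classic Bjontegaard calculation, which fits a cubic polynomial, and it is not the tool that produced Table 3. The function names and sample points are hypothetical.

```python
import math


def interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the measured range")


def bd_rate(anchor, test, steps=1000):
    """Average bitrate difference (%) of test vs. anchor at equal quality.

    anchor and test are lists of (bitrate_kbps, vmaf) points.
    A negative result means the test codec needs less bitrate.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_log = [p[1] for p in a], [math.log(p[0]) for p in a]
    t_q, t_log = [p[1] for p in t], [math.log(p[0]) for p in t]

    # Only compare over the quality range covered by both curves.
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")

    # Midpoint sampling of the log-bitrate gap across the overlap.
    total = 0.0
    for i in range(steps):
        q = lo + (hi - lo) * (i + 0.5) / steps
        total += interp(q, t_q, t_log) - interp(q, a_q, a_log)
    return (math.exp(total / steps) - 1.0) * 100.0


# Hypothetical four-rung ladders: (bitrate in kbps, VMAF).
anchor_points = [(2100, 83.0), (2600, 87.0), (3200, 90.5), (4000, 93.0)]
test_points = [(1400, 83.5), (1750, 87.5), (2200, 91.0), (2800, 93.5)]
print(round(bd_rate(anchor_points, test_points), 2))
```

Because the comparison runs over the overlapping quality range, the rungs of the two codecs don't have to line up exactly, which is why the imperfect QP matching described earlier doesn't skew the result.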
Whenever measuring quality with objective metrics like VMAF, it's useful to dig beneath the score into the video file itself and examine individual frames in order to confirm or challenge the metric score. In this case, I compared the Clip 3 encodes for aomenc and VVenC in the Moscow State University Video Quality Measurement Tool (VQMT) to identify regions in the clip where quality diverged. To explain, Figure 3, the results plot from VQMT, shows the VMAF values over the duration of the two files, with the red plot representing VVC and the green, AV1. You see several regions of significant divergence, particularly around the 180th frame.
Figure 3. The VQMT results plot helps identify quality differences between two encoded files.
In VQMT, you can then open and compare the compressed frames to each other and the source frame, which I did for all similar points in all test clips. Although I usually found visual differences to confirm the scoring delta, they were often very subtle, even at 250%–400% magnification: a little lost detail here, a slight artifact there. The bottom line is that while I would expect VVenC to win in subjective comparisons, the advantage over AV1 would be relatively modest, as the BD-Rate figures predict.
How do these results compare to those of other studies? The closest comparison was presented in the white paper "The Video Codec Landscape in 2020," in which the authors compared VVC, HEVC, EVC, and AV1, using the test models for the standards-based codecs and libaom, the AV1 codec within FFmpeg, for AV1. The authors tested two scenarios: broadcast, with a 1-second GOP, and streaming, with a 2-second GOP. The latter was most similar to my tests.
In their HD comparisons, VVC produced the same quality as HEVC, at a 31.7% lower bitrate, compared to my 39%. AV1 produced the same quality as HEVC, with a bitrate reduction of 15.3%, compared to my 28%. So I'm definitely in the ballpark.
But wait, there's more. As I detailed in my blog post "Video Quality Metrics: One Number Doesn't Cut It," you have to look beyond the single VMAF score to truly gauge quality, since some codecs exhibit transient issues that degrade QoE, while others vary quality significantly over the duration of the video, which can also degrade QoE.
To measure this, I recorded the low-frame VMAF values for Clip 3 of all test clips, which is a measure of transient quality issues, and standard deviation, which is a measure of quality variability. Both datapoints are among those reported by VQMT when computing file VMAF.
Table 4 shows the overall scores for all clips. This data demonstrates that VVC produced about a 2-VMAF-point advantage in the tested clips and is more resistant to transient quality issues, but it produces slightly greater quality variability. Nothing earthshaking, but you never know until you test.
Table 4. Indicators of the consistency of output quality
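If your measurement tool exports per-frame VMAF scores, both indicators are easy to compute yourself. Here is a minimal Python sketch; the function name and sample scores are hypothetical, and I'm assuming a population standard deviation, which may differ slightly from what VQMT reports.

```python
import statistics


def consistency_stats(frame_vmaf):
    """Summarize per-frame VMAF scores for one encoded file."""
    return {
        "mean": statistics.mean(frame_vmaf),
        # The lowest-scoring frame flags transient quality drops.
        "low_frame": min(frame_vmaf),
        # Standard deviation reflects how much quality varies over time.
        "std_dev": statistics.pstdev(frame_vmaf),
    }


# Hypothetical per-frame scores (placeholders only).
scores = [90.0, 92.0, 94.0, 88.0, 96.0]
stats = consistency_stats(scores)
print(stats)
```

A file with a high mean but a very low minimum likely has a visible glitch somewhere, which a single average score would hide.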
The final tests involved decoding speed, in which I used Fraunhofer HHI's decoder, VVdeC, to decode the VVenC-encoded file and FFmpeg to decode the rest. The test file was the aforementioned 2-minute-long Football clip, a 1080p-30 file encoded to the Clip 3 parameters for each codec.
In all cases, I decoded to a YUV file stored on a RAM drive and measured CPU utilization in Windows Performance Monitor. My test computer was an HP ZBook Studio G3 notebook with an 8-core Intel Xeon E3-1505M CPU running Windows 10 on 32GB of RAM, 16 of which I allocated to the RAM disk.
Figure 4 shows the results. During decoding, the VVenC-encoded file was about 3.7 times the complexity of HEVC, which relegates it to the "hardware-decode" category. By this, I mean that unless these decoding requirements get a lot lower, most producers likely won't distribute VVC-encoded video until hardware-decoding support is available, since decode will either require too much CPU, consume too much battery life, or both. Of course, this is what we assumed about VVC from the start.
Figure 4. VVC consumes 80% to 100% of CPU on an 8-core HP notebook.
AV1 is much more lightweight to decode than VVC, and this is decoding with FFmpeg; there may be more efficient AV1 decoders available. This, plus the 49% encoding efficiency advantage over x264 shown in Table 3, is why larger streaming shops like Netflix, YouTube, and Facebook are starting to deploy AV1 for delivery to browser-based decoders and some mobile platforms.
Where does that leave us? From a strict technical perspective, the news is all good. VVenC proved very efficient during encoding and delivered about the quality expected. Although still most suitable for deployment to systems with hardware-decoding capabilities, of which, of course, there are few, VVC seems ahead of schedule in terms of deployment ease.
All that said, VVC as a technology suffers from a number of challenges relating to the post-HEVC hangover for royalty-based technologies. You can read up on these details in "How to Think About VVC." Long story short, until the VVC development community puts together a comprehensive and reasonable royalty structure, hardware deployments may be delayed. Fraunhofer HHI's implementation proves that VVC can be both useful and usable; let's hope that VVC IP owners can formulate a royalty policy that delivers the same.
(Note: Thanks to the Fraunhofer HHI team for making VVenC so easy to use and for providing prompt feedback during the course of this review. It was good to finally connect with you guys; I look forward to working with you as VVenC and VVdeC progress.)