Testing EVC, VVC, and LCEVC: How Do the Latest MPEG Codecs Stack Up?
OK, I'll say it. As far as I know, this is the first time any study has compared the quality and performance of codecs representing Essential Video Coding (EVC), Versatile Video Coding (VVC), and Low Complexity Enhancement Video Coding (LCEVC), as well as AV1, HEVC, and H.264. It's not as exhaustive as I would like, but the results should help you understand the goals for the three newer MPEG codecs and how they stack up against older codecs.
I'll start with a quick refresher on the three new MPEG codecs, including their IP picture and the expected performance envelope. Then I'll identify the other codecs that we tested for encoding time, encoding quality, and decoding performance. I'll review those results and send you on your way.
VVC, EVC, and LCEVC (Oh My!)
Table 1 shows basic data regarding the three MPEG codecs launched in 2020. For a complete discussion of what each codec was attempting to accomplish, please check this article. Here's the CliffsNotes version.
VVC is the logical successor to H.264 and HEVC, with dozens of companies contributing IP, aggressive quality targets (30% to 50% over HEVC), minimal concerns about encoding/decoding complexity (10x HEVC is OK), and minimal concerns about presenting an affordable and cohesive royalty picture until the very end of the development cycle when the Media Coding Industry Forum got involved.
EVC was MPEG's answer to AV1 and HEVC's calamitous royalty rollout. There are two profiles, Baseline and Main. The Baseline profile is supposed to replace H.264 with ~30% savings and a supposedly royalty-free posture. The Main profile is designed to replace HEVC, with ~30% savings and a clearer, potentially cheaper royalty policy than HEVC because only four companies contributed IP. In addition, the codec is modularly designed so that royalty-bearing tools can easily be switched off if the owner refuses to offer a fair royalty.
Table 1. About MPEG's class of 2020 codecs.
LCEVC was MPEG's attempt to go green, a codec that dared to be great without increasing your encoding times by 10x. As an enhancement codec, LCEVC deploys a lower resolution base layer of an existing codec (like x264) with an enhancement LCEVC layer. For example, in our tests, we configured the 1080p LCEVC stream with a 960x540 x265 base layer and an LCEVC layer expanding the output to 1080p. The resultant MP4 stream is presented so that an HEVC player incompatible with LCEVC will simply play the lower resolution HEVC stream, providing backward compatibility.
In terms of quality targets, LCEVC aspires to deliver better quality than a full-resolution version of the base-layer codec (e.g., x265 in these tests) and to come as close as possible to the next-generation MPEG codec (e.g., VVC). The encode/decode performance envelope shouldn't exceed that of the base-layer codec at full resolution. Completing the picture, codec company V-Nova owns most of the IP related to LCEVC and has already announced its royalty policy.
What did our tests show? Well, the Fraunhofer VVC codec delivered the targeted quality with much less encoding complexity than expected. Both of the EVC codecs met their quality targets as well, with the Baseline profile very efficient on the encoding side, the Main profile not so much. None of these codecs will play back in software anytime soon, so you'll have to wait for hardware support to deploy them.
LCEVC hit the trifecta, delivering better quality than full-rez x265 in 30% of the encoding time with the same or better playback efficiency. LCEVC wasn't the only codec that made x265 look bad; AV1 pulled ahead even further in quality while the MainConcept HEVC FFmpeg plug-in also outperformed x265 by just under 20%.
That's the TL;DR version; here's what I did and how I did it.
What I Tested
I started testing with the EVC reference encoder but later switched to the open-source eXtra-fast Essential Video Encoder (XEVE) (version 0.3.1) and eXtra-fast Essential Video Decoder (XEVD) (version 0.2.1), which delivered about 99% of the quality of the reference encoder with vastly faster encoding speeds and an improved decoding frame rate.
I first tested Fraunhofer's VVC code here. In this review, I tested version 1.2 of the VVenC encoder and VVdeC decoder. I first tested LCEVC here; for this article, I tested LCEVC using version 3.4.0 of the encoder supplied in an FFmpeg 4.4 build by V-Nova.
I tested x264, x265, and AV1 using FFmpeg version 2021-12-02-git-4a6aece703, downloaded from www.gyan.dev on December 2, 2021. Because x265 is recognized as a middle-of-the-road HEVC performer, I also tested version 2.0.0 of the MainConcept HEVC Encoder FFmpeg plugin.
I tested with five ten-second test clips representing a range of movies, sports, animation, and gaming content, with the torturous Crowd Run thrown in to measure pure compression performance. Here are the clips.
Crowd Run—the well-known test clip of the start of a road race.
Elektra—a low-motion, talking head sequence from the Jennifer Garner movie.
EuroTruckSimulator2—a snippet from the challenging Twitch test clip.
Football—the Harmonic test clip of a college bowl game filmed at the Dallas Cowboys stadium.
Sintel—a snippet from the well-known Blender animation.
How I Tested
I tested using fixed QP-based encoding, where you set the quality level in the command string with a QP or CRF value and the codec delivers a file at the data rate necessary to achieve that quality level. To compute BD-Rate stats, you need four reference points, so you choose four QP levels that produce the desired quality spread.
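For readers who want to see the arithmetic behind a BD-Rate figure, here is a minimal sketch. It is not the tool used for this review. Published BD-Rate implementations typically fit a cubic polynomial or piecewise-cubic curve through the four points; this version swaps in straight-line interpolation of log(bitrate) against quality to stay dependency-free. The function names and the sample numbers are hypothetical.

```python
import math


def _mean_log_rate(kbps, quality, low, high, steps=1000):
    """Average log(bitrate) over the quality range [low, high].

    Uses straight-line interpolation between measured points (a
    simplification of the usual cubic fit). Quality values must be distinct.
    """
    points = sorted(zip(quality, (math.log(r) for r in kbps)))

    def log_rate_at(q):
        for (q0, r0), (q1, r1) in zip(points, points[1:]):
            if q0 <= q <= q1:
                return r0 + (r1 - r0) * (q - q0) / (q1 - q0)
        raise ValueError("quality outside the measured range")

    # Midpoint-rule integration, then divide by the interval width.
    width = (high - low) / steps
    return sum(log_rate_at(low + (i + 0.5) * width) for i in range(steps)) / steps


def bd_rate(ref_kbps, ref_quality, test_kbps, test_quality):
    """Average bitrate change (%) of the test codec vs. the reference at
    equal quality. Negative means the test codec needs fewer bits."""
    # Compare only across the quality range that both curves cover.
    low = max(min(ref_quality), min(test_quality))
    high = min(max(ref_quality), max(test_quality))
    diff = (_mean_log_rate(test_kbps, test_quality, low, high)
            - _mean_log_rate(ref_kbps, ref_quality, low, high))
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical numbers for illustration only, not measurements from this review.
reference = ([2000, 3000, 4500, 6500], [80, 84, 87, 90])  # kbps, VMAF
candidate = ([1600, 2400, 3600, 5200], [80, 84, 87, 90])  # same VMAF, 20% fewer bits
print(round(bd_rate(*reference, *candidate), 2))
```

A candidate that needs 20% fewer bits at every quality point should come back as roughly -20%, which is a quick sanity check before feeding in real measurements.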
For this review, I chose x265 as the baseline and iteratively encoded four of the five test clips to find four QP values that delivered VMAF scores roughly between 80 and 90 VMAF points. Then I encoded iteratively with the other codecs to find QP values that roughly matched the x265 data rates.
With the last test clip, Crowd Run, I chose a peak rate of around 9Mbps and logical downward steps to around 4Mbps, which produced VMAF values ranging from 57 to 79 for the x265 encoder. Then I found QP values that delivered the same data rate range for the other codecs.
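The iterative hunt for a QP value that lands on a target data rate can be automated. The sketch below is one possible approach, not the procedure used for this review. It assumes bitrate falls steadily as QP rises, and it takes a caller-supplied `measure_kbps` function (hypothetical) that would run one encode and return the resulting bitrate. The default 0 to 51 range is also an assumption and differs by codec.

```python
def find_qp(measure_kbps, target_kbps, qp_min=0, qp_max=51):
    """Binary-search the integer QP whose bitrate is closest to the target.

    measure_kbps(qp) is a hypothetical callback that encodes at the given
    QP and returns the bitrate; bitrate is assumed to drop as QP rises.
    """
    low, high = qp_min, qp_max
    while high - low > 1:
        mid = (low + high) // 2
        if measure_kbps(mid) > target_kbps:
            low = mid    # still too many bits, so move to a higher QP
        else:
            high = mid   # at or under the target, so try a lower QP
    # Of the two neighbouring candidates, keep the one nearer the target.
    return min((low, high), key=lambda qp: abs(measure_kbps(qp) - target_kbps))


# Stand-in for a real encoder run, for illustration only.
def fake_encoder(qp):
    return 10000 - 150 * qp


print(find_qp(fake_encoder, 4000))
```

In practice each call to the callback is a full encode, so caching results per QP would avoid repeating the final two measurements.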
With the open-source EVC codec, XEVE, I tested using command strings provided by an EVC developer from Samsung, one of the four companies contributing intellectual property to the EVC project. The open-source encoder offers four presets: fast, medium, slow, and placebo. To choose the appropriate preset, I encoded two files using the same QP values and all four presets and then measured encoding time and VMAF quality, which you see presented as a percentage of the 100% score in Figure 1.
This shows that for the Baseline profile, there was little difference in encoding time from fast through slow and that quality increased from 98.7% of the maximum to 99.36%. Chasing the last 0.64% would triple the encoding time, which didn't seem worth it, so I used the Slow preset when encoding to the Baseline profile. A similar analysis and vastly longer encoding times led me to use the medium preset for the Main profile.
Figure 1. This analysis shows that the Slow preset is optimal for the Baseline EVC codec.
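The "percentage of the 100% score" normalization behind a chart like Figure 1 is simple to reproduce. Here is a minimal sketch, assuming you have one measurement per preset; the function name and the sample values are hypothetical and are not the numbers measured for this review.

```python
def as_percent_of_max(values):
    """Express each preset's value as a percentage of the largest value."""
    peak = max(values.values())
    return {preset: round(100.0 * value / peak, 2)
            for preset, value in values.items()}


# Hypothetical encoding times in seconds, for illustration only.
encode_seconds = {"fast": 40.0, "medium": 42.0, "slow": 45.0, "placebo": 135.0}
print(as_percent_of_max(encode_seconds))
```

Running the same function over the VMAF scores puts time and quality on a common 0 to 100 scale, which makes the point of diminishing returns easy to spot.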
This decision made, I used the following command line for the XEVE encodes, swapping in baseline for main for the Baseline encodes.
xeve_app.exe -i Football.yuv -w 1920 -h 1080 -q 29 -z 30 -I 64 -v 1 --frames 300 -o Football_main_29.evc -r Football_main_29_rec.yuv -v 3 --preset medium --profile main --threads 4
You set the QP level with the -q switch, set to 29 in the command string. As with many early-stage codecs, you have to set the I-frame interval (the -I switch) to a multiple of 16, which meant 64 for 30 fps files like the Football clip. Like many encoders, XEVE can produce a YUV output file from the encoded file during the encode cycle, which saves a step for quality testing. This is the YUV file specified with the -r switch. Completing the picture, -z is the frame rate, and -v sets the verbosity level of the messages coming back in the Command window.
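The multiple-of-16 constraint is easy to compute for other frame rates. This is a minimal sketch under two assumptions that the article does not state outright: that the target is a two-second GOP (inferred from the 60-frame interval used for the other encoders at 30 fps), and that you round up rather than down. The function name is hypothetical.

```python
import math


def keyframe_interval(fps, gop_seconds=2.0, multiple=16):
    """Smallest multiple of `multiple` that covers a GOP of gop_seconds.

    The two-second target and the round-up rule are assumptions.
    """
    frames = fps * gop_seconds
    return math.ceil(frames / multiple) * multiple


print(keyframe_interval(30))
```

For a 30 fps clip, the 60-frame target rounds up to 64, which matches the value in the command string above.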
No Tuning for Metrics
When I reviewed the Fraunhofer VVC encoder in late 2020, I tuned all encodes for VMAF processing. This time around, I didn't tune since few publishers tune for production encoding and because improvements to the VMAF metric should minimize and ultimately eliminate the differences between what looks good to the metric and what looks good to the human eye. This is a complex issue worthy of much longer discussion (see here). For the purposes of this study, note that this decision slightly decreased the scores for VVC, x264, and x265, but had little effect on the other codecs.
Here's the command string I used for the VVenC encoder; check the previous article for my rationale. The only change here was to enable "perceptually motivated QP adaptation," which is the default setting, as a consequence of the decision to discontinue tuning for metrics. This cost VVC about 0.5 VMAF points on average.
vvencapp -i Football.yuv -s 1920x1080 -c yuv420 -r 30 --preset medium --qp 28 --qpa 1 -ip 64 -t 4 -o Football_vvc_qp28.266
The command string for x264 was:
ffmpeg -y -i Football.mp4 -c:v libx264 -qp 32 -preset veryslow -threads 4 -g 60 -keyint_min 60 -sc_threshold 0 Football.mp4_x264_cq_32.mp4
The command string for x265 was:
ffmpeg -y -i Football.mp4 -c:v libx265 -qp 32 -preset veryslow -threads 4 -x265-params keyint=60:min-keyint=60:scenecut=0:open-gop=0 Football_x265_qp32.mp4
Both were unchanged from last time except for removing the tuning mechanism.
In the VVC article, I encoded with the Alliance for Open Media's standalone encoder; this time I went with libaom-AV1, the AV1 codec within FFmpeg. Here's the command string:
ffmpeg -y -i Football.mp4 -c:v libaom-av1 -b:v 0 -g 60 -keyint_min 60 -cpu-used 3 -auto-alt-ref 1 -threads 4 -tile-columns 1 -tile-rows 0 -row-mt 1 -lag-in-frames 25 -crf 41 Football_libaom_41.mkv
You can find the rationale behind this command string in this article.
Here's the command string recommended by MainConcept.
ffmpeg" -i Football.mp4 -c:v omx_enc_hevc -omx_core omxil_core.dll -omx_name
0000]:bit_rate_mode=4:rate_factor=41" Football_mc_hevc_qp41.mp4 -y
Finally, here's the command string for the LCEVC codec which uses the x265 codec as the base layer. V-Nova provided these parameters and recommended that I also test AV1 as the base layer, but I didn't have the time to incorporate that version.
ffmpeg -y -i "Football.mp4" -g 60 -c:v lcevc_hevc -base_encoder x265 -r
29.97 -s 1920x1080 -b:v 0k -eil_params
Testing with x265 as opposed to AV1 is significant because as an enhancement codec, LCEVC's performance is tied to the quality of the base layer codec. As expected, other than x264, x265 quality was the lowest of all tested codecs, which necessarily degraded the LCEVC scores as well. More on this during the quality analysis.
I tested encoding time on an HP workstation with a 3.4 GHz Intel i7-3770 CPU (four cores and eight threads with HTT enabled) and 16GB of RAM. The results shown in Table 2 are the combined times for two ten-second test files, along with encoding time compared to x265 and as a percentage of real-time.
Table 2. Encoding times for the tested codecs.
Note that the Fraunhofer VVC codec was about 2x the encoding time of x265, well below the 10x expected. LCEVC was about 0.3x the encoding time of x265 and the MainConcept HEVC codec was about 0.83x.
The XEVE encoder produced the Baseline EVC file in under 2x the H.264 encoding time, which was impressive. While the Main profile encoding time looks slow, remember that AV1 was about 45,000 times real time when we first tested it back in 2018, and look how far it has come: it now encodes at essentially the same speed as x265 using the veryslow preset.
For the record, the EVC reference encoder that I started off using produced the two Baseline files in 2:33:56 (yes, that's two hours, 33 minutes, and 56 seconds) and the Main files in 9:08:39. Once I discovered that the open-source EVC encoder could deliver virtually the same quality in a fraction of the time, the decision to switch to the open-source software became obvious.
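To put the reference-encoder times on the same real-time scale, divide the encoding time by the duration of the content, which was two ten-second clips or 20 seconds. A minimal sketch of that arithmetic follows; the function names are hypothetical.

```python
def hms_to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def realtime_multiple(encode_hms, content_seconds):
    """Encoding time expressed as a multiple of the content's duration."""
    return hms_to_seconds(encode_hms) / content_seconds


# Reference-encoder times quoted above; two ten-second clips = 20 seconds.
print(realtime_multiple("2:33:56", 20))  # Baseline
print(realtime_multiple("9:08:39", 20))  # Main
```

By this measure the reference encoder ran at roughly 460 times real time for Baseline and roughly 1,650 times for Main.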
Table 3 shows the BD-Rate results for the listed codecs. These tables make the most sense when you read them by rows. So, if you start on the x265 row, the x265 codec can produce the same quality as x264 at a 33.86% lower bitrate (green is good) but has to increase the bitrate by 24.44% to match the quality of the MainConcept encoder (pink is bad).
Table 3. BD-Rate comparisons for all tested codecs.
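Reading a BD-Rate table in the other direction takes a small conversion. If codec A must raise its bitrate by some percentage to match codec B, then B's saving relative to A is approximately the reciprocal relationship sketched below. This is an approximation, since a BD-Rate computed in the opposite direction will not always match it exactly, and the function name is hypothetical.

```python
def savings_from_increase(increase_pct):
    """Approximate bitrate saving (%) of codec B over codec A, given that
    A needs increase_pct more bitrate to match B's quality."""
    return (1.0 - 1.0 / (1.0 + increase_pct / 100.0)) * 100.0


# x265 needs 24.44% more bitrate to match the MainConcept encoder.
print(round(savings_from_increase(24.44), 2))
```

Applied to the 24.44% figure, this works out to a saving of just under 20% for the MainConcept encoder, in line with the summary earlier in the article.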
At the low end, as expected, the EVC Baseline codec produced the same quality as x264 at about a 30% bitrate reduction but was well behind both HEVC codecs and AV1. AV1's lead over x264 and x265 increased by about 10 points since the VVC comparison; much of the increase is because I didn't tune for metrics with x264 and x265, as I did last time. In the last review, I didn't tune for metrics with AV1 because it made little difference, and I didn't tune this time either.
The MainConcept codec handily outperformed x265 and, as mentioned, comes as an FFmpeg plug-in. You can pick up a version of the plug-in here. While it only costs $99 it comes with a big "for non-commercial use" requirement, so if you want to deploy the codec for production, you'll have to negotiate a fee with MainConcept.
LCEVC with x265 as a base layer was 22.47% more efficient than x265, which is impressive. At some point in the near term, we'll have to test whether using AV1 as a base layer would push performance beyond that of AV1, EVC Main, and VVC.
At the high end, all glory goes to VVenC, though the EVC Main codec was unexpectedly close. Again, VVenC's comparative performance would have been higher had we tuned for metrics as we did last time. Meanwhile, the EVC Main codec produced the same quality as x265 and the MainConcept HEVC encoder at bitrate reductions of 41.56% and 27.04%, meeting the quality goal.
Figure 2 presents the same results as a rate-distortion curve. You see x264, lonely on the bottom, followed by EVC Baseline in green and x265 in orange. Clearly, EVC Baseline is designed to supplant H.264, not battle it out with higher-end codecs.
Figure 2. Rate distortion curves for the low-end codecs.
At the top end, VVenC slightly outperforms both EVC Main and AV1, but the difference isn't meaningful. It's likely that some other factor, like decode performance, playback compatibility, or even licensing cost, would dictate your codec decision among these three.
During testing and analysis, I kept returning to the quality disparity between AV1 and the two HEVC versions, particularly x265. I checked several clips visually to confirm these findings and wanted to share Figure 3 with you. It's a split-screen view, Libaom-AV1 on the left, x265 on the right. It's always tough online and especially in print, but hopefully, you can see how the artificial turf encoded by Libaom-AV1 retains its integrity, while the x265-encoded grass is highly artifacted. This was an extreme case, but all subjective comparisons confirmed the scoring differential.
Figure 3. Libaom-AV1 preserved the visual integrity of the artificial turf, while x265 destroyed it.
While encoding time determines encoding cost, and output quality the bandwidth savings (or QoE improvements) delivered by a new codec, decoding performance determines where you can actually use the codec. Particularly for mobile distribution, if a phone or tablet can't play the video at the full frame rate without compromising battery life, producers won't deploy a new codec until hardware-based playback is available. This is what makes decoding performance so important.
I tested decode performance on the same machine as the encoder, storing the files in a RAM disk and decoding to a RAM disk as described here. I tested most decoders using a 2-minute version of the Harmonic Football test clip but tested the 10-second version for VVC and both versions of EVC.
I used the standard version of FFmpeg to decode H.264, H.265, and AV1 and custom versions of FFmpeg supplied by V-Nova and MainConcept for their respective codecs. I used Fraunhofer's VVdeC decoder for VVC, and the open-source XEVD decoder for EVC. Table 4 shows the frames per second achieved by all decoders.
Table 4. Frames per second achieved by each decoder.
As you can see in Figure 4, H.264, H.265, and particularly AV1 were very efficient in utilizing the available horsepower, as was Fraunhofer's VVdeC decoder, though its frame rate wasn't close. Even with 8 threads specified in the decoding script, the EVC open-source decoder wasn't able to fully utilize the available processing power, resulting in comparatively low frame rates.
Figure 4. Decode frame rate and CPU used by the decoder.
I asked V-Nova about their frame rate and CPU usage, which on their face looked disappointing. The response was illuminating. "In general, we never optimize for maximizing decoding frame rate beyond the real-time requirements, rather we optimize for power consumption. As long as decoder FPS is well beyond real-time on reasonable hardware, that figure becomes pretty irrelevant, while power is indeed important."
To check this, I measured the CPU used by forcing real-time playback in FFmpeg with the -re switch and monitoring CPU usage using the Windows application Performance Monitor. You see the results for H.264, HEVC, AV1, and LCEVC in Figure 5.
Figure 5. Though LCEVC's frame rate wasn't competitive, power consumption sure was.
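For anyone wanting to repeat the real-time playback test, here is a minimal sketch of how such a command might be assembled. The exact command used for this review isn't given, so this is an assumption: it pairs the -re switch mentioned above with FFmpeg's null muxer (`-f null -`), which decodes the file and discards the output. The function name is hypothetical.

```python
def realtime_decode_command(input_path, ffmpeg="ffmpeg"):
    """Build an FFmpeg argument list that decodes a file at its native
    frame rate and discards the result (an assumed test setup)."""
    return [
        ffmpeg,
        "-re",             # read the input at its native frame rate
        "-i", input_path,  # file to decode
        "-f", "null", "-", # decode fully, write nothing to disk
    ]


print(" ".join(realtime_decode_command("Football.mp4")))
```

You could pass the list to `subprocess.run` and watch the process in Performance Monitor or a similar tool while it plays.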
The results confirm V-Nova's claim. Overall CPU utilization was certainly less than AV1 and looks to be lower on average than HEVC as well. Unless you absolutely need 300-400 frames per second, LCEVC with HEVC as a base layer appears at least as efficient as HEVC played back in software, if not more so.
So where does that leave us?
Both EVC codecs delivered on their quality targets, and the Baseline codec showed impressive encoding speed. The Main codec delivered much higher quality but much slower encoding speeds, and both will need hardware to achieve full-frame-rate playback with reasonable power consumption.
VVenC encoded at roughly half the speed of x265, which was vastly improved from our previous tests, and delivered best-in-class quality, if only by a hair. Like the EVC codecs, it doesn't appear that VVC will play efficiently in software on mobile devices in the near term.
LCEVC with x265 as a base layer encoded in 30% of the time x265 required, delivered the same quality at about 78% of the data rate, and was a touch more efficient during decoding than straight-up HEVC. Again, it will be very interesting to see how LCEVC performance scales with even higher performing codecs as a base layer.
AV1 keeps getting more and more competitive from a quality and encoding time perspective, particularly compared to x265. Speaking of x265, I have no idea how much it costs to license the MainConcept codec, but at least in our QP-based testing, it shaved about 20% off the bandwidth of x265 at slightly faster encoding times. If you're an x265/FFmpeg user, buy or trial the MainConcept plug-in, compare using your existing parameters, and see if it makes sense to switch over.
(The author wishes to thank Samsung's Minsoo Park for much support and handholding relating to encoding with the EVC codecs tested. I also appreciate the help of Fraunhofer's Benjamin Bross (VVC), MainConcept's Thomas Kramer (HEVC), and Guendalina Cobianchi from V-Nova (LCEVC). The author would also like to thank Andrey Pozdnyakov from Elecard for a high-level tech read. All errors that remain are my sole responsibility.)
[Editor's note: This article has been updated to correct the resolution of the LCEVC base layer and to clarify LCEVC quality targets.]
This article first appeared in the 2022 Streaming Media Industry Sourcebook European Edition.