Back to Basics: H.264 Transcoding for Flash
• IDR interval. Something not typically considered in encoding but important from a keyframe perspective: Sonnati discussed the IDR interval, the distance between IDR frames. In H.264, not every I frame can serve as a seek point, because frames on either side of an ordinary I frame may reference one another across it. An IDR frame is an I frame that is not "crossed" by multi-frame referencing, so two consecutive IDR frames isolate a group of pictures (GoP), and only IDR frames should be considered true keyframes. In other words, an I frame that does not sit between P frames that reference one another can be used as a keyframe, while an I frame between P frames that reference each other cannot.
• B frames. Sonnati said B frames are most useful in static scenes. While some encoders can use up to 16 consecutive B frames, others allow no more than 3, so Sonnati suggested keeping the B frame maximum in the 1-3 range. In addition, the B-pyramid option allows the use of B frames as reference frames (effectively as quasi P frames, in that H.264 uses all three frame types: I, P, and B).
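To make the GoP structure concrete, here is a small sketch (in Python; the function and parameter names are illustrative, not from any encoder API) that lays out the frame types in one closed GoP given a keyframe interval and a maximum B-frame run:

```python
def gop_pattern(gop_size: int, max_bframes: int) -> list[str]:
    """Lay out frame types for one closed GoP.

    The GoP opens with an IDR frame; the remaining slots are filled
    with runs of up to `max_bframes` B frames, each run closed by the
    P frame the B frames are predicted from.
    """
    frames = ["IDR"]
    while len(frames) < gop_size:
        # Leave room for the P frame that must close each B-frame run.
        run = min(max_bframes, gop_size - len(frames) - 1)
        frames.extend(["B"] * run)
        frames.append("P")
    return frames

# A 12-frame GoP with up to 2 consecutive B frames:
print(" ".join(gop_pattern(12, 2)))  # IDR B B P B B P B B P B P
```

With `max_bframes=0` the sketch degenerates to the classic IDR-then-all-P layout, which is the setting Sonnati's 1-3 range is bounding from above.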
Complex motion estimation and motion compensation modes boost quality, but they increase the complexity of the encode and take significantly longer to run.
• Encoding rates. While the level of motion influences the level of H.264 compression, the rule of thumb is that:
—1080p generic video requires 3-4 Mbps
—720p generic video requires 1.5-2 Mbps
—480p generic video requires 800 Kbps-1 Mbps
Alternatively, rather than a static absolute encoding rate, Sonnati suggested considering a content-adaptive resolution/bandwidth mix:
—Medium motion (TV series and news) should use the baseline target bitrate and resolution
—Low motion (talking heads, interviews) should use a lower bitrate or a higher resolution
—High motion (sports, music videos, action movies) should use a higher bitrate or a lower resolution
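The rule-of-thumb ranges and the content-adaptive adjustment can be combined into a simple selection routine. The ranges below are the article's figures; the function name and motion classes are our illustrative assumptions:

```python
# Rule-of-thumb bitrate ranges (kbps) per resolution, from the article.
BITRATE_RANGES = {
    "1080p": (3000, 4000),
    "720p": (1500, 2000),
    "480p": (800, 1000),
}

def target_bitrate(resolution: str, motion: str) -> int:
    """Pick a target bitrate in kbps: low motion sits at the bottom
    of the range, medium in the middle, high motion at the top."""
    low, high = BITRATE_RANGES[resolution]
    position = {"low": 0.0, "medium": 0.5, "high": 1.0}[motion]
    return int(low + position * (high - low))

print(target_bitrate("1080p", "medium"))  # 3500
print(target_bitrate("720p", "high"))     # 2000
```

The other half of Sonnati's advice, trading resolution instead of bitrate, would mean dropping to the next resolution tier for high-motion content when the bandwidth budget is fixed.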
• Anamorphic. Anamorphic encoding can save 20-25% of bandwidth, for example by using 1440x1080 instead of the full 1920x1080 (the same trick HDV uses) and then stretching the width from 1440 back to 1920 at playback. The same is true for 720p, which can be encoded at 1024x720 rather than 1280x720.
The perceived loss in quality caused by the use of a lower resolution is lower than the perceived loss of quality caused by higher quantization.
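The bandwidth figures follow directly from the pixel counts; a quick check (the function name is ours):

```python
def anamorphic_savings(coded_width: int, display_width: int) -> float:
    """Fraction of pixels (and, roughly, of bandwidth) saved by
    encoding a narrower frame and stretching it back at playback."""
    return 1 - coded_width / display_width

# 1440x1080 stretched to 1920x1080: 25% fewer pixels to encode
print(anamorphic_savings(1440, 1920))  # 0.25
# 1024x720 stretched to 1280x720: 20% fewer
print(anamorphic_savings(1024, 1280))  # 0.2
```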
• Pre-processing. Sonnati suggested pre-processing source video since, even in HD content, video noise is frequently present, given the types of compression used for acquisition/capture. He recommended the use of temporal or 3D noise filters.
Interlacing is a separate concern. HD progressive content has none, and Sonnati says it therefore looks better even for SD playback; SD and 1080i source content, by contrast, is plagued by interlacing.
"It is very important to properly deinterlace such content," said Sonnati, "with the most professional filters like a 'motion compensated adaptive deinterlace filter' which deinterlaces but preserves details and vertical resolution."
Additionally, for standard-definition sources when only a simple de-interlacer is available, Sonnati suggested encoding at half the vertical resolution, which discards the second field and with it the interlacing. For instance, 720x576 could be encoded at 720x288 and interpolated back to 720x576 at playback.
Speaking of playback, Sonnati said the "video.smoothing=true" property can be used to play back content whose encoded resolution differs from the intended playback resolution, such as the anamorphic example above.
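In ActionScript 3, for example, that looks roughly like the following sketch (the `netStream` variable and the sizes are illustrative, assuming a 1440x1080 anamorphic encode):

```actionscript
var video:Video = new Video();
video.smoothing = true;           // interpolate when the frame is scaled
video.attachNetStream(netStream); // netStream: an existing NetStream
// Stretch the 1440-wide anamorphic frame back to full 1080p width:
video.width = 1920;
video.height = 1080;
addChild(video);
```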
Sonnati also suggested considering "detail-restoring filters," implemented using the standard filter objects in Flash Player 8 or, in later versions, the more advanced Pixel Bender. He stressed, though, that this restores content only perceptually and is rather processor-intensive during playback.
Sonnati also went into FMS 3.5 Dynamic Bandwidth streaming, which we covered in a previous article during the mid-November Adobe MAX San Francisco event.