Webcasting With Windows Media
Cropping—Include only the image you actually want seen, and leave out anything around the edges of the frame that isn’t picture. Stray edges are common with analog sources, where vertical and horizontal blanking can leave black borders. A VHS source will often have "tearing" at the bottom of the screen, with dramatic distortion of the image. The live encode should be cropped in to exclude that non-image content.
The other time you want to crop is with a letterboxed or windowboxed source. Those big black rectangles use some bits and, more importantly, a lot of CPU time as the codec looks for something in them to encode. Better to crop them out and let all the horsepower apply to the parts of the content that matters.
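As a rough illustration of the letterbox case, the crop needed to remove the black bars can be computed from the frame size and the active image's aspect ratio. The function name and dimensions below are hypothetical examples, not from the article, and the sketch assumes square pixels and a full-width active image:

```python
def letterbox_bar_height(frame_width, frame_height, active_aspect):
    """Pixels of black bar at the top and bottom of a letterboxed
    frame, assuming square pixels and a full-width active image."""
    active_height = round(frame_width / active_aspect)
    return (frame_height - active_height) // 2

# A 16:9 image letterboxed inside a 640x480 frame:
print(letterbox_bar_height(640, 480, 16 / 9))  # crop 60 pixels top and bottom
```

Cropping those 120 rows out leaves the codec (and your CPU budget) working only on the 640x360 that actually contains picture.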
The mainstream live encoding codec in Windows Media today is Windows Media Video 9, which is widely compatible with existing, installed players. While older codecs are available, they don’t offer the compression efficiency and performance tuning of WMV 9. If interlaced streaming is required, WMV 9 Advanced Profile should be used. Note, however, that decent performance with interlaced encoding requires a high-end system, like one with a Tarari board or an Inlet Spinnaker unit.
The mainstream audio codec in Windows Media today is Windows Media Audio 9.2. Backward compatible with players from the ’90s, it offers better quality at lower bit rates than MP3.
For very low bit rate voice-centric audio, the WMA 9 Voice codec can provide better quality than WMA below 32Kbps. And despite its name, in "mixed" mode WMA 9 Voice can provide decent reproduction of music cues (although you’d certainly never dance to it).
When efficiency is critical, the new WMA 10 Professional codec’s modes below 128Kbps can provide exceptional efficiency—at 32Kbps it matches WMA at 64Kbps. However, users need to have Windows Media Player 11 to get full fidelity out of it.
You should note that only WMA is supported in Silverlight 1.0. Professional and Voice are in consideration for future versions.
Data rate measures how many bits per second the video uses. The practical ceiling is set by the smallest pipe in the chain between the encoder and the final viewer, whether between encoder and server or between server and user. And it should be chosen by predicting the worst-case scenario—video and audio will be lost with even a short dip in available bandwidth. Since webcasting is a real-time broadcast, there isn’t any provisioning for recovering after the fact.
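The smallest-pipe reasoning can be sketched as a simple calculation. The 20% headroom figure and the function name here are assumptions for illustration, not recommendations from the article:

```python
def stream_bitrate_kbps(encoder_to_server_kbps, server_to_viewer_kbps):
    """Pick a stream rate from the smallest pipe in the chain,
    leaving 20% headroom for short dips in available bandwidth.
    The 20% margin is an illustrative assumption."""
    return min(encoder_to_server_kbps, server_to_viewer_kbps) * 8 // 10

# A 1Mbps encoder uplink feeding viewers on 384Kbps connections:
print(stream_bitrate_kbps(1000, 384))  # 307
```

The point is that the viewer's connection, not the encoder's uplink, usually ends up being the constraint.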
Windows Media also includes Intelligent Streaming technology, which allows a single encode to include video and audio at different data rates [see sidebar, "Intelligent Streaming," on page 4 of this article].
Frame rate determines the smoothness of the motion of the video. In webcasting, normally either the rate of the source or half that rate is used (so 29.97 for NTSC, or 14.98 for lower bit rates from NTSC). Cutting the frame rate in half reduces CPU requirements by nearly half and reduces required bit rate by a fair amount as well. But motion is much smoother at the source frame rate, which should be used when there are enough bits available. I prefer to get my frame rate up to 29.97 before going to more than 320x240 resolution.
Complexity is often a forgotten control in encoding, but it’s a critical one. Complexity controls the tradeoff between the speed and quality of the encode—each step up is roughly two times slower, but can provide better quality at a given data rate. By default, Windows Media Encoder uses a complexity of 1, which was a reasonable default when it first shipped. But for simple 320x240 encoding, today’s dual-core laptops are several times more powerful than the biggest encoding box available back then. So a typical machine today should be able to handle Complexity 3 instead of 1 for 320x240 at 29.97 fps. As frame size and frame rate go up, complexity needs to come down on the same hardware.
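The roughly-two-times-per-step rule means the jump from the old default of Complexity 1 to Complexity 3 costs about four times the CPU, which is why it only became practical as hardware improved. A minimal sketch of that arithmetic (real scaling varies by content and codec build):

```python
def relative_encode_time(complexity):
    """Very rough relative CPU cost, using the rule of thumb that
    each complexity step roughly doubles encode time. Illustrative
    only; actual scaling depends on content and hardware."""
    return 2 ** complexity

# Jumping from the old default of 1 to 3 costs roughly 4x the CPU:
print(relative_encode_time(3) // relative_encode_time(1))  # 4
```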
The buffer size is the window over which the data rate is averaged. For example, encoding at 200Kbps with a two-second buffer means that any two seconds of the stream can use at most 400Kb, while a 10-second buffer means any 10 seconds can use at most 2000Kb. But within that 2000Kb, there’s much more flexibility to reduce bit rate in the easy parts of the video and increase it in the busier parts, delivering more consistent quality. The big drawback to bigger buffers is that they can increase the time it takes for the viewer to start watching the stream. An eight-second buffer is a reasonable default for most content.
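The buffer arithmetic above is simple to state directly: the cap on bits spent over any window is just the average rate times the window length. The function name is illustrative:

```python
def max_kilobits_in_window(bitrate_kbps, window_seconds):
    """Upper bound on kilobits the encoder may spend across any
    window of window_seconds at the given average bit rate."""
    return bitrate_kbps * window_seconds

# The 200Kbps example from the text:
print(max_kilobits_in_window(200, 2))   # 400 Kb over any 2 seconds
print(max_kilobits_in_window(200, 10))  # 2000 Kb over any 10 seconds
```

The longer window doesn't raise the average, but it lets far more bits shift from easy scenes to hard ones.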
Generally speaking, your keyframe interval should be at least as long as your buffer, since both control latency and random access timing. Keyframes that come too frequently hurt compression efficiency and can cause visible keyframe "pulsing."
I talked about the registry keys available in version 11 of our codec back in the November/December issue of Streaming Media [pp. 28-32]. I won’t belabor all that here, but what follows is a quick refresher about keys appropriate to webcasting.
Lookahead—The Lookahead parameter tells the codec to store a specific number of frames in memory before deciding how to process them. This enables the codec to get a peek at what happens in the future. Lookahead enables flash and fade compensation and better keyframe insertion, improving quality around those kinds of content. The full Lookahead of 16 is recommended unless very low latency is needed [see sidebar, "Low Latency Streaming," on page 4 of this article].
B-Frames—A B-frame is a bidirectional frame, meaning that it can reference both the frame before and the frame after it, and hence is more efficient to encode. Turning on B-frames also allows flash/strobe compensation, where flash frames get turned into "BI" frames, which are intra-coded frames that no other frame references. This means the frame after the flash can be based on the frame before the flash, so a keyframe doesn’t have to be inserted after every flash. This dramatically improves quality for that part of the video.
I recommend setting the codec to 1 B-frame to get those features; 1 is the optimum number for the majority of content.
Threads—The WMV 9 codec supports up to four threads for encoding. These are implemented as vertical slices, so a 512x384 frame encoded with four threads will be encoded as four 512x96 slices. This means that vertical motion crossing a slice boundary can be a little less efficient. For the vast majority of content, this is a great tradeoff—getting the multithreading in there means that a higher-complexity encode can be done, which really helps the quality. But if there’s content with lots of vertical motion (a pogo-stick championship, perhaps?), using fewer threads can help quality somewhat. A good rule of thumb is to have at least 64 pixels of height per slice. Here are my recommendations:
• Height <128: Use 1 thread
• Height <256: Use 1 or 2 threads
• Height 256 or higher: Use 1, 2, or 4 threads
A fast computer might be able to encode at Complexity 4 in single-thread mode for frame sizes of 320x240 or lower.
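The 64-pixels-per-slice rule of thumb can be written as a small helper. The function name is hypothetical, and only the thread counts the codec supports (1, 2, or 4) are considered:

```python
def recommended_threads(frame_height):
    """Pick an encoder thread count that keeps at least 64 pixels
    of height per vertical slice, per the rule of thumb above.
    The WMV 9 codec supports 1, 2, or 4 encoding threads."""
    for threads in (4, 2, 1):
        if frame_height // threads >= 64:
            return threads
    return 1

print(recommended_threads(120))  # 1 (height under 128)
print(recommended_threads(240))  # 2
print(recommended_threads(480))  # 4
```

Remember that fewer threads means less multithreading headroom, so a lower complexity setting may be needed to keep up in real time.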
Noise Reduction—The noise reduction parameter is just what it sounds like: a simple filter to reduce noise in the source. Ideally, only clean source will be used, but shooting in low light or capturing from analog connections, particularly composite, can get pretty noisy. Five levels of reduction are supported; use the lowest one that makes the video clean enough. Too high a level will make the video distractingly soft.