Hitchhiker’s Guide to Streaming Media: Container Formats
Container formats are one of the most misunderstood aspects of streaming video and audio technology. Here's a basic explanation.
Mon., Feb. 16, by Dom Robinson

ASF, WMV, RTSP, RTP, RTMP, MPEG-TS, MPEG-PS . . . WTF are these acronyms all about, eh?

A “format” is a way to convert information into binary data and back. Container formats can hold a variety of types of data that have been compressed using a compression algorithm known as a codec (COmpressor/DECompressor, also read as COder/DECoder). Some containers hold a single “elementary stream” such as a video image or a sound track, while other containers interleave multiple elementary streams (including metadata and so on) into a single container.
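
To make the idea of interleaving concrete, here is a minimal Python sketch of a toy container. The packet layout (stream id, timestamp, payload length, payload) and the names mux and demux are invented for illustration; no real container format is laid out exactly like this.

```python
import struct
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy layout, not any real container format:
# 1-byte stream id + 4-byte timestamp (ms) + 2-byte payload length, then payload.
PACKET_HEADER = struct.Struct(">BIH")

Packet = Tuple[int, int, bytes]  # (stream_id, timestamp_ms, payload)


def mux(packets: Iterable[Packet]) -> bytes:
    """Interleave packets from several elementary streams in timestamp order."""
    out = bytearray()
    for stream_id, ts, payload in sorted(packets, key=lambda p: (p[1], p[0])):
        out += PACKET_HEADER.pack(stream_id, ts, len(payload))
        out += payload
    return bytes(out)


def demux(blob: bytes) -> Dict[int, List[Tuple[int, bytes]]]:
    """Split a muxed blob back into per-stream lists of (timestamp, payload)."""
    streams: Dict[int, List[Tuple[int, bytes]]] = {}
    offset = 0
    while offset < len(blob):
        stream_id, ts, length = PACKET_HEADER.unpack_from(blob, offset)
        offset += PACKET_HEADER.size
        streams.setdefault(stream_id, []).append((ts, blob[offset:offset + length]))
        offset += length
    return streams


VIDEO, AUDIO = 1, 2
container = mux([
    (VIDEO, 0, b"video-frame-0"),
    (VIDEO, 40, b"video-frame-1"),
    (AUDIO, 0, b"audio-chunk-0"),
    (AUDIO, 20, b"audio-chunk-1"),
])
streams = demux(container)
```

The payloads here stand in for data a codec has already compressed; the container neither knows nor cares how they were encoded, which is the distinction the paragraph above is drawing.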

Good examples of containers common in the streaming media world are 3GP, ASF, AVI, and WAV. Note that WMV and MP3 are file types that identify the file as being compressed using specific codecs (Windows Media Video and MPEG-1/2 Audio Layer III, respectively), so it’s easy to be confused.
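
One way to see that a container is more than a file extension is to look at the first few bytes of the file. The following Python sketch guesses the container from well-known signatures; the function name sniff_container is made up for this example, the 3GP check assumes a brand beginning with "3gp", and a real tool would check far more than this.

```python
from typing import Optional

# First 16 bytes of an ASF file: the ASF Header Object GUID.
ASF_HEADER_GUID = bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c")


def sniff_container(head: bytes) -> Optional[str]:
    """Guess the container from the first bytes of a file (pass at least 16)."""
    if head[:3] == b"FLV":
        return "FLV"
    if head[:16] == ASF_HEADER_GUID:
        return "ASF"  # the same container sits behind .asf, .wmv and .wma files
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "AVI"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "WAV"
    # Assumption: a 3GP file starts with an "ftyp" box whose brand begins "3gp".
    if head[4:8] == b"ftyp" and head[8:11] == b"3gp":
        return "3GP"
    return None
```

Note that a file named .wmv comes back as ASF: the extension advertises the codec, while the bytes on disk identify the container.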

FLV is an interesting one. Most people recognize it as Flash Video, but although it looks like the counterpart of WMV in the Windows Media world, FLV actually defines the Flash Video container format, not a codec. As such, FLV is really akin to ASF, with VP6 or H.264 being comparable to WMV.

RTSP is a good protocol to look at, since streaming files in “containers” requires a streaming “transport,” and RTSP fills that role by sitting on top of TCP or UDP. (Strictly speaking, in OSI terms RTSP is an application-layer control protocol, and the media packets themselves are usually carried by RTP.) Many servers support RTSP, as it is a well-established standard. Windows Media Services has a good implementation. Data compressed with codecs and held in ASF containers can be transported by the Windows Media Services implementation of RTSP. So the layers in Windows Media Services can be represented as:

WMV/ASF/RTSP/(TCP or UDP)/IP then off onto the network

Likewise, on the Flash server platforms, RTMP (Adobe’s “version of RTSP”) plays the transport role. Flash then layers FLV containers on top and currently has codec options including VP6, Sorenson Spark, and H.264. Its stack would look like this:

VP6/FLV/RTMP/TCP/IP then off onto network

Note that RTMP runs over TCP only; it does not support UDP.
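
The two stacks can be pictured as nested envelopes, each layer wrapping the one before it. The Python sketch below is purely illustrative: the layer names come from the stacks described here (UDP is chosen arbitrarily for the Windows Media case), the encapsulate function is hypothetical, and real protocols add binary headers and may fragment the payload rather than wrapping a string in labels.

```python
from typing import List

# Layer names taken from the two stacks in the text, innermost (codec) first.
WINDOWS_MEDIA_STACK = ["WMV", "ASF", "RTSP", "UDP", "IP"]
FLASH_STACK = ["VP6", "FLV", "RTMP", "TCP", "IP"]


def encapsulate(payload: str, stack: List[str]) -> str:
    """Wrap the payload in one labelled envelope per layer, codec first."""
    wrapped = payload
    for layer in stack:
        wrapped = f"{layer}[{wrapped}]"
    return wrapped


print(encapsulate("frame", FLASH_STACK))
# prints: IP[TCP[RTMP[FLV[VP6[frame]]]]]
```

Reading the output from the outside in gives the order in which a receiver would unwrap the data: network, transport, container, and finally codec.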

So writing the container format onto the transport layer is the critical task that a server must perform if it is to call itself a true streaming media server.

Container formats are also independent of transports, and compressed data stored in containers is often used for archiving and for providing on-demand access. Most non-streaming transports (typically called “store and forward,” e.g. HTTP or FTP) can happily carry any type of archived data; the containers define the individual items of audio and video content, which enables these transports to move the content from storage location to storage location.

Here are some “expert footnotes” from Kon Wilms, who is the chief architect at Streaming Media Hosting:
• You can transmit non-streaming content over a streaming transport via a custom container (most transports allow for this, e.g. DSM-CC sections in MPEG2-TS streams allow for raw PID encapsulation). RTP and RTSP are capable of encapsulating this content; in this case the onus is on the decoder to know how to decode the data correctly.
• When you cannot transmit RTSP over UDP and it is required (e.g. satellite or digital terrestrial IP transmission), then the data flow is: video transport -> RTP -> end-to-end connection -> edge device RTP decode -> RTSP re-encapsulation and retransmission over the TCP/IP-based edge network.
• Unicast streaming content can be encapsulated in a transport making use of FEC, carousels, or reliable multicast, with final transmission in the IP realm over RTP or straight UDP/IP with custom packet headers. A backchannel from the edge device is usually required, though (depending on the source content).
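
For readers unfamiliar with FEC (forward error correction), mentioned in the last footnote, here is a minimal Python sketch of its simplest form, a single XOR parity packet per group. This is not necessarily the scheme any particular transport uses; the function names are invented, and the sketch assumes all packets in a group have the same length.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet as the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild at most one missing packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single-parity FEC can repair only one lost packet")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the surviving packets and the parity yields the lost packet.
        repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


group = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(group)
restored = recover([b"AAAA", None, b"CCCC"], parity)
```

Because the receiver can repair a single loss per group on its own, it does not have to ask for a retransmission, which is why FEC suits one-way links such as satellite.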