DIY: Rudimentary Video Streaming
Last year I penned an article on the most basic form of streaming I could think of that could easily be accessed by a non-engineer who was able to at least access a Linux command line: rudimentary audio streaming.
While most of us in the business of streaming are technically aware and understand the concepts behind what we do, it is usually only the developers and engineers who "make" the computer stream. Business developers and commercial members of the team typically just use software that someone else has already written, and their actual interaction is limited to pressing a big green/red button.
These DIY articles are all about empowering non-developers and non-engineers with a dangerous amount of practical technical experience that, while it may never be something used in production, will give them confidence to talk more deeply about the otherwise abstract concepts when engaging with developers and engineers.
This article will introduce a very rudimentary video streaming model very similar to last year's audio one.
I will break the experiment into two models. The first will explore manually sending data over TCP/IP between your computers—essentially reading from the first computer's memory and copying the data to the second computer. When we set that "memory" to be where a video archive is stored on the first, we can then read that data into a media player application on the second and render it to the screen.
The second model will replace the memory read of a static archive on the first computer with a memory read from a live video source, but will otherwise be very similar.
While I have set up a demonstration using a network and two machines, this experiment will also work using two terminal windows on a single machine. It actually will perform almost flawlessly if you do, given the internal "network" of a single machine is as good as you can get, though streaming across the motherboard of a single machine is far less exciting than across a network.
The fundamental workflow setup is very similar to my audio DIY from last year:
You will need two computers with one (at least) being a Linux machine. I used the latest release of Ubuntu 18, but this should work on almost any Linux OS, although there may be nuances in the exact command lines (I'll mention them again when we get to that point). They need to be networked.
For the first model, you will need a test video file. Remember, if you are networking either (or both) of the computers via Wi-Fi, that will likely be a limiting factor, so the test video's bitrate is a consideration. While a 2Mbps compressed video will be easy to handle, a 1.4Gbps HD-SDI raw video will simply soak your Wi-Fi (but it is useful to know that it is *just* that network link capacity that will limit your choice of source content, aside from the media player being able to decode the stream).
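To put rough numbers on that, the following sketch uses plain POSIX shell arithmetic to express a stream's bitrate as a percentage of a link's usable capacity. The 100Mbps of usable Wi-Fi throughput is purely an assumption for illustration (substitute what your own link achieves), HD-SDI is entered at its nominal 1.485Gbps, and link_load_percent is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: what percentage of a link would a stream occupy?
# $1 = stream bitrate in bits per second
# $2 = usable link capacity in bits per second
link_load_percent() {
  echo $(( $1 * 100 / $2 ))   # integer arithmetic, rounded down
}

LINK=100000000   # assumption: roughly 100Mbps of usable Wi-Fi throughput

link_load_percent 2000000 "$LINK"      # 2Mbps compressed video: prints 2
link_load_percent 1485000000 "$LINK"   # 1.485Gbps HD-SDI raw video: prints 1485
```

Anything approaching 100 percent will stall; on the assumed link, the raw HD-SDI feed would need almost fifteen times the available capacity.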
Also if you want to try the second model—live streaming—you will need a frame grabber, and you may face a number of device-specific nuances depending on your unit: I have geared this to a standards-based, "UVC"-compliant USB video capture card since they are cheap, abundantly available (frankly everyone in this industry should have one in their bag!), and more importantly, universally supported. That said, as you will see in the article, there are some limitations to the universality of "standards."
As with the audio DIY we are going to use Linux "pipes" and "redirects"—essentially they allow a combination of commands to pass data and responses between themselves. I strongly recommend having a look online at the awesome power of using pipes.
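If pipes and redirects are new to you, the following sketch shows the operators this article relies on, plus 2>, which is handy for tucking a program's error messages into a log file. It uses a throwaway text file instead of video (demo.txt and errors.log are just example names) and only standard shell tools.

```shell
#!/bin/sh
# ">" sends a command's output into a file instead of the screen
printf 'stream me\n' > demo.txt

# "<" feeds a file into a command's input; wc -c counts the bytes it receives (10 here)
wc -c < demo.txt

# "|" pipes one command's output straight into the next command's input,
# which is exactly what we will do later with nc and the media player
cat demo.txt | tr 'a-z' 'A-Z'     # prints: STREAM ME

# "2>" redirects only error messages (stderr), leaving normal output on screen
{ echo "to stdout"; echo "to stderr" >&2; } 2> errors.log   # prints: to stdout
```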
We will also be using the Linux command nc, which is an abbreviation of "Netcat." Netcat is a network utility that allows you to read and write data using TCP or UDP over IP networks. Using nc I can listen on an IP network, and once I get a connection I can simply forward data over the link or receive data from it. There is almost nothing else involved. It can be a really handy way to pass data around a network, and it is intended to be used by programmers who want to do exactly that.
[Editor's note: Due to the limitations of our publishing CMS here at Streaming Media, we've published the longer passages of code as screenshots below. For text copy of the code, contact the author at email@example.com]
Our first quick test is to check for nc by typing nc on your command line. On my Mac this is the response:
On a Mac, if you don't get that response, then type the following:
brew install netcat
And on Ubuntu it would be:
sudo apt-get install netcat
Once installed, you will need your test video file. I will assume you are running your command line from the same directory as the video file—if you are not you may need to add a full path before the file name in the example.
So, on the first machine type the following:
nc -l 1234 < testvideo.mp4
This sets up Netcat to listen on port 1234 and, once a connection arrives, send the data from testvideo.mp4 over the TCP network. (TCP is the default; to use UDP instead, add -u after the nc command. Nuance notice: on some versions of nc, such as the traditional one, the listening port has to be given with -p, as in nc -l -p 1234.)
Now, either in a second console on the same machine or (better) on a console on a second machine enter the following:
nc 192.168.0.237 1234
Replace the IP address with that of the first computer on your network.
As soon as you press return you will see the second console jump into life and a stream of unreadable machine data will scroll rapidly up the console.
At this point the essence of what I want to demonstrate is actually complete! You have just streamed a video over either local busses on your motherboard (if you are only using one machine) or over both the busses on the motherboards and the network cards/link of both computers!
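If you want proof that what arrived is byte-for-byte what was sent, here is a minimal sketch you can run on a single machine with no network at all. It deliberately swaps nc for a named pipe (a "FIFO", created with mkfifo), which plays the role of the TCP link. The file names source.bin, received.bin and link.fifo are hypothetical; you can copy your own testvideo.mp4 to source.bin instead of generating random data.

```shell
#!/bin/sh
# Stand-in "video": 1MiB of random bytes (or copy testvideo.mp4 to source.bin)
dd if=/dev/urandom of=source.bin bs=1024 count=1024 2>/dev/null

# The named pipe stands in for the network link between the two nc commands
rm -f link.fifo received.bin
mkfifo link.fifo

# "Sender", like nc -l 1234 < testvideo.mp4 (runs in the background)
cat < source.bin > link.fifo &

# "Receiver", like nc 192.168.0.237 1234, but saving to a file instead of the console
cat link.fifo > received.bin
wait            # let the background sender finish
rm -f link.fifo

# cmp -s prints nothing and succeeds only if both files are identical
if cmp -s source.bin received.bin; then echo "identical"; else echo "different"; fi
```

Across two real machines, the equivalent is to run nc 192.168.0.237 1234 > received.mp4 on the second machine and compare the output of cksum on both files. Depending on your nc variant, the connection may not close by itself when the file ends, in which case press Ctrl+C once the data stops arriving.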
But no one wants to watch pages of this:
0k?b?M????_?o?söOv?????????9wq?Nw?k?QVÜ??%Am?+???Å?ygY?q?4??i7??? RE9RZ?öj?>k?I?.2w?f???w?zz?'ä'??b ?1
So let's prove this is actually the video. For this we will need something that can receive this data stream as an input, and can make it "human readable"—or, more to the point, play the video!
With Linux pipes and redirects we can direct this data stream away from the default console display and pipe it into a media player application as if it were a file source.
For this we will use Mplayer. Again, check whether it is on your machine by typing mplayer on your command line:
If it is not installed, then on Mac enter the following:
brew install mplayer
And on Ubuntu:
sudo apt-get install mplayer
Normally we would play a video file directly with Mplayer like this:
mplayer testvideo.mp4
This should "just work" on the machine with the test file. However, on the remote machine we need to read the video file data stream that we were transferring earlier. To do this we need to "pipe" the output of the nc command into the Mplayer command as Mplayer's source. This is achieved by the following steps.
First start the source on the first console again:
nc -l 1234 < testvideo.mp4
Then on the second console/machine type this (note the "-" at the end is intentional):
nc 192.168.0.237 1234 | mplayer -
This command uses the "|" pipe to redirect the output of the nc command into Mplayer, where the "-" tells Mplayer to read its input from that pipe (its standard input) rather than from a named file. Mplayer carries on as before, reading the data stream as if it were on disk or coming from any other file location on the machine, but nc talks over the network, grabs each byte of data from the source testvideo.mp4, and presents that as the source for Mplayer.
Mplayer then begins to play the video as soon as the first few GOPs (groups of pictures) have been transferred: it is truly streaming.
For interest, under the hood Mplayer is receiving the bitstream and using the MP4 container format to understand how that bitstream is structured. With a file it can take a clue from the .mp4 name, but when reading from a pipe there is no file name, so it auto-detects the format from flags and headers in the bitstream itself. From this it extracts the compressed video data and applies the decoder for the chosen codec (in my case this was H.264), and the resulting output is then forwarded to my display device driver to be displayed on my screen.
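You can peek at one of those headers yourself. In the MP4 (ISO base media file format) structure, data is stored in "boxes", each starting with a 4-byte size followed by a 4-character type, and the first box is normally, though not always, ftyp. The sketch below (first_box_type is a hypothetical name) reads from standard input, so it works equally well on a redirected file or on the output of nc.

```shell
#!/bin/sh
# Hypothetical helper: print the 4-character type of the first MP4 box.
# Bytes 0-3 hold the box size; bytes 4-7 hold the box type (usually "ftyp").
first_box_type() {
  dd bs=1 skip=4 count=4 2>/dev/null
}

# A tiny fake header: size 24 (octal 030), type "ftyp", brand "isom"
printf '\000\000\000\030ftypisom' | first_box_type; echo    # prints: ftyp
```

On a typical MP4 file, first_box_type < testvideo.mp4 should print ftyp as well.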
Congratulations: you just made your first video-on-demand service!
If you are streaming over a network, you will most likely notice that the video doesn't play back perfectly. This rudimentary model relies entirely on the TCP stack that nc talks to for regulating the data flow. If a packet is lost or corrupted, or if the decoder in Mplayer can't keep up with the data, only TCP can slow down the flow or retransmit the lost packet, and while it does this Mplayer is simply left waiting for data, so the playback may stutter or jitter.
Less rudimentary streaming services still ultimately use the same IP stack that nc uses, but they add buffering to the data read/write process. This allows network problems to be resolved in the background before the downstream decoder process knows there was a problem, which is why "proper" streaming services play back constantly and smoothly.
OK—still with me?
Shall we try to step up a gear and make your first live video streaming service?
When we work with live video, we need to understand a few more concepts. In a simple analog-to-digital live video workflow the first thing we have to do is convert the camera signal into something digital that the computer can work with. For this we use a frame grabber. A frame grabber grabs frames of raw video and digitises them. Imagine it is a very fast stills camera taking 25, 30 or perhaps 50/60 photos each second. Each still is captured by a digital sensor (such as a CCD or CMOS chip) which converts photon counts on each pixel into a digital number, and maps that number to the correct place in a digital array of data. Each still is a "frame." It is a ton of data. That data is then readable through the device's device driver (using IOCTL calls if you want to go deeper) and whatever reads that data has to understand what it is, and how it is structured in order to sensibly process it.
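To put a figure on "a ton of data", here is a back-of-the-envelope sketch. It assumes 720x576 frames at 25 frames per second (the settings used in the MEncoder command later on) and 2 bytes per pixel, which is typical of the 8-bit 4:2:2 (YUYV) format many UVC capture devices deliver; your device may differ. raw_bitrate is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: raw (uncompressed) video bitrate in bits per second
# $1 = width, $2 = height, $3 = bytes per pixel, $4 = frames per second
raw_bitrate() {
  echo $(( $1 * $2 * $3 * $4 * 8 ))
}

raw_bitrate 720 576 2 25   # prints: 165888000 (roughly 166Mbps)
```

That is before any compression, which is why the live example in this article passes the frames through an H.264 encoder before they ever reach nc.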
Critically, the "whatever" that "reads that data" will vary from machine to machine and OS to OS, but each variant attempts to provide a simplified API for reading from the frame grabber. On Windows it is typically DirectShow, on Mac it is AVFoundation, and on Linux it is typically Video4Linux.
When you plug your USB frame grabber into your Linux machine, you will see it listed under the /dev folder. It may even appear as several devices, offering alternative formats to grab:
However, the pseudo-file /dev/video0 cannot simply be read like a file. If we try to pipe its data directly into our streaming model in place of "testvideo.mp4", it won't work, because communicating with /dev/video0 requires a sequence of two-way exchanges of metadata defining how we want to receive the source data from the device. We could potentially write a small C interface to set up the data stream on behalf of nc and the remote Mplayer, and then pipe the data through that C interface into nc and onward into Mplayer. Fortunately DirectShow, AVFoundation, and Video4Linux will do that for us (and a whole lot more), and these are natively provided with nearly all operating systems these days.
However, these interfaces themselves also require a considerable amount of interaction, so the easiest way to access the capture card is via an encoding application that already talks to them. Good choices are FFmpeg, VLC, and the one we are going to use: Mplayer (or, strictly, its companion encoder, MEncoder).
I chose Mplayer purely because it is a 20-year-old tool in its own right and relatively ubiquitous. It is generally more stable than VLC, and it makes a nice change from FFmpeg, which is often explored in Streaming Media. That said, FFmpeg is a far more comprehensive technology; indeed, Mplayer can use FFmpeg's libraries for encoding and decoding, and some of the original FFmpeg team were also Mplayer developers. At the end, I will include a version of this experiment using FFmpeg for the encoding in case you want to try it: you will notice it generates far fewer errors than Mplayer.
So let's put everything together:
On the machine with the capture card (or webcam), check that /dev/video0 exists (the experiment won't work without it!). You can do this (as I did above) with ls /dev/vi*
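If you plan to script the experiment, a check like the following sketch can stop early with a friendly message when no capture device is present. list_capture_devices is a hypothetical helper; it takes the directory to search as an optional argument (defaulting to /dev) purely so it can be tried anywhere.

```shell
#!/bin/sh
# Hypothetical helper: list Video4Linux device nodes; fail if there are none.
# $1 = directory to search (defaults to /dev)
list_capture_devices() {
  dir=${1:-/dev}
  found=1                          # shell convention: non-zero means "not found"
  for dev in "$dir"/video*; do
    [ -e "$dev" ] || continue      # the pattern matched nothing
    echo "$dev"
    found=0
  done
  return "$found"
}

list_capture_devices || echo "No /dev/video* device found: check the capture card is plugged in"
```

If the v4l-utils package is installed, v4l2-ctl --list-formats-ext -d /dev/video0 will also show which pixel formats and frame sizes the device offers.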
On the Linux machine with the capture card execute this at the command line:
mencoder tv:// -tv driver=v4l2:width=720:height=576:device=/dev/video0 -fps 25 -nosound -ovc x264 -o - | nc -l 1234
In plain(er) English, this instantiates MEncoder, using the "tv" source found through the Video4Linux (v4l2) driver, and telling the driver how MEncoder expects the source video to be scaled (here 720x576, at 25 frames per second). I have set -nosound simply so we are not distracted into also capturing audio. I am then using the x264 library to compress the video, and "-o -" sends the compressed output down the pipe rather than into a file. I could add further options to fix the bitrate or adjust the scaling of x264, but I am letting MEncoder use its defaults; this exercise is not about compression, so for now these details are incidental.
Finally, the compressed video is piped ("|") into Netcat, which in turn is set to listen on port 1234 for any remote connection. When that connection is made, the data from the video card will be captured, compressed and written to nc.
The counterpart is then launched on the second console / machine:
nc 192.168.0.237 1234 | mplayer - -cache 32
This tells nc to connect to the remote machine (adjust for your own LAN's IP addresses) on port 1234, where the first machine is listening. All data read from that nc connection is piped ("|") into Mplayer. I have added a cache of 32 kilobytes; while not essential, I wanted a small cache so that playback synchronises more quickly on startup.
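How much playback does a cache of 32 actually buy? Mplayer's -cache value is given in kilobytes, so the arithmetic can be sketched as below, treating a kilobyte as 1024 bytes. The 2Mbps stream bitrate is an assumption (the MEncoder command leaves x264 at its defaults, so your actual bitrate will vary), and cache_ms is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: milliseconds of video a cache can hold
# $1 = cache size in kilobytes (as passed to mplayer -cache)
# $2 = stream bitrate in bits per second
cache_ms() {
  echo $(( $1 * 1024 * 8 * 1000 / $2 ))
}

cache_ms 32 2000000     # prints: 131  (about an eighth of a second)
cache_ms 1024 2000000   # prints: 4194 (about four seconds)
```

A small cache starts playing sooner but leaves little margin for absorbing network hiccups; a larger one trades a slower start for smoother playback.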
Since we are rather brutally making MEncoder write a "file" into an nc pipe, and then asking Mplayer to make sense of a "file" that is not really a file, it takes a little while for all the metadata that enables the stream to play to arrive and be understood. Until that metadata gets over the nc link, the receiving Mplayer will complain a lot and produce a string of errors.
In addition, once the H.264 decoder is put into action, it struggles to interpret frames cleanly for a while, producing an explosion of errors.
But after a while a new keyframe is delivered and Mplayer can start to work out what it is supposed to be doing: the cache starts properly filling, and after a few seconds Mplayer opens its display window and starts to render the bitstream into video:
Despite the ongoing stream of error messages, I found the stream to be stable.
Assuming you got this far, congratulations! You have just made your first live streaming service.
To recap, the most important thing is to look past the Mplayer/MEncoder elements of this experiment: they are purely interpreting and compressing/encoding the stream, and have nothing to do with its transmission and streaming, which is handled entirely by Netcat.
MEncoder is using v4l2 as an API to the frame grabber driver and compressing the raw video it receives, and Mplayer is purely decoding and rendering the stream on screen. The "streaming" between the two is a very simple TCP/IP connection created with Netcat: beyond the basic flow control and retransmission built into TCP itself, it has no buffering, no adaptive rate control, and no "clever" recovery strategies, all of which would be found in any production streaming workflow.
However, it does demonstrate a very crude way that live video can be streamed from a capture card to a remote screen.
For those who want to explore a little more, I found that replacing the MEncoder setup with FFmpeg produced far fewer errors. I will leave you with my working FFmpeg version:
On the capture machine:
ffmpeg -f v4l2 -i /dev/video0 -vcodec libx264 -f mpegts - | nc -l 1234
On the playback machine:
nc 192.168.0.237 1234 | mplayer - -cache 1024
Again, this produces a number of errors while Mplayer works out the stream type. Leave it running for a few moments and it will jump into life, producing a very stable stream which, once running, produces no errors (for me at least).
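One plausible explanation for the difference is the container. The -f mpegts option makes FFmpeg produce an MPEG transport stream, which is built from fixed-size 188-byte packets that each start with the sync byte 0x47, so a receiver can lock on to the stream from almost any point; MEncoder, by contrast, writes AVI by default. The sketch below checks that packet alignment on a saved capture using only the standard od tool; byte_at and check_ts_sync are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the byte at offset $2 of file $1 as two hex digits
# (prints nothing if the file is shorter than that)
byte_at() {
  od -A n -t x1 -j "$2" -N 1 "$1" 2>/dev/null | tr -d ' \n'
}

# Hypothetical helper: succeed only if each of the first $2 packets (default 10)
# of MPEG-TS file $1 starts with the sync byte 0x47, i.e. one every 188 bytes
check_ts_sync() {
  file=$1
  packets=${2:-10}
  i=0
  while [ "$i" -lt "$packets" ]; do
    offset=$(( i * 188 ))
    if [ "$(byte_at "$file" "$offset")" != "47" ]; then
      echo "no sync byte at packet $i (offset $offset)"
      return 1
    fi
    i=$(( i + 1 ))
  done
  echo "first $packets packets are aligned on 188-byte boundaries"
}
```

For example, save a few seconds of the live stream on the playback machine with nc 192.168.0.237 1234 > capture.ts, stop it with Ctrl+C, and then run check_ts_sync capture.ts 100.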
So from here perhaps you will work up improvements or (better) simplifications to this as you investigate for yourselves, and if you find something interesting to share along the way, please leave a comment.
[Main article image by Negative Space]