DIY: Rudimentary Audio Streaming
Sending an audio stream across a network in a very rudimentary way is surprisingly simple with some freely available open source tools, so I thought it would be useful to walk through how to set this up for your own purposes. While extremely simple, this DIY model introduces some interesting concepts. You may be used to pre-packaged technologies and robust applications in production, but by exploring these bare-bones models I hope you will develop a deeper understanding of how those technologies work.
I hasten to note that on my own company's team I am the one not allowed to deliver production code. In fact, the rest of my team programs in the functional language Erlang, while I am a lowly PHP and shell script hacker: I can read a C/C++ program well enough to get the idea of what is going on and find the critical hooks for a model, but I am nowhere near able to write anything serious in a compiled language. Fortunately the rest of my team are pretty good at turning my hacks and ideas into something production ready!
That’s a roundabout way of saying that the code in these DIY articles should not be used in high-scale, high-availability environments!
With this, the second in a series of DIY articles, I hope on the one hand to provide something quick and interesting for software engineers who are proficient in more complex languages and methods but curious about the specific functions webcasters require. (The first DIY article covered streaming to multiple destinations.)
On the other hand, I also hope to provide an interesting hands-on article for the amateur webcaster who wants to implement some of the functionality they need themselves. I feel this will help those who, like me, increasingly spend more time in the business of streaming than in its engineering to keep a clearer picture of what is going on under the hood of the evolving services offered by the various operators we all work with.
Here’s the workflow we are going to build: audio from the line-in on the first computer is captured by SoX’s rec command and piped into Netcat (nc), which sends it across the network on port 3333 to Netcat on the second computer, which in turn pipes it into SoX’s play command for output to the speakers.
You will need two computers. I no longer use Windows (unless a client insists), so my examples focus on Mac OS X and Linux. If you feel inclined, please work out the example on your Windows machine and share it in the comments!
First we are going to capture the line-in audio on one of the two machines, so make sure you have something (a microphone, an MP3 player, or similar) playing audio into the line-in input.
Next we are going to route that audio to the network using a “pipe.” Pipes are awesome. Essentially, data created or opened by one command can be sent via a pipe (the “|” character on the command line) to another command. I strongly recommend reading up online on the power of pipes.
In this case we are going to capture the audio from the line-in, format it (so we know what we are handling at the other end of the workflow), and present it to a command which in turn will present the bitstream to the network.
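As a tiny illustration of a pipe (and of what “raw” audio really is), the sketch below sends one second's worth of zero bytes, which is digital silence, from head into wc, which counts them. It assumes mono, 16-bit (2-byte) samples at 48,000 samples per second, the same format used later in this walkthrough; the function name is hypothetical.

```shell
#!/bin/sh
# A pipe connects the output of one command to the input of the next.
# Here "head" produces raw bytes (zeros, i.e. digital silence) and "wc"
# counts them, much as "rec" will produce raw audio bytes for "nc" to send.

# One second of mono, 16-bit (2-byte) audio at 48,000 samples per second:
# 48000 samples x 2 bytes = 96000 bytes.
one_second_of_silence_bytes() {
  head -c 96000 /dev/zero | wc -c | tr -d ' '
}

one_second_of_silence_bytes   # prints 96000
```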
I played with a number of models in creating this article, and to be fair, since this example is so rudimentary, please do not expect production-quality audio or production-grade stream consistency. Once we have the model up and running, I will explain a little more about what is lacking to take it forward to a production system.
If you are using Ubuntu or a close variant, you need just two things installed: SoX and Netcat. You can check for them by typing “sox” or “nc” at the command line; if they are not there, install them with:
sudo apt-get install sox netcat
Allow those to install.
If you are using Mac OS X and have Homebrew installed, you can install these with:
brew install sox netcat
On a Mac, nc normally ships with the operating system, but SoX does not, so you will most likely need at least the SoX install.
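If you would rather check for everything in one go, the small POSIX shell function below (a hypothetical helper, not part of either package) reports which of the commands used in this walkthrough are on your PATH. The rec and play commands are installed as part of SoX.

```shell
#!/bin/sh
# Report whether each named tool is on the PATH.
# "command -v" is POSIX; it prints the tool's path and succeeds if found,
# and fails silently if not. Returns non-zero if anything is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools sox rec play nc || echo "install the missing tools before continuing"
```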
SoX is ‘the Swiss Army knife of sound processing programs’; you can find out more on the SoX project website.
Hidden away in SoX is a command called “rec.” It grabs audio from the default audio capture device (unless you specify another) and normally writes it to a file.
We are also using Netcat, a very cool, very simple tool for reading and writing data across a network. It is well worth exploring in some depth: you can, for example, make a two-way chat engine in just a single command.
At the command line, Netcat is invoked as “nc.”
In this model we are not going to write the audio captured with rec to a “real” file; instead we write the audio bitstream to Netcat, which in turn listens for a connection from the network on port 3333.
So our streaming server, in its complete form (!), is as simple as this:
rec -c 1 -t raw - | nc -l 3333
Yes. Really. That is it!
What it is doing is capturing one channel (“-c 1”) and specifying the type as raw (“-t raw”). Instead of specifying a target output file, we use “-” to tell rec to write the recording to standard output, which the pipe (“|”) feeds into the second half of the command. That second half is the Netcat program (“nc”), set to listen for a network connection (“-l”) on port 3333. Depending on which variant of Netcat you have installed, the listening syntax may instead be “nc -l -p 3333”.
So now the computer that is capturing the audio is piping the raw audio bitstream to the network and is ready for a remote connection on port 3333. It really is that simple!
This is what it looks like running on my Linux machine:
Note it even has a little audio level meter (the -===|===- indicator)!
OK, so now the machine is waiting for a remote machine to connect to nc.
Let's jump over to the other machine, the one we are going to use to play back the audio. What we have to do now is essentially the same, but in reverse.
The first thing we want to do is connect to the first machine using Netcat.
We need to know the IP address of the first machine, which in my case is 192.168.0.32. The basic use of nc would look like this:
nc 192.168.0.32 3333
But if you simply type that, you will end up with streams of seemingly random characters on the screen. What you are actually looking at is the raw audio bitstream. It's pretty, and as a geek I love to see the stream in this rudimentary form, but it's not much fun to listen to!
So we are going to modify our command with (you guessed it) a pipe, sending the output to another SoX command called “play,” which will decode the bitstream into something the second machine’s audio devices can use.
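If you are curious what those characters are, you can pipe the stream into a hex dump instead of the terminal. This sketch uses the standard od utility; the function name peek_bytes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show the first 32 bytes arriving on standard input
# as hexadecimal pairs instead of raw characters.
# "od -A n -t x1" prints one hex pair per byte without the offset column.
peek_bytes() {
  head -c 32 | od -A n -t x1
}

# Usage on the playback machine (the sender must already be running):
#   nc 192.168.0.32 3333 | peek_bytes
```

Each pair of hex digits is one byte; with 16-bit samples, every two bytes make up one audio sample.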
The command, in full, looks like this:
nc 192.168.0.32 3333 | play -c 1 -b 16 -e signed -t raw -r 48k -
As you can see, we are telling the play command to expect one channel of audio (“-c 1”) with 16-bit samples (“-b 16”), which we can find from the information shown when we start the source on the first machine (see my screen grab above, where it says “16bit Signed Integer PCM”). The encoding is signed (“-e signed”), the stream type is raw (“-t raw”) and the sample rate is 48 kHz (“-r 48k”). The final “-” tells play to read from the pipe rather than from a file. You can play with these settings, but if they do not match what the sender is producing you will get either error messages or odd effects such as double-speed playback.
As soon as you execute the command you will see a similar response (also with that funky primitive level meter!), looking something like this:
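Because the sender and receiver must agree on every one of these settings, one option is to keep them in a single place. The sketch below is a hypothetical helper script (the function and variable names are mine, not part of SoX or Netcat). It assumes the mono, 16-bit, 48 kHz settings used here, pins them explicitly on the rec side too, and assumes an nc that accepts “nc -l PORT”.

```shell
#!/bin/sh
# Hypothetical helper script: the settings below mirror this article's
# example (mono, 16-bit signed, 48 kHz raw PCM on port 3333).
CHANNELS=1
BITS=16
ENCODING=signed
RATE=48k
PORT=3333

# SoX format options shared by both ends, so rec and play always agree.
format_flags() {
  printf '%s\n' "-c $CHANNELS -b $BITS -e $ENCODING -t raw -r $RATE"
}

# Run on the capturing machine. Assumes an nc that accepts "nc -l PORT";
# some Netcat variants need "nc -l -p PORT" instead.
stream_send() {
  # format_flags is left unquoted on purpose so it splits into separate options.
  rec $(format_flags) - | nc -l "$PORT"
}

# Run on the playback machine, passing the sender's IP address,
# e.g. stream_receive 192.168.0.32
stream_receive() {
  nc "$1" "$PORT" | play $(format_flags) -
}
```

Source the same file on both machines (for example as a hypothetical stream.sh, loaded with “. ./stream.sh”), then call stream_send on one and stream_receive on the other. Changing CHANNELS in both copies switches both ends together.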
And within a few seconds you will start to hear the audio!
Voila! You are now streaming audio across your LAN from one computer to the other in an extremely rudimentary way.
You may also notice some issues. First, the stream occasionally breaks up. This is a very rudimentary setup with no buffering, so if you are streaming over Wi-Fi, or your LAN has any other congestion, there is little room to absorb it. Because Netcat in this mode uses TCP, all the data will eventually make it through the network, but when packets are lost and have to be retransmitted, the recovery can take long enough that the audio playback stops briefly.
Also, the audio quality is somewhat limited. We can tune the rec command to improve this; for example, you may want to experiment with moving to stereo (“-c 2”, remembering to change the play command to match), but this will increase the bandwidth requirements and may make the breakups a little worse.
So don’t expect great things from this approach. But it goes a long way toward showing how a basic bitstream can be sent between two network locations, and it highlights how much the tools we generally use for modern audio streaming have improved on this rudimentary workflow. Think for a moment about Icecast, FFmpeg, SIP, RTMP and HLS, and you will realize how many features, such as forward error correction and flow control, have made this basic process “production ready.”
For now, however, I hope this helps you get a feel for the very heart of how a stream can be set up.
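One experiment worth trying, on the assumption that a bigger buffer on the playback side will ride out short stalls, is SoX's global --buffer option, which sets the size in bytes of SoX's processing buffers (the default is 8192). The sketch below converts a target length in milliseconds into bytes of raw PCM; the function names are hypothetical, and whether this actually smooths out the breakups will depend on your network and audio driver, so treat it as something to test rather than a fix.

```shell
#!/bin/sh
# Hypothetical helper: bytes of raw PCM needed to hold a given number of
# milliseconds of audio = rate x (bits / 8) x channels x ms / 1000.
buffer_bytes() {
  rate=$1; bits=$2; channels=$3; ms=$4
  echo $((rate * (bits / 8) * channels * ms / 1000))
}

# Receiver with a roughly 500 ms processing buffer (48000 bytes for the
# mono, 16-bit, 48 kHz stream). A sketch only: any buffer adds latency,
# and it cannot make up for a link that is too slow for the stream.
receive_buffered() {
  nc "$1" 3333 | play --buffer "$(buffer_bytes 48000 16 1 500)" \
    -c 1 -b 16 -e signed -t raw -r 48k -
}
```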
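To see why stereo costs more, it helps to do the arithmetic: uncompressed PCM needs sample rate × bits per sample × channels bits every second. The small function below (a hypothetical helper, not part of SoX) shows that the mono stream used here needs about 768 kbit/s, and that stereo doubles it.

```shell
#!/bin/sh
# Raw PCM bit rate in bits per second = sample rate x bits per sample x channels.
pcm_bitrate() {
  rate=$1; bits=$2; channels=$3
  echo $((rate * bits * channels))
}

echo "mono:   $(pcm_bitrate 48000 16 1) bit/s"   # 768000 bit/s (768 kbit/s)
echo "stereo: $(pcm_bitrate 48000 16 2) bit/s"   # 1536000 bit/s (about 1.5 Mbit/s)
```

These figures are before TCP/IP overhead, which adds a little more on the wire.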
I look forward to feedback, comments and alternative models like this as you begin to experiment yourselves!