
DIY: Live Audio Streaming Using Icecast with FFmpeg
Building on previous DIY articles, this installment will walk you through capturing audio, encoding and packaging with FFmpeg, then playing out through the Icecast server you set up in the last article

In my last article we set up an Icecast server. It was a pretty “meh” exercise in isolation, but it laid the groundwork for this article. In order to live stream audio to multiple listeners we need two things: an encoding workflow and a delivery/distribution workflow. The Icecast server is in effect our distribution workflow. Without a source-encoded stream to distribute, it is pretty boring. It just sits there confirming it is ready.

In this article we are going to wake it up and send an audio stream. I will then make a few comments contrasting this “proper” streaming workflow with the earlier rudimentary audio streaming article I wrote, which simply used a TCP connection to send audio data across your LAN.

So let’s assume you have the Icecast server up and running, waiting patiently in your cloud platform for a source from your laptop.

Why the laptop? Well, we need to present some audio, and while you could use a file on a disc on another cloud machine, there is nothing very interesting about delivering an audio file from one location to another. What is really interesting is hearing your own voice streaming out, not least because things like latency become much more apparent when you say “hi” and hear it a few seconds later. You can’t really get a feel for that when you press play and simply hear a recording streaming through the workflow.

Since there is no way to get your microphone plugged into a machine in the cloud, we will use the microphone on your laptop. 

This is the workflow schematic:

[Workflow schematic: mic → FFmpeg (capture, mp3 encode, packaging) → Icecast server → browser playback]

Our microphone will be connected to the audio capture interface (“line/mic in”). FFmpeg will listen to this input for uncompressed/PCM audio, and then use an audio encoding codec (mp3 in this example) to compress the audio. FFmpeg will then encapsulate this audio in an Icecast ICY/HTTP/TCP container format (a process called “packaging”) and establish a connection to the Icecast server up in the cloud.

Once established, we will check for the stream on the Icecast server, and finally we will play that stream in a web browser. For good order, I then recommend you check that the stream can be accessed in multiple browsers across multiple machines.

Note that all these demos can be reworked on Windows, but I don’t use Windows, so you will have to do a bit of Googling to work out the nuances.

Audio Capture

The first challenge is to get the audio input from your mic source working and usable by FFmpeg.

Mac OSX

First, list all the mic inputs: 

ffmpeg -f avfoundation -list_devices true -i ""

On my machine this produces a lot of responses, but the very last section looks like this:

[AVFoundation input device @ 0x7fe2a8e061e0] AVFoundation audio devices:
[AVFoundation input device @ 0x7fe2a8e061e0] [0] AirParrot
[AVFoundation input device @ 0x7fe2a8e061e0] [1] Built-in Microphone
[AVFoundation input device @ 0x7fe2a8e061e0] [2] Logitech Wireless Headset
[AVFoundation input device @ 0x7fe2a8e061e0] [3] Hercules DJ Console Mk4
[AVFoundation input device @ 0x7fe2a8e061e0] [4] Hercules DJ Console Mk4 Aggregate
[AVFoundation input device @ 0x7fe2a8e061e0] [5] ManyCam Virtual Microphone
[AVFoundation input device @ 0x7fe2a8e061e0] [6] Aggregate-Audio-Device

Note the “[1] Built-in Microphone” – this “1” is the number we want.

So now we can connect FFmpeg to the Built-In Mic using the following: 

ffmpeg -f avfoundation -i ":1" ...

Note that this command won’t actually do much at the moment – all we have done is tell FFmpeg that we are using the avfoundation format (“-f”) and input (“-i”) index “:1”.

Linux 

Now let’s repeat this on Linux. There are a number of ways to do this, and it may vary if you are using non-ALSA (Advanced Linux Sound Architecture) devices, but since ALSA is fairly commonplace I will stick to discovery methods that relate to it:

$ cat /proc/asound/cards

That will return something like this (will vary with your machine and its hardware):

0 [Intel]: HDA-Intel - HDA Intel
           HDA Intel at 0x93300000 irq 22
1 [SAA7134]: SAA7134 - SAA7134
             saa7133[0] at 0x9300c800 irq 21

An alternative method, which I ran on a separate machine, is:

$ arecord -l

Which will return something like this:

**** List of CAPTURE Hardware Devices ****
card 0: NVidia [HDA NVidia], device 0: AD198x Analog [AD198x Analog]
  Subdevices: 3/3
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2

What is important here is that we see some indexes. On the first machine we had an Intel High Definition Audio (HDA) device and a second audio device (SAA7134). Assuming the mic is connected to the HDA line in (in my case directly connected to the motherboard), we are using device “0”.

In the second example arecord has produced a more granular list, and we have three capture subdevices. We must select one of these (obviously the mic must also be connected to it). For simplicity we are going to work with subdevice “0” so that the example is the same for both machines.

Our FFmpeg command is now going to look slightly different to the Mac system since we are using ALSA rather than Apple’s AVFoundation sound system.

ffmpeg -ac 1 -f alsa -i hw:0,0 ...

Again note the command is not yet complete, so it will fail if you execute it at this stage. We have now specified one audio channel (“-ac 1”), told FFmpeg to acquire the source from ALSA, and pointed it at the hardware input (“-i hw:0,0”). Note that the hw index always specifies the hardware device AND subdevice – even if we only have one.
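Before moving on, it is worth sanity-checking the capture. A quick way (my own suggestion – the file name and the 5-second duration are arbitrary choices) is to record a short clip to a WAV file and play it back:

```shell
# Mac: record 5 seconds from avfoundation audio device 1 to a WAV file
ffmpeg -f avfoundation -i ":1" -t 5 mic-test.wav

# Linux: the same idea against the ALSA hardware device
ffmpeg -ac 1 -f alsa -i hw:0,0 -t 5 mic-test.wav

# play it back (ffplay ships with FFmpeg); -autoexit quits when the clip ends
ffplay -autoexit mic-test.wav
```

If you hear your voice in the playback, the device index is correct and you can build the rest of the pipeline on top of it.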

Encoding

Now we have captured our audio input, the next step is to compress the audio using our choice of codec. I am going to use libmp3lame, a robust and widely used mp3 encoder.

Plumbing this into the FFmpeg workflow is straightforward on its own, but the codec also has numerous parameters that can be passed to it to vary the aggressiveness of the compression. These include the target bitrate and delay and so on.

Mac OSX

ffmpeg -f avfoundation -i ":1" -acodec libmp3lame -ab 32k -ac 1 -re ...

Linux

ffmpeg -ac 1 -f alsa -i hw:0,0 -acodec libmp3lame -ab 32k -ac 1 -re ...

Once again note that the command is not quite complete yet so will still not run in its current form.

“-ab” has set the audio bitrate and “-ac” specifies a mono/single audio channel.

I have included the “-re” flag, which sets the read rate of the input to its native rate. This means that if the source is 44.1kHz (or similar), the encoding process will synchronize with it. It should really only be used for live streaming and is not essential, but in my experience, where it can be used it will provide a more consistent stream in many cases.

So now we have the audio capture set up to be compressed to 32Kbps mono.
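If you want to confirm that libmp3lame is available and see what these settings produce before involving the mic at all, FFmpeg’s built-in lavfi test source can stand in for the capture device (the tone parameters and output file name here are my own choices, not part of the streaming workflow):

```shell
# encode 2 seconds of a 440 Hz test tone to 32 kbps mono mp3
ffmpeg -f lavfi -i "sine=frequency=440:duration=2" \
  -acodec libmp3lame -ab 32k -ac 1 -y tone-test.mp3

# inspect the result (ffprobe ships with FFmpeg)
ffprobe -v error -show_entries format=duration,bit_rate tone-test.mp3
```

If this fails complaining about an unknown encoder, your FFmpeg build was compiled without libmp3lame and you will need a build that includes it.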

Packaging and Contribution

On its own the compressed audio is simply some numbers in memory at this stage. The next step is to make this something that can be distributed over a network. 

At this point, make sure you have the Icecast server from the previous article up and running.

You will need several bits of data to complete the command. First of these is the target server’s IP address. This can be found on your cloud management console. In my case the IP was 35.237.210.58. Assuming you used my default setup in the article then the port is set to 8000. In my case, the server can be confirmed as running by entering http://35.237.210.58:8000 (note HTTP not HTTPS) which will open the Icecast console.

The second piece of data you will need is the password that allows an encoder to connect. Assuming you used my default setup from the other article the streaming password is “str3am.” 

The final piece of information we will need is the mountpoint name. “Mountpoint” is a fairly old expression that was used by most streaming servers in the early years – it is the specific stream name to which listeners connect to hear your content. Servers are designed to have multiple encoders connected, so listeners need a name to specify which stream they want. This is the mountpoint name. In the setup we are using, the name is arbitrary, so you can make up the stream/mountpoint name yourself.

As a summary of my info (yours may vary) this was my data set: 

  • Server Address: 35.237.210.58:8000
  • Stream Password: str3am
  • Mountpoint: domlive
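FFmpeg expects these three pieces of information combined into a single icecast:// URL, with the literal username “source” in front of the password. Using my example values, a small shell sketch makes the shape explicit (substitute your own details):

```shell
# my example values – yours will differ
SERVER="35.237.210.58"
PORT="8000"
PASSWORD="str3am"
MOUNT="domlive"

# FFmpeg connects as the user "source"; listeners use plain http:// without credentials
ICECAST_URL="icecast://source:${PASSWORD}@${SERVER}:${PORT}/${MOUNT}"
echo "${ICECAST_URL}"
# → icecast://source:str3am@35.237.210.58:8000/domlive
```

The listener-facing URL is simply the same host, port, and mountpoint over http://, with no password.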

We now need to collate all that into the FFmpeg command. It will look like this:

MacOS

ffmpeg -f avfoundation -i ":1" -acodec libmp3lame -ab 32k -ac 1 -re

-content_type audio/mpeg -f mp3

icecast://source:str3am@35.237.210.58:8000/domlive

Linux

ffmpeg -ac 1 -f alsa -i hw:0,0 -acodec libmp3lame -ab 32k -ac 1 -re

-content_type audio/mpeg -f mp3

icecast://source:str3am@35.237.210.58:8000/domlive

Note that the entire command should all be one line (text formatting in the article wraps it).

Acquisition

By now you have probably already given the command a go! Here is the output from FFmpeg when I execute it on my Mac:

Input #0, avfoundation, from ':1':
  Duration: N/A, start: 117809.810408, bitrate: 2822 kb/s
  Stream #0:0: Audio: pcm_f32le, 44100 Hz, stereo, flt, 2822 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_f32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'icecast://source:str3am@35.237.210.58:8000/domlive':
  Metadata:
    TSSE            : Lavf57.83.100
  Stream #0:0: Audio: mp3 (libmp3lame), 44100 Hz, mono, fltp, 32 kb/s
  Metadata:
    encoder         : Lavc57.107.100 libmp3lame
size=      67kB time=00:00:22.31 bitrate=  24.7kbits/s speed=1.28x

The bottom line updates steadily as the stream runs, and as long as the speed stays above 1x the encoding process can keep up with all the other requirements and produce a good, steady stream. If the bandwidth drops, the encoder’s CPU is overloaded, or there are network issues, the speed may drop below 1x, and if it stays there for any length of time the server will most likely drop the FFmpeg connection.
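As a sanity check on that status line, the reported average bitrate follows directly from the size and time fields (FFmpeg’s size counter is in KiB, displayed as “kB”):

```shell
# 67 KiB delivered in 22.31 seconds, expressed in kbit/s
awk 'BEGIN { printf "%.1f kbit/s\n", 67 * 1024 * 8 / 22.31 / 1000 }'
# → 24.6 kbit/s
```

That lands a whisker under the displayed 24.7 kbit/s because the size field itself is rounded before printing.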

Streaming

So assuming FFmpeg is now running happily, your encoded mic audio will be available from the Icecast server. Let’s try to play it!

Return to the Icecast console—your IP will be different to mine, but following the pattern, open http://35.237.210.58:8000/

I can now see this:

[Screenshot: Icecast status page showing the connected source]

So my connection is made, and I can see the mount point “domlive” active on the server.

Let’s try to listen to it. There are a number of ways to do this, but I don’t want to assume you have VLC or any other external mp3 player (which can be launched by clicking the “M3U” link for example). Instead I want to play the stream directly in the browser.

So we open the following URL (changing the IP for your own):

http://35.237.210.58:8000/domlive 

This will open an HTML5 audio player – something like this:

[Screenshot: the browser’s HTML5 audio player]

And you will hear the audio your mic is capturing. Notice that there is a significant delay—perhaps 5 to 10 seconds, although it may be as low as a second or two on some networks. 

Now all you need to do is replace the mic with your studio feed or your DJ decks and share your version of the http://35.237.210.58:8000/domlive URL with your friends, and you have a simple internet radio station in its most basic form.
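If you would rather hand listeners something they can open directly in a desktop player such as VLC, an M3U playlist is simply a text file containing the stream URL (the station title here is made up; substitute your own IP and mountpoint):

```
#EXTM3U
#EXTINF:-1,My DIY Station
http://35.237.210.58:8000/domlive
```

Save that as something like station.m3u; the -1 duration marks it as a continuous live stream rather than a fixed-length track.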

Depending on the server and its connectivity and CPU (etc.) you should be able to sustain a few thousand connections to this live stream. That WILL incur bandwidth and CPU costs from your cloud provider, so don’t go crazy!

We will look at scaling up beyond a single server in a future DIY article.

Observations

The most important thing to notice is that once this stream is working it is typically very robust, and will often run for hours, days, or even months without dropping or buffering on the client side.

Icecast and FFmpeg negotiate their connection and share a considerable amount of useful data (log into the Admin on the Icecast console to start exploring this), all of which ensures that both sides of the link can handle lost packets, variations in the network connection, and so on and so forth. 

This differs very much from the earlier rudimentary audio streaming DIY article, where there was no such transport control or error handling; as a result the audio stream in that earlier article was very unstable and its behavior unpredictable.

Icecast is a quiet giant in the audio streaming space, and coupled with FFmpeg it is a complete, robust, and relatively simple platform to get hands on with live streaming. 

There are many other things that can be done with this setup, and with a bit of jiggery-pokery we can even get Icecast to stream video using FFmpeg, but that is a future article.

In the meantime, I hope you enjoy getting your hands dirty! See, it's not that scary after all!
