Behold! I tell you a mystery... Adventures with ffmpeg

The tool which my Giocoso classical music player uses to actually produce audio output is called ffmpeg, a command line audio and video decoder and player. It is something of a truism to say that it is an absolute nightmare to use! Its command structure is truly ghastly, with a typical example looking like this:

ffmpeg -i example.mp4 -i LM_logo.png -filter_complex "[1:v] scale=150:-1 [ol], [0:v] [ol] overlay=W-w-10:H-h-10" -codec:a copy example_marked.mp4

And that's a fairly short one! They easily get worse from there 🙁

At its simplest, however, you can play a FLAC music file with the very basic command:

ffmpeg -i musicfile.flac -f alsa default

...thereby lulling you into a false sense of security that this thing might be a usable way to play music!

If you actually issue that command, however, you'll see this sort of output, itself pretty ghastly, and seemingly all over the place... yet simultaneously being a very detailed technical description of what's being played:

Careful scrutiny of this sort of output will show you that the top part of the screen displays what we might call 'static' output: descriptions of the metadata tags in the file, or the technical nature of the audio signal (whether it's 44.1kHz, stereo, etc). The last line of the screen, however, shows you what might be termed 'dynamic' data: it's forever changing, displaying (in particular) the 'time' played so far. As the playback progresses, that 'time=' value on the last line will keep incrementing (which is, of course, a tad tricky to demonstrate in a still screenshot, but trust me: it's incrementing in real life!). This distinction between the two 'types' of data that ffmpeg displays is important. You'll notice, for example, that every item of the 'static' data is on a new line; the dynamic data, however, always and only resides on a single line, being constantly over-written by the new time data as it's computed.

The significance of this is that the static data is "line delimited": each line of data is terminated by a 'newline' character. The dynamic data, however, is "carriage return delimited": each line is terminated by a 'carriage return' character, indicating that the cursor should return to position 1 on that line of the screen but that no new line should be started. Typists from waaay back will know that it was possible to return the typing platen back to its start position without engaging the part that rotates the platen and thus advances the paper: you then end up typing over the same stuff you'd already typed. That's sort-of what's happening on that last line of the display. If it didn't work like this, every new second of playback would cause the 'static' display to scroll off the top of the screen, as each new second of 'time=' would get displayed on its own, new line.
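The difference between the two delimiters is easy to see on a small, made-up sample. What follows is only a sketch: the text is invented to resemble ffmpeg's output, not captured from it, and the temporary file is my own scaffolding.

```shell
# Simulated ffmpeg-style text: two newline-terminated "static" lines,
# then three "dynamic" updates, each terminated by a carriage return (\r).
demo=$(mktemp)
printf 'Input #0, flac\nOutput #0, alsa\n' > "$demo"
printf 'time=00:00:01.00\rtime=00:00:02.00\rtime=00:00:03.00\r' >> "$demo"

# Count the newline characters: only the static lines end in one.
newlines=$(tr -cd '\n' < "$demo" | wc -c | tr -d ' ')
# Count the carriage returns: one per dynamic update.
returns=$(tr -cd '\r' < "$demo" | wc -c | tr -d ' ')

echo "newline-terminated lines: $newlines"
echo "carriage-return-terminated updates: $returns"
```

Running cat on that sample file in a terminal shows the three updates drawn on top of one another, which is the typewriter effect described above.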

The relevance of this? Well, I'll get to that: suffice it to say, it's going to be really important later on! For now, let's undertake the allegedly simple business of capturing ffmpeg's output into a text file we can work with later. The usual way of outputting program display into a text file is just to redirect that output to a text file with the ">" operator (so, for example: echo "Hello World" > test.doc). Let's try that with ffmpeg, therefore:

ffmpeg -i test.flac -f alsa default > test.txt

Here, I'm playing the FLAC as before, but now I'm re-directing the output of that play command to a text file, called 'test.txt'. Re-directing or piping output from one command to somewhere else is a fairly common Unix-y thing to do! But with ffmpeg, it doesn't work: in another terminal, I now type:

cat test.txt

...and see this:

In the background, the music is playing away as before, and in the foreground, the test.txt file is... completely empty!

Here's the first issue with working with ffmpeg, therefore: what you see on the screen as it plays is coming from what is termed 'standard error' output, not 'standard output'. ffmpeg is one of those slightly weird programs that sends its routine output to the screen as though it were error text, rather than just standard program display. It's actually entirely understandable it does this, to be fair: ffmpeg's main output is, of course, the audio or video it produces, and it can be told to write that to standard output so that it can be piped into another program. Standard output therefore has to be kept free of chatter, leaving only the 'standard error' sink for its textual output. So: if you want to capture the text containing that all-important 'time=' information, you're going to have to explicitly redirect ffmpeg's error output to the text file, not its 'normal' output. Knowing that the standard error stream is identified as 'stream 2' in terminal commands, the following command will do the necessary deed:

ffmpeg -i test.flac -f alsa default 2>test.txt

...which results in this:

Again, here we see the ffmpeg command on the left: it's not showing any output at all now (because its text output is being redirected away from that window), so it appears to be just 'hanging' there. You'll have to take my word for it that music is still playing out of my speakers, however! On the right is the 'cat test.txt' window, and it's finally displaying information. So, now we know how to capture ffmpeg's output to a text file: the secret is explicitly redirecting stream 2 to the file, not just using the bare '>' operator (which implies use of stream 1, or standard output).
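The same behaviour can be reproduced without ffmpeg at all. In this sketch, fake_player is a hypothetical stand-in of my own that, like ffmpeg, writes its text to stream 2 only; the temporary files are likewise just scaffolding.

```shell
# Hypothetical stand-in for ffmpeg: all of its text goes to standard error.
fake_player() {
    echo "time=00:00:01.00" >&2
}

out=$(mktemp)
err=$(mktemp)

fake_player > "$out"     # bare '>' redirects stream 1 only; text still hits the screen
fake_player 2> "$err"    # '2>' redirects stream 2, so the text lands in the file

echo "bytes captured with '>':  $(wc -c < "$out" | tr -d ' ')"
echo "bytes captured with '2>': $(wc -c < "$err" | tr -d ' ')"
```

The first file stays empty and the second holds the text, matching what the two ffmpeg commands above produced.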

But, let's let the music finish and inspect the end of the captured ffmpeg data:

I've specifically shown you the entire output of the test.txt file to show you that it includes: (a) a bunch of technical information at the top about ffmpeg itself and what codecs and libraries it works with; (b) the static data about the flac file being played, in two sections labelled 'input' and 'output'; and finally (c) a single line of the dynamic data, showing the finishing time of the playback (53.74 seconds in this case).

You might reasonably ask at this point: what happened to the in-progress play time that is displayed when ffmpeg is simply used without redirection to a text file? The very first screenshot at the start of this article mentioned 'time=11.80' for example: where is that timestamp in the text file? Well, you can't see it... because, just as happens on-screen when playing directly, when ffmpeg outputs to a text file, its last line of dynamic data is carriage-return delimited. Every update is written to the file followed by a carriage return rather than a newline, so when 'cat' displays the file, each timestamp is drawn over the top of the previous one and only the final value is left visible.

What this tells you is that if you were hoping to extract the 'current position of play' from ffmpeg's output, you're not going to be able to do it very easily, even after learning the trick of re-directing the standard error output, because no timestamp ever gets a line of its own: as far as line-based tools are concerned, all the timestamps form one long, never-finished last line. Here's another example of the problems this 'carriage return-ness' of some of ffmpeg's output causes:

Here, in the first command, I'm issuing a bare tail -F test.txt command. That means to display the end-part of the text file in a continuous manner, 'following' the changes that are happening to the file as ffmpeg plays the flac. Sure enough, I can see 'time=20.27' and if I keep watching, I will see that time increment. As each new version of the 'last line' is written to the file by ffmpeg, the tail command can 'see' it just fine and passes it to the terminal, where it over-writes the previous version.

So then I add the extra command "| grep time=" onto the end of the previously-working tail command. That says to pipe the output of the tail command through 'grep', which is a program that lets you search for particular bits of text that match a supplied pattern ...and I'm saying the pattern I want to find contains the text "time="... which, from the earlier command shown, you can see this text file definitely does contain. And the result is... nothing at all. The command is now just sitting there, doing nothing. Why? Because grep works a line at a time and won't report a match until it has read a complete line ending in a newline, and the 'time=' text is carriage-return delimited, not line delimited, so that newline never arrives while the music is playing. If I instead ask to search for the text 'encoder', for example, which you can see is part of the 'line-delimited', static text from further up the file, that works fine:

So, ffmpeg's static data is 'greppable', but the dynamic data is not: even though 'tail' can see both types of data, grep can't operate on the dynamic, carriage return-delimited stuff. That makes polling for the current playback position rather tricky!
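A small experiment on invented data shows what grep makes of carriage returns. This is a sketch on a finished file, not a live ffmpeg stream, so grep does eventually reach the end of the unterminated line; in the live tail case it simply keeps waiting.

```shell
# Made-up sample: one newline-terminated static line, then three
# carriage-return-terminated updates with no newline after them.
demo=$(mktemp)
printf 'encoder : Lavf\n' > "$demo"
printf 'time=00:00:01.00\rtime=00:00:02.00\rtime=00:00:03.00\r' >> "$demo"

# grep counts matching *lines*; the three updates all share one line.
raw=$(grep -c 'time=' "$demo")

# Turn every carriage return into a newline first, and each update
# becomes a line of its own.
converted=$(tr '\r' '\n' < "$demo" | grep -c 'time=')

echo "matching lines as written: $raw"
echo "matching lines after tr:   $converted"
```

The tr step makes the data greppable after the event, but on a live pipe buffering can still delay what comes out, so it is not a drop-in fix for polling during playback.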

Brace yourself, therefore! If you want to capture and use ffmpeg's dynamic data, you have to pass it through to a tool that knows how to deal with carriage-return data. That tool is awk, and its syntax at this point is probably worse than ffmpeg's own! Here's an example:

Here, you see me not re-directing to a text file, but piping ffmpeg's output to awk with a -v RS="\r" switch: that's me telling awk to treat a carriage return as the end of each record (RS stands for 'record separator'). I'm then telling awk that the delimiter between fields in the captured text is either an equals sign or a space (that's the -v FS='[= ]' bit, where FS stands for 'field separator'). I'm then telling awk to search for the text "time=", skipping the very first record (that's the NR > 1 bit), because that record holds all the static text. And since each line splits into several fields, I'm telling it to print the content of the 4th one, which is where the timestamp lands. The net result, as you see, is a second-by-second output of the current position of play, one line at a time, at last. If I re-direct that output to a text file, I can then grep that text file and extract the relevant time information as being always the last line of the file.
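To see what each part of that awk invocation does without needing ffmpeg or a sound card, the same awk program can be fed some simulated text. The simulate_ffmpeg function and the exact layout of its progress lines are assumptions of mine, modelled on the 'size=... time=... bitrate=... speed=...' shape; real output may differ between ffmpeg versions.

```shell
# Simulated ffmpeg text: a newline-terminated header, then three progress
# updates, each ending in a carriage return. The layout is an assumption.
simulate_ffmpeg() {
    printf 'Input #0, flac, from test.flac:\n'
    printf 'size=N/A time=00:00:01.00 bitrate=N/A speed=1x\r'
    printf 'size=N/A time=00:00:02.00 bitrate=N/A speed=1x\r'
    printf 'size=N/A time=00:00:03.00 bitrate=N/A speed=1x\r'
}

# RS="\r"   : a record ends at each carriage return
# FS='[= ]' : fields are split at every equals sign or space
# NR > 1    : skip record 1 (the header and the first update run together)
# $4        : fields are size, N/A, time, <timestamp>, so the timestamp is field 4
positions=$(simulate_ffmpeg | awk -v RS="\r" -v FS='[= ]' \
    'NR > 1 && /time=/ {print $4; fflush("/dev/stdout")}')

echo "$positions"
```

Because the header and the first update arrive as a single record, the first timestamp is skipped along with the header; the sketch prints the second and third.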

In one terminal, I therefore issue the command:

ffmpeg -i test.flac -f alsa default  |& awk -v RS="\r" -v FS='[= ]' 'NR > 1 && /time=/ {print $4; fflush("/dev/stdout")}' > test.txt

...and in a second, I do:

tail -F test.txt | cut -c-8

And the result is:

I've applied a 'cut' to the output, so that only the first 8 characters are output: I'm not interested in the fractions of a second visible in the raw output from ffmpeg-via-awk.
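The effect of that cut is easy to check on a couple of made-up timestamps (the values here are invented, not taken from a real playback):

```shell
# Keep only the first 8 characters (HH:MM:SS) of each timestamp.
trimmed=$(printf '00:00:11.80\n00:00:12.31\n' | cut -c-8)
echo "$trimmed"
```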

And that's the mystery of how to extract usable data from ffmpeg's raw output finally solved!

Except it isn't!

The dreadful ffmpeg-piped-through-awk command I showed you earlier worked for me, first time of asking, on EndeavourOS (a flavour of Arch Linux). So I confidently copied my code over to a Debian box, and watched as the text file filled up with 10 lines, then sat there doing nothing, before filling up with a fresh 25 lines, before sitting there doing nothing for ten or more seconds and then outputting a further 25 lines or so... and so on. Now, the 'fflush' bit of the earlier command is there to try to stop this sort of behaviour: awk holding on to data in a buffer rather than dealing with it straight away. So this sort of 'burst-behaviour' shouldn't happen when that call is present. Well, sure enough: on EndeavourOS, the burst-y behaviour was not present; but on Debian, it was. The 'awk' command isn't the same program everywhere (on Arch-based distros it is GNU awk, whereas Debian's default is usually mawk), and different implementations obviously respond differently to the same commands!

The fix turned out to be relatively easy: instead of awk, explicitly use gawk, which is the GNU version of basically the same program. Gawk responds to fflush commands consistently on all distros I tested on, anyway. That means the final answer to 'how can I extract the current position within the playing music from ffmpeg' is this delightful thing:

ffmpeg -i test.flac -f alsa default  |& gawk -v RS="\r" -v FS='[= ]' 'NR > 1 && /time=/ {print $4; fflush("/dev/stdout")}' > test.txt

...which is identical to the earlier command, but using gawk rather than awk after the initial pipe character. And this, finally, really is the answer to extracting 'current position in playback' from ffmpeg itself. It's only taken me a decade to learn any of this! And I couldn't have done it without some recent help on a Unix technical forum, for which I'm very grateful.
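A script that has to run on several distros could pick gawk when it is installed and only fall back to plain awk when it isn't. This is a sketch of one way to do that; the AWK variable name is my own, and falling back to a non-GNU awk may bring the burst-y behaviour back.

```shell
# Prefer gawk when it is installed; otherwise use whatever 'awk' resolves to.
AWK=$(command -v gawk || command -v awk)
echo "using: $AWK"

# The pipeline would then call "$AWK" in place of gawk, for example:
#   ffmpeg -i test.flac -f alsa default 2>&1 | "$AWK" -v RS="\r" ...
# ('2>&1 |' is the longhand equivalent of bash's '|&'.)
```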

The reason any of this is important? Because previously Giocoso has had to jump through hoops to present a 'played X of Y' counter during playback, specifically by counting the number of seconds since we knew playback started, since it didn't know how to access the actual number of seconds of playback by asking ffmpeg. This worked fine so long as you never paused playback, because pausing means the number of seconds since play start keeps incrementing (since time doesn't stand still), but the amount of music played stays fixed. Pausing thus completely screwed with the maths and, as a result, Giocoso could never give you a realistic 'Ending at' indication once you'd paused playback. My recent adventures with accessing ffmpeg's own count of where it's got to mean big changes are afoot in the next version of Giocoso. Watch this space!
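The arithmetic behind that problem can be sketched with some invented numbers. Nothing here is Giocoso's actual code: to_seconds is a hypothetical helper, and the track length, position and pause are made up purely for illustration.

```shell
# Hypothetical helper: convert HH:MM:SS into whole seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

played=$(to_seconds "00:01:30")     # position as reported by ffmpeg
duration=$(to_seconds "00:04:00")   # made-up track length
wall_clock=150                      # seconds since start, including a 60-second pause

echo "from ffmpeg:     played $played of $duration, $((duration - played)) left"
echo "from wall clock: played $wall_clock of $duration, $((duration - wall_clock)) left"
```

With a 60-second pause in the middle, the wall-clock figure over-counts what has been played by exactly the length of the pause, whereas the position taken from ffmpeg is unaffected.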