Foreign Character Woes...

As I mentioned a couple of posts back, I recently re-installed Manjaro on a new small form factor Lenovo PC. All has been running well, except that in the last couple of days I noticed that Giocoso had developed a strange habit of not displaying "accented characters" properly: that is, Giocoso suddenly didn't seem able to display letters with their correct diacritic marks, such as é, ê, ß, ö, ç and so on.

For example, this is what Giocoso displayed this morning:

You'll notice that I'm playing something by Julius Röntgen... and pay particular attention to the o-umlaut in that surname. The umlaut is there in the physical folder name: you can see it on the first line of yellow text in that screenshot. So, Giocoso is perfectly capable of displaying characters with umlauts. But something odd has happened in the 'Playing...' part of the display, because there, Giocoso has decided to display 'Julius Rontgen's Symphony No. 3'; there is a distinct lack of o-umlauts in that part of the display text!

I thought maybe I'd tagged that particular symphony incorrectly, so I went to check:

That's a program called Kid3, which is my preferred graphical FLAC-tag inspector. You'll notice that in the 'Artist' tag, Julius Röntgen is spelled entirely correctly, with an o-umlaut. So it's not that the correct 'foreign character' is missing from the data; nor is it that my terminal session is somehow set up not to display accented characters (since it's happy to display o-umlaut when it's in the physical folder name). No, clearly, Giocoso is doing something to the tag data it's reading before it displays it.

I hoped it was a peculiar one-off, but then Giocoso decided to play Poulenc's Concert champêtre... which is a composition requiring the use of an e-circumflex in the last part of its name. Here's the tag data for that:

So the tag data is correct: the e-circumflex is definitely there in both the ALBUM and TITLE tags, as it should be. But when Giocoso plays that recording, I get this:

Again, note the correct e-circumflex in the 'Now In:' part of the display... and its complete absence in the 'Playing...' part. So no, it wasn't a peculiarity associated with just Julius Röntgen, but something that was affecting Giocoso generally: it had apparently lost the ability to display accented characters where it mattered!

I was sure this was new behaviour: the various old PCs I've used in the past have not had this problem -or, at least, I was never aware of it before!

Poking around inside Giocoso's program code, I realised that the program was extracting the composer's name, or the composition name, using commands like this:

displaycomp=$(metaflac --show-tag=Artist "$f" | sed s/.*=//g)
displaywork=$(metaflac --show-tag=Album "$f" | sed s/.*=//g)

...where '$f' is the name of the FLAC file involved. I manually ran those commands in the folders containing 'accented music', with results such as this:

Bingo: there's plain old metaflac stripping the ê from 'champêtre' and displaying it as the simplified 'champetre'. Giocoso relies on the metaflac program to be able to read FLAC tags: if metaflac has suddenly lost the ability to read foreign characters, Giocoso would inevitably do so, too.

So I had a hunt around the metaflac documentation, wondering if a recent software update had caused something to change about the way it handled accented characters... and no such software update seems to have taken place (flac and metaflac are fairly ancient and stable programs by this point, so I'd have been astonished if such a 'breaking update' had been released lately, but it was worth checking). Entirely in passing, however, I happened to note that the metaflac program has long had a --no-utf8-convert runtime switch -a fact I'd been oblivious to for many a decade!- and I therefore thought I'd experiment with that, by way of testing:

Sure enough, in that screenshot you can see the switch has been added to the second running of pretty much the same command as before... and Lo! The e-circumflex is back in the resulting composition name.
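For anyone wanting to repeat that comparison on their own files, here's a minimal sketch of the same test wrapped up as a shell function. The function name compare_tag is my own invention for illustration; the only metaflac options it uses are --show-tag and --no-utf8-convert. Note that the sed expression here strips only up to the first '=' sign, which is a slightly more cautious variant of the one Giocoso uses.

```shell
# Sketch only: show one tag as metaflac reports it with and without
# --no-utf8-convert. compare_tag is a hypothetical helper name.
compare_tag() {
    local file=$1 tag=$2
    local converted raw

    # Default behaviour: metaflac converts the UTF-8 tag to the local charset.
    converted=$(metaflac --show-tag="$tag" "$file" | sed 's/^[^=]*=//')

    # With the switch: the tag text is passed through without conversion.
    raw=$(metaflac --no-utf8-convert --show-tag="$tag" "$file" | sed 's/^[^=]*=//')

    printf 'converted: %s\n' "$converted"
    printf 'raw UTF-8: %s\n' "$raw"

    # Succeeds only if both readings agree; a mismatch means the
    # conversion step is altering the text.
    [ "$converted" = "$raw" ]
}
```

Run it as something like compare_tag "track.flac" Album (the file name is a placeholder). If the two lines differ, the function returns a non-zero status, so it can be used in an if test.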

So, clearly, metaflac was doing some sort of stripping out of characters, replacing accented ones with 'plain English' equivalents -but I couldn't think why this behaviour should suddenly have manifested itself! So I took a closer look at the metaflac documentation. Here's what it has to say about what that --no-utf8-convert switch is meant to do:

Do not convert tags from UTF-8 to local charset, or vice versa. This is useful for scripts, and setting tags in situations where the locale is wrong.

So here, the documentation is saying that metaflac has always, by default, converted UTF-8 tags to whatever your local character set is, unless you use this switch to prevent that from happening. Basically, given that using the switch results in "champêtre" and not using it results in "champetre", the documentation is implying that my local character set is wrong -or, as it puts it, that "the locale is wrong".

Ouch! Whaddyamean, my locale is wrong?! (Warning: deep, dark rabbit holes approaching!!)

An installed Linux system acquires a 'locale' at the time it is first installed. Generally, it's derived from your answers to questions about what keyboard you're using, what time zone you're in and so on. You can check what locale has been set for your PC with the locale command:

Well, as you can see, my locale is set perfectly correctly: it's using GB English (because I'm based in the United Kingdom... and don't ask why it's en_GB and not en_UK!). More importantly, although it's set to GB English, it's set to use the UTF-8 encoding to display that language. UTF-8 is an encoding of Unicode, which means displaying accented characters shouldn't be a problem (and, remembering that my terminal was quite happy to display the words "Röntgen" and "champêtre" with accents when they appeared in the physical folder path, it actually isn't a problem). But clearly, having a UTF-8 locale correctly set didn't seem to be impressing metaflac any... and I have no idea really why that should be.
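One further check that the plain locale listing doesn't give you directly is which character encoding the session actually resolves to, which the locale charmap command reports. The sketch below wraps that in two small functions; charmap_is_utf8 and report_charmap are hypothetical names of my own, and nothing in this post establishes that this check explains the Manjaro behaviour. It's simply another thing worth looking at.

```shell
# Sketch only: report the character encoding the current session resolves to.
# charmap_is_utf8 and report_charmap are hypothetical helper names.

charmap_is_utf8() {
    # Accept the usual spellings regardless of case: UTF-8, utf-8, utf8.
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        utf-8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

report_charmap() {
    local charmap
    # `locale charmap` prints the codeset of the locale actually in effect,
    # which is not necessarily what the LANG/LC_* variables ask for
    # (for instance, if the requested locale was never generated).
    charmap=$(locale charmap)
    if charmap_is_utf8 "$charmap"; then
        printf 'charmap is %s: a UTF-8 locale is in effect\n' "$charmap"
    else
        printf 'charmap is %s: converting from UTF-8 may lose accents\n' "$charmap"
    fi
}
```

Calling report_charmap in the same terminal you launch Giocoso from shows what a program in that session would see as the local character set.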

Now, as it turns out, problems with Manjaro, KDE and system locales are not exactly unknown. After reading that particular post and half a dozen others like it, I still cannot explain what the problem is, however. Clearly, it's related to the fact that I'm using a new installation of Manjaro on my new tiny PC, rather than the previous Kubuntu... but I cannot resolve it beyond that. I had a moment of serendipity when I realised that I'd used an English keyboard layout for my new PC -and all my previous computers had been configured to use US keyboard layouts (because that's what they use in Australia!) and that maybe that would have something to do with it. Unfortunately, a freshly installed Manjaro+KDE on an old laptop, replete with a very-much-US keyboard layout, also suffers from this can't-display-accented-characters problem. So that wasn't such a serendipitous thought after all 🙁

I fired up a casually-thrown together MX Linux virtual machine:

...and as you can clearly see, it has no problem displaying the e-circumflex in the 'Playing...' part of the window (nor anywhere else, come to that). My MX Linux virtual machine is running the default XFCE desktop, by the way: so KDE is definitely out of the picture there. That difference aside, however, the VM is reading the exact same music files, via an identical NFS mount, just as I do on both my physical desktop and laptop. Yet those physical machines can't see the e-circumflex; but the MX Linux VM can. The locale settings seem to be identical across platforms, too:

Compare that to the all-green Manjaro equivalent screenshot earlier and I think you'll agree that en_GB.UTF-8 has been set for all variables identically in both cases. So: exact same locales; exact same files; accessed via exact same NFS share; yet MX Linux displays it correctly and Manjaro (+KDE) does not.
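Eyeballing two screenshots is fine, but the same comparison can be done mechanically. Here's a small sketch that boils the output of the locale command down to the distinct values it contains; distinct_locale_values is a made-up name for illustration, and it assumes the usual NAME=value or NAME="value" output format.

```shell
# Sketch only: reduce `locale` output to the distinct values it contains.
# distinct_locale_values is a hypothetical helper name.
distinct_locale_values() {
    # Input lines look like  LANG=en_GB.UTF-8  or  LC_CTYPE="en_GB.UTF-8".
    # Drop the variable name and any quotes, skip unset entries
    # (such as a bare LC_ALL=), then collapse duplicates.
    sed -e 's/^[^=]*=//' -e 's/"//g' | grep -v '^$' | sort -u
}
```

Running locale | distinct_locale_values on each machine should print a single line if every variable agrees, and the same line on both machines if their settings match.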

Did I mention deep rabbit holes?!

I can fiddle around with this for quite a long time, I think, to see if I can get things on Manjaro+KDE back the way they used to be on Ubuntu+KDE... but my patience is wearing a bit thin after several hours of fiddling already, without actually achieving a meaningful result!

So instead, I'm announcing the release of Giocoso version 1.12. It allows for a new run-time parameter: --accents. If that switch is present when Giocoso is run, it forces the use of the --no-utf8-convert switch whenever Giocoso calls the metaflac program. Without that new parameter, however, Giocoso does precisely what it always did: run metaflac with the default of converting between locales when necessary, so that if the locale on your PC is set incorrectly, you will probably experience 'accented character loss'! If you've never had a problem with missing accented ('foreign') characters in Giocoso before now, however, then it probably means either (a) your locale is set correctly to a UTF-8-using character set; or (b) you're not using KDE! And if either of those conditions applies to you, it's fair to say that you probably don't need the new version (but it won't hurt you getting it anyway, since the default behaviour remains unchanged).

Since having to remember yet another run-time switch to get Giocoso to behave is a pain in the proverbial, I've also re-worked Giocoso so that it tests for the existence of an environment variable called GIO_ACCENTS. If it's set to a value of 1, then the program will behave as if --accents had been supplied at run-time, even if it actually wasn't. In my .bashrc, for example, I have this bit of configuration:

You can see the last line there sets GIO_ACCENTS to 1 and therefore Giocoso will always behave as if it had been explicitly instructed to 'always display accented characters'. A reboot (or a logout and log back in) will ensure the changes to your .bashrc take proper effect. In my case, after a reboot, launching Giocoso with the bare command giocoso, I now see this:

The e-circumflex is back!
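For the curious, here is roughly how a Bash script could honour both the --accents switch and the GIO_ACCENTS variable. To be clear, this is a sketch and not Giocoso's actual code: metaflac_opts is a hypothetical helper, and the export line assumes the variable needs to be exported from .bashrc so that programs launched from the shell can see it.

```shell
# The sort of line that would live in ~/.bashrc (assumed to be exported):
export GIO_ACCENTS=1

# Sketch only, not Giocoso's real code: decide whether metaflac should be
# given --no-utf8-convert. metaflac_opts is a hypothetical helper name.
metaflac_opts() {
    local accents=0 arg

    # The environment variable switches the behaviour on by default...
    if [ "${GIO_ACCENTS:-0}" = "1" ]; then
        accents=1
    fi

    # ...and an explicit --accents among the arguments does the same.
    for arg in "$@"; do
        if [ "$arg" = "--accents" ]; then
            accents=1
        fi
    done

    # Print the extra metaflac option, or nothing at all.
    if [ "$accents" -eq 1 ]; then
        printf '%s\n' '--no-utf8-convert'
    fi
}
```

A script could then call metaflac $(metaflac_opts "$@") --show-tag=Artist "$f", leaving the substitution unquoted on purpose so that an empty result simply disappears from the command line.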

I wish I could explain the regression: I originally wrote Giocoso when I was running Manjaro+KDE -so it's not exactly an exotic combination to me!- and I've always been very clear about the need to support accented characters in a classical music player, since the Germans, French and Russians tend to be quite significant components of any classical music collection and their languages are riddled with accents and diacritic marks! So I was always careful to ensure Giocoso handled that sort of thing perfectly... and it seriously annoys me to now discover this problem has arisen, for reasons I cannot yet fathom. I think it's a Manjaro+KDE issue more than a Giocoso one, but if you are suffering from the same symptoms, at least Version 1.12 can help!

To update, the usual approach is recommended: run the command

giocoso --checkver

...and supply the root/sudo password when prompted.

I will close by making a somewhat cryptic reference to the fact that version 1.12 of Giocoso happens also to include another (minor) new feature which I'd been planning on releasing as version 1.11 before this accented character issue was spotted and took centre stage! So one update gets you two new features! But I'll explain that other new feature in my next post. This one has gone on quite long enough!!