1.0 General Thoughts
A central goal of Giocoso has always been: to help uncover music by composers who would otherwise tend to be forgotten. One of its central music-choosing strategies is randomisation, after all: pick something to play at random and it might just as well turn out to be by Kurpinskí or Kenins as by Mozart or Mahler. However, if you were to switch on Giocoso's 'Only play previously unplayed recordings' feature, you would slant the playing pitch the wrong way -because the likes of Mozart or Brahms or Bach wrote huge quantities of music and the likes of Kenins or Novák didn't. It's therefore easy to play the three pieces you have by Novák and have nothing else by him that hasn't been played previously, whilst the well of unplayed Bach or Beethoven is likely to be very deep.
So, is there a way to swing things back the other way and to give the Tormises, Peeterses and Salzedos of this world more of a chance of getting their limited wares before your ears? Sure: that's what Giocoso's new (to Version 3) 'under-played composer' feature is for.
The Underplayed Algorithm (as I'll call it henceforth) takes as its starting premise the idea that if there are 500 composers represented in your music collection, then in a completely fair and reasonable world, each composer would constitute 1/500th of the plays from that collection (or 0.2%). If you had collected music by fewer composers than that, the percentage would go up; with more composers, it would go down.
But 0.2% of what? It can't be simply the count of "plays". A 45-second keyboard sonata by 4 year-old Mozart should not have the same mathematical significance as one of Wagner's lengthier Gesamtkunstwerks! Instead, it must be of cumulative play time: one play by Wagner counts for a lot more than one play by Tylman Susato (say), because it lasts an eternity whereas a Susato dance lasts mere minutes.
Thus, to make a 'fair deal for unknowns' algorithm work, we first have to know the length of time all music plays performed to date have taken. Giocoso computes this as simply the total sum of the DURATION column in the PLAYS table, measured in seconds. We also have to know how many distinct composers we have in our collection: Giocoso computes this as a select distinct of the COMPOSER column in the RECORDINGS table.
If the total duration is X seconds; and if there are Y unique composers in the collection, then a 'fair deal' would result in 1/Y * X as a 'fair play' time: we can use that formula to generate a 'threshold amount of time' below which a composer is being hard-done by; above which, a composer is being played more often than is strictly 'fair'. For ease of use, rather than derive the threshold as a number of seconds of playtime, we can also use straight percentages. If 1/500 is 0.2%, then a composer whose personal, total plays represent 0.3% of all play time is over-played; one whose personal play-time is 0.15% would be considered under-played.
To put some numbers on that: say my collection consists of 602 composers; and its total duration if played end-to-end would be 400,000 seconds (6,666 minutes, or 111 hours). Then we can say any composer whose cumulative plays to date have lasted less than (1/602)*400000=664 seconds (around 11 minutes) would be classed as 'under-played'. A single play of anything by Wagner would clearly make him 'over-played', but even a dozen plays of early stuff by Mozart would have difficulty crossing that threshold. In percentage terms, 1/602 composers is about 0.1667%, so any composer whose total, personal play-time was less than 0.1667% of the 400,000 global play-time would count as under-played.
One slight fly in the ointment remains: you cannot just add up 'cumulative plays to date' and declare the job complete: if your collection has recordings by composers who have never previously been played, they should be regarded as 'under-played' too... but if you were only to count up play durations from the PLAY table, they'd never be considered. Accordingly, the algorithm needs to do the previous mathematic calculation which does exclusively check the PLAYS table, but it must then also bolt on any composer who appears in RECORDINGS and not in PLAYS.
So with all that in mind, here's the actual code which makes up Giocoso's 'under-played composer' algorithm:
select recordings where composer in (
with duration_in_secs as (select composer, substr(duration, 1,2)*3600 + substr(duration, 4,2)*60 + substr(duration, 7,2) as duration_secs from plays),
total_durations as (select sum(duration_secs) as totalcount from duration_in_secs),
composer_durations as (select composer, sum(duration_secs) as composertime from duration_in_secs group by composer),
percentages as (select c.composer, c.composertime, t.totalcount, cast(c.composertime as real) / t.totalcount * 100 as playpercentage
from total_durations t, composer_durations c),
threshold as (select cast(1 as float)/(select count(distinct composer) from recordings)*100 as threshold_value)
select composer from percentages
where playpercentage < (select threshold_value from threshold)
select distinct composer from recordings where composer not in (select composer from plays))
With apologies for the colour-coding for any of my colourblind readers, we can pick that apart as follows:
The 'with' clause introduces five separate sub-queries:
- Duration_in_Secs takes every record in the PLAYS table and, for each, isolates the COMPOSER and DURATION fields, whilst converting the DURATION (which is physically stored in hh:mm:ss format) into a simple count of seconds. So this sub-query tells us 'here, composer X was played for Y seconds; here composer A was played for Z seconds...' and so on)
- Total_Durations takes all the records in the Duration_in_Secs subquery and simply generates a one-line total number of seconds of play for everyone, for all time. This is the sub-query that says every play you've ever made, regardless of who composed it, amounts to 400,000 seconds of play-time.
- Composer_Durations takes all the records in the Duration_in_Secs subquery and generates a per-composer total number of seconds-of-play. It's this query that says 'Copland was played for 450 seconds, Britten for 3,200 seconds, Brahms for 145 seconds...' and so on.
- Percentages takes the results from Total_Durations and Composer_Durations. It takes the per-composer total playtime and divides by the total-total-playtime, to generate a percentage of total play by composer. Here, then, we take Copland's 450 seconds of play-time, divides by the total 400,000 seconds of play-time and conclude that Copland was played 0.11% of the time; Britten's 3200 seconds represents 0.8% of all play-time, and so on.
- Threshold simply counts up the total number of unique composers found in the RECORDINGS table and computes what 1 ÷ that total is: if I have 602 composers in RECORDINGS, the Threshold subquery is going to return a value of 1÷ 602 = 0.166%.
We finally get to the main query: return composer names whose personal play-time percentage is less than the gobal, threshold play-time percentage. These are the underplayed composers: both Copland and Brahms would make my list of the under-played, but Britten would not.
But this doesn't account for composers who have never been played yet. We therefore bolt on an extra query any composer whose name appears in RECORDINGS but which doesn't appear in PLAYS: these are the never-before-played composers, and by adding them via a union all clause to the list of underplayed composers generated by the main query, we now have a complete list of composers we think have been underplayed to date.
Note that the Threshold is computed dynamically. This means it will change as you add new recordings (and new composers) to your music collection. If I own music by 602 composers, the threshold is about 0.166%; but if I add music by 32 new composers, the threshold changes automatically to 1÷634 = 0.1577%. A composer that was barely 'under-played' before the addition (with a play percentage of, say, 0.162%) would now automatically be regarded as somewhat over-played now: there are 32 never-before-played composers to worry about before re-playing this fellow's work again!
At first glance, talking about thresholds of 0.16% sounds ridiculously low -but simple maths indicates it's correct. If you only had music by 10 composers, then each would 'fairly' have 10% of the total play-time; if there were 100 composers, then a fair-share would be just 1%. The moment you have music by more than 100 composers in your collection, the 'fair share' threshold becomes a percentage less than 1%. Once you have music by 1,000 composers, the fair-share threshold becomes less than 0.1%... and so on.
The necessity of using substrings to convert per-play durations from hh:mm:ss format into a simple number of seconds in the Duration_in-Secs sub-query is unfortunate, but arises from the way Giocoso version 1 (and it's predecessor music player) stored play durations, coupled with my reluctance to alter the database structure for backward-compatibility considerations. The conversion works by knowing that if a duration is stored in hh:mm:ss format, then characters 1 to 2 are the hours, which need to be multipled by 3600 to convert to seconds. Characters 4 to 5 are the minutes, which thus need to be multipled by 60 to derive seconds. Finally, characters 7 to 8 are the seconds component, which need no conversion, because they are already in seconds. It's a simple matter to add the three converted numbers together to thus derive a per-play duration in pure seconds.
Another consequence of storing play durations in hh:mm:ss format is that the PLAYS table is not quite accurate -because durations in the RECORDINGS table is actually measured in seconds with multiple decimals. A recording that is said to last 1487.146667 seconds, for example, will be played as lasting for 00:24:47, which is close but not perfect, as the 0.146667 seconds decimals have been discarded. For a composer right on the threshold of being under- or over-played, the rounding that takes place at this point of the calculation might make a significant difference, but it's something that has to be lived with due to prior database design consequences.
A very important point to note: the use of the 'under-played composer' filter is determinative. By this I mean that, as you can see from the SQL query it triggers, it literally restricts the list of composers whose music can be regarded as an acceptable random choice for the next play. It doesn't just mean that an under-played composer might get randomly picked for the next play: it means an under-played composer definitely will be the next pick. You just don't know which under-played composer it will be. This is quite different from, for example, the use of the exempts file to exempt a composer from the usual time bar: mentioning Britten in the exempts file doesn't mean a piece by Britten will be the next one played, but only that a piece by Britten could be the next one played. The exempts file is persuasive; the under-played filter is determinative.
One final point: if you left this filter on long enough, every composer ought to more-or-less reach the threshold and thus cease to be a candidate for play: at which point, Giocoso would have no viable candidates for play and your life would go silent! Think of the filter as a way of roughly levelling the playing field for a time, not as something you'd leave on permanently, therefore.