A Problem of Dates in Giocoso

I was viewing my music play history this morning and noticed this:

That seems to be indicating that a recording which lasts for 14 minutes completed two plays at precisely 7.08pm on the evening of 26th August. That physically makes no sense, of course! So, I checked the database within which Giocoso stores its play history:

Sure enough, the same two plays are listed -though, this time, you can see why there are two plays displayed on the website: their 'playdate' timestamps are different from each other by 1 second. Where the 'real' play concluded at 19:08:46, a spurious second play was then recorded at 19:08:47.

Now, I know why that's happening, in general terms. Giocoso always writes a play's conclusion into the PLAYS table, as it concludes. For abstruse internal reasons that I won't bore you with here, Giocoso then arranges to flush its write buffer to the PLAYS table, too: doing so helps prevent table locks which would stop Giocoso working properly. If the 'flush write' takes place within the same second that the 'conclude write' takes place, a built-in requirement for unique timestamps means the flush-write fails to add an additional record to the PLAYS table: duplication of a play thus doesn't happen. Occasionally, and unpredictably, the flush happens across the 'seconds border' with the conclude write. That is, the play concludes at 19:08:46 but the flush only happens when the clock has ticked across to the 47th second. That means the 'flush-write' doesn't violate the 'unique timestamps' requirement and the flush write is (erroneously!) added into the database as a valid record of a concluded play.

How often has this sort of thing been going on?! Well, this query told me the answer to that:

There's some data manipulation going on there: basically, the query creates a copy of the PLAYS table in which the PLAYDATE is reduced by one second; it then joins the real PLAYS table to this adjusted copy of the PLAYS; and lists those unique DIRNAMES that have PLAYDATES and adjusted PLAYDATES that match. Apparently, I have 16 instances (so, 32 table records in total) where this sort of 'cross-second boundary' flush writes have doubled up on genuine plays. Out of interest, I picked one of the listed examples at random (Benjamin Britten's Owen Wingrave, played on 22nd February this year) and went back and checked the website play history:

Sure enough, there are two 'plays' of the same 1.75 hour-long recording completing within one second of each other. Clearly, the query correctly identifies when these 'false flush records' are taking place. How, then, to correct them? That's easy:

with faulty_records as (
select t1.id as orig_id, t1.playdate, t1.composer, t1.composition, t2.id as adj_id, datetime(t2.playdate,'-1 seconds'), t2.composer, t2.composition from plays t1
inner join plays t2
on t1.dirname=t2.DIRNAME
and t1.playdate=datetime(t2.playdate,'-1 seconds')
order by t1.playdate desc)
select adj_id from faulty_records

That slightly-modified version of the query allows you to list the unique play IDs which are the higher of the two related IDs for each duplicate play. Two screenshots ago, for example, you saw that the original query listed IDs 17470 and 17471 as the IDs associated with the pair of Andrzej Panufnik plays which started this saga: the modified version of the query will list just ID 17471. Similarly, the two Benjamin Britten plays are listed by the original query with IDs 15952 and 15953: the adjusted query will list just the 15953 one.

It therefore follows that if the last line of that code reads 'delete from plays where id in (select adj_id from faulty_records)' rather than merely 'select adj_id...', the duplicate 'flush records' will be eradicated from Giocoso's database. Problem solved!

The more important issue, however, is whether the problem can be prevented from re-occurring in the first place. This is tricky, because it requires comparing the playdate timestamp of a row about to be inserted with the playdate timestamp of a row that already exists -and, if they are within one second of each other when their composer, composition and dirnames all match, the proposed insert should fail (preferably, silently!). This is the role of a trigger and here's a proposal for one:

create trigger if not exists checkplays
before insert on plays
begin
  select
  case
    when datetime(new.playdate,'-1 seconds')*86400.0 in
    (select datetime(playdate)*86400.0 from plays
     where dirname=new.dirname and composer=new.composer and composition=new.composition) then
     raise (abort,'')
  end;
end;

Basically, this says "take 1 second off the proposed new playdate; if it then matches an existing playdate, where the composer, composition and dirname all otherwise match, that's a fail: don't do the insert, but don't send back a message about the failure either".

To see if this works as intended, here's my existing table data:

Obviously, I haven't tidied up my existing duplicate data yet, but that's not important right now. The point is that there's a play recorded of Pafnuknik's Nocturne at 19:08:47. The test must therefore be, if I try to insert a new play of the same piece dated 19:08:48, will that get into the database. Here's what happens currently in Giocoso:

That's me manually inserting a record timed at 19:08:48. The result? This:

Sadly, the database has accepted the new insert, so there are now three plays of the Nocturne all within 1 second of each other! This is Giocoso's current, intermittent behaviour. Now let me set things back to the way they were by manually deleting that 19:08:48 freshly-inserted row, add in the trigger, and try doing that insert again:

That's me creating the new trigger: 'executed successfully'. So now let me try my insert again:

Notice that the insert says 'Execution finished without errors'... but if I check my table data:

...I see only the two rows of Nocturne, not three: the new insert has actually failed to take place, but without any awkward error messages appearing to confuse things. It simply fails silently. One last double-check: suppose I try to insert almost the same record as before (still dated at 19:08:48, in other words) but edit my insert statement so that I'm trying to insert the fact that something called 'Nocturney' was played, rather than 'Nocturne'. Does that change the outcome?

Again, the insert executes without errors... but does the table now contain the new row or not?

Absolutely it does! The tiny alteration in the composition name, by adding just one letter to it, is enough for the trigger to fire, notice that not all the trigger conditions are met, and thus it shuts itself down without interfering in the insert process further. Result: new row gets into the PLAYS table after all. Where composition, composer, directory name all match an existing row in PLAYS with a playdate that's one second away from the proposed new insert's playdate, however, the trigger does correctly intervene, silently blows up the proposed new insert and thus preserves data integrity within the PLAYS table after all.

What does this all mean for Giocoso, then? Well, it means there's going to be a new release coming soon in which this trigger is applied to the PLAYS table automatically to prevent further duplicate play records making it into the database. It will take a few days of me testing it to be certain that it's fit for release, but hopefully I'll release something by early September. As for near-duplicate plays that are already in the PLAYS table? Those I'm not going to fix in a new release. Having Giocoso delete data automatically from PLAYS is not something I want to do as the risks of it deleting data it shouldn't do are just too high. I showed you the query to identify near-duplicate records and explained how to modify it to delete those duplicate records earlier: if you're comfortable with sqlitebrowser, therefore, you've got all the tools you need to remove your own duplicates without me trying to automate it for you (and you can assess and take the potential risks involved in doing so!) As it happens, the existence of 16 duplicates going back thousands of plays in the past couple of years indicates the problem is a rare one, so if you choose to leave well alone and put up with a handful of duplicates in your own play records, I think that's a perfectly understandable and reasonable approach, too.

Short version: watch this space over the next day or two. A new release of Giocoso is coming that will contain this fix to prevent future duplicate records in the PLAYS table.