Jesse Keating left a comment on my previous post about leap seconds that I thought was worth highlighting in a post of its own, for the benefit of those who don't read the comments.
This is why rarely executed codepaths suck. Whilst it is tempting to gloat over another Microsoft failure, this could easily have been any other OS. I already mentioned that Linux once suffered something similar. A bug like this in consumer devices is a nightmare, but imagine if such a bug ended up in something more critical? "Sorry, your life support system went offline because there was a leap second." In safety-critical systems, rare codepaths are kind of terrifying.
Writing test cases for bugs like this is also not particularly fun. You'd need a fake NTP server just to exercise the rare case.
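One way to avoid standing up a fake NTP server is to make the time source injectable, so a test can hand the code a leap-second reading directly. What follows is a minimal sketch in Python. It assumes a hypothetical `seconds_since_midnight` function that reads the clock through a callable, plus a hypothetical `fixed_clock` test double. Neither name comes from the Zune firmware or from any real NTP implementation.

```python
from typing import Callable, Tuple

# A wall-clock reading as (hour, minute, second); second may be 60 during a leap second.
Reading = Tuple[int, int, int]


def seconds_since_midnight(clock: Callable[[], Reading]) -> int:
    """Hypothetical code under test: convert the current wall-clock reading
    into seconds since midnight, tolerating a leap second (second == 60)."""
    hour, minute, second = clock()
    if not (0 <= hour < 24 and 0 <= minute < 60 and 0 <= second <= 60):
        raise ValueError(f"invalid clock reading {hour:02}:{minute:02}:{second:02}")
    return hour * 3600 + minute * 60 + second


def fixed_clock(reading: Reading) -> Callable[[], Reading]:
    """Hypothetical test double standing in for a real (e.g. NTP-disciplined) clock."""
    return lambda: reading


# Exercise the common path and the rare leap-second path deterministically.
assert seconds_since_midnight(fixed_clock((12, 0, 0))) == 43200
assert seconds_since_midnight(fixed_clock((23, 59, 60))) == 86400
```

With this design, the once-in-a-blue-moon input becomes an ordinary unit test, which runs on every build instead of only on the night it actually happens.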
Now think about all the other potential 'only runs once in a blue moon' codepaths in your apps, and imagine the effort required to write test plans for all of them. Not impossible, but certainly a lot of potential job security there for QA folks. Like fuzz-testing, traditional coverage testing that just runs common workloads isn't a panacea when there are variables outside your control.
What's still puzzling to me, though: the Zunes died several hours before 00:00:00 UTC.
A quirk of MSFT's NTP implementation, I guess. *shrug*