How if, when I am laid into the tomb
I wake before the time that Romeo
Come to redeem me? there’s a fearful point!
Shall I not, then, be stifled in the vault,
To whose foul mouth no healthsome air breathes in,
And there die strangled ere my Romeo comes?
–from Shakespeare’s Romeo and Juliet
I wanted the second post in this series to be about distributed systems. A distributed system is any system in which computation is done on separate machines that communicate across a network. As I scanned the web for papers about distributed systems–particularly systems in which information flows across the network in an asynchronous way, I came upon Fischer, Lynch, and Paterson’s Impossibility of Distributed Consensus with One Faulty Process. The paper is short, and made for a quick read. It would be perfect for this month’s blog post, which was already quite overdue. I became interested, however, in another blog that had already done the work for this particular article — the Paper Trail is a great blog, and it has a great summary of the FLP result. I explored this blog and found a treasure trove of posts about other famous CS papers and discovered yet ANOTHER blog that actually paired Brewer’s CAP theorem with an account of a Sex Pistols concert.
Apparently, essays about distributed systems and music are pretty easy to come by.
Rather than duplicate work, I got to thinking about the small shred of the CS-paper universe that I had seen in this ego-slaying discovery of musical online compatriots in CS theory. There were a lot of fundamentally negative results. Turing’s result is negative in nature–a turning machine cannot decide if a given turing machine will be circle-free. The FLP impossibility result sparked a flurry of derivative papers, most of which were variations on a theme–finding some minimal set of constraints on a distributed system that would force consensus, or attempting to extend consensus impossibility to as broad of a swath of systems as possible. Either way, I could not help but feeling that there was something fundamentally negative about the universe that computer scientists could not escape.
Rather than pair this month’s seminal result, the FLP paper, with a work of music, I think it is more appropriate to pair it with a classic work of literature: Shakespeare’s Romeo and Juliet. This classic is arguably a rehashing of two other classics, Tristan and Isolde; and further back, Pyramus and Thisbe, but will do just fine for the present purpose of understanding the fundamental issue with distributed systems: unreliable messengers.
In Romeo and Juliet, the final scenes see Juliet scheming with Friar Lawrence to free herself from the bonds of familial loyalty into true star-crossed-love via mock-death. The plan is to take a drug that will kill her for only two days–with the idea that two days should be more than enough time for her husband-to-be Paris to look elsewhere for a wife. A messenger, working for the Friar (a champion of true star-crossed love) will relay the true nature of the situation–FAKE DEATH–to Romeo, so that he can meet her when she wakes from the 48-hour coma.
Long, poetic story short, the messenger FAILS to deliver, Romeo finds Juliet overdosed, assumes her dead, kills himself, Juliet wakes, finds Romeo dead, kills herself, story over. What does any of this have to do with distributed systems?
As far as the mechanisms of the FLP result go, the key parallel in Shakespeare’s work is that the messenger fails to deliver. The message is super-important–that while Juliet LOOKS dead, she is actually alive. There is a reason why this tale of mistaken death is so timeless: DEAD vs. LIVE is as high as the stakes can get in terms of distributed systems and consensus. Granted, the authors of the FLP result make darn sure that the key process p that is adjacent to the the 0-valent and 1-valent initial configurations NEVER SENDS A MESSAGE in the key run of the system. The proof is a giant extension of the simple fact that that a single, faulty process can lie in between two different results of an admissible run. But it ultimately boils down to the fact that the messenger fails to inform. The subject of the message lives (Juliet), but the listening processes (Romeo) hear news of death, at precisely the wrong time (these moments occur infinitely often, in the system constructed by FLP).
Is the FLP impossibility result really a significant result, given that literature holds hundreds, if not thousands of years of myths surrounding this, the ultimate failed message? The drama of the dead/live mistake has been endlessly optimized from Pyramus to Tristan to Romeo–authors are constantly searching for and finding audiences to WEEP over its truth.
Granted, Shakespeare and his anonymous predecessor authors never intended to prove anything in a rigorous way about asynchronous systems. They were merely trying to construct a great story.
I personally think that the plethora of fundamentally NEGATIVE results in distributed systems can be traced back to Gödel–that his proof revolving around the sentence “this sentence is not provable” is fundamentally a messenger problem. This is a bit tough to explain–the messenger here is hidden, in a way. With distributed systems, the messages are explicit–the messenger speaks about entities separate from himself–inconsistency is achieved very simply by messenger failure that gets propagated; indefinitely in FLP, fatally in Shakespeare. With Gödel, more nuance is required, but the key to proving incompleteness came with the ability of a sentence to say something about the system in which it resided–the formal system’s fundamental flaw is in its ability to talk about itself. No messenger is necessary–the system is its own messenger. The tragedy is embedded, inherent, unavoidable in the systems of Turing and Gödel.
Here are some links to awesome music: