In A Crash Course in Failure, Craig Stuntz discusses the concept of building crash only software – or software for which a crash and a normal shutdown are functionally equivalent.
Highlights:
“Hardware will fail. Software will crash. Those are facts of life.”Why shouldn't continuous and automatic state saving be the default for any/all applications? A CAD system I bought in 1984 did exactly that. If the system crashed or terminated abnormally, the post-crash reboot would do a complete 'replay' of every edit since the last normal save. In fact you'd have to sit and watch every one of your drawing edits in sequence like a VCR on fast forward, a process that was usually pretty amusing in a Keystone Cops sort of way. It can't be that hard to write serialized changes to the end of the document & only re-write the whole doc when the user explicitly saves the doc or journal every change to another file. That CAD system did it twenty-five years ago on on 4mhz CPU and 8" floppies. Some applications are at least attempting to gracefully recover after a crash, a step in the right direction. It certainly is not any harder than what Etherpad does- and they are doing it multi-user, real time, on the Internet.
"…if you believe you have designed for redundancy and availability, but are afraid to hard-fault a rack due to the presence of non-crash-only hardware or software, then you're fooling yourself."
"…maintain savable user data in a recoverable state for the entire lifecycle of your application, and simply do nothing when the system restarts."
“…it is sort of absurd that users have to tell software that they would like to save their work. In truth, users nearly always want to save their work. Extra action should only be required in the unusual case where the user would like to throw their work away.”
“Accept that, no matter what, your system will have a variety of failure modes. Deny that inevitability, and you lose your power to control and contain them. Once you accept that failures will happen, you have the ability to design your system's reaction to specific failures. … If you do not design your failure modes, then you will get whatever unpredictable---and usually dangerous---ones happen to emerge.” -- Michael Nygard
References:
A Crash Course in Failure, Craig Stuntz
Design your Failure Mod es, Michael Janke
'Everything will ultimately fail', Michael Nygard