Sunday, September 21, 2008

The Path of Least Resistance Isn't

09/29/2008 - Updated to correct minor grammatical errors.

When taking a long-term view of system management, the path of least resistance is rarely the path that results in the least amount of work.

As system managers, we are often faced with having to trade off short-term tangible results against long-term security, efficiency and stability. Unfortunately, when we take the path of least resistance and minimize near-term work effort, we are often left with systems that will require future work effort to avoid or recover from performance, security and stability problems. In general, when we shortcut the short term, we create future work effort that ends up costing more time and money than the short-term savings gained.

Examples of this are:
  • Opening up broad firewall rules rather than taking the time to get the correct, minimal firewall openings, thereby increasing the probability of future resource-intensive security incidents.
  • Running the install wizard and calling it production, rather than taking the time to configure the application or operating system to some kind of least-bit, hardened, secure, structured configuration.
  • Deferring routine patching of operating systems, databases and applications, making future patching difficult and error-prone and increasing the probability of future security incidents.
  • Rolling out new code without load and performance testing, ensuring that future system managers and DBAs will spend endless hours battling performance and scalability issues.

Another way of thinking of this is that sometimes 'more work is less work', meaning that doing more work up front often reduces future work effort by more than the additional initial effort. I learned this from a mechanic friend of mine. He often advised that doing what appeared to be more work often ended up being less work, because the initial effort paid for itself by the end of the job. For example, on some vehicles, removing the entire engine and transmission to replace the clutch is less work overall than replacing it with everything still in the car, even though it appears to be more work. With the engine and transmission in the car, the clutch replacement can be a long, tedious, knuckle-busting chore. With everything out of the car, it is pretty simple.

In the world of car collectors, a similar concept is called deferred maintenance. Old cars cost money to maintain. Some owners keep up with the maintenance, making the commitments and spending the money necessary to keep their vehicles well maintained. They 'pay as they go'. Other owners perform minimal maintenance, fixing only what is obviously broken and leaving critical preventive or proactive tasks undone. So which old car would you want to buy?

In the long run, car owners who defer maintenance are not saving money; they are only deferring the expense of the maintenance until a future date. That deferred expense may be significant, sometimes to the point where the purchase price of the car is insignificant compared to the cost of bringing the maintenance up to date. And of course people who buy old collector cars know that the true cost of an old car is the purchase price plus the cost of catching up on any deferred maintenance, so they discount the purchase price to compensate.

In system and network administration, deferred maintenance takes the form of unhardened, unpatched systems, non-standard application installations, ad hoc system management, missing or inaccurate documentation, etc. These sorts of shortcuts save time in the short run, but end up costing time in the future.

We often decide to shortcut the short-term work effort, and sometimes that's OK. But when we do, we need to make the decision with the understanding that whatever we saved today we will pay for in the future. Having had the unfortunate privilege of inheriting systems with person-years of deferred maintenance and the resulting stability and security issues, I can attest to the person-cost of doing it right the second time.

4 comments:

  1. I completely agree, and it's good practice to spend your (usually infrequent) free time catching up on the maintenance; otherwise, as you said, it will catch up with you, most likely when it's least convenient.

  2. I worry your argument will fall largely on deaf ears in academia, mainly because your emphasis here is on security as the main example of preventive maintenance. From where I'm sitting (student sysadmin), non-standard software, configurations, etc. eat up far more time than security incidents.

    For example, I inherited from my predecessor a collection of systems running a total of four different distros, which is two more than it reasonably needs to be. Getting rid of one of those distros is going to be a fairly painful process.

  3. @tarheelconx

    Point well taken. The situation you mentioned probably would have had to be addressed by negotiating with the academics to get the application hosted on a standard distro at the initial deployment. That likely would have been a significant effort, but it would have been less work than attempting to move or re-host the app years later.

    I use security as the driver for this, but in reality almost any key component of system administration would benefit from understanding where we are accruing deferred maintenance that will have to be caught up on in the future, including things as simple as 'non-standard application installations'.

    Facilities people are really good at this. Odds are your university has a document or report listing facilities-related deferred maintenance (roofs, boilers, etc.) that gets presented to your governing board once a year or so. Odds are your IT department doesn't. ;)

  4. I agree so completely that I named my blog after it.
    http://leastresistance.net.

    My choice of the title was a bit sarcastic, but the point is that the Path of Least Resistance sometimes proves to be the wrong path. I used it to call for more network and application monitoring, which is shockingly still underutilized.