
The Path of Least Resistance Isn't

09/29/2008 - Updated to correct minor grammatical errors.

When taking a long-term view of system management, the path of least resistance is rarely the path that results in the least work.

As system managers, we often have to trade short-term tangible results against long-term security, efficiency, and stability. Unfortunately, when we take the path of least resistance and minimize near-term work, we are often left with systems that will require future effort to avoid or recover from performance, security, and stability problems. In general, when we shortcut the short term, we create future work that ends up costing more time and money than the shortcut saved.

Examples of this include:
  • Opening up broad firewall rules rather than taking the time to determine the correct, minimal openings, thereby increasing the probability of future resource-intensive security incidents.
  • Running the install wizard and calling it production, rather than taking the time to configure the application or operating system into a least-bit, hardened, secured, structured configuration.
  • Deferring routine patching of operating systems, databases, and applications, making future patching difficult and error-prone and increasing the probability of future security incidents.
  • Rolling out new code without load and performance testing, ensuring that future system managers and DBAs will spend endless hours battling performance and scalability issues.
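The firewall bullet above can be made concrete. Here is a minimal sketch, assuming an iptables-based host firewall; the port and subnet are hypothetical examples, not values from any particular system:

```shell
# Path of least resistance: open everything and move on.
# Cheap today, and a standing invitation for a future security incident.
# iptables -A INPUT -j ACCEPT

# More work up front: permit replies to existing connections, permit only
# the traffic the application actually needs, and drop everything else.
# The subnet (10.0.1.0/24) and port (5432) are hypothetical examples.
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 5432 -j ACCEPT
iptables -A INPUT -j DROP
```

The minimal version takes longer to get right, because you have to know which subnets and ports the application actually uses, but that is exactly the up-front work that pays itself back later.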
Another way of thinking about this is that sometimes 'more work is less work': doing more work up front often reduces future work by more than the additional initial effort. I learned this from a mechanic friend of mine. He often advised that doing what appeared to be more work frequently ended up being less work, because the initial effort paid itself back at the end of the job. For example, on some vehicles, removing the entire engine and transmission to replace the clutch, instead of replacing it while in the car, is less work overall even though it appears to be more. With the engine and transmission in the car, the clutch replacement can be a long, tedious, knuckle-busting chore. With everything out of the car, it is pretty simple.

In the world of car collectors, a similar concept is called Deferred Maintenance. Old cars cost money to maintain. Some owners keep up with the maintenance, making the commitments and spending the money necessary to keep the vehicles well maintained. They 'pay as they go'. Other owners perform minimal maintenance, fixing only what is obviously broken and leaving critical preventive or proactive tasks undone. So which old car would you want to buy?

In the long run, the car owners who defer maintenance are not saving money; they are only deferring the expense until a future date. The deferred expense can be significant, even to the point where the purchase price of the car is insignificant compared to the cost of bringing the maintenance up to date. And of course, people who buy old collector cars know that the true cost of an old car is the purchase price plus the cost of catching up on any deferred maintenance, so they discount the purchase price to compensate.

In system and network administration, deferred maintenance takes the form of unhardened, unpatched systems; non-standard application installations; ad hoc system management; and missing or inaccurate documentation. Those sorts of shortcuts save time in the short run but end up costing time in the future.

We often decide to shortcut the short-term work effort, and sometimes that's OK. But when we do, we need to make the decision with the understanding that whatever we save today, we will pay for in the future. Having had the unfortunate privilege of inheriting systems with person-years of deferred maintenance and the resulting stability and security issues, I can attest to the person-cost of doing it right the second time.

Comments

  1. I completely agree, and it's good practice to spend your (usually infrequent) free time catching up on maintenance; otherwise, like you said, it'll catch up with you, and most likely when it's least advantageous.

  2. I worry your argument will fall largely on deaf ears in academia, mainly because your emphasis here is on security as the main example of preventative maintenance. From where I'm sitting (student sysadmin), non-standard software, configurations, etc. etc. eat up far more time than security incidents.

    For example, I inherited from my predecessor a collection of systems running a total of four different distros, which is two more than it reasonably needs to be. Getting rid of one of those distros is going to be a fairly painful process.

  3. @tarheelconx

    Point well taken. The situation you mention probably should have been addressed by negotiating with the academics to get the application hosted on a standard distro at the initial deployment. That likely would have been a significant effort, but it would have been less work than attempting to move or re-host the app years later.

    I use security as the driver for this, but in reality almost any of the key components of system administration would benefit from understanding where we are accruing deferred maintenance that will have to be caught up on in the future, including things as simple as 'non-standard application installations'.

    Facilities people are really good at this. Odds are your university has a document or report listing facilities-related deferred maintenance (roofs, boilers, etc.) that gets presented to your governing board once a year or so. Odds are your IT department doesn't. ;)

  4. I agree so completely that I named my blog after it.
    http://leastresistance.net.

    My choice of the title was a bit sarcastic, but the point is that the Path of Least Resistance sometimes proves to be the wrong path. I used it to call for more network and application monitoring, which is shockingly still underutilized.


