Last In - First Out: Patch Now - What Does it Mean?

When security researchers/bloggers announce to the world 'patch now', are they implying that the world should 'patch now without consideration for testing, QA, performance or availability'? Or are they advising an accelerated patch schedule, but in a change-managed, tested, QA’d rollout of a patch that considers security and availability? And when they complain about others not patching fast enough, are they assuming that the foot draggers are incompetent? Or are they ignoring the operational realities of making untested changes to critical infrastructure?

Consider that:

All patches have a probability of introducing new bugs. That probability is always > 0 and <= 1; it is never zero. (And for a certain large database vendor, our experience is that the probability of introducing new bugs is very close to one.)
There are many, many bugs that are only relevant under high loads.
A patch that corrupts data, as in databases or file systems, can be impossible to back out or recover from without irretrievable data loss.
Building test cases that can put realistic real world loads on test servers is very difficult, very expensive, and may not uncover the new bugs anyway.
A failed system or application has known, documented consequences. It is not a game of probability or chance. An unpatched security vulnerability is a game of chance where in most cases the odds against you are not known.
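
One way to make that trade-off concrete is a break-even calculation. The sketch below is an illustration only, under simplifying assumptions: the function name, the cost figures and the patch-failure probability are all hypothetical, and it treats an outage and a breach as single fixed costs. Because the odds of exploitation are usually not known, it does not take them as an input. It solves for the exploit probability at which an immediate, untested patch and a delayed patch have the same expected cost.

```python
def break_even_exploit_probability(p_patch_failure, outage_cost, breach_cost):
    """Return the exploit probability at which patching immediately and
    waiting have equal expected cost.

    Expected cost of patching now:  p_patch_failure * outage_cost
    Expected cost of waiting:       p_exploit * breach_cost
    Setting the two equal and solving for p_exploit gives the threshold.
    All inputs are estimates supplied by the caller (hypothetical here).
    """
    if not 0 < p_patch_failure <= 1:
        raise ValueError("p_patch_failure must be > 0 and <= 1")
    if outage_cost < 0 or breach_cost <= 0:
        raise ValueError("outage_cost must be >= 0 and breach_cost must be > 0")
    # A probability cannot exceed 1, so cap the threshold there.
    return min(1.0, p_patch_failure * outage_cost / breach_cost)


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the arithmetic.
    threshold = break_even_exploit_probability(
        p_patch_failure=0.05,   # assumed chance the untested patch causes an outage
        outage_cost=200_000,    # assumed cost of that outage
        breach_cost=1_000_000,  # assumed cost of a successful exploit
    )
    print(f"Patch now only if exploit probability exceeds about {threshold:.3f}")
```

If the threshold comes out higher than any plausible estimate of the exploit odds, the numbers favor an accelerated but tested rollout. The result is only as good as the estimates fed into it.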

As an operations person with real responsibilities, who is accountable to a very large group of paying customers, and who has to make security versus availability decisions almost every day, I need security researchers to uncover, analyze and communicate risks, threats, vulnerabilities and mitigation techniques. The best of the researchers already do that very well, and for that I am very grateful. To those who are doing that for public service, fame, fortune or personal ego, I sincerely thank you, no matter what your motivation. You are adding value to the Internet community.

But when security people push recommendations out to the world without consideration for availability and/or performance, their recommendations remove value from the Internet community.

Security Researchers add value when:

Uncovering and analyzing vulnerabilities and active exploits. (Research)
Analyzing probable and improbable attack vectors and calculating and communicating probabilities. (Research)
Testing and verifying attack vectors. (Research)
Communicating to the community the relative and absolute risks of vulnerabilities and consequences of exploitation. (Public Service)
Developing and communicating mitigation options. (Research)

Security Researchers do not add value when:

Making blanket patch advice without consideration for performance or availability. (Operations)
Complaining about enterprises that do not follow their advice. (Carping)

(non-exhaustive lists, of course.)

In that context, when I hear 'patch now' advice, you can bet that I will filter the advice through the prism of availability, performance and operational reality.

I'll listen to 'patch now no matter what' advice from a security researcher/blogger who has real time operational responsibility for a large customer base, perhaps 100,000 or more customers, and who, if the patch fails, would be responsible for interruption of service for those hundred thousand customers, and who, if the patch fails, could or would be terminated for non-performance.
I'll listen to 'patch now no matter what' advice from a security researcher/blogger who has had a system with a hundred thousand customers down hard, has escalated to the vendor's highest support level, and who has been on a tech support conference call for 13 continuous hours or more.
I'll listen to 'patch now no matter what' advice from consultants who are putting the reputation and existence of their consultancy on the line every time they give a customer advice.
I'll listen to 'patch now no matter what' advice from our own security staff, who I know will not point fingers, duck and hide when the patch goes bad and my systems fail.

As far as I am concerned, if you are in a position like one of the above, you can complain about service providers who do not patch fast enough to suit your preferences. If you are not in that position, you cannot complain when I don't (or your service provider doesn't) patch fast enough for you.

The bottom line is that unless the people who give the world advice to 'patch now no matter what' are also going to write my e-mails and presentations explaining why my systems failed, unless they will absorb the inevitable backlash from customers, senior management and governing boards, and will stand up in front of representatives from my internal business units and get grilled, castigated, chewed up and spit out for my decision, I don't need them to complain that I am not 'patching now'.

I've been in the 7pm conference call with the vendor's VP and development supervisor, where our CIO came to the meeting with his/her letter of resignation, to be turned in to our CEO should the vendor fail to deliver performance fixes for the business critical application by 7am the next day.
It was not a fun meeting.

'Patch now' advice must be filtered through the prism of availability, performance and operational reality.

2 comments:

Anonymous, August 11, 2008 at 9:07 PM
So, how does the patching process differ between normal patching and "patch now", assuming you decide it's warranted in a given instance?
Michael Janke, August 12, 2008 at 11:24 PM
I see 'patch now' as a process that leaves the change management, test and QA cycles intact, but greatly shortens the timelines. This is still an increase in risk over a measured deployment. Time finds bugs, sometimes in your own tests, or, in the case of public and widely deployed software, in the tests that others perform and report.

The controls that attempt to keep bad code out of production need to remain in place.