Patching Strategies - Time to Rethink Conventional Wisdom?

Another 'must read' from the Verizon Business Security group. Very, very interesting. Read it. It looks like it's time to think about patch strategies and how they fit in with other security countermeasures.

The first point to ponder:

"Given average current patching strategies, it would appear that strategies to patch faster are perhaps less important than strategies to apply patches more comprehensively..."

Making sure that all your systems are patched, with thorough, comprehensive system coverage, is more important than quickly applying patches with less thorough coverage. So essentially you'd be better off ensuring that you don't miss a single computer or server than spending that same work effort on a faster deployment that leaves a few systems unpatched.

And the second point:

To summarize the findings in our “Control Effectiveness Study”, companies who did a great job of patching (or AV updates) did not have statistically significant less hacking or malicious code experience than companies who said they did an average job of patching or AV updates. And companies who did other simpler countermeasures, like lightweight standard configurations, had very strong correlations with reduced risk. The Verizon Business 2008 Data Breach Investigations Report supports very similar conclusions.


Simple countermeasures, presumably done right, rather than complex but poorly implemented controls or systems. Keep it simple, but do it right. For example:

both applying default deny ingress and egress router ACL’s (p=0.006) and doing light-weight hardening to a “minimum configuration” (p=0.007) were very highly correlated with lower malcode or hacking events.
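To make that concrete, here's a minimal sketch of what a default-deny ingress filter might look like in Cisco IOS-style syntax. The ACL name, addresses, and services are hypothetical - the point is that the last entry drops (and logs) everything you didn't explicitly permit:

```
! Hypothetical default-deny ingress ACL (Cisco IOS style)
ip access-list extended INGRESS-FILTER
 remark permit only the services we mean to expose
 permit tcp any host 192.0.2.10 eq 443
 permit tcp any host 192.0.2.25 eq smtp
 remark everything else gets dropped and logged
 deny ip any any log
!
interface GigabitEthernet0/0
 ip access-group INGRESS-FILTER in
```

An equivalent egress ACL applied outbound is just as important - it's what keeps a compromised inside host from talking freely to the world.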


Verizon's conclusion

Collectively, our “Verizon Business 2008 Data Breach Investigations Report”, along with our earlier studies, suggests that getting the right mix of countermeasures in an enterprise is far from simple. Rather than “do more,” all three studies seem to suggest that we should “work smarter.”

My conclusion

The Verizon Business 2008 Data Breach Investigations Report that I comment on here, and Verizon's analysis of their other studies commented on in this post, make it look like the thoughts on least bit system management - simple, but structured system management - are perhaps on the right track. I've outlined essential transitions for improving availability, and I suspect that there is a similar set of simple but essential transitions for improving security.

Verizon 2008 Data Breach Investigations Report


Verizon published a report summarizing about 500 data breaches that is worth a read for anyone who is or pretends to be interested in IT security. (Download directly from Verizon)

Some interesting findings

As Verizon notes, the percentages are likely skewed. They are reporting on what they investigated, not on what happened. It's still more than worth the read.

External threats are far more frequent (73%) than internal. The old axiom that the biggest threat is from the inside seems to be archaic. Perhaps, as the report indicates, 'when mainframes ruled the computing world, internal threats were the predominant concern'. That makes sense. When mainframes ruled the world, inside threats were predominant because the mainframes were generally not attached to the outside world. We now have that thing we call the Internet.

Partner threats have greatly increased over time, probably because data exchanges between partners have migrated from EDI-like file transfers to inherently difficult-to-secure VPN-like network connections. And '...In a scenario witnessed repeatedly, a remote vendor’s credentials were compromised, allowing an external attacker to gain high levels of access to the victim’s systems...'. I can see this happening - or more accurately, I've seen this happen. The application vendor requires access to your systems for technical support, the vendor gets compromised, and so do all the customers - because the vendor used the same credentials for every customer. The '....partner’s lax security practices...undeniably allow such attacks to take place'. And on '...many occasions, an account which was intended for use by vendors in order to remotely administer systems was compromised by an external entity...'

However, the shift toward external and partner threats is not the whole story. 'The median size (as measured in the number of compromised records) for an insider breach exceeded that of an outsider by more than 10 to one.' So, measured by the combination of size and frequency, inside threats are still as big a concern.

And of the internal breaches, half of them are from IT staff. That's a number I'm interested in. Tell me again why every IT staffer needs DBA privs or read-only access to the whole database? The combination of IT staff + ODBC + notebook keeps me awake at night.

On configuration management - for system managers, application managers and DBA's, it's worth knowing that errors of omission '...contribute to a huge number of data breaches. This often entailed standard security procedures or configurations that were believed to have been implemented but in actuality were not....'. So the standard practices, best practices, or whatever, were believed to be implemented, but were not. I'm pretty hung up on determining that a device or application is configured a certain way and then verifying through some independent means that the config is really there. Audit scripts, config checkers, etc. Anything but humans. A Perl script will find a 'permit any' in a firewall config faster and more reliably than a human every time.
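To illustrate, here's a minimal sketch of that kind of audit script. The file name and the 'permit ... any' pattern are assumptions - adjust the regex for your platform's ACL syntax:

```perl
#!/usr/bin/perl
# Hypothetical audit sketch: flag over-permissive rules in a saved
# router or firewall config.
use strict;
use warnings;

my $config = shift @ARGV or die "usage: $0 <config-file>\n";
open my $fh, '<', $config or die "can't open $config: $!\n";

while ( my $line = <$fh> ) {
    # Flag any rule that permits from 'any' -- exactly the kind of
    # line a tired human reviewer skims right past.
    print "$config:$.: $line" if $line =~ /\bpermit\b.*\bany\b/i;
}
close $fh;
```

Run it against every config in your repository from cron and diff the output daily, and you have an independent check that the config really is what you believe it is.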

The report indicates that the threat is moving up the application stack - confirming current thinking. '...attacks targeting applications, software, and services were by far the most common technique...' at about 40%. But OS/platform attacks rank closer than I would have guessed, at about a quarter of the total. There are far more people writing code for applications than for platforms or operating systems. The target is much, much larger, and the developers and admins of the application space are far behind the OS vendors in their security maturity.

On patching, '....no breaches were caused by exploits of vulnerabilities patched within a month or less of the attack....'. Meaning that, as the report concludes, it is better to patch consistently than it is to patch quickly. That means that the odd-ball 'appliances' that your vendor won't let you patch, the server-in-the-cube, and the VM I left lying-around-just-in-case need to get patched too. It also indicates that vulnerability scanning, even with a primitive tool, would be valuable for finding the unmanaged, unpatched odds & ends servers.
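A minimal sketch of that idea - diff a ping sweep against the patch-management inventory to surface the unmanaged stragglers. The subnet, the inventory file, and the use of nmap are all assumptions for illustration:

```perl
#!/usr/bin/perl
# Hypothetical sketch: compare a live network sweep against the list of
# hosts the patch system knows about. Anything on the wire but not in
# inventory is an unmanaged, probably unpatched, odds-and-ends server.
use strict;
use warnings;

my %managed;
open my $inv, '<', 'inventory.txt' or die "can't open inventory: $!\n";
while (<$inv>) { chomp; $managed{$_} = 1 }
close $inv;

# -sP = ping sweep only (no port scan), -n = skip DNS lookups.
# Output format varies by nmap version; adjust the match to suit.
for my $line ( qx(nmap -sP -n 192.0.2.0/24) ) {
    if ( $line =~ /report for (\S+)/ ) {
        my $host = $1;
        print "unmanaged host: $host\n" unless $managed{$host};
    }
}
```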

Attack complexity is skewed toward the simple attack that '...automated tools and script kiddies...' could conduct, at about 50%. The conclusion is to '...implement security measures such that it costs the criminal more to compromise your organization than other available targets...'. So get the simple security controls in place and well managed first. Do that and you are ahead of your peers, and that's what matters.

And where is your data? You really should know, because 'two-thirds of the breaches in the study involved data that the organization did not know was present on the system...'. Probably on a notebook in a coffee shop somewhere. Data containment, or what Verizon calls the 'transaction zone', sounds critical. Move the tool to the data, not the data to the tool. It's easier to secure the tool than the data.
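Finding data you didn't know you had is itself scriptable, at least crudely. Here's a hypothetical sketch that crawls a directory tree for SSN-like and card-number-like strings - the patterns and starting path are assumptions, and a real scanner needs Luhn checks and far better patterns to keep the false positives down:

```perl
#!/usr/bin/perl
# Hypothetical sketch: walk a file tree looking for data that shouldn't
# be there. Crude patterns, purely for illustration.
use strict;
use warnings;
use File::Find;

my $root = shift @ARGV || '.';

find( sub {
    return unless -f && -r;
    open my $fh, '<', $_ or return;
    while ( my $line = <$fh> ) {
        if (   $line =~ /\b\d{3}-\d{2}-\d{4}\b/      # US SSN-like
            || $line =~ /\b(?:\d[ -]?){15}\d\b/ ) {  # 16-digit card-like
            print "$File::Find::name:$.: possible sensitive data\n";
            last;    # one hit per file is enough to flag it
        }
    }
    close $fh;
}, $root );
```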

On detection - most incidents were reported by someone outside the organization (70%), and most events were detected long after they occurred (weeks or months). The state of event monitoring appears to be pathetic. Or worse. Verizon's advice is - 'Rather than seeking information overload, organizations should strategically identify what systems should be monitored and what events are alertable.' My personal limit for looking at events is 10,000 per day. After that I get a headache. ;)
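In that spirit, a hypothetical sketch of 'alertable events only' - follow the log stream and surface just the handful of patterns you care about, instead of reading 10,000 events a day. The log path and patterns are assumptions:

```perl
#!/usr/bin/perl
# Hypothetical sketch: follow syslog and print only the alertable events.
# In real life the print would page someone or open a ticket.
use strict;
use warnings;

my @alertable = (
    qr/authentication failure/i,
    qr/permit any/i,             # someone just opened up a firewall rule
    qr/out of memory/i,
);

open my $log, '-|', 'tail', '-F', '/var/log/syslog'
    or die "can't tail syslog: $!\n";

while ( my $line = <$log> ) {
    for my $pat (@alertable) {
        print "ALERT: $line" and last if $line =~ $pat;
    }
}
```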

And on what you don't know (Quoted from page 24).

Nine out of 10 data breaches involved one of the following:

• A system unknown to the organization (or business group affected)
• A system storing data that the organization did not know existed on that system
• A system that had unknown network connections or accessibility
• A system that had unknown accounts or privileges

We refer to these recurring situations as “unknown unknowns” and they appear to be the Achilles heel in the data protection efforts of every organization.

I suppose it's tough to secure it if you don't know it's there.

Verizon's conclusions

For executive types, I suppose:

  • Ensure essential controls are met
  • Find, track, and assess data
  • Monitor event logs

Manager types can read the more detailed conclusions on pages 26-27.

My conclusions

  • Patch consistently, focusing on completeness and coverage rather than fast patch rollouts.
  • Implement enough security to discourage the attacker and direct her to simpler targets. You don't need to outrun the grizzly bear, you only need to outrun the other hikers in your group.
  • When designing security, think breadth first, perfection later. Less than perfect coverage of lots of security areas will have far more short-term value than perfection in a single area.
  • Devise automated audits for critical security controls.  
  • Move the tool to the data, not the data to the tool.
  • Be wary of your partners. Don't trust, but if you have to trust, you must verify.
  • Monitor critical events only, and don't get distracted by event log volume.
  • Layer your security, and keep a security layer close to the data.
  • Pay attention to the application, not just the operating system.

And most importantly - the closer the security layer is to the data, the closer you need to monitor the logs.

Naked Without Strip Charts

The strip chart. Can't live without it.

The classic strip chart is the MRTG network utilization graph. MRTG and its companion RRDTool have to rank as some of the most useful system and network administration software ever written. The world is full of interesting uses for RRDTool and MRTG.

As part of normal application, server and network monitoring, we generate about 2500 strip charts every 5 minutes.
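For anyone who hasn't wired one up, here's a minimal sketch of the mechanics behind a single strip chart, using RRDtool's Perl bindings (the RRDs module that ships with RRDtool). The data source name, step, and file paths are assumptions for illustration:

```perl
#!/usr/bin/perl
# Minimal sketch: create a round-robin database, feed it a sample every
# 5 minutes, and render a one-week strip chart.
use strict;
use warnings;
use RRDs;

my $rrd = 'hits.rrd';

# One-time create: 5-minute step, one GAUGE data source, and enough
# 5-minute averages for about two weeks of history.
unless ( -e $rrd ) {
    RRDs::create( $rrd, '--step', 300,
        'DS:hits:GAUGE:600:0:U',
        'RRA:AVERAGE:0.5:1:4032' );
    die 'create failed: ' . RRDs::error if RRDs::error;
}

# Every 5 minutes, push the current reading. In real life $hits would
# come from a log tally or an SNMP poll.
my $hits = 42;
RRDs::update( $rrd, "N:$hits" );
die 'update failed: ' . RRDs::error if RRDs::error;

# Render the strip chart.
RRDs::graph( 'hits.png', '--start', '-1w',
    "DEF:h=$rrd:hits:AVERAGE",
    'LINE2:h#0000ff:hits/sec' );
die 'graph failed: ' . RRDs::error if RRDs::error;
```

Multiply that by every interface, server and application metric you care about, and 2500 charts per 5-minute cycle isn't hard to reach.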

Here are examples of how we use them:

Long term trends

[chart: bandwidth-trend]

Yep - the network load at this site follows an annual calendar, and appears to have grown quite a bit last fall. But then it leveled off this spring. The bandwidth management appliances must be doing their job.

Application load

[chart: load]

Application load, as measured by HTTP hits/second, peaks on Mondays, declines dramatically on Saturday, and starts to ramp back up Sunday night. That's good to know. Sunday night is almost as busy as Friday afternoon. And of course this isn't a 'follow the sun' application. It's really only used in a single time zone.

Detecting problems

Does anyone feel like helping track down a connection pool leak?

[chart: tcpip_sessions-week]

This is awful. TCP connections from an app server to the database shouldn't saw tooth like that. We have 1700 open TCP sockets on a single server. Probably 100 of them are in use. The rest are stale, dead, hung or something.
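A sketch of the kind of counter that feeds a chart like this - tally TCP sessions to the database listener by state, so the stale sockets stand out from the few doing real work. The listener port (1521) is an assumption:

```perl
#!/usr/bin/perl
# Hypothetical sketch: count TCP sessions to the database listener,
# broken out by state. Feed the totals to an RRD and the saw tooth
# (and the pile of stale sockets) shows up on the strip chart.
use strict;
use warnings;

my %by_state;
for my $line ( qx(netstat -an) ) {
    next unless $line =~ /:1521\b/;    # database listener port
    $by_state{$1}++
        if $line =~ /(ESTABLISHED|CLOSE_WAIT|TIME_WAIT|FIN_WAIT\d?)\s*$/;
}

printf "%-12s %d\n", $_, $by_state{$_} for sort keys %by_state;
```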

Something has changed

[chart: RTT-yearly]

Round trip time to get a dynamic HTTP page on this application more than doubled a few months ago. Presumably we could go back through a change log and determine what we might have done that caused the response time to change so dramatically. Let's see...the second week of March.....Hmmmm...

Detecting Anomalies

[chart: DOS-attack]

That 500Mbps spike at the beginning of week 19? A denial of service attack, most likely. At least we know that the routers on each end of that circuit can handle 500Mbps.

Reconstructing Events

We know that application response time was bad on Monday. Users told us. Let's dissect the event.

[chart: ApplicationRTT]

Yep, it was bad. 3.0k ms = 3 seconds. You can see that normal response time (RTT) is something closer to 250 ms. For us, 3 seconds is bad. For my bank or my cell phone company, 3 seconds is about normal.

Let's see if it was related to web application load. Maybe we had unusually high user-generated load.

[chart: hits]

Nope - not user load related. Monday was just another day.

Let's check the back end database servers.

[chart: cpu]

Dang - that's bad. The green server was buried. And yep - the times line up. From 10am to 4pm.

I wonder what process on the server was using up the CPU?

[chart: databasecpu]

Looks to me like the red database was the culprit. Of course an Oracle AWR report will let us drill down into the period in question. (Notice also that the blue database has a half-hour periodic CPU spike. There is probably something in either crontab or the Oracle scheduler that will explain that, like perhaps a materialized view that gets refreshed twice per hour.)

Conclusion

Strip charts don't help much for up-to-the second performance or troubleshooting data. The operating system or database built in tools are much better for that. But for the types of uses outlined here, strip charts can't be beat.

MRTG and RRDtool are limited only by Perl, which has no limits.