Monday, December 27, 2010

Is it a Platform or a Religion?

Blog posts like this annoy me. "Anyone who was ever fool enough to believe that Microsoft software was good enough to be used for a mission-critical operation..."

I was annoyed enough to keep that link in my ‘ToBlog’ notebook for over a year. That’s annoyed, eh?

Apparently the system failed and the blogger decided that all failed systems that happen to be running on Windows fail because they run on Windows.

A word from the blind.

I've been known as 'anti-Microsoft', having had a strong preference for Netware and Solaris on the server side and OS/2 & Solaris on desktops. At home I went somewhere between half a decade and a decade without an MS product anywhere in the house. Solaris on SunRays with Star Office made for great low-energy, low-maintenance home desktops that ran forever.

My anti-Microsoft attitude changed a bit with NT4 SP3, which, even though it had a badly crippled UI, was robust enough to replace my OS/2 desktop at work. My real work still got done on Solaris though. On the server side, I didn't see much to like about Windows, using it only where there were no other choices.

Windows 2003 finally changed my mind. After running W2k3 and SQL Server 2000 'at scale' on a large mission-critical online application, and after having badly abused it by foisting upon it a poorly written turd of an application, and after further compounding the abuse by my own lack of Windows and SQL experience, I had to conclude that one couldn't simply declare that Windows was inferior, or that it didn't scale, or that it wasn't secure. If you wanted to bash Microsoft and still be honest, you'd have to qualify your bashing, hedge it a bit, and perhaps even provide specific details on what you were bashing.

It was after a few months of running a large MS/SQL stack that I was quoted as follows:

'It has displayed an unexpected level of robustness'

and

'It doesn't suck as bad as I thought it would'.

Nowadays I'm pretty close to being platform neutral. I have preferences, but they are not religious.

For any application that can run on up to 32 cores, SQL Server works. Period. It might work on larger installs, but I don't have experience with them, so I can't comment on them.

Microsoft SQL Server has a cost advantage over Oracle, so for any application that doesn't need Oracle Streams, Partitioning, RAC or other advanced features, SQL Server tends to be the default. It certainly is far cheaper to meet a typical availability requirement with SQL Server than with Oracle, so for any application with an availability requirement that allows for an occasional 3-minute downtime (the time it takes for a database cluster to fail over), the Microsoft stack is a viable choice. My experience is that unplanned cluster failovers are rare enough that active/passive failover makes our customers happy.
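Some rough back-of-the-envelope arithmetic on that (my own sketch, not anything from a vendor SLA): take the yearly downtime budget at a given uptime target and divide it into 3-minute failovers.

minutes_per_year = 365 * 24 * 60   # 525,600
for target in (0.999, 0.9995, 0.9999):
    budget = minutes_per_year * (1 - target)
    print(f"{target:.2%} uptime -> {budget:.0f} min/year budget "
          f"-> room for ~{budget // 3:.0f} three-minute failovers")

At 99.9% the budget absorbs well over a hundred 3-minute failovers a year; even at 99.99% there is room for more than a dozen, which is why an occasional active/passive failover rarely threatens the availability target.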

For applications that can be load balanced, the Microsoft stack can be made as reliable as any other. Load balancing also mitigates the monthly patching that Microsoft requires.

For what it’s worth, the vendor of the ‘turd’  has improved the application to the point where it is a very well written, scalable, robust application running on a very, very robust database and operating system (Server 2008, SQL 2008).

It’s a platform, not a religion, so I reserve the right to change my dogma and preach to a different choir as systems evolve and circumstances change.

FWIW – I really, really liked OS/2’s Workplace Shell. I wish that Apple and Microsoft would figure out how to build a desktop like that.

Yeh, I’m still cleaning out my ToBlog queue.

Saturday, December 25, 2010

Feel kind of sorry for the sysadmins at Barnes & Noble right now

It looks like they are having a bad day.


Do you suppose there were more shiny new Nooks brought into service than their system could handle? Figure that last month's sales were mostly wrapped for Xmas and left idle until last night or this morning, and most of those are being booted and registered in a 24-hour window. A shiny new Nook isn't much fun without books...

...adds up to a rough day for them.

I've been there, as have many of you.

Thursday, December 23, 2010

ToBlog Dump – Time to Clean House

Geeze – Even after periodic culling, I still have twenty+ notes in Google Notebook, fifty-odd notes in Ubernote, and a whole bunch of Google Reader starred items, all waiting to be turned into blog posts.

Ain’t gonna happen. Time to clean house. I’ll dump the most interesting ones into a few posts & cull the rest.

Obviously tracking this sort of thing would be better served by a bookmarking service, but I’ve decided that my professional Internet presence will be Google and Google-related apps. I use a combination of Yahoo & Live.com for things that I don’t want associated with my professional presence, and I try hard not to mix them. The only interesting bookmark service is a Yahoo property (for now, at least) so I don’t have a public bookmarking service. Lame? Yes. I don’t have Twitter or Facebook accounts either. Really lame. Maybe even lame². I still would rather read blog posts than tweets. Is that lame³?

Disclaimer – most of these links are more than a year old, but they’ve survived periodic culling, so maybe they are good links?

Here goes:

A read-once-for-sure and re-read-once-a-year post on the Anatomy of Security Disasters by Marcus Ranum describing …ummmm… the anatomy of security disasters, I guess. Good read.

I saved this DailyWTF post because it shows a bad security device implementation (and what I believe is a bad choice of identifiers). With my luck, the clowns would store my fingerprint un-hashed. One revocation down, nine to go.

Here’s a good ‘Sysadmin Principles’ list from Steve Stady and Seth Vidal. It’s in plain text, so those of you who surf the web with curl and less can read it too. I like reading what others think are the core system administration principles. To me, doing it ‘right’ has value, and I don’t appreciate people who shortcut just because they are lazy or in a hurry. They get by today, but someone else (probably me) will have to clean up after them later.

It’s possible that we’ll eventually end up migrating from Solaris to Linux. This post by ‘The Unix Blog’ reminded me why I like Solaris. Until Oracle fscks it up anyway.

I read and bookmarked a whole lot of articles about the Heartland breach. The most interesting one is Heartland Sniffer Hid in Unallocated Portion of Disk. Cool, unless you are Heartland or one of its victims. I’m a fan of network segmentation and bi-directional default-deny firewall rules. I hope it makes a difference, ‘cause it sure is a lot of work to maintain.

I saved a few snarky anti-Windows links, mostly written by the blind for the purpose of feeding the trolls. I don’t think that Unix is superior to Windows in every way. In some ways, like patching, I’d much rather have Windows/SQL than Solaris/Linux + Oracle. Microsoft has a really good patch management suite. And no, I don’t think Open Source is automatically bug free, cheaper, faster, easier to manage. My Firefox is on .13 right now. That’s not impressive. It’s annoying.

Speaking of the Microsoft stack, Todd Hoff’s High Scalability blog had a post on scaling StackOverflow.  Buy vs. rent, scale up vs. scale out, all the good stuff.

Here’s a couple more related links:

On a slightly different theme, Michael Nygard’s Why Do Enterprise Applications Suck was a good read. It’s hard to keep up the energy on backend apps.

I’ll close with a quote from the comment section of this InfoQ post on scalability worst practices:

Marcos Santos:

The Great Knuth said:

"Early Optimization is the root of all evil"

But Marcos Eliziario, who is a poor programmer, known of no one, said at 2 AM after two sleepless days:

"Reckless regard for optimization is the cube of an even greater evil"

Don’t worry, there will be more.

Thomas Limoncelli: Ten Software Vendor Do’s and Don’ts

From a panel discussion at a recent CHIMIT (Computer-Human Interaction for Management of Information Technology), summarized and published at the Association for Computing Machinery. A good read, right through the comments.

Thomas covers non-GUI, scripted and unattended installation, administrative interfaces, API’s, config files, monitoring, data restoration, logging, vulnerability notification, disk management, and documentation. The comments cover more.

Comments on the above:

API’s: In our latest RFP’s, we ask ‘What percentage of your application functionality is exposed via API’s?’ These RFP’s can have an 8-digit dollar tail on them, so odds are that they actually read them. I like sending messages to vendors.

Installation layout, location: I really like non-OS software to be completely contained in something like /opt/<application>. I don’t like third party software mucking up /etc, /var, or /usr. When I’m done with the software, I want to be able to pkgrm and rm -rf and have a server that is as clean as the day it was installed. I don’t like having to rummage through /usr, /usr/local, /var and /etc looking for remnants of your old install. Odds are I will not be able to figure it out and your junk will be there five years from now. That point seems to be contested in the comments though.

I’m to the point where if your application needs Perl, Python, or miscellaneous libraries, I am going to install a separate copy of the runtime or library in /opt/<application>/lib or /opt/<application>/perl.

More things that I’d like to see on application software:

11. Separately securable administrative interface. I will likely expect to be able to hide the /admin/everyfriggenthing URL behind a different IP address and protect it with a firewall, VPN, load balancer, Apache mod-something, 2-factor, etc. The security of any application interface that has the ability to modify more than one user’s data should not be treated the same as the interface used by the general public.

12. Updated Java, Python, Perl, … runtimes. Honestly – I have really expensive software from each of the world’s largest 2-, 3- and 6-character Fortune really-small-number corporations in my shop, and each of them has at least one current product that does not run under a current, non-exploitable JVM. Can you please recompile that crap so it works with a recent run time? A few years ago when Sun announced one of their Java exploits, we opened up security incidents with each of our vendors that had embedded JRE’s, asking for a version of the application that runs on a non-exploitable run time. In every case, the vendors bumbled around for days until they finally admitted that they do not update embedded JRE’s on products when the JRE is exploitable.

13. Software that doesn’t roll over and die when scanned with commonly available vulnerability scanners. That’s just dumb. That tells me that you shipped me software that YOU did not scan with a network vulnerability scanner. Tell me again about the security of your internal corporate network?

I’m afraid this could be a long list.

Thanks for the tip, Kevin.

Saturday, December 18, 2010

Wireless Bandwidth Management

We know what some people are thinking:

[image]

We run a fair sized network (2Gbps inbound during the day). If we didn’t have aggressive bandwidth management, either it wouldn’t function or we wouldn’t be able to afford it.

We don’t charge extra for YouTube though.

Thursday, December 16, 2010

When the weather map looks like this…

[weather advisory map]

Odds are the traffic map will look something like this:

[traffic map]

I’m sure there is a parallel between the DoS attacks that Mother Nature periodically foists on us and Internet security. I’ll take a stab at describing the parallels.

Predictability: Snow storms and hurricanes are very predictable (compared to tornadoes, where one has 0-10 minutes warning and rarely has accurate predictions). It is possible to prepare for weather that can be predicted. In certain regions, snow storms or hurricanes are a high enough probability event that you will certainly experience them. The probability of a major snow storm  hitting my house in a particular winter is close enough to ‘one’ that it might as well be ‘one’. Tornadoes, on the other hand, even though there are dozens per year in my region, are localized enough that I probably will never experience a direct hit on my house.

I might tend to be prepared for a predictable event (snow storm), but rest assured that I have not taken any significant precautions for a tornado. I’m playing the odds on that one.

Preparation: Many people prepare for predictable events, some do not. I’m a lifetime veteran of snowstorms, yet I was replacing shear pins and changing oil on my snow blower in the middle of the DoS attack (snow storm). I could have done that in summer, but man was it hot last summer. Way too hot to be changing oil on a snow blower. On the other hand, 2/3 of my vehicles are true 4wd and my 4wd’s have dedicated winter tires, so I normally can get to where I have to go whether I clean my driveway or not. Local governments spend a fortune on DoS (snow storm) preparation. They have snow removal equipment, snow removal planning, emergency notification, etc. Their preparation allows me to function fairly well even when I am not prepared.

Preparation costs money though, as I can attest when I fill up the gas tanks on my 4wd’s. They cost money every day I drive them; they have twice as many driveline parts and are expected to incur significant driveline maintenance costs, but I only really need them a couple days per year.

Preparation has limits though. Even though I may be able to make it down an unplowed street, my neighbor may not have made it down that street & may be blocking my path, or worse, my neighbor may lose control of his car and whack my car, disabling me in spite of my preparation. In rural areas – the wide open prairie around here – you will be limited by visibility (white out), not traction, so your 4wd vehicle will only serve to get you deep enough into trouble that you can’t dig yourself out.

Don’t ask me how I know. 

As it turned out, 4wd vehicle #1, a Subaru with 7in of ground clearance, was expected to be operable in an event of magnitude ‘n’ (4-5” of snow), or perhaps marginally operable in an event of ‘2n’ (8-10 inches of snow). It was not expected to be operable in a ‘4n’ event (16-20” of snow) and predictably was not operable on unplowed roads last weekend. My 4wd vehicle #2, on the other hand, a robust pickup truck, was expected to be operable in a ‘4n’ event. After a half hour of trying to get the vehicle out to a plowed road so I could take my pharmacist neighbor to work at the 24 hour pharmacy, I concluded that getting 4wd vehicle #2 back into my driveway would be a far more reasonable outcome. Apparently I’ve either configured 4wd vehicle #2 wrong, or I didn’t have an adequate pre-purchase test plan. The current working theory is that even though it was purchased for a ‘4n’ event, it is only configured for a ‘3n’ event.

There is no doubt that one could prepare for a ‘4n’ event like last weekend. I’d like to think that someone made a serious calculation on dollars spent versus level of preparation. Odds are though, that nobody did. It probably went more like ‘here is how much money you can have, be as prepared as possible given that constraint’.

Or – in my case – spend whatever is necessary to prepare for a ‘4n’ event, but then configure it wrong and inadequately test it, and watch as it fails to manage the event.

Incident handling: During a storm of magnitude ‘n’, a prepared person might conduct business as usual, perhaps with reduced capacity or response time. For example, one  might still get to work on time, but suffer a longer commute. A storm of magnitude ‘2n’ might cause a prepared person to have degraded operations, cancelling non-essential activities. A storm of magnitude ‘4n’ might cause most activity to come to a halt. Preparation can affect the value of ‘n’. A snow storm that would shut down Washington DC would probably have only a minor effect in Minneapolis or Buffalo.

Last weekend’s storm might have been a ‘4n’ event – something that maybe occurs every 20 years or so. The round red signs in the above image are closed roads. You get a ticket if you try to drive on them. Odds also are pretty good that you’d fail to make it to the other end of that particular road. The MSP airport has probably the best winter capability of any airport, yet they ended up more closed than not. Most of the municipal snow plowing was halted during the worst of the storm, buses were halted, and even fire trucks and ambulances were severely affected. Operations were certainly degraded during the DoS.

Degraded operations: During the DoS attack, most local governments have some ability to operate in a degraded mode. For example, the State Patrol may close roads, shedding load by restricting traffic to emergency vehicles only. Snow plows may stop plowing streets and only venture out to open up streets for emergency vehicles, airports may restrict incoming flights, schools and businesses may close, etc. Degraded operations are an accepted outcome of large scale DoS attacks (snow storms), and most entities have a pretty good idea what services need to be maintained during a DoS attack (snow storm).

In my case, degraded operations mode means avoiding travel, maintaining power, heat, Internet and food in approximately that order. Food is the easiest. I still have my Y2K stash in the basement. Year 2038 is just around the corner & I’d hate to be caught unprepared.

Sunday, December 5, 2010

“There is nothing the governments can do to put the genie back into the bottle”

From Paul Homer’s The Effects of Computers:

The “rich and powerful” are rich and powerful precisely because they have access to information that the rest of us don’t have.

And – once you give people the power to access the information:

“There is nothing the governments can do to put the genie back into the bottle”

Wikileaks related. A good read.

Sunday, November 7, 2010

The flaw has prompted the company to consider changes in its development process

The recent WSJ article on banks releasing mobile banking software that stores user names, passwords and bank accounts unencrypted on phones has opened up a sore topic for me.

Apparently we have very, very large corporations chock full of highly paid analysts, architects, developers and QA staff believing that it is perfectly OK to store banking credentials in plain text on a mobile device a decade into the 21st century.

Something is broke. 

Possibilities include:
  • the bank's analysts, architects, developers and QA staff are unaware of the state of application security in the 21st century. They have no idea that a fraction of the world's population enjoys compromising other people's systems and uses the information to steal people's money. In other words - they are unconscious of the environment into which they are deploying their application. They are sufficiently unconscious of their environment that they didn't know that there may be some sort of best practice on the storage of banking credentials. The lack of awareness has made them incompetent to build mobile banking applications. Because they don't know what they don't know, they go ahead and build banking applications anyway.
  • the bank's analysts, architects, developers and QA staff are conscious of their environment, but are not capable of designing an application that can safely be deployed in that environment. In other words - they are conscious, but incompetent. They know that they need to deploy secure applications, but it didn't occur to them that it has been a couple of decades since storing credentials in plain text was an acceptable practice. They missed the fact that operating systems and databases stopped doing that towards the end of the last century, and a quick Google of 'how to secure banking applications' didn't turn up anything interesting on the first page.
  • the bank's analysts, architects, developers and QA staff are conscious and competent, but their highly paid managers and directors told them that building an application to withstand today's on line application environment was out of scope. In that case their management is either unconscious of their environment or they are aware of their environment but chose to ignore it  - due to incompetence, perhaps.
  • the bank's analysts, architects, developers and QA staff and their managers are conscious of their environment and competent enough to build software for that environment, but some external force caused them to ignore basic decade-old security practices. Perhaps deadlines, market or financial pressures forced them to release the product with known defects. If so, I hope they kept the e-mail trail.
There might be other possibilities, but writing about them wouldn't be as much fun.

I'll qualify all this by saying that I've never managed a staff of hundreds of developers, nor have I ever written a banking application. The largest application I've written was in the tens of thousands of lines of code. That puts me somewhere near the unconscious-incompetent end of the spectrum.

The good news is that:

"The flaw has prompted the company to consider changes in its development process..." says Wells Fargo CIO Mr Tumas. (according to the WSJ.)

I wonder what they'll consider changing: the unconsciousness? The incompetence? The external factors?

I tend to have some sympathy for vendors who get hit by complex stack smashing attacks that exploit their products in ways that are obscure or complex, and I might even have sympathy for vendors who have millions of lines of old, ugly code that predates the current threats. Those are hard problems to solve. Online banking applications for iOS and Android were created from the ground up long after password encryption became the norm. No excuses this time.

When I think about this, I can't help but believe that we are a long, long way from having an application development culture that values and understands security sufficiently that we can assume that software is relatively secure.

Via Michael Cotes, Mind Tools and the Conscious Competence Matrix 

Monday, October 25, 2010

Application Security Challenges

Assume that all your application security challenges are conquered. You've got smart people and you've trained them well. They catch all their exceptions, they bound their arrays, they else their if's and sanitize their inputs and outputs.

Congratulations. You've solved your biggest security problem.

Maybe.

How about your crufty old apps?
[image]
You’ve got de-provisioning down pat, right? No old apps laying around waiting to be exploited? Nobody would ever use the wayback machine to find out where your app used to be, would they?
Or
[image]
An associate of mine did the forensics on one like that. Yes, you can upload a Unix rootkit to a blob in a SQL server, execute it directly from the database process and target a nearby Unix server. Heck – you can even load a proxy on the SQL server, poison the Unix server’s ARP cache, and proxy all its traffic. No need to root it. Just proxy it. Network segmentation anyone?
Any old libraries laying around?
[image]
Yech. I have no clue how someone who downloads [insert module here] from [insert web site here] and builds it into [insert app here] can possibly keep track of the vulnerabilities in [insert module here], update the right modules, track the dependencies, test  for newly introduced bugs and keep the whole mess up to date.
How about those firewall rules?
[image]
Speaking of de-provisioning – Firewall rules, load balancer rules… Are they routinely pulled as apps are shut down? Have you ever put a new app on the IP addresses of an old application?
Shared Credentials anyone?
[image]
That’s that whole single sign-on, single domain thing. It’s cool, but there are times and places where credentials should not be shared. Heresy to the SSO crowd, but valid nonetheless.
Your Management Infrastructure is secure?
[image]
Of course you don’t manage your servers from your desktop; you realized long ago that if it can surf the Internet, it can’t be secured, so you’ve got shiny new servers dedicated to managing your servers and apps. Now that you have them, how about making them a conduit that bridges the gap between insecure and secure?
Speaking of System Management

[image]
Do your system managers surf the Internet? Hang out at coffee shops? Do you trust their desktops?

Same goes for DBA’s by the way. A trojan’d DBA desktop would be a bad day. I’m just too lazy to draw another picture.

Don’t forget Storage management
[image]
Can’t figure out how to hack the really important [secure] systems? How about cloning a LUN and presenting it to the really unimportant [insecure] system? Your really cool storage vendor gives you really cool tools that make that really easy, right?
Or your Backup Infrastructure
[image]
We aren’t too far removed from the day when Legato tech support insisted that backing up data through a firewall was unsupported. Even if you tried, you soon figured out that by the time you opened up enough firewall holes to get backups to work, you’ve pretty much lost the ability to segment your network. And after you’ve poked the required Swiss cheese holes, the firewalls roll over and die when you stream backups through them anyway. We can’t afford a dedicated backup server for every app, so we build shared backup networks, right? What do backup networks do - connect every network together?
How about remote access?
[image]
Capture credentials on the sysadmin’s home computer, try them against the corporate SSH gateway? Don’t worry about it. Nobody would ever think to try that. You’ve got two-factor, obviously.
Your Build/Deploy Process?
[image]
I think it’d be so cool to have my parasite malware get deployed to prod using the enterprise change management/deployment process. Heck – it’d even inherit source control and versioning from the host. Neat!
Take a look around.

When you are buried deep in your code, believing that your design is perfect, your code is checked, tested and declared perfect, and you think you've solved your security problems, stop & take a look at the security challenges surrounding your app.

Thursday, October 21, 2010

Log Reliability & Automotive Data Recorders

When are logs reliable?

Toyota's official answer seems to be either It depends or "The data retrieved from the EDR is far from reliable", unless the data exonerates them, in which case "the EDR information obtained in those specific incidents is accurate".

There’s got to be a blog post somewhere in that.

Accuracy:

  • Did the log record what actually happened? Did the log record when something actually happened?
  • Do the logs represent the events in the order that they occurred?
  • Are the time stamps accurate?
Time syncing all your systems is fundamental, obvious and a best practice for the last fifteen years or so, but unless you log time sync failures, you don't necessarily know if the time stamp on a log is accurate. I like syslog-capable systems that time stamp the logs at the source and syslog servers that time stamp them again as they are caught and written. That helps verify the accuracy of time stamps.
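A sketch of how the dual time stamps get used (the ISO format and the five second threshold here are my assumptions, not any particular syslog implementation): the catcher stamps the message on arrival and compares that against the stamp the source applied.

from datetime import datetime, timezone

MAX_SKEW_SECONDS = 5   # assumed tolerance

def check_skew(source_stamp: str) -> float:
    # assumes the source stamps in ISO 8601 with a UTC offset,
    # e.g. 2010-12-21T14:03:07+00:00
    source_time = datetime.fromisoformat(source_stamp)
    received_time = datetime.now(timezone.utc)
    skew = abs((received_time - source_time).total_seconds())
    if skew > MAX_SKEW_SECONDS:
        print(f"WARNING: source clock off by ~{skew:.0f}s - time line suspect")
    return skew

If the two stamps disagree by more than a few seconds, the time line built from that source is suspect, and the time sync failure is worth a log message of its own.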

Completeness:

  • Are there gaps in the logs? If so, can we determine where the gaps are?

Unless the logs are stamped with a serial number, odds are that you cannot verify completeness.

I've never testified in court.

I’m not sure what it is, but I’m sure it’s rootable

I have no clue why anyone would still run RealPlayer. I’ve pretty much forgotten that it existed. But I know that those who know what it is and still run it are screwed. If they even know they are running it. They probably don’t. That makes them extra screwed.

If you accidentally configure RDS in your Linux kernel, you’ve got something to fix. From what I can see, we can blame (er, thank) Oracle for Rootable (er, Reliable) Datagram Sockets. You’d think that by now we’d be able to introduce something new and interesting without making the old & stable rootable.

I guess not.

If obscure media players and Infiniband protocols are rootable, the most popular OS in the world must be rootable, right? Yep, it’s rootable. Again. Damn. It’s probably also running Java, which makes it double-extra rootable. Speaking of Java, Microsoft thinks that there is an unprecedented wave of Java exploitations. I wonder who wrote the operating system that allows itself to be exploited by such an unprecedented wave. Waves aren’t unprecedented. They are periodic.

I used to think that running a non-Microsoft browser would help keep my desktops clean. I’m not sure anymore though. The alternatives don’t appear to be any less rootable. Nor does running the best alternative operating system make you immune. Safer maybe, but immune? Nope. Not even close.

Adobe, apparently feeling rather left out by all the recent attention that Java has received, decided that Flash Player, Reader, Acrobat and Shockwave must be vulnerable too. Can’t let the competition leave you behind, can you? I can imagine some VP reading about Java exploits and demanding that all Adobe products support exploits too.

And of course if you are bored, you can remotely root a Blackberry Enterprise Server. All you need to do is have one of your poor sales schmucks open up a PDF on their Blackberry. Sounds like fun, eh? The sales droid opens a PDF, the system manager of the BES server gets screwed.

If it can surf the Internet, it can’t be secured.

Wednesday, September 15, 2010

DNS RPZ - I like the idea

An opt-in real time black hole list for untrustworthy domain names?

Interesting.

Some thoughts:

I certainly don't think that offering the capability is a bad thing. Nobody is forced to use it.

Individual operators can decide what capability to enable and which blacklists to enable. ISP's could offer their customers resolvers with reputation filters and resolvers without.  ISP's can offer blacklisted/greylisted resolvers for their 'family safe' offerings. Corporations/enterprises can decide for themselves what they blacklist.

A reputation based white list would be interesting. Reputation could be determined by the registrar, perhaps based on the registrar having a valid, verified street address, phone and e-mail for the domain owner. A domain that has the above and has been registered for a month or so could be part of a white list. A domain that hasn't met the above could be gray listed. Operators could direct those to an internal 'caution' web page.

A downside:

Fast flux DNS based botnets are a significant issue, but I don't think that a black list of known-bad domains will solve the problem. If a malware domain is created as a part of a fast flux botnet, a black list will never be able to keep up. It could still be useful though. Some malware is hosted on static domains.

Optional:

A domain squatters blacklist. I'd love to be able to redirect address bar typos to an internal target rather than the confusing, misleading web pages that squatters use to misdirect users. I don't care if domain squatters' business model is disrupted. They are speculators. They should expect to have their business models disrupted once in a while.

Are we creating more vulnerabilities than we are fixing?

Looking at ZDNet's Zero Day blog:

Sept 15th: Apple QuickTime flaws puts Windows users at risk
Sept 14th: Stuxnet attackers used 4 Windows zero-day exploits
Sept 13th: Adobe Flash Player zero-day under attack
Sept 10th: Primitive 'Here you have' e-mail worm spreading fast
Sept 9th: Patch Tuesday heads-up: 9 bulletins, 13 Windows vulnerabilities
Sept 9th: Security flaws haunt Cisco Wireless LAN Controller
Sept 9th: Apple patches FaceTime redirect security hole in iPhone
Sept 8th: New Adobe PDF zero-day under attack
Sept 8th: Mozilla patches DLL load hijacking vulnerability
Sept 8th: Apple plugs drive-by download flaws in Safari browser
Sept 2nd: Google Chrome celebrates 2nd birthday with security patches
Sept 2nd: Apple patches 13 iTunes security holes
Sept 1st: RealPlayer haunted by 'critical' security holes
Aug 24th: Critical security holes in Adobe Shockwave
Aug 24th: Apple patches 13 Mac OS X vulnerabilities
Aug 20th: Google pays $10,000 to fix 10 high-risk Chrome flaws
Aug 19th: Adobe ships critical PDF Reader patch
Aug 19th: HD Moore: Critical bug in 40 different Windows apps
Aug 13th: Critical Apple QuickTime flaw dings Windows OS
Aug 12th: Opera closes 'high severity' security hole
Aug 12th: Security flaws haunt NTLMv1-2 challenge-response protocol
Aug 11th: Adobe warns of critical Flash Player flaws
Aug 10th: Microsoft drops record 14 bulletins in largest-ever Patch Tuesday

I'm thinking there's a problem here.

Of course Zero Day only covers widely used software and operating systems - the tip of the iceberg.

Looking at Secunia's list for today, 09/15/2010:

Linux Kernel Privilege Escalation Vulnerabilities
e-press ONE Insecure Library Loading Vulnerability
MP3 Workstation PLS Parsing Buffer Overflow Vulnerability
IBM Lotus Sametime Connect Webcontainer Unspecified Vulnerability
Python asyncore Module "accept()" Denial of Service Vulnerability
AXIGEN Mail Server Two Vulnerabilities
3Com OfficeConnect Gigabit VPN Firewall Unspecified Cross-Site Scripting
Fedora update for webkitgtk
XSE Shopping Cart "id" and "type" Cross-Site Scripting Vulnerabilities
Linux Kernel Memory Leak Weaknesses
Slackware update for sudo
Slackware update for samba
Fedora update for samba
Red Hat update for samba
Red Hat update for samba3x
Google Chrome Multiple Vulnerabilities

Serious question:

Are we creating new vulnerabilities faster than we are fixing old ones?

I'd really like to know.

In some ways this looks like the early immature periods of other revolutionary industries.

We built cars. The early ones were modern wonders that revolutionized transportation and a wide swath of society. After a few decades we figured out that they also were pollution spewing modern wonder death traps. Auto manufacturers sold their pollution spewing modern wonder death traps to customers who stood in line to buy them. Manufacturers claimed that there was nothing wrong with their products, that building clean autos with anything resembling safety was impossible, and that safe clean autos would cost so much that nobody could afford them. The customers were oblivious to the obvious. They piled their families into their death traps and drove them 85mph across South Dakota without seat belts (well - my dad did anyway - and he wasn't the fastest one out there, and I'm pretty sure I and my siblings weren't the only kids riding in the back of a station wagon with the tailgate window wide open...).

Some people described it as carnage. Others thought that autos were Unsafe at Any Speed.

Then came the safety & pollution lobbies. It took a few decades, a few hundred million in lobbyists, lawyers and lawsuits, and many more billions in R&D, but we now have autos that are fast, economical, safe and clean. A byproduct - completely unintended - was that autos became very low maintenance and very, very reliable. Maintenance intervals went from hundreds of miles between shop visits to thousands of miles between shop visits (for oil changes) and tens of thousands of miles per shop visit (for everything but oil).

We need another Ralph Nader.  I don't want to wait a couple decades for the software industry to get its act together.

I'll be too old to enjoy it.

Monday, September 6, 2010

Thoughts on Application Logging

As a follow on to:

I have a few semi-random thoughts on application logging.

Things I like to see in logs are:

Machine parseable (yet human readable) format. I need to be able to write a regex that cleanly separates interesting messages and pipe them into sed/awk and extract critical fields from the messages. I typically use sed/awk/perl to strip out uninteresting parts of the message and sort/count pipe-to-Excel the rest of the fields. I also use logsurfer to catch real time events and alert interested parties. Even organizations with sophisticated tools still need to be able to parse the logs. Bonus points if all messages of a particular type have the same number of fields - or if variable word fields are at the end of the message.
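As an illustration of what a machine parseable format buys you (the log layout and file name below are invented for the example, not any real application): one regex pulls the fields apart, and filtering or counting is a few lines rather than a custom parser.

import re
from collections import Counter

# invented single-line layout: timestamp host app severity event free-text
LINE = re.compile(r'^(?P<stamp>\S+) (?P<host>\S+) (?P<app>\S+) '
                  r'(?P<severity>\w+) (?P<event>\w+) (?P<detail>.*)$')

counts = Counter()
with open("app.log") as log:
    for line in log:
        m = LINE.match(line)
        if m and m.group("severity") in ("ERROR", "CRIT"):
            counts[m.group("event")] += 1

for event, n in counts.most_common(10):
    print(n, event)

The same job is a one-liner in sed/awk/perl, which is the point: a predictable single-line format makes every downstream tool cheap.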

Single line events. No XML. I'm not going to write a custom multi-line XML parser for every random app. Not a chance.

Date/Time stamp on every message. Really. I need to know, at least to the nearest second, when every line of a log was generated. Establishing time lines is critical to troubleshooting and forensics. Synchronized clocks and stamped messages are what makes time lines possible. I like it when the source system timestamps the log and the log catcher also timestamps the log. Then we know if a clock is off.

Rational message prioritization. I need to be able to reliably detect critical messages and do something with them. The ability to filter on some sort of priority is key. A simple regular expression should be able to extract interesting messages.

Unique identifiers for log message types. Cisco's firewalls generate syslogs with an identifier that uniquely identifies the log message type. No messages are emitted from a firewall that are not uniquely identified and documented. (Don't open that link on a mobile connection - you'll use up your quota...). With PIX/ASA logs, I can write a regex to catch %ASA-4-411004, find all cases where network interfaces were administratively enabled, match them up with access logs & figure out who enabled the interface. Then I can check change management and figure out if it was part of a plan or part of a rootkit.
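A minimal sketch of that kind of filter (the message ID is the one named above; the rest of each line is left alone since the surrounding text varies by ASA version, and the file name is assumed):

import re

ASA_IFACE_ENABLED = re.compile(r'%ASA-4-411004\b')

with open("asa.log") as log:
    hits = [line.rstrip() for line in log if ASA_IFACE_ENABLED.search(line)]

for line in hits:
    print(line)   # cross-check these against access logs and change management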

Sufficient information to link the logs from the application to upstream or downstream systems. I'd think that some form of user session identifier logging would be useful on most apps. User IP address is a candidate for an upstream identifier. Web, firewall and netflow logs can be correlated by IP & date/time. A web app session ID or similar would perhaps link the application logs together, and if stored along with a userid could link application activity to database and web server activity. Once the problem is narrowed down to a specific user, being able to track that user's session would be very useful.

I once needed to link application activity to an IP address. I couldn't do it directly as the app didn’t log IP addresses. The only thing that saved me was that the particular activity was only possible through a given URL, and that URL had only been called from one IP address in the window surrounding the event. Had there been more than one instance of that URL in that time window, we would not have been able to correlate the event with an IP address. I sometimes am able to correlate application activity to an IP address (application, firewall & load balancer logs), the IP address to a MAC address (DHCP logs), the MAC address to a userid (AD domain logs), the MAC address to a physical computer (switch CAM tables), the physical computer to a person (security cameras).
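Here is the shape of that correlation reduced to a sketch (the record layout - an ISO timestamp, then an IP, then the rest of the line - and the five minute window are assumptions for illustration): given one application event, pull the entries from another log that share the address and land within the window.

from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

def parse(line):
    stamp, ip, rest = line.split(" ", 2)
    return datetime.fromisoformat(stamp), ip, rest

def correlate(app_event, other_log_lines):
    event_time, event_ip, _ = parse(app_event)
    matches = []
    for line in other_log_lines:
        stamp, ip, _ = parse(line)
        if ip == event_ip and abs(stamp - event_time) <= WINDOW:
            matches.append(line)
    return matches

The narrower the window and the more independent log sources that agree, the less ambiguous the link.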

This sort of information occasionally gets used by law enforcement as a part of various investigations, so the preference is to be able to link various log sources together unambiguously.

Logging of failures is as important as logging of successes. Failed privilege escalation attempts (for example) are always interesting. Any failure of that type is either a broken/misconfigured app, a hack attempt, or a sysadmin is ef’ing around on a production system. Firewall denies for packets sourced from inside the firewall are similar. If a server inside a data center is attempting to connect to a blocked port/IP on another system or subnet, something is wrong. Either the server/application is configured wrong, or the firewall is configured wrong, or the rootkit is making a reconnaissance pass through the data center (or sysadmins are ef’ing around on production systems).

Timeliness. Logs must leave the system in near real time. Logs that have been on a compromised system for more than a few minutes (or seconds?) after the compromise are presumed tainted.

Bonus points: Serialized numbering on log messages. That way I know if I'm missing any messages. If I am missing messages, either something is broke or the rootkit deleted some of my messages. Heck - I could even write a log catcher that logged something like 'Expecting message #86679514, received message #86679517, 3 messages missing'. (I have a Netflow collector that does that).
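A toy version of that gap check (the serial numbers are just the ones from the example; a real catcher would pull the serial out of a field in each message):

def check_serial(expected, received):
    # report a gap, then resynchronize on the received serial
    if received != expected:
        print(f"Expecting message #{expected}, received message #{received}, "
              f"{received - expected} messages missing")
    return received + 1

expected = None
for serial in (86679512, 86679513, 86679517, 86679518):
    expected = serial + 1 if expected is None else check_serial(expected, serial)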

I’ve often run into DBA’s and application folks who are afraid to generate large volumes of logs. I’m not afraid to generate large volumes of logs, provided that the logs are lightweight, clear, concise and readable. We currently log tens of thousands of firewall, DNS & netflow records each second. If they show up as lightweight syslog-like events, they are not hard to handle. On our largest application the vendor has a separate database just for user activity logging. With a reasonable purge strategy in place, it’s still larger than the production database. That’s OK though. It’s good data, and it’s not a difficult database to maintain.

Update: Gunnar has a series of posts on application logging. A quote from the first post:

By climbing the stack and monitoring the application, you collect data located closer to the core enterprise assets like transactions, business logic, rules, and policies. This proximity to valuable assets make the application an ideal place to see and report on what is happening at the level of user and system behavior, which can (and does) establish patterns of good and bad behavior that can provide additional indications of attacks.

ZFS and NFSv4 ACL’s

I've been doing granular file access control lists since Netware 2.0. I'm used to being able to specify (for example) permissions such that a file can be modified, but not renamed or deleted, or setting permissions on a file so that it can be executed, but not read - (Yes, Netware could do that). And of course, it's obvious that more than one user or group permission should be allowable. I'm also used to having some control over inheritance, so that I can 'kneecap' permissions on a nested directory.

Obviously I've been very unimpressed with Unix's trivial rwxr-x--- style permissions. Sun band-aided the decades old rwxr-x--- up with POSIX getfacl and setfacl. That was a start. We now have NFSv4 style ACL’s on ZFS. It looks like they are almost usable.

For an experiment, I decided to clean up a few 'home directories' where the existing permissions are a mess of randomness left over from a decade of ufsdump/ufsrestore, POSIX ACL's, tar, cpio, pax, samba, rsync and who knows what else. Here's my attempt at simple ACL's on an OpenSolaris ZFS volume.

Specific requirements:

  • Owner gets the equivalent of 'full control'.
  • Group gets the equivalent of 'read only'.
  • Everyone gets nada.
  • Newly created files get predictable permissions

To ensure predictable permissions, I want inheritance in some form or another such that:

  • New files are automatically created to allow owner the equivalent of read, write, create, delete, modify, including ACL's and attributes, but without the ‘execute’ bit set.
  • New files are automatically created to allow group 'read-only' but without the ‘execute’ bit set.
  • New directories are automatically created to allow the owner the equivalent of read, write, create, delete, modify, browse, including ACL's and attributes.
  • New directories are automatically created as group read and browse.
  • New files and directories are automatically created with no permissions for ‘everyone’

Keep in mind that the newest ACL implementation needs the Solaris version of ls, chmod, etc., rather than the default gnu versions that ship with OpenSolaris. Also – I’m using Solaris ‘CIFS’, not samba.

First I set:

zfs set aclinherit=passthrough-x  filesystem

passthrough-x appears to mean 'only inherit the 'execute' bit if the application specifically requests the bit when the file is created'. At least that's what it appears to mean.

Then I fixed existing files. Note that I wanted to touch only the files (not the directories), hence the 'find'.

find . -type f  -exec /usr/bin/chmod A=\
owner@:rw-pdDaARWc--s::allow,\
group@:r-----a-R-c---::allow,\
everyone@:full_set::deny {} \;

Explanation:

find . -type f -exec /usr/bin/chmod A=\ <= The 'A=' resets all ACL's rather than adding more ACL's
owner@:rw-pdDaARWc--s::allow,\ <= Set file owner to 'full control' minus the execute bit.
group@:r-----a-R-c---::allow,\ <= Set group to 'read'.
everyone@:full_set::deny {} \;  <= Set everyone else to 'deny all'.

This has a side effect of removing the execute bit from executable files. My standard policy is 'no executable files in home directories'. Those smart enough to know what the 'x' bit is are smart enough to know how to fix what just  broke. I wouldn’t do this in directories full of executable files.

Lastly, I tweaked the directories. Setting inheritance ensures that new files and directories have the desired ACL's:

find . -type d -exec /usr/bin/chmod  A=\
owner@:full_set:d:allow,\
owner@:rw-pdDaARWc--s:f:allow,\
group@:r-x---aAR-c---:d:allow,\
group@:r-----a-R-c---:f:allow,\
everyone@:full_set:fd:deny {} \;

Explanation:

find . -type d -exec /usr/bin/chmod A=\ <= The 'A=' resets all ACL's rather than adding more ACL's
owner@:full_set:d:allow,\ <= Set directory owner to 'full control' with inheritance for newly created directories, including the execute bit.
owner@:rw-pdDaARWc--s:f:allow,\ <= Set directory owner to 'full control' with inheritance for newly created files, excluding the execute bit.
group@:r-x---aAR-c---:d:allow,\ <= Set group  to 'rx-' with inheritance for newly created directories
group@:r-----a-R-c---:f:allow,\ <= Set group to 'r' with inheritance for newly created files
everyone@:full_set:fd:deny {} \; <= Kneecap everyone else

In theory, new files will be created with the equivalent of rw-r-----, new directories will be created equivalent to rwxr-x---.

Maybe.

Helpful docs:

Sunday, August 29, 2010

Engineering by Roomba’ing Around

A simple random walk algorithm:
  • Start out systematically
  • Hit an obstacle
  • Change direction
  • Hit another obstacle
  • Change direction
  • Eventually cover the problem space.

As applied to the problem of cleaning a floor, the algorithm seems to work OK, particularly if you are willing to ignore the parts of the problem space that the device cannot solve (corners, low furniture, complex spaces).

I sometimes see similar algorithms used by IT engineers. They start out systematically, hit an obstacle, head off in a random direction, hit an obstacle, head off in a different direction, and (usually) solve the problem (eventually). Unfortunately many IT engineers troubleshoot this way.

It could be worse – Some engineers start out systematically, hit an obstacle, and instead of changing direction, they just keep on banging into the obstacle. They haven’t figured out that even a random direction change is better than no direction change.

I also see IT engineers ignore the problems that their tool or project cannot solve. Roomba owners presumably understand that the device has limitations and they compensate for those limitations by using conventional cleaning techniques in places where the Roomba has no coverage. With a Roomba, the house isn’t clean until the corners are covered. With IT projects, I sometimes see the project declared a success long before the corners are covered, and I often see requirements that are not easily addressed by a particular tool, technique or workgroup either ignored or arbitrarily declared ‘out of scope’. (I.E. ‘we cannot solve this problem in the allotted time/budget, therefore the problem doesn’t exist.’)

Over the last month or so, Sun/Oracle graciously provided us with another opportunity to learn far more about ZFS than we really ought to need to know. While getting sucked into debugging another obscure issue, I tried occasionally to step back and ask if we were systematically troubleshooting or just Roomba’ing around.

Sometimes it’s hard to tell.


http://electronics.howstuffworks.com/gadgets/home/robotic-vacuum2.htm

Friday, August 27, 2010

So close, but yet so far. Microsoft almost gets it right.

When I read Steps 1 & 2 in the dialog box I thought I had died and gone to heaven.

[screenshot: Flash-Uninstall dialog]

Then I read step three.

Amusing – except that if Flash crashed, odds are that there is a reason. I don’t have any way of knowing if it crashed because it’s buggy (er, defective), or if it crashed during an exploit attempt. Of course if it’s the latter, I don’t have any way of knowing if the attempt was successful or not.

Saturday, July 31, 2010

Bogus Drivers Licenses, Fake Passports

The State of Minnesota is running a facial recognition algorithm on Minnesota drivers licenses and state ID’s.

Partial results:

  • Ran the algorithm on 11 million license photos
  • Flagged 1 million for manual review
  • Of the 100,000 reviewed so far, 1200 licenses were cancelled

By simple extrapolation of the numbers (1,200 cancelled out of 100,000 reviewed is roughly one percent, applied across the full million flagged photos), there could be as many as 10,000 bogus state-issued ID’s or licenses out of the pool of 11 million. There isn’t enough data in the media to know if a simple extrapolation is valid, so the number could be less.

Meanwhile, Government Accountability Office investigators were able to obtain US passports with fake identification in three out of seven attempts.

I think there is a house of cards here somewhere.

Wednesday, July 21, 2010

Just another day in Internet-land

So I’m goofing off at work, gambling with other people’s money using my fully patched but rootable browser, running on a fully patched but rootable operating system, occasionally downloading digitally signed malware while I contemplate the possibility that my medical records are on a P2P network somewhere, knowing that I really should be patching the remotely exploitable database that I just installed on my shiny new server that was thoughtfully preloaded with malware, and I’m thinking to myself:

“What’s new and interesting today?”

Nothing. Just another day in Internet-land.

Saturday, July 10, 2010

Oracle Continues to Write Defective Software, Customers Continue to Buy it

What’s worse:

  • Oracle continues to write and ship pathetically insecure software.

Or:

  • Customers continue to pay for it.

From the July 2010 Oracle CPU pre release announcement:

Oracle Product | Vulnerability | Rating | License Cost/Server
Database Server | Remote, No Auth[1] | 7.8/10 | $167,000[2]

Awesome. For a mere $167,000[2] I get the privilege of installing poorly written, remotely exploitable, defective database software on a $5,000 2-socket Intel server.

Impressive, isn’t it.

I’m not sure what a ‘Times-Ten’ server is – but I’m glad we don’t have it installed. The good news is that it’s only half the price of an Enterprise Edition install. The bad news is that it is trivially exploitable (score of 10 on a scale of 1-10).

Oracle Product | Vulnerability | Rating | License Cost/Server
Times-Ten Server | Remote, No Auth[1] | 10/10 | $83,000[3]

From what I can see from the July 2010 pre-release announcement, their entire product catalog is probably defective. Fortunately I only need to be interested in the products that we have installed that have an Oracle CVSS of 6 or greater & are remotely exploitable (the really pathetic incompetence).

If I were to buy a Toyota for $20,000, and if anytime during the first three years the Toyota was determined to be a smoldering pile of defective sh!t, Toyota would notify me and offer to fix the defect at no cost to me other than the inconvenience of having to drive to their dealership and wait in their lobby for a couple hours while they replace the defective parts. If they didn’t offer to replace or repair the defects, various federal regulatory agencies in various countries would force them to ‘fess up to the defect and fix it at no cost to me. Oracle is doing a great job on notification. But unfortunately they are handing me the parts and telling me to crawl under the car and replace them myself.

An anecdote: I used to work in manufacturing as a machinist, making parts with tolerances as low as +/-.0005in (+/-.013mm). If the blueprint called for a diameter of 1.000” +/-.0005 and I machined the part to a diameter of 1.0006” or .9994”, the part was defective. In manufacturing, when engineers designed defective parts and/or machinists missed the tolerances and made defective parts, we called it ‘scrap’ when the part was un-fixable or ‘re-work’ if it could be repaired to meet tolerances. We wrote it up, calculated the cost of repair/re-machining and presented it to senior management. If we did it too often, we got fired. The ‘you are fired’ part happened often enough that my foreman, the plant manager and I had a system. The foreman invited the soon-to-be-fired employee into the break room for coffee, the plant manager sat down with the employee, handed him a cup of coffee and delivered the bad news. Meanwhile I packed up the terminated employee’s tools, emptied their locker and set the whole mess out in the parking lot next to their car. The employee was escorted from the break room directly to their car.

Will that happen at Oracle? Probably not.

Another anecdote: Three decades ago I was working night shift in a small machine shop. The owner was running a startup in a highly competitive market, barely making the payroll. If we made junk, his customers would not pay him, he’d fail to make payroll, his house of cards would collapse and 14 people would be out of work. One night I took a perfectly good stack of parts that each had hundreds of dollars of material and labor already invested in them and instead of machining them to specification, I spent the entire shift machining them wrong & turning them into un-repairable scrap.

  • One shift’s worth of labor wasted ($10/hour at the time)
  • One shift’s worth of CNC machining time wasted ($40/hour at the time)
  • Hundreds of dollars per part of raw material and labor from prior machining operations wasted
  • Thousands of dollars wasted total (the payroll for a handful of employees for that week.)

My boss could have (or should have) fired me. I decided to send him a message that hopefully would influence him. I turned in my timecard for the night (a 10 hour shift) with ‘0’ in the hours column.

Will that happen at Oracle? Probably not.

One more anecdote: Three decades ago, the factory that I worked at sold a large, multi-million dollar order of products to a foreign government. The products were sold as ‘NEMA-<mumble-something> Explosion Proof’. I’m not sure what the exact NEMA rating was. Back in the machine shop, we just called them ‘explosion proof’.

After the products were installed on the pipeline in Siberia[4], the factory sent the product out for independent testing & NEMA certification. The product failed. Doh!

Too late for the pipeline in Siberia though. The defective products were already installed. The factory (and us peons back in the machine shop) frantically figured out how to get the dang gear boxes to pass certification. The end result was that we figured out how to re-work the gear boxes in the field and get them to pass. If I remember correctly, the remedy was to re-drill 36 existing holes on each part 1/4” deeper, tap the holes with a special bottoming tap, and use longer, higher grade bolts. To remedy the defect, we sent field service techs to Siberia and had them fix the product in place.

The factory:

  1. Sold the product as having certain security related properties (safe to use in explosive environments)
  2. Failed to demonstrate that their product met their claims
  3. Figured out how to re-manufacture the product to meet their claims
  4. Independently certified that the claims were met.
  5. Upgraded the product in the field at no cost to the customer

Oracle certainly has met conditions #1, #2 and #3 above. Will they take action #4 and #5?

Probably not.


[1]Remotely exploitable - no authentication required implies that any system that can connect to the Oracle listener can exploit the database with no credentials, no session, no login, etc. In Oracle’s words: “may be exploited over a network without the need for a username and password”

[2]Per Core prices: Oracle EE $47,000, Partitioning $11,500, Advanced Security $11,500, Diag Pack $5000, Tuning Pack $5000, Patch Management $3500. 8 cores, core factor of 0.5, discount of 50% == $167,000 (the arithmetic is worked through below). YMMV.

[3]All other prices calculated as list price * 8 cores * .5 core factor * 50% discount.

[4]I have no idea if the pipeline was the infamous pipeline that made the headlines in the early 1980’s or not, nor do I know if it is the one that is rumored to have been blown up by the CIA by letting the Soviets steal defective software. We made gearboxes that opened and closed valves, not the software that drove them. We were told by management that these were ‘on a pipeline in Siberia’.
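A quick sanity check of the arithmetic in notes [2] and [3] (my own sketch, using the per-core list prices quoted above):

per_core = {"Oracle EE": 47_000, "Partitioning": 11_500, "Advanced Security": 11_500,
            "Diag Pack": 5_000, "Tuning Pack": 5_000, "Patch Management": 3_500}
cores, core_factor, discount = 8, 0.5, 0.5
total = sum(per_core.values()) * cores * core_factor * discount
print(f"${total:,.0f} per server")   # -> $167,000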

Monday, June 28, 2010

Let’s Mix Critical Security Patches and Major Architecture Changes and see What Happens.

Is re-architecting key functionality on an N.n.n release unusual?

“Yes, this was an unusual release, and an experiment in shipping new features quicker than our major release cycle normally allows.”

On version 3.6.n, plugins shared process space. On 3.6.n+1, plugins do not.

The experiment appears to have suffered a setback.

The problem?

“…we are seeing an increasing number of reports that some users are unable to play Farmville, because Farmville hangs the browser long enough for our timeout to trigger and kill it.”

Apparently the “crashed plugin” timer needs to be long enough that Farmville can finish loading. Ten seconds isn’t long enough.

How did they originally arrive at a 10 second timeout?

“Originally a 10s timeout made a lot of sense considering that we had no actual data to go with.”

It looks like none of the Mozilla developers or testers play Farmville, or they’d have caught the problem prior to release.
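
The tradeoff is easy to sketch. Here’s a minimal, hypothetical watchdog in Python (not Mozilla’s actual code): anything that doesn’t come back within the timeout gets killed, whether it is hung or merely slow.

```python
import multiprocessing
import time

def plugin_work(seconds):
    """Stand-in for a plugin that is slow but healthy (say, still loading a game)."""
    time.sleep(seconds)

def run_with_watchdog(target, args, timeout):
    """Kill the 'plugin' if it hasn't finished within `timeout` seconds."""
    proc = multiprocessing.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)      # wait at most `timeout` seconds
    if proc.is_alive():
        proc.terminate()    # the watchdog can't tell 'hung' from 'slow'
        proc.join()
        return "killed as crashed"
    return "finished normally"

if __name__ == "__main__":
    # A 15-second load with a 10-second timeout gets killed,
    # even though nothing is actually wrong with it.
    print(run_with_watchdog(plugin_work, (15,), timeout=10))
    # The same load with a longer timeout survives.
    print(run_with_watchdog(plugin_work, (15,), timeout=20))
```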

Why make major changes to a minor release? To improve the customer experience, of course:

“Mozilla is always looking for more ways to bring users valuable features and improvements as quickly as possible. Crash protection offers significant stability enhancements, and product drivers wanted to make it available to Firefox users as soon as possible.”

The net effect of this is probably minor. Enterprises that actually have to spend real money and real staff time to roll out new code to some of the hundreds of millions of desktops that run Firefox can skip this release. The rest of us are using the product for free, so our time doesn’t count and we can’t complain.

I’m not a fan of mixing high priority security fixes with new functionality. Any change in functionality introduces the possibility that a high priority security patch/fix can’t be implemented because it breaks existing downstream dependencies.

Saturday, June 26, 2010

Sun/Oracle Finally Announces ZFS Data Loss Bug

If you’ve got a Sun/Oracle support login, you can read that an Abrupt System Reboot may Lead to ZFS Filesystem Data Integrity Issues on all Solaris kernels up through April 2010.

“Data written to a Solaris ZFS filesystem and confirmed by fsync(3C) may be lost in the event of an abrupt system reboot.”

This announcement came too late for us though.

If I am a customer of an ‘enterprise’ vendor, with millions of dollars of that vendor’s hardware and software and hundreds of thousands in annual maintenance costs, I expect that vendor to proactively alert me to storage-related data loss bugs. I don’t think that’s too much to expect; vendors with which I do far less business have done so for issues of far less consequence.

Sun failed.

Hopefully Oracle will change how incidents like this are managed.

Another Reason for Detailed Access Logs

Another poorly written application, another data leak. Not new, barely news.

This statement is interesting though:

“[company spokesperson] said it's unclear how many customers' information was viewed, but that letters were sent to 230,000 Californians out of an ‘abundance of caution.’”

Had there been sufficient logging built into the application, Anthem Blue Cross would have known the extent of the breach and (perhaps) could have avoided sending out all 230,000 breach notifications. That’s a view on logging that I’ve expressed to my co-workers many times. Logs can verify what didn’t happen as well as what did happen, and sometimes that’s exactly what you need.
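
A minimal sketch of what I mean (hypothetical names, not Anthem’s application): write one structured line per record view, and after an incident that same log tells you exactly which records were touched, instead of guessing and notifying everyone.

```python
import logging
from datetime import datetime, timezone

# One structured line per record access; ship these somewhere durable.
access_log = logging.getLogger("record_access")
access_log.setLevel(logging.INFO)
handler = logging.FileHandler("record_access.log")
handler.setFormatter(logging.Formatter("%(message)s"))
access_log.addHandler(handler)

def view_member_record(user_id, member_id, fetch):
    """Wrap every read of a member record with an audit entry."""
    access_log.info(
        "ts=%s user=%s member=%s action=view"
        % (datetime.now(timezone.utc).isoformat(), user_id, member_id)
    )
    return fetch(member_id)

def members_viewed(logfile="record_access.log"):
    """After an incident: which member records were actually viewed?"""
    viewed = set()
    with open(logfile) as f:
        for line in f:
            fields = dict(kv.split("=", 1) for kv in line.split())
            if fields.get("action") == "view":
                viewed.add(fields["member"])
    return viewed
```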

There are a couple of other interesting things in the story:

“the confidential information was briefly accessed, primarily by attorneys seeking information for a class action lawsuit against the insurer.”

That’ll probably cost Anthem a bundle. Letting lawsuit-happy attorneys discover your incompetence isn’t going to be the cheapest way to detect bad applications.

And:

“a third party vendor validated that all security measures were in place, when in fact they were not.”

Perhaps the third party vendor isn’t competent either?

Via Palisade Systems

Friday, June 25, 2010

Would You Give up Your Credit Card Number for an Hour of Free Wireless?

  • Cool: The City of Minneapolis has city-wide WiFi.
  • Cooler: The City of Minneapolis is offering free WiFi hotspots at selected spots in the city.
  • Coolest: It works.
  • Uncool: To use the free hot spots, you have to surrender a credit card number.

Would you give up a CC number just to get free WiFi?

  • This probably isn’t any worse than handing your card to a waiter at a restaurant.
  • I don’t know what else one could request that would provide a bit of verification of a user’s identity. A driver’s license?
  • One could simply decide to not care who uses the free hotspots. Our big brothers at the University don’t. They offer free guest wireless with only an e-mail address. For them, a@b.com is an e-mail address.
  • It’s entirely possible that the vendor that built and owns the network is PCI-DSS SAQ-whatever compliant.

I don’t think that I’d give up a card number just to get free WiFi.

I will assert, though, that most people will.

Friday, June 4, 2010

What’s an Important Update?

Windows update runs (good).

Windows update classifies some updates as important, and some updates as optional (good).

Windows update decides that a Silverlight update is important. It appears security related (good) but also adds features (maybe good, maybe bad).

Windows update decides that a security definition update is optional (bad).

How can a definition update for a signature based security product be optional? That’s annoying, ‘cause now I have to make sure to check optional updates just in case they’re important.

Sunday, May 30, 2010

Where are your administrative interfaces, and how are they protected?

One of the many things that keeps me awake at night:

For each {application|platform|database|technology}, where are the administrative interfaces located, and how are they protected?

I've run into administrative interface SNAFU's on both FOSS and purchased software. A common problem is applications that present an interface allowing access to application configuration and administration via the same ports and protocols as the application user interface. A good example is Twitter, where hacker-useful support tools were exposed to the Internet with ordinary authentication.

In the case of the Pennsylvania school spy cam caper, the 'administrative interface' that the school placed on the laptops apparently is relatively easy to exploit, and because they sent the students home with the district laptops, the interface is/was exploitable from the Internet.

Years ago one of our applications came with a vendor provided Tomcat install configured with the Tomcat management interface (/manager/*) open to the Internet on the same port as the application, ‘secured’ with a password of 'manager', without the quotes. Doh!

The most recent JMX console vulnerability shows us the type of administrative interface that should never, ever be exposed to the Internet. (A Google search shows at least a few hundred JMX consoles exposed to the Internet.)

I try to get a handle on ‘rogue’ administrative interfaces by whitelisting URLs in the load balancers. I’ll ask the vendor for a list of top-level URLs and build regex rules for them (/myapp/*, /anotherapp/*, etc.). When the list includes things like /manager/*, /admin/* or /config/*, we open up a dialogue with the vendor.
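
A rough sketch of the idea (hypothetical paths, not our actual load balancer config): anything that doesn’t match an explicitly allowed prefix gets dropped, and the /manager/, /admin/ and /jmx-console/ style surprises never see the Internet.

```python
import re

# Vendor-supplied top level URLs we agreed to expose.
ALLOWED = [r"^/myapp/", r"^/anotherapp/"]
allowed_re = [re.compile(p) for p in ALLOWED]

def permitted(path):
    """True only if the request path matches a whitelisted prefix."""
    return any(r.match(path) for r in allowed_re)

for path in ["/myapp/login", "/anotherapp/report/42",
             "/manager/html", "/admin/config", "/jmx-console/"]:
    print(path, "->", "allow" if permitted(path) else "deny")
```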

Our standard RFP template asks the vendor for information on the location and security controls for all administrative and management interfaces to their application. We are obviously hoping that they are one step ahead of us and they've built interfaces that allow configuration & management of the product to be forced to a separate 'channel' of some sort (a regex-able URL listening on a separate port, etc.).

Some vendors 'get it'.

Some do not.

Wednesday, May 19, 2010

Usenet services have been made unnecessary…

“…by the growing use of blogs, social networking sites and RSS feeds.”

Duke University, 2010

The end of an era.

Major ISP’s have been shedding their Usenet services for years, but when the originator of the service dumps it, the Internet really ought to mark a date on a global calendar somewhere.

Friday, May 7, 2010

IPv6 Tunnels & Solaris

Following Dan Anderson’s instructions here I set up an IPv6 tunnel and put my home network on IPv6. It was surprisingly easy. I have an OpenSolaris server acting as the tunnel end point and IPv6 router, with IPv6 tunneled to Hurricane Electric, and didn’t spend much more than an hour doing it.

Following Dan’s instructions, I:

  • Signed up at Hurricane Electric’s tunnel broker service, requested a /64 & created a tunnel
  • Configured my OpenSolaris server as a tunnel end point
  • Configured Solaris’s IPv6 Neighbor Discovery Protocol (NDP) service & reloaded it
  • Pointed my devices at HE’s DNS’s
  • ‘Bounced’ the wireless adapters on my various notebooks, netbooks and Mac’s

I didn’t have to reboot anything – and better yet – when I did reboot the various devices, IPv6 still worked.

I’m not sure why I needed to use HE’s name servers, but things started working a lot better when I did, and their name servers seem to work as well as anyone’s.

I think I got lucky: my DLINK DIR-655 home router/access point routes protocol 41 just fine. No configuration necessary.

I don’t have a static IP, so when my ISP moves me around, I’ll have to log in to Tunnel Broker and tweak the tunnel end point. That shouldn’t be a big deal – my ISP only changes my IP address once a year or so.

The ‘ShowIP’ Firefox plugin was very useful. It makes it clear when I’m using v6 vs. v4.
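
For hosts without the plugin, a few lines of Python can answer the same question (a rough sketch, not part of Dan’s instructions): try the addresses a name resolves to, in order, and report which family actually connected.

```python
import socket

def connect_family(host, port=80):
    """Return 'IPv6' or 'IPv4' depending on which address family actually connects."""
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(5)
                s.connect(addr)
                return "IPv6" if family == socket.AF_INET6 else "IPv4"
        except OSError:
            continue    # that address didn't work; try the next one
    return "no connection"

if __name__ == "__main__":
    print(connect_family("ipv6.google.com"))  # v6-only name: only reachable over the tunnel
    print(connect_family("www.google.com"))   # dual-stack name: shows which path is preferred
```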

Security considerations? I’ll tack them on to the end of the project and address them after implementation.

Sunday, April 25, 2010

Oracle/Sun ZFS Data Loss – Still Vulnerable

Last week I wrote about how we got bit by a bug and ended up with lost/corrupted Oracle archive logs and a major outage. Unfortunately, Oracle/Sun’s recommendation – to patch to MU8 – doesn’t resolve all of the ZFS data loss issues.

There are two distinct bugs: one is fsync() related, the other sync() related. Update 8 may fix Bug ID 6791160 “zfs has problems after a panic”, but

Bug ID 6880764 “fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached”

is apparently not resolved until 142900-09, released on 2010-04-20.

DBAs, pay attention: any Solaris 10 server on a kernel earlier than Update 8 + 142900-09 that is running an application that synchronously writes chunks larger than 32 KB is vulnerable to data loss on abnormal shutdown.
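
To make the 32 KB qualifier concrete, here’s a hedged sketch of the write pattern involved (illustrative Python, not our database): a single synchronous write larger than 32 KB, followed by fsync(), which is roughly what a database log writer does.

```python
import os

CHUNK = 64 * 1024   # 64 KB: larger than the 32 KB threshold discussed above

def synchronous_append(path, payload):
    """Append a chunk and insist it is on stable storage before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)    # the durability promise that the bug broke on abrupt reboot
    finally:
        os.close(fd)

if __name__ == "__main__":
    synchronous_append("/tmp/redo.log", b"x" * CHUNK)
```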

As best as I can figure (with no access to any information from Sun other than what’s publicly available), these bugs affect synchronous writes large enough to be written directly to the pool instead of indirectly via the ZIL. After an abrupt shutdown, the ZIL replay on reboot looks at the metadata in the ZIL and whacks the write (and your Oracle archive logs).

It appears that you can

  • limit database writes to 32 KB (and kill database performance)
  • or you can force writes larger than 32 KB to be written to the ZIL instead of the pool by setting zfs_immediate_write_sz larger than your largest database write (and kill database performance)
  • or you can use a separate intent log device (slog)
  • or you can update to 142900-09

Ironically, the ZFS Evil Tuning Guide recommends the opposite: setting “the zfs_immediate_write_sz parameter to be lower than the database block size”, so that all database writes take the broken direct path.

Another bug that argues for an out-of-order patch cycle and a rapid move to 142900-09:

Bug ID 6867095: “User applications that are using Shared Memory extensively or large pages extensively may see data corruption or an unexpected failure or receive a SIGBUS signal and terminate.”

This sounds like an Oracle killer.

I’m not crabby about a killer data loss bug in ZFS. I’m crabby because Oracle/Sun knew about the bug and its enormous consequences and didn’t do a damned thing to warn their customers. Unlike Entrust, who warned us that we had a bad cert even though it was our own fault that our SSL certs had no entropy, and unlike Microsoft, who warned its customers about potential data loss, Sun/Oracle really has its head in the sand on this.

Unfortunately – when your head is in the sand, your ass is in the air.

Saturday, April 24, 2010

We do not retest System [..] every time a new version of Java is released.

This post’s title is a quote from Oracle technical support on a ticket we opened to get help running one of their products on a current, patched JRE.

Oracle’s response:

“1. Please do not upgrade Java if you do not have to
2. If you have to upgrade Java, please test this on your test server before implemeting [sic] on production
3. On test and on production, please make a full backup of your environment (files and database) before upgrading Java and make sure you can roll back if any issue occurs.”

In other words: you are on your own. The hundreds of thousands of dollars in licensing fees and maintenance that you pay us don’t buy you sh!t for security.

Let’s pretend that we have a simple, clear and unambiguous standard: ‘There will be no unpatched Java runtime on any server’.

There isn’t a chance in hell that standard can be met.

This seems to be a cross-vendor problem. IBM’s remote server management requires a JRE on the system that runs the application that connects to the chassis and allows remote chassis administration. As far as we can tell, and as far as IBM’s support is telling us, there is no way to manage an IBM xSeries using a patched JRE.

“It is not recommended to upgrade or change the JRE version that's built inside Director. Doing so will create an unsupported configuration as Director has only been tested to work with its built-in version.”

We have JRE’s everywhere. Most of them are embedded in products. The vendors of the products rarely, if ever, provide security related updates for their embedded JRE’s. When there are JRE updates, we open up support calls with the vendors and watch them dance around while they tell us that we need to leave the JREs unpatched.
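
Even finding all of those embedded JREs is a chore. Something like the following (a hypothetical sketch, not a vendor tool) is about the best we can do: walk the filesystem, find anything named ‘java’, and ask it what version it claims to be.

```python
import os
import subprocess

def find_jres(root="/opt"):
    """Walk a tree, find executables named 'java', and report their versions."""
    for dirpath, _, filenames in os.walk(root):
        if "java" in filenames:
            java = os.path.join(dirpath, "java")
            if os.access(java, os.X_OK):
                try:
                    out = subprocess.run([java, "-version"],
                                         capture_output=True, text=True, timeout=10)
                    # 'java -version' prints to stderr
                    version = out.stderr.splitlines()[0] if out.stderr else "unknown"
                except (OSError, subprocess.TimeoutExpired):
                    version = "could not execute"
                print(java, "->", version)

if __name__ == "__main__":
    find_jres("/opt")   # repeat for wherever vendors bury their bundled JREs
```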

My expectations? If a vendor bundles or requires third party software such as a JRE, that vendor will treat a security vulnerability in the dependent third party software as though it were a vulnerability in their own software, and they will not make me open up support requests for something this obvious.

It’s the least they could do.