Tuesday, February 15, 2011

Somewhere in the OraBorg, an RSS feed is being updated

It’s Tuesday. My pre-OraBorg Google Reader subscription shows a stream of security updates. Looks pretty bad:

[screenshot: Google Reader feed full of security advisories]

Wow – there are security vulnerabilities in Mozilla 1.4, ImageMagick, a2ps, umount & a slew of other apps. I’d better kick our patch management process into high gear. It’s time to dig into these and see which ones need escalation.
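
The triage itself is scriptable, by the way. A rough sketch in Python of the kind of filter I mean – the feed URL and the cutoff year are placeholders, not a real Oracle feed:

    # Rough sketch: pull a security-advisory RSS feed and flag entries whose
    # CVE references look stale. The feed URL below is a placeholder.
    import re
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "https://advisories.example.com/security.rss"   # placeholder
    STALE_BEFORE = 2010   # flag advisories whose newest CVE is older than this

    cve_year = re.compile(r"CVE-(\d{4})-\d{4,}")

    with urllib.request.urlopen(FEED_URL) as response:
        feed = ET.parse(response)

    for item in feed.iter("item"):
        title = item.findtext("title") or ""
        text = title + " " + (item.findtext("description") or "")
        years = [int(y) for y in cve_year.findall(text)]
        if years and max(years) < STALE_BEFORE:
            print("stale (newest CVE %d): %s" % (max(years), title))
        elif years:
            print("review: %s" % title)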

Clicking on the links leads to SunSolve, the go-to place for all things Solaris. SunSolve redirects to support.oracle.com. support.oracle.com has no clue what to do with the redirect.

Bummer… I’d better do some serious research. GoogleResearch, of course:

[screenshot: Google search results dated 2004–2006]

2004, 2005, 2006…WTF???

Conclusion: Oracle is asking us sysadmins to patch five-year-old vulnerabilities. They must think that this will keep us from whining about their current pile of sh!t.

Diversion. Good plan. The borg would be proud.

One last (amusing) remnant of the absorption of Sun into the OraBorg.

Friday, February 11, 2011

Backup Performance or Recovery Performance?

“There is not a guaranteed 1:1 mapping between backup and recovery performance…” – Preston de Guise, “The NetWorker Blog”

Preston’s post reminded me of one of our attempts to build a sane disaster recovery plan. The attempt went something like this:

  1. Hire consultants
  2. Consultants interview key staff
  3. Consultants draft recovery plan
  4. Consultants present recovery plan to executives

In the general case, consultants may or may not add value to a process like this. Consultants are in it for the money. The distinguishing factor (in my eyes) is whether the consultants are attempting to perform good, cost-effective work so that they maintain a long-term relationship with the organization, or whether they are attempting to extract maximum income from a particular engagement. There is a difference.

On this particular attempt, the consultants did a reasonably good job of building a process and documentation for declaring an event, notifying executives, decision makers and technical staff, and managing communication. The first fifty pages of the fifty-thousand-dollar document were generally useful. They fell down badly on page 51, where they described how we would recover our data center.

Their plan was:

  • choose a recovery site
  • buy dozens of servers
  • hire an army of technicians (one for each server)
  • simultaneously recover each server from a dedicated tape drive that came pre-installed in each of the shiny new servers.
  • complete recovery in fifty-seven hours

To emphasize to the executives how firm they were on the fifty-seven-hour recovery, they pasted Veritas-specific server recovery documentation as an addendum to the fifty-thousand-dollar plan.

Unfortunately, their recovery plan bore no relationship to how we backed up our servers. That made it unusable.

The reality at the time of the engagement:

  • we did not have a recovery site
  • we had not started looking for a recovery site
  • we did not have one tape drive per server. All backups were multiplexed onto four fiber-channel-attached tape drives
  • we did not have Veritas NetBackup; we had Legato NetWorker
  • we could not recover individual servers from individual tape drives. All backup jobs were multiplexed onto shared tapes
  • we could not recover dozens of servers simultaneously. All backup jobs were multiplexed onto shared tapes

Unfortunately, the executive layer heard ‘fifty-seven hours’, declared victory, and moved on.

I tried to feed the consultants useful information, such as the necessity of having the SAN up first, the architecture of our Legato NetWorker system, the number of groups and pools, the single-threaded nature of our server restores (vs. the multi-threaded backups), the improbability of being able to purchase servers that exactly matched our hardware (hence the unlikelihood of a successful bare-metal recovery on new hardware), the lack of a pre-planned recovery site, the lack of power and network at any recovery site, and various other problems with their plan.

You get the idea.

The consultants objected to my objections. They basically told me that their plan was perfect, as proven by its adoption by a very large nationwide electronics retailer headquartered nearby. I suggested that we prepare a realistic recovery plan, accounting for the above deficiencies, and substitute it for the ‘fifty-seven hours’ part of the consultants’ plan. They declared me a crackpot and ignored my objections.

Using what I thought were optimistic estimates for an actual recovery, I built a marginally realistic Gantt chart. It looked something like this:

  • Order all new hardware – 48 hours. That includes an HP EVA SAN and fiber channel switches, an HP GS160, DLT tape changers, a Sun E10K and miscellaneous SPARC & Wintel servers. Call in favors from vendors; beg, borrow or extra-legally appropriate hardware as necessary. HP had a program called ‘Recoverall’ that would have facilitated hardware replacement. Sun didn’t.
  • Locate new site – 48 hours. Call in favors from other state agencies, the governor’s office, other colleges and universities, and Uncle Bob. Can be done in parallel with hardware ordering.
  • Provision new site with power, network, fiber channel – 72 hours. I’m optimistic. At the time (a half dozen years ago) we could have brought most systems up with duct tape and baling wire for a network, skipped inconveniences like VLANs and firewall rules, used gaffer’s tape to protect the fiber channel runs, etc.
  • Deliver and install hardware – 72 hours. (Optimistic).
  • Configure SAN fabric, zoning, LUNs, tape drives, network – 12 hours.
  • Bootstrap Legato, connect up DLT drives, recover indexes – 8 hours.

Then (roughly a week into the recovery) we’d be able to start recovering individual servers. When estimating the server recovery times, I assumed:

  • that, because we threaded all backups onto four tape drives and each tape had multiple servers on it, we’d only be able to recover four servers at a time.
  • that a server recovery would take twice as long as the server backup
  • that staff could only work 16 hours per day. If a server finished restoring while staff were sleeping, the next server recovery would start when staff woke up.

Throw in a few more assumptions, add a bit of friction, temper my optimism, and my Gantt chart showed three weeks as the best possible outcome. That’s quite a stretch from fifty-seven hours.
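
For the curious, the arithmetic behind that chart is simple enough to sketch in a few lines of Python. The infrastructure phase durations come from the list above; the server list and per-server backup hours are invented for illustration, not our real backup windows:

    # Back-of-the-envelope recovery estimate, using the assumptions above:
    #   - infrastructure phases from the Gantt list (the site search runs in
    #     parallel with hardware ordering, so it doesn't add to the total)
    #   - only four restore streams (four tape drives, multiplexed backups)
    #   - a restore takes twice as long as the corresponding backup
    #   - staff work 16 hours a day; a running restore can finish overnight,
    #     but the next one doesn't start until staff are back
    # The server names and backup hours below are hypothetical.

    INFRA_HOURS = 48 + 72 + 72 + 12 + 8   # order hw, provision site, deliver, SAN, Legato
    WORKDAY = 16                          # staffed hours per day
    DRIVES = 4                            # concurrent restore streams

    servers = [("db%02d" % i, 10) for i in range(8)] + \
              [("app%02d" % i, 4) for i in range(25)] + \
              [("misc%02d" % i, 2) for i in range(40)]   # (name, backup hours)

    def next_staffed(t):
        """If t falls in the unstaffed 8-hour window, push it to the next morning."""
        day, hour = divmod(t, 24)
        return t if hour < WORKDAY else (day + 1) * 24

    streams = [0.0] * DRIVES              # per-drive clock, hours after the infrastructure is up
    for name, backup_hours in sorted(servers, key=lambda s: -s[1]):
        drive = min(range(DRIVES), key=lambda d: streams[d])   # next free tape drive
        start = next_staffed(streams[drive])
        streams[drive] = start + 2 * backup_hours              # restore = 2x backup

    total = INFRA_HOURS + max(streams)
    print("best case: %.0f hours (~%.1f weeks)" % (total, total / (24 * 7)))

Plug in real backup windows and a bit of friction, and three weeks falls out in a hurry.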

The outcome of the consulting gig was generally a failure; their plan was only partially useful. Had we followed it, we would have known whom to call in a disaster, who the decision makers were, and how to manage communication, but we would not have had a usable plan for recovering a data center.

It wasn’t a total loss, though. I used that analysis internally to convince management that, given organizational expectations for recovery versus the complexity of our applications, a pre-built, fully redundant recovery site was the only valid option.

That’s the path we are taking.

Wednesday, February 9, 2011

Tipping Point Vulnerability Disclosures – IBM Incompetence?

Last August, Tipping Point decided to publicly disclose vulnerabilities six months after vendor notification. Those six months are up.

Take a look at IBM’s vulnerability list and the actions taken to resolve the vulnerabilities. If you don’t feel like reading the whole list, the snip below pretty much sums it up:

Timeline:
[08/26/2008] ZDI reports vulnerability to IBM
[08/26/2008] IBM acknowledges receipt
[08/27/2008] IBM requests proof of concept
[08/27/2008] ZDI provides proof of concept .c files
[07/12/2010] IBM requests proof of concept again and inquires as to version affected
[07/13/2010] ZDI acknowledges request
[07/14/2010] ZDI re-sends proof of concept .c files
[07/14/2010] IBM inquires regarding version affected
[07/19/2010] IBM states they are unable to reproduce and asks how to compile the proof of concept
[07/19/2010] ZDI replies with instructions for compiling C and command line usage
[01/10/2011] IBM states they are unable to reproduce and requests proprietary crash dump logs

Tipping Point: Two Thumbs Up.

IBM: Two and a half years. Still no clue.

What IBM’s executive layer needs to know is that people like me read about the failure of their software development methodology in one division and assume that the incompetence spans the entire organization. That may not be fair – IBM is a big company, and it’s highly likely that some or most of its software development groups are capable and competent. However, if one group is allowed to flounder and fail, then it’s clear to me that software quality is not getting attention at a high enough level within IBM to ensure that all of its software development groups are capable and competent. If some or most of the software developed within IBM is of high quality, it’s because those development groups believe in doing the right thing, not because IBM believes in doing the right thing.

In other news, IBM’s local sales team is aggressively pushing for me to switch our entire Unix/Database infrastructure from Solaris/Oracle on SPARC to AIX/DB2 on Power.

Guess what my next e-mail to their sales team is going to reference?

Sunday, February 6, 2011

Well-formed Comcast phishing attempt - “Update Your Account Information”

A well-formed e-mail:

[screenshot: the phishing e-mail]

No obvious spelling errors, reasonably good grammar, etc. One red flag is the URL to the Comcast logo, but I wouldn’t bet on users catching that. The embedded link is another red flag:

http://login.comcast.net.billings.bulkemail4sale.com/update/l0gin.htm

[s/0/o/]

But it’s one that would fool many. Users will not see that URL unless their e-mail client shows the link destination on ‘hover’.
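
This is exactly the kind of check that’s trivial for a script and hopeless for an eyeball. A quick Python sketch of the idea – compare the registered domain of each link against the domain the mail claims to come from (the crude two-label domain check is mine; a real one would consult the public suffix list):

    # Sketch: flag links whose registered domain isn't the brand's domain.
    from urllib.parse import urlparse

    BRAND_DOMAIN = "comcast.net"

    def registered_domain(host):
        # Crude: keep the last two labels. Good enough to catch
        # login.comcast.net.billings.bulkemail4sale.com posing as comcast.net.
        return ".".join(host.lower().rstrip(".").split(".")[-2:])

    def looks_spoofed(url, brand=BRAND_DOMAIN):
        host = urlparse(url).hostname or ""
        return registered_domain(host) != brand

    for url in ("https://login.comcast.net/",
                "http://login.comcast.net.billings.bulkemail4sale.com/update/l0gin.htm"):
        print(("SPOOFED  " if looks_spoofed(url) else "ok       ") + url)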

The ‘login page’ is well formed & indistinguishable from Comcast’s Xfinity login page:

[screenshot: the spoofed Xfinity login page]

All the links in the bogus login page (except the form submit) go to real Comcast URLs, the images are real, and the page layout is nearly identical. The only hint is that the form submit doesn’t post to Comcast, but rather to [snip].bulkemail4sale.com/Zola.php:

[screenshot: the login form’s submit action, pointing at Zola.php]

Zola.php? Hmmm…

Filling out the bogus login page with a random user and password leads to a “Comcast Billing Verification” form requesting last, middle & first names, billing address, credit card details including PIN, card-issuing bank, bank routing number, SSN, date of birth, mother’s maiden name, driver’s license number, etc…

The “Comcast Billing Verification” form is very well constructed, generally indistinguishable from normal Comcast/Xfinity web pages. The submit action for the “Comcast Billing Verification” form is:

[screenshot: the billing form’s submit action, pointing at Hacker.php]

Hacker.php? This is not going to end well.

This is a very well-constructed phishing attempt. Impressive, eh?

It took me a bit of detective work to confirm that this was a phish. Ordinary users don’t have a chance.

Where is Anonymous when you need them?

Tuesday, February 1, 2011

The benevolent dictator has determined…

…that you are not qualified to decide what content you read on the device you’ve purchased.

If the New York Times story is true, Apple is rejecting an application because it allows access to purchased documents outside the walled garden of the iTunes App Store.

“Apple told Sony that from now on, all in-app purchases would have to go through Apple, said Steve Haber, president of Sony’s digital reading division.”

I keep thinking that there would have been an outcry if Microsoft, at the height of its monopoly, had exercised complete control over the documents you were allowed to purchase and read on your Windows PCs.