Thursday, January 29, 2009

Rogue Sysadmin Sabotage Attempt

 

A terminated system admin attempted massive data sabotage with a script that, had it run, would have wiped out all disks on all servers.

"If this script were executed, the total damage would include cleaning out and restoring all 4,000 ABC servers, restoring and securing the automation of mortgages, and restoring all data that was erased."

It was detected before it could execute.

Think about your conversion from tape-based backups to disk-based backups. Would that script have wiped out the disk pools that store your most recent backups?

Unless you have clear separation of duties and rights between the system admins who support production and those who administer the backup software and servers, this is a tough risk to mitigate. I'll bet that most shops use the same system admins for both the production servers and the backup infrastructure. If so, a rogue 'wipe all' script would take out the disk-based backups as well.

That’d be a bad day.

(Via Security Circus).

Hardware is Expensive, Programmers are Cheap II

In response to Hardware is Cheap, Programmers are Expensive at Coding Horror:

The million dollar question: What’s wrong with this picture?

[Image: cpu-day]

20% CPU utilization, that’s what’s wrong. It’s way too low.

The hardware that's running at 20% on a busy day is 32 cores of IBM's finest x3950-series servers and a bunch of terabytes of IBM DS4800 storage. The application has three such systems (active/passive plus remote DR) at a total cost of about $1.5 million. That's right: $1.5 million in database hardware running at less than 20% CPU utilization on a normal day and barely 30% on the busiest day of the year.

How did that happen?

Because a software vendor, run by programmers, decided that programmers were too expensive to spend on designing an efficient, optimized application. Instead they spent their precious and valuable time adding shiny new features. So the customer had no choice but to buy hardware. Lots of it. Then, after the hardware was bought, the software vendor figured out that it actually could write an efficient, optimized application, and that its customers couldn't buy enough hardware to compensate for the poor programming.

Too late though. The hardware was already bought.

The app in question was delivered with a whole series of performance-limiting design and coding flaws. The worst of them:

  • No session caching combined with a bug that forced two database updates to the same session state table for each session state change (several hundred updates/second and a really, really nasty page latch issue)
  • Broken connection pooling caused by poor application design, forcing app servers to log in & out of the database server several hundred times per second.
  • Session variables not cached, forcing database round trips for user information like language, home page customizations, background colors, etc., once per component per web page. Thousands per second.
  • Failure to properly parameterize SQL calls, forcing hundreds of recompilations per second of the same damned friggin' query, and of course filling the procedure cache with a nearly infinite set of query/parameter combinations (see the sketch after this list).
  • Poorly designed on-screen widgets and components, some of which used up 30% of 32 database cores all by themselves.
  • A design that prevents anything resembling horizontally scaled databases.
  • (The whole list wouldn't fit in a blog post, so I'll quit here…)
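
To make the recompilation item concrete, here is a minimal sketch of the difference, using Python and pyodbc against a hypothetical user_prefs table (the table, column and DSN names are my own illustration, not the vendor's code): string-built SQL forces a fresh compile and a new plan-cache entry for every distinct value, while a parameterized call lets SQL Server reuse a single cached plan.

```python
import pyodbc  # assumed driver; any DB-API library with parameter markers works the same way

conn = pyodbc.connect("DSN=appdb")   # hypothetical DSN, for illustration only
cur = conn.cursor()
user_id = 42

# Anti-pattern: every distinct user_id yields a different SQL string, so the
# server compiles it again and caches yet another ad-hoc plan.
cur.execute(
    f"SELECT language, home_page FROM user_prefs WHERE user_id = {user_id}"
)

# Parameterized: the SQL text is constant, so one plan is compiled once and
# reused for every call, no matter what value user_id takes.
cur.execute(
    "SELECT language, home_page FROM user_prefs WHERE user_id = ?", user_id
)
row = cur.fetchone()
```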

After suffering nasty performance problems and system outages, after spending tens of thousands of dollars on consulting and tens of thousands more on Tier 3 Microsoft support, and after discovering the above flaws and reporting them to the software vendor, the customer was advised to buy more hardware. Lots of it.

The database server growth went something like this:

  • 8 CPUs, 100% busy, 100% growth per year, 18-month life
  • 16 CPUs, 80% busy, 50% growth per year, 6-month life
  • 32 cores (16 dual-core sockets), 50% busy, 30% growth per year

Throw in Microsoft database licenses, Windows Datacenter Edition software and support, and IBM storage (because HP wouldn't support Datacenter Edition on an EVA), and it's not hard to see seven figures getting thrown at the production database cluster and failover servers. And we shouldn't forget the added expense of power, cooling and floor space in the datacenter.

Fast-forward a few years, a handful of application code upgrades, and a million and a half hardware dollars later:

  • Beautifully designed session and user-variable caching, intelligent enough to cache only what it needs and to use the database only when it has to (a sketch of the idea follows this list).
  • Fully optimized widgets.
  • Minimal SQL recompilations.
  • An optimized data model.
  • An efficient, well running application.
  • A pleasure to host.
  • And 30% peak CPU.
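
For contrast with the original no-caching design, here is a minimal read-through cache sketch of the kind of session and user-variable caching described above. The function, table and column names are placeholders of my own, not the vendor's code, and `db` is assumed to be a DB-API-style cursor.

```python
import time

_CACHE: dict[int, tuple[float, dict]] = {}
_TTL_SECONDS = 300   # assumption: five minutes is fresh enough for display preferences


def get_user_prefs(user_id: int, db) -> dict:
    """Read-through cache: serve from memory, hit the database only on a miss."""
    now = time.time()
    hit = _CACHE.get(user_id)
    if hit and now - hit[0] < _TTL_SECONDS:
        return hit[1]                      # cache hit: no database round trip

    # Cache miss: one round trip, then remember the result for later requests.
    db.execute(
        "SELECT language, home_page, background FROM user_prefs WHERE user_id = ?",
        user_id,
    )
    language, home_page, background = db.fetchone()
    prefs = {"language": language, "home_page": home_page, "background": background}
    _CACHE[user_id] = (now, prefs)
    return prefs
```

The point is simply that the database sees a round trip only on a miss; everything else is served from memory.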

Had the app been as efficient three years ago as it is today, I estimate that about half of what was spent on hardware and related licensing and support would not have been necessary. They would not have had to buy Datacenter Edition when they did, if at all. The existing EVAs would have been supported, eliminating the need to buy IBM storage. Overall support and licensing costs would have been much lower, but more importantly, the customer would have been riding the downhill side of Moore's law instead of climbing uphill against it. Realistically, they still would have bought hardware, but they would have bought it later and gotten faster bits for fewer dollars.

If each of the optimizations and bug fixes that the software vendor applied over the last four years of upgrades had been available just six months earlier, the customer still would have saved a pile of money. That six-month acceleration probably would have been enough to let them wait for dual-core processors instead of buying single-cores and upgrading to dual-cores six months later, and those dual-cores would have lasted until quad-cores came out. That would have let the customer stick with eight-socket boxes and save a programmer's salary worth of licensing and operating-system costs.

What’s the worst part of all this?

There are lots of 'worst' parts of this.

  • More than one customer had to burn seven figures compensating for the poor application. There is at least one more customer at the same scale that made the same hardware decisions.

And:

  • The customer detected and advised the vendor of potential solutions to most of the above problems. The vendor’s development staff insisted that there were no significant design issues.

And:

  • The vendor really didn’t give a rat’s ass about efficiency until they started hosting large customers themselves. When they figured out that hardware was expensive, their programmers suddenly were cheap enough to waste on optimization.

And:

  • The dollars burned are not the customers'. They are yours. Taxes and tuition paid for it all.

Hardware is expensive, Programmers are cheap.

In this case, a couple of customers burned something like 10 programmer-years worth of salary on unnecessary hardware, when the cost to optimize software was clearly an order of magnitude lower than the cost to compensate with hardware.

To be fair though, I’ll post another example of a case where hardware was cheaper than programmers.


Related:

Hardware is Expensive, Programmers are Cheap

Hardware is Cheap, Programmers are Expensive

Sunday, January 18, 2009

A Simple Solution, Well Executed

I’m trying out a new mantra:
All other things being equal, a simple solution, well executed, is superior to a complex solution, poorly executed.
Since data destruction[1] discussions seem to have resurfaced, I'll try it out on that topic.

In Imperfect, but Still Useful, Jim Graves writes:
“Almost any method of data destruction is so much better than nothing that any differences between methods are usually insignificant.”[2]
[Image: Drive-Destruction]

As Jim indicates, choosing a technical method or algorithm for destroying data shouldn't be the problem we spend significant resources solving. Ensuring that all media gets processed by some destruction method is the problem that needs solving. In other words, it is critical that all data on all media is destroyed and that no media bypasses the destruction process. That is a different problem, requiring a different solution and a different skill set, than determining the best method of destroying data. The problem to solve is completeness of coverage, not completeness of destruction. It's a process problem, not a technology problem.
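
To make the completeness-of-coverage point concrete, here is a rough sketch (my own illustration, not our actual tooling) of the kind of reconciliation that matters far more than the wipe algorithm: compare the inventory of decommissioned drives against the destruction log and flag anything that never made it through the process. The file names and column names are hypothetical.

```python
import csv


def unprocessed_media(inventory_csv: str, destruction_log_csv: str) -> set[str]:
    """Return serial numbers of decommissioned drives with no destruction record."""
    with open(inventory_csv, newline="") as f:
        decommissioned = {row["serial"] for row in csv.DictReader(f)}
    with open(destruction_log_csv, newline="") as f:
        destroyed = {row["serial"] for row in csv.DictReader(f)}
    return decommissioned - destroyed


# Any serial in this set is a drive that bypassed the destruction process.
missing = unprocessed_media("decommissioned.csv", "destruction_log.csv")
print(sorted(missing))
```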

In 2002 our organization had an HGE (Headline Generating Event) related to improper disposal of media. The reaction from the technical people (me) was to research the effectiveness of various media deletion techniques, which inevitably went down the path of data remanence and magnetic force microscopes. It was pretty obvious at the time that the problem wasn't determining the correct number of wipes to subject our media to, but rather ensuring that all media get some form of destruction, even if that destruction isn't perfect. We didn't need to make the data unrecoverable by all known technology. We needed to make the data significantly more expensive to recover than it was worth, and we needed to make sure that of the tens of thousands of disks we disposed of every year, as few as possible leaked through the destruction process un-destroyed.

In my opinion the real problem (completeness of coverage) needed to be addressed by making the data destruction process as simple as possible, thereby increasing the probability that the process would actually be executed on all media. That isn't a technical problem; it's a process and people problem. I used to work on an assembly line. For any process involving humans and repetition, simple is good, and the process the person follows must be person-proof. In this case, a simple, person-proof process was what was needed.

Unfortunately, our internal legal staff was driving the bus, and they focused on the technical questions of numbers of passes and zeros versus ones. Attempts to steer them toward a simple destruction process with a low probability of being bypassed were not successful, nor were attempts to weigh the value of the data against the effort required to recover it. We ended up with a complex, time-consuming process that ensures the media that went through it is unrecoverable, but does little to ensure that no media escaped the process.

In a related discussion at Black Fist Security, the principal of the blog writes:
“What if you had one person wipe the drive with all zeros. Then have a second person run a script that randomly checks a representative sample of the disk to see if it finds anything that isn't a zero.”[3]
That's the kind of person-and-process thinking that can solve security problems. It's simple (one pass plus a sampling check) and has a good chance of being well executed on any media that goes through the process. A rough sketch of that sampling check follows.
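
This is only a sketch of the sampling idea, assuming a Unix-style raw device path and an arbitrarily chosen sample count; it reads random blocks from the wiped drive and reports whether any byte is nonzero.

```python
import os
import random


def sample_is_clean(device: str, samples: int = 1000, block: int = 4096) -> bool:
    """Spot-check a wiped drive: read random blocks and look for any nonzero byte."""
    fd = os.open(device, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)          # device size in bytes
        for _ in range(samples):
            offset = random.randrange(0, max(size - block, 1))
            os.lseek(fd, offset, os.SEEK_SET)
            if any(os.read(fd, block)):               # any nonzero byte means the wipe missed something
                return False
        return True
    finally:
        os.close(fd)


# Example (run with sufficient privileges against an already-wiped drive):
# print(sample_is_clean("/dev/sdb"))
```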


References:


[1] Single drive wipe protects data, research finds, SecurityFocus
[2] Imperfect, but Still Useful, Jim Graves, Graves Concerns
[3] Fear and Terror! All your data are being stolen!, Black Fist Security

Sunday, January 11, 2009

Windows 7 – Looks good so far

I bit the bullet and installed the new beta. Here are my first-day impressions, written using Live Writer on Windows 7.

It's an easy install. After freeing up and formatting a 20GB partition, I downloaded a 2.5GB ISO, burned it, mounted it and ran setup. A handful of questions and a couple of reboots later, it was up and running. Fast, easy and painless compared to a typical new-computer first-time setup.

No crapware needed. I have a functional computer with no add-on or third-party drivers. I gave up on Linux as a desktop years ago simply because of the hassle of managing hardware compatibility, and my experience with Vista is that removing non-essential vendor-provided crud helps performance and memory utilization. Win 7's first impression is that 1GB of memory will be adequate as long as I don't have to load up on third-party drivers and utilities, and so far I haven't needed to.

It's fast. I dual-boot Vista on this computer (a cheap 1.6GHz T2060 dual-core notebook with 1GB RAM) and there is no comparison between the two. Either it's faster or it's got me fooled into thinking it's faster. Either way, Win 7 wins. Vista on the same hardware is like a two-year-old on the potty. A simple thing, like opening a Vista control panel app, seems like a big production, with lots of effort and whining. "I donwaannaaa poop." "You can do it, just try it." "I caaann't poooop." "Sure you can, squeeze harder." "Waaaa... I caaann't." Eventually a turd the size of a dime comes out. "Yay! Good job!" And the Vista control panel app finally opens. You feel like you need to thank it and give it a cookie for being so good.

Bluetooth A2DP works, but I haven't been able to pair with my phone. Tethering is a must-have, and ideally I won't have to add a third-party Bluetooth stack to get it.

The user interface is cleaned up. It still has some non-intuitive spots, but it's a step up from Vista. I've gotten a lot farther with this UI in the first day than I did with either Vista or OS X. Part of that is because I'm used to Vista, so it's an easy transition, but some of it is because this UI is genuinely more intuitive.

The Resource Monitor is a step up from Vista's and now includes netstat-equivalent functionality, including TCP sockets and ports per process.

[Image: ResourceMonitor-Network]

It’s got a nifty and simple memory usage graph. You can see that after a fresh install, I’ve got about half a gig used.

[Image: Win7-Memory-bar]

The Resource Monitor also includes a process explorer that shows open file handles and DLLs.

[Image: Win7-Process]

Microsoft is finally providing what I think is the minimum functionality for that type of tool. Mapping a process to its file handles and network sockets is essential for troubleshooting.
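
The Resource Monitor does this in the GUI; for a scripted equivalent, here is a rough sketch using the third-party psutil library (my choice for illustration, not a Windows built-in) that walks processes and prints their sockets and open files.

```python
import psutil

# For each process that has network connections, show its sockets and open files.
for proc in psutil.process_iter(["pid", "name"]):
    try:
        conns = proc.connections(kind="inet")
        if not conns:
            continue
        print(f"{proc.info['name']} (pid {proc.info['pid']})")
        for c in conns:
            print(f"  socket: {c.laddr} -> {c.raddr or '*'} [{c.status}]")
        for f in proc.open_files():
            print(f"  file:   {f.path}")
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue  # some processes require elevated privileges to inspect
```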

And, check this out, it's got a real shell:

[Image: Shell]

(OK, I had to add the Unix subsystem to get it, but it's there and it works.)

So far, it's a win.

Thursday, January 8, 2009

Pack Rat or Prudent Record Keeping?

I've kept paper copies of all financial transactions that I've had with banks, credit card companies, utilities and the like ever since I opened up my first checking account 30+ years ago, and to this day I maintain a paper trail of all electronic statements, electronic bill payments, canceled checks and pay stubs, filed away in boxes stuffed into attics and basements. I've always assumed that someday I'd need those records, but I never knew why.

Now I know why. The victims of Madoff's Ponzi scheme are being asked to provide records:
They are asked to provide their most recent account statements, and proof of wire transfers or canceled checks showing deposits "from as far back as you have documentation."   (Emphasis added)
The form also asks for all information regarding any withdrawals or payments received from Madoff.
Supplying such records could be nearly impossible for many longstanding clients, said Harry Susman, a partner at law firm Susman Godfrey LLP. "You've got people who were investing with Madoff for 20 years and didn't keep records," he said. 

I'll bet that the people with good records come out better than the ones with poor or no records.

In an age of electronic-everything, where is the 20 year record trail that a person needs in a case like this?

By the way - if anybody wants to track dollars, therms, kilowatt-hours or BTU per square foot per heating degree day for a couple houses in Minnesota over a 25 year period of time, I've got the data. I just need to get it all out of the attic and into a spreadsheet so I can play with it.
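
When I do, the normalization itself is one line of arithmetic. Here is a toy sketch; the therm, square-footage and heating-degree-day figures below are invented purely for illustration.

```python
BTU_PER_THERM = 100_000  # standard conversion: one therm is 100,000 BTU


def btu_per_sqft_per_hdd(therms: float, sqft: float, hdd: float) -> float:
    """Normalize heating energy use by house size and heating degree days."""
    return therms * BTU_PER_THERM / (sqft * hdd)


# Made-up January numbers: 180 therms, 1,800 sq ft, 1,500 heating degree days.
print(round(btu_per_sqft_per_hdd(180, 1800, 1500), 2))  # ~6.67 BTU/sqft/HDD
```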

Sunday, January 4, 2009

Hardware is Expensive, Programmers are Cheap

The case that Jeff Atwood attempts to make is basically that hardware is generally cheap enough that code optimization doesn't pay (or, don't bother optimizing code until after you've tried solving the problem with cheap hardware). I read and re-read the argument, and I'm convinced that in the general case it doesn't add up.

In my experience, the 'hardware is cheap, programmers are expensive' mantra really only applies to small, lightly used systems, where the fully loaded hardware cost actually is trivial compared to the cost of putting a programmer in a cube. Once the application is scaled to the point where adding a server or two no longer solves the scalability problem, or where database, middleware or virtualization licenses are involved, the cost of adding hardware is not trivial. The relative importance of software optimization versus more hardware then shifts toward smart, optimized software, and the cheap-hardware argument at Coding Horror quickly falls apart.

The comments at Coding Horror descended pretty quickly into the all-too-common 'if I had better monitoring I could write better code' nonsense, which of course misses the point of the post. Some of the commenters got it right, though:
“The initial cost of hardware (servers) is not the only cost, and - yes hardware is cheap, but is a drop in the proverbial bucket compared to the total cost of ownership” – JakeBrake
“Throwing more hardware at problems does not make them go away. It may delay them, but as an application scales either…you may get a combinatoric issue pop up outstripping you ability to add hardware…[or]…you just shift the problem to systems administration who have to pay to maintain the hardware. Someone pays either way.” – PJL
“Throwing hardware at a software problem has its place in smaller, locally hosted data facilities. When you're running in a hardened facility the leasing of space, power, etc. begins to hurt. One could argue the amount of time and labor necessary to design and implement a new server, along w/ the hardware costs, space, power -- and don't forget disk if you're running on a SAN (fibre channel disk isn't cheap!) -- can easily negate the time of a programmer to fix bad code.” – Jonathan Brown
The above comments correctly emphasize that the purchase price of a server is only a fraction of its cost. A fully loaded server cost must include space, power, cooling, a replacement server every 3-4 years, system management, security, hardware and software maintenance, and software licensing. And if the server needs a SAN attach, the Fibre Channel port costs can equal the server hardware cost. Some estimates (here and here) imply that the loaded power, space and cooling cost of a server roughly equals the cost of the server itself.
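
To put rough numbers on "fully loaded", here is a back-of-the-envelope sketch; every figure in it is invented for illustration and not taken from any real quote or from the estimates linked above.

```python
# All figures are hypothetical, per server, for illustration only.
purchase = 8_000                    # server hardware sticker price
power_cooling_space = 8_000         # roughly equal to the purchase price, per the estimates above
san_ports = 3_000                   # a pair of Fibre Channel ports, if SAN-attached
licenses_and_maintenance = 12_000   # OS, database, and hardware/software support over its life
admin_share = 10_000                # a slice of a sysadmin's time

loaded_cost = (purchase + power_cooling_space + san_ports
               + licenses_and_maintenance + admin_share)
print(loaded_cost)                  # 41_000: roughly five times the sticker price of the box
```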

Fortunately the hardware-is-cheap argument was promptly rebutted in a well-written post by David Berk:
“To add linear capacity improvements the organization starts to pay exponential costs. Compare this with exponential capacity improvements with linear programming optimization costs.”
In other words, time spent optimizing code pays back cost saving dividends continuously over the life of the application, with little or no additional ongoing costs. Money spent on hardware that only compensates for poorly written code costs money every day, and as the application grows, that cost rises exponentially.

That's basically where we're at with a couple of our applications. They are at the size/scale where doubling the hardware and the associated maintenance, power, cooling and database licenses will cost more than a small team of developers, and because of the inherent scalability limits in the design of these applications, the large capital outlay will at best result in minor capacity/scalability/performance improvements.

Adding to David Berk's response, I'd note that one should also consider greenhouse gases (a ton or two per server per year!) and database licensing costs (the list price for one CPU's worth of Oracle Enterprise plus Oracle RAC is close to a programmer's salary).

Another way of looking at it: well-written, properly optimized software pays for itself in hardware, datacenter, cooling and system-management costs across a broad range of scenarios, the exception being small, lightly used applications. For those, throw hardware at the problem and hope it goes away.