Thursday, January 29, 2009

Hardware is Expensive, Programmers are Cheap II

In response to “Hardware is Cheap, Programmers are Expensive” at Coding Horror:

The million dollar question: What’s wrong with this picture?

[Chart: one day of database server CPU utilization]

20% CPU utilization, that’s what’s wrong. It’s way too low.

The hardware that’s running at 20% on a busy day is 32 cores of IBM’s finest x3950 series servers and a bunch of terabytes of IBM’s DS4800 storage. The application has three of them (active/passive and remote DR) at a total cost of about $1.5 m. That’s right, $1.5 million in database hardware running less than 20% CPU utilization on a normal day and barely 30% CPU on the busiest day of the year.

How did that happen?

Because a software vendor, run by programmers, thought that they were too expensive to design an efficient and optimized application. Instead they spent their precious and valuable time adding shiny new features. So the customer had no choice but to buy hardware. Lots of it. Then – after the hardware was bought, the software vendor figured out that they actually could write an application that was efficient and optimized, and that their customers couldn’t buy enough hardware to compensate for their poor programming.

Too late though. The hardware was already bought.

The app in question was delivered with a whole series of performance-limiting design and coding flaws. The worst of them:

  • No session caching combined with a bug that forced two database updates to the same session state table for each session state change (several hundred updates/second and a really, really nasty page latch issue)
  • Broken connection pooling caused by poor application design, forcing app servers to log in & out of the database server several hundred times per second.
  • Session variables not cached, forcing database round trips for user information like language, home page customizations, background colors, etc., once per component per web page. Thousands per second.
  • Failure to properly parameterize SQL calls, forcing hundreds of recompilations per second of the same damned friggin’ query and, of course, filling up the procedure cache with a nearly infinite number of query/parameter combinations.
  • Poorly designed on-screen widgets & components, some of which used up 30% of 32 database cores all by themselves.
  • A design that prevents anything resembling horizontally scaled databases.
  • (the whole list wouldn’t fit in a blog post, so I’ll quit here….)
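
Why does parameterization matter so much for the procedure cache? Here is a toy model, assuming the server compiles and caches one plan per distinct SQL text. That is loosely how SQL Server treats ad hoc statements, though the real engine has more nuance. The sketch runs the same lookup both ways against an in-memory SQLite database and counts how many distinct statements the model cache would have to compile. The `session_state` table and the `PlanCacheModel` class are hypothetical, made up for illustration.

```python
import sqlite3


class PlanCacheModel:
    """Toy plan cache: one 'compiled' plan per distinct SQL text.

    Assumption: this only loosely mirrors how SQL Server caches ad hoc
    statements; it is not how any real engine is implemented.
    """

    def __init__(self) -> None:
        self.compiles = 0
        self.plans: set[str] = set()

    def run(self, conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
        if sql not in self.plans:  # new text -> "compile" it and cache the plan
            self.plans.add(sql)
            self.compiles += 1
        return conn.execute(sql, params).fetchall()


def demo(user_ids: range) -> tuple[int, int]:
    """Return (compiles with inlined values, compiles with parameters)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE session_state (user_id INTEGER, lang TEXT)")
    conn.executemany(
        "INSERT INTO session_state VALUES (?, ?)",
        [(uid, "en") for uid in user_ids],
    )

    inlined, parameterized = PlanCacheModel(), PlanCacheModel()
    for uid in user_ids:
        # Anti-pattern: the value is pasted into the SQL text, so every
        # user id yields a brand-new statement for the cache to compile.
        inlined.run(conn, f"SELECT lang FROM session_state WHERE user_id = {uid}")
        # Fix: one statement text; the value travels separately as a parameter.
        parameterized.run(
            conn, "SELECT lang FROM session_state WHERE user_id = ?", (uid,)
        )
    conn.close()
    return inlined.compiles, parameterized.compiles


if __name__ == "__main__":
    print(demo(range(1000)))  # (1000, 1)
```

Besides the cache churn, pasting values into the SQL text is also the classic SQL injection hole, which is one more reason to pass them as parameters.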

After suffering nasty performance and system outages, and after spending tens of thousands of dollars on consulting and tens of thousands on Tier 3 Microsoft support, and after discovering the above flaws and reporting them to the software vendor, the customer was advised to buy more hardware. Lots of it.

The database server growth went something like this:

  • 8 CPU, 100% busy. 100% growth per year. 18 month life
  • 16 CPU, 80% busy. 50% growth per year. 6 month life
  • 32 Core (16 dual cores), 50% busy. 30% growth per year.
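
The “life” figures follow from compound growth. If load grows by a factor of (1 + g) per year and a box starts at utilization u, it hits 100% after ln(1/u) / ln(1 + g) years. A minimal sketch, assuming growth compounds smoothly and CPU demand scales linearly with load (both simplifications); `runway_months` is a hypothetical helper, not anything from the original post:

```python
import math


def runway_months(utilization: float, annual_growth: float, ceiling: float = 1.0) -> float:
    """Months until a server reaches `ceiling` utilization.

    Assumes load compounds at `annual_growth` per year (0.5 means +50%/yr)
    and CPU demand scales linearly with load. Both are simplifications.
    """
    if not 0 < utilization <= ceiling:
        raise ValueError("utilization must be in (0, ceiling]")
    if annual_growth <= 0:
        return math.inf  # flat or shrinking load never saturates
    years = math.log(ceiling / utilization) / math.log(1 + annual_growth)
    return 12 * years


if __name__ == "__main__":
    # Rows from the growth list that still had headroom.
    print(f"16 CPU, 80% busy, 50%/yr: {runway_months(0.80, 0.50):.1f} months")
    print(f"32 cores, 50% busy, 30%/yr: {runway_months(0.50, 0.30):.1f} months")
```

Under those assumptions the 16-CPU row works out to about 6.6 months, in line with the six-month life above, and the 32-core box at 30% growth would have roughly two and a half years of headroom. The 8-CPU row is already at 100%, so by this measure it has no runway left.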

Throw in Microsoft database licenses, Windows Datacenter Edition software and support, and IBM storage (because HP wouldn’t support Datacenter Edition on an EVA), and it’s not hard to see seven figures getting thrown at the production database cluster and failover servers. Oh, and we shouldn't forget the expenses for additional power, cooling, and floor space in the datacenter.

Fast forward a few years, a handful of application code upgrades, and a million and a half hardware dollars later.

  • Beautifully designed session and user variable caching, intelligent enough to only cache what it needs and only use the database when it has to.
  • Fully optimized widgets.
  • Minimal SQL recompilations.
  • An optimized data model.
  • An efficient, well running application.
  • A pleasure to host.
  • And 30% peak CPU.
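
The post doesn’t describe how the vendor’s caching actually works, so here is only a minimal read-through cache for per-user settings, assuming a single process and settings that change rarely. `SessionVarCache`, `fake_load`, and `simulate` are hypothetical names. The simulation counts database round trips for one user with and without the cache.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionVarCache:
    """Read-through cache for per-user settings (language, home page, colors).

    `load` stands in for the real database query; it is only called on a miss.
    """
    load: Callable[[int], dict]
    db_round_trips: int = 0
    _entries: dict = field(default_factory=dict)

    def get(self, user_id: int) -> dict:
        if user_id not in self._entries:
            self.db_round_trips += 1
            self._entries[user_id] = self.load(user_id)
        return self._entries[user_id]

    def invalidate(self, user_id: int) -> None:
        # Call after the user changes a setting so the next read refetches it.
        self._entries.pop(user_id, None)


def fake_load(user_id: int) -> dict:
    # Hypothetical stand-in for a SELECT against a user-settings table.
    return {"language": "en", "background": "white"}


def simulate(pages: int, components_per_page: int, use_cache: bool) -> int:
    """Return database round trips for one user browsing `pages` pages."""
    cache = SessionVarCache(fake_load)
    trips = 0
    for _ in range(pages):
        for _ in range(components_per_page):
            if use_cache:
                cache.get(42)
            else:
                fake_load(42)  # every component queries the database itself
                trips += 1
    return cache.db_round_trips if use_cache else trips


if __name__ == "__main__":
    print(simulate(10, 20, use_cache=False))  # 200
    print(simulate(10, 20, use_cache=True))   # 1
```

Across a farm of app servers you would also need a shared cache or some way to broadcast invalidations, which is where most of the real design effort tends to go.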

Had the app been as efficient three years ago as it is today, I'm estimating that about half of what was spent on hardware and related licensing and support costs would not have been necessary. They would not have had to buy Datacenter Edition when they did, if at all. Existing EVAs would have been supported, eliminating the need to buy IBM storage. Overall support and licensing costs would have been much lower, but more importantly, they would have been on the downhill side of Moore's law instead of climbing uphill against it. Realistically, they still would have bought hardware, but they'd have bought it later and gotten faster bits for fewer dollars.

If each of the optimizations and bug fixes that the software vendor applied as part of the last 4 years of software upgrades had been available only six months earlier, the customer still would have saved a pile of money. That six-month acceleration probably would have been enough to let them wait for dual-core processors to come out instead of buying single-cores and then upgrading them to dual-cores six months later, and the dual-cores would still have lasted until quad-cores came out. That would have allowed the customer to stick with eight-socket boxes and save a programmer’s salary in licensing and operating system costs.

What’s the worst part of all this?

There are lots of ‘worst’ parts to this.

  • More than one customer had to burn seven figures compensating for the poor application. There is at least one more customer at the same scale that made the same hardware decisions.

And:

  • The customer detected and advised the vendor of potential solutions to most of the above problems. The vendor’s development staff insisted that there were no significant design issues.

And:

  • The vendor really didn’t give a rat’s ass about efficiency until they started hosting large customers themselves. When they figured out that hardware was expensive, their programmers suddenly were cheap enough to waste on optimization.

And:

  • The dollars burned are not the customers’. They are yours. Taxes & tuition paid for it all.

Hardware is expensive, Programmers are cheap.

In this case, a couple of customers burned something like 10 programmer-years worth of salary on unnecessary hardware, when the cost to optimize software was clearly an order of magnitude lower than the cost to compensate with hardware.

To be fair though, I’ll post another example of a case where hardware was cheaper than programmers.


Related:

Hardware is Expensive, Programmers are Cheap

Hardware is Cheap, Programmers are Expensive

12 comments:

  1. There is a co-location customer of my old employers who have by far the most over-specced hardware in the suite, possibly even the most powerful. Okay their website is fairly popular and they have a fair amount of data to process but nothing justifying that hardware.
    We had one customer whose infrastructure handled similar amounts of data and 50-100 times more traffic with significantly lower-spec machines.

    They had 6 load-balanced web front ends, boxes that essentially just handled IIS & ASP. All had 8 cores (4x dual-core Xeon) and 8GB of RAM. Just the web front ends. Their DB cluster had similar spec machines that weren't pushed as hard as those front ends. That is serious alarm-bell territory, surely?

    Being co-located we just did occasional reboots for them, but eventually one of our guys managed to have a technical discussion with one of their frustrated sysadmins. The code for the site is outsourced.

    Instead of fixing the bugs, the programmers worked out ways to hide the numerous errors from customers and management.

    They queried the database server on a SELECT * FROM table basis, pulling tens of thousands of records across and then picking out the odd field or two they actually wanted, instead of writing proper SQL queries.

    They didn't close down connections properly, used global variables where local ones would be better, and wrote code with so many memory leaks it made Niagara Falls look like a dripping tap, all of which meant their front-end servers had to be rebooted regularly just to stay alive. They assured the company's management that "that's standard practice". I really wanted to introduce the management to our heavily used servers that had 5-year uptimes.

    In typical fashion, of course, the points raised by sysadmins were ignored by management. "They're a respected programming company, they know what they're doing."

  2. Garp -

    Unfortunately, stories like that are far too common. In our case, we were chewing up 9 dual-CPU app servers & recycling app pools every few minutes on the front end, and 16 CPUs at 75 or 80% on the database.

    I might use your comments in a future post - with attribution - if you don't mind. It's a good war story.

  3. Feel free to use it if you desire. Their solution was long touted among staff as an example of "How not to do it".

  4. I work with a lot of "Web 2.0" sites. Mostly running LAMP stacks. I see poor code daily. And given my SQL query and PHP skills are minimal, it must be bad for me to spot it in 2 seconds.

    At least with the small businesses I work with, they find it easier to add hardware than to fix the code. I think this is even more problematic when development is outsourced or handled by a small group of insiders.

    We've had clients leasing thousands of dollars' worth of hardware to power a start-up site when a couple of mid-range servers would do the job.

    I am not sure how to avoid this problem. Posts like this one may be a good start.

  5. You don't understand the meaning of "Programmers are expensive, hardware is cheap."

    The point is not that it is always cheaper just to buy more hardware. The point is that it is only really economical to optimize the low-hanging fruit; writing your application in assembly or something won't work.

    This is just an example of incompetence.

  6. @Benjamin

    There's a level of hardware, which Garp is talking about here, where Programmers ARE cheaper, in some cases by orders of magnitude. (I'm actually picturing an old employer of mine that had several fully loaded IBM System Z8 cabinets, starting at over $2 million each at the time; ours came in substantially higher than that, and we had 5 of them.)

    We had a development team of 22 developers; the highest paid made $110k and the lowest $40k, with a decent spread in between. So, worst case, 22 * $110,000 = $2,420,000: a year's wages for all the developers came in at less than ONE of those cabinets, meaning that in the managers' eyes it cost them more to run those servers for a year than to run the developers, by far in our case.

    Forget that hardware is a one time lump cost with a small maintenance cost, managers don't think that way.

    Thankfully they saw it that way, which was good for us: we got as much overtime as we wanted (yeah, overtime; no one was salaried and no one was treated as exempt, and I love HR departments sometimes), so we worked a LOT of hours sometimes.

    But there's a point where stars align, and Programmers ARE cheaper than hardware, even today.



    @Garp, great post, really brings back memories.
    Still not sure if it's in a good way or not.

  7. @Munky - I think you see my point.

    In this case, the app didn't scale horizontally, as the only implementation supported by the application vendor was a single SQL 2000 database. So the customer had no choice but to add CPUs and memory as the customer base grew.

    During that time (2004->2006), the options for servers with more than 4 CPUs were limited and expensive. After burning up an 8-socket, 32GB HP DL740, the choices for vertical scalability were pretty slim.

    One of the few choices was the IBM x460 with 16 sockets, which forced them onto the Datacenter Edition of Windows. That drove up the costs of the system exponentially, not linearly. The customer base grew, but the app didn't get fixed, so the 16-way got burned up and was upgraded to a 32-way (16 dual cores).

    Had the app been horizontally scalable, the costs would have grown linearly, not exponentially, and then perhaps the hardware would have been cheap compared to fixing the app.

  8. People optimize for the bills *they* pay, not globally. Perhaps the expression should be "Other people's hardware is cheap, my programmers are expensive."

  9. First off, most of that 1.5 million was software and support costs, so even out of that cost, manpower was the main issue and HW was cheap. Second, suggesting that rolling out a system using MS servers and support is expensive is missing the point. You're still paying for programmers' time, just not your programmers' time. You are paying for some fraction of all that Microsoft R&D and calling it HW costs, which is just stupid.

  10. On the contrary, it sounds like those programmers were very expensive.

    Lesson learned: paying for better programmers (or, more likely, better budgeting) means less wasted money down the road.

    But didn't we already know that fixing problems earlier is less expensive than fixing them later?

  11. Great post and good point.

    It's funny to read some of the negative responses from "programmers" here. It's funny because if you had phrased the point the opposite way, they'd all agree with you! If you said "Programmers can be so efficient (with that better algorithm) that they can save a lot of money in hardware," they'd all be clapping.

  12. Scott - Certainly true. But in this case, I'm sure that the vendor was capable of doing much better (they did eventually) but chose not to. They needed features to get new customers, so they tacked on half written junk and in the process, sacrificed optimization and performance.

    Anonymous(1) - No, most of the cost was plain old hardware. In 2005, a 16 CPU 128GB server was expensive ($225k or so), and three were needed (active, passive and hot site). And the upgrade to 16 dual cores in 2006 was quite a big bill to swallow also.

    Anonymous(2) - Yep - Between Reddit, Ycombinator and Dzone, the comments were pretty snarky. (Reddit especially).

    I feel like a little kid who poked a stick into a hornets' nest. :)
