Last In - First Out: Thirty-four years in IT - Swimming with the Itanic (Part 8)

For historical reasons, we were a strong VMS shop. Before they imploded, Digital Equipment treated EDU's very kindly, offering extremely good pricing on software in exchange for hardware adoption. In essence, a college could get an unlimited right to use a whole suite of Digital Equipment software for a nominal annual fee, and Digital had a very complete software catalog. So starting in the early 1990's, our internally developed student records system (ERP) ended up on the VMS/VAX/RDB stack.

Digital imploded and got bought by Compaq, who got bought by HP. Somewhere along the line the RDB database line ended up at Oracle.

For most of our time on VMS & RDB we suffered from severe performance problems. Our failure in addressing the problems was two-fold - the infrastructure team didn't have good performance data to feed back to the developers, and the development team considered performance to be an infrastructure/hardware problem. This resulted in a series of frantic and extremely expensive scrambles to upgrade VAX/Alpha server hardware. It did not, however, result in any significant effort to improve the application design.

Between 1993 and 2005, we cycled through each of:

Standalone VAX 4000's
Clustered AlphaServer 4100's
Standalone AlphaServer GS140's
Standalone AlphaServer GS160's

And of course mid-life upgrades to each platform.

Each upgrade cost $millions in hardware, and each upgrade only solved performance problems for a brief period of time. The GS160's lasted the longest and performed the best, but at an extremely high cost. At no point in time did we drill deeply into application architecture and determine where the performance problems originated.

During that time frame we got advice from Gartner that suggested that moving from VMS to Unix was desirable, but moving from RDB to Oracle was critical, as they did not expect Oracle to live up to their support commitments for the RDB database product. So in 2009 we moved from 35 individual RDB databases spread across four GS160's, to one Oracle 10g database on a Sun Microsystems E25k, in a single, extremely well implemented weekend-long database migration, platform migration, and 35:1 database merger. Kudos to the development team for pulling that off.

Unfortunately we carried forward large parts of the poor application design and transferred the performance problems from RDB to Oracle. At that time, though, the DBA's were part of my team. I had a very good Oracle DBA and Unix sysadmin, both of whom were able to dig into performance problems and communicate back to developers. We were pretty good at detailing the performance problems and offering remedies and suggested design changes.

Though performance slowly got better, the full impact of poor application design was yet to be felt.

As soon as the databases were combined and hosted on SPARC hardware, continuing with the GS160's made no sense. They were costing $600k/yr in hardware and software maintenance, were now significantly oversized, and were still running the dead-end OpenVMS operating system. This put us in a tough spot. The development team was focused on minimizing their commitment to any re-platforming and was only interested in a move from AlphaServer to Itanium. For me, Itanium (or Itanic, as I called it at the time) was a dead end, and our only move should be to Unix (Solaris). But because the cost to migrate to Itanic was much lower - the application would only have to be recompiled, not re-platformed - the Itanic advocates won the argument. We ended up purchasing Itanium blade servers at a 3-year cost roughly equal to 18 months of support on the GS160's.

By that time HP's support for OpenVMS had eroded badly. Support for Oracle clients, Java, and other commonly used software was poor or non-existent. That OpenVMS was dead was visible to all but the few for whom OpenVMS was a religious experience.

As we were bashing the decision around in 2009, I strongly suggested that if we purchased Itanium we'd be on the dead-end OpenVMS platform for five more years. I was wrong. We were on Itanium blade servers and OpenVMS for nine years, until 2018. The (only) good part of that decision was that the Itanium blade servers ran very well and were inexpensive to maintain. And as OpenVMS was pretty dead by then, we did not spend very much time on patches and upgrades, as few were forthcoming from HP.

This is a case where our reluctance to take on some short-term pain resulted in our having to maintain a dead-end obsolete system for many years.

Part 7 - Addressing Application
Part 9 - The Application that Almost Broke Me