It's not done until it's deployed and working!
This seems so obvious, yet my experience hosting a couple of large ERP-like OLTP applications indicates that sometimes the obvious needs to be made obvious. Because obviously it isn't obvious.
A few years ago, when I inherited the app server environment for one of the ERP-like applications (from another workgroup), I decided to take a look around and see how things were installed, secured, managed and monitored. The punch list of things to fix got pretty long, pretty quickly. Beyond the obvious 'everyone is root' and 'everything runs as root' problems, one big red flag was the deployment process. We had two production app servers. They were not identical. Not even close. The simplest things, like the operating system rev & patch level, the location of configuration files, file system rights and application installation locations, were different, and in probably the most amusing and dangerous application configuration I've ever seen, the critical config files on one of the app servers were in the samples directory.
So how do we fix a mess like that? We started over, twice (or three times).
Our first pass through was intended to fix obvious security problems and to make all servers identical. We deployed a standard vendor operating system installation on four identical servers (one QA, three production), deployed the necessary JBoss builds on each server in a fairly standard install, and migrated the application to the new servers. This fixed some immediate and glaring problems and gave us the experience we needed to take on the next couple of steps.
The second pass through the environment was designed to get us closer to the 'least bit' installation best practice, clean up some ugly library and application dependencies, and develop a standardized, scripted deployment, guaranteed to be identical on each server, and guaranteed to be identical to test/QA.
The first part, the 'least bit' installation, simply means that in the JBoss directory structure, if one were to remove a single bit from file system permissions, config files, or the application itself, the app would break. This ensures that file system permissions are optimal and that the application has no extra junk (sample configs, sample JDBC connections) lying around that can cause security or availability problems.
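To make that concrete, here's a rough sketch of the kind of tightening involved. The paths, the 'jboss' service account and the exact modes are illustrative, not our actual layout:

    # Start from nothing: a dedicated non-root owner, no write access
    # anywhere, no world access at all (paths and account are examples).
    chown -R jboss:jboss /opt/jboss
    find /opt/jboss -type d -exec chmod 550 {} \;
    find /opt/jboss -type f -exec chmod 440 {} \;
    # Remove the sample junk entirely rather than just locking it down.
    rm -rf /opt/jboss/docs/examples
    # Then add back only the bits the app actually needs to run:
    chmod 550 /opt/jboss/bin/run.sh
    chmod 750 /opt/jboss/server/default/log /opt/jboss/server/default/tmp

If taking away any one of those added-back bits doesn't break the app, the install isn't 'least bit' yet.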
The application deployment process that we developed was very interesting. We decided that we wanted a completely version controlled deployment that included every file the application needs to function, other than the vendor-provided operating system files. We checked the entire JBoss application, including the binaries, configs, wars, jars & whatever, into Subversion (SVN). The deployment is essentially a process that checks out the entire application from SVN, drops it onto a production app server and removes the entire old application. The idea is that we not only know what is deployed on each server and that all servers are identical, we also know exactly what was deployed on every server at any arbitrary point in the past.
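A minimal sketch of that deploy step, assuming an illustrative repository URL and install path (not our exact scripts):

    #!/bin/sh
    # Deploy one specific, known revision of the whole application tree.
    REV=$1
    svn export -r "$REV" https://svn.example.com/deploy/jboss /opt/jboss-new || exit 1
    # Swap the new tree in; keep the old one around for rollback.
    mv /opt/jboss /opt/jboss-prev
    mv /opt/jboss-new /opt/jboss

Using svn export rather than a checkout keeps the .svn working-copy metadata off the production file system; either way, the revision number alone tells you exactly what is on the box.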
The process to get the application ready to deploy is also version controlled and scripted. Our dev team builds a JBoss instance and their application binaries and configs using their normal dev tool kit. When they think they have a deployable application, including the code they wrote, the JBoss application and any configuration files, they package up the application, check it into a special deployment SVN repository, and deploy the entire application to a QA/test environment. They can deploy to the test/QA servers any time they want, but they cannot modify the QA/test environment other than through the scripted, version controlled deployment process. If the app doesn't work, they reconfigure, fix bugs, check it back into the deployment SVN repository and re-deploy to QA/test. Once the app is QA'd and tested, a deployment czar on the dev team modifies any config files that need to differ between QA and prod (like database connect strings) and commits the application back into the deployment SVN repository.
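The dev-side handoff can be as simple as committing the assembled tree and noting the resulting revision number; the staging path and log message here are made up:

    #!/bin/sh
    # Commit the assembled JBoss tree (binaries, configs, wars, jars)
    # into the deployment repository. The revision number this produces
    # is what gets named in the deploy request.
    cd /build/jboss-staging || exit 1
    svn add --force .              # pick up files not yet under version control
    svn commit -m "QA candidate build"
    svn info | grep Revision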
Then we get a note to 'deploy version 1652 to prod next Monday'. On Monday, we take each app server offline in turn, run scripts that archive the old installation, copy the entire new application to the production server, test it, and bring the app server back online. Repeat once for each app server & we are done.
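Per server, the Monday script amounts to something like this; the load balancer commands are placeholders for whatever takes a node in and out of the pool, and deploy-app.sh stands in for the export-and-swap script sketched earlier:

    #!/bin/sh
    # Roll one app server: drain, stop, archive, deploy, test, restore.
    REV=$1
    lb-drain `hostname`                  # placeholder: leave the pool
    /opt/jboss/bin/shutdown.sh -S        # stop JBoss
    tar cf /archive/jboss-`date +%Y%m%d`.tar /opt/jboss
    deploy-app.sh "$REV"                 # export from SVN and swap in
    /opt/jboss/bin/run.sh -c default &   # restart JBoss
    # ...smoke tests here, then rejoin the pool:
    lb-enroll `hostname`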
We made a third pass through the app servers. This time we changed the OS version and the hardware platform, implemented basic Solaris 'Zones' to give us a sort of pseudo-virtualization, and applied the 'least bit' principle to the operating system itself, reducing its disk footprint from 6GB down to 1GB. As a result of the systematic deployment process, we also have the ability to light up new app servers fairly easily. A Solaris WAN boot from a standard image, a bit of one-time tweaking of the OS (hostname, IP address) and a deploy of the JBoss app from SVN gets us pretty close to a production app server.
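Standing up one of those zones is roughly this, with the zone name, network settings and paths all illustrative:

    # Create, install and boot a minimal zone for one JBoss instance.
    zonecfg -z appzone1 <<EOF
    create
    set zonepath=/zones/appzone1
    set autoboot=true
    add net
    set address=10.1.1.21
    set physical=bge0
    end
    commit
    EOF
    zoneadm -z appzone1 install
    zoneadm -z appzone1 boot
    # Then deploy the JBoss app from SVN exactly as on any other server.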
We have a ways to go yet. Some parts of the deploy process are clunky, our testing methodology is painful and time consuming, and we are not fully 'containerized' into Solaris zones. The dev team wants at least one more pre-production environment, and we need to make the deployment process less time consuming.
We aren't done yet.
2008-03-08