
On Least Bit Installations and Structured Application Deployment

I ran across this post by 'ADK'.

It's not done until it's deployed and working!

This seems so obvious, yet my experience hosting a couple of large ERP-like OLTP applications indicates that sometimes the obvious needs to be made obvious. Because obviously it isn't obvious.

A few years ago, when I inherited the app server environment for one of the ERP-like applications (from another workgroup), I decided to take a look around and see how things were installed, secured, managed and monitored. The punch list of things to fix got pretty long, pretty quickly. Other than the obvious 'everyone is root' and 'everything runs as root' type of problems, one big red flag was the deployment process. We had two production app servers. They were not identical. Not even close. The simplest things, like the operating system rev & patch level, the location of configuration files, file system rights and application installation locations, were different. And in probably the most amusing and dangerous application configuration I've ever seen, the critical config files on one of the app servers were in the samples directory.

So how do we fix a mess like that? We started over, twice (or three times).

Our first pass through was intended to fix obvious security problems and to make all servers identical. We deployed a standard vendor operating system installation on four identical servers (one QA, three production), deployed the necessary JBoss builds on each of the servers in a fairly standard install, and migrated the application to the new servers. This fixed some immediate and glaring problems and gave us the experience we needed to take on the next couple of steps.

The second pass through the environment was designed to get us closer to the 'least bit' installation best practice, clean up some ugly library and application dependencies, and develop a standardized, scripted deployment, guaranteed to be identical on each server and guaranteed to be identical to test/QA.

The first part, the 'least bit' installation, simply means that in the JBoss directory structure, if one were to remove a single bit from file system permissions, config files, or the application itself, the app would break. This ensures that file system permissions are optimal and that the application has no extra junk (sample configs, sample JDBC connections) lying around that can cause security or availability problems.
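Getting there means auditing the install tree on a regular basis. Something like the following sketch can walk the JBoss directory and flag permissions looser than the application needs, along with leftover sample and demo artifacts; the install path and the specific checks are illustrative assumptions, not our exact script.

```python
#!/usr/bin/env python
# Hypothetical 'least bit' audit: flag files with more permission bits than
# the app needs, and leftover sample/demo artifacts that should not exist
# in production at all. Paths and name patterns are assumptions.

import os
import stat

JBOSS_HOME = "/opt/jboss"                      # assumed install location
SUSPECT_NAMES = ("sample", "example", "demo")  # assumed leftover patterns

def audit(root):
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            # Group- or world-writable is more bits than the app needs.
            if mode & (stat.S_IWGRP | stat.S_IWOTH):
                findings.append(("writable", path))
            # Sample configs and demo apps are extra junk.
            if any(s in name.lower() for s in SUSPECT_NAMES):
                findings.append(("leftover", path))
    return findings

if __name__ == "__main__":
    for kind, path in audit(JBOSS_HOME):
        print("%-9s %s" % (kind, path))
```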

The application deployment process that we developed was very interesting. We decided that we wanted a completely version controlled deployment, one that included every file the application needs to function other than the vendor-provided operating system files. We checked the entire JBoss application, including the binaries, configs, WARs, JARs and whatever else, into Subversion (SVN). The deployment is essentially a process that checks out the entire application from SVN, drops it onto a production app server and removes the entire old application. The idea is that we not only know what is deployed on each server, and that all servers are identical, but also exactly what was deployed on every server at any arbitrary point in the past.
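A minimal sketch of that swap, assuming the whole JBoss tree (binaries, configs, WARs, JARs) lives in a deployment SVN repository; the repository URL, install path and archive location below are hypothetical:

```python
#!/usr/bin/env python
# Sketch of the SVN-driven swap: export the requested revision, set the old
# tree aside, and make the exported tree the live install. All paths and the
# repository layout are assumptions for illustration.

import os
import shutil
import subprocess
import time

REPO = "https://svn.example.com/deploy/jboss-app/trunk"   # hypothetical repo
LIVE = "/opt/jboss"                                       # assumed install path
ARCHIVE_DIR = "/var/archive/deploys"                      # assumed archive area

def deploy(revision):
    staging = "%s.r%s" % (LIVE, revision)
    # Export (not checkout) so no SVN metadata lands on the production box.
    subprocess.check_call(["svn", "export", "-r", str(revision), REPO, staging])

    if not os.path.isdir(ARCHIVE_DIR):
        os.makedirs(ARCHIVE_DIR)

    # Keep the old tree so we can answer "what was running on date X".
    stamp = time.strftime("%Y%m%d-%H%M%S")
    if os.path.exists(LIVE):
        shutil.move(LIVE, os.path.join(ARCHIVE_DIR, "jboss-%s" % stamp))

    # The exported tree becomes the live install.
    shutil.move(staging, LIVE)

if __name__ == "__main__":
    deploy(1652)
```

Exporting rather than checking out keeps SVN metadata off the production box, and the archived tree answers "what was running on that date" even without going back to the repository.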

The process to get the application ready to deploy is also version controlled and scripted. Our dev team builds a JBoss instance and their application binaries and configs using their normal dev tool kit. When they think that they have a deployable application, including the code they wrote, the JBoss application and any configuration files, they package up the application, check it into a special deployment SVN repository, and deploy the entire application to a QA/test environment.  They've got the ability to deploy to the test/QA servers any time they want, but they do not have the ability to modify the QA/test environment other than through the scripted, version controlled deployment process. If the app doesn't work, they re-configure, bug fix, check it back into the deployment SVN repository and re-deploy to QA/test.  Once the app is QA'd and tested, a deployment czar in the dev team modifies any config files that need to be different between QA and prod (like database connect strings) and commits the application back into the deployment SVN repository.
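That promotion step can be as small as substituting a handful of per-environment values into template config files before the final commit. The sketch below assumes a simple placeholder convention; the file names, placeholder syntax and connect strings are illustrative, not our actual configuration.

```python
#!/usr/bin/env python
# Hypothetical QA-to-prod promotion: the only things that differ between
# environments are a few values such as the database connect string, kept in
# one place and stamped into template files before committing to SVN.

PER_ENVIRONMENT = {
    "qa":   {"DB_URL": "jdbc:oracle:thin:@qa-db:1521:QA01"},    # assumed values
    "prod": {"DB_URL": "jdbc:oracle:thin:@prod-db:1521:PRD01"},
}

TEMPLATES = ["conf/oracle-ds.xml.in"]   # assumed template files with @KEY@ markers

def render(environment):
    values = PER_ENVIRONMENT[environment]
    for template in TEMPLATES:
        with open(template) as f:
            text = f.read()
        for key, value in values.items():
            text = text.replace("@%s@" % key, value)
        # Write the real config next to its template (strip the '.in' suffix);
        # the result is what gets committed back to the deployment repository.
        with open(template[:-3], "w") as f:
            f.write(text)

if __name__ == "__main__":
    render("prod")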

Then we get a note to 'deploy version 1652 to prod next Monday'. On Monday, we take each app server offline in turn, run scripts that archive the old installation, copy the entire new application to the production server, test it, and bring the app server back on line. Repeat once for each app server & we are done.
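In outline, the Monday run looks roughly like the sketch below; the host names, load balancer commands and helper scripts are hypothetical stand-ins for whatever the environment actually provides.

```python
#!/usr/bin/env python
# Sketch of the rolling deploy: one server at a time, archive what is there,
# put the new revision in place, smoke test, return it to service.
# Hosts, commands and script names are assumptions for illustration.

import subprocess

APP_SERVERS = ["app1", "app2", "app3"]   # assumed production hosts
REVISION = 1652

def run(host, command):
    # Drive everything over ssh so the deploy runs from one box.
    subprocess.check_call(["ssh", host, command])

def rolling_deploy(revision):
    for host in APP_SERVERS:
        run(host, "lb-drain %s" % host)                     # out of rotation (hypothetical)
        run(host, "/opt/deploy/archive-current.sh")         # keep the old tree
        run(host, "/opt/deploy/deploy-from-svn.sh %d" % revision)
        run(host, "/opt/deploy/smoke-test.sh")              # fail loudly before re-enabling
        run(host, "lb-enable %s" % host)                    # back into rotation (hypothetical)

if __name__ == "__main__":
    rolling_deploy(REVISION)
```

Because the smoke test runs before a server is re-enabled, a bad revision only ever takes one node out of rotation.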

We made a third pass through the app servers. This time we changed the OS version and the platform, implemented basic Solaris 'Zones' to give us a sort of pseudo-virtualization, and applied the 'least bit' principle to the operating system itself (reducing its disk footprint from 6GB down to 1GB). We also, as a result of the systematic deployment process, have the ability to light up new app servers fairly easily. A Solaris WAN boot from a standard image, a bit of one-time tweaking of the OS (hostname, IP address) and a deploy of the JBoss app from SVN gets us pretty close to a production app server.
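That one-time tweaking plus deploy can itself be scripted. The sketch below follows Solaris 10 conventions for the node name and interface files, but the interface name, addresses and the deploy wrapper are assumptions about this environment rather than our actual provisioning script.

```python
#!/usr/bin/env python
# Rough sketch of "lighting up" a new app server after the WAN boot: stamp the
# host identity onto the image, then run the same SVN-driven deploy used on
# every other server. Interface name, addresses and the wrapper script are
# hypothetical; run as root on the freshly booted host.

import subprocess

def personalize(hostname, ip_address, interface="bge0"):
    # Solaris 10 keeps the node name and per-interface identity in flat files.
    with open("/etc/nodename", "w") as f:
        f.write(hostname + "\n")
    with open("/etc/hostname.%s" % interface, "w") as f:
        f.write(hostname + "\n")
    with open("/etc/inet/hosts", "a") as f:
        f.write("%s\t%s\n" % (ip_address, hostname))

if __name__ == "__main__":
    personalize("app4", "10.1.2.34")
    # Hypothetical wrapper around the standard SVN deploy described above.
    subprocess.check_call(["/opt/deploy/deploy-from-svn.sh", "1652"])
```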

We have a ways to go yet. Some parts of the deploy process are clunky, our testing methodology is pretty painful and time consuming, and we are not fully 'containerized' into Solaris zones. The dev team wants at least one more pre-production environment, and we need to make the deployment process less time consuming.

We aren't done yet.

2008-03-08
