Red power, Blue power

While building out a new data center, we took a look at what our practice had been for provisioning power for servers and devices within a rack. As pretty much everyone does, we bring two separate 220 V circuits into each rack. One circuit terminates on some kind of power distribution unit (PDU) on one side of the back of the rack; the other circuit terminates on a different PDU on the other side of the rack.

When we provision a new rack and install equipment into it, we carefully route and label all the power cables and make sure that each power supply on a dual powered server or device is connected to a different PDU. We also make sure that if we have an HA pair of single power supply devices, one device is connected to each PDU. We typically connect the left power supply to the left PDU and the right power supply to the right PDU. In most cases we test the power by turning on all the servers in the rack and intentionally failing each PDU, one at a time. In theory, if either circuit fails, any dual power supply devices will alert us to a power supply failure, the load will transfer to the other PDU (which will not be overloaded, right?) and the application will not notice. The HA pair of single powered devices will fail over, and fail back once power is restored.
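The rules above can also be written down as a small check against a rack inventory. The following is only a sketch under stated assumptions, not something we actually run: the `Device` record, the PDU names "red" and "blue", the per-device wattage figure, and both function names are hypothetical, and it assumes a rack with exactly two PDUs where a surviving device draws its whole load from the remaining PDU.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical inventory record for one device in a rack."""
    name: str
    feeds: List[str]                  # PDU each power supply is plugged into
    watts: float                      # assumed total draw of the device
    ha_partner: Optional[str] = None  # name of the other half of an HA pair


def redundancy_problems(devices):
    """Return a list of human-readable wiring problems."""
    by_name = {d.name: d for d in devices}
    problems = []
    seen_pairs = set()
    for d in devices:
        # Dual powered device: its supplies must land on different PDUs.
        if len(d.feeds) >= 2 and len(set(d.feeds)) < 2:
            problems.append(f"{d.name}: all power supplies on PDU {d.feeds[0]}")
        # HA pair of single powered devices: one device per PDU.
        partner = by_name.get(d.ha_partner) if d.ha_partner else None
        if partner and len(d.feeds) == 1 and len(partner.feeds) == 1:
            pair = frozenset((d.name, partner.name))
            if pair not in seen_pairs and d.feeds[0] == partner.feeds[0]:
                first, second = sorted(pair)
                problems.append(f"{first}/{second}: HA pair shares PDU {d.feeds[0]}")
            seen_pairs.add(pair)
    return problems


def load_after_failure(devices, failed_pdu):
    """Watts the surviving PDU carries if failed_pdu goes dark.

    A device stays up if at least one of its supplies is on another PDU,
    and is assumed to pull its full draw from that PDU.
    """
    return sum(d.watts for d in devices
               if any(feed != failed_pdu for feed in d.feeds))
```

Comparing `load_after_failure(rack, "red")` against the rated capacity of the blue circuit answers the "will not be overloaded, right?" question on paper, before anyone pulls a breaker. It only tells you what the inventory says, though, not what is actually plugged in.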

Everything is good, life goes on.

Until a few years go by, a few servers get re-racked, and a few vendor techs swap a few server parts. Eventually someone will plug something in wrong, and during the next circuit failure you'll have unexpected downtime. (Don't ask me how I know.) The question then is: how do you know that you still have proper power redundancy after the rack has been tweaked for a few years?

Red power, Blue power

We bought a couple of rolls of colored electrical tape from the local big box home store and wrapped a band around each end of every power cord, using one color for the cords connected to one PDU and the other color for the cords connected to the other PDU. (For us, red on the right side, blue on the left.) Now a quick glance at the back of a rack after intrusive maintenance will tell us if we have properly attached dual power supply devices, and more importantly, will tell us that our redundant pairs of single power supply devices are each connected to separate power. Mismatched colors stick out like a sore thumb.

I hate it when I sound like Martha Stewart.