Monday, March 17, 2008

Introducing a new technology to an enterprise (ZFS)

The introduction of something as critical as a new file system is an interesting exercise in introducing and managing new technology. Like most small- and medium-sized shops, we have limitations on our ability to experiment, test and QA new technology. Our engineering and operations staff together is a small handful of people per technology. Dedicated test labs barely exist, and all of our people have daily operational and on-call roles with no formal 'play' time. Spending large blocks of time on things that are too far ahead of where we are today isn't feasible. Yet the pace of technology introduction dictates that we do not slide too far behind the curve on things that are critical to our enterprise.

So how do you go about introducing something this critical to an enterprise under that sort of constraint? We try to find a mix of caution, mitigated risk-taking and methodical deployment. Our resources do not permit dedicated test staff or formal test plans, so we compensate and reduce risk with methodical, measured deployment.

Introducing ZFS:

I'm assuming that most would agree that file systems are probably the most critical technology that IT professionals manage. Networks tend to tolerate the occasional lost packet or scrambled bit; the IP protocol stack was designed with enough resiliency built in to recover from all sorts of errors. A file system isn't quite like that. Failure or corruption of a critical file system is an event that you'll not want to have happen too often in your career. New file systems don't come along very often, and because we tend to be rather risk averse about things like file systems, they tend to be difficult to introduce into an enterprise.

ZFS promises to be significant, but because it is radically different from previous Sun file systems, we have to assume that it will have bugs & need time to get sorted out, and that we will need time to build our skill set and operational proficiency around the new technology.

The process that we use to introduce the technology will be critical to future availability and performance of our systems.

Here's the path we took:
  • Sanity check
  • Test/lab environment
  • Limited deploy, non critical, non-customer, low I/O load
  • Limited deploy, non critical, non-customer, high I/O load
  • Limited deploy, critical, customer, low I/O load
  • Limited deploy, critical, customer, high I/O load
  • General deployment

Sanity Check:


We started out with a simple sanity check on the technology.

  • Does it offer significant advantages over current technology?
  • Does it appear to solve an identified operational or security problem?
  • Could a rough cost/benefit be calculated based on an initial review of the technology?
  • Is this the strategic direction for the vendor, and are we aligned with that vendor's strategic direction?
  • Are the vendor's claims reasonable and verifiable?
  • Will we be able to manage the technology?
  • Will we be able to replace or deprecate some other technology, or is it a duplicate of existing technology?

A pass through the sanity check indicated that we ought to at least spend a few spare cycles looking at ZFS. We spend significant effort managing UFS file systems under traditional logical volume managers, and we have pain points around dynamically adding & removing disk space for databases and applications. Our current LVM model looks very much like pooled storage, but with the overhead of having to manually manage extents within logical volumes. The promise of pool-based storage, similar to our EVAs but at the operating system layer, looked interesting. Sun claimed commitment to ZFS, and we have a significant investment in Sun technology.
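
For what it's worth, the difference in management models shows up right at the command line. The sketch below is illustrative only; the device, metadevice, pool and file system names are made up and not from our environment.

    # UFS on a traditional volume manager: build the metadevice, newfs it,
    # mount it, and grow or shuffle it by hand when an application needs space.
    metainit d10 1 1 c1t2d0s0
    newfs /dev/md/rdsk/d10
    mount /dev/md/dsk/d10 /u01

    # The pooled ZFS model: one pool, file systems created on demand,
    # free space shared from the pool instead of pre-allocated extents.
    zpool create dbpool c1t4d0 c1t5d0
    zfs create dbpool/u01
    zfs set quota=100g dbpool/u01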

So we started to 'play' around with ZFS on a low-priority, off-hours basis, to determine if the excitement surrounding the technology was justified and, more importantly, whether the technology would fit with, and add value to, our enterprise hosting service. All the exercises outlined here were informal & ad hoc.

Lab #1:

Our initial exposure to ZFS was a simple series of informal tests on test servers running early access or developer preview ZFS code. We built pools and file systems, first backed by ordinary files on UFS file systems, then by real disk slices. The initial tests were mostly just replicating the simple examples that Sun engineers and others posted about in their blogs. We built & destroyed the pools and file systems, intentionally failed disks and luns, snapped & cloned, and otherwise explored the basic feature set. Based on these initial tests, we concluded that even this early, the technology was roughly as manageable as our existing technology, and that the potential for simplifying disk management might make the cost of implementation recoverable. In short, it was interesting enough to take a look at in a bit more detail.
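
The early exercises really were as simple as the sketch below; the pool, file and file system names are hypothetical, and the file-backed pool is strictly a lab convenience.

    # A throwaway pool backed by ordinary files on an existing UFS file system.
    mkdir -p /export/ztest
    mkfile 256m /export/ztest/d1 /export/ztest/d2
    zpool create labpool mirror /export/ztest/d1 /export/ztest/d2
    zfs create labpool/fs1

    # Snapshot, clone, then tear the whole thing down.
    zfs snapshot labpool/fs1@before
    zfs clone labpool/fs1@before labpool/fs1-clone
    zpool destroy labpool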

Lab #2:

If lab #1 was a simple review of what the reviewers had already blogged about, lab #2 was intended to explore the edges a little bit more, primarily looking at how gracefully the file system would fail. We wanted to know whether the technology had well-defined edges and whether it would fail gracefully, or at least in a predictable and recoverable manner.

Enter the 'RAIF' (Redundant Array of Inexpensive Flash devices). USB flash drives have some interesting properties. They are cheap, easy to plug in, configure and shuffle around, and they can easily be moved to other computers to test write failures and to introduce data corruption. We looked at building temporary SCSI arrays, but for various time, space and power reasons, we picked a USB-based test platform. The bill of materials was something like:

  • Two cheap USB adapters
  • Two powered 4 port USB hubs
  • 8 USB flash drives of various sizes (whatever was cheap)
That was enough to build a handful of different ZFS pools in various configurations, and easily test physical and logical failure modes. The USB drives were pretty good at inducing failures in the file system, so they made a good platform for testing the general resiliency of ZFS and gave us a pretty good idea how well Sun had thought through the edges of the technology and how good the file system was at managing its own failure modes and corner cases. The file system recovered when we expected it to, failed when we expected it to, and vendor claims were generally verified by our tests. (The performance of the USB driver stack was not considered part of the test.)
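
The failure testing itself didn't require anything sophisticated; it amounted to variations on the commands below. The device names are examples only (the USB sticks show up as ordinary disk devices), and pulling or deliberately scribbling over a stick stood in for a real disk failure.

    # A raidz pool across four of the USB sticks.
    zpool create raif raidz c3t0d0 c4t0d0 c5t0d0 c6t0d0
    zfs create raif/test

    # Pull a stick or corrupt one, then let ZFS report what it found.
    zpool scrub raif
    zpool status -v raif

    # Put the 'failed' device back into service and watch it resilver.
    zpool replace raif c4t0d0
    zpool status raif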

Our conclusion was that we should keep looking at the technology on a low priority basis.

Lab #3:

From the RAIF we went to a more conventional file system test on ordinary SCSI drives. The goal of this test was to simulate disk I/O load with ordinary test & benchmark suites and compare ZFS to UFS under something that resembles ordinary applications. A series of benchmark-like tests indicated to us that the technology lived up to its claims at least as well as any other new technology, and we agreed that the technology might be valuable to us if we could manage it and if it could be used to replace the UFS file systems that are under a logical volume manager.
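
The comparisons were deliberately crude, closer to the sketch below than to a formal benchmark; devices, mount points and sizes are illustrative, and we repeated the runs with different block sizes and with file-tree copy and tar workloads.

    # One drive under UFS, one under ZFS, on the same controller.
    newfs /dev/rdsk/c1t4d0s0
    mkdir /ufstest
    mount /dev/dsk/c1t4d0s0 /ufstest
    zpool create ztest c1t5d0

    # Crude timed sequential writes to each.
    ptime dd if=/dev/zero of=/ufstest/big.dat bs=1024k count=4096
    ptime dd if=/dev/zero of=/ztest/big.dat bs=1024k count=4096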

Deployment #1, low I/O, low impact.

Eventually, as time permitted, we decided to try the file system on a server that was active and in production, but wasn't failure critical. We essentially gave ZFS a 'test run' by using it on a few servers that are part of our server management infrastructure. Any pain would be felt by our peers, not our customers. If we felt no pain, we could keep moving. By this time Sun had added ZFS to Solaris 10, so we could move ahead on a low-impact server and be fully supported by Sun.

The first production obstacle was vendor support for third-party tools and utilities on ZFS. Legato qualified ZFS just about the time that we needed it, and our management infrastructure is mostly home-grown, so we didn't have other significant software compatibility issues. The file system performed as designed under the various low-use, low-impact environments.

Deployment #2, high I/O, low impact.

Our next step was to start using ZFS in places where we have interesting I/O loads, but where we don't have data that is absolutely irreplaceable. At the time that we were ready to move forward with another ZFS implementation, we were also re-engineering our enterprise backup to use a disk pool as a staging area for the tape backups. This gave us an opportunity to test ZFS in parallel to the technology it replaced at greatly reduced risk.

Our first large, high I/O ZFS implementation was a FATA disk pool on an EVA8000 that we use as the staging area for server backup jobs. Because we were in a position where, if it didn't work, we could back out and re-configure fairly easily, we took advantage of the opportunity and went with ZFS. We started out with a single 2TB lun, so ZFS sees the lun as one large disk, not many small disks. The performance was excellent, and because it was trouble free, we used ZFS for the entire disk pool.

This pool is now more or less made up of five 2TB luns in a single pool, for a total of 10TB. We initially write all Legato save sets to this disk pool, then clone them out to other media, both real tape & virtual tape. So far, that ZFS file system has performed very well. We routinely read & write well over 100 MB/s to the pool, or somewhere around 10TB per weekend, with no significant issues directly related to the file system. (There are kernel issues indirectly related to the file system that sometimes affect performance, but the file system itself works.)
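
Growing that staging pool has been the easy part. In rough terms it amounted to the commands below; the device names are stand-ins for the much longer MPxIO names of the real EVA luns, and the file system name is hypothetical.

    # The original single 2TB lun presented by the EVA.
    zpool create stage c6t0d0
    zfs create stage/staging

    # Each additional 2TB lun simply gets added to the same pool.
    zpool add stage c6t1d0
    zpool list stage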

ZFS pool #2 is a syslog server. We spool tens of thousands of syslog, Apache and NetFlow log entries per second to a 2+TB disk pool on older Sun storage attached to a first-generation T2000. That pool works as expected.

Both of these systems would be non-customer-affecting if they failed. Backups can be re-run, and logs can be recovered from tape.

Deployment #3, low I/O, high impact.

We have other ZFS file systems in various spots where we have the opportunity to experiment, but now we are also using ZFS for production, customer-impacting systems, though not on production databases. The customer-facing implementations are all covered by load balancing or some other non-file-system-dependent redundancy. So far they all work as expected. We are also working through the details of hosting zones on ZFS file systems, with the intent of giving us more flexibility in hosting zoned applications; a rough sketch of that configuration follows below. We have not yet put large Oracle instances on ZFS. For us, database file systems are the most critical, so we are the most cautious.
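
For the zone work, what we are experimenting with looks roughly like this. The pool, dataset and zone names are hypothetical, and this is a sketch of the idea rather than our finished standard.

    # One ZFS file system per zone, used as the zonepath.
    zfs create pool1/zones
    zfs create -o mountpoint=/zones/app1 pool1/zones/app1
    chmod 700 /zones/app1

    # Point the zone at it and install as usual.
    zonecfg -z app1 "create; set zonepath=/zones/app1; commit"
    zoneadm -z app1 install
    zoneadm -z app1 boot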

Future: high I/O, high impact.

We are exercising ZFS and gaining enough operational experience that we should soon be comfortable moving toward general, unrestricted deployment of ZFS. Our next implementation should be a customer-facing, high I/O application, but probably not a database server. Right now we do not have an application that fits those requirements, so this phase is delayed.

General Deployment

Barring any major problems with the current ZFS implementations, and the above-mentioned kernel issue aside, it looks to us like ZFS will eventually be our default file system, used generally across our Solaris servers. Our rollout has spanned almost a couple of years, but we have had no setbacks, compatibility problems, outages or data loss related to any of the ZFS implementations.
