Oracle/Sun ZFS Data Loss – Still Vulnerable

Last week I wrote about how we got bit by a bug and ended up with lost/corrupted Oracle archive logs and a major outage. Unfortunately, Oracle/Sun’s recommendation – to patch to MU8 – doesn’t resolve all of the ZFS data loss issues.

There are two distinct bugs, one fsync() related, the other sync() related. Update 8 may fix 6791160 zfs has problems after a panic, but

Bug ID 6880764 “fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached”

is apparently not resolved until 142900-09 released on 2010-04-20.

DBA’s pay attention: Any Solaris 10 server kernel earlier than Update 8 + 142900-09 that is running any application that synchronously writes more than 32k chunks is vulnerable to data loss on abnormal shutdown.

As best as I can figure – with no access to any information from Sun other than what’s publicly available – these bugs affect synchronous writes large enough to be written directly to the pool instead of indirectly via the ZIL. After an abnormal shutdown, on reboot the ZIL replay looks at the metadata in the ZIL wacks the write (and your Oracle Archive logs).

It appears that you can 
  • limit database writes to 32k (and kill database performance)
  • or you can force writes larger than 32k to be written to the ZIL instead of the pool by setting zfs_immediate_write_sz larger than your largest database write (and kill database performance)
  • or you can use a separate log intent device (slog)
  • or you can update to 142900-09
Ironically, the ZFS Evil Tuning Guide recommends the opposite – set “the zfs_immediate_write_sz parameter to be lower than the database block size” so that all database writes take the broken direct path.

Another bug that is a consideration for an out of order patch cycle and rapid move to 142900-09:

Bug ID 6867095: “User applications that are using Shared Memory extensively or large pages extensively may see data corruption or an unexpected failure or receive a SIGBUS signal and terminate.”

This sounds like an Oracle killer.

I’m not crabby about a killer data loss bug in ZFS. I’m crabby because Oracle/Sun knew about the bug and its enormous consequences and didn’t do a dammed thing to warn their customers. Unlike Entrust – who warned us that we had a bad cert even though it was our fault that our SSL certs had no entropy, and unlike Microsoft who warned its customers about potential data loss, Sun/Oracle really has their head in the sand on this.

Unfortunately – when your head is in the sand, your ass is in the air.