Skip to main content

Posts

Showing posts from April, 2010

Oracle/Sun ZFS Data Loss – Still Vulnerable

Last week I wrote about how we got bit by a bug and ended up with lost/corrupted Oracle archive logs and a major outage. Unfortunately, Oracle/Sun’s recommendation – to patch to MU8 – doesn’t resolve all of the ZFS data loss issues.There are two distinct bugs, one fsync() related, the other sync() related. Update 8 may fix 6791160 zfs has problems after a panic , butBug ID 6880764“fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached”is apparently not resolved until 142900-09 released on 2010-04-20.DBA’s pay attention: Any Solaris 10 server kernel earlier than Update 8 + 142900-09 that is running any application that synchronously writes more than 32k chunks is vulnerable to data loss on abnormal shutdown.As best as I can figure – with no access to any information from Sun other than what’s publically available – these bugs affect synchronous writes large enough to be written directly to the pool instead of indirectly via the ZIL. After an abnormal…

We do not retest System [..] every time a new version of Java is released.

This post’s title is a quote from Oracle technical support on a ticket we opened to get help running one of their products on a current, patched JRE. Oracle’s response:“1. Please do not upgrade Java if you do not have to
2. If you have to upgrade Java, please test this on your test server before implemeting [sic] on production
3. On test and on production, please make a full backup of your environment (files and database) before upgrading Java and make sure you can roll back if any issue occurs.”In other words – you are on your own. The hundreds of thousands of dollars in licensing fees and maintenance that you pay us don’t do you sh!t for security.Let’s pretend that we have a simple, clear and unambiguous standard: ‘There will be no unpatched Java runtime on any server’.There isn’t a chance in hell that standard can be met. This seems to be a cross vendor problem. IBM’s remote server management requires a JRE on the system that has the application that connects to the …

Bit by a Bug – Data loss Running Oracle on ZFS on Solaris 10, pre 142900-09 (was: pre Update 8)

We recently hit a major ZFS bug, causing the worst system outage of my 20 year IT career. The root cause:Synchronous writes on ZFS file systems prior to Solaris 10 Update 8 are not properly committed to stable media prior to returning from fsync() call, as required by POSIX and expected by Oracle archive log writing processes.On pre Update 8 MU8 + 142900-09[1], we believe that a programs utilizing fsync() or O_DSYNC writes to disk are displaying buffered-write-like behavior rather than un-buffered synchronous writes behavior. Additionally, when there is a disk/storage interruption on the zpool device and a subsequent system crash, we see a "rollback" of fsync() and O_DSYNC files. This should never occur, as write with fsync() or O_DSYNC are supposed to be on stable media when the kernel call returns. If there is a storage failure followed by a server crash[2], the file system is recovered to an inconsistent state. Either blocks of data that were supposedly synchronously writ…

Phishing Attempt or Poor Customer Communications?

I’ve just ran into what’s either a really poor customer communications from Hewlett-Packard, or a pretty good targeted phishing attempt.The e-mail, as received earlier today:From: gcss-case@hpordercenter.comSubject: PCC-Cust_advisoryDear MIKE JAHNKE,
HP has identified a potential, yet extremely rare issue with HP
BladeSystem c7000 Enclosure 2250W Hot-Plug Power Supplies manufactured prior to March 20, 2008. This issue is extremely rare; however, if it does occur, the power supply may fail and this may result in the unplanned shutdown of the enclosure, despite redundancy, and the enclosure may become inoperable.
HP strongly recommends performing this required action at the customer's earliest possible convenience. Neglecting to perform the required action could result in the potential for one or more of the failure symptoms listed in the advisory to occur. By disregarding this notification, the customer accepts the risk of incurring future power supply failures.

3.5 Tbps

Interesting stats from Akamai:12 million requests per second peak 500 billion requests per day 61,000 servers at 1000 service providers The University hosts an Akamai cache. My organization uses the University as our upstream ISP, so we benefit from the cache.The  Universities Akamai cache also saw high utilization on Thursday and Friday of last week. Bandwidth from the cache to our combined networks nearly doubled, from about 1.2Gbps to just over 2Gbps. The Akamai cache works something like this:Akamai places a rack of gear on the University network in University address space, attached to University routers. The Akamai rack contains cached content from Akamai customers. Akamai mangles DNS entries to point our users to the IP addresses of the Akamai servers at the University for Akamai cached content. Akamai cached content is then delivered to us via their cache servers rather than via our upstream ISP’s. It works because:the content provider doesn’t pay the Tier 1 ISP’s for transpor…

The Internet is Unpatched – It’s Not Hard to See Why

It’s brutal. We have Internet Explorer vulnerabilities that need a chart to explain, a Mac OS X update that’s larger than a bootable Solaris image, a Java security update, two FirefoxUpdates, Adobe and Foxit! PDF readers that apparently are broken, as designed, and three flagship browsers that rolled over and died in one contest.Responsible network administrators and home users have been placed into patch hell by software vendors that simply are not capable of writing software that can stand up to the Internet.There is no operating system or platform that has built in patch management technology that is both comprehensive and easy for network administrators and home users to understand or use.There is no reason to expect that even if software vendors were actually able to release good code, that the release would make it out to users desktops.Some vendors (Microsoft) have robust and easy to use patch distribution systems, but those systems only distribute patches for their software. E…