Thursday, February 26, 2009

Regulation E.

Spent the weekend digging into Regulation E, particularly Section 205.11. That’s the part where you try to convince your regional bank that you really didn’t authorize those charges, that you were not ‘card present’ in New York, and that you didn’t have homeless people in your house rummaging through your stuff, borrowing your debit card, jetting to the east coast, buying cosmetics and jetting back.

This isn’t unexpected. We’ve kept this debit card attached to a special checking account that we never have more than $400 in at any time, just for this reason. The theory is that transactions will start to fail before the damage gets too expensive. In practice, I’m not sure if the bank will honor the overdraft attempts or not. I’d be un-amused if they had some sort of ‘convenience’ feature that turned the fraud into overdrafts and then into 22% loans. That would be a bad day.

This particular card was only used at a small number of merchants, mostly local and regional grocery chains, so my guess is that either a local/regional merchant or their upstream provider has a leak. The bank had already pulled the card and reissued it a couple days before we saw the bogus transactions.

So now I’m in paranoid mode, or more likely I’m in more-paranoid-than-usual mode. The good news is that I can finally close the loop on what I’ve been saying for years, namely ‘I wouldn’t be paranoid if everyone weren’t out to get me!’.

Unfortunately the regional bank doesn’t have anything that helps mitigate something like this, other than checking your online statement every day and sending a postal letter to the ‘Regulation E Department’ when bad things show up. Bank of America, on the other hand, lets me do a few interesting things. First, they let me use my cell phone as a two-factor, SMS-based proxy when logging in to their web portal, with what they call SafePass®.

Second, they allow me to generate single-merchant, limited-value card numbers for online transactions with what they call ShopSafe®. With ShopSafe I can spin up different card numbers with different limits and expiration dates for each online vendor, on an ad hoc, as-needed basis. This allows me to approximate single-use cards.
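To make the idea concrete, here is a minimal sketch of the checks that a single-merchant, limited-value card number implies. The `VirtualCard` class, its fields, and the "lock to the first merchant that charges it" rule are hypothetical illustrations. They are not Bank of America's actual ShopSafe implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VirtualCard:
    """Hypothetical single-merchant, limited-value card number."""
    number: str
    credit_limit: float              # total amount the card may ever charge
    expires: date                    # charges after this date are refused
    merchant: Optional[str] = None   # assumed: locked to the first merchant that charges it
    spent: float = 0.0

    def authorize(self, merchant: str, amount: float, today: date) -> bool:
        """Return True and record the charge only if it passes every check."""
        if today > self.expires:
            return False
        if self.merchant is not None and merchant != self.merchant:
            return False  # a leaked number is useless at any other merchant
        if self.spent + amount > self.credit_limit:
            return False  # the limit caps the damage
        self.merchant = self.merchant or merchant
        self.spent += amount
        return True
```

For example, a $100 card first used at one online store will refuse a charge from a different merchant. It will also refuse a second charge from the same store that would push the total past $100.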

Third, they have a reasonably robust SMS alerting system. It lets me set up alerts for routine activity that may or may not indicate irregular activity, such as ‘any charge over $50’ or ‘Transaction outside of US’. They send me the SMS; I decide if it’s irregular. I like the idea of getting an SMS when someone logs into my account, changes my address, charges purchases online, orders checks, etc. Having some information ‘out of band’ can’t hurt. Unfortunately none of this really prevents anything; it just makes detection faster and easier.

The images list the various alerts that are configurable.
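Alert rules like these amount to simple predicates over each transaction, with a human deciding whether a match is really irregular. Below is a minimal sketch, assuming a hypothetical `Transaction` record. The rule names and thresholds mirror the examples above, not BofA's actual alert configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transaction:
    amount: float
    country: str   # assumed: ISO country code of the merchant, e.g. "US"
    online: bool


# Each rule is a named predicate. A match means "send an SMS", not "block".
Rule = Callable[[Transaction], bool]

RULES: Dict[str, Rule] = {
    "any charge over $50": lambda t: t.amount > 50,
    "transaction outside of US": lambda t: t.country != "US",
    "online purchase": lambda t: t.online,
}


def alerts_for(txn: Transaction, rules: Dict[str, Rule] = RULES) -> List[str]:
    """Return the names of every rule the transaction trips, in rule order."""
    return [name for name, rule in rules.items() if rule(txn)]
```

Consider a $72.50 online purchase from a US merchant. `alerts_for` returns `["any charge over $50", "online purchase"]`, and the human on the other end of the phone makes the call.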

The only downside to getting an SMS every time you use your card is that some merchants don’t post transactions at the time of purchase. Occasionally I’ll buy something at noon and get woken up at 4am by an SMS from BofA telling me that I bought something 16 hours ago. Overall though, that’s better than any alternative that I know of, and in this case it would have alerted us to the fraud much sooner.

For me, the more SMS’s the better.

Wednesday, February 25, 2009

Your PaaS Provider Failed, What’s Plan B?

Update: It's only a beta, so no harm done, but here's another example: Contacts on Ovi beta database failed

-----------------------Original Post-------------------------
Coghead:
SAP has purchased Coghead’s intellectual property assets…SAP did not assume any of Coghead’s customer relationships or obligations and, at this point in time, SAP does not have plans to continue offering the Coghead service commercially…
Infoweek:
"Customers can take the XML out that describes their application, but the reality is that only runs on Coghead, so customers will need to rewrite their app with something different,"
Hoff:
“It's a friendly reminder that "whens you rolls da dice, you takes your chances." Prudent and pragmatic risk assessment and relevant business decisions still have to be made when you decide to place your bets on a startup.  Just because you move to the Cloud doesn't mean you stop employing pragmatic common sense. I hope these customers have a Plan B."
Rich Miller:
"Now, what this DOES emphasize is the importance of standards (de facto or de jure) by which interoperability or portability can be assured. Remember... the fear of "cloud vendor lock-in" doesn't only apply to that scenario in which the vendor has captured the customer and is "extorting" unreasonable fees. Lock-in also applies to being hand-cuffed to a boat anchor with no ship to prevent it (and the customer) from sinking into the depths. It argues for minimum sufficient means by which a customer can be assured of a migration path... not necessarily a cost-free, frictionless move from one platform to another, but an assurance of salvagability at a cost that's significantly less than a 100% do-over."
(emphasis is mine)

Janke:

You've bet the farm, your career, your boss's career on a technology, vendor or cloud provider. Assume the technology, vendor or cloud provider fails. What is your exit strategy? There is a clear case here for standards. The closer you are to something that is standardized and/or multi-vendor, the better off you'll be when things go bad.

I've had one really bad experience with a proprietary technology (document management). The vendor (DEC) sold off the application as they were gasping for air in the early 90's. The company that they sold it to disappeared. The documents (which now had no paper backups) existed only in the depths of a proprietary format. That format was accessible only from hardware and software that was no longer available, built by a company that no longer existed. And the retirement incomes of thousands of faculty depended on those documents. That sucked. We needed someone to write a custom program to convert thousands of proprietary-format documents into a format readable by something non-proprietary. Finding a former DEC employee who knew enough about the software to do it was difficult. Paying that person was expensive; he was very aware of the difficulty of our situation and probably rather bitter about the whole layoff/unemployment thing.

I'm in a relatively stable organization and I tend to stick around long enough to clean up any messes that I make. For me, having at least a rough idea of an exit strategy makes jumping in with both feet much easier. If there is a failure, at least we have an idea how to salvage the project or technology at a cost less than a '100% do-over'. My guess is that my attitude would be different if I were in a startup, where failure means you shut down the startup and pop up another. The same goes if, career-wise, I were a jumper (new job every few years). In that case: “Damn the torpedoes! Four bells. Captain Drayton, go ahead! Jouett, full speed!”

In this particular case, the abandoned customers might have gotten lucky. The competitors to the failed PaaS worked day and night to build a migration tool that lets you convert to their platform.

That’s really cool. Would you bet on that though?

See also: The Cloud - Provider Failure Modes

Thursday, February 19, 2009

Performance Benchmarks that Include Energy Efficiency Data

Signs of the times:

Energy Benchmarking: Rich Miller at Datacenter Knowledge is reporting that TPC will update their performance benchmarks to include energy efficiency data. In the future, they’ll measure performance, price and energy in their benchmarks.

Actual datacenter energy costs (rather than power supply nameplate ratings) are hard to generalize. The numbers that I can find are all over the map. Energy use depends on server load, server configuration, server efficiency, power distribution efficiency and cooling efficiency. None of these is easy to calculate, and all of them are rarely measured. As a rough estimate, for small servers the cost of power + cooling appears to approach the cost of the server hardware amortized over 4 years. Figuring energy use into the price/performance calculations for systems should skew future purchases toward efficiency.
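As a rough illustration of that comparison, the sketch below totals four years of power and cooling for one server and sets it beside the purchase price. Every input here is an assumed placeholder, not a measured figure: server wattage, cooling overhead, electricity price and hardware cost. Plug in your own numbers.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_energy_cost(server_watts: float, cooling_watts_per_watt: float,
                         price_per_kwh: float, years: float = 4) -> float:
    """Cost of powering and cooling one server over its amortization period."""
    total_watts = server_watts * (1 + cooling_watts_per_watt)
    kwh = total_watts * HOURS_PER_YEAR * years / 1000
    return kwh * price_per_kwh


# Assumed example: a 400 W small server, 1 W of cooling per server watt, $0.10/kWh.
energy = lifetime_energy_cost(400, 1.0, 0.10)
hardware = 3000.0  # assumed purchase price
print(f"4-year power + cooling: ${energy:,.0f} vs hardware: ${hardware:,.0f}")
```

With those assumptions it prints `4-year power + cooling: $2,803 vs hardware: $3,000`. That is the "approaches the cost of the hardware" situation described above.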

Power Calculators: HP has a rack power calculator tool that provides useful estimates of power use for a given HP server and rack configuration. APC and others provide similar tools.  I’m sure they build the tools to help figure per-rack UPS, power and cooling for custom rack configurations, but the tools can easily be used to help estimate energy costs.

Don’t forget cooling: One thing I’ve noticed is that people tend to forget that for every watt of electricity that their systems use, they’ll have more than one watt of cooling that they need to supply to remove the heat from the datacenter (or their house if they have air conditioning). The process of removing the watt of energy from the room is not 100% efficient. For example, if I have a rack that uses 5000 watts, a cooling system that was 100% efficient would use an additional 5000 watts to remove the heat from the room. But cooling systems are not 100% efficient. Worst case, you might spend up to an additional 10,000 watts of energy to cool the 5000 watt server rack. 
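The rack arithmetic above can be written out directly. This sketch treats cooling energy as a multiplier on the rack's load. The 1.0 and 2.0 multipliers are simply the best and worst cases from the paragraph above, not measurements.

```python
def rack_total_watts(it_watts: float, cooling_watts_per_watt: float) -> float:
    """Rack load plus the energy spent removing its heat from the room."""
    return it_watts + it_watts * cooling_watts_per_watt


RACK_WATTS = 5000
for label, ratio in [("best case in the text", 1.0), ("worst case in the text", 2.0)]:
    total = rack_total_watts(RACK_WATTS, ratio)
    print(f"{label}: {total:,.0f} W total, {total - RACK_WATTS:,.0f} W for cooling")
```

The 5000 watt rack ends up drawing 10,000 to 15,000 watts once cooling is counted.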

Wednesday, February 18, 2009

Universal Phone Chargers

This might be interesting:

…17 leading mobile operators and manufacturers….[have] set an ambitious target that by 2012 a universal charging solution (UCS) will be widely available in the market worldwide and will use Micro-USB as the common universal charging interface.

I’ve had ‘uses standard mini or micro USB’ on my ‘required’ check list for any phone, Bluetooth headset, PDA, navigation or other portable device for quite a while. That’s simply because I’m tired of trying to maintain multiple chargers at work, at home, in each of my cars and whenever I travel. Today, if I travel with a laptop, I carry a USB->Mini cable and a mini->micro adapter, and I charge my phone & headset from the laptop. If I travel w/o a laptop, I carry a couple of Motorola mini USB battery packs (P790s). The bottom line? If it isn’t mini or micro USB, I don’t buy it.

What’s nice about this for manufacturers is that there will no longer be any reason to include a charger with the device. For consumers, it will mean that a new phone, PDA or headset will not require the extra purchase of (in my case) three car chargers and a couple of sync cables. Everyone already has a drawer full of mini/micro USB chargers and one in every car (or at least I do).

It’ll be interesting to see if this is more than just a press release.

Via Tech at Play

Tuesday, February 17, 2009

Failed Backups – Unrecoverable Service

A small but high-profile social bookmarking site, ma.gnolia.com, recently suffered catastrophic, unrecoverable data loss. The site’s creator and owner, Larry Halff, posted a video blog in which he talks about the failure and lessons learned.


[Video: Citizen Garden Episode 11: Whither Ma.gnolia? from Larry Halff on Vimeo.]

Highlights from the vlog:

  • Software RAID volume or database corruption was the original cause.
  • The site was self hosted.
  • Complex dependencies made moving the site to professional hosting difficult.
  • The only backups were a copy to an attached firewire drive.
  • There were no integrity checks or test restores.
  • The site was hosted on Apple Xserves and Mac Minis.
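The single copy to an attached FireWire drive and the missing integrity checks and test restores are the bullets that hurt most. Below is a minimal sketch of the missing step, assuming a plain file copy to attached storage. It records a SHA-256 digest at backup time, then compares it after a test restore. The function names and directory layout are hypothetical. A live database would also need a consistent dump rather than a copy of open files.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large dumps don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path, restore_dir: Path) -> bool:
    """Copy source to backup_dir, restore it to restore_dir, and compare digests."""
    expected = sha256_of(source)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / source.name
    shutil.copy2(source, backup)

    # Test restore: pull the file back off the backup media into a scratch area.
    restore_dir.mkdir(parents=True, exist_ok=True)
    restored = restore_dir / source.name
    shutil.copy2(backup, restored)

    # The backup only counts if what comes back matches what went in.
    return sha256_of(restored) == expected
```

A matching digest only proves the backup faithfully reproduces the source. It cannot tell you the source was already corrupt, so application- or database-level consistency checks still matter.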

It’s a great ‘lessons learned’ for small startups. My take is that the people who create cool things on the Internet aren’t necessarily the ones that should be hosting those cool things. Those are rather different skill sets. The corollary is probably that people who are good at hosting the cool things on the Internet are likely not capable of creating them.

The big picture? When trusting others with your data, how do you know if they are taking appropriate steps to protect the data?

You don’t.

Monday, February 16, 2009

Small Banks Online – An Example

Here’s an example of the online presence for a small credit union (bank). It’s so advanced it’s featured on thedailywtf.com. My guess is that maintaining a robust, secure online presence is difficult for small credit unions and banks. They might be as small as a single branch office and a few dozen employees. Outsourcing to service providers is pretty much the only option, and it is unlikely that they have the resources to perform a technical evaluation of their service providers. The service provider that this credit union (bank) uses seems to be used by many small credit unions, so there is no reason to name the specific credit union.

The initial login requires a captcha that they call a ‘Security Code’. I’m not sure what the purpose of the captcha might be, other than slowing down bots a bit.

[Screenshot: login captcha]

They care enough about their clients to recommend a current browser.

[Screenshot: recommended browsers]

Wait – isn’t one of those browsers dead? Let me check.

[Screenshot: Netscape end-of-life notice]

That must be a mistake. Look around a bit. There is another link with browser requirements:

[Screenshot: browser requirements link]

Which clarifies things just a bit.

[Screenshot: browser requirements]

Mozilla 1.0, IE 5.0 or Netscape 6.2. The official recommendation is not one, not two, but THREE end-of-life browsers. Interesting. Perhaps they believe in security by obscurity? The good news is that unlike many popular sites, they don’t balk at using too new of a browser. IE 8 betas & the various daily builds of Minefield seem to work just fine.

Authentication is by user id, password and security questions.  They let customers create their own questions. This is a step up from forcing fixed questions. They still allow selection of pre-determined questions though.

[Screenshot: security questions]

I prefer being allowed to create my own questions, but only because it annoys me less, not because I think that it adds significantly to the security of the system. I can’t imagine ordinary people creating challenging question/response pairs.

When you get logged in, you see:

[Screenshot: online banking interface]

Classic frame based HTML, the kind of old fashioned goodness that rarely is seen today. I’d be worried if they let the code go stale. The threats from the Internet change so rapidly that code can go stale pretty quickly. The good news is that each year they update the copyright notice at the bottom of the page.

This is the 21st century (not the 90’s), so we should be able to get statements electronically. The credit union (bank) outsources online statements to a different third-party provider, accessible from the credit union’s site. But only if you have a decent, state-of-the-art browser:

[Screenshot: statement provider browser requirements]

The statement provider raises the bar significantly, requiring any of IE 5.5, AOL 5, Netscape 7.

When you change your password, your new password is effective immediately. The password change function uses postal mail to notify the user of the change. The new password also gets mailed to you a couple of days after you’ve changed it. It is not possible to change a home address online, so postal mail is reasonably effective out-of-band notification. There is no e-mail based notification of any online transaction activity.

The user facing parts of the system appear to be minimally maintained and rarely upgraded. I have no way of knowing if the back end of the system is well designed and reasonably secure or not.

I really, really hate banking with large mega-banks. I did that once a couple of decades ago. It was such a bad experience that I’m loath to repeat it. For loans and major transactions, I much prefer dealing in person with a small bank or credit union, and if there ever is a problem with an account or a loan, having the ability to talk to real people in person is invaluable. Unfortunately, when using the small players, you probably are giving up a certain amount of online security.

Friday, February 6, 2009

Not all Data Loss is Security Related

Matt invited me to guest author a post on his Standalone Sysadmin blog. One of the topics that I've had in the To-Blog pile is to dump out some thoughts on system backups. Head over to Matt's blog and read them.

Data loss events that leave data deleted, destroyed or corrupted are every DBA's and sysadmin's nightmare. Compare the results of these events:
  • A firewall or IPS has a hardware or software failure and throws away a few packets of good data.
  • A router gets overloaded and tosses a few packets in the bit bucket.
  • A SAN fabric has a hardware or software failure and throws away a few frames of data.
The latter is going to be a far, far more serious problem. Databases and file systems are extremely intolerant of missing bits.

Here's an example:
The reason that we suffered data loss (about 2.5 days) is because the data transfer issues with the SAN switch caused data corruption in both the Oracle data files and the archive log files. We had tape backups of the data files and archive log files, but they were also corrupt. Unfortunately, we could only recover the database to the last point that we had clean archive log files.
A SAN fabric scrambled a few bits. Data files and archive logs got toasted. Redundancy didn't help (the redundancy built into the SAN stored the scrambled bits, redundantly). The backup system either backed up the corrupt data or failed because the data was corrupt. In either case the data is not recoverable. That scenario happens more often than anyone will admit.

Imagine a network where a few missing bits on Tuesday causes a loss of all data transferred across the network any time from Tuesday through Thursday.

Or worse:
So far, my efforts to recover Ma.gnolia's data store have been unsuccessful. While I'm continuing to work at it, both from the data store and other sources on the web, I don't want to raise expectations about our prospects. While certainly unanticipated, I do take responsibility and apologize for this widespread loss of data.

As of this writing, the recovery method appears to involve searching Google for cached copies of your missing data.  That's a good trick to remember. Someday I might lose my SSN or banking credentials and need to recover them.

Networks were designed from the ground up to assume there would be missing bits. And just to make sure that network applications are always aware that they need to tolerate network data loss, network engineers intentionally build low-level data loss into their designs. We wouldn't want network users to have too high of expectations, would we? Seriously though, lost packets have been a part of networking since day one. As a result, any network protocol or application that couldn't tolerate loss quit working the day it got deployed.

Storage isn't designed to tolerate missing bits (though Sun is trying to fix that with ZFS). We've learned that we need to be extremely paranoid about storage related errors and events. There can be no tolerance for frame, CRC, port or other errors on a SAN fabric. Unfortunately, SAN switches are often represented as simple, low maintenance devices. They are not.
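To see why this matters, here is a toy model of the idea behind ZFS-style end-to-end checksums. A checksum for every block is stored apart from the data and verified on every read. A mirror that faithfully stored scrambled bits is then caught instead of trusted. This is an illustrative sketch, not how ZFS is implemented, and the `ChecksummedStore` class is hypothetical.

```python
import hashlib
from typing import Dict, List


class ChecksumError(Exception):
    """Raised when no copy of a block matches its recorded checksum."""


class ChecksummedStore:
    """Toy mirrored block store that verifies a SHA-256 per block on read."""

    def __init__(self, mirrors: int = 2) -> None:
        self.copies: List[Dict[int, bytes]] = [{} for _ in range(mirrors)]
        self.sums: Dict[int, str] = {}  # kept apart from the data it describes

    def write(self, block: int, data: bytes) -> None:
        # The checksum is computed from the data as the host sent it.
        self.sums[block] = hashlib.sha256(data).hexdigest()
        for copy in self.copies:
            copy[block] = data

    def read(self, block: int) -> bytes:
        for copy in self.copies:
            data = copy[block]
            if hashlib.sha256(data).hexdigest() == self.sums[block]:
                return data  # first copy that matches its checksum wins
        raise ChecksumError(f"all copies of block {block} are corrupt")
```

If one mirror is scrambled, `read` quietly returns the good copy. If every copy is scrambled, as in the SAN story above, `read` raises `ChecksumError`. That doesn't bring the data back, but it turns silent corruption into a loud error before the bad bits get backed up.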

A quote from the post I wrote for Matt:

One of the things I've done to drive home the importance of backups is to walk up to a sysadmin's cube and ask them to delete their home directory. I'm the boss, I can do that. Trust me, it's fun. If they hesitate, I know right away that they don't have confidence in their backups. That's bad – for them, for me, and for our customers.

That covers the simple case. The files on that server are backed up and recoverable. Database backup and recovery is much more complex. Failure to recover a single incremental backup (archive log, transaction log) prevents recovery of the database past the point in time of the failed incremental. If that happens, it will be ugly.
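The point about incrementals is easy to show. Recovery replays logs strictly in sequence, so it must stop at the first log it cannot read, no matter how many good logs follow. Below is a minimal sketch, assuming each log is simply marked readable or not. The function name is hypothetical, and a real Oracle or SQL Server recovery involves much more.

```python
from typing import List, Optional


def last_recoverable_log(log_ok: List[bool]) -> Optional[int]:
    """Index of the last log that can be applied after restoring the full backup.

    Logs must be applied strictly in order, so the first bad log ends recovery.
    Returns None if even the first log is unusable (only the full backup remains).
    """
    last = None
    for i, ok in enumerate(log_ok):
        if not ok:
            break
        last = i
    return last
```

For example, `last_recoverable_log([True, True, False, True, True])` returns `1`. Logs 3 and 4 are perfectly intact and completely useless.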

The DBAs that I know don’t look out at the hackers on the Internet and think ‘they are out to get me…I’ve got to be prepared…’. They are too busy looking down at the controllers and disks and thinking ‘they are out to get me…I’ve got to be prepared…’.

I’ve personally been faced with critical data loss incidents a handful of times. In one case, a network card decided to occasionally flip bits in transmitted packets before the various checksums that keep packets intact were calculated. As a result, a ‘1’ at the client would end up as a ‘!’ at the server (and in the database), along with other single-bit anomalies. In another case, the cache in a high-end RAID controller was scrambling bits and corrupting volumes. With the cache enabled, the volumes would error and dismount. With the cache disabled, the server worked fine. The worst, though, was a human-initiated logical failure of a 270 million row, 1000 table OLTP database. When I got called a couple of minutes after the failure, it was a zero table, zero row database. A point-in-time recovery to the minute prior to the incident brought us back to where we were, minus a few seconds of data.
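The ‘1’ that arrived as a ‘!’ is a single flipped bit. ASCII ‘1’ is 0x31 and ‘!’ is 0x21, which differ only in bit 0x10. Because the card flipped the bit before the checksums were computed, those checksums covered already-wrong data and passed. The sketch below contrasts that with a checksum the application computes at the client. Here `zlib.crc32` stands in for any checksum, and `nic_corrupts` is a hypothetical model of the faulty card.

```python
import zlib


def nic_corrupts(payload: bytes) -> bytes:
    """Hypothetical faulty card: flip bit 0x10 in the first byte before checksumming."""
    return bytes([payload[0] ^ 0x10]) + payload[1:]


sent = b"100"
app_crc = zlib.crc32(sent)        # end-to-end check computed by the client application

on_wire = nic_corrupts(sent)      # corruption happens before the link checksum
link_crc = zlib.crc32(on_wire)    # the card checksums the already-bad data

link_ok = zlib.crc32(on_wire) == link_crc   # True: the network sees nothing wrong
app_ok = zlib.crc32(on_wire) == app_crc     # False: only the end-to-end check notices
print(on_wire, link_ok, app_ok)
```

This prints `b'!00' True False`. The checksums along the way can only vouch for whatever bytes they were handed.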

In each case, the backups worked. 

With apologies to Ms Browning:

How do I love thee? Let me count the ways
How do I fail thee? Let me count the ways
I fail thee to the controllers and drivers and ports
My pain can reach, when feeling out of sight
For the ends of the fabric and LUN
I fail thee to the level of everyday’s
Most critical data….

Sunday, February 1, 2009

Swatting – New Use for Internet Phones

This one is new to me. Call 911 from an Internet phone, faking the caller ID for a random address on the other side of the country. Then pretend to be the victim of a killer on a rampage and have the local SWAT team dispatched to an innocent person’s house.

…a new kind of telephone fraud that exploits a weakness in the way the 911 system handles calls from Internet-based phone services. The attacks — called "swatting" because armed police SWAT teams usually respond…

Sounds useful. You can annoy your neighbors by having the police bust down their doors with a battering ram, right from the comfort of your local coffee shop. With the help of online maps, you can probably make it pretty realistic: ‘he’s in the back yard behind the big tree….wait…he’s coming toward the back door…’.

…fake calls about a workplace shooting included realistic gunshot sounds and moaning in the background…

Beats the heck out of using spoofed caller ID to send bogus pizza deliveries to your ex-girlfriends.