Friday, May 29, 2009

Availability & SLAs – Simple Rules

From The Daily WTF, a story worth reading about an impossible availability/SLA conundrum. It’s a good lead-in to a couple of my rules of thumb.

“If you add a nine to the availability requirement, you’ll add a zero to the price.”

In other words, to go from 99.9% to 99.99% (adding a nine to the availability requirement), you’ll increase the cost of the project by a factor of 10 (adding a zero to the cost).

There is a certain symmetry to this. Assume that it’ll cost $20,000 to build the system to support three nines. Then:

99.9% = $20,000
99.99% = $200,000
99.999% = $2,000,000

The other rule of thumb that this brings up is:

Each technology in the stack must be designed for one nine more than the overall system availability.

This one is simple in concept. If the whole system must have three nines, then each technology in the stack (DNS, WAN, firewalls, load balancers, switches, routers, servers, databases, storage, power, cooling, etc.) must be designed for four nines. Why? ‘cause your stack has about 10 technologies in a serial dependency chain, and each one of them contributes to the overall MTBF/MTTR. Of course you can over-design some layers of the stack and ‘reserve’ some outage time for other layers of the stack, but in the end, it all has to add up.
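
The arithmetic behind both rules is just multiplication of serial availabilities. A quick sketch, with illustrative numbers:

```python
# Serial availabilities multiply, so ten layers each built to four nines
# (99.99%) land right around three nines (99.9%) overall.
def chain_availability(per_layer: float, layers: int) -> float:
    """Overall availability of a serial dependency chain."""
    return per_layer ** layers

overall = chain_availability(0.9999, 10)
print(f"10 layers at 99.99% each -> {overall:.4%} overall")  # ~99.9000%

# What each availability level allows in downtime per year:
for a in (0.999, 0.9999, 0.99999):
    minutes = (1 - a) * 365 * 24 * 60
    print(f"{a:.5f} = ~{minutes:,.1f} minutes/year of downtime budget")
```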

Obviously these are really, really, really rough estimates, but for a simple rule of thumb to use to get business units and IT work groups thinking about the cost and complexity of providing high availability, it’s close enough. When it comes time to sign the SLA, you will have to have real numbers.

Via The Networker Blog

More thoughts on availability, MTTR and MTBF:

NAC or Rootkit - How Would I know?

I show up for a meeting, flip open my netbook and start looking around for a wireless connection. The meeting host suggests an SSID. I attach to the network and get directed to a captive portal with an ‘I agree’ button. I press the magic button and get a security warning dialogue.

It looks like the network is NAC’d. You can’t tell that from the dialogue though. ‘Impulse Point LLC’ could be a NAC vendor or a malware vendor. How would I know? If I were running a rogue access point and wanted to install a rootkit, what would it take to get people to run the installer? Probably not much. We encourage users to ignore security warnings.

Anyway – it was amusing. After I switched to my admin account and installed the ‘root kit’ service agent and switched back to my normal user, I got blocked anyway. I’m running Windows 7 RC without anti-virus. I guess NAC did what it was supposed to do. It kept my anti-virus free computer off the network.

I’d like someone to build a shim that fakes NAC into thinking I’ve got AV installed. That’d be useful.

Thursday, May 28, 2009

Consulting Fail, or How to Get Removed from my Address Book

Here are some things that consultants do that annoy me.

Some consultants brag about who is backing their company or whom they claim as their customers. I’ve never figured that rich people are any smarter than poor people, so I’m not impressed by consultants who brag about who is backing them or who founded their company. Recent Ponzi and hedge fund implosions confirm my thinking. And it seems like the really smart people who invented technology 1.0 and made a billion are not reliably repeating their success with technology 2.0. It happens, but not predictably, so mentioning that [insert famous web 1.0 person here] founded or is backing your company is a waste of a slide IMHO.

I’m also not impressed by consultants who list [insert Fortune 500 here] as their clients. Perhaps [insert Fortune 500 here] has a world class IT operation and the consultant was instrumental in making them world class. Perhaps not. I have no way of knowing. It’s possible that some tiny corner of [insert Fortune 500 here] hired them to do [insert tiny project here] and they screwed it up, but that’s all they needed to brag about how they have [insert Fortune 500 here] as their customer and add another logo to their power point.

I’m really unimpressed when consultants tell me that they are the only ones who are competent enough to solve my problems or that I’m not competent enough to solve my own problems. One consulting house tried that on me years ago, claiming that firewalling fifty campuses was beyond the capability of ordinary mortals, and that if we did it ourselves, we’d botch it up. That got them a lifetime ban from my address book. They didn’t know that we had already ACL’d fifty campuses, that inserting a firewall in line with a router was a trivial network problem, that converting the router ACL’s to firewall rules was scriptable, and that I’d already written the script.

I’ve also had consultants ‘accidentally’ show me ‘secret’ topologies for the security perimeters of [insert fortune 500 here] on their conference room white board. Either they are incompetent for disclosing customer information to a third party, or they drew up a bogus whiteboard to try to impress me. Either way I’m not impressed. Another lifetime ban.

Consultants who attempt to implement technology or projects or processes that the organization can’t support or maintain are another annoyance. I’ve seen people come in and try to implement processes or technologies that, although they might be what the book says or what everyone else is doing, aren’t going to fit the organization, for whatever reason. If the organization can’t manage the project, application or technology after the consultant leaves, a perceptive consultant will steer the client towards a solution that is manageable and maintainable. In some cases, the consultant obtained the necessary perception only after significant effort on my part with the verbal equivalent of a blunt object.

Recent experiences with a SaaS vendor annoyed me pretty badly when they insisted on pointing out how great their whole suite of products integrates, even after I repeatedly and clearly told them I was only interested in one small product, and they were on site to tell me about that product and nothing else. “I want to integrate your CMDB with MY existing management infrastructure, not YOUR whole suite. Next slide please. <dammit!>”. Then it went downhill. I asked them what protocols they use to integrate their product with other products in their suite. The reply: a VPN. Technically they weren’t consultants though. They were pre-sales.

That’s not to say that I’m anti consultant. I’ve seen many very competent consultants who have done an excellent job. At times I’ve been extremely impressed.

Obviously I’ve also been disappointed.

Wednesday, May 27, 2009

Your Application is a Rotting Old Shack, Now What?

In response to A Shack in the Woods, Crumbling at the Core, colleague Jim Graves commented:

“…it only works if application owners are like long-term homeowners, not house flippers.”

Good point. Who cares if the shack gets a cheap paint job instead of a foundation and a comprehensive re-modeling? Will the business owner know or care? Do the contractors you hired care? Are you going to be around long enough to care? Are you and your employees, managers and consultants acting as house flippers, painting over the flaws so you can update your resumes, take the profits and move on?

Jim asks:

“Are long-term employees more likely to care about problems that may happen five years from now? Are Highly Paid Consultants much less likely to?” 

Good question. Suppose that I want to fix the shack. Maybe I’m tired of having to empty the buckets that catch the drips from the roof (or restart the J2EE app that runs itself out of database connections a couple times a week). If this repair is to be anything other than a paint-over, at least one element in the business owner-->employee-->consultant-->contractor chain will have to care enough about the application to ensure that the remodeling is oriented toward long term structural repairs and not just a paint-over. The other elements in the chain need to concur.

For much of what I host and/or manage, I’m on the second or third remodeling cycle. I’ve seen the consultants parachute in, slap a coat of paint on the turd and walk off with a half decade of my salary. I don’t like it. This puts me squarely in the camp of putting the effort into fixing foundations instead of slapping on paint and shingles. I’ve seen apps that have been around for ten years and two refresh cycles, have had ten million dollars spent on them and still have mold, rot and leaks from a decade ago. But they have a shiny new skin and state of the art UI. For wireframes, UI models, usability studies and really nice Power Points? Spare no expense. Do they have a sane data model and even trivial application security? Not even close. Granite countertops are in scope, fixing the foundation is out of scope.

Things like that make me crabby.

For now I’ll assume that the blame lies with IT. Somewhere along the line the non-technical business owners have been led to believe that the shiny face and the UI is the application, and that the foundational elements (back end code, databases, servers, networks and security) are invisible, unimportant, or otherwise non-essential.

Resume Driven Design

Sam Buchanan, a long time colleague, commenting on a consultants design for a small web application:

“I'm telling you: this app reeks of resume-driven design”

In ‘Your Application is a Rotting old Shack’ I mused about applications that get face lifts while core problems get ignored. Let’s assume for a moment that business units finally figure out that their apps have a crumbling foundation and need structural overhauls. Assuming that internal resources don’t exist, how do we know that the consultants and contractors that we hire to design and build our systems aren’t more interested in building their resumes than our applications?

I’d like to think that I would be able to tell if a consultant tried to recommend an architecture or design that exists more to pad their resume than to solve my problems. It’s probably not that straightforward though. Consultants have motivations that may intersect with your needs, or motivations that significantly deviate from what you need, and if their motivations are resume driven, there is a chance that you’ll end up with a design that helps someone's resume more than it helps you.

Short term employees may share some of the same motivations. If they are using you to fill out their resume, you’d better have needs that line up with the holes in their resume. I’m pretty sure that ‘slogged through a decade old poorly written application, identified unused code and database objects’ or ‘documented and cleaned up an ad hoc, poorly organized, data model’ isn’t the first thing people want on their resume.

They probably want something shiny.

Tuesday, May 26, 2009

A Mime in a Box

Picking up on a thread by Andy the IT Guy: which of these things is not like the others?

  1. A developer who doesn’t understand databases, networks or firewalls.
  2. A system manager or DBA who doesn’t understand applications, networks and firewalls.
  3. A firewall or network administrator who doesn’t understand operating systems and applications.
  4. A mime in a box.

Trick question. They’re the same. The mime’s box is imaginary, as are the cross-disciplinary restrictions that we place on developers, system and network administrators.

In the example from Andy’s post, the developer didn’t understand the difference between an app installed on a desktop and an app installed on a server. Similarly, non-network people often don’t understand the critical difference between source and destination when an app server connects to a database.

For example, I often see this diagram:

  app ------- database

showing an application updating a database, when from a network point of view, what we really need to see is:

  app ------> database

showing the application making a network connection to the database. But that subtle difference doesn’t mean much unless the person understands firewalls. They’ll need to understand them though, because I’m going to do this:

  app ---> [firewall] ---> database

If they don’t understand the difference between TCP and UDP, between Inbound and Outbound and between Source and Destination, that firewall is probably going to break things.
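
A toy illustration of why the direction matters (the host names are hypothetical; a real firewall evaluates the same protocol/source/destination tuple, just with far more machinery):

```python
# Toy firewall rule matcher: a rule permits traffic only when protocol,
# source, destination and destination port all line up.
from dataclasses import dataclass

@dataclass
class Rule:
    proto: str       # 'tcp' or 'udp'
    src: str         # source address (who initiates the connection)
    dst: str         # destination address
    dst_port: int

def allowed(rules, proto, src, dst, dst_port):
    """True if any rule permits this connection attempt."""
    return any(r.proto == proto and r.src == src and r.dst == dst
               and r.dst_port == dst_port for r in rules)

# Intended flow: the app server opens a TCP connection TO the database.
rules = [Rule('tcp', src='app-server', dst='db-server', dst_port=1433)]

print(allowed(rules, 'tcp', 'app-server', 'db-server', 1433))  # True
# Drawn backwards (database as the source), the same traffic is dropped:
print(allowed(rules, 'tcp', 'db-server', 'app-server', 1433))  # False
```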

This problem seems to occur nearly universally.

Let’s call this System Management Principle #5:

Each technology specialist must understand enough about the adjoining technologies to design and build systems that make maximum use of those technologies.

(If you’ve got a better way of phrasing that, let me know.)


I’ve got System Management Principle #6, and now, with Andy’s help, principle #5. Someday we’ll dream up the rest of them.
Photo by B. Tse, found here.

Monday, May 25, 2009

Home Server Energy Consumption

I'm moving toward 'less is more', where 'less' is measured in watts. Right now my entire home entertainment and technology stack uses about 150 watts total (server + network + storage + Sun Rays + laptops + wall warts). I no longer use the stereo or television - that stack is unplugged and consuming zero energy, and I don’t have any watt-sucking game consoles. My next iteration of home entertainment & technology should use about 25 watts for all servers and storage and about 20 watts for each user end point (laptop). The server and network should be the only devices that run continuously. End points should suspend and resume quickly and reliably so that no more than one is normally running at a time, so the net of all server, network and user devices should be under 50 watts.

To get under my energy target, I’ve got to swap a 60 watt, 6 year old Sparc based SunBlade 150 for something that uses between 5 and 15 watts. Worst case energy-wise would be a netbook running Solaris; best case would be an ARM based micro server. A netbook running Solaris would use more power, but it would have more CPU and memory, ZFS, and a built in UPS. Storage would have to be USB powered notebook drives rather than 3.5” desktop drives. My total storage needs are under 250GB, so a pair of redundant USB powered notebook drives is adequate. (Transferring long term storage to DVDs reduces its energy cost to zero, so as the drives fill up, I either delete or transfer.)

For user devices, low power laptops with generous use of power saving features should keep me near the 20 watt target. My XP and Vista computers don’t sleep or hibernate reliably, so they tend to be running most of the time. Windows 7 and OSX sleep and hibernate reliably, so devices with those OS’s are set for maximum power savings even when plugged in. Switching the XP and Vista notebooks to W7 will allow me to use aggressive power saving settings and reduce their energy footprint.

Power measurements at my breaker box show about a 300 watt parasitic draw when there are no lights, refrigerators or other appliances running. I can account for 150 watts as computer related. Someday I’ll have to track down the other 150 watts.
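
The back-of-envelope math on what continuous draw adds up to over a year (the $0.10/kWh rate is a placeholder; plug in your own):

```python
# Convert a continuous wattage into annual energy use and rough cost.
HOURS_PER_YEAR = 24 * 365  # 8760

def annual_kwh(watts: float) -> float:
    """kWh consumed per year by a device drawing `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000

RATE = 0.10  # $/kWh, a placeholder rate
for watts in (300, 150, 50):
    kwh = annual_kwh(watts)
    print(f"{watts:>3} W continuous = {kwh:,.0f} kWh/yr "
          f"(~${kwh * RATE:,.0f}/yr at ${RATE:.2f}/kWh)")
```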

As home lighting moves from 60 watts per device to 13 watts per device, so should home computing.

Inspired by this discussion.

…every geek seems to need to one-up those around them and somehow differentiate and prove their geekdom... this is done in one of two ways:

  1. More is More: These are the guys with a deep wallet that always have the fastest processors, biggest screens, flashy furniture, etc.
  2. Less is More: The geek who does the most with the least (and generally brags about it). The more obscure your setup the better.

Sunday, May 24, 2009

Expecting Stewardship Without Understanding

What are the consequences of building a society where we rely on technology that we don’t understand? Is lack of stewardship one of those consequences?

From Wayne Porter:
Most people no longer understand anything about the technology they use everyday and because of this ignorance many people use it without good stewardship. We drive cars we cannot fix, eat food we cannot make or produce, and many operate in an environment they do not understand with a false sense of security. We run and gun this technology with fuel that has probably reached its peak point.
Can we expect people who don’t understand a technology to be good stewards of the technology?

Should we expect application developers, who largely don’t understand relational databases, database security, firewalls or networks, to write applications that rationally utilize or properly protect those resources? Should we expect ordinary computer users, who understand almost nothing of how their computers work, to operate their computers in a manner that protects them and us from themselves and the Internet?

For some technologies (automobiles for example) we’ve almost completely given up on users understanding the technology well enough to make rational decisions and exhibit good stewardship. Drivers will never understand tire contact patches and slip angles, so we give them speed limits, ABS brakes, stability control, crumple zones and air bags. Drivers don’t understand engines and engine maintenance, so we give them idiot lights and dashboard messages. Drivers don’t understand the consequences of fossil fuel consumption, so we legislate minimum mileage and emission standards. We force drivers to be good stewards whether they like it or not.

Home owners don’t understand strength of beams, dynamic wind loads and electricity's propensity to escape to the ground via the path of least resistance, so we have building codes and permits, building inspectors, fire inspectors, licensed contractors and tradesmen to force homeowners into reasonable stewardship of their property.

On the other hand, most computer users don’t have even a basic understanding of how their computer works, yet we give them administrative access, allow them to install random software from the Internet, and then somehow expect them to keep their computer secure and functional. We expect them to be good stewards of the technology and not allow their home computers to be malware infested botnet nodes without them having even a vague understanding of how their computer works.

That’s probably not going to work.

Wednesday, May 20, 2009

The Irony Of Preventing Security Failures

Gadi Evron muses about the possibility that a successful security program might result in difficulty justifying future spending.

The Irony Of Preventing Security Failures, Gadi Evron, Dark Reading

But what if nothing happens because we stopped it? That may be the most dangerous option in the long term […] The obvious risk is that the security industry will be accused of crying wolf and not believed next time when something serious happens.

Roll back to 2001 and the hype surrounding Code Red. The lead story on major news outlets was the impending implosion of the Internet. The Internet didn’t implode. The hype went away. Slammer circa 2003 snuck up on the world, wreaked havoc, major corporate networks imploded, the internet hiccupped for a few hours. I’d like to think that Code Red was pretty good at culling out the incompetent sysadmins and raising the awareness of patching and hardening amongst the competent but clueless, and that Slammer was pretty good at culling out the incompetent IT departments and raising the awareness of the clueless CIOs and executives.

Do we need to fear our own success?

Here’s a proposal. Simply allow your peers (or competitors) to continue to fail at security, and use their failure to justify continuing to spend money on your own success. You shouldn’t have too much trouble finding peer failures to use as your benchmarks. I’m pretty sure that the average executive can observe the impact of security events on peers and competitors compared to the lack of similar internal events and associate the difference with the level of competence and funding of the internal IT. If corporate executives can’t figure that out on their own, I’ll bet we can come up with a couple of power points with impressive looking but indecipherable charts to help them out.

Monday, May 18, 2009

Secret Questions are not a Secret

Technology Review took a look at an advance copy of a study that validates what Ms. Palin already knew. Secret questions don’t help much: 
In research to be presented at the IEEE Symposium on Security and Privacy this week […] the researchers found that 28 percent of the people who knew and were trusted by the study's participants could guess the correct answers to the participant's secret questions. Even people not trusted by the participant still had a 17 percent chance of guessing the correct answer to a secret question. 
This is a fundamental and well known problem. Putting real numbers on it should help those who find themselves in the design meeting where secret questions get brought up.

To re-hash the secret question problem, either I answer the questions correctly and risk a 1 in 5 chance that a stranger will guess them, or I fabricate unique, nonsensical answers. If the fabricated answers are such that they can’t be reasonably guessed, then there isn’t much chance that I’ll remember what I answered, so I’m stuck writing them down somewhere and tracking them for a decade or so.

Obviously there are solutions that I can implement myself, like using a password safe of some type to store the made up questions and answers. But what about the vast majority of ordinary users? How many of them are going to set up a password safe, figure out how to keep it up to date, replicate it to a safe location and not lock themselves out? Not many. They’ll have no choice but to write everything down.
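
For what it’s worth, the fabricated-answer part is only a few lines of code; the hard part is everything around it. A sketch, where the dict is just a stand-in for a real encrypted password safe:

```python
# Generate a nonsensical, unguessable "secret answer" per site and stash
# it, since nobody will remember that their mother's maiden name is
# "granite-falcon-47". The word list and site names are made up.
import secrets

WORDS = ['vortex', 'mango', 'granite', 'falcon', 'ember', 'tundra']

def fake_answer() -> str:
    """Two random words plus a random number, e.g. 'ember-tundra-83'."""
    return ('-'.join(secrets.choice(WORDS) for _ in range(2))
            + '-' + str(secrets.randbelow(100)))

safe = {}  # stand-in for a real encrypted password safe
safe[('mybank.example', "mother's maiden name")] = fake_answer()
safe[('webmail.example', "first pet's name")] = fake_answer()
print(safe)
```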

I can image trying to explain to non-technical users that they need to have made-up answers to made-up questions, and that the made up questions and answers must be unique for each on line account, and the questions and answers need to be atypical enough that someone close to them can’t guess the answers even if they know the questions, and that instead of writing the questions and answers down, they need to store the made up questions and answers in a magic piece of software called a ‘password safe’, and they need to put a really strong password that they’ll remember and nobody else will guess on the password safe, and that they can’t write that down either, and they need to replicate the password safe data file to some other media, and if they forget the password to the password safe or lose the password safe data file, they’ll lose access to just about everything.

“Hey ma – here’s what I need you to do…download something called password safe…”

“I already have a safe….”

There’s got to be a better way.

Monday, May 4, 2009

Hijacking a Botnet – What Can We Learn?

The Computer Security Group at UC Santa Barbara hijacked a botnet long enough to grab interesting data. But if you are keeping up with security news, you already know that.

Among the findings:

  • Passwords tend to be weak (not new).
  • Passwords tend to get reused across multiple web sites (not new).
  • Botnet sizes may be overestimated (interesting).

Other interesting (to me) bits:

The whole ‘make money working from home’ thing has a new twist:

Of particular interest is the case of a single victim from whom 30 credit card numbers were extracted. Upon manual examination, we discovered that the victim was an agent for an at-home, distributed call center. It seems that the card numbers were those of customers of the company that the agent was working for, and they were being entered into the call center’s central database for order processing.

I’m pretty sure that some of the $270/hr Tier 3 vendor support engineers that we’ve had on support calls were at home when they got paged. I could hear kids and dogs in the background.

I was very interested when bad guys started targeting phishing attempts for a local credit union at employees of the organizations affiliated with that credit union. But this opens up a new form of precision targeting:

For instance, armed with information provided by social networking sites, an attacker may find pictures, personal interests, and other contact information that could be used to construct personalized phishing and spam campaigns or to blackmail victims.

And for those who wish to keep personal and professional identities separated:

For example, Torpig records a user logging into his LinkedIn account. His profile presents him as the CEO of a tech company with a large number of professional contacts. Torpig also records the same user logging into three sexually explicit web sites.

So much for that plan.

Public machines? Don’t even try:

This analysis turned up some interesting results, including a machine responsible for over 80 university webmail logins, another that sent close to 60 distinct credentials to a university health care web site, and one providing at least 25 agent logins at what seems to be a travel agency.

The MBR is hooked, so the odds of you knowing that the public machine is infected are pretty slim.

The botnet authors apparently didn’t want to get left out of the whole Twitter revolution fad:

…the new algorithm also relies on search trends from Twitter to generate one additional seed byte. […] This letter is then used to calculate a "magic number", which is used to compute the domain name…

Yep – grab a page from Twitter and use it to seed the next domain name in the C&C algorithm. Amusing.
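
For flavor, here’s a toy sketch of the general idea: mix a predictable date with one hard-to-predict seed byte (standing in for the Twitter trend data) to pick the next rendezvous domain. It illustrates the concept; it is not Torpig’s actual algorithm.

```python
# Toy domain-generation algorithm: hash the date plus a seed byte, then
# map the digest to letters to form the next C&C domain name. Both the
# bots and anyone trying to hijack them must compute the same name.
import hashlib
from datetime import date

def next_domain(day: date, trend_byte: int) -> str:
    seed = f"{day.isoformat()}:{trend_byte}".encode()
    digest = hashlib.md5(seed).hexdigest()
    # map hex digits to lowercase letters for a plausible-looking label
    label = ''.join(chr(ord('a') + int(c, 16) % 26) for c in digest[:10])
    return label + ".com"

print(next_domain(date(2009, 5, 4), 0x42))
# A defender who knows the algorithm but not the seed byte can't
# pre-register the domain; that's the point of the Twitter twist.
```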

We have created a monster, and it is us.

Friday, May 1, 2009

One, Two, Buckle my Shoes. How many Laptops Can We Lose.

Last summer Dell commissioned a study[1] to determine how many laptops are lost or stolen at airports. The study reported 12,000 lost laptops per week at US airports, and it was reported as fact just about everywhere, including on a whole bunch of high profile security related blogs. I did some quick mental math & thought 'where do they store them all? There must be a heck of a big pile of them somewhere...'. So I bookmarked the study and thought that it'd make a good data loss related blog post someday.

In the meantime, I ran across a few other related articles, including this one[2] from the New York Times, published in 2002:

At Seattle-Tacoma International Airport, 330 laptops were left behind between September and April, up sharply from only 7 in the comparable period a year earlier...in the last three months, the airport collected 204 misplaced laptops. In Denver, airport officials resorted to posting signs at security checkpoints saying, ''Got laptop?'' after 95 computers were left in February alone...
...The efforts to find laptop owners have largely paid off. In Denver, for instance, all but 20 of the 600 or so laptops left behind since September have been reunited with their owners...
...At other major airports, including those in Boston, Chicago and New York, officials say the problem of misplaced laptops has barely registered...

That obviously doesn't jibe with the Dell study. If you figure large airports were finding at most a few laptops per day in the months just after 9/11, either the study is way off, or the laptop loss problem has gotten an order of magnitude worse since then.
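
The back-of-envelope math, using the numbers from the NYT article quoted above (rough on purpose):

```python
# Sanity check: the claimed 12,000 laptops/week nationwide versus the
# airport counts the NYT reported. Sea-Tac found 330 laptops between
# September and April, roughly 34 weeks.
CLAIMED_PER_WEEK = 12_000
print(f"Claimed: {CLAIMED_PER_WEEK * 52:,} laptops lost per year nationwide")

seatac_per_week = 330 / 34
print(f"Sea-Tac observed: ~{seatac_per_week:.1f} laptops/week")

# Even 100 big airports all losing laptops at Sea-Tac's rate:
extrapolated = 100 * seatac_per_week
print(f"100 airports at that rate: ~{extrapolated:,.0f}/week "
      f"vs. {CLAIMED_PER_WEEK:,} claimed (an order of magnitude apart)")
```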
Then I ran across this article[3] from Computerworld, published shortly after the study first hit the news. They called around a little bit, asked a few simple questions, and concluded that the study is suspect. Several real media sites[4] took the time to do the math and question the numbers (damn them journalists, always checking facts and spoiling the fun).  Some blogs also questioned the numbers[5]. One security blog modified the original post and indicated that this was not a story[6]. Many high profile[7] sites still have the original articles posted without modification.

The lesson Dell wanted us to hear? Buy our security software.
The lesson we all heard? There is an epidemic of lost laptops.

Me? I’m still trying to figure out where they pile all those lost laptops (unless, of course, the study is bogus).


[1] Airport Insecurity: The Case of Missing & Lost Laptops, Dell
[2] At Airport X-Ray Machines, a Mountain of Forgotten Laptops, Jeffrey Selingo, New York Times
[3] Data doesn't add up on study of missing laptops at U.S. airports, Agam Shah, Computerworld
[4] Who really believes that fliers lose 12,000 laptops a week?, Sean O'Neill, Newsweek
[5] The Airport Notebook Revolving Door, Robert Richardson, CSI
[6] Hundreds of Thousands of Laptops Lost at U.S. Airports Annually, Bruce Schneier
[7] U.S. Travelers Lose 12,000 Laptops Every Week, Elaine Chow, Gizmodo