Skip to main content

Error Handling – an Anecdote

A long time ago, shortly after the University I was attending migrated students off of punch cards, I had an assignment to write a batch based hotel room reservation program. We were on top of the world - we had dumb terminals instead of punch cards. The 9600 baud terminals were reserved for professors, but if you got lucky, [WooHoo!] you could get one of the 4800 baud terminals instead of a 2400 or 1200 baud DECwriters.

The instructors mantra - I'll never forget - is that students need to learn how to write programs that gracefully handle errors. 'You don't want an operator calling at 2am telling you your program failed. That sucks.' He was a part time instructor and full time programmer who got tired of getting woke up, and he figured that we needed our sleep, so he made robustness part of his grading criteria.

Here's how he made that stick in my mind for 30 years:  When the assignment was handed to us, the instructor gave us the location of sample input data files to use to test our programs. The files were usually laced with data errors. Things like short records, missing fields and random ASCII characters in integer fields were routine, and we got graded on our error handling, so students quickly learned to program with a healthy bit of paranoia and lots of error checking.

That was a great idea and we learned fast. But here's how he caught us all: A few hours before the assignment was due, the instructor gave us a new input file that we had to process with our programs, the results of which would determine our grade.

What was in the final data file?

……[insert drum roll  here]……

Nothing. It was a zero byte file.

Try to picture this - the data wasn’t available until a couple hours before the deadline, it was a frantic dash to get a terminal (long lines of students on most days, especially at the end of the semester), edit the source file to gracefully handle the error and exit (think ‘edlin’ or ‘ed’ ), submit it into the batch queue for the compiler (sometimes that queue was backed up for an hour or more) and re-run it against the broken data file, all by the deadline.

How many students caught that error the first time? Not many, certainly not me. My program crashed and I did the frantic thing. The rest of the semester? We all had so dammed many paranoid if-thens in our code you'd probably laugh if you saw it.

He was teaching us to think about building robust programs - to code for what goes wrong, not just what goes right. For him this was an availability problem, not a security problem. But what he taught is relevant today, except the bad guys are feeding your programs the data, not your instructor. That makes it a security problem.

I can't remember the operating system or platform (PDP-something?), I can't remember the language (Pascal, I think, but we learned SNOBOL and FORTH in that class too, so it could have been one of those), but I'll never forget that !@$%^# zero byte file!

Comments

  1. Oops, I forgot the zero length file test... http://code.google.com/p/timetag/issues/detail?id=25

    Thanks for the reminder.

    ReplyDelete

Post a Comment

Popular posts from this blog

Cargo Cult System Administration

“imitate the superficial exterior of a process or system without having any understanding of the underlying substance” --Wikipedia During and after WWII, some native south pacific islanders erroneously associated the presence of war related technology with the delivery of highly desirable cargo. When the war ended and the cargo stopped showing up, they built crude facsimiles of runways, control towers, and airplanes in the belief that the presence of war technology caused the delivery of desirable cargo. From our point of view, it looks pretty amusing to see people build fake airplanes, runways and control towers  and wait for cargo to fall from the sky.The question is, how amusing are we?We have cargo cult science[1], cargo cult management[2], cargo cult programming[3], how about cargo cult system management?Here’s some common system administration failures that might be ‘cargo cult’:Failing to understand the difference between necessary and sufficient. A daily backup is necessary, b…

Ad-Hoc Verses Structured System Management

Structured system management is a concept that covers the fundamentals of building, securing, deploying, monitoring, logging, alerting, and documenting networks, servers and applications. Structured system management implies that you have those fundamentals in place, you execute them consistently, and you know all cases where you are inconsistent. The converse of structured system management is what I call ad hoc system management, where every system has it own plan, undocumented and inconsistent, and you don't know how inconsistent they are, because you've never looked.

In previous posts (here and here) I implied that structured system management was an integral part of improving system availability. Having inherited several platforms that had, at best, ad hoc system management, and having moved the platforms to something resembling structured system management, I've concluded that implementing basic structure around system management will be the best and fastest path to …

The Cloud – Provider Failure Modes

In The Cloud - Outsourcing Moved up the Stack[1] I compared the outsourcing that we do routinely (wide area networks) with the outsourcing of the higher layers of the application stack (processor, memory, storage). Conceptually they are similar:
In both cases you’ve entrusted your bits to someone else, you’ve shared physical and logical resources with others, you’ve disassociated physical devices (circuits or servers) from logical devices (virtual circuits, virtual severs), and in exchange for what is hopefully better, faster, cheaper service, you give up visibility, manageability and control to a provider. There are differences though. In the case of networking, your cloud provider is only entrusted with your bits for the time it takes for those bits to cross the providers network, and the loss of a few bits is not catastrophic. For providers of higher layer services, the bits are entrusted to the provider for the life of the bits, and the loss of a few bits is a major problem. The…