Wednesday, September 15, 2010

DNS RPZ - I like the idea

An opt-in real time black hole list for untrustworthy domain names?


Some thoughts:

I certainly don't think that offering the capability is a bad thing. Nobody is forced to use it.

Individual operators can decide what capability to enable and which blacklists to enable. ISP's could offer their customers resolvers with reputation filters and resolvers without.  ISP's can offer blacklisted/greylisted resolvers for their 'family safe' offerings. Corporations/enterprises can decide for themselves what they blacklist.

A reputation based white list would be interesting. Reputation could be determined by the registrar, perhaps based on the registrar having a valid, verified street address, phone and e-mail for the domain owner. A domain that has the above and has been registered for a month or so could be part of a white list. A domain that hasn't met the above could be gray listed. Operators could direct those to an internal 'caution' web page.

A downside:

Fast flux DNS based botnets are a significant issue, but I don't think that a black list of known-bad domains will solve the problem. If a malware domain is created as a part of a fast flux botnet, a black list will never be able to keep up. It could still be useful though. Some malware is hosted on static domains.


A domain squatters blacklist. I'd love to be able to redirect address bar typos to an internal target rather than the confusing, misleading web pages that squatters use to misdirect users. I don't care if domain squatters business model is disrupted. They are speculators. They should expect to have their business models disrupted once in a while.

Are we creating more vulnerabilities than we are fixing?

Looking at ZDNet's Zero Day blog:

Sept 15th: Apple QuickTime flaws puts Windows users at risk
Sept 14th: Stuxnet attackers used 4 Windows zero-day exploits
Sept 13th: Adobe Flash Player zero-day under attack
Sept 10th: Primitive 'Here you have' e-mail worm spreading fast
Sept 9th: Patch Tuesday heads-up: 9 bulletins, 13 Windows vulnerabilities
Sept 9th: Security flaws haunt Cisco Wireless LAN Controller
Sept 9th: Apple patches FaceTime redirect security hole in iPhone
Sept 8th: New Adobe PDF zero-day under attack
Sept 8th: Mozilla patches DLL load hijacking vulnerability
Sept 8th: Apple plugs drive-by download flaws in Safari browser
Sept 2nd: Google Chrome celebrates 2nd birthday with security patches
Sept 2nd: Apple patches 13 iTunes security holes
Sept 1st: RealPlayer haunted by 'critical' security holes
Aug 24th: Critical security holes in Adobe Shockwave
Aug 24th: Apple patches 13 Mac OS X vulnerabilities
Aug 20th: Google pays $10,000 to fix 10 high-risk Chrome flaws
Aug 19th: Adobe ships critical PDF Reader patch
Aug 19th: HD Moore: Critical bug in 40 different Windows apps
Aug 13th: Critical Apple QuickTime flaw dings Windows OS
Aug 12th: Opera closes 'high severity' security hole
Aug 12th: Security flaws haunt NTLMv1-2 challenge-response protocolAug 11th: Adobe warns of critical Flash Player flaws
Aug 10th: Microsoft drops record 14 bulletins in largest-ever Patch Tuesday

I'm thinking there's a problem here.

Of course Zero Day only covers widely used software and operating systems - the tip of the iceberg.

Looking at Secunia's list for today, 09/15/2010:

Linux Kernel Privilege Escalation Vulnerabilities
e-press ONE Insecure Library Loading Vulnerability
MP3 Workstation PLS Parsing Buffer Overflow Vulnerability
IBM Lotus Sametime Connect Webcontainer Unspecified Vulnerability
Python asyncore Module "accept()" Denial of Service Vulnerability
AXIGEN Mail Server Two Vulnerabilities
3Com OfficeConnect Gigabit VPN Firewall Unspecified Cross-Site Scripting
Fedora update for webkitgtk
XSE Shopping Cart "id" and "type" Cross-Site Scripting Vulnerabilities
Linux Kernel Memory Leak Weaknesses
Slackware update for sudo
Slackware update for samba
Fedora update for samba
Red Hat update for samba
Red Hat update for samba3x
Google Chrome Multiple Vulnerabilities

Serious question:

Are we creating new vulnerabilities faster than we are fixing old ones?

I'd really like to know.

In some ways this looks like the early immature periods of other revolutionary industries.

We built cars. The early ones were modern wonders that revolutionized transportation and a wide swath of society. After a few decades we figured out that they also were pollution spewing modern wonder death traps. Auto manufactures sold their pollution spewing modern wonder death traps to customers who stood in line to buy them. Manufacturers claimed that there was nothing wrong with there products, that building clean autos with anything resembling safety was impossible, and that safe clean autos would cost so much that nobody could afford them. The customers were oblivious to the obvious. They piled their families into their death traps and drove them 85mph across South Dakota without seat belts (well - my dad did anyway - and he wasn't the fastest one out there, and I'm pretty sure I and my siblings weren't the only kids riding in the back of a station wagon with the tailgate window wide open...).

Some people described it as carnage. Others thought that autos were Unsafe at Any Speed.

Then came the safety & pollution lobbies. It took a few decades, a few hundreds million in lobbyists, lawyers and lawsuits, and many more billions in R&D, but we now have autos that are fast, economical, safe and clean.  A byproduct - completely unintended - was that autos became very low maintenance and very, very reliable. Maintenance windows went from hundreds of miles between shop visits to thousands of miles between shop visits (for oil changes) and tens of thousands of miles per shop visit (for everything but oil).

We need another Ralph Nader.  I don't want to wait a couple decades for the software industry to get its act together.

I'll be too old to enjoy it.

Monday, September 6, 2010

Thoughts on Application Logging

As a follow on to:

I have a few semi-random thoughts on application logging.

Things I like to see in logs are:

Machine parseable (yet human readable) format. I need to be able to write a regex that cleanly separates interesting messages and pipe them into sed/awk and extract critical fields from the messages. I typically use sed/awk/perl to strip out uninteresting parts of the message and sort/count pipe-to-Excel the rest of the fields. I also use logsurfer to catch real time events and alert interested parties. Even organizations with sophisticated tools still need to be able to parse the logs. Bonus points if all messages of a particular type have the same number of fields - or if variable word fields are at the end of the message.

Single line events. No XML. I'm not going to write a custom multi-line XML parser for every random app. Not a chance.

Date/Time stamp on every message. Really. I need to know, at least to the nearest second, when every line of a log was generated. Establishing time lines is critical to troubleshooting and forensics. Synchronized clocks and stamped messages are what makes time lines possible. I like it when the source system timestamps the log and the log catcher also timestamps the log. Then we know if a clock is off.

Rational message prioritization. I need to be able to reliably detect critical messages and do something with them. The ability to filter on some sort of priority is key. A simple regular expression should be able to extract interesting messages.

Unique identifiers for log message types. Cisco's firewalls generate syslogs with an identifier that uniquely identifies the log message type. No message are emitted from a firewall that are not uniquely identified and documented. (Don't open that link on a mobile connection - you'll use up your quota...). With PIX/ASA logs, I can write a regex to catch %ASA-4-411004, find all cases where network interfaces were administratively enabled, match them up with access logs & figure out who enabled the interface. Then I can check change management and figure out if it was part of a plan or part of a rootkit.

Sufficient information to link the logs from the application to upstream or downstream systems. I'd think that some form of user session identifier logging would be useful on most apps. User IP address is a candidate for an upstream identifier. Web, firewall and netflow logs can be correlated by IP & date/time. Web app session ID or similar perhaps would link the application logs together, and if stored along with a userid could link application activity to database and web server activity. Once the problem is narrowed down to a specific user, being able to track that users session would be very useful.

I once needed to link application activity to an IP address. I couldn't do it directly as the app didn’t log IP addresses. The only thing that saved me was that the particular activity was only possible through a given URL, and that URL had only been called from one IP address in the window surrounding the event. Had there been more than one instance of that URL in that time window, we would not have been able to correlate the event with an IP address. I sometimes am able to correlate application activity to an IP address (application, firewall & load balancer logs), the IP address to a MAC address (DHCP logs), the MAC address to a userid (AD domain logs), the MAC address to a physical computer (switch CAM tables), the physical computer to a person (security cameras).

This sort of information occasionally gets used by law enforcement as a part of various investigations, so the preference is to be able to link various log sources together unambiguously.

Logging of failures is as important as logging of successes. Failed privilege escalation attempts (for example) are always interesting. Any failure of that type is either a broken/misconfigured app, a hack attempt, or a sysadmin is ef’ing around on a production system. Firewall denies for packets source from inside the firewall are similar. If a server inside a data center is attempting to connect to a blocked port/IP on another system or subnet, something is wrong. Either the server/application is configured wrong, or the firewall is configured wrong, or the rootkit is making a reconnoiter pass through the data center (or sysadmins are ef’ing around on production systems).

Timeliness. Logs must leave the system in near real time. Logs that have been on a compromised system for more than a few minutes (or seconds?) after the compromise are presumed tainted.

Bonus points: Serialized numbering on log messages. That way I know if I'm missing any messages. If I am missing messages, either something is broke or the rootkit deleted some of my messages. Heck - I could even write a log catcher that logged something like 'Expecting message #86679514, received message #86679517, 3 messages missing'. (I have a Netflow collector that does that).

I’ve often  run into DBA’s and application folks who are afraid to generate large volumes of logs. I’m not afraid to generate large volumes of logs, provided that the logs are light weight, clear, concise and readable.  We currently log tens of thousands of firewall, DNS & netflow records each second. If they show up as light weight syslog-like events, they are not hard to handle. On our largest application the vendor has a separate database just for user activity logging. With a reasonable purge strategy in place, it’s still larger than the production database. That’s OK though. It’s good data, and it’s not a difficult database to maintain.

Update: Gunnar has a series of posts on application logging, A quote from the first post:

By climbing the stack and monitoring the application, you collect data located closer to the core enterprise assets like transactions, business logic, rules, and policies. This proximity to valuable assets make the application an ideal place to see and report on what is happening at the level of user and system behavior, which can (and does) establish patterns of good and bad behavior that can provide additional indications of attacks.

ZFS and NFSv4 ACL’s

I've been doing granular file access control lists since Netware 2.0. I'm used to being able to specify (for example) permissions such that a file can be modified, but not renamed or deleted, or setting permissions on a file so that it can be executed, but not read - (Yes, Netware could do that). And of course, it's obvious that more than one user or group permission should be allowable. I'm also used to having some control over inheritance, so that I can 'kneecap' permissions on a nested directory.

Obviously I've been very unimpressed with Unix's trivial rwxr-x--- style permissions. Sun band-aided the decades old rwxr-x--- up with POSIX getfacl and setfacl. That was a start. We now have NFSv4 style ACL’s on ZFS. It looks like they are almost usable.

For an experiment, I decided to clean up a few 'home directories' where the existing permissions are a mess of randomness left over from a decade of ufsdump/ufsrestore, POSIX ACL's, tar, cpio, pax, samba, rsync and who knows what else. Here's my attempt at simple ACL's on an OpenSolaris ZFS volume.

Specific requirements:

  • Owner gets the equivalent of 'full control'.
  • Group gets the equivalent of 'read only'.
  • Everyone gets nada.
  • Newly created files get predictable permissions

To ensure predictable permissions, I want inheritance in some form or another such that:

  • New files are automatically created to allow owner the equivalent of read, write, create, delete, modify, including ACL's and attributes, but without the ‘execute’ bit set.
  • New files are automatically created to allow group 'read-only' but without the ‘execute’ bit set.
  • New directories are automatically created to allow the owner the equivalent of read, write, create, delete, modify, browse, including ACL's and attributes.
  • New directories are automatically created as group read and browse.
  • New files and directories are automatically created with no permissions for ‘everyone’

Keep in mind that the newest ACL implementation needs the Solaris version of ls, chmod, etc., rather than the default gnu versions that ship with OpenSolaris. Also – I’m using Solaris ‘CIFS’, not samba.

First I set:

zfs set aclinherit=passthrough-x  filesystem

passthrough-x appears to mean 'only inherit the 'execute' bit if the application specifically requests the bit when the file is created'. At least that's what it appears to mean.

Then I fixed existing files. Note that I wanted to touch only the files (not the directories), hence the 'find'.

find . -type f  -exec /usr/bin/chmod A=\
everyone@:full_set::deny {} \;


find . -type f  \
-exec /usr/bin/chmod A=\
<= The 'A=' resets all ACL's rather than adding more ACL's
owner@:rw-pdDaARWc--s::allow,\ <= Set file owner to 'full control' minus the execute bit.
group@:r-----a-R-c---::allow,\ <= Set group to 'read'.
everyone@:full_set::deny {} \;  <= Set everyone else to 'deny all'.

This has a side effect of removing the execute bit from executable files. My standard policy is 'no executable files in home directories'. Those smart enough to know what the 'x' bit is are smart enough to know how to fix what just  broke. I wouldn’t do this in directories full of executable files.

Lastly, I tweaked the directories. Setting inheritance ensures that new files and directories have the desired ACL's:

find . -type d -exec /usr/bin/chmod  A=\
everyone@:full_set:fd:deny {} \;


find . -type d \
-exec /usr/bin/chmod  A=\
<= The 'A=' resets all ACL's rather than adding more ACL's
owner@:full_set:d:allow,\ <= Set directory owner to 'full control' with inheritance for newly created directories, including the execute bit.
owner@:rw-pdDaARWc--s:f:allow,\ <= Set directory owner to 'full control' with inheritance for newly created files, excluding the execute bit.
group@:r-x---aAR-c---:d:allow,\ <= Set group  to 'rx-' with inheritance for newly created directories
group@:r-----a-R-c---:f:allow,\ <= Set group to 'r' with inheritance for newly created files
everyone@:full_set:fd:deny {} \; <= Kneecap everyone else

In theory, new files will be created with the equivalent of rw-r-----, new directories will be created equivalent to rwxr-x---.


Helpful docs: