Tuesday, April 14, 2009

Anonymity is not Privacy

Let's pretend that I want to do something subversive like participate in a social network with persons who don't like the current form of government, or maybe I want to be really subversive and prevent advertisers from tracking my social network habits. In either case, preventing someone from de-anonymizing me is essential. The University of Texas paper linked here demonstrates that by using anonymous data from multiple social networks it is possible to de-anonymize individuals and reliably identify them by comparing the links between the persons in those social networks.

This is relevant in daily online life. If, for example, my personal online presence could jeopardize my professional standing (a firearms advocate at a liberal university comes to mind), I would need to keep my personal and professional online presence separate. Individuals who have to fear loss of freedom because of their personal social network affiliations would be another example. This research indicates that in order to maintain privacy in a social network, one would need to maintain complete separation between the questionable social network and any other social network.

The research is very interesting. If I need or desire to maintain separate networks, and I decide to use a network like Google for professional presence, then I can't allow Google or any Google-related network to be associated with my personal presence. For personal presence I need to pick social networks that not only don't cooperate with Google, but actually fear or despise Google (Microsoft, for example). This only works if I never share the same relationships across the different social networks. If I do, the digraphs intersect and identification becomes possible. Maintaining separation would mean that I would have no overlapping 'friends' across personal and professional networks. The separation also fails if the corporations that own my personal and professional networks merge, get sold to another data empire, use the same advertising empires or succumb to the agents of rogue governments.
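
To make the 'digraphs intersect' point concrete, here is a toy sketch in Python. It is not the algorithm from the paper, only a much simplified illustration of the same idea: start from a couple of people whose anonymous IDs are already known (the seeds) and extend the mapping through shared friends. The names, the node IDs, the `propagate` function and the `min_score` threshold are all made up for illustration, and the graphs are treated as undirected to keep things short.

```python
from collections import defaultdict


def propagate(aux, anon, seeds, min_score=2):
    """Extend a seed mapping (known name -> anonymous id) via shared friends.

    aux  : graph with real names (say, my professional network)
    anon : 'anonymized' graph (say, my personal network)
    Both are dicts mapping a node to the set of its neighbours.
    """
    mapping = dict(seeds)
    used = set(mapping.values())
    changed = True
    while changed:
        changed = False
        for person in sorted(aux):
            if person in mapping:
                continue
            # Score each unclaimed anonymous node by how many of this
            # person's already-identified friends it is linked to.
            scores = defaultdict(int)
            for friend in aux[person]:
                if friend in mapping:
                    for cand in anon[mapping[friend]]:
                        if cand not in used:
                            scores[cand] += 1
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Accept only a clear winner with enough supporting links.
            if best_score >= min_score and best_score > runner_up:
                mapping[person] = best
                used.add(best)
                changed = True
    return mapping


# Hypothetical professional network, real names attached.
aux = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave": {"alice", "carol"},
    "erin": {"bob", "carol"},
}

# Hypothetical personal network, names replaced by opaque ids.
# n6 is a friend who appears only in this network.
anon = {
    "n1": {"n2", "n3", "n4"},
    "n2": {"n1", "n3", "n5"},
    "n3": {"n1", "n2", "n4", "n5"},
    "n4": {"n1", "n3"},
    "n5": {"n2", "n3", "n6"},
    "n6": {"n5"},
}

# Two overlapping 'friends' whose ids leaked are enough to start.
seeds = {"alice": "n1", "bob": "n2"}
result = propagate(aux, anon, seeds)
print(result)
```

In this tiny example two leaked identities are enough to pull in the other three people who appear in both networks, while with no overlap at all (no seeds) nothing can be matched. Real networks are far larger and noisier, so treat this as an illustration of the mechanism and nothing more.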

The research confirms what I've always suspected: if I desire separation between personal and professional online presence, I am restricted in my ability to participate in social networks. The restriction may be for no other reason than that before I can participate, I have to decide if a particular network will be professional or personal. No network can be allowed to be both personal and professional, simply because someone, somewhere will be able to use the hybrid network to link the rest of them together. Once I decide to participate in a network I can't switch the network from personal to professional or vice versa. Even though it is theoretically possible to maintain dual identities (and dual news readers, browsers and cookie managers), it’s tough to do, and I suspect that maintaining that level of separation is doomed to failure.

Fortunately for me, my personal life is conducted largely in real time, face to face. Internet-based social networks add very little value to my personal life, so I don't have any significant online personal network relationships. I figure that the world's largest data vacuum can hoover up all my professional online data without too much concern on my part, simply because as an employee of a public entity, my professional life is open to the world anyway. But that decision prevents me from using really cool things like Goog411 or Google Voice for my personal life.

I would assume that the conclusions are generic across any network of nodes and links. Presumably e-mail address books are similar in that a person looking at the connections between people based on e-mail exchanges or address books could identify individuals using only anonymized data, and I suspect that anonymized RSS subscription data could be used in a similar manner.
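
Here is a small Python sketch of why I suspect that. It assumes a hypothetical list of who-e-mails-whom, replaces every address with a random token, and then checks how many people can still be picked out purely from the shape of their connections (their own number of contacts plus their contacts' numbers of contacts). The names, the `signature` helper and the assumption that the observer already holds a named copy of the same contact graph are mine, not the paper's.

```python
import random
from collections import defaultdict


def build_graph(pairs):
    """Turn (sender, recipient) pairs into an undirected contact graph."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def signature(graph, node):
    """Structural fingerprint: own degree plus sorted neighbour degrees."""
    return (len(graph[node]),
            tuple(sorted(len(graph[n]) for n in graph[node])))


def pseudonymize(pairs, rng):
    """Replace every name with a shuffled opaque token."""
    names = sorted({n for pair in pairs for n in pair})
    tokens = [f"user{i}" for i in range(len(names))]
    rng.shuffle(tokens)
    alias = dict(zip(names, tokens))
    return [(alias[a], alias[b]) for a, b in pairs], alias


# Hypothetical e-mail exchanges.
pairs = [("ann", "ben"), ("ann", "cat"), ("ann", "dan"),
         ("ben", "cat"), ("dan", "eve")]

released, alias = pseudonymize(pairs, random.Random(0))
known = build_graph(pairs)      # what the observer already knows
anon = build_graph(released)    # the 'anonymized' release

# Index the anonymous nodes by their structural fingerprint.
by_sig = defaultdict(list)
for token in anon:
    by_sig[signature(anon, token)].append(token)

# A person is re-identified when exactly one token has their fingerprint.
recovered = {}
for name in known:
    candidates = by_sig[signature(known, name)]
    if len(candidates) == 1:
        recovered[name] = candidates[0]

print(sorted(recovered))
```

In this toy data ann, dan and eve each have a unique fingerprint and are recovered, while ben and cat look identical to each other and stay ambiguous. Swapping names for tokens changes the labels and leaves the structure untouched.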

The following are quotes from the paper "De-anonymizing Social Networks" by Arvind Narayanan and Vitaly Shmatikov, The University of Texas at Austin:
The main lesson of this paper is that anonymity is not sufficient for privacy when dealing with social networks. We developed a generic re-identification algorithm and showed that it can successfully de-anonymize several thousand users in the anonymous graph of a popular microblogging service (Twitter), using a completely different social network (Flickr) as the source of auxiliary information…
…We demonstrated feasibility of successful re-identification based solely on the network topology and assuming that the target graph is completely anonymized. In reality, anonymized graphs are usually released with at least some attributes in their nodes and edges, making de-anonymization even easier…
…We do not believe that there exists a technical solution to the problem of anonymity in social networks. Specifically, we do not believe that any graph transformation can (a) satisfy a robust definition of privacy, (b) withstand de-anonymization attacks described in our paper, and (c) preserve the utility of the graph for common data-mining and advertising purposes. Therefore, we advocate non-technical solutions…
…First, the false dichotomy between personally identifiable and non-personally identifiable information should disappear from privacy policies, laws, etc. Any aspect of an individual's online personality can be used for de-anonymization, and this reality should be recognized by the relevant legislation and corporate privacy policies…
…Second, social-network operators should stop relying on anonymization as the "get out of jail" card insofar as user privacy is concerned. They should inform users when their information is disclosed to third parties, even if this information has been anonymized, and give them an opportunity to opt out…


