Anonymity and the Demise of the Ephemeral

The New York Times email update had the right headline “Upending Anonymity, These Days the Web Unmasks Everyone” but made a common mistake in the blurb: “Pervasive social media services, cheap cellphone cameras, free photo and video Web hosts have made privacy all but a thing of the past.”

It’s going to be important in our policy conversations in coming months and years to get a handle on the difference between privacy and anonymity (and others such as confidentiality) and how we think about rights to, and expectations of, each.

There’s a long continuum of social information generation/acquisition/transmission along which these various phenomena can be located:

  • artifactual “evidence” can suggest that someone did something (an outburst on a bus, a car broken into, a work of art created)
  • meta-evidence provides identity trace information about the person who did something (a fingerprint, a CCTV picture, DNA, an IP address, brush strokes)
  • trace evidence can be tied to an identity (fingerprints on file, for example)
  • data links can suggest other information about a person so identified

Technology is making each of these easier, faster, cheaper and more plentiful.  From the point of view of the question, whodunnit?, we seem to be getting collectively more intelligent: we can zero in on the authorship of action more than ever before.  But that really hasn’t much to do with “privacy,” per se.

As Dave Morgan suggests in OnlineSpin (his hook was Facebook’s facial recognition technology, which allows faces in new photos to be tagged automatically based on previously tagged photos a user has posted), the capacity to connect the dots is a bit like recognizing a famous person on the street, and this, he notes, has nothing to do with privacy.

What it does point to is that an informational characteristic of public space is shifting.  One piece of this is the loss of ephemerality, a sharp increase in the half-life of tangible traces.  Another is, for want of a better term on this very hot morning in Palo Alto, “linkability”; once one piece of information is linked to another, it can easily be linked again.  And this compounds the loss of ephemerality that arises from physical recording alone.
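The compounding effect of linkability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the identifier names, and the `link_records` helper are all hypothetical, and real-world linking is far messier. The sketch treats any two records that share an identifier as linked and shows how links chain together.

```python
from collections import defaultdict


def link_records(records):
    """Group records that are connected through any shared identifier.

    `records` maps a record name to a set of identifiers (all hypothetical).
    Linking is transitive: if A shares an identifier with B, and B with C,
    then A, B and C end up in the same group.
    """
    parent = {name: name for name in records}

    def find(name):
        # Follow parent pointers to the group's representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index records by identifier, then merge records sharing one.
    by_identifier = defaultdict(list)
    for name, identifiers in records.items():
        for identifier in identifiers:
            by_identifier[identifier].append(name)
    for names in by_identifier.values():
        for other in names[1:]:
            union(names[0], other)

    groups = defaultdict(set)
    for name in records:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical traces: none of these refer to real data.
traces = {
    "bus_video": {"face_42"},
    "tagged_photo": {"face_42", "profile_jdoe"},
    "forum_post": {"profile_jdoe", "ip_203.0.113.7"},
    "parking_ticket": {"plate_ABC123"},
}

linked = link_records(traces)
print(linked)
```

In this made-up example the bus video is never directly tied to the forum post, yet the two land in the same group because each links to the tagged photo. The parking ticket, sharing no identifier with anything else, stays on its own.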

From the point of view of the question asked above, the change can mean “no place to hide,” but from the point of view of the answer, it might mean that the path to publicity is well-paved and short.

Some celebrate on both counts as a sort of modernist “the truth will out” or post-modernist Warholesque triumph.  But as pleased as we might be at the capacity of the net to ferret out the real story (the recent unmasking of “Gay Girl in Damascus” being yet another example), the same structure can have the opposite effect.  The web also has immense capacity for the proliferation and petrification of falsehood (see, for example, Fine and Ellis 2010 or Sunstein 2009).

Thus, it may well be that the jury is still out on the net effect on the information order.

See also: “No Such Thing as Evanescent Data”

Data Exhaust and Informational Efficiency

Heard an interesting talk by Paul Kedrosky a few weeks ago at PARC titled Data Exhaust, Ladders, and Search.

The gist of the talk is that human behaviors of all kinds leave traces which constitute latent datasets about that activity. Social scientists have long had a name for gathering this type of data: unobtrusive observation. Perhaps the most famous example is looking at carpet wear in a museum as a way of figuring out which exhibits captured the most visitor attention; another is garbology and the related “trace measures” used by anthropologist W. Rathje in the 70s and 80s.

One of Kedrosky’s nicer examples was comparing an aerial view of Wimbledon’s center court at the end of a recent tournament with one from the 1970s. The total disappearance of the net game from professional tennis was clearly visible in the wear patterns on the grass court.

In addition to offering a number of neat examples (ladders found on highways as an indicator of the housing bubble was a favorite) of using various techniques to capture “data exhaust” (indeed, he suggests, it’s the entire principle behind Google), he asks the question: What are the consequences of an instrumented planet? That is, a planet on which more and more data exhaust is captured and analyzed, permitting better decisions and more efficient choices.

In fact, one of the comments on Kedrosky’s blog post about the talk (by one J Slack) suggests a continuing move toward “informational efficiency” — with more and more instrumentation generating data and more and more connectivity, he suggests, “we’ll be continuously approaching an asymptotic efficiency, though never quite getting there.”

A standard definition of informational efficiency is “the speed and accuracy with which prices reflect new information” (TheFreeDictionary.com).  But there is some circularity here — in this context it’s only information if it does affect the price, otherwise it’s mere noise.  And so we’re still left with the challenge of sorting out the signal from the noise even after the data has been extracted from the exhaust.  And the more of everything the more of a job it is.

Bottom line: I think “data exhaust” is a great concept, but I don’t think perfecting its capture and analysis gets you to a fully efficient use of information about the world (even asymptotically).  The second law of thermodynamics kicks in along the way for starters, but the boundedness of human cognition finishes the job.

Somebody is probably going to point out that evolution already does this (that is, it’s the most unobtrusive data collection method of all).  But it takes big numbers and lots of time to do it and the result, though beautiful, is messy.

More to think about here, to be sure.

See Also (2014)

Johnson, Steven. “What a Hundred Million Calls to 311 Reveal About New York.” Wired Magazine 11.01.10

Information Rot

A whole chapter in my book on the sociology of information will be about information permanence and impermanence, and so this posting by David Pogue caught my eye: “Should You Worry about Data Rot?” It’s the text of an interview that was part of a video piece he did for CBS a few weeks ago. The basic idea is that we store our data on media that are subject to degradation and that require, for playback, hardware or software with short lifetimes. We are left with the problem of constantly “migrating” our data to new formats.

An important observation: the pace at which data recording formats become obsolete and unreadable is accelerating. The experts cited in the piece suggest we are currently at the ten-year mark: at this point, one needs to migrate to new media every ten years.

The video piece ends with the observation that there’s never been, nor ever will be, a data recording technology that lasts forever. Of course, one’s first thoughts go to clay tablets from ancient Persia, which seem to have held up rather well. True, but of all the clay tablets ever produced, we have, in all likelihood, but a small fraction. But then, given what’s on most of them (e.g., records of grain sale transactions or inventories of food storage), it’s not clear that the information order is impoverished by their absence. Of what fraction of our current information holdings could the same be said? One wonders, but one migrates one’s own data, just in case.