Data Exhaust and Informational Efficiency

Heard an interesting talk by Paul Kedrosky a few weeks ago at PARC titled "Data Exhaust, Ladders, and Search."

The gist of the talk is that human behaviors of all kinds leave traces that constitute latent datasets about that activity. Social scientists have long had a name for gathering this type of data: unobtrusive observation. Perhaps the most famous examples are looking at carpet wear in a museum to figure out which exhibits capture the most visitor attention, and the "garbology" and related trace measures used by the anthropologist W. Rathje in the 1970s and 80s.

One of Kedrosky’s nicer examples was comparing an aerial view of Wimbledon’s Centre Court at the end of a recent tournament with one from the 1970s. The total disappearance of the net game from professional tennis was clearly visible in the wear patterns on the grass court.

In addition to a number of neat examples of using various techniques to capture "data exhaust" (ladders found on highways as an indicator of the housing bubble was a favorite; indeed, he suggests, capturing data exhaust is the entire principle behind Google), he asks the question: What are the consequences of an instrumented planet? That is, a planet on which more and more data exhaust is captured and analyzed, permitting better decisions and more efficient choices.

In fact, one of the comments on Kedrosky’s blog post about the talk (by one J Slack) suggests a continuing move toward “informational efficiency” — with more and more instrumentation generating data and more and more connectivity, he suggests, “we’ll be continuously approaching an asymptotic efficiency, though never quite getting there.”

A standard definition of informational efficiency is "the speed and accuracy with which prices reflect new information" (TheFreeDictionary.com). But there is some circularity here: in this context it only counts as information if it does affect the price; otherwise it’s mere noise. So we’re still left with the challenge of sorting the signal from the noise even after the data has been extracted from the exhaust, and the more of everything there is, the bigger that job becomes.

Bottom line: I think "data exhaust" is a great concept, but I don’t think perfecting its capture and analysis gets you to a fully efficient use of information about the world (even asymptotically). The second law of thermodynamics kicks in along the way, for starters, and the boundedness of human cognition finishes the job.

Somebody is probably going to point out that evolution already does this (that is, it’s the most unobtrusive data-collection method of all). But it takes big numbers and lots of time, and the result, though beautiful, is messy.

More to think about here, to be sure.

See Also (2014)

Johnson, Steven. "What a Hundred Million Calls to 311 Reveal About New York." Wired, November 1, 2010.

Who Knows What Everybody Knows?

A few months ago I was hanging out with a group of folks from National Public Radio’s "Next Generation Radio" Project. Participants were sitting around a room editing stories on their laptops. At some point one of them, who didn’t strike me as a techie, showed one of the real tech whizzes that you can change the case of text in MS Word with a single command. The whiz thought this was the coolest thing he had seen in a while.

I sat there trying to figure out how this guy, who seemed to know way more about computers than I did, could possibly not already know this. It reminded me of a bit from The Devil’s Dictionary: "self-evident — evident to the self and no one else."

Relevance for the sociology of information? Any bit of information we possess potentially has a “meta-informational wrapper” that tells us who else knows it. We experience this wrapper along a continuum from, say, “I’ve got a secret” to “duh, everybody knows THAT.” What’s interesting, though, is how hard it is to achieve anything like 100% accuracy on this meta-informational front.

I was motivated to think about this while reading David Pogue’s blog/column in the NYT the other day ("Tech Tips for the Basic Computer User"). In it he listed a few tips that are useful for those of us using computers, things like "you can select a word by double-clicking it." He doesn’t come out and put it like this, but I think a take-away from the piece is that these are things that, if we know them, we don’t think of as "tips"; they are just things one knows about the machines one uses. It’s actually another cognitive step to recognize these taken-for-granted bits of know-how as things that lots of other people might not know. Nobody, after all, wants to pass along a tip that’s not really a tip ("duh" hurts!).

Interestingly, the blog turns out to be a great vehicle for eliciting tips from folks without the disincentives of (1) worrying that the person you give the tip to will not think it is a tip, or (2) having to point out that the recipient doesn’t know something in order to be "safe" in giving the tip. If you read the 1,000-plus tips people have sent in, you will no doubt find some of them "obvious" and "well known," but others will provide an "aha" moment.

He suggests at one point that, of all the stuff you know that "everybody probably knows," only about 40% of everybody actually knows it. I don’t know about the numbers, but the point is a good one. Most of us are probably not very good estimators of how much of what we know is idiosyncratic, local, general, or universal knowledge.