Data Exhaust and Informational Efficiency

Heard an interesting talk by Paul Kedrosky a few weeks ago at PARC titled “Data Exhaust, Ladders, and Search.”

The gist of the talk is that human behaviors of all kinds leave traces which constitute latent datasets about that activity. Social scientists have long had a name for gathering this type of data: unobtrusive observation. Perhaps the most famous examples are looking at carpet wear in a museum to figure out which exhibits captured the most visitor attention, and the garbology and related “trace measures” used by the anthropologist W. Rathje in the 1970s and 80s.

One of Kedrosky’s nicer examples was comparing an aerial view of Wimbledon’s center court at the end of a recent tournament with one from the 1970s. The total disappearance of the net game from professional tennis was clearly visible in the wear patterns on the grass court.

In addition to a number of neat examples (ladders found on highways as an indicator of the housing bubble was a favorite) of using various techniques to capture “data exhaust” (indeed, he suggests, it’s the entire principle behind Google), he asks the question: What are the consequences of an instrumented planet? That is, a planet on which more and more data exhaust is captured and analyzed, permitting better decisions and more efficient choices.

In fact, one of the comments on Kedrosky’s blog post about the talk (by one J Slack) suggests a continuing move toward “informational efficiency” — with more and more instrumentation generating data and more and more connectivity, he suggests, “we’ll be continuously approaching an asymptotic efficiency, though never quite getting there.”

A standard definition of informational efficiency is “the speed and accuracy with which prices reflect new information” (TheFreeDictionary.com).  But there is some circularity here — in this context it’s only information if it does affect the price; otherwise it’s mere noise.  So we’re still left with the challenge of sorting the signal from the noise even after the data has been extracted from the exhaust.  And the more of everything there is, the bigger that job becomes.

Bottom line: I think “data exhaust” is a great concept, but I don’t think perfecting its capture and analysis gets you to a fully efficient use of information about the world (even asymptotically).  The second law of thermodynamics kicks in along the way for starters, but the boundedness of human cognition finishes the job.

Somebody is probably going to point out that evolution already does this (that is, it’s the most unobtrusive data collection method of all).  But it takes big numbers and lots of time to do it and the result, though beautiful, is messy.

More to think about here, to be sure.

See Also (2014)

Johnson, Steven. “What a Hundred Million Calls to 311 Reveal About New York.” Wired, 11.01.10.

Those damn unconnected dots again (rough draft)

An article in the Times, under the headline “Obama Says Plot Could Have Been Disrupted,” reprises the metaphor of “connecting the dots” to describe how different pieces of information sat in different heads but were never put together in one head that could make sense of them.

It is reassuring that Obama is speaking bluntly about organizational performance rather than riding roughshod over the Constitution, but, as argued in an earlier piece (“Mind the Gap”), the idea that it’s a simple problem of dot connecting is a basic misconception.

How do you hear “connect the dots”?  One version is reminiscent of a detective show or Agatha Christie novel; the challenge is to assemble hints — pieces of information that, alone, are not conclusive proof of anything — in such a way that the “answer” emerges as a sort of logical necessity.  The “logic” is in the mind of the beholder, but that’s all.

A different version is reminiscent of the way we draw lines between stars and come up with “constellations.”  Two things are important.  One, the stars are not really next to one another — the viewer is the one who sees them as points on a plane and interpolates and extrapolates the other vertices of the figure.  Two, there’s no there there — the crab in Cancer or the warrior in Orion has to be brought to the observation by us.

[Figure: three dot-field panels (left, middle, right) illustrating the choices described below.]
The first requires us to have all the pieces on the table and be open to what they “tell us” when seen together.  The challenge for intelligence agencies is to put the information from various sources onto the same table.

The second requires us to decide what to pay attention to and what to ignore (left), how to connect and not connect (middle), and what to add that’s not there (right).

If we increase the degree of information sharing, we fill up our field of view with more and more points, and the dots get harder and harder to connect.
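To put a rough number on that intuition, here is a minimal sketch in Python (the dot counts are made up purely for illustration) of how fast the space of possible connections grows as more points are shared:

```python
from math import comb

# Rough illustration: the number of possible pairwise links among n "dots"
# grows quadratically, and the number of candidate subsets of dots (possible
# patterns an analyst might have to consider) grows exponentially.
for n_dots in (10, 100, 1_000, 10_000):
    pairwise_links = comb(n_dots, 2)        # n * (n - 1) / 2
    subset_digits = len(str(2 ** n_dots))   # digits in 2**n, i.e. roughly its log10
    print(f"{n_dots:>6} dots -> {pairwise_links:>12,} possible links, "
          f"~10^{subset_digits - 1} candidate subsets")
```

The particular numbers don’t matter; the shape of the growth does.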

On the other hand, if we ask the different agencies to filter the information, then we are back in hot water, because none of them knows what it is looking for.

The president was furious about the failure of the system to see “the red flags,” and intelligence agencies are reported to have said that the information they had was “vague but available.”  The problem is that flags are not, in general, a priori red.  Presumably, some smart people are thinking about how systems see and things like that; hopefully, they don’t just think of it as “connect the dots.”

We observe with some irony that the actual policy response to the problem — at least the response that’s been announced — is in fact to gather more information via increased screening.


Oh, and if you look up “connect the dots” in Wikipedia you get a short article about a children’s game. It bears a Wiki-warning: “This article may require cleanup to meet Wikipedia’s quality standards.”