No Such Thing as Evanescent Data

Pretty good coverage of the “iPhone keeps track of where you’ve been” story in today’s NYT (“Inquiries Grow Over Apple’s Data Collection Practices”) and in David Pogue’s column yesterday (“Your iPhone Is Tracking You. So What?”). Not surprisingly, devices that have GPS capability (or even just cell tower triangulation capability) write the information down. And given how cheap and plentiful memory is, it is not surprising that they do so in ink.

This raises a generic issue: evanescent data (information that is detected, perhaps “acted” upon, and then discarded) will become increasingly rare.  We should not be surprised that our machines rarely allow information to evaporate, and it is important to note that this is not the same as saying that any particular big brother (or sister) is watching.  Like their human counterparts, a machine that can “pay attention” is likely to remember — if my iPhone always knows where it is, why wouldn’t it remember where it’s been?

It’s the opposite of provenience that matters — not where the information came from but where it might go.  Behavior always leaves traces — what varies is the degree to which a trace can be tied to its “author” and how easy or difficult it is to collect the traces and observe or extract the patterns they may contain.  These reports suggest that the data has always been there, but was relatively difficult to access.  Ironically, it is only recently, thanks to the work of the computer scientists who “outed” Apple, that there has been an easy way to get at the information.
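The caches described in these stories were reported to be ordinary database files sitting on the device and its backups, which is what made them so easy to read once someone pointed to them. A minimal sketch of what “getting at the information” amounts to, using an in-memory stand-in for the cache (the table and column names here are my assumptions for illustration, not the actual on-device schema):

```python
import sqlite3

def read_location_cache(conn):
    """Return (timestamp, latitude, longitude) rows in time order."""
    return list(conn.execute(
        "SELECT timestamp, latitude, longitude FROM locations "
        "ORDER BY timestamp"))

# Build a stand-in cache in memory; a real cache would be opened
# from a file pulled out of a device backup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (timestamp REAL, latitude REAL, longitude REAL)")
conn.executemany("INSERT INTO locations VALUES (?, ?, ?)", [
    (1303434000.0, 37.4419, -122.1430),
    (1303430400.0, 37.3318, -122.0312),
])

for row in read_location_cache(conn):
    print(row)
```

The point is less the code than its brevity: once the trace exists as a structured file, extracting the pattern is a three-line query.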

Setting aside the issue of nefarious intentions, we are reminded of the time-space work of human geographers such as Nigel Thrift and Tommy Carlstein, who did small-scale studies of the space-time movements of people in local communities in the 1980s and since. And, too, we are reminded of the 2008 controversy stirred up when some scientists studying social networks used anonymized cell phone data on 100,000 users in an unnamed country.

Of course, the tracking of one’s device is not the same as the tracking of oneself.  We can imagine iPhones that travel the world like the garden gnome in Amélie, and people being proud not just of their own travels but of where their phone has been.

See also

  1. Technologically Induced Social Alzheimers
  2. Information Rot

Data Exhaust and Informational Efficiency

Heard an interesting talk by Paul Kedrosky a few weeks ago at PARC titled Data Exhaust, Ladders, and Search.

The gist of the talk is that human behaviors of all kinds leave traces which constitute latent datasets about that activity. Social scientists have long had a name for gathering this type of data: unobtrusive observation. Perhaps the most famous example is looking at carpet wear in a museum as a way of figuring out which exhibits captured the most visitor attention, or the garbology and related “trace measures” used by the anthropologist William Rathje in the 1970s and 80s.

One of Kedrosky’s nicer examples compared an aerial view of Wimbledon’s Centre Court at the end of a recent tournament with one from the 1970s. The total disappearance of the net game from professional tennis was clearly visible in the wear patterns on the grass court.

In addition to a number of neat examples of using various techniques to capture “data exhaust” (ladders found on highways as an indicator of the housing bubble was a favorite; indeed, he suggests, it’s the entire principle behind Google), he asks the question: what are the consequences of an instrumented planet? That is, a planet on which more and more data exhaust is captured and analyzed, permitting better decisions and more efficient choices.

In fact, one of the comments on Kedrosky’s blog post about the talk (by one J Slack) suggests a continuing move toward “informational efficiency” — with more and more instrumentation generating data and more and more connectivity, he suggests, “we’ll be continuously approaching an asymptotic efficiency, though never quite getting there.”

A standard definition of informational efficiency is “the speed and accuracy with which prices reflect new information.”  But there is some circularity here — in this context it’s only information if it does affect the price; otherwise it’s mere noise.  And so we’re still left with the challenge of sorting the signal from the noise even after the data has been extracted from the exhaust.  And the more of everything there is, the bigger a job that becomes.

Bottom line: I think “data exhaust” is a great concept, but I don’t think perfecting its capture and analysis gets you to a fully efficient use of information about the world (even asymptotically).  The second law of thermodynamics kicks in along the way for starters, but the boundedness of human cognition finishes the job.

Somebody is probably going to point out that evolution already does this (that is, it’s the most unobtrusive data collection method of all).  But it takes big numbers and lots of time to do it and the result, though beautiful, is messy.

More to think about here, to be sure.

See Also (2014)

Johnson, Steven. “What a Hundred Million Calls to 311 Reveal About New York.” Wired Magazine 11.01.10

Outflanking "the Human" with Information

Two stories in NYT today about data crunching. One on mapping neuron connections in mice to understand how brains work. The other on using statistics to detect possible cheating on standardized tests.

The brain research takes thin slices of brain tissue and maps connections between neurons in a really BIG (petabyte per mm³) data mining operation. The research is in its infancy, but eventually one can imagine having a full circuit diagram of a brain. Interesting implications, possibly not grasped by either the researchers or the article’s author:

Neuroscientists say that a connectome could give them myriad insights about the brain’s function and prove particularly useful in the exploration of mental illness. For the first time, researchers and doctors might be able to determine how someone was wired — quite literally — and compare that picture with “regular” brains.

Experts quoted in the article debate whether the research is promising enough to spend millions on.  But this comment about defining normal or regular brains is not one of the concerns they mention.  What are the informational implications of having a data set that describes the connections of a “normal” person? 

The second article, “Cheaters Find an Adversary in Technology,” reads as a shameful bit of commercial promotion masquerading as journalism, but it does usefully illuminate the worldview of the standardized test industry.  The story is about a company that uses statistics to detect cheaters.  Its algorithms are designed to detect things like similar patterns of wrong answers, changed answers, and big improvements in test scores.  But if a group of students all misunderstood something in the same way, it would look like cheating.  A test taker who “saw the light” at one point and went back and changed several answers will look like a cheater.  And the thing we do most in school, attempting to teach people stuff, would, if successful, lead to big improvements in test scores.  That too, according to the experts, would look like cheating.

There is an arrogance about testers (the gentleman profiled calls himself, unselfconsciously notes the journalist, “an icon,” implying that those who have never heard of him are poorly informed) that consistently rankles.  And their self-promotion as agents of fairness and meritocracy (recall The Big Test) is simple hypocrisy.  More problematic, though, is the influence on teaching, learning, and scholarship of a regime that bases its authority and legitimacy on science and objectivity, but that shrouds itself in secrecy and lives OFF rather than FOR education.

Why these two articles together?  They suggest a sort of pincer maneuver against “the human” based in information: on one flank, structure, reduce the normal brain to a (particular) giant matrix of ones and zeros; on the other, behavior, treat statistically unusual patterns of activity as morally suspect.  “Super Crunching” may be a way of the future, but one might lament the likelihood that it is THE way of the future, crowding out or delegitimizing other forms of inquiry into the human condition.  Together, these two articles suggest the imperative of an affirmative complement to our fascination with what we CAN do with information.

Source Mentions and Allusions

  1. Ayres, Ian. 2008. Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart.
  2. Foucault, Michel. 1995 (1975). Discipline and Punish: The Birth of the Prison.
  3. Gabriel, Trip. 2010. “Cheaters Find an Adversary in Technology.” New York Times, December 27, 2010.
  4. Swedberg, Richard. 2000. Max Weber and the Idea of Economic Sociology.
  5. Vance, Ashlee. 2010. “In Pursuit of a Mind Map, Slice by Slice.” New York Times, December 27, 2010.