Is Your Information Your Business?

The Business section is fast becoming the sociology of information section.

In “Twitter Is All in Good Fun, Until It Isn’t,” David Carr writes about Roland Martin being sanctioned by CNN because of controversial Twitter posts. On the Bits page, Nick Bilton’s article “So Many Apologies, So Much Data Mining” tells of David Morin, head of the company behind the social networking app Path (“The smart journal that helps you share life with the ones you love.”), which got into hot water last week when a programmer in Singapore noticed that it had hijacked users’ address books without asking. On page B3 we find a 14-inch article by T. Vega on new Pew research into how news media websites fail to make optimal use of online advertising.

More on those in future posts.  Right next to the Pew article, J. Brustein’s “Start-Ups Seek to Help Users Put a Price on Their Personal Data” profiles the startup “Personal” — one of several that are trying to figure out how to let internet users capitalize on their personal data by locking it up in a virtual vault and selling access bit by bit.

This last one is of particular interest to me. Back in the early 90s I floated an idea that alarmed my social science colleagues: why not let study participants own their data? The idea was inspired by complaints that well-meaning researchers at Yale, where I was a graduate student at the time, routinely made their careers on the personal information they, or someone else, had collected from poor people in New Haven. The original source of that complaint was a community activist who had a more colorful way of describing the relationship between researcher and research subject.

The idea would be to tag data garnered in surveys and other forms of observation with an ID that could be matched against an escrow database (something that didn’t really exist then but is now a routine part of “Software as a Service” (SaaS)). When a researcher wanted to make use of the data, she or he would include in the grant proposal some sort of data fee that would be delivered to the intermediary and then distributed as data royalties to the individuals the data concerned. The original researcher would still offer whatever enticements to participation she or he ordinarily would (a bit like an advance on a book). The unique identifier held by the intermediary would allow data linking, producing both a valuable tool for research and an opportunity for research subjects to continue to collect royalties as their data was “covered” by new research projects, just as a songwriter does.
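
To make the mechanics concrete, here is a minimal sketch, in Python, of how such an escrow intermediary might do its bookkeeping. It is purely illustrative: the names (EscrowRegistry, register_participant, record_use) are invented for this example rather than drawn from any existing system, and a real version would need consent management, auditing, and payment infrastructure on top of it.

    # A minimal, hypothetical sketch of the escrow-and-royalty idea above.
    import uuid
    from collections import defaultdict

    class EscrowRegistry:
        """Maps study data to the people it describes and distributes
        per-use data fees back to them as royalties."""

        def __init__(self):
            self.participants = {}                # escrow_id -> contact info
            self.royalties = defaultdict(float)   # escrow_id -> accumulated payments

        def register_participant(self, contact_info):
            """Issue the ID that researchers attach to the data in place of
            directly identifying information."""
            escrow_id = str(uuid.uuid4())
            self.participants[escrow_id] = contact_info
            return escrow_id

        def record_use(self, escrow_ids, data_fee):
            """A new project 'covers' existing data: split its data fee
            among the participants whose records it draws on."""
            share = data_fee / len(escrow_ids)
            for escrow_id in escrow_ids:
                self.royalties[escrow_id] += share

    # Example: a follow-up study reuses three participants' records.
    registry = EscrowRegistry()
    ids = [registry.register_participant(f"participant-{i}") for i in range(3)]
    registry.record_use(ids, data_fee=300.00)
    print(registry.royalties[ids[0]])   # 100.0

The point of the sketch is only that the bookkeeping itself is straightforward; the hard parts, then as now, are institutional rather than technical.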

The most immediate objections were technical: real, but solvable. Then came the reasoned objections. This would make research more expensive! Perhaps, but another way to see it is as a matter of more fully accounting for social costs and value, and of recognizing that researchers were capturing latent value through their act of aggregation (similar to issues raised about Facebook recently). Another objection was that the purpose of the research was already to help these people. True enough. But why should they bear all the risk of that maybe working out, maybe not?

And so the conversation continued. I’m not sure I like the idea of converting personal information into monetary value; I think it sidesteps some important human/social/cultural considerations about privacy, intimacy, and the ways that information behavior is integral to our sense of self and our sense of relationships. But I do think it is critically important that we think carefully about the information order, about how the value of information is created by surveillance and aggregation, and about how we want to think about what happens to the information we give, give up, and give off.

Related

Sociology of Information in the New York Times

Published: September 3, 2011
Why all the sharp swings in the stock market? To Robert J. Shiller, it’s a case of investors trying to guess what other investors are thinking….

Seeking not what is the case, but what others probably think is, or even what others think that others think is…

Published: September 2, 2011
When Rick Perry, the governor of Texas and a presidential hopeful, debates his rivals, his assertions on climate change, Social Security and health care could put him to the test….

Once it’s out there, it’s out there…

Published: August 29, 2011
The antisecrecy organization WikiLeaks published nearly 134,000 diplomatic cables, including many that name confidential sources….

Developing story — a leak, a revelation, or just a mistake?  (See also previous posts on Wikileaks.)

Anonymity and the Demise of the Ephemeral

The New York Times email update had the right headline “Upending Anonymity, These Days the Web Unmasks Everyone” but made a common mistake in the blurb: “Pervasive social media services, cheap cellphone cameras, free photo and video Web hosts have made privacy all but a thing of the past.”

It’s going to be important in our policy conversations in coming months and years to get a handle on the difference between privacy and anonymity (and others such as confidentiality) and how we think about rights to, and expectations of, each.

There’s a long continuum of social information generation/acquisition/transmission along which these various phenomena can be located:

  • artifactual “evidence” can suggest that someone did something (an outburst on a bus, a car broken into, a work of art created)
  • meta-evidence provides identity trace information about the person who did something (a fingerprint, a CCTV picture, DNA, an IP address, brush strokes)
  • trace evidence can be tied to an identity (fingerprints on file, for example)
  • data links can suggest other information about a person so identified

Technology is making each of these easier, faster, cheaper, and more plentiful. From the point of view of the question “whodunnit?” we seem to be getting collectively more intelligent: we can zero in on the authorship of action more than ever before. But that really hasn’t much to do with “privacy” per se.

As Dave Morgan suggests in OnlineSpin (his hook was Facebook’s facial recognition technology, which allows faces in new photos to be automatically tagged based on previously tagged photos a user has posted), the capacity to connect the dots is a bit like recognizing a famous person on the street, and this, he notes, has nothing to do with privacy.

What it does point to is that an informational characteristic of public space is shifting.  One piece of this is the loss of ephemerality, a sharp increase in the half-life of tangible traces.  Another is, for want of a better term on this very hot morning in Palo Alto, “linkability”; once one piece of information is linked to another, it can easily be linked again.  And this compounds the loss of ephemerality that arises from physical recording alone.
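
As a toy illustration of that compounding (with identifiers invented for the example), each individual act of linking is cheap and local, but because links compose, everything already attached to either endpoint comes along with it:

    # A toy sketch of "linkability": individually unremarkable links
    # compose into long chains. All identifiers here are invented.
    from collections import defaultdict, deque

    links = defaultdict(set)

    def link(a, b):
        """Record one local act of linking two identifiers."""
        links[a].add(b)
        links[b].add(a)

    def reachable(start):
        """Everything that can now be connected to `start`, however indirectly."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in links[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        return seen - {start}

    link("photo_1234", "ip_10.0.0.7")      # a photo is tied to an IP address
    link("ip_10.0.0.7", "account_jdoe")    # the IP is tied to an account
    link("account_jdoe", "home_address")   # the account is tied to an address

    # The photo is now connected, transitively, to a street address.
    print(reachable("photo_1234"))
    # e.g. {'ip_10.0.0.7', 'account_jdoe', 'home_address'}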

From the point of view of the question asked above, the change can mean “no place to hide,” but from the point of view of the answer, it might mean that the path to publicity is well-paved and short.

Some celebrate on both counts, as a sort of modernist “the truth will out” or post-modernist Warholesque triumph. But as pleased as we might be at the capacity of the net to ferret out the real story (the recent unmasking of “Gay Girl in Damascus” being yet another example), the same structure can have the opposite effect. The web also has an immense capacity for the proliferation and petrification of falsehood (see, for example, Fine and Ellis 2010 or Sunstein 2009).

Thus, it may well be that the jury is still out on the net effect on the information order.

See also: “No Such Thing as Evanescent Data”

No Such Thing as Evanescent Data

Pretty good coverage of the “iPhone keeps track of where you’ve been” story in today’s NYT (“Inquiries Grow Over Apple’s Data Collection Practices”) and in David Pogue’s column yesterday (“Your iPhone Is Tracking You. So What?”). Not surprisingly, devices that have GPS capability (or even just cell tower triangulation capability) write the information down, and given how cheap and plentiful memory is, it is no surprise that they do so in ink.

This raises a generic issue: evanescent data (information that is detected, perhaps “acted” upon, and then discarded) will become increasingly rare. We should not be surprised that our machines rarely allow information to evaporate, and it is important to note that this is not the same as saying that any particular big brother (or sister) is watching. Like their human counterparts, machines that can “pay attention” are likely to remember: if my iPhone always knows where it is, why wouldn’t it remember where it’s been?

It’s the opposite of provenience that matters: not where the information came from, but where it might go. Behavior always leaves traces; what varies is the degree to which a trace can be tied to its “author” and how easy or difficult it is to collect the traces and observe or extract the patterns they may contain. These reports suggest that the data has always been there but was relatively difficult to access. Ironically, it is only recently, thanks to the work of the computer scientists who “outed” Apple, that there has been an easy way to get at the information.

Setting aside the issue of nefarious intentions, we are reminded of the time-space work of human geographers such as Nigel Thrift and Tommy Carlstein, who did small-scale studies of the space-time movements of people in local communities in the 1980s and since. And, too, we are reminded of the 2008 controversy stirred up when some scientists studying social networks used anonymized cell phone data on 100,000 users in an unnamed country.

Of course, the tracking of one’s device is not the same as the tracking of oneself. We can imagine iPhones that travel the world like the garden gnome in Amélie, and people being proud not just of their own travels but of where their phone has been.

See also

  1. Technologically Induced Social Alzheimers
  2. Information Rot

Sociology of Information in the News

A flurry of sociology of information items in today’s New York Times:

  1. “Book Lovers Fear Dim Future for Notes in the Margins”: In a digital world, scholars see an uncertain fate for an old and valued practice.
  2. “Blogs Wane as the Young Drift to Sites Like Twitter”: Long-form blogs were once the outlet of choice, but now sites like Facebook, Twitter and Tumblr are favored.
  3. “TV Industry Taps Social Media to Keep Viewers’ Attention”: As more and more people chat on Facebook and Twitter while watching TV, networks are trying to figure out how to capitalize.
  4. “100 Years Later, the Roll of the Dead in a Factory Fire Is Complete”: For the first time, the names of all the victims in the 1911 Triangle Waist Company fire will be read after a researcher’s identification of six unknown victims.

Information and the Humanities

New York Times reporter Patricia Cohen has been on the “ideas and intellectual life” beat for some time and has recently produced some excellent pieces of potential interest to the sociologist of information:

  1. “A Digital Future for the Founding Fathers.” January 30, 2011. The University of Virginia Press is in the process of putting the published papers of Washington, Jefferson, John Adams, James Madison, Alexander Hamilton and Benjamin Franklin on a free Web site.
  2. “Scholars Recruit Public for Project.” December 27, 2010. A project in London is using crowd-sourcing to transcribe 40,000 unpublished manuscripts of the Enlightenment philosopher Jeremy Bentham.
  3. “In 500 Billion Words, New Window on Culture.” December 16, 2010. A Google-backed project allows the frequency of specific words and phrases to be tracked in centuries of books.
  4. “Digital Keys for Unlocking the Humanities’ Riches.” November 16, 2010. Digitally savvy scholars are exploring how technology can enhance understanding of the liberal arts.
  5. “Analyzing Literature by Words and Numbers.” December 3, 2010. A computer-generated process gives scholars a view into Victorian thought.

Outflanking "the Human" with Information

Two stories in the NYT today about data crunching: one on mapping neuron connections in mice to understand how brains work, the other on using statistics to detect possible cheating on standardized tests.

The brain research takes thin slices of brain tissue and maps connections between neurons in a really BIG (a petabyte per mm³) data-mining operation. The research is in its infancy, but eventually one can imagine having a full circuit diagram of a brain. Interesting implications, which may or may not have been grasped by the researchers or the article’s author:

Neuroscientists say that a connectome could give them myriad insights about the brain’s function and prove particularly useful in the exploration of mental illness. For the first time, researchers and doctors might be able to determine how someone was wired — quite literally — and compare that picture with “regular” brains.

Experts quoted in the article debate whether the research is promising enough to spend millions on.  But this comment about defining normal or regular brains is not one of the concerns they mention.  What are the informational implications of having a data set that describes the connections of a “normal” person? 

The second article, “Cheaters Find an Adversary in Technology,” reads as a shameful bit of commercial promotion masquerading as journalism, but it does usefully illuminate the worldview of the standardized test industry. The story is about a company that uses statistics to detect cheaters. Its algorithms are designed to flag things like similar patterns of wrong answers, changed answers, and big improvements in test scores. If a group of students all misunderstood something in the same way, it would look like cheating. A test taker who “saw the light” at one point and went back and changed several answers would look like a cheater. And the thing we most try to do in school, teach people things, would, if successful, lead to big improvements in test scores. But that too, according to the experts, would look like cheating.
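
A toy calculation (not the company’s actual algorithm, and with answer strings invented for the example) shows why the first of those signals cuts both ways: students taught the same misconception produce exactly the same “suspicious” overlap in wrong answers as students who copied from one another.

    # A toy illustration of the wrong-answer-similarity heuristic.
    def shared_wrong_answers(answers_a, answers_b, answer_key):
        """Count questions where two test takers are wrong in the same way."""
        return sum(
            1
            for a, b, key in zip(answers_a, answers_b, answer_key)
            if a == b and a != key
        )

    answer_key = ["A", "C", "B", "D", "A"]
    student_1  = ["A", "B", "B", "D", "C"]   # taught a common misconception...
    student_2  = ["A", "B", "B", "D", "C"]   # ...by the same teacher; no copying

    # Two identical wrong answers out of five; a threshold-based detector
    # cannot distinguish shared instruction from shared glances.
    print(shared_wrong_answers(student_1, student_2, answer_key))   # 2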

There is an arrogance about testers that consistently rankles: the gentleman profiled calls himself, unselfconsciously notes the journalist, “an icon” (those who have never heard of him are poorly informed). And their self-promotion as agents of fairness and meritocracy (recall The Big Test) is simple hypocrisy. More problematic, though, is the influence on teaching, learning, and scholarship of a regime that bases its authority and legitimacy on science and objectivity but shrouds itself in secrecy and lives OFF rather than FOR education.

Why these two articles together? They suggest a sort of pincer maneuver against “the human,” based in information: on one flank, structure, reduce the normal brain to a (particular) giant matrix of ones and zeros; on the other, behavior, treat statistically unusual patterns of activity as morally suspect. “Super Crunching” may be a way of the future, but one might lament the likelihood that it is THE way of the future, crowding out or delegitimizing other forms of inquiry into the human condition. Together, these two articles suggest the imperative of an affirmative complement to our fascination with what we CAN do with information.

Source Mentions and Allusions

  1. Ayres, Ian. 2008. Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart.
  2. Foucault, Michel. 1995 (1975). Discipline and Punish: The Birth of the Prison.
  3. Gabriel, Trip. 2010. “Cheaters Find an Adversary in Technology.” New York Times, December 27, 2010.
  4. Swedberg, Richard. 2000. Max Weber and the Idea of Economic Sociology.
  5. Vance, Ashlee. 2010. “In Pursuit of a Mind Map, Slice by Slice.” New York Times, December 27, 2010.