Is Your Information Your Business?

The Business section is fast becoming the sociology of information section.

In “Twitter Is All in Good Fun, Until It Isn’t,” David Carr writes about Roland Martin being sanctioned by CNN because of controversial Twitter posts. On the Bits page, Nick Bilton’s article “So Many Apologies, So Much Data Mining” tells of David Morin, head of the company that produces the social network app Path (“The smart journal that helps you share life with the ones you love.”), which got into hot water last week when a programmer in Singapore noticed that it hijacked users’ address books without asking. On page B3 we find a 14-inch article by T. Vega about new research from Pew on how news media websites fail to make optimal use of online advertising.

More on those in future posts.  Right next to the Pew article, J. Brustein’s “Start-Ups Seek to Help Users Put a Price on Their Personal Data” profiles the startup “Personal” — one of several that are trying to figure out how to let internet users capitalize on their personal data by locking it up in a virtual vault and selling access bit by bit.

This last one is of particular interest to me. Back in the early 90s I floated an idea that alarmed my social science colleagues: why not let study participants own their data? The idea was inspired by complaints that well-meaning researchers at Yale, where I was a graduate student at the time, routinely made their careers on the personal information they, or someone else, had collected from poor people in New Haven. The original source of that complaint was a community activist who had a more colorful way of describing the relationship between researcher and research subject.

The idea would be to tag data garnered in surveys and other forms of observation with an ID that could be matched with an escrow database (this didn’t really exist then, but is now a part of “Software as a Service (SaaS)”). When a researcher wanted to make use of data, she or he would include in the grant proposal some sort of data fee that would be delivered to the intermediary and then distributed as data royalties to the individuals the data concerned. The original researcher would still offer whatever enticements to participation (a bit like an advance for a book). The unique identifier held by the intermediary would allow data linking, producing a valuable tool for research and an opportunity for research subjects to continue to collect royalties as their data was “covered” by new research projects, just as a songwriter does.
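To make the mechanics concrete, here is a minimal sketch of the escrow idea in Python. Everything in it is an assumption for illustration: the `DataEscrow` class, the record IDs, and especially the flat per-record fee, which is just one of many ways a data fee could be divided. It is not a description of any existing service.

```python
from collections import defaultdict


class DataEscrow:
    """Hypothetical intermediary holding the ID-to-participant link."""

    def __init__(self):
        self.participant_of = {}            # data ID -> participant
        self.royalties = defaultdict(int)   # participant -> cents owed

    def register(self, data_id, participant):
        """Record which participant a tagged data record concerns."""
        self.participant_of[data_id] = participant

    def pay_data_fee(self, data_ids, cents_per_record):
        """Distribute a project's data fee to the people its records concern."""
        for data_id in data_ids:
            self.royalties[self.participant_of[data_id]] += cents_per_record


escrow = DataEscrow()
escrow.register("rec-001", "participant A")
escrow.register("rec-002", "participant A")
escrow.register("rec-003", "participant B")

# The original study uses all three records; a later project "covers" two of them.
escrow.pay_data_fee(["rec-001", "rec-002", "rec-003"], cents_per_record=50)
escrow.pay_data_fee(["rec-001", "rec-003"], cents_per_record=50)

balances = dict(escrow.royalties)
```

The researchers only ever see the record IDs; the link back to a person lives with the intermediary, which is what lets royalties keep accruing each time a new project draws on the same records.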

The most immediate objections were technical: real, but solvable. Then came the reasoned objections. This would make research more expensive! Perhaps, but another way to see this is that it would be a matter of more fully accounting for social costs and value, and of recognizing that researchers were taking advantage of latent value in their act of aggregation (similar to issues raised about Facebook recently). Another objection was that the purpose of the research was already to help these people. True enough. But why should they bear all the risk of that maybe working out, maybe not?

And so the conversation continued. I’m not sure I like the idea of converting personal information into monetary value; I think it sidesteps some important human/social/cultural considerations about privacy, intimacy, and the ways that information behavior is integral to our sense of self and sense of relationships. But I do think it is critically important that we think carefully about the information order and how the value of information is created by surveillance and aggregation and how we want to think about what happens to the information we give, give up, and give off.


This is your Background Check on Steroids

An article, “Social Media History Becomes a New Job Hurdle,” by Jennifer Preston in yesterday’s NYT is obvious fodder for the sociology of information. It’s primarily about Social Intelligence, a web startup that puts together dossiers about potential employees for its clients by “scraping” the internet.

Issues that show up here:

  • the federal government (FTC) was looking into whether the company’s practices might violate the Fair Credit Reporting Act (FCRA), but determined it was in compliance
  • “privacy advocates” said to be concerned that it might encourage employers to consider information not relevant to job performance (why not fair employment advocates? Later in the article we do find mention of the Equal Employment Opportunity Commission)
  • what do we make of the statement: “Things that you can’t ask in an interview are the same things you can’t research”?
  • since this is really just an extension of the idea of the “background check” — can we think a little more systematically about that as a general idea prior to getting mired in details of internet presence searches?

Perhaps more alarming than the mere question of information surfacing was the suggestion by the company’s founder, Max Drucker, about how a given bit of scraped information might be interpreted. To wit, he mentioned that the fact that a person had joined a particular Facebook group might “mean you don’t like people who don’t speak English.” According to the reporter, he posed this question rhetorically: “Does that mean…?” This little bit of indirect marketing via fear mongering adds another layer to what we need to look at: what sort of information processing (including interpretation and assessment) is necessary in a world where larger and larger amounts of information are available (cf. the CIA problem of turning acquired information into intelligence via analysis).

Drucker characterized the company’s goal as “to conduct pre-employment screenings that would help companies meet their obligation to conduct fair and consistent hiring practices while protecting the privacy of job candidates.”  This raises another interesting question: if an agent has a mandated responsibility for some level of due diligence and information is, technically, available, will a company necessarily sprout up to collect and provide this information?  Where would feasibility, cost, and the uncertainty of interpretation enter the equation?  Can the employer, for example, err on the side of caution and exclude the individual who joined the Facebook group because that fact MIGHT mean something that the employer could be liable for not having discovered?  Will another company emerge that helps to assess the likelihood of false positives or false negatives?  What about if it is only a matter of what the company wants in terms of its corporate culture?  Can we calculate the cost (perhaps in terms of loss of human capital, recruitment costs, etc.) of such technically assisted vigilance?

Anonymity and the Demise of the Ephemeral

The New York Times email update had the right headline “Upending Anonymity, These Days the Web Unmasks Everyone” but made a common mistake in the blurb: “Pervasive social media services, cheap cellphone cameras, free photo and video Web hosts have made privacy all but a thing of the past.”

It’s going to be important in our policy conversations in coming months and years to get a handle on the difference between privacy and anonymity (and others such as confidentiality) and how we think about rights to, and expectations of, each.

There’s a long continuum of social information generation/acquisition/transmission along which these various phenomena can be located:

  • artifactual “evidence” can suggest that someone did something (an outburst on a bus, a car  broken into, a work of art created)
  • meta-evidence provides identity trace information about the person who did something (a fingerprint, a CCTV picture, DNA, an IP address, brush strokes)
  • trace evidence can be tied to an identity (fingerprints on file, for example)
  • data links can suggest other information about a person so identified

Technology is making each of these easier, faster, cheaper and more plentiful.  From the point of view of the question, whodunnit?, we seem to be getting collectively more intelligent: we can zero in on the authorship of action more than ever before.  But that really hasn’t much to do with “privacy,” per se.

As Dave Morgan suggests in OnlineSpin (his hook was Facebook’s facial recognition technology that allows faces in new photos to be automatically tagged based on previously tagged photos a user has posted) the capacity to connect the dots is a bit like recognizing a famous person on the street, and this, he notes, has nothing to do with privacy.

What it does point to is that an informational characteristic of public space is shifting.  One piece of this is the loss of ephemerality, a sharp increase in the half-life of tangible traces.  Another is, for want of a better term on this very hot morning in Palo Alto, “linkability”; once one piece of information is linked to another, it can easily be linked again.  And this compounds the loss of ephemerality that arises from physical recording alone.

From the point of view of the question asked above, the change can mean “no place to hide,” but from the point of view of the answer, it might mean that the path to publicity is well-paved and short.

Some celebrate on both counts as a sort of modernist “the truth will out” or post-modernist Warholesque triumph. But as pleased as we might be at the capacity of the net to ferret out the real story (the recent unmasking of “Gay Girl in Damascus” being yet another example), the same structure can have the opposite effect. The web also has immense capacity for the proliferation and petrification of falsehood (see, for example, Fine and Ellis 2010 or Sunstein 2009).

Thus, it may well be that the jury is still out on the net effect on the information order.

See also: “No Such Thing as Evanescent Data”

FTC Proposes "Do Not Track" Option for Consumer Privacy

The Federal Trade Commission released a preliminary report, “Protecting Consumer Privacy in an Era of Rapid Change,” for public comment today. Among other things, it did suggest the “do not track” option for web surfers. Here’s the NYT article on the report.

A few weeks ago E. Wyatt and T. Vega wrote of the then forthcoming FTC report on net privacy in “Stage Set for Showdown on Online Privacy” (NYT November 9, 2010):

“Consumer advocates worry that the competing agendas of economic policy makers in the Obama administration, who want uniform international standards, and federal regulators, who are trying to balance consumer protection and commercial rights, will neglect the interests of people most affected by the privacy policies. “I hope they realize that what is good for consumers is ultimately good for business,” said Susan Grant, director of consumer protection at the Consumer Federation of America.”

The report contains what look like some good, balanced, and practical guidelines for how consumers and information collecting entities interact on the web and elsewhere.

I’d like to propose, as a thought experiment, a more radical approach. What if we started from the premise that everyone owns her own information? You own your opinions, your attitudes, and the traces your behavior might create. If this information is valuable to another entity, they are free to bid on it. We don’t need privacy protections; we just need an infrastructure that will allow a market for private information to operate.

A website or a retailer can have an offer, right at the front door: if you want to browse here, I want to know your name and take note of what you look at.  The consumer, in return, can say, you can watch me, for 5 dollars.  Consumers can make money by moving around the net and generating value.  The entities who host websites on which behavior turns into information turns into value would also be entitled to a share.

Now take the idea a step further.  Suppose rather than selling my information I agree to license it.  This time I say, you can watch me for $5 but down the road, if any value accrues to you by virtue of you aggregating my information with that of others, I want a cut.  As my information goes upstream, up the aggregation pyramid, it becomes a component in something valuable: I deserve a share. 
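The arithmetic of such a license is not complicated. Here is a minimal sketch in Python, under loudly stated assumptions: the function `license_payout`, the equal split among licensors, and the 10% royalty pool are all hypothetical choices made for illustration, not a proposal for how the shares ought to be set.

```python
def license_payout(licensors, upfront_cents, revenue_cents, royalty_bps):
    """Total owed to each licensor: upfront fee plus an equal share of a royalty pool.

    royalty_bps is the (assumed) share of downstream revenue set aside for
    licensors, in basis points (100 bps = 1%).
    """
    pool = revenue_cents * royalty_bps // 10_000
    share = pool // len(licensors)  # integer cents; any remainder is left unallocated
    return {person: upfront_cents + share for person in licensors}


payout = license_payout(
    licensors=["ann", "ben", "cy", "dee"],
    upfront_cents=500,       # "you can watch me for $5"
    revenue_cents=100_000,   # value the aggregator later earns from the pooled data
    royalty_bps=1_000,       # assumed: 10% of that value flows back to licensors
)
```

With these made-up numbers each of the four people ends up with $30: the $5 paid at the door plus a $25 cut of the value created by aggregation. The same calculation could be repeated at each level as the information moves up the aggregation pyramid.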

Of course, we’ll be told this is completely impractical.  Retailers and other entities would just build in the cost.  And the transaction costs would be too high.  Maybe.  But we’ve got micro-credits  worked out at the level of single click-throughs.  I don’t think the barriers would be technical.

From Information Superhighway to Information Metrosystem

The new FTC report on consumer privacy has an interesting graphic in an appendix. It purports to be a model of the “Personal Data Ecosystem.” It’s interesting as an attempt to portray a four-mode network: individuals, data collectors, data brokers, and data users. The iconography here seems to be derived from classic designs of subway and underground maps.


The genre mixing in the diagram invites, on the one hand, a critical look at where the FTC is coming from in the report (which, in my limited experience of digesting FTC output, looks relatively well done) and, on the other, points toward a need to better conceptualize the various components and categories.

Under “collectors,” for example, we have public, internet, medical, financial and insurance, telecommunications and mobile, and retail. The next level (brokers) includes affiliates, information brokers, websites, media archives, credit bureaus, healthcare analytics, ad networks and analytics, catalog coops, and list brokers. Finally, on the info users front we have employers, banks, marketers, media, government, lawyers and private investigators, individuals, law enforcement, and product and service delivery.

It’s a provocative diagram that helps to focus our attention on the conceptual complexity of “personal information” in an information economy/society. More on this to follow.