Decisions in the late 1950s to create anonymized data extractions from the decennial census records democratized data access with deep respect for the privacy of individuals. The later creation of archives of anonymized, federally funded social science data sets (often sample survey data) led directly to great advances in the empirical social sciences in the United States, securing the country’s academic leadership in that domain. At this point, most social science undergraduates access these data as part of learning the trade of social science analysis.
Over the same decades a vibrant proprietary survey industry also emerged, primarily built around customer satisfaction measurement as a feedback loop to manufacturers, retailers, and service providers. Similarly, political campaigns and marketing departments have created proprietary survey data for guidance in strategic moves potentially affecting important outcomes. Some companies emerged to sell statistical information for private sector purposes (e.g., Nielsen television ratings; Arbitron radio ratings).
The two data worlds, the democratized access of anonymized microdata vs. closely held proprietary data, have existed as parallel, largely nonintersecting realities over the past 50 years. In my opinion, this separation cost little in the advance of human knowledge, because the proprietary data often had very limited use, focused directly on the actions of a single company.
The 21st century data world consists of data produced by private sector enterprises as auxiliary to their product or service delivery, as well as the traditional survey data common to the 20th century. Since survey data are increasingly costly per record (falling participation rates leading to increased efforts to gain cooperation), the “big” data or organic data will predominate in the future.
The 20th century information model (e.g., monthly monitoring of major economic statuses; annual measurement of health conditions, criminal victimizations, research and development, and educational outcomes) achieved common-good objectives of the US society. It is not hyperbolic to state that such indicators were the foundation of the “informed citizenry” so important to democracies.
Can we discover a new model to serve those common good purposes in the 21st century?
The answer may lie in a model that depends neither on purely altruistic actions serving the common good nor on a direct profit-making motive. This model would need to address two key issues: a) how can it offer financial or information benefits to the private sector data holders and the individuals described by the data, and b) how can it offer full respect for individual privacy?
Can we learn from the new social entrepreneurship movement, which seeks sustainable revenue models for social good purposes? Is there a model that does serve the common good but is self-sustaining because it also serves other interests?
How will such a model address protections of data from one entity from its competitors? How would the confidentiality promises of the data holders be respected?
In short, for the 21st century, can we build a data world that has the legitimacy and credibility key to informing the society about itself, but one that doesn’t depend solely on organizations whose sole mission is serving the common good?
Bill,
Such challenges are what make this work interesting and fun, don’t you think?
Kate Mereand-Sinha, McCourt
Kate,
Absolutely! Georgetown has all the pieces needed to solve both aspects of the problem. Furthermore, we are located in the political and legal center where most of the action in this regard is going to occur.
Georgetown should go for it, with McCourt in the lead. And as you are (I’m guessing) a student there, I encourage you to consider working in this area, if you aren’t already.
Bill Kuncik, Georgetown MALS
The Massive Data Institute needs to proceed on two fronts with regard to the issues Provost Groves raises in this week’s blog.
First, as the two previous replies suggest, MMDI can and should endeavor to be a leader in attempting to bring about a new legal regime that governs the use of “big data” so as to appropriately balance the multiple interests involved. It and Georgetown are uniquely suited to that important task.
Second, MMDI must carefully navigate around the problems and potential liabilities which the existing legal regime presents for “big data” work being done in the here and now. While too complex to elaborate upon here, one can get a sense of the landscape by googling “legal problems affecting big data,” or a similar search entry.
As I’ve cautioned in this space before, it is of utmost importance to consult with the University Counsel’s Office when engaging in this type of work.
Bill Kuncik, Georgetown MALS
It is interesting that all three previous replies immediately point to the proposed Massive Data Institute as a natural place where such questions ought to be addressed. I know from interactions with students that there is lively interest in such questions. Yet I can’t really give them good answers or pointers.
It is time to outline an intellectual agenda for the MMDI. It is time for McCourt to begin this task. There are many potential partners at Georgetown and in the DC area who are waiting for this and are eager to participate.
“Whose data is that?” is the central question that needs to be answered: legally, economically, and democratically.
We need to systematize an understanding of who owns data, what data must be protected, and when data must be transparent. There is an entire infrastructure that needs to be built. Standard setting, through a process of opening government data responsibly and a focus on creating new data sets to answer big questions, will lead us to the collaborations we need to set these standards.
I would love to see the Massive Data Institute be a part of this conversation, but my guess is that will only happen if MDI is part of and helps convene a conversation with government, non-profits, legal minds, industry, and the public.
It is a great public policy challenge, and I hope the students at the McCourt school get a chance to help answer this question too.
Questions that hopefully will be addressed (and, with some luck and hard work, resolved) by the Massive Data Institute….