One of the difficult, but thrilling, aspects of modern social science research is the exploration of how new internet-affiliated sources of data can inform traditional social science questions. A group of Georgetown faculty, partnering with counterparts at the University of Michigan, has been working together in this area over the past three years or so. Recently, they have come to refer to their group as the Social Science Social Media Collaborative (see collaborator list below). They were collectively awarded a multi-year Michigan Institute for Data Science (MIDAS) research grant to investigate how traditional social science surveys could be blended with social media data. The McCourt School’s Massive Data Institute is a key home for the Georgetown work.
The two data worlds (survey data and social media data) have fundamental differences. First, surveys are slow to be designed, collected, and analyzed, while social media data are emitted second by second. Second, surveys usually measure many attributes of each respondent, while social media data tend to be quite lean in measures from any one platform at any one time (a tweet contains words produced by a subscriber at a specific moment). This matters because social science analysis increasingly uses many attributes to discover the important predictors of some phenomenon. Third, survey data are designed by researchers to answer specific research questions, and each respondent is given the same questions to answer. Much social media data are generated without any researchers involved; they are organic to the day-to-day lives of individual subscribers, and they are often unstructured text, not numerical data. Fourth, the percentage of the human population covered by social media is highly variable across platforms and unmeasured, and thus problematic for use in inference to the larger population. In short, whether social media data are relevant to important social science questions is one of the current puzzles of the field.
Despite these unknowns, there is great hope that combining social media data with traditional survey data can provide new insights for the social sciences. That hope drives the work of the Collaborative, an interdisciplinary team of computer scientists, communication scholars, developmental psychologists, political scientists, statisticians, and survey methodologists. Thus far, they are working in two domains: 1) the process of public opinion formation and 2) parental decision-making.
One metaphor for these first social science uses of social media data is the first use of a microscope by scientists in the 1600s: seeing, for the first time, aspects of physical entities that the human eye could not previously detect. Sometimes it’s not even clear what one is looking at. The immediate question is how the new, more granular view integrates with the traditional, more highly aggregated view.
Regarding political attitude formation, the group is comparing tweets by journalists and the general public during the 2016 election season to concurrent Gallup polling, news coverage, and other event measurements. They have used all these data sources to pose a variety of questions. How do the issues of importance to survey respondents compare to those in the tweets of Twitter subscribers over the weeks of the campaign? How do the key phrases that journalists use to describe events get transmitted among them, and what are the patterns of influence among them? Do survey respondents use phrases similar to those of Twitter subscribers to describe election events?
Regarding parenting decisions and learning processes, surveys, blogs, parenting websites, and tweets are the data sources examined simultaneously. How do parents gain their information about how to parent? The initial findings are that Mom-focused behaviors in the social media data tend to concern the health of the child, while Dad-focused behaviors tend to concern how to behave as a parent. Network analysis (who follows whom) showed that the gender of the parent was the key driver (e.g., Mom-focused accounts tend to follow other Mom-focused accounts). Finally, it appears that Dads retweet more frequently than Moms. No survey has provided insights into Internet sources of parenting information in this way.
These are first looks from the new microscope. These scientists are asking the basic questions of how the new observations compare to the traditional measurements. Without this basic understanding, we will make little progress at using this new data world to understand society. Kudos to the Social Science Social Media Collaborative!
(The collaborators include Leticia Bode, Caren Budak, Michelyne Chavez, Robert Churchill, Pamela Davis-Keane, Mei Fu, Chris Kirov, Jule Krüger, Jonathan Ladd, Linda Li, Colleen McClain, Zeina Mneimneh, Josh Pasek, Trivellore Raghunathan, Rebecca Ryan, Yiqing Ren, Stuart Soroka, Lisa Singh, Michael Traugott, Laila Wahedi, Yifang Wei, Xintong Zhao.)