Six Dimensions of Data Relevant to the Social Sciences

There has been a dominant paradigm in much of empirical social science for most of my life: conclusions based on statistical analyses of samples drawn from large populations that are fully covered by some sort of listing of their members. This paradigm has supplied national statistical monitoring (e.g., the monthly unemployment rate from the Current Population Survey), causal inference about electoral behavior (e.g., the National Election Studies), insights into the mechanisms leading to poverty (e.g., the Panel Study of Income Dynamics), and a host of other important knowledge sets.

A parallel branch of research exists, focused on rich data from local areas (e.g., the Framingham Heart Study), longitudinal studies of special populations (e.g., the Bennington College Study), and a multitude of qualitative ethnographic studies of urban communities.

What are the attributes of social science data that are valued and how do survey data match up to them?

1. Surveys tend to be weak on spatial granularity.
While the sample survey serves the needs of national description, it often fails to be useful to policy makers at the local level. For example, while the National Crime Victimization Survey provides consistent estimates of crimes experienced by people in the US, it is not large enough to provide estimates for most individual urban areas, let alone neighborhoods. But actions and policies on crime are often implemented at the local level.

2. Surveys are weak on temporal granularity.
The sample survey is slow, consisting of complicated design, collection, and processing steps. Because of these steps, most of the information we have on the US population is based on measures repeated annually, with a few repeated monthly. Yet, as our world filled with social media input has taught us, events affecting almost every social and economic phenomenon are happening at breakneck speed. Many policy makers thirst for information about the “now,” not the “then.”

3. Surveys are weak on subpopulation granularity.
As developed countries become more diverse through immigration, it’s more important for them to have up-to-date information on individual groups. Sometimes these groups make up a very small share of the full population, so only a handful of their members fall into any national sample. National sample surveys therefore have trouble producing precise estimates for small groups.

4. Surveys are strong on measurement capacity.
Surveys captured the attention of empirical social scientists because they permit researchers to design the data that they analyze. In doing so, the researcher often builds into the measurement multiple indicators of different attributes of interest on the persons studied. These multiple measures feed the multivariate statistical models of the scientists as they seek to gain insights into the phenomena of interest.

5. Surveys are weak on measuring networks.
Sample surveys have often been built on the selection of individual units (i.e., persons, families, and organizations). Measures are taken on sampled individuals. But increasingly social scientists have become interested in how people are influenced by those around them: their families, their neighbors, people in their church/synagogue/mosque, and co-workers. Part of explaining human behavior is informed by knowing what “connected others” are doing.

6. Surveys are strong on inferential frameworks.
Finally, samples of large populations selected with known probabilities offer strong conclusions about the full population. This statistical property has led to confidence in the findings of surveys to describe basic attributes of national populations, permitting confident decisions based on them.
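
To make the role of known selection probabilities concrete, here is a minimal Python sketch of inverse-probability weighting (the Horvitz-Thompson estimator of a population total). The population, the two groups, and the selection probabilities are all invented for illustration; none of this is drawn from the surveys named above.

```python
import random

def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each sampled value by 1 / its selection probability."""
    return sum(y / p for y, p in zip(values, inclusion_probs))

def draw_sample(population, probs, rng):
    """Poisson sampling: include each unit independently with its own known probability."""
    picked = [(y, p) for y, p in zip(population, probs) if rng.random() < p]
    return [y for y, _ in picked], [p for _, p in picked]

# Hypothetical population: 9,000 units with value 1 and 1,000 units with value 10.
population = [1] * 9000 + [10] * 1000
# Assumed design: the smaller, high-value group is deliberately oversampled.
probs = [0.05] * 9000 + [0.50] * 1000

rng = random.Random(0)
estimates = []
for _ in range(200):
    ys, ps = draw_sample(population, probs, rng)
    estimates.append(horvitz_thompson_total(ys, ps))

true_total = sum(population)  # 19000
mean_estimate = sum(estimates) / len(estimates)
print(true_total, round(mean_estimate))
```

Because every unit's chance of selection is known, the weighted estimates center on the true total even though the two groups were sampled at very different rates. Data that arrive without documented selection probabilities offer no comparable correction.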

In sum, most social science is built on comparisons over space, time, subpopulation, measurements, and networks. Confidence in conclusions is enhanced with proper statistical samples of the population being studied. The above arguments show surveys relatively weak on four of the six dimensions.
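
The weakness on spatial and subpopulation granularity is largely a matter of sample size. As a rough sketch, the Python below computes the approximate 95% margin of error for an estimated proportion as the group of interest shrinks. It assumes simple random sampling and ignores design effects; the national sample size and the proportion are hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

national_n = 60_000  # hypothetical national sample size
p = 0.10             # hypothetical proportion being estimated

for share in (1.0, 0.10, 0.01, 0.001):
    # Expected number of respondents who belong to a group of this relative size.
    subgroup_n = national_n * share
    moe = margin_of_error(p, subgroup_n)
    print(f"share={share:<6} n={subgroup_n:>8.0f} margin=+/-{moe:.3f}")
```

The margin grows with the inverse square root of the group's sample size, so a group that is one hundredth of the population has a margin ten times wider than the national estimate.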

As the apparent speed of modern life increases, timely information seems more valuable. As the diversity of societies increases, local and subpopulation statistics seem more valuable. As we gain more insight into social networks, we increasingly seek to measure the effects of the context in which people live.

The data world that we need to construct to understand human behavior more fully needs to be sensitive to these six needs. The “big data” world offers near real-time data from some sources, coverage of wide spaces and populations (but without any documentation about what’s being missed), data sets much less multivariate than survey data, data that are not always geospatially identified, and data that often have network structures to them. Therefore, the big data world tends to be weak on three of the six dimensions: inferential frameworks, measurement capacity, and spatial granularity.

Our job is to put the old data world of surveys together with the new data world of “big data.”

5 thoughts on “Six Dimensions of Data Relevant to the Social Sciences”

Sidney Thompson says:

November 13, 2014 at 2:46 pm

Thanks for the brief synopsis. Very interesting, and might prove useful to a project I’m considering launching in the near future.

Paul Roepe says:

November 13, 2014 at 8:51 am

Excellent topic once again; the new age of “big data” does indeed present terrific opportunities for improving empirical social science. It also presents amazing opportunities for epidemiology, environmental science, global health, genomics, health care policy, and other fields in which Georgetown either already excels or is building impressive critical mass. But as is well known there are bottlenecks in realizing the potential of big data. Most agree that the bottlenecks do not come from generating data sets (we are already very good at that) but from a slower than optimal pace in improving (and inventing) computational approaches for analyzing diverse data sets. Increasing that pace will rely on interdisciplinary approaches that bring together computer scientists, natural scientists, and social scientists. This is one opportunity where our smaller size is a distinct asset; we can in theory leverage more flexibility and greater interdisciplinarity than most Universities in overcoming these bottlenecks.

Best
Paul

- Bill licamele says:
  
  November 13, 2014 at 10:56 am
  
  Great points about opportunity for GU with its size and expertise. Now let’s break down those silos!
  
bill licamele says:

November 12, 2014 at 9:13 pm

WOW, that is very astute and will make me think twice about doing any surveys! Great comments, and they show what great problems there are with … surveys. I guess it says again that all that counts cannot be counted and all that is counted doesn’t necessarily count. Right?

Peter C. Pfeiffer says:

November 12, 2014 at 9:13 pm

As a humanities person who has no fear of numbers, I’d say: Go for it!
