
Statistics and the Real World

The Founding Fathers of the United States valued the use of empirical data to guide decisions. In the Constitution, they mandated a decennial population census to reapportion the House of Representatives, reflecting the expected shifts and growth of population across the growing number of states. Over time, they increased the amount of information collected in that decennial census to inform the emerging nation about the distribution of property ownership, occupational mixes, household conditions, and the like.

Over the decades statistical information collected by Federal statistical agencies has formed the core information infrastructure of the country. It is the very cornerstone of the informed citizenry. It provides the information on how well we’re doing as a country. It informs us about how well the elected officials are doing in their leadership.

This infrastructure is valuable to the extent that it is objective, unaffected by the political philosophy of the current elected officials. It’s valuable to the extent that it is an accurate portrayal of reality, using state-of-the-art methods to collect data. It’s useful to the extent that it contains consistent indicators, comparable over time (to detect change in key phenomena). It’s helpful to the extent that it reflects key concerns of the society.

This infrastructure is a fragile one; the agencies that provide the objective information often must report that things are not going as well as those elected to office would hope they are. Over the years their budgets have suffered and some key statistical indicators have been dropped. In that sense, we know less than we did earlier.

These concerns arose recently with the series of violent deaths involving police and African-American citizens. The last weeks have seen both citizen deaths and police deaths. FBI Director Comey, in recent testimony to a Senate committee, said, “We need more and better data related to officer-involved shootings and altercations with the citizens we serve, attacks against law enforcement officers, and criminal activity of all kinds.”

In the early 1970s there was an effort to supplement police-reported crimes with a statistical series based on the recognition that police-reported crimes were not accurate counts of events that violate laws. They were based on a reporting system internal to departments; they required the processing of descriptions of events that were likely to be judged as criminal violations by the justice system. There were many reasons that events were “unfounded,” deemed not reportable. To supplement these official reports, victimization survey methods were developed, asking individuals whether they had experienced what they believed was a criminal victimization, whether or not it was reported to the police.

But because of the relatively small size of the victimization surveys, relatively rare events are not well estimated. Again, Director Comey: “We in the FBI track and publish the number of “justifiable homicides” by police officers. But such reporting by police departments across the country is not mandatory, and perhaps lacks sufficient incentive, so not all departments participate. The result is that currently we cannot fully track incidents involving use of force by police. And while the ‘Law Enforcement Officers Killed and Assaulted’ report tracks the number of officers killed in the line of duty, we do not have a firm grasp on the numbers of officers assaulted in the line of duty. We cannot address concerns about officer-involved shootings if we do not know the circumstances surrounding such incidents.”
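The sampling arithmetic behind this limitation can be sketched quickly. The prevalence and sample size below are hypothetical, chosen only to illustrate why a survey that measures common experiences well can still estimate rare events poorly:

```python
import math

def relative_se(p, n):
    """Relative standard error (se / p) of an estimated proportion p
    from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n) / p

# Hypothetical numbers: an event experienced by 0.1% of the population,
# measured in a survey of 100,000 respondents...
print(f"rare event (0.1%):  {relative_se(0.001, 100_000):.1%} relative error")

# ...versus a common event (30% prevalence) in the same survey.
print(f"common event (30%): {relative_se(0.30, 100_000):.1%} relative error")
```

Even with 100,000 respondents, the rare-event estimate carries roughly a 10% relative error, while the common event is measured to within about half a percent. That gap is why rare events are poorly estimated by general-purpose victimization surveys.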

There are ongoing efforts to improve the consistency and content of police-reported criminal events. A new system of bottom-up reporting is in place, but the rate of participation of local jurisdictions is lower than desirable. This prompted researchers to collect their own data from jurisdictions. The data come from only twelve of the thousands of jurisdictions in the country. They are jurisdictions that voluntarily cooperated. They do not represent the full population in any statistically meaningful way.

In the absence of data that are strongly representative of the full population, it’s common that any data available will be used to draw conclusions about what is happening in our country. This is not always desirable. In this case, twelve jurisdictions form interesting case studies but solid conclusions about national phenomena need richer data. While the researchers should be credited with assembling such data, the nation really deserves consistent and comprehensive attention to assembling such statistical information.

Time has shown that this is best done with a Federal statistical agency that has strong devotion to data quality and complete objectivity.

Joint Appointment Initiative at Georgetown

Every university is facing the challenge of how to increase the support of traditional disciplines, as they evolve, at the same time it invests in cross-disciplinary initiatives that have promise. Most of the existing reward systems of universities favor within-unit appointments, and hence presidents, provosts, and other leaders have been mounting special efforts at cross-unit appointments.

Last year, the three Executive Vice-Presidents (EVP’s), Ed Healton of the Medical Center, Bill Treanor of the Law Center, and I collaborated on a call for faculty proposals for joint appointments. This was a partnership to strengthen Georgetown by supporting cross-cutting scholarly and teaching activities, while at the same time seeking to attract the very best faculty to Georgetown.

In evaluating proposals we favored joint appointments between campuses/schools over joint appointments between units within the same school. Similarly, joint appointments between units with well-defined teaching/research synergies and those with jointly offered courses benefiting students in both units were favored over others. Finally, joint appointments entailing association with an existing interdisciplinary effort at Georgetown were preferred over others.

Last year, we redefined the joint appointment structure for the main campus to protect candidates from shouldering more than a 100% job through their dual citizenship. We also specified a promotion review process that protected joint appointment holders (i.e., a positive outcome in one unit and a negative outcome in the other leads to a positive outcome in the first unit and a dropping of the joint appointment).

The joint appointment initiative succeeded in generating proposals from all three campuses and many different units at the university.

After input from all the schools, the following joint appointment searches have been approved for search in academic year 2016-2017:

Cross-campus Joint Searches

1. Department of Biology, Georgetown College and Department of Pathology, GU Medical Center: Non-Embryonic Stem Cell Biology

The biology of stem cells is an exciting research frontier that offers new insights and opportunities for understanding many basic processes, including aging, cancer and embryonic development. Indeed, the applications of stem cell biology to medicine are multifold and include uses for the prognosis, diagnosis and treatment of cancer, the treatment of age-related diseases, the treatment of genetically inherited diseases, and the regeneration of diseased or damaged tissues. Having our students exposed to this field is important for their preparation in several fields.

2. Department of Computer Science, Georgetown College and Georgetown Law: Information Privacy

Matters concerning data, privacy, and policy are central concerns that need systematic attention. Commercial collection of massive data sets, together with governments’ desire to obtain and use these data, raises serious concerns about both the privacy of the people represented in the data sets and how these data may nonetheless be put to public purpose. The combination of legal scholars and computer scientists would be a strong one in this realm. Those on the policy side could specify how they wanted to use or share the data; the computer scientists could devise systems to allow that to happen while providing strong guarantees that the data were not being used in other ways.

3. Graduate School of Arts and Sciences, College Department of Psychology, and the Medical Center Department of Neuroscience: Cognitive Aging

The world is facing a demographic shift sometimes called the “gray tsunami.” Due to increasing longevity, declining fertility rates, and population-control policies, the percent of the population over the age of 65 is increasing dramatically. Cognitive Aging is an umbrella term for the subfield within Aging that focuses on the individual’s mental factors in the context of aging. These include affective and cognitive processes, their brain bases, genetic and environmental influences, and effects on outcomes for adaptive functioning.

Cross-school Joint Searches

4. McCourt School of Public Policy and Department of Computer Science, Georgetown College: Policy Analytics

To enable sound data-based policymaking, society needs leaders trained in policymaking as well as in the analysis of high-dimensional data. These are skills that have historically been taught in separate programs – public policy and computer science. Recognizing the need for these interdisciplinary leaders, however, academic programs are beginning to appear that address the training needs of individuals with a passion for this area. These programs include aspects of both policy analysis and data science. This appointment would be a key contributor to the “big data” initiatives at the McCourt School and strengthen the interdisciplinary impact of the computer science department.

5. McDonough School of Business and School of Foreign Service: International Business

To build on the recently approved joint master’s degree in International Business and Policy, this joint appointment would bring in a faculty member with a strong research agenda in the economic, strategic and political drivers of success for private sector and public sector organizations. This appointment would be a key catalyst to further the university goals of enhancing its global impact.

6. McDonough School of Business and Department of Computer Science, Georgetown College: Business Analytics

A senior hire in the area of Business Analytics would bring a strong research agenda in machine learning, Operations Research and Management, Business Information Systems, Business Analytics, Statistics, Econometrics, or another business-related field. Possible areas of interest include: utilizing large-scale data with data mining and machine learning to optimize business operations; algorithmic design for mathematical economics; mechanism design; optimization; game theory; or risk mitigation and analysis in complex networked systems such as business supply chains.

For all of these appointments, the search committees will seek to attract scholars with strong international reputations, who would add significantly to the stature of the Georgetown faculty, and who share a deep devotion to a student-centered research university.

On behalf of my two EVP colleagues, I thank all the faculty who reached across units to craft proposals for joint appointments. I congratulate the winning proposals and look forward to the process of identifying world-class faculty to add to the Georgetown community.

What’s Happening?

A key attribute of a democracy is the belief that information flows to the citizenry must be ubiquitous, unfiltered, and continuous. This requirement must be executed by institutions that have a devotion to that enterprise. In the early days of the democracy, newspapers played one role in “keeping government officials honest.” Later, the development of quasi-independent government agencies totally devoted to providing objective, accurate information about the economy and the larger society spurred the information feedback loop.

We’re entering a new era, in my belief, in the nature of the feedback loop. New digital sources of data are now providing information on what’s happening. Sometimes it goes under the moniker of “what’s trending,” based either on Twitter traffic on given hashtags or on YouTube viewing counts. Some news commentators appear to treat the information in the same spirit that they treat a report on UN-coordinated cease-fire talks among several countries or the daily movement of the stock market.

If one reviews the history of survey research, there are three distinct streams of development – the use of surveys in journalism, the use of surveys in marketing, and the use of surveys for social scientific research. The use of surveys in journalism was a logical evolution of so-called “man-in-the-street” interviews (sorry, that’s what they called it at the time). These were viewed as useful supplements to a journalist’s investigation of some social or economic phenomenon of news interest. Such articles often began with a discussion of the event or status, then interwove quotations from “real people” about how the phenomenon affected their lives. Such a literary form attempted to increase the “human interest” of the news. Of course, the journalist had complete control over which “man-in-the-street” quotations to use, and thereby, what conclusions to prime among readers.

If one man-in-the-street interview was good, many were even better. Naïve assembly of many interviews arose, appearing to be more a “survey” of opinion than a single case-study. The news poll was born.

In a way, focusing on trending tweets and popular YouTube videos is a modern version of the “man-in-the-street” interview. Some are presented as evidence of widely shared attitudes in the public. They are compared to other trends to prompt conclusions about whether one issue is more important than another. They are often used as evidence that a specific event is important to the society.

Others are just silly and humorous diversions from the dreadful news of wars, abject poverty, murder, mayhem, and sadness.

What’s different about the “what’s trending” development is that it often doesn’t start with any hard news initiated by a journalist. The “man-in-the-street” is not an enriching feature of a hard news story; it is the story. The start and end of the story are popular hashtags and uploaded videos.

Because tracking hashtags is easier than doing man-in-the-street interviews, I suspect we will see more of this way of answering the question of “what’s happening.” I fear, however, that the citation of numbers (“100,000’s of uses of the hashtag in the last 12 hours,” “750,000 views of the video”) might be misinterpreted as having a value beyond what it deserves.

Every once in a while, I’d like a commentator to tell us that, just because a few hundred thousand people are visibly behaving in some way, it may not tell us much of anything about all 323 million US residents or all 7 billion world residents. It’s quite frightening to consider the possibility that such information might be the only source for an informed citizenry to assess their welfare.
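The back-of-the-envelope arithmetic is sobering. The counts below are hypothetical, in the spirit of the figures quoted above:

```python
# Hypothetical counts, echoing the figures quoted above.
hashtag_uses = 300_000            # "100,000's of uses ... in the last 12 hours"
video_views = 750_000             # "750,000 views of the video"

us_residents = 323_000_000
world_residents = 7_000_000_000

# Even generously assuming one distinct person per use or view,
# the shares of the relevant populations are tiny.
print(f"share of US residents:    {hashtag_uses / us_residents:.3%}")
print(f"share of world residents: {video_views / world_residents:.4%}")
```

Even under the generous assumption that every use or view is a distinct person, the trending activity reflects well under a tenth of a percent of the population it is taken to describe – and those people are a self-selected, not representative, group.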

Dimensions of Knowledge

All of us, through life, ponder the balance between learning a little about a lot and focusing our attention on learning more and more about a particular area.

Father Adolfo Nicolás, in a 2010 address, described his fear that modern life is overloaded with stimuli demanding our immediate attention. This overload breeds short attention spans. A short attention span produces a pervasive superficiality. It provides a breeding ground for fanaticism and ideologues. To counter this, he argued for “depth” – concentrated thinking and time for reflection – as a way to release “creative imagination” to reassemble facts and observations into new understandings.

IBM has promoted the idea that their enterprise values “T-shape” people. By that it appears they mean experts in a subfield of knowledge who also are conversant in the basic concepts of many other fields. That basic knowledge is needed to work effectively in interdisciplinary teams. The base of the “T” is meant to graphically describe the extent of deep knowledge in the subfield, and the top of the “T” describes the many other fields in which basic knowledge is held.

In a completely different domain, some scholars speculate that what we interpret as wisdom is actually a synthesis of deep knowledge in an area and broad experience in many subfields. We sometimes encounter this in a mentor, who sees our situation from a vantage point different than ours, powered by their own years of experience. They diagnose a way forward for us, the mentee, that we could not see ourselves. We see it in an elder, who amazes us by putting together two seemingly disparate observations into a novel perspective. They have identified connections among facts and concepts to produce new insights.

This ability to synthesize information seems different from the notion of “depth” or the notion of “breadth” of knowledge. It is compatible with a metaphor used to describe visionaries – “height,” the ability to see solutions at a level that is superior to others. Visionaries “see above the crowd” because of their ability to synthesize lots of facts and combine them in new ways. The metaphor of height is popular because it communicates enhanced vision.

Thus, for real impact, the ability to synthesize diverse knowledge must accompany deep command of one field and broad competency in many. This synthesis provides the height of vision we so admire, leading to the creation of novel solutions based on the totality of knowledge.

How do universities achieve these goals? When we’re at our best, we present students with experiences that draw on deep and broad knowledge, but apply it to real world problems solvable only with new combinations of that knowledge. This is often a difficult task for students accustomed to highly structured teaching/learning protocols. It also challenges instructors to manage great uncertainties as students try to navigate problems that may not have a solution or may have multiple solutions. The move to experience-based learning often creates such experiences. Interdisciplinary problem-based studies also seem to offer fertile ground for rehearsing the synthesis step.

So, in essence, this is the argument for producing graduates who exhibit depth, breadth, and height. Instead of “T” people, we need people who might be called “+” people.

Beneficence, Maleficence, and Instrumentalization in Data Ethics

Two central notions in bioethics are “beneficence,” the provision of benefits to the patient, and the absence of “maleficence,” the “do no harm” principle. Certainly in constructing a framework for data ethics, these must also be central goals.

There doesn’t appear to be a perfect mapping of these ideas to statistical practice aimed at describing the attributes of large populations. IRBs have largely adopted the practice that informed consent requires the participant’s understanding of both the costs and benefits of their decision. For statistical information, the benefits serve the common good, not the individual (see last week’s blog, A Little More on Data Ethics).

Bioethics focuses on interventions (or lack thereof) on individuals. For example, medical procedures should be designed to benefit the individual.

Statistics for the common good seek beneficence for the whole population. Measuring the unemployment rate through self-reports of a sample of persons seeks to inform the citizenry of an indicator of the society’s well-being. Participants in employment surveys are promised that no action on themselves will be taken based on their responses. Statistical uses of data by definition are uninterested in the individual.

Further, not all statistical aggregations of data avoid maleficence for all persons. If the unemployment rate leads policy makers to introduce increased taxes to support the unemployed, then the relative socioeconomic status of employed persons is diminished. They, as a group, are harmed. In general, when data produce statistical information that leads to a reallocation of resources in a society, group harm results.

Thus, when individual data are used for statistical purposes, the interplay of individual versus group beneficence and maleficence is different than for the biomedical case.

When we move our attention to unobtrusive use of personal data, another concept seems useful for data ethics – instrumentalization. This is the use of other humans merely as a means to some other goal. One might consider this notion as relevant to informed consent. If researchers do not reveal their goals to the participant, the individual is not fully able to weigh costs and benefits of participation. Consent is then not “informed.” For example, users are routinely informed that their mobile phone positional data are used to provide traffic estimates. I have the right to refuse participation if I feel that I am merely being used as an instrument for a purpose that does not benefit me. Informed consent, in that sense, is the protection against instrumentalization in the researcher-participant relationship.

Such issues seem quite pertinent when the uses of the data are long-lasting and not visible to the participant. For example, I suspect that few users understand how their past internet behavior affects their current browsing experiences. The use of predictive models to guide the browser actions, based on user profile data, is often not easily discerned by a user. I suspect most users do not examine the displayed ads on a page as a reflection of their internet use, but as a mass-marketed effort for business development. Therefore, the benefit to the individual is increased by the intervention to the extent that the user finds the ads informative and useful. From one perspective, that is exactly the intent of the browser owner. To the extent that users click on ads, the browser’s statistical model has achieved the goal. One can easily make the argument that instrumentalization doesn’t seem to apply here (to the extent that the user read the privacy policy). If, on the other hand, the statistical model is not well formulated, and ads presented are annoyances, does instrumentalization arise? (Such an occurrence prompts the need for the user to opt-out of the service under those circumstances.)

As we think about how to build a sustainable environment for common good uses of large digital data resources, we must address our ethical obligations to those providing the data. I suspect we’ll be talking about beneficence, maleficence, and instrumentalization in the future.

A Little More on Data Ethics

Much of the perspective on ethical treatment of persons in research flowed from the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1974–78). The field of bioethics parallels these activities and has also influenced how many think of the obligations social scientists have toward persons involved in their research. There are several key principles of bioethics:

1. Autonomy, the freedom to make one’s own decisions
2. Beneficence, good personal consequences of research participation
3. Absence of maleficence, avoidance of harm
4. Human dignity, worthiness of respect

This post offers some thoughts on applying notions of autonomy to the various types of data that are part of the Internet world.

Central to modern research practice is the right of participants to determine their own participation in activities that affect their lives. This thought underlies the notion of informed consent by the “human subject.” Further, in self-report studies (e.g., surveys), the participant is free to refuse to answer each question posed. This gives the respondent control over what information is actually collected by the researcher. Notions of autonomy have guided IRB regulations. Research approvals rest on the disclosure of information to the participant in a way that lets them competently and freely weigh the costs and benefits of participation. Voluntary, but informed, participation is the goal.

It might be useful to turn this around — change the perspective from solely the human subject, whose information is in question, to one that focuses both on the participant and collector of data. It seems that under some circumstances, each actor in the transaction may have ethical obligations.

What are the ethical obligations of each of these actors in their own autonomy? I suspect the nature of the ethical framework might depend on the uses of the data provided. The most relevant dimensions might be the answers to the questions “Who benefits?” and “Who knows about the benefits?”

In many statistical surveys, the individual data provided by one participant merely form one among thousands of observations. The products of the data are statistics describing large numbers of people in the population (e.g., median income of age and gender groups; the unemployment rate; the rate of educational attainment; prevalence of health conditions).

For this first data use type, constructing statistical information that informs the full society, what are the obligations of the participant? This information is key for the informed citizenry to evaluate the state of the nation. The benefits are not personal but societal. Each of us has some obligation to the common good. In one sense, providing personal information for common good information is like voting. My individual vote has little benefit to me, but it fulfills my personal responsibility to the polity in a democracy. As a citizen I have the responsibility to join with my other citizens to select officials. In this class of statistical information the beneficiary is the full community.

A second use class moves one step further, to impact on the individual participant. What is the participant’s responsibility when the data are used to build a model that predicts future behavior (e.g., the risk of mortality in actuarial estimates; the likelihood of clicking on a web page display advertisement)? In these cases, like the first use class, my personal data are just one observation, which when combined with those of many others permits the estimation of statistical likelihoods. But the next step is an intervention. The estimates are used to make decisions affecting individuals (I receive or do not receive a contract for life insurance; I see an advertisement from a retailer I earlier visited).

With this type, the beneficiaries are certainly the commercial entities involved, to the extent the models are predictive. But the individual can also benefit, if the personal intervention is viewed as a good thing. Losing the opportunity for life insurance probably won’t be viewed positively (gaining the opportunity would be). Being alerted to new goods and services of interest to me, without initiating the search for them myself, might be something of value. And in some cases, there are also common good benefits (e.g., a healthy national insurance framework; efficient distribution of products). But it seems clear that ethical obligations toward support of the common good don’t apply here as strongly as in the first type.

For this type, what obligation does the data collector have to that person? Prior to obtaining a person’s data, the collector has some obligation to inform them of what it will do with the data. This is required so that individuals can make, with some autonomy, a decision weighing the costs and benefits of providing their personal data.

The third notable type of data use is individual information used for person-based intervention, often by combining data sources. Some of these are seemingly direct benefits (e.g., mashing my location data via mobile phone tracking with display of nearby restaurants, or traffic accidents). Here, personal benefits to the user are very salient. However, the crowd-sourcing of data can sometimes yield common good outcomes. Do ethical obligations to the common good prompt considering the value to the society of such information dispersion to make the full society more livable? Who decides that a world with Yelp! is better than a world without it?

Data ethics should be part of the framework of evaluating both the behavior of individuals that might provide data and those who seek the data. It seems clear, however, that different uses will imply different guidance.

Depth of Thinking in the Big Data World

When I was about nineteen years old and taking a first course in empirical social science, I was given a computerized set of data and documentation describing the data. The class was told to pose any question we found relevant to the data, construct an analysis, and describe in words the results.

The liberty to construct one’s own questions was alluring. My naiveté led me to the typical failure of any first analyst – to examine all possible combinations of attributes to predict the question of interest. I still remember the pages and pages of analyses I produced. I let the software do my thinking. I had skipped an important step in inquiry – studying the results of past work relevant to my question. I hadn’t gone deep enough.

We all are seeing the results of similar superficiality in data analysis now. Extensive computational power coupled with very large data sets permits one quite easily to build predictive models of anything described by the data. Some of the models predicting behaviors, like choosing to click on a web-page advertisement, can contain hundreds of thousands of attributes (prior clicks, web pages visited, etc.) measured on millions of people.

Sometimes the models are tested against some gold standard set of indicators. Can the predictive models match the benchmark indicator? Do they seem to track, over time, the real phenomenon they seek to predict?

There are several failures (e.g., Google Flu Trends predicting the annual course of influenza). Testing predictive models against benchmarks is effective for the period in which the match is observed. However, the models can fail outside those benchmarks.

Some predictive models repeat the mistake of my nineteen-year-old self. I was merely seeking to predict some measured outcome. I had no idea of the processes underlying the phenomenon I sought to explain. I had no theory. I had no understanding of the mechanisms that produce the outcome. Models built under such circumstances can work … until they don’t.

Social scientists make large distinctions between the necessary ingredients for prediction and the necessary ingredients for causal inference. What actually produces the phenomenon that interests me? Without such understanding, predictive models usually suffer from a sin of superficiality. They are thoughtlessly built and are tested only on a limited set of circumstances. When the circumstances change, the model breaks down.
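A minimal sketch can make the point concrete. This toy model is my own construction, not any model discussed above: it predicts an outcome from a proxy variable that merely correlates with the true cause, and it looks excellent until the circumstances change and the correlation dissolves:

```python
import random

random.seed(42)

def make_data(n, proxy_tracks_cause):
    """The outcome is driven by an unobserved cause; the model only
    sees a proxy.  While the proxy tracks the cause, prediction from
    the proxy looks easy."""
    data = []
    for _ in range(n):
        cause = random.uniform(-1, 1)
        outcome = cause > 0
        proxy = cause if proxy_tracks_cause else random.uniform(-1, 1)
        data.append((proxy, outcome))
    return data

def accuracy(data):
    # "Model": predict the outcome whenever the proxy is positive.
    hits = sum((proxy > 0) == outcome for proxy, outcome in data)
    return hits / len(data)

train = make_data(10_000, proxy_tracks_cause=True)
shifted = make_data(10_000, proxy_tracks_cause=False)

print(f"while the proxy tracks the cause: {accuracy(train):.0%}")
print(f"after circumstances change:       {accuracy(shifted):.0%}")
```

The "model" never changed; only the circumstances did. Accuracy collapses to roughly chance once the proxy decouples from the cause, whereas a specification informed by the causal mechanism (here, observing the cause itself) would have remained robust.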

In a way, prediction without understanding key features of the causal mechanisms is like candy. There is an immediate gratification, but it can hurt you if you consume too much.

The lasting value of data-based models requires depth — depth of thinking about the outcome at hand; depth of interacting and observing the process. Usually this is qualitative, immersive activity. Sometimes the model builder can be assisted by those deeply conversant with the process, but it helps to have modelers who themselves know the process well.

Depth takes time. Depth requires different expertise than that required for data manipulation and algorithmic or statistical analyses. But depth has the payoff of model specification that is more robust to changing circumstances.

The lack of depth in the world of “big data” is more threatening, I fear, than in the world of data designed by the researcher. We can easily fall into a habit of “harvesting” data without deeply thinking about the outcome we are trying to predict. Since there is so much data, surely, we assert, we can easily build a strongly predictive model that will serve us well for a long time. The problem with “hitching our entire wagon” to existing data alone, however, is the case where the key mechanisms are not measured in the data. No amount of data will save us from the fate of missing the right attributes to measure (e.g., missing measurements of attitudes and other internal states that have not yet produced observable behaviors).

Masses of data that lack key measurements, combined with the thoughtless application of statistical techniques, are fatal temptations in this new world of big data. There is no shortcut for deep thinking about causal mechanisms.

Privacy and Data Ethics

Posted on

Privacy is one of the central issues facing academic fields that use personal data to pursue their research.

“Privacy” is a term that unfortunately has a large set of alternative definitions. In the social sciences, the right of privacy empowers participants chosen for research data collection to refuse to reveal personal information. All of the features of informed consent, promulgated through the use of institutional review boards, are aimed at protecting such privacy rights. In the nomenclature of those fields, a person can voluntarily give up this right to privacy to participate in a research inquiry and reveal to the researcher “private” information about themselves. In return for this voluntary act, the researcher pledges to keep “confidential” the information so proffered. Thus, “privacy” and “confidentiality” are two concepts used in tandem. Under “confidentiality,” the researcher is obliged to use the information provided by the research participant only for the research purposes described to the participant. The researcher keeps the data in a protected state, under controls that prevent any other use of the information.

“Privacy” in other domains takes on a more complicated set of meanings. Take for example the unobtrusive monitoring of internet usage patterns. In this domain, there is often an “opt-in” or “opt-out” opportunity for the user (a decision which may or may not be thoughtfully taken). But the collection of data often continues indefinitely, without reminders to the user of the collection, and without any notice of its use. It seems that “privacy” in these domains includes concerns about a) whether the consent to collect the personal data is sufficiently informed, b) whether the collection of data is sufficiently obtrusive to remind the user of its collection, and c) whether the uses of the data are made manifest.

In these domains, privacy concerns resemble surveillance concerns – that an unknown other is collecting personal information without alerting the person, with the uses of the data being completely out of the control of the person (and perhaps the collector). Surveillance evokes fears of uses of the data that may harm the person. Indeed, these domains motivate quite quickly the broadest sense of privacy, as “the right to be left alone.”

Of course, the issues of privacy will soon become much more complex – the internet of things promises that within a few years there will be over 55 billion devices connected to one another indirectly through the internet. They will emit data continuously about their own behaviors, some of which involve monitoring a person’s environment (e.g., is anyone in the house; do we need to replace the water filter; have we run out of peanut butter?). We will acquire such devices because they reduce our day-to-day burdens. But the devices will also collectively “know” much more about individuals than is currently the case. The data they produce may or may not be fully revealed to the users. Will we care about this?

All these notions of privacy take the perspective of the human described by the data. Another perspective is that of the user of the data. This is an area that seems underdeveloped or, at the least, too-little-discussed. It needs attention because there can be uses of personal data that greatly benefit the common good. When common good outcomes are sought by analyzing person-level data, we need a well-constructed set of ethical principles – a data ethics, if you will. These would resemble those of many helping professions – medicine, law, etc. Such codes appear to be key foundations of the trust maintained by those professions.

Such codes of ethics do exist in various statistical subfields, but they are too infrequently highlighted in discussions of “big data.” The codes begin, like those in medicine, by pledging to do no individual harm to persons whose data records are part of a statistical analysis. (Indeed, statistical uses of data place no value on an individual data record because all of the products are based on aggregations of records.) They further state that fulfilling pledges of confidentiality of individual records is a foundation of the human measurement enterprise.

It is interesting to speculate about how the development of a data ethics field might help the world navigate between the promise of using new data resources for common good purposes, on one hand, and respect for individual rights of privacy, on the other.

Progress on the Racial Justice Initiatives

Posted on

On February 4, 2016, President DeGioia articulated a clear urgency for Georgetown to address the continuing legacies of slavery and racial injustice. He launched a new initiative “…because our social and political culture has not been remedied; and, in fact, from a set of recent events, it has deteriorated; because there is a holy impatience among the African-American community that delay is just another way of saying NO; because the moral imperative for complete social justice continues to summon us not to discussion but to action and that summons will not go away–we ignore social morality at our peril.”

There were several key features of the initiative: 1) the construction of a new academic unit with faculty devoted to the teaching of an African-American Studies undergraduate major, minor, and elective courses, 2) the building of a Research Center on Racial Justice, 3) the establishment of PhD fellowships and postdoctoral fellowships to help staff the Center, 4) the recruitment of new faculty to staff the academic unit and the research unit, and 5) the establishment of a senior administrative officer to help faculty recruitment with an aim to enhancing the representation of faculty of color at Georgetown.

A Working Group of faculty, staff, and students has been meeting for several weeks.1 I thought it might be a good time to let everyone know how the work is proceeding.

The group first discussed alternative plans for the academic unit. The group decided it would be best if it were a department in the College and drafted a mission statement for the unit. It would have 100% tenure-line faculty, joint appointments with units from other campuses and schools, and, if necessary, non-tenure-line faculty. The department would have an annual budget, by-laws governing its work, and a reporting line to the Dean of the College. Joint appointments between African-American Studies and other units would have formal memoranda of understanding, which specify the rights and responsibilities of the jointly-appointed faculty member and the two units involved. A chair of the department would be recommended by the core faculty and forwarded to the president, as with all department chairs, for final appointment. Current Georgetown faculty, who were previously active in the African-American Studies program, have been polled regarding their anticipated role in the new department.

The working group then tackled the recruitment of the faculty for the initiative. It decided to launch a search for four faculty members of open rank for the African-American Studies Department. The group drafted an advertisement for all four positions simultaneously, in order to underscore the institutional commitment to the initiative. The searches mounted in academic year 2016-2017 will recruit core faculty members to the department, those whose citizenship is primarily focused on the department. The searches planned for 2017-2018 will focus more fully on the leadership of the Research Center.

The Graduate School is re-establishing the Healy fellowships for graduate student support, which we hope might also benefit the Research Center.

The Working Group is now beginning to discuss the desirable attributes of the Research Center. President DeGioia specified that the Center will be a university-wide entity, and one of the topics of discussion is how best to ensure the sustainable health of such a unit at Georgetown. The group has already realized that there are many faculty members whose scholarship might be relevant to the Center. Hence, it is considering how best to get input from those faculty members on the future outlines of the Center.

The Working Group is on schedule with the task assigned by the president. We will meet over the summer to extend our work. With each meeting, the group reminds itself how important a task it has been given. We have much more to do, but we are making progress on this important initiative for Georgetown.


1 The members of the working group are Reena Aggarwal, Paul Butler, Soyica Colbert, Robert Groves, Edward Healton, Maurice Jackson, Rosemary Kilkenny, Charles King, Gwen Mikell, Angelyn Mitchell, Jasmin Ouseph, Robert Patterson, Precious Stephens-Ihedigbo, William Treanor, Edilma Yearwood.

Thanking the Faculty Who Shape Minds and Spirits

Posted on

At this point in my life I have heard and read many commencement speeches. They tend to have common themes: this day is not an end but a beginning; your generation has the opportunity to solve problems that my generation has failed to solve; you are our hope of building a better world, and so on.

These are all good thoughts and I must force myself to remember that most graduates don’t hear them as often as I do. They might seem much fresher to them than to me.

Most speakers also note that the students could not have succeeded without the support of their parents and extended families. In fact, some note that, for each graduate, this day is a celebration of a whole network’s success. This note is absolutely true, and often leads to an opportunity for the graduates to warmly thank their parents and family members. This is good.

I have noticed over the years, however, that few speakers make similar remarks about the faculty. By implication, though I believe entirely unintentionally, one can leave with the impression that the families’ support and the students’ own work were sufficient to produce the graduates’ success.

From inside the academy, however, that picture of higher education becomes quite inadequate.

The faculty are the diagnosticians of impediments to clear thinking. We see them work with individual students to discern what cognitive blocks prevent understanding material in a course. We see them return draft papers filled with constructive comments on more effective written expression. We see them push students to think more deeply about readings, to extract successively more sophisticated understanding of text, to discern the layered meanings of works. The faculty nurture the day-by-day intellectual growth of the students.

The faculty cultivate the ability to comprehend alternative viewpoints on the same “facts.” We see them lead discussions among students with diverse backgrounds. They use the inherently different perspectives among classmates to help the students teach each other. While the faculty are focused on conveying the facts and current knowledge of a field, they are also teaching how to acquire and assess new perspectives on a problem. By exploiting the diversity among the students in the class, they are simultaneously teaching everyone how to foster effective dialogue in a complicated, diverse world. When they succeed, the students become more effective collaborators throughout their working lives.

The faculty are guides to the skills of self-teaching. By introducing students to original inquiry (through group projects, written papers, etc.), they pass on the skills of immersing oneself in a new domain of knowledge. When they succeed, they have helped shape a mind that is resilient to the rapidly changing terrain of knowledge.

They are the counselors of synthesis. In quiet meetings in their offices, they address the puzzle each student faces of how to integrate new knowledge into a life’s work compatible with his or her own spirit. They help the student see alternative ways forward and give trusted assessments of the student’s areas of strength. They sometimes energize an alumni network that launches the start of a career.

So, while the commencement appropriately focuses on the graduates and their family support, it’s important to remember the role of faculty in their achieving that success. They are the lifeblood feeding successive generations of women and men for others.
