
Musings on Bots

In the 1930s, in the midst of the Great Depression, the US did not know how many unemployed persons existed in the country. There were stories, spotty counts, and speculations, but policy formulation was basically uninformed about the true size of the problem.

In 1934, a revolutionary article from the UK proved the value of measuring a small sample of a large human population to get unbiased estimates for the full population with known error tolerances. It required giving everyone in the population a known, nonzero chance of selection into the small sample and then measuring each selected person in a consistent manner.
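As a rough illustration of that idea (mine, not the article’s), here is a minimal Python sketch: a simple random sample, in which every person has the same known, nonzero chance of selection, yields an estimate of a population rate along with a margin of error computable from the sample alone. The population size, true rate, and sample size are all hypothetical.

```python
import math
import random

# Hypothetical population of 1,000,000 people: 1 = unemployed, 0 = not.
# The true rate is fixed here only so we can see the estimate land near it;
# in a real survey it is the unknown quantity being estimated.
random.seed(42)
TRUE_RATE = 0.20
population = [1 if random.random() < TRUE_RATE else 0 for _ in range(1_000_000)]

# Simple random sample: every person has the same known, nonzero
# chance of selection (n / N), as the theory requires.
n = 1_000
sample = random.sample(population, n)

p_hat = sum(sample) / n                  # unbiased estimate of the rate
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error, from the sample alone
print(f"estimate: {p_hat:.3f} +/- {1.96 * se:.3f} (approximate 95% interval)")
```

Note that with n = 1,000 the interval is roughly plus or minus three percentage points almost regardless of how large the population is; that is the “known error tolerance” the theory delivers.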

This led to rapid innovation in government statistics around the world, the growth of the empirical social sciences based on sample surveys, and the increasing influence of market research.

However, over time, participation in surveys declined, violating the theory’s requirement of full measurement of the sample. Costs inflated greatly as interviewers tracked down sample persons who were very busy and tried to persuade the reluctant to reconsider. Hence, cheaper methods of measurement were sought, moving from the face-to-face interviewing common in the mid-20th century to mail questionnaires, to telephone surveys, to web-based surveys.

Unfortunately, current practices violate most of the original underlying theory.

Many surveys now use an “opt-in” method, whereby internet sites recruit people who volunteer to be respondents. Anyone willing can respond. Sometimes cash or other incentives promote repeated response behavior.

But on the current internet, with the ease of building software agents (bots, as they are called), there is a new problem. For those who run opt-in surveys, it has become a second job to detect whether a set of responses was even made by a real human. It is routine to check for illogical patterns of responses in an attempt to detect both bot responses and careless, thoughtless human respondents doing the task just for the money.

I have a new favorite example of this, from a serious attempt to detect the extent of such behavior in data from an opt-in survey. A question was asked for which the percentage of US adults truthfully answering “yes” is known. The question asked whether the respondent possessed a license to pilot an Ohio-class cruise missile submarine (“Are you licensed to operate a class SSGN submarine?”). About 12% of the opt-in respondents answered that they possessed such a license! The true percentage among the US adult population is near zero. There are many other examples of this phenomenon. Opt-in surveys display systematic biases.
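A minimal sketch of such a screening check in Python, assuming hypothetical field names and made-up response records; real survey platforms will differ:

```python
# Screen an opt-in sample with a "trap" question whose true prevalence
# is known to be near zero. Field names and records are hypothetical.
TRAP = "ssgn_license"  # "Are you licensed to operate a class SSGN submarine?"

responses = [
    {"id": 1, TRAP: "yes"},
    {"id": 2, TRAP: "no"},
    {"id": 3, TRAP: "yes"},
    {"id": 4, TRAP: "no"},
]

flagged = [r for r in responses if r[TRAP] == "yes"]
claim_rate = len(flagged) / len(responses)

# A claim rate far above the known true prevalence (~0%) signals that
# bots or careless respondents are contaminating the sample.
print(f"{claim_rate:.0%} claim the implausible trait")
print("flagged ids:", [r["id"] for r in flagged])
```

A 12% claim rate against a near-zero true prevalence is exactly the kind of gap a check like this surfaces.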

There appears to be some evidence that bots answer many questions “yes” because this increases the likelihood of being identified as a member of a rare respondent type, and thus eligible for more surveys in the future, maximizing the benefits offered by the opt-in survey company.

It is unfortunate that such surveys’ results are reported as if they were meaningful indicators of population characteristics, right alongside results from rigorous methods consistent with statistical theory.

What do we do as consumers of survey estimates? Be suspicious. If you find a statistic of interest, try to identify the methodology of the survey. Do this especially when the statistic is surprising given your prior knowledge. If you can’t find documentation on the survey methods, it’s likely that quite questionable methods were used. Don’t trust the results; tell others not to trust the results. If the methodology is an opt-in volunteer method, disregard it.

5 thoughts on “Musings on Bots”

  1. Thank you, Bob, for this essential clarifying paper on the danger of (overused) sample surveys. You have provided us a service.

  2. Given confirmation bias, doesn’t the “surprising result” test have its own problems? So we let slide things that comport with what we already think but place an emphasis on questioning results that perhaps challenge our worldview.

    As Richard Feynman was alleged to have said, “The first principle is that you must not fool yourself, and you are the easiest person to fool.”

    • George, the “surprising result” test doesn’t have its own problems if the result is limited to a verifiably known quantity, such as the number of persons licensed to operate a class SSGN submarine in a given population. This kind of test fits with the “Are you a robot?” kind of question.
