Last week, I was lucky enough to go to Seattle to present a paper at the International Communication Association annual conference. This was my first ICA, and I enjoyed the experience greatly. I featured on a wonderful panel entitled Really Useful Analytics and the Good Life with my colleague Nick Couldry, as well as Helen Kennedy and Giles Moss from Leeds University, and Caroline Bassett from Sussex University.
The slides from my presentation are available here, but in this blog entry, I just wanted to outline the core shape of my argument, which will hopefully provide a framework for future work.
The first thing to say is that this paper was rather different to the work I have previously done in this area. With Ben O’Loughlin, I have written a lot about what we have termed semantic polling (Anstead and O’Loughlin, Forthcoming). In these pieces, we worked to both understand and theorize about new research techniques that harvest vast amounts of data from social media (normally Twitter) to understand how the public are reacting to specific events or politicians. In those earlier papers, Ben and I tried to think about different understandings of public opinion – outside the dominant opinion polling paradigm established in the 1930s – and thought about how they problematized the arguments related to semantic polling.
The datasets used by semantic pollsters are certainly big, maybe running into many millions of tweets. However, for the paper at ICA, I wanted to draw a distinction between big data (defined simply through the size of the dataset or number of data points being worked with) and Big Data. The latter is distinct because it employs a fundamentally different epistemological framework to traditional social science research methods. This argument is clearly put in a couple of places. The most famous (or infamous, depending on perspective) is Chris Anderson’s claim that theory is now irrelevant.
“Out with every theory of human behaviour, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves” (Anderson, 2008).
More recently Mayer-Schonberger and Cukier have argued that:
“The era of big data challenges the way we live and interact with the world. Most strikingly, society will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what” (Mayer-Schonberger and Cukier, 2013).
Such arguments have proved to be very divisive for obvious reasons (Couldry, 2013), yet their ramifications are certainly worth considering. Clearly government, political parties and other civic organisations have a great interest in big data and what it can tell them about the public. At the same time, traditional methods for understanding public opinion are, for various reasons that I detail below, struggling or at least evolving rapidly. So the question is: do we need a new theory of public opinion to cope with these developments?
As noted by Herbert Blumer as far back as the 1940s (1948), public opinion research has always been rather averse to theory, instead focusing its energies on practical methodological issues. However, one rather useful historically grounded theoretical framework has been outlined by the American academic Susan Herbst. Employing the idea of what she terms infrastructures of public opinion, Herbst argues two things: first, that the definition of public opinion varies across time and place; and second, that the definition actually has three components. These are shown, with historical examples, in Table 1 below (derived from Herbst and Beniger, 1994, Herbst, 2001).
An infrastructure of public opinion therefore consists of a method for measuring public opinion; an understanding of politics which shapes that public and how it is conceived; and forums in which public opinion is discussed. This tripartite model has taken quite distinctive forms in different historical periods and geographies, as the comparison between pre-revolutionary France and the mid-twentieth century United States in the table indicates.
Before discussing how we might fit the development of Big Data research into this model, it is also worth noting something about more traditional techniques and understandings of public opinion. In many ways, the mid-twentieth century US paradigm described above persists, at least in the way we talk about public opinion. However, there are a number of reasons to suggest that this infrastructure of public opinion is in decline. These include:
- The growing role for qualitative research. While opinion polling still plays a huge role in the development of political strategy, recent decades have seen growing prominence for qualitative researchers. While most researchers would claim that both techniques have to be combined for a rich understanding of public opinion, it is interesting to note that the most famous political researchers in the UK in recent decades have tended to be more associated with qualitative research than with polling, while the focus group has taken on a hugely important symbolic significance in contemporary politics (Gould, 2011, Mattinson, 2010, Schier, 2000).
- Declining response rates to telephone surveys. This is a much considered problem, especially for American pollsters. It is now not uncommon to get response rates in the single digits, which is undermining traditional methodological approaches to public opinion research (Groves, 2011).
- The development of internet panel surveys. The development of new online methods has challenged traditional telephone and face-to-face methods, and changed the marketplace for public opinion research (AAPOR, 2009).
- The use of more complex statistical modelling techniques. Partially as a result of lower response rates and partially because of internet panel surveys, it can now be argued that pollsters have moved from sampling the population to modelling it. In short, the poorer quality of the raw data going in (be this because of the inherent biases of online panel polls or lower response rates for telephone samples) means that more statistical jiggery-pokery is required to create representative numbers (Groves, 2011).
- The rise of alternative metrics and predictors of public opinion. Opinion pollsters no longer have the field to themselves. Most famously, Nate Silver employs Bayesian predictive modelling to predict US elections, while new social media research techniques have claimed to reflect public opinion (Anstead and O’Loughlin, Forthcoming, Silver, 2012).
If we want to bind many of these trends together in an over-arching narrative, it perhaps relates to the decline of mass society. Traditional opinion polling, certainly as conceived by George Gallup and his contemporaries (and characterised by Herbst), was focused on understanding the political nation as a singular entity. However, as the political nation has become more complex and differentiated, this model has started to look a lot less applicable. Therefore, as we start to sketch out an infrastructure of public opinion where Big Data is becoming more influential, it is also important to hold in mind that this is not wholly a revolutionary development but also in continuity with other, older changes in the measurement and use of public opinion.
So what might a Big Data infrastructure of public opinion look like? One thing to note is that it is not really clear yet – we are still in the very early stages of the use of Big Data. What follows therefore is a slightly speculative attempt to start to answer this question.
Perhaps the easiest place to start is with an epistemology of public opinion and Big Data. I made a few points in my presentation. These are perhaps the most important:
- As outlined above, Big Data approaches are correlative, meaning they are more interested in “what” than “why” questions.
- Opinion polling is technically probabilistic in nature (hence the focus on margin of error). However, probability becomes far more important with Big Datasets, especially when the aim of the activity is prediction. As such, the very nature of the output analysis that is presented to politicians and the public might be different (Silver, 2012).
- Big Data is integrative. In particular, Big Data techniques often seek to use multiple datasets – both structured and unstructured – from a variety of sources. This represents a dramatic shift in the kind of information that can be processed and used to construct public opinion (Mayer-Schonberger and Cukier, 2013).
- Another important consequence of Big Datasets is that they can be more effectively sub-divided. Recent years have seen a rise in the so-called super-poll (in the UK, this technique is most famously used by Lord Ashcroft) where a sample of 25,000 is taken. The reason for this is that sub-samples can be more easily extracted from the dataset without the margin of error becoming unmanageably large. This would not work with a traditional 1,000-person poll. Big Data is also immune to this problem, and can very easily be organised in a way that allows for specific groups to be studied.
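To make the sub-sample point concrete, here is a rough sketch of the arithmetic. It assumes a simple random sample, the worst-case proportion of 50% and a 95% confidence level, none of which hold exactly for real weighted polls. The 10% subgroup share is a hypothetical figure chosen purely for illustration.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p from a simple
    random sample of size n (z = 1.96 corresponds to 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical subgroup making up 10% of respondents (illustrative only).
SUBGROUP_SHARE = 0.10

for total in (1_000, 25_000):
    subgroup = round(total * SUBGROUP_SHARE)
    print(
        f"sample of {total}: full sample ±{margin_of_error(total):.1%}, "
        f"subgroup of {subgroup} ±{margin_of_error(subgroup):.1%}"
    )
```

Under these assumptions, a 10% subgroup of a 1,000-person poll carries a margin of error of nearly ten points, while the same subgroup in a 25,000-person poll is more precise than the whole of the traditional poll.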
What though of ontology? What idea of the public might be embedded in Big Data?
- One optimistic reading of this turn of events is that measurement of public opinion will become more conversational, rather than being simply about atomised individual opinion. This may even have the consequence of decentralising power as the tools for measuring public opinion become more accessible.
- More pessimistically, big data techniques may alienate citizens even more from public opinion collection by harvesting unconsciously expressed preferences, drawing on what has been termed “data exhaust”.
- So this raises a question: how would this model work with classic liberal democratic ideas? If citizens are engaging in democracy but don’t know they are, what does this mean? Are they really citizens anymore? Certainly the liberal idea of participation as an educative moment, which embeds an individual more in the political system, would not make sense any more.
Finally, in what forums might public opinion be discussed in a Big Data infrastructure of public opinion?
- Big data is already starting to bleed over into mainstream political journalism (as Ben and I have detailed in our work), but is still something of a novelty. As yet, it is not as respected as more traditional public opinion research methods.
- However, it is questionable how much big data analysis citizens will get access to, and how transparent its construction will be. This is especially true if we are talking about data held in the private sector, such as social networks or health companies.
- So this suggests a potentially interesting double standard: the public might be given access to more frivolous analysis (what Big Data says about a reality TV show, for example), while important information is held by government and corporations (how Big Data is used to influence healthcare policy, for example).
- But it is important not to suggest that government is a singular entity. Some parts of government are clearly interested in big data, but it is not clear how much legitimacy various policy actors attribute to it (for example, whether the civil service, executive, MPs or local councils have a great interest in it). What interest will MPs have in Big Data, for example? Will they take it more seriously than half a dozen letters from constituents?
These are really just some provisional ideas which I hope to fashion into something more substantial in the next few months. But any comments or questions are very welcome indeed!
AAPOR 2009. AAPOR Report on Online Panels, Washington D.C., American Association for Public Opinion Research.
ANDERSON, C. 2008. The end of theory? [Online]. Available: http://www.wired.com/science/discoveries/magazine/16-07/pb_theory [Accessed 27th June 2013].
ANSTEAD, N. & O’LOUGHLIN, B. Forthcoming. 1936 and all that: Can semantic polling dissolve the myth of two traditions of public opinion research? In: GIBSON, R. K., CANTIJOCH, M. & WARD, S. (eds.) Analyzing Social Media Data and Web Networks: New Methods for Political Science. Basingstoke, England: Palgrave Macmillan.
BLUMER, H. 1948. Public opinion and public opinion polling. American Sociological Review, 13, 542-549.
COULDRY, N. 2013. A Necessary Disenchantment: Myth, Agency and Injustice in a Digital World. Inaugural Lecture, London School of Economics and Political Science. London: LSE.
GOULD, P. 2011. The Unfinished Revolution: How New Labour Changed British Politics for Ever, London, Abacus.
GROVES, R. M. 2011. Three Eras of Survey Research. Public Opinion Quarterly, 75, 861-71.
HERBST, S. 2001. Public Opinion Infrastructures: Meanings, Measures, Media. Political Communication, 18, 451-464.
HERBST, S. & BENIGER, J. R. 1994. The changing infrastructure of public opinion. Audience making: How the media create the audience, 95-114.
MATTINSON, D. 2010. Talking to a brick wall: how New Labour stopped listening to the voter and why we need a new politics, London, Biteback.
MAYER-SCHONBERGER, V. & CUKIER, K. 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think, London, John Murray.
SCHIER, S. E. 2000. By invitation only: the rise of exclusive politics in the United States, Pittsburgh, Pa., University of Pittsburgh Press.
SILVER, N. 2012. The signal and the noise: Why so many predictions fail – but some don’t, New York, Penguin.