Viktor Mayer-Schönberger and Kenneth Cukier’s Big Data: A Revolution That Will Transform How We Live, Work, and Think provides a brief history and overview of the promises, advancements, issues and implications of the big data revolution.
Big data is a social phenomenon with significant qualitative effects that the authors describe as revolutionary. As a result of technological evolution, people can, for the first time in history, easily and cheaply capture and store massive amounts of data and monetize it in a variety of ways once thought impossible. This transition means that statistical methods of sampling or estimation no longer ought to be seen as the ideal way to extract meaning from data.
The book points out that data is rapidly becoming the raw material of business and government policy. In criminal justice, for example, police use the technology to determine which regions to patrol at certain times of day. In the business realm, Wal-Mart, one of the first companies to adopt datafication for its sales analysis systems, learned that it should send Pop-Tarts, not Nutri-Grain bars, to areas about to be affected by hurricanes. These are small examples, however: data accumulated by some insurance and banking companies can sell for hundreds of billions of dollars because it helps predict the likelihood of loans defaulting.
Data, as it exists in the world, can nevertheless lead to flawed conclusions. Mayer-Schönberger and Cukier praise Google’s Flu Trends service, which analyses billions of searches on its website, as well as other indicators, to estimate the prevalence of flu in the United States. In the 2012-13 flu season, however, Google’s estimate of flu cases was roughly twice the actual number. This isn’t in itself a fatal issue, as the miss gives data scientists a chance to work out how to quantify flu prevalence better without people filling out a survey every day. So what exactly is so revolutionary about businesses having a better means of projecting which items consumers are likely to purchase? The book argues that the change is paradigmatically revolutionary and cites three shifts to explain why.
The first shift caused by big data is the ability to survey information from potentially a whole population instead of just sampling random portions of it. Rather than projecting from samples, which the authors repeatedly decry as an antiquated method (a point arguably illustrated by the recent election of Trump despite most polls to the contrary), we can look at everything.
The second paradigmatic shift is that “looking at vastly more data also permits us to loosen up our desire for exactitude”. This is because in big data, according to the authors, “with less error from sampling we can accept more measurement error”. Science, on this account, is obsessed with sampling methodologies, measurement error and potential error percentages because it operates in a ‘small data’ world.
It would be amazing if the problems of sampling and measurement error really disappeared when you’re “stuffed silly with data”. But context needs to be considered carefully. While it is easy to treat samples as n=all as data gathering gets closer to full coverage, researchers still ought to account for the representativeness of their data. One easy-to-overlook example of this relates to the digital divide: people with little or no internet access leave few digital traces, so they are missing from a dataset that appears to cover everyone.
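The digital divide point can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: the population size, the share of people online, and the rates at which online and offline people have some trait are all hypothetical numbers chosen for the example, not figures from the book. It compares an "n=all" dataset that silently covers only people who are online with a modest random sample drawn from the whole population.

```python
import random


def simulate(pop_size=200_000, online_share=0.7,
             rate_online=0.30, rate_offline=0.60,
             sample_size=1_000, seed=0):
    """Compare a huge but unrepresentative dataset with a small random sample.

    All parameters are hypothetical and exist only for illustration.
    Returns (true_rate, online_only_rate, random_sample_rate).
    """
    rng = random.Random(seed)

    # Build a population in which the trait is more common among
    # people who are offline (the assumed "digital divide").
    population = []
    for _ in range(pop_size):
        online = rng.random() < online_share
        rate = rate_online if online else rate_offline
        has_trait = rng.random() < rate
        population.append((online, has_trait))

    # Ground truth over the entire population.
    true_rate = sum(trait for _, trait in population) / pop_size

    # "Big data": every online person is captured, offline people are not.
    online_traits = [trait for online, trait in population if online]
    online_only_rate = sum(online_traits) / len(online_traits)

    # "Small data": a simple random sample of the whole population.
    sample = rng.sample(population, sample_size)
    random_sample_rate = sum(trait for _, trait in sample) / sample_size

    return true_rate, online_only_rate, random_sample_rate


if __name__ == "__main__":
    true_rate, online_only_rate, random_sample_rate = simulate()
    print(f"true rate in population:      {true_rate:.3f}")
    print(f"online-only 'n=all' estimate: {online_only_rate:.3f}")
    print(f"random sample of 1,000:       {random_sample_rate:.3f}")
```

Under these assumptions the online-only estimate stays well below the true rate however many online records are collected, while the much smaller random sample lands close to it. Volume shrinks random sampling error, but it does nothing about coverage bias.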
The third and potentially most radical paradigmatic shift concerns how we understand complex information and the relationships within it: people will change the “causal modality” and get rid of “the idea of understanding the reasons behind all that happens.”
The traditional image of science the authors propound, however, conflates principles with practices. While the desire to determine causality and the pursuit of precision in measurement are generative mores, the authors, taken with the promise of big data, seem too cavalier in dismissing causation as something to aspire to.
Their claim that the social sciences “have lost their monopoly on making sense of empirical data, as big-data analysis replaces the highly skilled survey specialists of the past” seems fatuous. Even if new algorithms can produce big data analyses and predictions, they determine meaning only through the means by which their data is input. What others not so blinded by the promise of datafication know is that even at the most granular level of practice, analytic understanding is necessary when attempting to implement these systems in the world or use them to understand the past.