The utopian hype over Big Data is being critiqued on many fronts. After all, it isn’t that new. The Romans and the Nazis amassed huge amounts of data on their populations. And then, of course, there is the creepy, Big Brother aspect.
However, one problem no one seems to be talking about is that Big Data is too small.
That may sound like a contradiction, what with the kajillions of information bytes that make it not just data but “Big” data. Indeed, digital technology has enabled researchers to access, store and analyze unprecedented amounts of data, often online.
But Big Data is too small when it is amassed, scraped and API’ed from the Internet, whether from tweets or Google searches. The problem is the digital divide.
Specifically, people with lower levels of income and education are not accessing or creating online content nearly as much as people with a college degree and a comfortable middle-class lifestyle. That means if journalists, academics and policy makers rely on Big Data analytics, then they are ignoring issues important to many poor and working-class people.
Ignoring the least likely to be online
I first became interested in the implications of this Big Data Gap a few years ago when I read a New York Times article about how the Centers for Disease Control was tracking the flu outbreak by analyzing Google searches. Who is the most vulnerable to the flu? The poor and the elderly. Who is least likely to be online? The poor and the elderly.
An oft-repeated myth is that the digital divide is over. In response, I recently wrote a blog post, “The Seven Myths of the Digital Divide.” I address claims such as the idea that the divide will disappear as older people die out. (Class gaps persist across age groups.) Or that people are leapfrogging over desktops by having cell phones. (Could you write a news article on a cell phone?)
Roughly 20 percent of people in the United States are not online, and many of those who are do not have consistent access. But even among those who are connected, there exists another divide between people who create and participate online and those who do not.
Digital content is paramount. Who is posting to blogs, Twitter or YouTube matters for how representative the Internet is. In my research, I found that among American adults, people with a high school education are less likely to create online content across all 10 activities I examined. These people are not contributing to that giant pool of online data. Thus, we have a Big Data Gap.
My research findings point to how some of these gaps matter in my own Big Data analytics. I study social movements, social media and social class. With my current project, I am studying how digital activism matters for about 35 political and social movement organizations in one southern state. The groups, all of which are organizing around one political issue, range from Tea Party groups to rank-and-file unions, as well as university groups and government associations. Three of the most active groups offline have virtually no online presence. One organization does not come up on Google searches. Half of the organizations are not active on Twitter.
So when we study so-called Twitter and Facebook revolutions by analyzing social media data with fancy social network diagrams, who are we studying and who are we ignoring? It is more than a question of inaccurate descriptions of political events. It can also lead to policies that reflect only those with digital bullhorns.
How the data gap influences reporting
The Big Data Gap can also lead to inaccurate reporting by journalists.
From the Arab Spring to Occupy Wall Street, newspapers hailed social media as the spark that ignited these movements. But this is more hype than reality, as it is the Twitter elite who tweet and retweet movements already underway.
On a more day-to-day level, though, because of the transformation of journalism over the last decade, news reporters do not have the flexibility or time to track down sources who are not online. What’s trending on Twitter or other Big Data aggregators is more apt to be used than the agenda for a school board or union meeting. The Knight Foundation, a leading funder of citizen journalism projects, funds “Data and Journalism” projects analyzing Big Data.
Certainly, not all Big Data derive from the digital cloud online. Corporations have a way of tracking most consumers, regardless of whether or not they have the latest iPad. And even before the dawn of digital technology, journalists and scholars focused on the artifacts of the elite – from 16th-century books to The New York Times.
However, there was little pretense that this was representative of all of society. Users of Big Data and the digital cloud imply that this information does include all “citizens.” I am not arguing that Big Data, therefore, should be discarded. Quite the opposite: it is not only a gold mine of information but also fascinating and fun to analyze. Instead, we need to figure out ways to use information from the digital cloud in a methodologically sound way that represents all of society, not just those online, or at a minimum, recognize that it is smaller than we might have thought.
Jen Schradie is a doctoral candidate in the Department of Sociology at the University of California-Berkeley and the Berkeley Center for New Media. She has a master’s degree from the Harvard Kennedy School. Jen studies social movements, social media and social class. Her broad research agenda is to interrogate digital democracy claims in America. Her research on digital production inequality earned her the 2012 Public Sociology Alumni Prize at UC Berkeley. With a National Science Foundation Grant, she is studying how the Internet matters for democracy among social movement and labor organizations in the American South. Before entering academia, Jen directed six documentary films, which have screened at more than 25 film festivals and 100 universities. Follow her on Twitter @schradie or go to www.schradie.com.
It’s a great thought piece, Jen. One thing I was wondering is if the people on the other side of the Digital Divide might actually *welcome* being left out of Big Data, so that they are not being tracked everywhere and watched. Did you consider that angle?
In a post-Snowden world, not having a digital trace might, indeed, be welcome. But the state has always surveilled, even before the Internet. And Big Data analytics are about more than privacy. Those of us who are able to be online 24-7 often forget how our tweets, posts, and even comments on sites like these are a privilege. They also drive research, policy and journalism. While lol-cats and ‘hey girl’ memes do not seem fodder for political priorities, other forms of digital engagement do have an impact. The fact that we have a virtual poll tax on digital democracy could deepen social inequalities, rather than ameliorate them.
While I can certainly understand why people might want to opt out of Big Data, the reality is that the ability to do so is a privilege. The technical savvy it takes to know how to avoid leaving a digital trace is beyond many people on the other side of the Digital Divide.
The Digital Divide means more than not being tracked; it’s about not being counted. It’s the equivalent of being invisible. You are not a constituent, so no public entities are accountable to you. I work in social services for a nonprofit agency. We are seeing a huge challenge as state agencies move access to information and to public benefits to online only. While public policy makers are aware of the digital divide, they are effectively writing off the impact that these decisions will have on people who are already on the margins.
Native American tribes are still affected by the digital divide! Yet we’re adopting mobile at very high rates. But you can’t fill out a form on a mobile phone very easily. Therein lies the divide.
Big Data is only a buzzword, and the size of the data is relative to our current capacity to analyze it. Big Data is essentially information for companies to use in predicting customer needs, and, as always happens, new jobs and titles are created to accompany this data revolution. Data scientists and analysts are a growing trend right now. However, this article is centered more on the social inequalities of the poor not having a computer readily available. If we move our democratic society online, we are essentially taking away not just the privilege of being counted but the right to count. In the world of statistics, there are ways to fill in the gaps and provide for missing data…as long as we know it’s missing.