The utopian hype over Big Data is being critiqued on many fronts. After all, it isn’t that new. The Romans and the Nazis amassed huge amounts of data on their populations. And then, of course, there is the creepy, Big Brother aspect.
However, one problem no one seems to be talking about is that Big Data is too small.
That may sound like a contradiction, what with the kajillions of information bytes that make it not just data but “Big” data. Indeed, digital technology has enabled researchers to access, store and analyze unprecedented amounts of data, often online.
But Big Data is too small when it is amassed, scraped and API’ed from the Internet, whether from tweets or Google searches. The problem is the digital divide.
Specifically, people with lower levels of income and education are not accessing or creating online content nearly as much as people with a college degree and a comfortable middle-class lifestyle. That means if journalists, academics and policy makers rely on Big Data analytics, then they are ignoring issues important to many poor and working-class people.
Ignoring the least likely to be online
I first became interested in the implications of this Big Data Gap a few years ago when I read a New York Times article about how the Centers for Disease Control was tracking the flu outbreak by analyzing Google searches. Who is the most vulnerable to the flu? The poor and the elderly. Who is least likely to be online? The poor and the elderly.
An oft-repeated myth is that the digital divide is over. In response, I recently wrote a blog post, “The Seven Myths of the Digital Divide.” In it, I address claims such as the idea that the divide will disappear as older people die out. (Class gaps persist across age groups.) Or that people are leapfrogging over desktops by having cell phones. (Could you write a news article on a cell phone?)
Roughly 20 percent of people in the United States are not online, and many of those who are do not have consistent access. But even among those who are connected, there exists another divide between people who create and participate online and those who do not.
Digital content is paramount. Who is posting to blogs, Twitter or YouTube matters for how representative the Internet is. In my research, I found that among American adults, people with a high school education are less likely than college graduates to create online content across all 10 activities I examined. These people are not contributing to that giant pool of online data. Thus, we have a Big Data Gap.
My research findings point to how some of these gaps matter in my own Big Data analytics. I study social movements, social media and social class. In my current project, I am studying how digital activism matters for about 35 political and social movement organizations in one southern state. The groups, all of which are organizing around one political issue, range from Tea Party groups to rank-and-file unions, as well as university groups and government associations. Three of the most active groups offline have virtually no online presence. One organization does not come up on Google searches. Half of the organizations are not active on Twitter.
So when we study so-called Twitter and Facebook revolutions by analyzing social media data with fancy social network diagrams, who are we studying and who are we ignoring? The stakes go beyond inaccurate descriptions of political events. Overlooking groups that are active offline can also lead to policies that reflect only those with digital bullhorns.
How the data gap influences reporting
The Big Data Gap can also lead to inaccurate reporting by journalists.
From the Arab Spring to Occupy Wall Street, newspapers hailed social media as the spark that ignited these movements. But this is more hype than reality, as it is the Twitter elite who tweet and retweet movements already underway.
On a more day-to-day level, though, because of the transformation of journalism over the last decade, news reporters do not have the flexibility or time to track down sources who are not online. What’s trending on Twitter or other Big Data aggregators is more apt to be used than the agenda for a school board or union meeting. The Knight Foundation, a leading funder of citizen journalism projects, funds “Data and Journalism” projects analyzing Big Data.
Certainly, not all Big Data derive from the digital cloud online. Corporations have a way of tracking most consumers, regardless of whether they have the latest iPad. And long before the dawn of digital technology, journalists and scholars focused on the artifacts of the elite, from 16th-century books to The New York Times.
However, there was little pretense that this was representative of all of society. Users of Big Data and the digital cloud imply that this information does include all “citizens.” I am not arguing that Big Data, therefore, should be discarded. Quite the opposite: it is not only a gold mine of information but also fascinating and fun to analyze. Instead, we need to figure out methodologically sound ways to use information from the digital cloud so that it represents all of society, not just those online, or at a minimum, recognize that it is smaller than we might have thought.
Jen Schradie is a doctoral candidate in the Department of Sociology at the University of California-Berkeley and the Berkeley Center for New Media. She has a master’s degree from the Harvard Kennedy School. Jen studies social movements, social media and social class. Her broad research agenda is to interrogate digital democracy claims in America. Her research on digital production inequality earned her the 2012 Public Sociology Alumni Prize at UC Berkeley. With a National Science Foundation grant, she is studying how the Internet matters for democracy among social movement and labor organizations in the American South. Before entering academia, Jen directed six documentary films, which have screened at more than 25 film festivals and 100 universities. Follow her on Twitter @schradie or go to www.schradie.com.