As part of our Metrics that Matter special series, we’ve been exploring the importance of certain media metrics to newsrooms on an institutional level. This post takes a step back and looks at the larger picture, tackling the issue of diversity in media coverage as a whole. This post, written by Chartbeat’s Sonya Song, is an abridged version of an original white paper. Read the full report here.
Many scholars believe that consuming diverse media content helps us check our own thoughts and listen to others, and thus keeps communities and nations from falling apart. Being diverse means we read about both sports and politics, scan both left- and right-leaning media outlets, and visit both domestic and international news websites. It’s easier said than done, especially when search engines and social media use various algorithms to cater to everyone’s preference and comfort.
Witnessing editorial strategies manipulated by technology, some intellectuals are worried that the public’s media diet is bounded by “filter bubbles” and “echo chambers.” And their worries are supported by data. For example, in the US, conservatives mostly rely on Fox News as their lone source of information, whereas liberals have less exposure to opposing views on Twitter.
What’s happening in parallel with this parochial media diet is a trend of increasing political polarization in the US over the last decade, as reported by Pew Research. In addition, Ethan Zuckerman at the MIT Media Lab stressed that most nations see under 10% of Internet traffic from overseas websites, and language isn’t the only barrier – Brits and Indians don’t read each other’s content, for example, but neither do neighboring Spanish-speaking South American national audiences.
On the other hand, despite widespread concern, some scholars are convinced that the undesirable effect of media segregation is fairly limited and mitigated, and that social media may even help reduce partisan tendencies. Comparing online to offline communication, Gentzkow and Shapiro find that ideological segregation of online news consumption is significantly lower than that of face-to-face interactions with neighbors, coworkers and family members, although it’s higher than that of most offline news consumption. Hence, they conclude that there’s “no evidence that the Internet is becoming more segregated over time.” At the same time, scholars across varied disciplines try to develop and test ways of diversifying information exposure.
To better understand the relationship between media diversity and traffic referrers, we analyzed data from our network of publishers and obtained a full picture of the diversity of traffic driven by Google, Facebook and Twitter. It’s the first time, we believe, that collective behavior of media consumption is revealed at a large scale. The following is a summary based on our research findings. For more information, please read our full report.
What We Did
We examined two types of referrers, search and social, and chose three particular referrers as representatives: Google, Facebook and Twitter. We selected these three sources because Google drives the majority of search traffic, while Facebook and Twitter drive the majority of social traffic. All other referral sources remain to be studied.
Note: We excluded Google News and Google Plus from the “Google” referrers because we wanted to look into the search-driven traffic alone. These two referrers were not analyzed in this study.
We designed this study to span a considerable period to allow enough time to benchmark the patterns of media consumption. This span also contained several types of events, which let us observe media consumption patterns around them. Against a baseline of eventless weekdays, we paid special attention to three types of scenarios:
- Breaking events (the Paris attacks and San Bernardino shooting)
- Scheduled events (Democratic and Republican presidential debates)
- Holidays (Thanksgiving and Christmas)
We drew a random sample of about 700 websites from our database; across these websites, 70 million unique URLs were visited in November and December of 2015. For each visit, our data marked which URL was visited and which upstream website, or referrer, brought people to this URL. We aggregated the data by referrer and measured each referrer in three ways: volume, variety and diversity.
As a brief explanation of the intuition behind quantifying diversity, suppose Facebook brings one visit to a particular web page. That means the social traffic referred by Facebook is 1 pageview and the diversity is zero (no diversity at all). In the next hour, this web page becomes viral on Facebook and attracts 1,000,000 visits, but at the same time Facebook doesn’t send traffic to any other web page at all. In this extreme case, the social traffic from Facebook is 1,000,000 pageviews but the diversity remains zero. However, if each of those 1,000,000 pageviews went to separate articles, the traffic would be maximally diverse.
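The intuition above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the function name `referrer_metrics` is hypothetical, variety is assumed to mean the number of distinct URLs visited, and the entropy is assumed to use base-2 logarithms (bits).

```python
import math
from collections import Counter

def referrer_metrics(urls):
    """Return (volume, variety, diversity) for one referrer's list of visited URLs.

    Assumptions for this sketch: volume is the total number of pageviews,
    variety is the number of distinct URLs, and diversity is the Shannon
    entropy (in bits) of how pageviews are spread across those URLs.
    """
    counts = Counter(urls)
    volume = sum(counts.values())
    variety = len(counts)
    # Shannon entropy: sum over URLs of p * log2(1 / p), where p is the
    # share of this referrer's pageviews that went to the URL.
    diversity = sum((c / volume) * math.log2(volume / c) for c in counts.values())
    return volume, variety, diversity

# One page gets every visit: lots of volume, zero diversity.
viral = referrer_metrics(["/story-a"] * 1000)

# Every visit lands on a different page: maximal diversity for that volume.
spread = referrer_metrics(["/story-a", "/story-b", "/story-c", "/story-d"])
```

In this sketch, `viral` has a volume of 1,000 but a diversity of 0, while `spread` has a volume of only 4 but a diversity of 2 bits, the maximum possible for four pages.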
Throughout the two-month study period, we identified pronounced and compelling patterns in media consumption. We’ll first examine diversity and then volume and variety.
Figure 1 shows how diversity varied over time and around major events. The x-axis denotes time with light-gray dashed lines marking days and dark-gray ones marking weeks. The y-axis denotes diversity as calculated by Shannon Entropy. The three types of events are indicated by different colors: breaking events in brown, presidential debates in purple and holidays in red. Google, Facebook and Twitter are indicated using orange, green and blue. This visual theme is used throughout this report.
Here’s a summary of our findings:
- People generally receive more diverse content through Google searches than through social media platforms such as Facebook and Twitter. During major events like the Paris attacks, people rush to a few brand-name media outlets for updates and this kind of news consumption appears the least diverse for all the referrers. That means although more people seek information online after major events, they may be funnelled to a limited number of media outlets. The extremely concentrated news consumption persists a few hours after major events and gradually recovers in the next few days.
- On an eventless day, diversity follows a predictable daily cycle for the traffic driven by Google and Facebook, and less so for Twitter. Look at Figure 1, or Figure 2 for an enlarged view. We can see that diversity runs in opposite directions for Google- and Facebook-driven traffic: Diversity tends to dive lower in the middle of the day for the former and climb higher for the latter. It’s probably because, as a lone activity, search becomes more diverse when people are away from social settings, whereas more content is infused into online social venues when people are indeed connected.
- On Thanksgiving and Christmas, all three measurements (volume, variety and diversity) fall for online traffic, which means that on holidays fewer web pages are offered by media outlets and fewer are visited by their audiences.
- Around presidential debates, we didn’t observe noticeable abrupt changes in diversity.
While search traffic was more diverse than social traffic, the correlation between referral sources and diversity doesn’t imply causality, because search engines and social media are used for differing purposes in various scenarios. In other words, it’s inappropriate to say that searching Google for news would better diversify someone’s media diet than relying on shared news on social media. Likewise, using social media for news wouldn’t necessarily limit one’s view if she also adopts other online and offline channels for information.
In addition, Figure 1 shows that Google traffic has neat daily cycles of diversity, moving up in the morning and down in the afternoon, whereas Facebook and Twitter show less definite cyclic patterns in diversity.
The reason may again lie in solitary behavior as opposed to collective behavior. Search queries tend to be executed regularly by individuals like everyday routines, e.g., dinner, sleep, commute, etc., and aren’t greatly influenced by others’ behaviors. Social media, on the other hand, serve crowds to consume and share content together as well as to echo content back and forth with algorithmic manipulation. As a result, content gains momentum in traffic more quickly and more often on social media than through search engines.
Figure 2 zooms into the two breaking events and shows more subtle differences across the three referrers.
- Google search showed more dramatic rises and falls than other referral sources, presumably because after an event breaks, people are motivated to search for information proactively rather than wait for others to share something on social media. For search, many people habitually go to Google. Although social media enable search features as well, this subset of information may be insufficient and indirect in times of emergency.
- Although Twitter drives much less traffic than Facebook in general, after the events in San Bernardino and Paris occurred, Twitter had a deeper dip in diversity than Facebook. A possible explanation is that, until recently, Twitter didn’t sort messages, so information moved faster there than on Facebook. As a result, the heavy traffic on Twitter quickly sorts out and surfaces a small collection of web pages and reduces the diversity of media consumption. A further note is that news media and journalists have a strong influence on Twitter, and their voices may muffle others, thereby making the information on the platform less diverse in times of an emergency.
Examining Volume and Variety
As discussed earlier, diversity takes into account two factors of media consumption: variety and distribution. Therefore, the diversity dips during the two breaking events can be induced by either factor or both: (a) the variety drops, i.e., a smaller amount of content is provided by media outlets and circulated among Internet users, or (b) the distribution becomes substantially skewed, i.e., a small amount of content attracts incomparably more attention than the remaining majority.
Like diversity, on eventless days volume and variety follow a predictable daily cycle, rising in the first half of the day and falling in the second half. During the two major events in Paris and San Bernardino, volume spiked tremendously, indicating many people were motivated to seek information.
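To make the two routes to lower diversity concrete, here is a small Python sketch. The pageview numbers are made up for illustration and are not from our data, and the helper name `shannon_entropy` is hypothetical; it assumes base-2 logarithms.

```python
import math

def shannon_entropy(pageviews):
    """Shannon entropy (in bits) of a list of per-page pageview counts."""
    total = sum(pageviews)
    return sum((c / total) * math.log2(total / c) for c in pageviews if c > 0)

# Illustrative numbers only: 80 pageviews in every scenario.
baseline = [10] * 8        # 8 pages share attention equally
fewer_pages = [20] * 4     # (a) variety drops: only 4 pages, still shared equally
skewed = [73] + [1] * 7    # (b) same 8 pages, but one page takes almost everything

baseline_bits = shannon_entropy(baseline)
fewer_pages_bits = shannon_entropy(fewer_pages)
skewed_bits = shannon_entropy(skewed)
```

Both `fewer_pages_bits` and `skewed_bits` come out lower than `baseline_bits` even though the volume is identical, which is why a diversity dip alone can't tell us whether variety, distribution or both changed.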
Around Thanksgiving and Christmas, volume and variety fell in addition to diversity, indicating that during holidays media production slows down and audiences direct their attention elsewhere as well. During the presidential debates, it’s hard to tell how volume and variety changed by eyeballing the raw data.
Trend of variety
One reason it’s hard to detect variety changes around events is that variety has both daily and weekly ups and downs (see Figure 4). A change in variety could simply correspond to this pattern rather than reflect the impact of an event. In other words, to observe the impact of an event, if any, we need to tease out the daily and weekly cycles first.
Recall our previous discussion on diversity: what caused the diversity drops during the two breaking events? Could it be volume or variety? After removing daily and weekly trends, we found that both factors contributed to the changes: as the variety fell and the volume rose, most attention must have gone to only a few web pages on a few websites. In other words, the distribution of attention became vastly skewed. The destinations during these events are not hard to guess: BBC, CNN and NYT for people speaking English, and Le Monde, Der Spiegel and La Nacion for people speaking French, German and Spanish.
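One simple way to tease out such cycles is to subtract, from each hourly value, the average for that same hour of the week. The Python sketch below takes that approach on synthetic data. It is an assumption made for illustration, not necessarily the method used in the full report, and the function name `remove_weekly_profile` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

HOURS_PER_WEEK = 24 * 7

def remove_weekly_profile(hourly_values):
    """Subtract each hour-of-week slot's average from an hourly series.

    Averaging per hour-of-week slot (168 slots) captures the daily and the
    weekly cycle at once; what is left over is the deviation from a typical week.
    """
    slots = defaultdict(list)
    for i, value in enumerate(hourly_values):
        slots[i % HOURS_PER_WEEK].append(value)
    profile = {slot: mean(values) for slot, values in slots.items()}
    return [value - profile[i % HOURS_PER_WEEK] for i, value in enumerate(hourly_values)]

# Synthetic series: four weeks with a morning bump and a weekday bump.
series = []
for h in range(4 * HOURS_PER_WEEK):
    weekday_bonus = 5 if (h // 24) % 7 < 5 else 0
    morning_bonus = 10 if h % 24 < 12 else 0
    series.append(100 + weekday_bonus + morning_bonus)

# Inject one made-up "event" in the third week.
spike_hour = 2 * HOURS_PER_WEEK + 30
series[spike_hour] += 50

residual = remove_weekly_profile(series)
```

In the residual, the regular ups and downs cancel out and the injected event at `spike_hour` is the largest value. A caveat of this simple approach is that the event itself pulls up the average for its slot, so its size is somewhat understated.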
What factors shape diversity of news consumption?
Diversity of media consumption varies every hour and every day and this variation is shaped by all the players in the media ecosystem: audience, publishers and mediators, such as search engines and social media.
On the individual level, news consumption is mainly determined by personal interests and preferences. Because people’s interests are diverse and because their search inquiries are more varied and specific than what they encounter passively on social media, search-driven traffic would be expected to be more diverse than social-driven traffic.
On the collective level, several theories help explain why social-driven traffic is less diverse than search-driven traffic. First, people tend to stay in like-minded communities and engage in selective exposure that echoes their own beliefs.
At the same time, social media users have remarkably differing levels of potential to propagate information. For example, an average Twitter user has about 60 followers, while the top 1% of users, who are mostly celebrities, opinion leaders and marketers, have over 30,000 followers and are therefore able to deliver their messages more widely than average users. As a result, minority voices are muffled and the overall information diversity is reduced on social media.
Moreover, tragic news is more likely to go viral than uplifting news, because negative information draws more attention than positive information. People are urged by their innate need for information to keep themselves safe and informed during crises. This phenomenon can be explained by negativity bias.
In addition, collective behavior is formed differently around major events. During those moments, Twitter users tend to retweet more than reply to each other, and messages from celebrities and news media often get circulated the most, so information diversity declines. Outside of social media, collective information seeking is spontaneous rather than organized, as with the peaks in search queries on Google during breaking news.
From a technological perspective, media materials are ranked and promoted by search engines and social media using homegrown algorithms, and people’s selective exposure is inevitably amplified by these sorting algorithms.
This is because algorithms reinforce rich-get-richer dynamics, a phenomenon in which people or things with more advantage accumulate advantage more easily than those with less advantage. For example, once a song climbs onto the Billboard Top 50, it’ll accrue additional plays more quickly than those songs off the chart, and therefore, tend to stay a long time on the chart.
From a financial perspective, larger publishers have more resources to provide more and better content to their audiences than smaller ones, thereby attracting disproportionately more traffic. This attention gap is especially striking around major events, as we have seen in Figure 1, which indicates how attention is shared among news media.
This phenomenon can be explained using the “umbrella model,” a theory proposed in the 1970s. It states that media content has a one-way flow from larger markets to smaller markets. For example, the New York Times gets circulated in Boulder but the Boulder Daily Camera rarely makes its way to New York. Similarly, US movies are watched in New Zealand but not the other way around.
An explanation for this kind of phenomenon is that larger markets empower content providers with more resources and, therefore, expect better products in general. Likewise, pageviews will never be evenly shared among publishers and larger ones always attract more traffic than smaller ones. Therefore, the unbalanced distribution of financial resources dictates a skewed distribution of online traffic, which sets a ceiling on information diversity that can hardly be broken through.
Implications for newsrooms, small or big
News consumption is substantially concentrated around major events, as shown by our data, and this finding can have immediate managerial and editorial implications for newsrooms. Imagine the moment when a major event like the Paris attacks takes place: online media, television and radio break news in real time, whereas print media need to wait until the next morning to reach newsstands and doorsteps.
Here’s the question: Should local print media cover the breaking event on their front pages? If your answer is yes, consider that many people will have already eagerly read online about what happened the previous night and will simply skip the front-page story about the major event. Does that mean local print media don’t need to cover it on the front page? Not really, because subscribers may complain that the local media has missed a beat, or may have little sympathy for the omission, while non-subscribers would consider themselves to have made a very wise decision in subscribing to the New York Times instead of the local newspaper. As such, would it be a sensible choice to cover the event with minimum resources, reproducing a wire story for instance?
The scenario is much different for larger newsrooms, but there are still puzzles to unravel. As the entire world watches for the live coverage of a breaking event, what would you do with other news stories you had planned to publish? If you publish them, they may get buried by the overwhelming interest in the breaking news. If you don’t, some of your readers may find there isn’t much fresh content to digest on that day.
What’s your experience with major events as a journalist or a reader? Please leave a comment here, join our discussion on Twitter, or reach me at [email protected]. For more elaboration on key concepts and research methods, please read our full report.
Sonya Song is a media researcher working with the data science team at Chartbeat. As a Ph.D. candidate, she focuses her research on media economics, a field that mainly answers the question of how to make money using media content. Her sideline research is the study of media control, especially censorship and propaganda.