How Algorithms and Human Journalists Will Need to Work Together

Photo by Arthur Caranta and used here with Creative Commons license.

This post originally appeared on The Conversation.

Ever since the Associated Press automated the production and publication of quarterly earnings reports in 2014, algorithms that automatically generate news stories from structured, machine-readable data have been shaking up the news industry. The promises of this technology – often referred to as automated (or robot) journalism – are enticing: Once developed, such algorithms could create an unlimited number of news stories on a specific topic at little cost. And they could do it faster, cheaper, with fewer errors and in more languages than any human journalist ever could.

This technology provides an opportunity to make money creating content for very small audiences – even, perhaps, customized news feeds for an audience of just one person. And when it works well, readers perceive the quality of automated news as on par with news written by human journalists.

As a researcher and developer of automated journalism, I’ve found that computerized news reporting offers key strengths. For example, it can analyze patterns in large amounts of data far more quickly than humans can. I’ve also identified weaknesses that underscore why human journalists remain essential: their ability to make judgment calls about what is newsworthy is irreplaceable.

Identifying automation’s abilities

In January 2016, I published the “Guide to Automated Journalism,” which reviewed the state of the technology at the time. It also raised key questions for future research, and discussed potential implications for journalists, news consumers, media outlets and society at large. I found that, despite its potential, automated journalism is still in an early phase.

Right now, automated journalism systems are serving specialized audiences, large and small, with very particular information, producing recaps of lower-league sports events, financial news, crime reports and earthquake alerts. The technology is constrained to these types of tasks because there are limits to what sorts of information it can take in and process into text that humans can easily read and understand.

Automated journalism works best with structured data that is accurate, such as stock prices. In addition, algorithms can only describe what happened, not why, which makes them best suited to routine stories based solely on facts that leave little room for uncertainty and interpretation, such as when and where an earthquake happened. And because the major benefit of computerized reporting is that it can do repetitive work quickly and easily, it is best used to cover topics that require a large number of similar stories, such as sporting event reports.
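To make the idea concrete, here is a minimal sketch of template-based text generation in Python. It is an illustration under stated assumptions, not a description of any production system: the record fields, the sentence template and the sample earthquakes are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Quake:
    """One structured record; the field names are hypothetical."""
    magnitude: float
    place: str
    date: str
    time_utc: str


# The template states only what happened, never why.
TEMPLATE = (
    "A magnitude {magnitude:.1f} earthquake struck {place} "
    "on {date} at {time_utc} UTC."
)


def write_story(quake: Quake) -> str:
    """Fill the fixed sentence template with values from one record."""
    return TEMPLATE.format(
        magnitude=quake.magnitude,
        place=quake.place,
        date=quake.date,
        time_utc=quake.time_utc,
    )


# Invented sample records, for illustration only.
quakes = [
    Quake(4.2, "10 km north of Exampleville", "March 3", "14:05"),
    Quake(5.0, "off the coast of Sampleton", "March 4", "02:30"),
]
stories = [write_story(q) for q in quakes]
```

The same template yields one story per record, which is why the approach suits large volumes of similar, purely factual reports. Anything the template does not anticipate is simply left unsaid.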

Covering elections

Another useful area for automated news reporting is election coverage – specifically regarding results of the numerous polls that come out almost daily during major campaigns. In late 2016, I teamed up with fellow researchers and the German company AX Semantics to develop automated news based on forecasts for that year’s U.S. presidential election.

The forecasting data were provided by the PollyVote research project, which also hosted the platform for publishing the resulting texts. We established a completely automated process, from collecting and aggregating the raw forecasting data, to exchanging the data with AX Semantics and generating the texts, to publishing those texts.

Over the course of the election season, we published nearly 22,000 automated news articles in English and German. Because they came from a fully automated process, the final texts often had errors, such as typos or missing words. We also had to spend much more time than we had expected troubleshooting problems. Most of the issues came from errors in the source data, rather than the algorithm – highlighting another key challenge of automated journalism.
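Since most problems traced back to the source data, one plausible safeguard is to check each record before any text is generated. The sketch below is hypothetical: the field names (`pollster`, `candidate_a_pct`, `candidate_b_pct`) and the specific checks are assumptions made for illustration, not the checks used in the project described here.

```python
def find_problems(record: dict) -> list:
    """Return a list of human-readable problems found in one poll record."""
    problems = []
    if not record.get("pollster"):
        problems.append("missing pollster")

    shares = []
    for field in ("candidate_a_pct", "candidate_b_pct"):
        value = record.get(field)
        if not isinstance(value, (int, float)):
            problems.append(f"{field} is missing or not a number")
        elif not 0 <= value <= 100:
            problems.append(f"{field} is outside 0-100")
        else:
            shares.append(value)

    # Two valid shares cannot add up to more than the whole electorate.
    if len(shares) == 2 and sum(shares) > 100:
        problems.append("shares sum to more than 100")
    return problems


# Invented records, for illustration only.
good = {"pollster": "Example Poll", "candidate_a_pct": 46.0, "candidate_b_pct": 42.0}
bad = {"pollster": "", "candidate_a_pct": 460, "candidate_b_pct": None}
```

A record with an empty problem list can move on to text generation. Anything else can be held back for a person to review, which keeps a data-entry slip from being published.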

Finding the limits

The process of developing our own text-generating algorithms taught us firsthand about the potential and limits of automated journalism. It’s crucial to make sure the data are as accurate as possible. It is easy to automate the process of creating text from a single set of facts, such as the results of a single poll. But adding insights, like comparing that poll to others in the past, is much harder.

Perhaps the most important lesson we learned was how quickly we reached the limits of automation. When developing the rules governing how the algorithm would turn data into text, we had to make decisions that might seem easy for people to make – such as whether a candidate’s lead should be described as “large” or “small,” and what signals could suggest a candidate had momentum in the polls.

Those sorts of subjective decisions are very hard to formulate as predefined rules that hold for every situation that has occurred in the past – much less for any situation that might occur in future data. One reason is that context matters: A four-point lead for Clinton in the run-up to the election, for example, was normal, whereas a four-point lead for Trump would have been big news. The ability to understand that difference and interpret the numbers accordingly is crucial to serving readers. It remains a barrier that algorithms will have a hard time overcoming.
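The difficulty becomes clearer when one tries to write such a rule down. The following Python sketch words the same margin differently depending on how far it departs from recent polls. It is a hypothetical illustration, not the rule set used in the project: the function name, the candidate labels and the three-point threshold are all assumptions, and choosing that threshold is exactly the kind of judgment call discussed above.

```python
def describe_lead(margin: float, recent_margins: list, notable_shift: float = 3.0) -> str:
    """Describe a poll margin (candidate A minus candidate B, in points)
    relative to the average margin of recent polls."""
    if not recent_margins:
        raise ValueError("recent_margins must contain at least one poll")
    baseline = sum(recent_margins) / len(recent_margins)

    if margin == 0:
        headline = "The candidates are tied"
    else:
        leader = "Candidate A" if margin > 0 else "Candidate B"
        unit = "point" if abs(margin) == 1 else "points"
        headline = f"{leader} leads by {abs(margin):g} {unit}"

    # notable_shift is an arbitrary, hand-picked threshold (an assumption).
    if abs(margin - baseline) >= notable_shift:
        return headline + ", a notable departure from recent polls."
    return headline + ", in line with recent polls."


# Invented margins, for illustration only.
recent = [4, 5, 3]
usual = describe_lead(4, recent)
surprise = describe_lead(-4, recent)
```

Here a four-point lead for the candidate who usually leads is reported as routine, while the same lead for the other candidate is flagged as a departure. The rule still cannot say why the shift happened or whether it matters, and it breaks down as soon as the baseline itself is misleading.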

But human journalists will have a hard time outcompeting automation when covering routine and repetitive fact-based stories that merely require a conversion of raw data into standard writing, such as sports recaps or company earnings reports. Algorithms will be faster at identifying anomalies in the data and generating at least first drafts of many stories.
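As one illustration of what identifying anomalies can mean in practice, the sketch below flags values that sit far from the average of a series. It uses a simple z-score rule, which is only one of many possible techniques and is not claimed to be what any newsroom system uses. The threshold of two standard deviations and the sample figures are assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(values: list, threshold: float = 2.0) -> list:
    """Return the indices of values lying more than `threshold`
    sample standard deviations away from the mean."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two values
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Invented figures, for illustration only.
figures = [10, 11, 9, 10, 12, 10, 11, 30]
unusual = flag_anomalies(figures)
```

With these figures, only the last value is flagged. A flag like this says that a number is unusual, not that it is newsworthy, so it is best treated as a prompt for a journalist to take a closer look.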

All is not lost for the people, though. Journalists have plenty of opportunities to take on tasks algorithms cannot perform, like putting those numbers in proper context – as well as providing in-depth analyses, behind-the-scenes reporting and interviews with key people. The two types of coverage will likely become closely integrated, with computers using their strengths and the humans focusing on ours.

Andreas Graefe is the endowed Sky Research Professor at Macromedia University.


Graefe’s work has been published in journals such as Public Opinion Quarterly, International Journal of Forecasting, Journal of Behavioral Decision Making, Journal of Business Research, Electoral Studies, and BMC Medical Informatics and Decision Making. From 2012 to 2014, he led the research focus Forecasting Politics at LMU’s Center for Advanced Studies. He also serves as an Associate Editor for Foresight – The International Journal of Applied Forecasting. Graefe studied economics and information science at the Universities of Regensburg and Zurich and received his PhD in economics from the University of Karlsruhe. He has held research positions at the Institute for Technology Assessment and Systems Analysis at Karlsruhe Institute of Technology as well as visiting scholar positions at the University of Pennsylvania’s Wharton School and Columbia University. After finishing his PhD, he worked in the private sector as a senior manager for the German pay-TV company Sky Deutschland, where he led the CRM Resource Management Department.
