Software that writes baseball game stories from box scores and play-by-play information now has a name: StatsMonkey. And it’s making some journalists nervous — needlessly.
The software, the first version of which was developed this spring by a team of computer science and journalism students at Northwestern University, has evolved significantly since then. John Templon and Nick Allen (a "programmer-journalist" attending the Medill School of Journalism on a Knight News Challenge scholarship) were two of the students who worked on the initial version of the software, which has been made available on an open-source basis. John and Nick, both Medill grad students, developed the software with Tian Huang of Medill and Thu Cung, a student in the McCormick School of Engineering and Applied Science.
The software, then called "Machine Generated Sports Stories," was one of five projects developed in an experimental collaboration with the McCormick School’s Intelligent Information Laboratory, or InfoLab. The class brought together students from Medill’s Interactive Innovation Project and from McCormick’s practicum in intelligent information systems. Two professors from Medill (me and Jeremy Gilbert) and two from McCormick (Kris Hammond and Larry Birnbaum) led the collaboration.
If you want to know more about the class and the software the students developed, you can read the class blog, watch the students’ final presentation, or download their comprehensive report that includes recommendations for journalists, media companies and journalism education.
Since June, Nick and John have kept working on the baseball project as paid interns at the InfoLab. They’ve reconstructed the code, built a greater variety of game narratives and begun to incorporate details about trends in player and team performance over time.
Earlier this month, an article in Medill’s alumni magazine brought StatsMonkey to the attention of a lot of journalists. A couple of them didn’t like it:
- "Soon enough, sports reporters could be obsolete," wrote Andrew Greiner at NBCChicago.com.
- Rick Green of the Hartford Courant asked, "… isn’t something lost when the reporter isn’t there at the games, talking to players, paying attention to what’s not said and feeling the mood?"
These weren’t the first journalists to express concerns about StatsMonkey. Back in August, Gregory Hardy of CBSSports.com worried about what might happen "if robot sportswriters take over."
Given the turmoil in the news business these days, it’s understandable that journalists — especially sports journalists — would be nervous about StatsMonkey. But I don’t think sportswriters need to be worried — if StatsMonkey becomes a commercial product, it is highly unlikely to put sports journalists out of work.
To understand why, let’s start by explaining what StatsMonkey actually does:
- It imports the box score and play-by-play information, which are routinely captured for games in professional leagues, college baseball and some lower levels (high school, youth leagues, etc.).
- It uses some baseball-geek stats (leverage index and win probability added) to identify high-stakes at-bats and key plays that significantly change the probability that one team will win.
- It determines a game narrative (for instance, a come-from-behind win or a pitcher’s duel) from these key at-bats and plays.
- It constructs a headline and story from the available game narratives and incorporates key events from the play-by-play.
- It uses historical data about teams and players to add context (for instance, that a particular player’s hit broke a 5-game hitless streak, or that this was the team’s third win in a row).
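To make the steps above concrete, here is a minimal sketch in Python. It is not the StatsMonkey code: every name in it (`Play`, `key_plays`, `classify_narrative`, `write_story`), the thresholds and the sample game are assumptions made up for illustration. It also assumes that a win probability before and after each play is already supplied with the play-by-play, and it leaves out leverage index entirely.

```python
from dataclasses import dataclass


@dataclass
class Play:
    """One play-by-play event (hypothetical structure)."""
    inning: int
    batter: str
    description: str       # verb phrase, e.g. "hit a two-run homer"
    home_wp_before: float  # home team's win probability before the play
    home_wp_after: float   # home team's win probability after the play
    home_score: int        # score after the play
    away_score: int


def wpa(play):
    """Win probability added, from the home team's point of view."""
    return play.home_wp_after - play.home_wp_before


def key_plays(plays, threshold=0.15):
    """Keep plays that swing win probability by at least `threshold` (assumed value)."""
    return [p for p in plays if abs(wpa(p)) >= threshold]


def classify_narrative(plays):
    """Pick one of a small, fixed set of narratives. Assumes the game did not end tied."""
    final = plays[-1]
    home_won = final.home_score > final.away_score
    # Win probability of the eventual winner after each play.
    winner_wp = [p.home_wp_after if home_won else 1 - p.home_wp_after for p in plays]
    if min(winner_wp) < 0.25:  # the winner was once a long shot
        return "comeback"
    if final.home_score + final.away_score <= 3:
        return "pitchers_duel"
    return "routine"


HEADLINES = {
    "comeback": "{winner} rally past {loser}, {w}-{l}",
    "pitchers_duel": "{winner} edge {loser} in pitcher's duel, {w}-{l}",
    "routine": "{winner} beat {loser}, {w}-{l}",
}


def win_streak(results):
    """Count consecutive "W" entries at the end of a list of results."""
    streak = 0
    for result in reversed(results):
        if result != "W":
            break
        streak += 1
    return streak


def write_story(home, away, plays, winner_results=None):
    """Build a headline plus one sentence per key play, biggest swing first."""
    final = plays[-1]
    home_won = final.home_score > final.away_score
    winner, loser = (home, away) if home_won else (away, home)
    w = max(final.home_score, final.away_score)
    l = min(final.home_score, final.away_score)
    headline = HEADLINES[classify_narrative(plays)].format(
        winner=winner, loser=loser, w=w, l=l)
    ranked = sorted(key_plays(plays), key=lambda p: abs(wpa(p)), reverse=True)
    sentences = [f"In inning {p.inning}, {p.batter} {p.description}." for p in ranked]
    # Historical context: mention a streak of two or more wins.
    streak = win_streak(winner_results or [])
    if streak >= 2:
        sentences.append(f"That makes {streak} straight wins for the {winner}.")
    return headline + "\n" + " ".join(sentences)


# Invented sample game: the home Wildcats trail, then win in the ninth.
GAME = [
    Play(2, "Lee", "hit a two-run homer", 0.50, 0.24, 0, 2),
    Play(7, "Ortiz", "doubled in a run", 0.20, 0.30, 1, 2),
    Play(9, "Kim", "hit a two-run walk-off single", 0.30, 1.00, 3, 2),
]
STORY = write_story("Wildcats", "Spartans", GAME, winner_results=["L", "W", "W", "W"])
print(STORY)
```

Even in a toy like this, the limits described in the next paragraph are visible: the program can only choose among the narratives and templates it was given, and it knows nothing that is not in the play-by-play.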
Of course, the program will have a limited number of possible game narratives, and it cannot account for events that don’t show up in the box score or play by play (for instance, the infamous play in a 2003 Chicago Cubs playoff game in which a fan caught a foul ball that might otherwise have been fielded for an out). A StatsMonkey story will be a very poor substitute for richly textured narrative by a professional sportswriter.
But think of a few ways StatsMonkey could add to what professional journalists do:
- It could instantly write a game story as soon as the last out is made, freeing a reporter to go down to the field or the locker room to do interviews.
- It could generate a story about any game in progress, at any point during the game — just what someone might want when checking on a favorite team during the work day.
- It could create stories about games — for instance, college baseball — that are not routinely covered by professional journalists.
- It could generate stories about each player in a game for people who are especially interested in particular players (not hard to imagine for college baseball).
- If Little League coaches start to enter game information through a mobile device (and there already is at least one "app for that"), it could generate game stories about Little League games, which have a passionate following but will never be covered by professional journalists.
Beyond that, even given my background and identity as a journalist, I would have to say to any sportswriter: If your game story CAN be generated by a computer, at some point it WILL be generated by a computer. Human journalists will do — and should do — the kind of reporting and storytelling that computers can’t.
More broadly, StatsMonkey is just a first experiment in identifying formulaic stories that could conceivably be generated by software rather than people. Some other possible examples: corporate earnings reports, obituaries, even accounts of what City Council did last night. As with StatsMonkey, software that generates these kinds of stories most likely wouldn’t replace journalists. The software would create stories that would otherwise not be written, or free up journalists to do more important work that can only be done by humans.
Got any other ideas for topics that would be a good fit for computer-generated stories? Post in the comments below.
If this works, it definitely could free up sports reporters to be more creative and focus their attention away from game summaries. I’ve heard of other programs along these lines. Is anyone familiar with an application that works for football box scores?
While I appreciate your optimism, I fear this computer option won’t free reporters to cover other things. It will free publishers to free reporters from their jobs.
Just think of the salary savings, and computers don’t need insurance or 401(k) matches, either. As a laid-off journalist, I see that as the obvious end result.
The #1 goal of a corporation is to increase profit. Replacing humans with computers is an obvious way to lower wages and increase profits. It’s sad, but the corporations we know and buy from (and now read from) don’t have their employees as first priority. It sux, but a journalist replaced by StatsMonkey is no different than a worker in an automobile plant being replaced by a machine that does automated work.
If the StatsMonkey robot doesn’t take the job of that sportswriter, thousands upon thousands of bloggers writing about the same subject will. One way or the other, sportswriting does not look like a growth industry to me.
I disagree with you about the future for sportswriters (and other journalists). I think the future is bright, but not if your future rests on writing game stories that a computer can do almost as well. Unique content that truly engages sports fanatics, or fanatics on other topics, will be valued by audiences, by the advertisers that want to reach them, or both.
StatsMonkey sounds so cool. Could the code possibly be used to help reporters who write other formulaic stories to get the news out faster, then spend more time digging into the story behind the story?
Good reporting has never been all about the writing. Smart news organizations could use tools like StatsMonkey to get more quality work out of their staffs that doesn’t necessarily equate to column-inches, and I think their staffs would thank them for that. Honestly, what reporter enjoys writing the same old story the same way over and over? In the short period that I wrote cops stories, that was the part I hated most. What I enjoyed most was talking to people and finding out what was really going on (something I’m guessing StatsMonkey can’t do).
I’d hasten to say that if a news organization rates its reporters only based on the volume of their writing, and cuts people who produce fewer words even if they’re doing better actual reporting, that organization is in big trouble for reasons that have nothing to do with robots.
Most forms of reporting in print and “on-the-air” could be done via programs/bots: traffic, accidents, weather, crime, polls, etc.
Sense-making journalism, investigative journalism, local color columnists, editorial writing, etc. and yes, sports columnists, cannot be simulated.
They can be supported by programs/bots that aggregate data and check facts.
Why aren’t programs being created to do the jobs of upper management? Their jobs are so much more about numbers and formulas.
Why does any newspaper even need a media company backing them with presidents and boards of directors and CEOs?
All they do is look at the numbers versus the content, and they are never concerned with quality and what might be good/bad for society versus what makes money.
Surely there is an app for that!