How Newsrooms Should Best Handle Public Data

    by Ryan Graff
    February 26, 2013

    This post was written by Ryan Graff of the Knight News Innovation Lab and originally appeared on the Lab’s blog.

    Over the past few months something unusual has happened to public data projects: They’ve made national headlines.

    For journalists, the best-known project was the gun permit holder map that the Journal News in White Plains, N.Y., published late last year, featuring the names and addresses of all registered gun owners in two New York counties.



    The map was controversial and inspired journalists and journalism pundits to weigh in on the project’s virtues and faults before it was ultimately removed late last month.


    The controversy — especially in light of recent and proposed legislation — got us thinking about how newsroom developers should best handle public data. What solutions are best suited to deal with data that is potentially invasive? Are there differences when dealing with data online versus in print? And what repercussions might news organizations face following controversial publishing of public data?


    At its core, publishing data requires editorial judgment not all that different from the judgment journalists have honed in print over the past few decades.

    “Every data set is like a human source,” said Derek Willis, an interactive news developer at The New York Times. “You weigh whether to publish the information you get from it, in what context and to what end.”

    Still, there are some unique questions that come with data and digital distribution.

    Of course there’s the issue of permanence. Stories and data last much longer online than they do in print and have the potential to follow the people mentioned in the data for years to come with potentially negative effects.

    There’s also the issue of accuracy. Rich Gordon, a Knight Lab co-founder and former digital director at the Miami Herald, recalls working on projects for the print edition of the Herald years ago in which every line of data printed was double-checked by a person.

    Online, Gordon contends, there’s a greater tendency to present all data of a particular set. That tendency allows for much more depth, but the volume makes it difficult to double-check for accuracy.

    It also represents a shift from looking for stories within data to data being the story. That shift isn’t necessarily problematic, but it does make journalists less likely to find mistakes or inaccuracies in the data.

    “Because we spent a lot of time with the data in search of the news before we published, we were more likely to find trouble with the data,” Gordon said.

    In fact, data accuracy was one of the reasons the Journal News cited for taking the map down, according to the publisher’s note that announced the removal. It also appears to be one of the reasons cited for a similar database the Roanoke Times introduced and subsequently removed back in 2007, according to a note from the publisher of that paper.

    More challenging than mere accuracy, at least for large data sets, is that what’s accurate one day might be inaccurate the next. That, again, was a factor in the Journal News’ decision to pull down the map.

    The changing nature of accuracy was one of the key concerns the New Products Development Team at the Tampa Bay Times faced when developing a mug shot site back in 2009, said Matt Waite, who was part of the team and today is a journalism professor at the University of Nebraska.

    “You have to ask yourself, how long is your data valid,” Waite said, “how long is it good?”

    Waite and his colleagues didn’t have a reliable answer to that question when it came to arrest records. Though the record of each arrest was unlikely to change, it seemed unfair to publish an arrest record and then neglect to follow the case through the court system simply because it was a challenge they couldn’t handle programmatically, he said.

    Out of concern for privacy and a desire to avoid building a “background check tool” for the Tampa area, the team decided to take some steps to protect the privacy of those arrested.

    Waite and his team told scrapers not to scrape the mug shot pages. They did it twice, in fact: first in the site’s robots.txt file and again in the HTML of the individual pages. Then they came up with a way to house the names of arrestees in the JavaScript for each page, a non-standard way to handle names and one not likely to be picked up by bots, Waite said.
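
    For developers who want to see what those two signals look like, here is a minimal Python sketch. It is not the Tampa Bay Times’ code: the `/mugshots/` path, the example.com URLs and the function names are all hypothetical. It builds a robots.txt rule, produces the page-level meta tag that repeats the request in the HTML, and uses the standard library’s robots.txt parser to check a URL the way a well-behaved bot would.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path prefix for the pages that should stay out of crawlers.
MUGSHOT_PREFIX = "/mugshots/"


def build_robots_txt(prefix: str = MUGSHOT_PREFIX) -> str:
    """Return a robots.txt body asking all crawlers to skip `prefix`."""
    return f"User-agent: *\nDisallow: {prefix}\n"


def noindex_meta_tag() -> str:
    """Return the tag that repeats the request inside each page's HTML."""
    return '<meta name="robots" content="noindex, nofollow">'


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Check a URL against robots.txt the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

    Both signals are requests, not locks. Compliant crawlers honor them, but a scraper that ignores robots.txt can still fetch the page, which may be why the team added the extra step of moving names into JavaScript.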

    As a final protection both against publishing inaccurate data and against creating an undue burden on those arrested, all photos are deleted from the site after 60 days.
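
    A retention rule like that one is simple to express in code. The sketch below assumes each record carries a `published_at` timestamp; that field name and the `purge` helper are illustrative, not taken from the Times’ site. Only the 60-day window comes from the team’s description.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=60)  # the window described in the article


def expired(published_at: datetime, now: datetime) -> bool:
    """True once a record has been public for longer than the window."""
    return now - published_at > RETENTION


def purge(records: list[dict], now: datetime) -> list[dict]:
    """Keep only the records still inside the retention window."""
    return [r for r in records if not expired(r["published_at"], now)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"id": 1, "published_at": now - timedelta(days=10)},
        {"id": 2, "published_at": now - timedelta(days=61)},
    ]
    print([r["id"] for r in purge(sample, now)])
```

    Run on a schedule, a job like this makes expiry automatic instead of relying on someone remembering to take old records down.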

    Waite’s team demonstrates that data and the potential invasion of private citizens’ privacy are challenges, but not insurmountable ones. Creativity allows you to illustrate a story without trampling on potential privacy concerns.

    “You have options as a developer,” said Ben Welsh, a database producer at the Los Angeles Times.

    For example, the Memphis Commercial Appeal publishes a database of handgun carry permit holders, including full name, city, and ZIP code. The information provided is not all that different from what the Journal News presented.

    The difference is that the database is searchable and doesn’t ever appear in one piece. It works well and searches return broad results. When I enter “Graff” in the last name field, for example, it returns not only exact matches, but also Pendergraff and DeGraffenreaid.
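
    The search behavior described above can be approximated in a few lines of Python. This is a sketch under assumptions, not the Commercial Appeal’s implementation: the rows are placeholders, and the minimum query length and result cap are my own additions, meant to keep a single query from returning the whole data set.

```python
# Placeholder rows for illustration only; not real permit data.
PERMITS = [
    {"last_name": "Graff", "city": "Exampleville", "zip": "00000"},
    {"last_name": "Pendergraff", "city": "Exampleville", "zip": "00000"},
    {"last_name": "DeGraffenreaid", "city": "Exampleville", "zip": "00000"},
    {"last_name": "Smith", "city": "Exampleville", "zip": "00000"},
]

MIN_QUERY_LENGTH = 3  # assumption: reject queries short enough to match everyone
MAX_RESULTS = 50      # assumption: cap how much one search can return


def search_last_name(query: str, rows: list[dict] = PERMITS) -> list[dict]:
    """Case-insensitive substring match on last name; never a full dump."""
    q = query.strip().lower()
    if len(q) < MIN_QUERY_LENGTH:
        return []
    matches = [row for row in rows if q in row["last_name"].lower()]
    return matches[:MAX_RESULTS]
```

    Searching for “Graff” here returns the Graff, Pendergraff and DeGraffenreaid rows, while an empty query returns nothing at all.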

    Another potential solution is to carefully choose what data to publish, which is exactly what the Commercial Appeal did with its decision to publish ZIP codes, but not addresses.
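
    In code, that choice can be made explicit with a list of fields that are allowed to leave the newsroom. The sketch below is illustrative: the field names and the sample record are invented, and only the decision to publish name, city and ZIP code but not street address comes from the example above.

```python
# Fields cleared for publication; anything not listed here stays private.
PUBLIC_FIELDS = ("full_name", "city", "zip")


def redact(record: dict, allowed: tuple = PUBLIC_FIELDS) -> dict:
    """Return a copy of `record` containing only the allowed fields."""
    return {key: record[key] for key in allowed if key in record}


if __name__ == "__main__":
    raw = {
        "full_name": "Jane Doe",             # invented sample record
        "street_address": "1 Example St",
        "city": "Exampleville",
        "zip": "00000",
    }
    print(redact(raw))
```

    An allow-list is used here rather than a block-list so that any new column added to the source data later stays unpublished until someone decides otherwise.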

    And therein lies the challenge — “threading the needle,” as Welsh said. The idea is to provide enough information to be useful, but not so much that you’re invading the privacy of ordinary citizens.


    The consequences for journalists and others in the news industry who publish data in ways the public deems reckless are real. Lawmakers in New York passed legislation soon after the Journal News’ map that allowed permit holders to request confidentiality. Just last week, a group of lawmakers in Maine tried to pressure the Bangor Daily News into withdrawing a request for concealed weapon permits.

    Also last week a Florida lawmaker introduced a bill that would require all websites to remove mug shots within 15 days of being notified that an arrest did not result in conviction. The bill was reportedly inspired by a so-called extortion mug shot site, but makes no distinction between those sites and traditional news sites.

    “If news organizations want to separate themselves from the mug shot racket they need to be conscientious about how they handle public data,” Waite said.

    It’s a good lesson, and the kind of backlash journalists can avoid with some creativity and, perhaps, restraint.

    The real key in publishing data, as in other journalism, is to add context and nuance to it.

    “To republish something without any insight or analysis is a low form of journalism,” Welsh said. “With data, as with all things in journalism, we should strive not to be stenographers.”

    Ryan Graff joined the Knight News Innovation Lab in October 2011. He previously held a variety of newsroom positions — from arts and entertainment editor to business reporter — at newspapers around Colorado before moving to magazines and the web. In 2008 he won a News21 Fellowship from the Carnegie and Knight foundations to come up with innovative ways to report on and communicate the economic impact of energy development in the West. He holds an MSJ from the Medill School of Journalism and a certificate in media management from Northwestern’s Media Management Center. Immediately prior to joining the Lab, Graff led marketing and public relations efforts in the Middle East.

    Image posted to Flickr by Steven Vance.


    The Knight Lab is a team of technologists, journalists, designers and educators working to advance news media innovation through exploration and experimentation. Straddling the sciences and the humanities the Lab develops projects, prototypes and innovative bits of code that help make information meaningful, and promote quality journalism, storytelling and content on the internet. The Knight Lab is a joint initiative of Northwestern University’s Robert R. McCormick School of Engineering and Applied Science and the Medill School of Journalism. The Lab was launched and is sustained by a grant from the John S. and James L. Knight Foundation, with additional support from the Robert R. McCormick Foundation and the National Science Foundation.

