Interviewing is a skill. I heard this repeatedly as a college student, and I’ve heard it since at journalism conferences and in newsrooms. Not all of us are born with the ability to ask penetrating questions or to extract crucial details from sources. Like any skill, interviewing takes practice and preparation.
The same is true for interviewing data.
Journalists who wouldn’t consider themselves “data journalists” already have the necessary foundation for asking good questions of data. Just as you would background a person you were interviewing for a story, data has its own history. Interviewing data takes some practice, but the benefits for journalists are plentiful. You’ll learn how to find the holes in data (and there are always holes), how to avoid misinterpretations and how to find better stories.
Chances are that you already have the basic tools in place to interview data: your eyes and a spreadsheet. Although we ask computers to do a lot for us these days, interviewing data involves a lot of hands-on work. I recommend a basic process for interviewing data, and the pattern will be familiar to you if you’ve ever prepared for an interview. The process is a general one, but if you want to try it out with actual data, try this file of arrest data from Fairfax County, Virginia. I used that data for a talk at the Journalism/Interactive conference, and University of Florida professor Mindy McAdams put together a very helpful guide based on that talk.
- Preparing for the Data: Working with data isn’t like a person-on-the-street interview. You have to know what you’re asking for when you request data. Just as you would read other interviews and stories about a person you’re set to interview, with data you can look for other stories or research papers that use the specific data, noting any caveats or problems they mention. You should also try to speak to people who maintain or use the data and obtain any documentation about it. In other words, don’t treat this as a blind date: know as much about the data as you can before you actually meet it.
- The Introduction: As with any personal interview, even the best preparation doesn’t guarantee that what you expect to happen will happen. The first encounter with data is the time to find out whether your expectations are in line with what’s in front of you. Typically that means opening the data in a spreadsheet like Excel or Google Sheets, or maybe looking at it in a text editor. Since this isn’t a real person looking back at you, be as skeptical and intrusive as you can. You’re looking for obvious flaws in the data, such as blank rows or partial information.
Bear in mind that most government data is created for a particular purpose, which might be to fulfill a legislative mandate or to aid elected officials in making a decision. It might not have been created to make a journalist’s job easier. Unless you personally created the data you’re looking at (and even that’s no guarantee, since we all make mistakes), never assume that the data in front of you is complete and mistake-free.
If you’re using a spreadsheet (and I highly recommend you do if you have the chance), you’ll be looking for one thing right off the bat: a header row. That simply means that the first line in the spreadsheet contains descriptions of the data, not the data itself. If your data has no header row, make getting one your first priority and don’t proceed until you have it.
- Running the Traps: Consider this the “getting to know you” phase of the data interview. You’ve met and established some basic bona fides, but now it’s time to get down to business. With a spreadsheet, sorting and filtering are your best friends. Sorting simply orders the values in a column in one direction (say, smallest to largest, or alphabetically). Using the arrest data as an example, you might start by sorting on the age of the person arrested, to see whether everyone has an age listed and what the range is. A sort is the first programmatic question in your interview: “Tell me about yourself.” Sorting tells you about the extent of the data (how far it goes) and its consistency. For example, sorting data containing addresses will show whether streets and cities are spelled the same way, which tells you whether you’ll need to standardize your data so that “Street” is always “St.” or “Saint Louis” is always “St. Louis.”
Filtering lets you drill down into the data to find those proverbial needles in haystacks: isolating the 21-year-olds in the arrest file, for example. Filters are a step up in the sophistication of your questioning, since they let you ask very specific questions such as “How many arrests were made for burglary?” One of the most important things sorting and filtering can do for your interview is show you how flawed your data is. This is like the point in a personal interview when you realize whether a source is extremely credible or whether you’ll need to do a lot of verification before using the information you get.
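For readers who want to ask the same first questions in code rather than in a spreadsheet, here is a minimal Python sketch of the introduction and trap-running steps above, using only the standard library. It checks for a header row, flags blank and partial rows, sorts by age with missing ages first, and filters on a single value. The sample rows are made up, and the column names (name, age, offense, city) are assumptions, not the actual layout of the Fairfax County file; the header check is only a rough heuristic.

```python
import csv
import io

# Made-up sample standing in for an arrest file. The column names
# (name, age, offense, city) are assumptions for illustration only,
# not the actual layout of the Fairfax County data.
SAMPLE = """name,age,offense,city
A. Smith,34,BURGLARY,Fairfax
B. Jones,21,DUI,Herndon
,,,
C. Brown,,LARCENY,Vienna
D. Green,21,BURGLARY,Fairfax
"""


def load_rows(text):
    """Read CSV text; DictReader treats the first line as the header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


def looks_like_header(fieldnames):
    """Rough heuristic: a header row exists and holds labels, not numbers."""
    return bool(fieldnames) and not any(f.strip().isdigit() for f in fieldnames)


def find_problem_rows(fieldnames, rows):
    """Return 1-based data-row numbers that are entirely blank or only partly filled."""
    blank, partial = [], []
    for number, row in enumerate(rows, start=1):
        values = [(row.get(name) or "").strip() for name in fieldnames]
        if not any(values):
            blank.append(number)
        elif not all(values):
            partial.append(number)
    return blank, partial


def sort_by_age(rows):
    """Sort youngest to oldest; rows without a numeric age sort first so gaps stand out."""
    def key(row):
        age = (row.get("age") or "").strip()
        return (1, int(age)) if age.isdigit() else (0, 0)
    return sorted(rows, key=key)


def filter_rows(rows, column, value):
    """Keep only rows whose column exactly matches value, like a spreadsheet filter."""
    return [row for row in rows if (row.get(column) or "").strip() == value]


fieldnames, rows = load_rows(SAMPLE)
print(looks_like_header(fieldnames))                  # True
print(find_problem_rows(fieldnames, rows))            # ([3], [4])
print([row["age"] for row in sort_by_age(rows)])      # ['', '', '21', '21', '34']
print(len(filter_rows(rows, "age", "21")))            # 2
print(len(filter_rows(rows, "offense", "BURGLARY")))  # 2
```

Sorting missing ages to the top mirrors what you would see in a spreadsheet sort: the gaps are the first thing you notice.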
The Nitty Gritty
- Cleaning the Dirt: All data contains errors of some kind, but that doesn’t mean you can’t use it for stories. Just as with human sources, the data’s flaws will help guide your reporting and define the scope of your stories. Before you ask serious, detailed questions, make sure you know the data’s limits and weaknesses. This isn’t just a necessary chore; this step in the process will teach you a lot about the data that will prove useful in further reporting. As in a human interview, keep good notes on what you find and what actions you take as a result.
- Finding Stories: The first four steps in the process can take a lot of time, but without them you’ll never get where you want to be: finding stories in data. Worse, you might find bad stories. The approach to the data interview is the same as with a personal interview: ask the easier, broader questions first, then build toward more focused lines of inquiry.
Here is where you get to have the most fun, since data doesn’t mind endless questioning, even at odd hours or with a repetition that would drive normal interview subjects bonkers. Your queries are limited only by your knowledge of the data and your imagination. What you’re looking for from the data interview is the same as from any interview: interesting and newsworthy trends or events, and a way to explain them.
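The cleaning and story-finding steps can also be sketched in Python. This minimal example, on made-up rows with assumed column names, standardizes the “Street”/“St.” and “Saint Louis”/“St. Louis” variants, records a note for every value it changes, and then moves from a broad question (arrests per city) to a narrower one (burglaries per city). The replacement rules are hypothetical; real data will need rules drawn from what your own sorting reveals.

```python
import re
from collections import Counter

# Hypothetical standardization rules; a real project would build its own
# list from the inconsistencies that sorting turns up.
STANDARD_FORMS = [
    (re.compile(r"\bStreet\b", re.IGNORECASE), "St."),
    (re.compile(r"\bSaint Louis\b", re.IGNORECASE), "St. Louis"),
]

# Made-up rows for illustration; the column names are assumptions.
ROWS = [
    {"address": "100 Main Street", "city": "Saint Louis", "offense": "BURGLARY"},
    {"address": "200  Oak St.", "city": "St. Louis", "offense": "BURGLARY"},
    {"address": "5 Elm Street", "city": "saint louis", "offense": "DUI"},
]


def standardize(value):
    """Collapse stray whitespace, then apply each replacement rule in order."""
    cleaned = " ".join(value.split())
    for pattern, replacement in STANDARD_FORMS:
        cleaned = pattern.sub(replacement, cleaned)
    return cleaned


def clean_rows(rows, columns):
    """Return cleaned copies of rows plus a note for every value that changed."""
    cleaned, notes = [], []
    for number, row in enumerate(rows, start=1):
        new_row = dict(row)
        for column in columns:
            new_value = standardize(row[column])
            if new_value != row[column]:
                notes.append(f"row {number}, {column}: {row[column]!r} -> {new_value!r}")
            new_row[column] = new_value
        cleaned.append(new_row)
    return cleaned, notes


def count_by(rows, column):
    """Broad question: how many rows share each value in a column?"""
    return Counter(row[column] for row in rows)


cleaned, notes = clean_rows(ROWS, ["address", "city"])
for note in notes:
    print(note)
print(count_by(ROWS, "city"))     # three spellings of one city before cleaning
print(count_by(cleaned, "city"))  # Counter({'St. Louis': 3})

# Narrower follow-up: where did the burglaries happen?
burglaries = [row for row in cleaned if row["offense"] == "BURGLARY"]
print(count_by(burglaries, "city"))  # Counter({'St. Louis': 2})
```

Returning the notes alongside the cleaned rows is one way to keep the record of what you changed and why; without the cleanup, the count would split a single city three ways.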
Sorting and filtering probably cover about 90 percent of the questions you’d ask of spreadsheet data, but depending on how the information is organized, you might need other tools or approaches. That’s why I like to refer to spreadsheets as a “gateway drug” to data journalism: once you get comfortable with interviewing data using spreadsheets, you’ll probably want to do more.
Derek Willis finds stories in data for The Upshot, a politics and policy site by The New York Times. He has been a reporter and web developer since 1995. Find him online at @derekwillis or on his site.