Coder-journalist: Governments Should Open Up Their Data

Ryan Mark, one of the first two winners of our journalism scholarships for computer programmers, wonders why it’s so hard to get usable government data.

I wrapped up my second quarter of journalism school and my daily reporting class a couple of weeks ago. Learning firsthand what goes into a simple news article gave me a new-found respect for the work that’s required. Making call after call, leaving messages with people who will never call you back, and then taking notes while paying attention to what somebody is saying is quite a difficult way to spend a day.

The Internet makes much of the job easier than I imagine it used to be, but I still ran into some basic roadblocks that no amount of global communications technology can breach.

I wrote a story about information systems in nursing homes at the end of May, and in the course of my research I had to make use of Medicare and Illinois Medicaid data.
Medicare has a comprehensive online resource of most if not all the nursing homes in the United States, along with metrics such as size, whether they accept Medicare or Medicaid, and if they’ve had citations from regulatory agencies. The data was on the Medicare site, but it wasn’t easy to work with.

All I wanted was a list of nursing homes in Cook County, Illinois, that accepted Medicaid, ordered by the number of residents the homes had, from smallest to largest.

Anybody who has worked with a database or Excel knows that this shouldn’t be difficult, but Medicare’s web interface wasn’t built to handle a complex request. No problem, I thought. The site offers a way around this: you can download their database!

But it’s in Microsoft Access format.

Since I use a Mac, to access this ‘public’ information, I needed to install Windows XP and Access on my MacBook. I was lucky I had Windows installed already and an old copy of Office 2003 Professional lying around, and still remembered how to use Access.

It only took me a few hours to generate my list of 20 nursing homes.

Why couldn’t they have put the data in a text file? Access can save tables to a text file. I could have used OpenOffice, Apple’s Numbers or Excel, and I could have viewed it on my Mac or on Linux. I could have written a new web page with tools to work with the data, the tools that the Medicare site couldn’t provide. I could have uploaded it to Many Eyes.
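To give a sense of how little code that query would take against a plain text file, here is a minimal Python sketch. It assumes the data were published as CSV; the column names (`name`, `county`, `state`, `accepts_medicaid`, `residents`) and the sample rows are invented for illustration and are not Medicare’s actual schema.

```python
import csv
import io

# Hypothetical stand-in for a CSV export. The column names and rows
# are made up; the real Medicare data would look different.
SAMPLE_CSV = """name,county,state,accepts_medicaid,residents
Home A,Cook,IL,Y,120
Home B,Cook,IL,N,45
Home C,Lake,IL,Y,60
Home D,Cook,IL,Y,38
"""


def medicaid_homes(csv_file, county, state):
    """Return homes in the given county that accept Medicaid, smallest first."""
    rows = csv.DictReader(csv_file)
    matches = [
        row for row in rows
        if row["county"] == county
        and row["state"] == state
        and row["accepts_medicaid"] == "Y"
    ]
    # Sort by number of residents, ascending.
    return sorted(matches, key=lambda row: int(row["residents"]))


if __name__ == "__main__":
    for home in medicaid_homes(io.StringIO(SAMPLE_CSV), "Cook", "IL"):
        print(home["name"], home["residents"])
```

With a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an opened file, and the column names adjusted to whatever the export actually uses.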

And Medicare does a better job with its data than the Illinois Department of Healthcare and Family Services does with Medicaid information.

The Illinois Medicaid website has lists of PDF files for each nursing home that receives Medicaid in the state. It’s a wealth of information, but it’s very difficult to write a program to pull usable information out of those files.

If you don’t feel like wading through hundreds of PDF files, you call the DHFS’s media department and talk to a nice woman named Penny. You tell her the zip codes, date range and what kind of numbers you want, and she calls back in a day or two.

Most of this data should be transparent and easily accessible. I shouldn’t have to call up the media department to get numbers. It’s our government, so the data should be ours, where it’s not protected by privacy laws.

There are people working on getting government agencies to provide information in a usable format. I talked briefly with Adrian Holovaty at the recent Future of Civic Media Conference, and it sounds like the folks at Everyblock deal with these problems on a regular basis. Everyblock, along with other interested organizations, has put together the 8 Principles of Open Government Data. Organizations like the Sunlight Foundation and programs such as Sunshine Week are trying to bring more attention to government transparency, and doing it in a web-friendly way.

I think government agencies should focus on getting data out there in a standardized format, and I would venture to guess that in many cases they will need help implementing that standardized format.

Rich Gordon: Rich Gordon is a professor and director of digital innovation. At Medill, he launched the school’s graduate program in new media journalism. He has spent most of his career exploring the areas where journalism and technology intersect. Prof. Gordon was an early adopter of desktop analytical tools (spreadsheets and databases) to analyze data for journalistic purposes. At The Miami Herald, he was among the first generation of journalists to lead online publishing efforts at newspapers. At Medill, he has developed innovative courses through which students have explored digital content and communities and developed new forms of storytelling that take advantage of the unique capabilities of interactive media. In addition to teaching and writing about digital journalism, he is director of new communities for the Northwestern Media Management Center, where he is responsible for a research initiative focusing on the impact of online communities, including social networks, on journalism and publishing.

View Comments (2)

  • That state law is a step forward, but it doesn't quite address the kind of openness that Ryan is talking about. We've been grappling with similar issues at Gotham Gazette -- New York City actually has a six-year-old law that requires documents to be posted electronically, but they're almost universally provided as PDFs, and sometimes these are giant PDFs that contain scanned images of tables of data.

    At the state level I could ask for another format, but there is no good reason the city and state (or Medicaid) can't post that data in a universally readable format alongside their PDFs.
