A Look Back at News Hack Day SF

    by Aine McGuire
    July 20, 2012

    This is a guest column by [ScraperWiki’s](https://scraperwiki.com) [Thomas Levine](http://thomaslevine.com), an awesome data scientist who spends his time roaming the globe finding interesting data and doing stuff with it.

    Blogging about [News Hack Day SF](http://newshackdaysf.tumblr.com), which brought together journalists, developers and designers for several days of creative news coding and data reporting, is so many [weeks](http://allthingsd.com/20120626/it-may-not-be-televised-but-the-journalism-revolution-will-be-hacked/) [ago](http://newshackdaysf.tumblr.com/post/25857744845/thank-you-newshack-day-wraps-up), but indulge me nonetheless as I brag about what I did four weekends ago.

    1. Before the weekend

    I arrived in San Francisco on Tuesday and stayed with a friend in his crazy warehouse in Oakland until Thursday, attending a nerdy talk every night.

    On Thursday, we prepared some things for Friday at the Chronicle. Then we had lunch at a hilarious diner; look at their authentic retro Wi-Fi!


    It’s so retro that the waitresses take orders on iPads! Oh, and the food was quite awesome.

    I stayed the rest of the nights at the StartupHouse. That was rather crazy as well (in a different sense than the converted warehouse sense).

    2. Friday’s agenda

    We had our scraping tutorial on Friday morning. The turnout was pretty good.


    If you’re jealous that you missed it, you can take a look at the video (eventually) and the [slides here](http://scraperwiki.thomaslevine.com).

    Event chefs Sam Lippman and Tim West made us some rather tasty lunch after the tutorial; then we broke into eight groups, nominally to work on particular projects, but really to fiddle around with computers and hang out.

    MajorPlanet Studios’ [Michael Coren](http://www.majorplanetstudios.org), Center for Investigative Reporting’s [Michael Corey](http://www.mikejcorey.com) and [I](http://thomaslevine.com) hovered around helping various groups. I helped a bunch of journalists with more elementary programming things, but I also helped some people who had particular projects in mind.

    One person had come across some Microsoft Access databases of San Francisco health inspections posted online. We looked into ways of converting them to more convenient table formats and pondered things to do with them. If you are bored, I
    recommend that you post all of the health inspection reports on Foursquare or Yelp.
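    For the curious: mdbtools’ `mdb-export` command will dump an Access table as CSV, and from there SQLite is a convenient home for it. Here is a minimal sketch of the loading half, with an invented table name and invented sample data standing in for the real health-inspection columns:

```python
import csv
import io
import sqlite3

def load_inspections_csv(csv_text, db_path=":memory:"):
    """Load CSV text (e.g. the output of `mdb-export db.mdb inspections`)
    into a SQLite table, using the CSV header row as column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(db_path)
    cols = ", ".join('"%s"' % c for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE inspections (%s)" % cols)
    conn.executemany(
        "INSERT INTO inspections VALUES (%s)" % placeholders, reader
    )
    conn.commit()
    return conn

# Hypothetical excerpt of `mdb-export` output:
sample = "business,score,date\nTasty Diner,92,2012-06-01\n"
conn = load_inspections_csv(sample)
```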

    One group was looking at a particular table in tax filings, so I had some fun pulling that table out with them.
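    When the filing is an HTML page, pulling a table out of it can be done with the standard library alone. A rough sketch (the filing fragment below is invented, not from any real tax form):

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # start a new row
        elif tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")   # start a new cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()

# Hypothetical filing fragment:
page = ("<table><tr><th>Line</th><th>Amount</th></tr>"
        "<tr><td>Total revenue</td><td>120000</td></tr></table>")
parser = TableExtractor()
parser.feed(page)
```

    After `feed()`, `parser.rows` holds the table as a list of lists, ready for CSV or SQLite.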

    Another group huddled in the corner and then announced at the end that they had managed to extract quotes from press releases with bearable reliability. They continued working on this over the rest of the weekend.
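    I don’t know exactly how that group did it, but a naive version of quote extraction can be sketched with a regular expression that looks for a quoted passage followed by an attribution (the press-release text below is invented):

```python
import re

# Matches a double-quoted passage, then "said", then a capitalized name.
QUOTE = re.compile(r'"([^"]+)"\s*,?\s*said\s+([A-Z]\w+(?: [A-Z]\w+)*)')

def extract_quotes(text):
    """Return (quote, speaker) pairs found in a press release."""
    return [(m.group(1).rstrip(',.'), m.group(2))
            for m in QUOTE.finditer(text)]

release = ('"We are thrilled to announce this," said Jane Doe. '
           'More text. "It works," said Bob Smith.')
quotes = extract_quotes(release)
```

    A real system would need many more attribution patterns ("according to", "told reporters", pronoun resolution, and so on), which is presumably where the "bearable reliability" came in.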

    3. Over the weekend

    Through the weekend, Michael, Michael and I continued hovering around helping various groups.

    I wound up mostly helping people with some elementary programming and web development stuff. Maybe that will get them started on learning more and making cool things.

    People always ask me how to save tweets, so I also showed off the function that I discussed in an [earlier post on ScraperWiki](http://blog.scraperwiki.com/2012/07/04/twitter-scraper-python-library/).
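    The function in that post wraps the Twitter search API; independent of any particular API, the saving half can be sketched like this (the sample data below is invented, in the shape of a search-API result):

```python
import sqlite3

def save_tweets(tweets, db_path=":memory:"):
    """Save a list of tweet dicts into SQLite, skipping
    duplicates by tweet id."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(id INTEGER PRIMARY KEY, screen_name TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO tweets VALUES (:id, :screen_name, :text)",
        tweets,
    )
    conn.commit()
    return conn

# Invented sample data:
sample = [
    {"id": 1, "screen_name": "newshack", "text": "Hacking the news!"},
    {"id": 1, "screen_name": "newshack", "text": "Hacking the news!"},  # duplicate
]
conn = save_tweets(sample)
```

    `INSERT OR IGNORE` against the primary key means you can re-run a search and append only the tweets you haven’t already saved.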

    Chef West made more awesome food all weekend long. Look at our Sunday breakfast.


    I even got to tackle one of the hardest problems in Computer Science. As I was
    working with one group, it became clear that the group wanted to make a clone
    of Needlebase. We had figured out the easier parts: what we wanted in general, what to do first, how to do it, and so on. But what would we name it?

    As computer programmer Phil Karlton once said, “There are only two hard problems in Computer Science: cache invalidation and naming things.”

    Considering Needlebase’s name, my wisdom led me to “Haystack.” And engineer Randall
    Leeds upped its hip factor by turning it into “Haystacks” and replacing “stacks” with “stax” — so it’s “Haystax.”

    4. Data projects

    Let’s not forget the weekend’s data projects.

    The group that had huddled in the corner on Friday huddled in a different corner over the weekend. On Sunday, they unveiled their creation, On the Record. It lets you search quotes from news articles by person over time.

    [Haystax](http://haystaxdata.org) created enough functionality to scrape one of the test case sites identified by Reporters Lab. And for this, they won the ScraperWiki prize.

    And if there had been a 404 page prize, [Bird Dog](http://birddogit.herokuapp.com)
    would have [won](http://birddogit.herokuapp.com/404.html).

    5. The aftermath

    After all this craziness, I left San Francisco the following Monday night for ScraperWiki’s Liverpool office, which is where I am right now. Quite hilariously, this is the first time I’ve ever been to the office or anywhere in England.

    All in all, the event was a success — kudos to those who helped make it happen!

    Tagged: data, haystax, majorplanet studios, news hack day, scraperwiki, scraping
