With Tools, Tables and Tours, ScraperWiki Wants to Liberate Data

    by Nicola Hughes
    November 7, 2011


    As part of our Knight News Challenge entry, we at ScraperWiki said we would roll out Journalism Data Camps across the U.S. We had already run what we called “Hacks and Hackers Hack Day” events across the U.K. and Ireland, bringing journalists and coders together. This happened at the same time as HacksHackers in the U.S. Great minds and whatnot!

    Now we’re scaling up our exploration of the data prospects of the new world. We are heading across the U.S. on a data liberation front. But where do we start, and where do we go? Well, first, we want to liberate data, and lots and lots of people can use data. More importantly, we want to bring together anyone who wants to work with data to tell a story, provide insight or build an application.


    So how do you go about finding where the right mixes are? Well, I scraped the data, mapped it, and visualized it, of course! I scraped media organizations, R, Ruby, Python, and PHP meetup groups, data conferences, some B2B media as well as HacksHackers Chapters, and the top journalism schools. All in all, almost 13,000 data points were collected from different scrapers. So I put them into Google Fusion Tables and voila!
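
For anyone curious what the combining step might look like, here is a minimal Python sketch, assuming each scraper hands back a list of records with a name, a city and a state. The field names, the sample entries and the function names are all hypothetical, not the actual ScraperWiki scrapers; the point is only to show several scraper outputs being flattened into one CSV that a tool like Google Fusion Tables could import.

```python
import csv
import io

# Hypothetical scraper outputs, keyed by the kind of group scraped.
# Real scrapers would return thousands of records; these are placeholders.
SCRAPED = {
    "media": [
        {"name": "Example Gazette", "city": "Chicago", "state": "IL"},
        {"name": "Sample Tribune", "city": "Austin", "state": "TX"},
    ],
    "python_meetup": [
        {"name": "Example Python Users", "city": "Chicago", "state": "IL"},
    ],
    "journalism_school": [
        {"name": "Sample School of Journalism", "city": "Austin", "state": "TX"},
    ],
}


def combine(scraped):
    """Flatten per-category scraper results into one list of rows."""
    rows = []
    for category, records in scraped.items():
        for record in records:
            rows.append({
                "category": category,
                "name": record["name"],
                # A single "City, ST" string is easy for a mapping tool to geocode.
                "location": f"{record['city']}, {record['state']}",
            })
    return rows


def to_csv(rows):
    """Serialize the combined rows as CSV text ready for upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["category", "name", "location"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(combine(SCRAPED)))
```

Keeping the category as its own column is what makes the later filtering and aggregating by group possible.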

    A heat map gives me the hotspots for the concentrations of data points. These are biased towards the media sector, as there are many more outlets than interest groups and journalism schools. But it’s a good gauge of where we can build interest for the events.


    [Image: intensity map]

    Drilling down through the data using filter and aggregate, I got the breakdown of the proportion of the groups we want to reach for each locale. With some rough and ready image manipulation (I use Gimp as it’s open source), I mashed up a visualization scaling the pie charts so that the pixel radius corresponds to the size of the dataset for that location.
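
The filter-and-aggregate step and the pie scaling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the 60-pixel maximum radius and the function names are hypothetical, and the real work was done in Fusion Tables and Gimp rather than in code. The radius is scaled linearly with the number of data points, as described above.

```python
from collections import Counter, defaultdict

# Hypothetical combined rows: one per scraped data point.
ROWS = [
    {"location": "New York, NY", "category": "media"},
    {"location": "New York, NY", "category": "media"},
    {"location": "New York, NY", "category": "meetup"},
    {"location": "Austin, TX", "category": "journalism_school"},
]


def breakdown(rows):
    """Count data points per category within each location."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["location"]][row["category"]] += 1
    return counts


def pie_specs(counts, max_radius_px=60):
    """Work out each location's pie radius and slice proportions.

    The biggest location gets max_radius_px (an assumed value); every
    other radius shrinks in direct proportion to its data point count.
    """
    largest = max(sum(c.values()) for c in counts.values())
    specs = {}
    for location, category_counts in counts.items():
        total = sum(category_counts.values())
        specs[location] = {
            "radius_px": max_radius_px * total / largest,
            "shares": {cat: n / total for cat, n in category_counts.items()},
        }
    return specs


if __name__ == "__main__":
    for location, spec in pie_specs(breakdown(ROWS)).items():
        print(location, spec)
```

One design note: scaling the radius linearly makes a pie's area grow with the square of the count, so big locations look even bigger than they are. Scaling by the square root of the count instead would keep area proportional to the data, which is a different choice from the one described here.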

    [Image: USA sectors]

    Now, it’s not an exact science, nor is it news site-ready. But the speed at which I can get guidance from data now runs on digital time: 13,000 data points collected, cleaned and visualized in half a day. The result is a loose guide, but it is also a tool. And this is the sort of quick thinking, quick gathering and quick analyzing we want to see at our events. So think big data. Think multiple sources. Think multiple tools. And then you can extrapolate for multiple uses!

    We haven’t settled on our tour locations yet, so watch our blog for details. We’re also getting clues for where to go from the data underground, so don’t think the data is giving everything away. We hope to see you there!

    Tagged: data, hacks and hackers hack day, scrapers, scraperwiki, tables, tools, tour
