Media Cloud and Calais

    by Amanda Hickman
    March 12, 2009

    The Berkman Center launched a project called Media Cloud this week, a toolkit that facilitates analysis of trends in the news. The sample visualization on the site now shows world maps that illustrate the number of mentions each country got in Talking Points Memo, the New York Times and the BBC, respectively. I, of course, immediately tried to create a visualization comparing Gotham Gazette to a few other local papers. Lo, though: no Gotham Gazette in Media Cloud.
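    Maps like these come down to a simple aggregation: count how often each country is mentioned, grouped by source. Here is a toy Python sketch. It assumes an entity extractor has already produced the list of country names mentioned in each story. The `stories` records and the `country_mentions_by_source` function are hypothetical illustrations, not Media Cloud's actual code.

```python
from collections import Counter, defaultdict

# Hypothetical input: one record per story, holding the source name and the
# country names an entity extractor (such as Calais) found in the story.
stories = [
    {"source": "New York Times", "countries": ["Iraq", "China", "Iraq"]},
    {"source": "BBC", "countries": ["United Kingdom", "Iraq"]},
    {"source": "New York Times", "countries": ["Mexico"]},
]


def country_mentions_by_source(stories):
    """Return {source: Counter mapping country -> number of mentions}."""
    counts = defaultdict(Counter)
    for story in stories:
        # Every mention counts, so a story naming Iraq twice adds 2.
        counts[story["source"]].update(story["countries"])
    return counts


counts = country_mentions_by_source(stories)
for source, counter in sorted(counts.items()):
    print(source, counter.most_common())
```

    This version counts every mention. If you would rather count stories, so that a piece naming Iraq five times weighs the same as one naming it once, pass `set(story["countries"])` to `update` instead.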

    I’ve been hearing about Calais lately. I at least got the memo that I’m supposed to know what it is. I gather that Calais has something to do with the sources in Media Cloud. So over to Calais:

    “Calais is a rapidly growing toolkit of capabilities that allow you to readily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application.”

    That clears everything up for you, doesn’t it? There’s more on their about page, but there is still a lot of applause for the semantic web and very little in the way of “register for a free account and we’ll help you parse your content in search of semantic keywords” (which is, roughly, what I was looking for).


    So: Media Cloud is a cool tool, and data junkies (and everyone, really; everyone) should take a look. But I don’t want to be left out of the project, either. So file this as my mildly unhinged rant against insiderism. The Berkman Center is the last place (okay, not the last) I’d expect to find the perspective that only the big kids matter.

    The benefit of the doubt I can offer is that this project is clearly still in development. They want some ideas about what people might use Media Cloud for, maybe just to keep them inspired, maybe to get people talking about it, maybe so they can plan feature rollout accordingly. So maybe they just need to hear from small publications?

    What do you want from them?


    And: back to Calais for a moment. As publishers, we should want to use Calais to index our archival content. But Calais doesn’t actually offer tools for that; it offers an API. That means that if you aren’t using an out-of-the-box content management system (we aren’t), you’ll need to write your own interface to Calais. Which brings me to my next question: is anyone else writing Calais hooks (is “hook” even the right word?) for their content? How are you structuring them?
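    Here is one possible way to structure such a hook, sketched in Python under stated assumptions. The actual HTTP request to Calais is hidden behind a `tagger` function you would supply yourself. `Article`, `tag_archive` and `keyword_tagger` are hypothetical names invented for this sketch and are not part of any Calais library. The toy `keyword_tagger` stands in for the real service so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

# An entity is an (entity_type, name) pair, e.g. ("City", "Brooklyn").
Entity = Tuple[str, str]

# Hypothetical interface: a tagger takes article text and returns entities.
# A real one would wrap an HTTP request to the tagging service.
Tagger = Callable[[str], List[Entity]]


@dataclass
class Article:
    """Minimal stand-in for a record in a custom CMS archive."""
    article_id: str
    body: str
    tagged: bool = False
    tags: List[Entity] = field(default_factory=list)


def tag_archive(articles: Iterable[Article], tagger: Tagger) -> Dict[str, int]:
    """Send each untagged article to the tagger and store the results.

    Articles already marked as tagged are skipped, so the job can be
    re-run after an interruption. Returns counts for logging.
    """
    stats = {"tagged": 0, "skipped": 0, "failed": 0}
    for article in articles:
        if article.tagged:
            stats["skipped"] += 1
            continue
        try:
            article.tags = tagger(article.body)
            article.tagged = True
            stats["tagged"] += 1
        except (OSError, ValueError):
            # Network errors (urllib raises OSError subclasses) or a
            # malformed response: leave the article untagged to retry later.
            stats["failed"] += 1
    return stats


def keyword_tagger(text: str) -> List[Entity]:
    """Toy offline tagger: matches a fixed list of names."""
    known = {"Bloomberg": "Person", "Brooklyn": "City", "MTA": "Organization"}
    return [(etype, name) for name, etype in known.items() if name in text]


archive = [
    Article("1", "Bloomberg announced new MTA service cuts."),
    Article("2", "Brooklyn libraries extend hours.", tagged=True),
]
stats = tag_archive(archive, keyword_tagger)
print(stats)             # {'tagged': 1, 'skipped': 1, 'failed': 0}
print(archive[0].tags)   # [('Person', 'Bloomberg'), ('Organization', 'MTA')]
```

    Keeping the service call behind a single function makes it easy to swap in a different client or a cached response. The `tagged` flag lets a long archive job resume where it left off instead of re-sending everything.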

    Tagged: berkman calais coverage data-mining mainstream media mediacloud semantic web

    8 responses to “Media Cloud and Calais”

    1. I am disappointed to see that Media Cloud covers only English-language media. The world is much vaster.

    2. Hi Amanda,

      Thanks for your interest in Media Cloud and in the Calais Web service at OpenCalais.com.

      Some assembly is still required in most cases, especially for larger publishers, who tend to be on customized CMS platforms. But we have some 9,500 developers in the community now helping to build code libraries, plug-ins and toolkits of all kinds to help ease the process.

      On our Tools page, we do offer our Tagaroo plug-in for WordPress, and our Marmoset plug-in, which makes it easy for any site to generate semantic metadata and pass it to Yahoo! SearchMonkey.

      In addition, you can find a collection of Calais modules for Drupal at http://www.Drupal.org/project/opencalais or use the new, free OpenPublish platform for Drupal 6 from Phase2 Technology (integrating Calais throughout) at http://www.opensourceopenminds.com/openpublish

      Finally – you can use our SemanticProxy.com service if you would like to do some testing: submit a URL to get back your semantic metadata.

      Hope that helps!

      -Krista, the Calais team

    3. EthanZ says:

      Hi Amanda –

      As you mentioned in your post, this is a pretty early release of an experimental product. We’re tracking a couple hundred newspapers – all English-language, a split between large and small newspapers – and several hundred English-language blogs at this phase of the project. Basically, each source we include requires some work on our part, and this was all we were able to do at this early stage of the project. As we get more support for the work, our plan is to include vastly more sources.

      This isn’t Calais’s fault at all, by the way. We use Calais to analyze the stories we retrieve from newspapers – Calais is able to list people, places and topics mentioned in the stories. But the choice of what media sources to monitor is our fault at the Media Cloud team at Berkman, and we made our choices based on a) the constraints we had regarding staff time and b) the need to create a set of sources to help answer some questions raised by Yochai Benkler about the US political blogosphere.

      We’re also very well aware that the system is deeply limited by its focus on English-language blogs. (The other project I’ve been involved with at Berkman is Global Voices which is almost exclusively focused on non-English blogs.) We decided to focus on English-language sources first because that’s where Calais’s tools are most mature. As we expand our sources, we’re going to be including other languages as well.

      Absolutely no slight was intended towards the countless millions of media sources we’re not currently tracking. This is a small, young and growing project, and this was what we were able to accomplish with current resources.

      -Ethan Zuckerman, Berkman Center

    4. Tom Tague says:

      And – if you just want to check out what Calais can do with content without any keys, coding or whatever, here’s what you need to do: 1) grab some content (an article, blog post, whatever), 2) copy it, 3) head over to http://viewer.opencalais.com, 4) paste it, 5) coolness!

    5. Tom Tague says:

      Prior link picked up some strangeness


    6. amanda says:

      Thanks, everyone, for the rapid feedback!

    7. amanda says:

      Michel makes a good point, one worth translating into the lingua franca of the Idea Lab:

      I am disappointed to see that Media Cloud covers only English media. The world is much vaster.

      Which is true, though I think comparing across languages adds a non-trivial layer of complexity to the project, just as including local papers adds a non-trivial level of volume to the project.

    8. arrah says:

      Our Asset disposal service allows for the recycling of redundant IT equipment in an effective, environmentally sound manner.
      Computer Recycling

  • Who We Are

    MediaShift is the premier destination for insight and analysis at the intersection of media and technology. The MediaShift network includes MediaShift, EducationShift, MetricShift and Idea Lab, as well as workshops and weekend hackathons, email newsletters, a weekly podcast and a series of DigitalEd online trainings.
