Futures Lab Update #111: Investigating Rumors With TwitterTrails; Scraping Web Data

    by Reuben Stern
    June 25, 2015
    The tool TwitterTrails allows users to trace the origins of rumors on Twitter, to see how far they’ve spread and gauge corresponding levels of skepticism. Screenshot courtesy of RJI.

    This week we see how TwitterTrails evaluates the credibility of a rumor spreading in social media, and we learn how new tools are making it easier to extract data from websites.


    "Outwit functions like a web browser for finding data: Input the web page address to Outwit's URL, and out come words, links and images from the given web page." -- Berkeley Lovelace, RJI Futures Lab

    PART 1: TwitterTrails

    Developed as part of a research project at Wellesley College, TwitterTrails can trace the origins of rumors on Twitter to see how far they’ve spread and gauge the corresponding level of skepticism. We get an overview of how it works from Takis Metaxas, a director of the project, who says the tool ultimately can help journalists determine if a rumor is true or false.
    Reporting by Daniel Shapiro.


    [To skip directly to this segment in YouTube, click here.]

    For more information:

    Journalists can request an account by emailing Takis Metaxas at [email protected] Members of the public can also submit ideas to be investigated via the TwitterTrails website.


    Metaxas provides a more in-depth demonstration and explanation in this video prepared for the Journalism/Interactive conference held in April

    The TwitterTrails blog provides case study examples related to specific news events, such as the Ferguson protests.

    PART 2: Web scraping tools

    Several apps and browser plug-ins now enable journalists to quickly extract batches of information from Web pages without learning how to code. We get an introduction to what the tools can do from Michelle Minkoff, data journalist at the Associated Press.
    Reporting by Berkeley Lovelace and Katy Mersmann.

    [To skip directly to this segment in YouTube, click here.]

    A few Web scraping tools:

    • OutWit Hub – A tool that enables users without programming skills to extract links, images, email addresses, and other information. Data can be saved into CSV, HTML, Excel, or SQL databases, while images and documents are saved directly to a hard disk. Free version has a limit on the amount of data that can be extracted.
    • Import.io – A simple online tool where users can enter a Web page’s URL and download the extracted results as a spreadsheet file. There’s also a desktop app that can crawl entire websites. In addition, the company offers a more advanced enterprise option.
    • DownThemAll – A free plug-in for Firefox browser that enables users to download links, images and/or text from a Web page.
    • Kimono – Extractor tool that, without requiring a user to write code, turns material from Web pages into API data feeds for use by other websites, mobile apps, etc. Available for free. A paid enterprise-level plan includes more advanced service and support.
    • Scraper – A “simple (but limited) data mining extension” that works with Google’s Chrome browser.
    • Web Scraper – A free Chrome browser extension that follows a site map adjusted by the user to extract information from multiple related Web pages. The company also offers an enterprise-level data extraction service.

    For more information:

    A helpful guide, compiled by Samantha Sunne and Matt Wynn for the 2015 IRE conference, provides several options and instructions for pulling information from websites.

    Reuben Stern is the deputy director of the Futures Lab at the Reynolds Journalism Institute and host and co-producer of the weekly Futures Lab video update.

    FuturesLabWebBanner-mediashiftThe Reynolds Journalism Institute’s Futures Lab video update features a roundup of fresh ideas, techniques and developments to help spark innovation and change in newsrooms across all media platforms. Visit the RJI website for the full archive of Futures Lab videos, or download the iPad appto watch the show wherever you go. You can also sign up to receive email notification of each new episode.

    Tagged: data data journalism import.io outwit hub scrapers Twitter rumors TwitterTrails virality web scraping

    Comments are closed.

  • Who We Are

    MediaShift is the premier destination for insight and analysis at the intersection of media and technology. The MediaShift network includes MediaShift, EducationShift, MetricShift and Idea Lab, as well as workshops and weekend hackathons, email newsletters, a weekly podcast and a series of DigitalEd online trainings.

    About MediaShift »
    Contact us »
    Sponsor MediaShift »
    MediaShift Newsletters »

    Follow us on Social Media