    How Can the Media Industry Attract Much-Needed Data Scientists?

    by Jess Duda
    October 23, 2014
    The job board at Strata+Hadoop last week in New York showed just how in demand data specialists are. Photo by Jess Duda.

    Across industries, data scientists are the hot and scarce commodity to help organizations collect, clean, and analyze the growing sea of “big data.”

    Journalism has an acute need for data analysis, especially to monetize cutting-edge features and keep large, powerful institutions accountable.

    "Rather than already having the expertise [of a data scientist], it’s more important that you can ask questions and not be blocked by something you’re unfamiliar with." -Laurie Skelly

    But data scientists don’t come cheap, and the media industry is still hobbling toward an uncertain financial future. WSJ.com recently reported that many traditional scientists are forgoing academia for higher-paying data science positions at Yelp, Airbnb, Etsy and other Internet behemoths. The New York Times was the lone media organization noted.

    Adding to the financial constraints of attracting data talent, media companies are also asking a lot of the people they’re looking for. Job advertisements for media data scientists often ask for far more than the already multi-faceted role: a full-stack developer, statistician, and visual designer who also has knowledge of government data sources and the mind of an investigative journalist. It’s reminiscent of when online video exploded and media outlets recruited one person to be producer, videographer, and editor, or “preditor.” That’s a lovely title indeed. Yes, J-schools, such as Columbia’s Lede Program, are creating new degrees to turn out graduates who combine the data scientist and journalist roles. Over time, we’ll find out whether one person can handle cleaning data, coding, and being an effective reporter and storyteller.

    Last week at the Strata Hadoop World 2014 conference in New York, it quickly became apparent that the problem of filling this multi-faceted role is not limited to media companies. Publications therefore have much to learn from what’s happening with data talent across other industries.

    To illustrate the conundrum, see the Venn diagram below by Steve Geringer, a machine learning consultant. He adapted it from the original by data scientist Drew Conway, whose skill set is certainly unicorn-like.

    Data Scientist Unicorn by Steve Geringer

    Debate: Do Data Scientists Need to Know How to Code?

    To avoid the futile search for a “unicorn,” what are the key skills a data scientist should have?

    Whether they need to code was the subject of an Oxford-style debate at Strata Hadoop World 2014. Arguing that they do not, Joseph Adler of Interana and Scott Nicholson of Poynt defined a data scientist as one who discovers insights from data. New tools, such as Paxata, Tamr, and Trifacta, allow for cleaning and analyzing data without coding. They defined writing code as creating original commands in a programming language, not typing pre-existing commands into established software programs such as Excel. They conceded, however, that to be an efficient data scientist, especially one who produces new products, coding is required. Adler noted that top data science teams are hiring Ph.D.s for their analytical skills; any necessary coding is taught on the job.

    Strata Hadoop World 2014 Debate on Data Scientists

    Joseph Adler (Interana, Inc.), Scott Nicholson (Poynt), Lucian Lita (Intuit) and Hilary Mason (Accel Partners) at Strata Hadoop World 2014

    ‘Team yes code’ was Hilary Mason from Accel Partners and Lucian Lita of Intuit. Mason emphasized that data scientists need to be able to interact with the systems where data live, which are computers, and that requires coding, however limited. Lita argued that solving ambiguous problems with heterogeneous data, by building models and putting them into production, requires an iterative process of agile development. Coding is fundamental to agile; otherwise, “we would have to be dependent on others to get the inputs,” he said.

    In the media context, these two viewpoints depend on the goals. Does the media outlet require collecting original data and the labor-intensive cleaning? Does it need to build custom products to distribute the data? Then yes, a data scientist who can code and conduct statistical analysis is key. Outsourcing is a more affordable option, but to report effectively on the critical issues of our time, the team needs to be in-house.

    For outlets that don’t want to collect original, messy data, journalists with statistical skills can take clean datasets from government, academic, and non-partisan sources, analyze them with statistical software, and publish the results with out-of-the-box tools such as Tableau and Mapbox. The conference trade show had a plethora of new ‘business intelligence’ vendors that offer both analysis and visualization tools. Conferees noted that open source and proprietary tools have exploded in the last two years, so who knows what will be possible in a year or two.

    Training Data Scientists Boot Camp Style

    Laurie Skelly, Data Scientist of DataScope and Metis Data Science Boot Camp

    For a media organization looking to recruit a data scientist, Laurie Skelly provided an excellent overview of both the soft and hard skills needed. Skelly is a data scientist at DataScope and an instructor at Metis’ newly formed Data Science Bootcamp.

    The soft skills include curiosity, creativity, grit, and the humility to admit when things don’t work. The full-time, 12-week boot camp format provides group work and community support to fend off the isolation common with “impostor syndrome,” and the short deadlines seek to prevent perfectionism. “Rather than already having the expertise, it’s more important that you can ask questions and not be blocked by something you’re unfamiliar with,” Skelly said. Good advice indeed.

    The hard skills are framed in terms of the following project-based model, created by Hilary Mason and Chris Wiggins, which spans development, statistics, web, and communications.

    Project Phases to Conduct Data Science
    Create a Goal
    Obtain Data
    Scrub
    Model
    Explore
    Interpret
    Communicate

    Skills & Tools
    Data Acquisition
    Data Exploration
    Machine Learning / Statistics
    Computer Science
    Web Development
    Data Visualization
    Domain Awareness (in a new field for the student)

    Data Science Team Roles

    Amy Gaskins on Data Science Teams

    Former military intelligence data analyst Amy Gaskins discusses how to build a data science team.

    Even where the rare unicorns do exist, it can be risky to depend on one person for data science capabilities, according to Amy Gaskins of MetLife’s Global Technology & Operations. She is not a developer and describes her skills in her bio as “integrating disparate data sets and identifying the correlations and patterns within them.” Gaskins advised recruiting team members with unique skills and cross-training them to build resiliency. The team should consist of the following roles, which could be applied to media developers, producers/reporters, and managing editors.

    1. Problem solvers: Engineers who understand the problem as well as the technologies, code, and infrastructure needed to create a solution.
    2. Translators: Communicators who speak the language of the business.
    3. Medic: A resource gatherer who can break barriers. She/he has the network and charm to get access and resources to overcome the kind of “no” the team may confront.
    4. Leader: The team manager who understands the problem and how to fix it. They are essential and “their background is irrelevant.”

    Gaskins advised against hiring applicants who hesitate to give an opinion, especially on a controversial question, because it risks indecisiveness when the leader is absent. Versatility is key because new problems arise and positions change. Lastly, interview and choose new hires together, requiring a unanimous yes to maintain team cohesion.

    The Prospects for Data Resources in Journalism

    Overall, journalism can offer the most value to data science in communications, particularly in genres based on narrative arcs and visual media. Developer-statisticians aren’t trained to be Emmy- or Oscar-winning storytellers, regardless of whether they are subject matter experts or can code visualizations. Data scientists, in turn, can provide insights and evidence that may lead to new interview subjects, stories to break, and patterns of decision-making in our most powerful institutions. The prospect of bringing media storytellers and data scientists together is extremely exciting and can be a real game changer for the future of journalism.

    Jess Duda is a digital content strategist and producer developing a big data tool to track the policy process. Previously, she was at PBS Digital working on digital strategy with national and local web producers as well as product development.

    Tagged: data journalists data science strata hadoop world 2014
    • Vincent Granville

      They can’t. It’s so easy today for a data scientist to set up his lean publishing company, at no or little cost, quickly become very profitable, and compete with digital media companies that have massive overhead costs – mostly HR (editors, IT, sales people) but also infrastructure. That’s what I did.
