How Metadata Can Eliminate the Need for Pay Walls

August 18, 2010

The Library of Alexandria offers a surprising lesson about today's media world

You have to admire his chutzpah. Rupert Murdoch, the so-called nemesis of public interest news, is now being hailed by some as its potential savior. Sick and tired of people reading his news outlets for free online, Murdoch has erected pay walls around his sites (or some of them at least).

Anyone who wants to see what is published on thetimes.co.uk will have to pay at least £1. That includes search engines, which are not even allowed to index the Times’ online content. Now we have to wait and see if the subscription revenues start rolling in.

Yet even those who hope the pay wall succeeds have reservations. Pay walls represent both a practical and philosophical shift in the provision of news on the net. They represent a shift from the openness that has defined the early history of the web to a closed world much more reminiscent of the 20th century’s constrained media environment. Erect a pay wall and you immediately cut yourself off from much of the web community. You prevent the vast majority of people from recommending, linking, commenting, quoting, and discussing.

It is for this reason that any forward-thinking journalist cannot help but be disheartened by the pay wall. It cuts you off from a much bigger potential audience. It suffocates networked journalism, whereby you engage with your readers to source, expand, deepen, and extend your story. It limits your opportunity to enhance your own brand, as opposed to that of the publication. But worst of all, it turns its back on the reason for the net’s success: the flowering of millions of conversations. As the lawyer who stopped writing for the Times after it put up its pay wall said, “inside the paywall no-one can even hear you scream.”

Fortunately, there is an alternative. A way in which news can remain distributed, open, even re-usable. A way in which journalism can work with the grain of the web, and continue to grow, extend, and integrate. And it is a way — crucially — that journalism can still make money.

But first, a story.

Library of Alexandria

In the fourth century BC, a student of Aristotle, Demetrius of Phaleron, set up a library in Alexandria. It was a little different from the libraries we’re now familiar with. It had lecture halls, a dining room, meeting rooms, and a “walk.” It also had a reading room and lots of books (or scrolls, as they then were). Within a few decades it had acquired almost half a million scrolls, many containing multiple works. Such an abundance of scrolls would quickly have become unmanageable had it not been for Callimachus of Cyrene. Callimachus started “the first subject catalogue in the world, the Pinakes,” according to Roy Macleod in “The Library of Alexandria.” This was made up of six sections and catalogued some 120,000 scrolls of classical poetry and prose. His methods were then adopted and extended by other librarians.

Thanks in no small part to the cataloguing, people were able to build on each other’s knowledge. Scholars began to compare the texts and try to understand the reasons why they differed. Hence cross-textual analysis was born. People were able to contrast and evaluate various scientific methods. While at the library, Archimedes (of “Eureka” fame) worked out methods for calculating areas and volumes that later formed the basis for calculus.

The library at Alexandria became the most famous of the ancient world, and spawned many further libraries and even whole university towns such as Bologna and Oxford. Yet had its books not been catalogued, none of this might have happened. Had the books not had metadata giving basic details about who wrote them, when they were written, and what they should be classified as, there would not have been the foundations on which scholars could build.

Metadata is just a fancy word for information about information. A library catalogue is metadata because it categorizes the books and describes where you can find them. You find metadata on the side of every food packet, only we don’t call it metadata, we call it ingredients. The equivalent metadata about a news article would capture information about where it was written, who wrote it, when it was first published, when it was updated. All pretty basic stuff, but critical to properly identifying it and helping its distribution.

Importance of Metadata

Metadata did not matter so much when news was all tidily packaged together in a newspaper. You knew when something was published because it was inside that day’s paper. You knew who had published it because it was on the masthead and at the top of every page. There was, and is, lots of metadata about news in newspapers; we just tend to take it all for granted.

The Internet, and the search engines and social networks that power the web, have broken the newspaper package down into discrete pieces of content. These atomized chunks — individual news articles, photographs, video clips, audio clips — are what we consume online. We do not read an online paper cover to cover, as we would a print paper. That would be exhausting. The BBC news website publishes about 150,000 words each day. To skim every individual article would take upwards of 17 hours. Instead we pick and choose, we unbundle.

Rather than seeing unbundling as a problem, news outlets should see it as an opportunity. An opportunity to distribute news all around the web. An opportunity to get readers to help sell their news by recommending pieces to their colleagues and friends, and by linking to stories from their networks and blogs. The only thing news producers need to do before publishing a news article is make sure it has metadata integrated into it. This way, whenever people or machines (such as search engines) see it, they can also see its provenance, recognize what category of information it is, and give credit to its creator.

Having basic information about who produced something is to the mutual advantage of the person who wrote the article (or took the photograph or shot the film footage), and of the public who is reading it. The producer gets proper credit for what they created, and the public gets to see who created it — giving the news greater transparency and a measure of accountability.

When you think about it, it seems remarkable that so much content does not have this sort of metadata already. It is like houses not having house numbers or zip codes. Or like movies not having opening or closing credits. Or like a can of food without an ingredients label. As Jeff Jarvis wrote recently, “When it comes to products, we want to know: where it was made, by whom, in what conditions, using what materials, causing what damage, traveling what distance, with whose assurances of quality, with whose assurances of safety.” Why should news be any different?

hNews

hNews is just one of a number of methods of adding metadata. It is a simple, open standard that is free and that anyone can implement. We at the Media Standards Trust in Britain developed it in partnership with Sir Tim Berners-Lee’s Web Science Trust, and in the latter stages by working with the Associated Press. (This was made possible thanks to two foundation grants, one from the MacArthur Foundation and one from the Knight Foundation. You can read my blog posts about the development of hNews over at Idea Lab, a Knight-funded sister site of PBS MediaShift.)

There are other ways to add metadata to news, for example using RDF or linked data. hNews is an easy entry point since it is built on existing standards (microformats), fits easily within any CMS (there is a WordPress and a Blogger plugin), and is entirely reversible. Almost 500 news sites in the US have already implemented hNews, including the Associated Press and AOL. But you choose whichever one suits you best. (Some sample implementations are available here.)

Once hNews is added there are some immediate benefits. Every news article has consistent information about who wrote it, who published it, when it was published, and so on, built into it. Every article also has an embedded link to the license associated with its reuse (so ignorance is no excuse). And every article has a link to the principles to which it adheres. These principles should not only help to distinguish the article as journalism, but should make the principles that define journalism, which are right now opaque and little understood by the public, transparent. Moreover, all this information is made ‘machine-readable’ by hNews. In other words, a machine (like a search engine) can understand it.

Making this information machine-readable opens up the less immediate, but more exciting, aspects of metadata. It creates an ecology of structured data that makes search more intelligent, enables innovation, and opens up new revenue opportunities.
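To make “machine-readable” concrete, here is a minimal sketch in Python of how a program might pull those fields out of a marked-up article. The class and rel names (entry-title, author, source-org, published, updated, item-license, principles) follow the hNews draft and the hAtom microformat it builds on, as far as I understand them. The sample article, the names, the example.org URLs, and the helper names HNewsParser and parse_hnews are all invented for illustration, and a real consumer would more likely use a full microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical sample markup. The class/rel names follow the hNews draft;
# the article, people, and URLs are made up for this illustration.
SAMPLE = """
<div class="hnews hentry">
  <h1 class="entry-title">Council approves new library</h1>
  <span class="author vcard"><span class="fn">Jane Doe</span></span>
  <span class="source-org vcard"><span class="fn org">Example Gazette</span></span>
  <abbr class="published" title="2010-08-18T09:00:00Z">18 August 2010</abbr>
  <abbr class="updated" title="2010-08-18T12:30:00Z">updated 12:30</abbr>
  <a rel="item-license" href="https://example.org/license">License</a>
  <a rel="principles" href="https://example.org/principles">Principles</a>
</div>
"""

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HNewsParser(HTMLParser):
    """Collects a handful of hNews fields from reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class sets of the elements currently open
        self.fields = {}  # extracted metadata

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        rels = set((attributes.get("rel") or "").split())
        if tag not in VOID_TAGS:
            self.stack.append(classes)
        # Dates are carried in the title attribute of the marked-up element.
        for name in ("published", "updated"):
            if name in classes and attributes.get("title"):
                self.fields[name] = attributes["title"]
        # License and principles are plain links identified by their rel value.
        for name in ("item-license", "principles"):
            if name in rels and attributes.get("href"):
                self.fields[name] = attributes["href"]

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        innermost = self.stack[-1]
        open_classes = set().union(*self.stack)
        if "entry-title" in innermost:
            self.fields.setdefault("title", text)
        elif "fn" in innermost and "author" in open_classes:
            self.fields.setdefault("author", text)
        elif "fn" in innermost and "source-org" in open_classes:
            self.fields.setdefault("source-org", text)


def parse_hnews(html):
    """Return a dict of the hNews fields found in an HTML string."""
    parser = HNewsParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Calling parse_hnews(SAMPLE) returns a plain dictionary keyed by field name, which is the kind of structured record a search engine or aggregator could index, compare, or display as a label.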

It is a little-known truth that much of the evolution of the web has already been driven by open standards. And that many of the uses of open standards are not at first apparent to those who create them. Who could have known that RSS (Really Simple Syndication), a simple standard for syndicating web content, would now be the way millions of people consume audio podcasts? Or that OAuth and OpenID would so simplify the sharing of private information across websites?

The openness and re-usability of hNews enables people to build stuff with it and on top of it. It allows you, for example, to add a “news ingredients” label to the bottom of each article. This is what Open Democracy are doing. Under each article that has hNews embedded they will automatically add an hNews icon. Scroll over this icon and you will get a pop-up box with all the basic details of the article (author, publish date/time, principles etc.). Rather like the ingredients on a food packet. Some of this information is hyperlinked so that you can click directly through to more information, like the license associated with re-use of the article. Imagine labels like these on all news articles. At a stroke you would have transformed their transparency and accountability.

Embedding metadata like hNews has countless other potential uses. As a simple illustration of the type of thing it enables, we built a browser plugin – itchanged.org – that allows you to track changes in news articles. Another application might be more intelligent recommendations (e.g. see readness.com). But most importantly, structuring data creates an environment in which invention becomes possible — in the same way, for example, that library catalogues do.

AP News Registry

Metadata can also help news organizations work out ways to make money. For example, the Associated Press has built its News Registry on top of hNews. The News Registry is AP’s way of tracking its news around the web. It gives AP better metrics, which it can use to charge more accurately for its content and to work out revenue-sharing opportunities for advertising associated with that content.

How it does this is pretty straightforward. In addition to hNews, the AP embeds an image file, probably a transparent pixel, in each news article. This file is equivalent to a photograph in a web page, except that it is not intended to be seen. But like a photograph in a web page, this image file has to be served up from a separate server, in this case AP’s servers. So whenever the article is viewed on a computer, the browser (Internet Explorer, Firefox etc.) notices the image file and asks AP’s server to deliver it. That way the AP knows who is reading the article. It’s a little like a carrier pigeon. The pigeon can fly wherever it likes but always knows where its home is.
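The mechanism described above can be sketched in a few lines of Python using only the standard library. This is an illustration of the general tracking-pixel technique, not AP’s actual implementation, which the article does not describe in detail. The names record_view, BeaconHandler, and serve, the article query parameter, and the port are all assumptions made for the example.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# A widely used 1x1 transparent GIF, stored as base64 for readability.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# In-memory log of views; a real registry would write to a database.
views = []


def record_view(path, headers):
    """Note what a single pixel request reveals about an article view.

    `path` is the requested URL path, e.g. "/pixel.gif?article=abc123"
    (the `article` parameter is a hypothetical identifier).
    `headers` is any mapping with a .get() method.
    """
    query = parse_qs(urlparse(path).query)
    entry = {
        "article": query.get("article", [None])[0],
        "referer": headers.get("Referer"),        # page that embedded the pixel
        "user_agent": headers.get("User-Agent"),  # browser making the request
    }
    views.append(entry)
    return entry


class BeaconHandler(BaseHTTPRequestHandler):
    """Logs each request, then answers with the invisible image."""

    def do_GET(self):
        record_view(self.path, self.headers)
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        # Ask browsers not to cache, so repeat views are still counted.
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
        self.wfile.write(PIXEL)


def serve(host="127.0.0.1", port=8000):
    """Run the beacon server until interrupted (port chosen arbitrarily)."""
    HTTPServer((host, port), BeaconHandler).serve_forever()
```

An article page would then include an image tag pointing at this server, with the article identifier in the query string. Wherever the article is republished, each view triggers a request back to the home server, which is the carrier-pigeon behaviour described above.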

Pay walls will rise and pay walls will fall. But in the world of information abundance in which we now live, pay walls are a step backwards. If news wants to benefit from the remarkable openness and dynamism that the internet has unleashed, then it should embrace the distributed network and take advantage of it, not turn its back.

Martin Moore is the director of the Media Standards Trust, a nonprofit organization that aims to foster high quality journalism. He has been working in news and media for more than a decade, including for the BBC, Channel 4, NTL, IPC Media, Trinity Mirror and others. Moore studied history at Cambridge and holds a doctorate from the London School of Economics, where he was teaching and researching until summer 2006.

Tagged: ap, hnews, library of alexandria, media standards trust, metadata, microformats, pay wall, rupert murdoch

6 responses to “How Metadata Can Eliminate the Need for Pay Walls”

Peter Pinch says:
August 18, 2010 at 8:38 pm
The first hnews link returns a 404.

Mark Glaser says:
August 18, 2010 at 8:58 pm
Sorry about that broken link. I fixed it, there was a typo attached to it, so that was the problem.

Steve Pinch says:
August 19, 2010 at 12:54 am
This seems to be mostly about ‘why metadata is groovy’ which I think most media cos worth their salt have known for 10 years or more, than why it replaces paywalls. The problem for most online media companies isn’t lack of traffic – they have loads of it – it’s lack of quality, targeted ad inventory and also subscription revenues. Metadata is important, but not really for the reasons you state.

John Hamer says:
August 19, 2010 at 9:45 am
Great piece, Martin. I agree that hNews is a major step toward transparency, accountability and openness. Here’s another one that could help: the “TAO of Journalism” pledge and seal. For details, see http://taoofjournalism.org. We just launched our new website and the seal is now available to post!

John Hamer says:
August 19, 2010 at 10:17 am
Corrected link: http://taoofjournalism.org

It’s no longer “under construction,” but is live now!

Paul Pinch says:
August 19, 2010 at 6:40 pm
DRM is another tag that can be added to content.
