One of the main goals of online information design is to present content in a way that allows users/readers to find what they want. Tagging, the digital extension of newspaper sections, is one technique used on just about every modern news website as a way to help users browse or search, but that isn’t the only way it can be useful. Through tagging we can use computers to intelligently distribute content and enhance the media conversation. I’ll take the context of a global aggregation system and go through the way I think this can be done, walking through the steps from start to finish.
Step 1: Assigning Tags
This system will have tags for location, topic, and community, but where will they come from? Relying on people to some extent isn’t the end of the world so long as you are clever about it; for instance, CMU professor Luis von Ahn turned image tagging into a game. Nevertheless, it is very important to automate as much of the legwork as possible. With this in mind I see tagging coming from four places:
- The Content – When new content is added, the program will extract and analyze any clues buried within. For a news story this might involve phrases and terms that are parsed out of the body; for instance “yinz” could be recognized as Pittsburgh-ese, or maybe the author mentions a location. There are plenty of algorithms that can make predictions based on this type of data extraction.
- The Context – All new content is being added by someone or something, be it a person or a feed from a website. That content provider has contextual metadata (i.e. a history); for instance the system could get hints from the tags of previous submissions. If the source is a user the system could take that user’s specified areas of interest into account. Finally, if the story is being submitted as a response to existing content then it probably shares a lot of the same tags.
- The Author – Once the system makes its predictions the author will have an opportunity to fix mistakes. This means that he/she can add or remove metadata to the new content before it is officially submitted. It is very important that there be an opportunity for this to happen, although it is also important that the system does not rely on these corrections – the author might make mistakes or simply skip this step.
- The Swarm – I’ve come to trust the swarm effect, meaning I believe that a system can correct itself so long as every user can suggest changes in a non-obtrusive way (and they have motivation to take advantage of this ability). If the average reader is able to push for the addition or removal of tags, then even if the initial tagging was wrong it would be corrected eventually.
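To make the first two sources a little more concrete, here is a minimal Python sketch of how content clues and source history might be combined into a set of suggested tags. Everything in it is an assumption made for illustration: the phrase table, the function names, and the 50% threshold are hypothetical, and a real system would likely learn its hints from data rather than hard-code them.

```python
import re
from collections import Counter

# Hypothetical phrase table: each phrase hints at a (kind, tag) pair.
PHRASE_HINTS = {
    "yinz": ("location", "Pittsburgh"),
    "city council": ("topic", "Local Government"),
}


def suggest_from_content(body):
    """Return the (kind, tag) pairs whose hint phrase appears in the body."""
    text = body.lower()
    found = set()
    for phrase, tag in PHRASE_HINTS.items():
        # Word boundaries keep a phrase from matching inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(tag)
    return found


def suggest_from_context(previous_tag_sets, min_share=0.5):
    """Return tags used on at least min_share of the source's past submissions."""
    if not previous_tag_sets:
        return set()
    counts = Counter(tag for tags in previous_tag_sets for tag in set(tags))
    threshold = min_share * len(previous_tag_sets)
    return {tag for tag, n in counts.items() if n >= threshold}


def suggest_tags(body, previous_tag_sets, parent_tags=frozenset()):
    """Combine content clues, source history, and the tags of the story
    being responded to (if any) into one set of suggestions."""
    return (
        suggest_from_content(body)
        | suggest_from_context(previous_tag_sets)
        | set(parent_tags)
    )
```

The result is only a starting point: the author and the swarm still get to correct it afterwards.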
None of these tag sources are failsafe, but since the swarm will correct everything over time the main concern is abuse. Some ideas to combat this: after the submission process no one user would be able to single-handedly add a tag, making it far more difficult to spoof tags or insert irrelevant tags. Also, different weights can be given to individual voices depending on their past interactions with the system (more on this in future posts).
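One way to picture these two safeguards is the Python sketch below. It is only an assumption about how such voting could work: the `TagProposal` class, the weights, and the thresholds are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TagProposal:
    """Tracks the swarm's votes on adding one tag to one piece of content."""

    tag: str
    votes: dict = field(default_factory=dict)  # user id -> signed weight

    def vote(self, user_id, weight, approve=True):
        # One vote per user: voting again replaces the earlier vote.
        self.votes[user_id] = weight if approve else -weight

    def score(self):
        return sum(self.votes.values())

    def accepted(self, threshold=2.0, min_voters=2):
        # Requiring several distinct supporters means no single user,
        # however heavily weighted, can add a tag alone.
        supporters = sum(1 for w in self.votes.values() if w > 0)
        return supporters >= min_voters and self.score() >= threshold


proposal = TagProposal("Pittsburgh")
proposal.vote("trusted_user", 5.0)   # high weight, but still only one voice
print(proposal.accepted())
proposal.vote("new_user", 1.0)       # a second supporter arrives
print(proposal.accepted())
```

Here the weight stands in for a user's track record with the system; how that weight would actually be earned is left open.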
Step 2: Getting User Preferences
This database of tagged content can be used as a way for people to find information through browsing and searching, but once again the system should do as much of the grunt work as possible. In fact, it should actively distribute targeted content to the users who would be most interested without forcing them to dig around.
Before any targeting can happen the system needs to know what the users are interested in. In order to do this it needs to somehow get the user to share what he or she cares about. I think that a lot of people find it creepy when computers start trying to guess their preferences based on what they look at, so here are a few alternate techniques:
- Blunt Elicitation – Asking the user for a lot of information when they join is a terrible idea; the more that users have to do when signing up, the more likely they will get bored and give up. However, you can still get the information, just defer it. Have a section of their profile dedicated to what locations/communities/topics they are interested in or identify with.
- Collecting Hints – It’s hard for a person to sit down and exhaustively list all of his or her interests. In order to account for this the system can collect information over time by asking for feedback about the content they read. Simply asking “Was this interesting?” could yield a lot of information to help the system better serve its audience.
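Both techniques could feed the same per-user record. The Python sketch below is a rough illustration under my own assumptions: the `InterestProfile` class, the step size of 0.2, and the score range of -1 to 1 are hypothetical.

```python
from collections import defaultdict


class InterestProfile:
    """Per-user interest scores built from explicit answers only."""

    def __init__(self, step=0.2):
        self.step = step
        self.scores = defaultdict(float)  # tag -> score clamped to [-1, 1]

    def declare(self, tag):
        """Blunt elicitation: the user lists an interest in their profile."""
        self.scores[tag] = 1.0

    def record_feedback(self, tags, interesting):
        """Collecting hints: nudge every tag on a story after the user
        answers 'Was this interesting?'."""
        delta = self.step if interesting else -self.step
        for tag in tags:
            self.scores[tag] = max(-1.0, min(1.0, self.scores[tag] + delta))

    def top_interests(self, n=3):
        """Return up to n tags with a positive score, strongest first."""
        ranked = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return [tag for tag, score in ranked[:n] if score > 0]
```

Because scores only move when the user answers a question, nothing here is inferred from silent browsing behavior.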
Users will, at some point, specify physical regions of interest. I’m picturing a visual interface where they drag circles or boxes around the locations they care about. They will also identify communities of interest by picking from a list of the communities contained in the areas they drew in the previous step (or from a lengthy universal list). Finally, users can associate topics of interest generally or specific to a particular location or community.
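A drawn circle or box has to end up as data the system can test stories against. The Python sketch below shows one possible shape for that data, using the standard haversine formula for distance on a sphere. The class names and fields are hypothetical, communities are left out to keep it short, and the box check assumes the box does not cross the 180th meridian.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Circle:
    lat: float
    lon: float
    radius_km: float

    def contains(self, lat, lon):
        return haversine_km(self.lat, self.lon, lat, lon) <= self.radius_km


@dataclass
class Box:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        # Assumes the box does not cross the 180th meridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class Region:
    shape: object                             # a Circle or a Box
    topics: set = field(default_factory=set)  # empty means "any topic here"


@dataclass
class Preferences:
    regions: list = field(default_factory=list)
    general_topics: set = field(default_factory=set)

    def wants(self, lat, lon, topics):
        """True if a story at (lat, lon) with these topics matches the user."""
        topics = set(topics)
        if topics & self.general_topics:
            return True  # general topics match regardless of location
        for region in self.regions:
            if region.shape.contains(lat, lon):
                if not region.topics or topics & region.topics:
                    return True
        return False
```

With this structure, "education news near Pittsburgh" and "avocado news from anywhere" can live side by side in one profile.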
Step 3: Targeting and Adding Meaning
The result of all this tagging and preference elicitation is a semantic (i.e. programmer-friendly) bond between content, people, and communities. The obvious worth of this is that content can now be effectively routed to those who care. The data can also be used in more creative ways; here are a few ideas:
- Facilitate Social Process – One of the most powerful parts of this concept is that the tagged content doesn’t just have to be news articles or blog posts. It could be polls, conversations, events, protests, advertisements, classifieds, real estate listings, etc. In other words, this structure makes it possible to add meaning through localized conversation and collective insight. (It also opens up a potentially effective business model).
- Connecting Related Content – Locations are inherently related, as are common interests. Having a detailed spectrum of tags makes it more likely that users will see related stories next to one another. By allowing them to explicitly identify these relationships the system can, to borrow a phrase from Ben Melançon, improve the signal-to-noise ratio even more for readers interested in a particular thread of news.
- Identifying Trends – Being able to see comparative trends between topic, location, and community could be incredibly interesting. For instance, what locations in the United States are written about most often in Environmental news? Isn’t it interesting that almost 60% of the users who have an interest in California also care about the topic of avocados? Why is there so much educational news in Philadelphia? You get the idea.
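Trend questions like these reduce to simple counting once content and users share the same tags. The Python sketch below assumes a hypothetical data layout (a dict of user interests and a list of story dicts); the function names are mine, and the sample data in the usage lines is made up purely to show the calls.

```python
from collections import Counter


def share_also_interested(user_interests, given, other):
    """Fraction of users interested in `given` who also list `other`."""
    base = [tags for tags in user_interests.values() if given in tags]
    if not base:
        return 0.0
    return sum(1 for tags in base if other in tags) / len(base)


def top_locations_for_topic(stories, topic, n=3):
    """Locations that appear most often on stories tagged with `topic`."""
    counts = Counter(s["location"] for s in stories if topic in s["topics"])
    return counts.most_common(n)


# Made-up sample data, only to show how the functions are called.
users = {
    "u1": {"California", "avocados"},
    "u2": {"California"},
    "u3": {"Ohio", "avocados"},
}
stories = [
    {"location": "Philadelphia", "topics": {"Education"}},
    {"location": "Philadelphia", "topics": {"Education", "Politics"}},
    {"location": "Pittsburgh", "topics": {"Education"}},
]
print(share_also_interested(users, "California", "avocados"))
print(top_locations_for_topic(stories, "Education"))
```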
As always, these ideas don’t have to be applied to a global system; local news organizations could easily use them to improve their websites. Some of it might be overkill depending on the amount of content put out by these groups, but that’s why the next stop on the ‘system design’ train is a focus on collecting and generating content.
(This post pertains to a bullet point from Tying it All Together – Geotagging)
Comments (2)
Another masterful post by Dan Schultz.
The base functionality for shared decision-making about tags, a need mentioned several times in this post, has been implemented in Community Managed Taxonomy. It could use support (or coding help) to get to an official release.
Dan,
Your last 3 bullet points are the points I have been waiting to see all year. I think these 3 points hold the key to unlocking what you are looking for. We can talk more when I see you!
Think in reverse