Related Stories widgets generate traffic and revenue but have earned a bad reputation. Can they be fixed through better algorithms and user experience?
At a meetup for people working on content recommendations, AI and machine learning last month in Paris, Robbert van der Pluijm of Bibblio spoke about the challenges of building recommender systems for publishers. Here’s a transcript of his talk, originally published on Medium, where he shared some of the things Bibblio is trying.
Hello everyone, I’m the guy with the difficult name. Sorry Alex [host] for making you give it a go in the introduction.
My name is Robbert, co-organizer of RecSys London and Head of Bibblio Labs. Bibblio delivers recommendations-as-a-service for publishers and content platforms.
Today I’d like to talk briefly about how we solved the problem of building a local popularity recommender.
Before I dive in, a little bit about how Bibblio's service works. Our customers (online publishers, libraries and course platforms, as seen here) push their content to our API platform. Their content gets enriched, and the customer then requests recommendations from an API endpoint.
We develop algorithms at Bibblio using a modular approach. That means that each algorithm will be built and rolled out as a separate product at first, so that customers can retrieve recommendations for each algorithm independently. Our reason for adopting this modular format is to restore transparency and control to our clients, both important values for Bibblio. To realize these values, our offering needs to be clear-cut and explainable to any person. Customers can also use our pre-built widgets that bring us tracking data without dropping a cookie on their page. You can customize these widgets, also known as modules, as well.
So what do these modules look like? The first example is The Day. They are a publisher doing current affairs for schools. Bibblio powers the ‘Related articles’ at the bottom of every content page. The Day uses a Bibblio service based on semantic similarity, which is powered by a TF-IDF algorithm.
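To make the 'Related articles' idea concrete, here is a minimal sketch of TF-IDF-based semantic similarity. Bibblio's actual pipeline is not public, so the tokenization, weighting and function names below are illustrative assumptions, not their implementation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute a sparse TF-IDF vector for each tokenized document."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def related(source_idx, vectors, k=3):
    """Rank all other items by similarity to the source item."""
    scores = [(cosine(vectors[source_idx], v), i)
              for i, v in enumerate(vectors) if i != source_idx]
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```

Because recommendations come only from the customer's own catalog, `related` simply ranks every other internal item against the one the reader is viewing.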
It’s cool to see how publishers and their audiences respond well to recommendations showing something semantically similar. They almost expect to stumble across something irrelevant at the bottom of the page, so the experience we deliver is getting good reviews.
The second example is the Canadian Electronic Library. Here we do ‘Related texts’, powered by the same algorithm I mentioned before. Perhaps it’s interesting to mention that at the moment we only do internal recommendations. This allows partners to resurface great content and gives them greater coverage across their catalog.
The last example, to cover all three customer verticals, is a course platform called Coursedot. This is a slightly different use case, as the courses we recommend come with a price tag for the end-user. Coursedot is interested in showing end-users relevant suggestions, not just the courses that might fill shopping baskets short-term.
Why local popularity?
For our next algorithmic product we considered a number of algorithms, finally settling on what we refer to as a local popularity recommender. It’s the first of a portfolio of algorithms that are intended to leverage interaction data, which is now becoming available. So far, we’ve only been able to draw upon document content, so we’re pretty excited that we can now enhance our algorithms with additional data. Thankfully, it turned out to be quite straightforward to implement, so we were able to receive quick feedback from our partner and their end-users.
We also noticed that modules with names like ‘Most popular’ are found everywhere on the web. But these modules are static, and often reward sensationalism and content which is largely irrelevant for end-users. We wanted to rethink popularity for content websites.
Challenges in popularity
The way the local popularity recommender works is that for each one of the source items, i.e. the item you as end-user are looking at, we rank all the other items according to how popular they are for users viewing that specific source item. Initially our idea was just to count the number of clicks to measure popularity.
We understand the limitations of clicks as a proxy for popularity, but that's the data we had available; you could also consider scrolling data, time spent, or explicit feedback if you have it. The first thing we found is that you need to normalize those clicks and rank items by click rate. This deals with the first type of bias: you want to give a fair chance to documents that don't get presented often but get clicked every time they are shown. If you only count raw clicks, you create a prejudice against those documents. What you really want to look at is the conversion rate, not the number of clicks.
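The normalization step described above can be sketched in a few lines. The function name and the dict-based inputs are my own illustrative choices; the point is simply that ranking by clicks divided by impressions surfaces rarely-shown but well-performing items.

```python
def rank_by_click_rate(clicks, impressions):
    """Rank candidate items by click-through rate, not raw click counts.

    clicks, impressions: dicts mapping item id -> counts observed for a
    given source item. Items with zero impressions are skipped, since
    their rate is undefined.
    """
    return sorted(
        ((clicks.get(item, 0) / shown, item)
         for item, shown in impressions.items() if shown > 0),
        reverse=True,
    )
```

For example, an item with 100 clicks from 10,000 impressions (1%) would beat an item with 5 clicks from 50 impressions (10%) on raw clicks, but the click-rate ranking correctly puts the second item first.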
Now the problem with interaction data is that you have to deal with a learning period where early trends can be misleading. As I’ve said, how do you deal with recommendations that don’t get many clicks simply because not many users have seen them? How do you make sure you adapt to changing patterns and create a dynamic list of engaging recommendations across all of a customer’s site?
When you’re in this ‘transient’ period you don’t want to decide too soon on an optimal set of recommendations. In other words, you’d like to avoid taking a ‘greedy’ approach, at least in the initial warm-up phase. The welcome side-effect of this is that you don’t get stuck with a set of content items that just happened to have the highest click rate after kicking off the learning. The worst thing you can do is to determine popularity and then not leave room to adapt. You should give yourself the chance to explore the possibility that other items, old and new, have a higher click rate. That insight led us to investigate multi-armed bandit algorithms.
With a multi-armed bandit, each ‘arm’ could be any one of the content items from the entire corpus we’re recommending from. This means that initially recommendations using this technique will appear random and this may degrade the quality of the end-user experience. So here comes TF-IDF again to help us out. We use the TF-IDF scores to limit the pool of ‘arms’: the set of arms from which the algorithm can select its action in each learning instant is restricted to the top performing recommendations that were obtained using the TF-IDF algorithm.
Periodically we update our bandit algorithm using fresh interaction data that has become available since the last learning phase. As the experiment progresses, we learn more and more about the relative payoffs, and so do a better job in choosing good recommendations. We always incorporate arms that explore new items which come into the catalog and as audience behavior changes.
The future comes highly recommended
At one point we wanted our slogan to be ‘Making Content Recommendations Great Again’, but we decided it was too soon and in the end we settled on ‘The Future Comes Highly Recommended’.
It’s important that we continue to create better discovery experiences for end-users. The challenge is to find the balance between pragmatic concerns and aspirations. Amongst those concerns, data availability is crucial. Bibblio has moved from using content data to harnessing audience interaction data, and next on our algorithmic roadmap is to utilize data about the individual user. Think personalized recommendations, home pages and even newsletters. How we utilize that data depends on many things, with our ethical standards always remaining central.
We understand that individuals are different personas, and they develop over time. We get that some people would rather receive a non-personalized response. Whatever we do, offering transparency and control to both publishers and end-users will be paramount. Thank you for listening.
Robbert van der Pluijm is head of Bibblio Labs.