This is a guest post by Joel Abrams, project manager for distribution at The Conversation – US.
It’s a common question when working on search engine optimization at a content site: How much traffic is going to the “long tail” of the archive? The converse gets asked too: Are our efforts at SEO working to drive traffic to new stories?
Answering these questions is not easy — there’s no built-in report in Google Analytics or Omniture. So I set out one day recently to try to answer them, and here’s what I found.
The basic procedure is to get a list of your content by publication day and marry that to traffic data by day. Then you can find the proportion that’s going to recent stories.
Step One: Make a Spreadsheet of Recent Content
First, you need to set up a data table in the spreadsheet with your recent content. If your URLs include the whole date, you’re in luck. If you only publish a few articles a day, make a list of them and fill in a column with the date manually.
If, like at the website where I work (The Conversation, a non-profit analysis and commentary site publishing research-based journalism for the general public from academics), there’s too much content to make that manageable — or you have your intern doing other things — you’re going to have to go to your dev team and get them to export a list of articles with their publication dates.
The file that came back from the dev team had the URL in column A and the publication date in column B. Unfortunately, the dates were in a string format Excel didn’t understand: 2016-03-29T09:26:58Z. So I created a formula to transform that string into something Excel did consider a date: =REPLACE(REPLACE(B2,11,1," "),20,1," "). With the original string in cell B2, the nested replacement replaces the T character at position 11 with a space and then the Z at position 20 with a space. If you’re creating your own data, just enter the date and you can skip this step (and if you’re asking for a dump from the dev team, have them give you the date without the time).
Now I was ready to find out how many days old each article was. I used Excel’s DATEDIF function, which calculates the difference between two dates. The formula I used is =DATEDIF(C2,NOW(),"D") — look at the date in C2, find the difference between then and now, and display the result in days.
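If you’d rather do this cleanup outside Excel, the same two steps — parsing the timestamp and computing the age in days — look like this in Python. This is a minimal sketch; the timestamp format is taken from the example above, and the variable names are mine:

```python
from datetime import datetime, timezone

# Example value from the dev team's export: publication timestamp as a string
raw = "2016-03-29T09:26:58Z"

# Parse the ISO 8601 timestamp directly, instead of stripping the T and Z
published = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# Equivalent of =DATEDIF(C2,NOW(),"D"): whole days between publication and now
days_old = (datetime.now(timezone.utc) - published).days
```

A script like this is handy if your export has thousands of rows, since you can compute the age column before the data ever reaches the spreadsheet.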
Sort this data alphabetically by URL.
Step Two: Export Your Referral Data
Now it’s time to get your referral data out of your analytics package. I looked at referrals specifically from search, although you might be interested in all referral sources. In Google Analytics, that means I went to Acquisition > All Traffic > Channels. Then I clicked on Organic Search in the data chart. Then, below the graph, I clicked on “Landing Page” as the Primary Dimension, so I would get the URLs search users arrived on, instead of their search terms. Finally, I exported the report to Excel using the dropdown menu at the top of the page.
Bonus hint: When you’re using a segment or otherwise filtering the data so that Google samples it — noted by a yellow notice at the top — you can click the little checkerboard-like sampling icon above that notice and choose ‘higher precision’ to get Google to use twice the sample size.
Note that you do need to export each day of referrals separately, because content ages each day. So if you download a week of traffic and look at referrals to a week of content, on Monday you are looking at referrals to one day of stories and on Friday to five days of stories. That approximation may be good enough for your purposes, and would certainly be less work.
Now put each day’s downloaded data into a new tab of the spreadsheet with your content by date. If, like me, you’re only interested in search traffic to stories, delete any URLs for index pages.
Step Three: Calculate the Age of Your Stories
The next step is to use the VLOOKUP function to find out how old each story is. For the first row, the formula is =VLOOKUP(A2,'Content by Date'!A$2:D$15,4,FALSE). The first parameter has the cell with the URL you’re looking up; the second has the range with the age data on the other sheet (be sure to add dollar signs in, so that the range doesn’t change when you fill the spreadsheet); the third is the column to take data from (in my case, the ‘days old’ value is in column 4); and the fourth tells Excel not to match the closest URL (otherwise, if a URL isn’t in the table, you’ll get the age of the next closest alphabetically).
Fill this function all the way down your table.
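The VLOOKUP step is just an exact-match lookup from URL to age. Here’s a small Python sketch of the same idea, using made-up URLs and ages for illustration:

```python
# Hypothetical 'Content by Date' data: URL mapped to days old
age_by_url = {
    "/articles/example-story-1": 3,
    "/articles/example-story-2": 45,
}

# Hypothetical landing pages from the analytics export
landing_pages = ["/articles/example-story-1", "/articles/unknown-story"]

# Exact-match lookup, like VLOOKUP with FALSE as the fourth argument:
# a URL missing from the table yields None rather than the nearest
# alphabetical match
ages = [age_by_url.get(url) for url in landing_pages]
# ages == [3, None]
```

The None result plays the same role as VLOOKUP’s #N/A — it flags URLs (such as index pages) that aren’t in your content table.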
Step Four: Find Your Long-Tail Traffic
Finally, the payoff. We’ll use the SUMIF function to add up referrals to recent stories: =SUMIF(C2:C12,"<7",B2:B12). SUMIF takes three parameters: first, the range of cells to examine (in this case, column C); second, the comparison, in quotation marks (note that because the age of content is relative to today, if you’re looking at referrals from 10 days ago and you want content that was under a week old at the time, use "<17" for less than 17 days old); and finally, the range containing the data to actually add.
Now divide that sum by the sum of all referrals, and you have your percentage.
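The SUMIF-and-divide step can be sketched in Python as well. The session counts and ages below are made up purely for illustration:

```python
# Hypothetical referral rows: (search sessions, days old) per landing page
rows = [(120, 2), (80, 5), (300, 40), (50, 365)]

# Equivalent of =SUMIF(C2:C12,"<7",B2:B12):
# sessions landing on stories under a week old
recent = sum(sessions for sessions, age in rows if age < 7)

# Total search sessions across all landing pages
total = sum(sessions for sessions, _ in rows)

# Proportion of search traffic going to recent stories
share = recent / total
```

With these sample numbers, 200 of 550 sessions went to recent stories, so the share works out to about 36 percent.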
Repeat for as many days as you think will provide a good dataset – I put each day on a separate tab within the same spreadsheet. Finally, divide the sum of recent-story referrals across all days by the sum of all referrals, and you have your overall rate of referrals to recent content.
So what does it mean? For The Conversation, 13 percent of search traffic was going to content published within the past week. We are more of an analysis site than a breaking news site, so much of our content is evergreen and has value as reference material. It was interesting to see that there was a fairly consistent level of archive traffic on weekends versus weekdays — but newsier traffic was highly variable (that spike on the 26th to the 29th was from stories on vegetarianism, Lil’ Kim, and Australian nuclear submarines). It also revealed that some of our newsier content was not doing well from Google. But as many of our newsier stories receive the vast majority of their pageviews on sites that republish our content under a Creative Commons license, we may be competing against ourselves.
If you do compute this ratio, please share your results in the comments. It will be interesting to be able to benchmark against other sites.
Joel Abrams is project manager for distribution at The Conversation – US, where he works on analytics, social media, and SEO, in addition to evangelizing sites to republish their wide-ranging research-based content, all of which is licensed for free under Creative Commons.