Estimating the number of blog subscriptions
July 4th, 2006 Tom Berger
Estimating the number of readers of plain web pages is relatively straightforward. It can be done either with tools like Webalizer and Analog, which analyse the web server’s access log, or with counters, ranging from simplistic ‘number of visitors’ dynamic images to the sophisticated Google Analytics, which use cross-domain resource loading to gather information about the readers accessing the site in a central database.
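To give a feel for what analysing an access log involves, here is a minimal sketch in Python. It assumes the server writes the Apache "combined" log format (other servers or custom log settings would need a different pattern), and the names LOG_LINE, parse_line and hits_per_path are made up for this illustration; this is not how Webalizer or Analog are implemented.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, e.g.
# 127.0.0.1 - - [04/Jul/2006:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def hits_per_path(lines):
    """Count successful GET requests per path, skipping unparseable lines."""
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["method"] == "GET" and entry["status"] == "200":
            hits[entry["path"]] += 1
    return hits
```

Passing an open log file to hits_per_path gives a per-page request count, which for plain web pages is already a usable proxy for readership.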
Unlike traditional website visitors, most readers of a blog use a news aggregator to periodically pull new items from the blog’s syndication feed. As a result, the correlation between the number of requests and the number of times an item is read is broken. To confuse things even more, many readers use a public aggregator service (like Bloglines and LiveJournal), which saves the feed to a central repository and serves the saved entries to many readers. For such services, growth in the number of subscribers is not reflected in an increase in the number of requests made.
To get a rough estimate of the number of subscribers to a feed we need to distinguish between requests made by public services on behalf of more than one user and requests made by individual news aggregators. Fortunately, a de facto convention has evolved which allows public aggregators to identify themselves as such and report the number of subscribers to the feed. They do this by including the number of subscribers in the User-Agent request header (unfortunately no real standard exists yet, and every aggregator uses a slightly different format). All other requests are from individuals, and the number of unique IP addresses making them roughly corresponds to the number of subscribers. In my analysis I decided to count only those addresses from which at least three requests were made during a twenty-four hour period. That way we don’t take into account users who stumbled upon the XML feed without actually intending to subscribe to it (because they wanted to copy the feed’s URL from their browser’s address bar, for example, or because they use a preemptive caching mechanism like the Google Toolbar).
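The approach described above can be sketched roughly as follows. This is an illustration only, not the code of my script: the regular expression is a guess at the kind of wording aggregators use (something like "12 subscribers" inside the User-Agent), the function names are invented for the example, and the input is assumed to be (ip, user_agent) pairs for feed requests made within a single twenty-four hour period.

```python
import re
from collections import Counter

# Assumption: public aggregators embed text such as "12 subscribers" or
# "1 subscriber" in the User-Agent. Real formats vary between services.
SUBSCRIBERS_RE = re.compile(r"(\d+)\s+(?:subscribers?|readers?)", re.IGNORECASE)


def reported_subscribers(user_agent):
    """Return the subscriber count embedded in a User-Agent, or None."""
    match = SUBSCRIBERS_RE.search(user_agent)
    return int(match.group(1)) if match else None


def estimate_subscribers(requests, min_requests=3):
    """Estimate subscribers from (ip, user_agent) pairs covering one day."""
    public = {}          # aggregator name -> last subscriber count it reported
    private = Counter()  # ip address -> number of requests without a count
    for ip, user_agent in requests:
        count = reported_subscribers(user_agent)
        if count is not None:
            # Use the leading product token as a crude aggregator name.
            name = re.split(r"[/;\s(]", user_agent, maxsplit=1)[0]
            public[name] = count
        else:
            private[ip] += 1
    # Only addresses seen at least min_requests times count as subscribers.
    regulars = sum(1 for hits in private.values() if hits >= min_requests)
    return sum(public.values()) + regulars
```

For example, a day with one made-up public aggregator reporting 12 subscribers, one address that fetched the feed three times and one that fetched it once would give an estimate of 13.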
Finally, for most blogs there’s more than one way to get a feed. There are several popular feed formats (RSS, Atom and a few others, each with several different versions), and the blogging software we use may offer more than one URL for getting a feed (for example, one using query parameters and one using a static resource), and an aggregator may subscribe to more than one of these. For public aggregators it is sensible to add up the number of subscribers reported for each version. For private aggregators we can ignore redundant requests from the same address (they are probably being read by the same person anyway).
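Combining the figures for several feed URLs might look something like the sketch below. It assumes the per-feed numbers have already been extracted from the log; the function name and the shape of the two dictionaries are choices made for this example, and the feed paths and aggregator names in the usage note are invented.

```python
def combine_feeds(public_by_feed, private_by_feed):
    """Merge per-feed figures into one subscriber estimate.

    public_by_feed:  {feed_url: {aggregator_name: reported_count}}
    private_by_feed: {feed_url: set of ip addresses counted as subscribers}
    """
    # Public aggregators: add up the counts reported for every feed URL.
    public_total = sum(
        count
        for counts in public_by_feed.values()
        for count in counts.values()
    )
    # Private aggregators: an address seen on several feeds is counted once.
    unique_ips = set().union(*private_by_feed.values())
    return public_total + len(unique_ips)
```

With a hypothetical aggregator reporting 12 subscribers on one feed URL and 3 on another, plus two distinct private addresses (one of which polls both URLs), the combined estimate is 17.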
The estimate we get is still very inaccurate (and probably too low). First, not all public aggregators bother reporting the number of subscribers they serve. Google, Yahoo and MSN, for example, have a very large user base and most definitely access our feed on a regular basis, but we simply don’t know how many users hide behind them (and to make things worse, some of them may access our feeds from different addresses on different occasions, causing us to record them more than once even if they don’t have more than one subscriber). Likewise, subscribers without a fixed address (dial-up and mobile phone subscribers, subscribers behind anonymising proxies) may slightly inflate the figures too. Finally, some readers don’t use feed aggregators at all, and instead read the blog by occasionally visiting the HTML version in a browser.
There is another metric I could extract from the logs but have not, so far, bothered with. Following the logs over time, it would be nice to identify the relationship between requests to the HTML version of the blog and the number of subscriptions to the feed. Some blog entries must act as conversion points: people read them and then decide to subscribe to the feed. It would be interesting to know which entries are successful at recruiting new subscribers, because for a heterogeneous blog (with many writers, styles and categories) it is often difficult to know what readers are most interested in. I may try to add this in the future.
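One possible way to approach this, and it is only a guess at a heuristic rather than something I have implemented, is to credit the last HTML page an address requested before its first feed request. The sketch below assumes a time-ordered list of (timestamp, ip, path) tuples and a known set of feed paths; all names are hypothetical. It can only work for readers whose aggregator runs from the same address as their browser, so it says nothing about subscribers who arrive through public aggregators.

```python
from collections import Counter


def conversion_entries(events, feed_paths):
    """Credit the page viewed just before an address first requests a feed.

    events:     iterable of (timestamp, ip, path), sorted by timestamp
    feed_paths: set of paths that serve a syndication feed
    """
    last_page = {}      # ip -> most recent non-feed page requested
    subscribed = set()  # ips that have already requested a feed
    credit = Counter()  # page path -> number of first-time feed requests
    for _timestamp, ip, path in events:
        if path in feed_paths:
            if ip not in subscribed:
                subscribed.add(ip)
                if ip in last_page:
                    credit[last_page[ip]] += 1
        else:
            last_page[ip] = path
    return credit
```

Calling credit.most_common() on the result would list the entries that most often preceded a new subscription under this heuristic.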
If you too are curious about the number of subscribers to your blog (and have access to the HTTP access log of the server hosting it) you can give my little script, Blogalizer, a try. Your questions, suggestions and improvements are, naturally, very welcome.
Entry Filed under: Technology, Tools, Our Software
3 Comments
1. Noel Welsh | July 4th, 2006 at 4:46 pm
FeedBurner is a very nice site that will collect statistics for you. I use it, and it works well.
2. tom | July 4th, 2006 at 5:05 pm
“FeedBurner is a very nice site that will collect statistics for you”
Nice and useful it may be, but the model it uses requires you to surrender your feed (and your statistics) to them, as the service is based on FeedBurner aggregating your original feed and convincing users to subscribe to the feeds produced by FeedBurner instead. I’d definitely consider using FeedBurner for some scenarios, but it’s nice to have a home-brewed solution too. And with the script being free, we can even hope to extend it until it provides more than what proprietary services provide today.
3. Jamie Cansdale | July 5th, 2006 at 9:22 am
You could always add an image to a post and track how many people download that. I know lots of people read email with images turned off, but I think most people want to see images in their feeds.