While listening to a podcast about how RSS feed readership is measured, an idea occurred to me for improving the measurement of readership of both feeds and web pages. One of the things that makes it difficult to accurately measure traffic to any internet resource is that there may be proxy servers between the resource and the reader. Web-based feed reading services like Bloglines, for example, may fetch your feed once an hour, but then turn around and display it to 100 subscribers. If you only consider your server log, your readership estimate would be off by 9,900% (measured against the single fetch you logged) or by 99% (measured against the 100 actual readers). Either way, the error is huge.

How do we fix the problem? One approach is to put "web bugs" in the feed. Web bugs are little images, possibly even transparent images, that won't get cached by services like Bloglines. So while the feed may only get accessed once, the image may get accessed 100 times, giving you a better metric.

One problem with web bugs is that while Bloglines won't cache the image, web proxy servers might. So your numbers are still likely to be off.

Another problem is that people don't like web bugs. They may be used innocently in many cases, but in other cases, they're used for privacy-invading purposes. That results in people being suspicious of all web bugs.

So, what's my big idea? It's not one that will make the metrics problem go away, but it could help: create a new HTTP request header that proxies (whether web proxies, feed proxies, or whatever) send whenever they refresh their caches, telling the origin server how many requests they have received for the resource since the last refresh. The first time, it might look like this:

X-Proxy-Count: 1

In other words: "The first person requested it, so I'm asking you for it." When the proxy's cache expires and it asks for the resource again, the header might look like this:

X-Proxy-Count: 100

Wow! 100 people have requested this resource since the last time the proxy fetched it!

The exact meaning of the header might be a little different for web proxies and feed proxies. For web proxies, it would be the actual number of requests received (where any request that itself included an X-Proxy-Count header would count as the number of requests claimed in that header). For feed proxies, ideally it would be the number of subscribers to the feed who had checked their subscriptions since the last refresh. But it might instead be the total number of subscribers to the feed, whether they'd checked the feed recently or not. That's a detail that would need to be nailed down.
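To make the web-proxy counting rule concrete, here is a minimal sketch in Python of how a cache might keep its tally. Everything in it is an assumption made for illustration: the `ProxyCounter` class, the `parse_count` helper, and the choice to treat a missing or malformed header as a single request are all hypothetical, and headers are assumed to arrive as a plain dict keyed by the exact name X-Proxy-Count.

```python
def parse_count(value):
    """Return the request count claimed by a header value (hypothetical helper).

    A missing, malformed, or non-positive value is treated as one request.
    """
    try:
        n = int(value)
    except (TypeError, ValueError):
        return 1
    return n if n > 0 else 1


class ProxyCounter:
    """Tallies requests per URL since the proxy last refreshed its cache."""

    def __init__(self):
        self._pending = {}

    def record_request(self, url, headers):
        # A downstream proxy that sends the header counts as the number of
        # requests it claims; an ordinary client counts as one.
        claimed = parse_count(headers.get("X-Proxy-Count"))
        self._pending[url] = self._pending.get(url, 0) + claimed

    def refresh_headers(self, url):
        # Called when the cached copy has expired and the proxy refetches.
        # The tally is reset so the next refresh reports only new requests.
        count = self._pending.pop(url, 0)
        return {"X-Proxy-Count": str(count)}
```

With this sketch, the very first request produces a count of 1, and a refresh that follows 100 further requests produces a count of 100, matching the two examples above.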

Reader Comment:
Antone Roundy said:
Tim Bray wrote about this issue a few days ago. Expanding on what I wrote above, here's part of what I posted in his comments:

Proxy-Fetch-Count: 1000
Proxy-Active-Subscribers: 100
Proxy-Total-Subscribers: 300

The first would indicate how man...

Since not all proxy servers would support this header, the metrics wouldn't be perfect, but it would help. And since only aggregate numbers would be sent, not the IP addresses of each subscriber, privacy advocates would be less likely to be bothered.
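On the origin server's side, the estimate could be as simple as summing the claimed counts over the logged requests. The sketch below assumes a hypothetical `estimate_readership` function that receives each logged request's headers as a dict; requests without the header (direct readers, or proxies that don't support it) count as one, which is why the result is a lower bound rather than an exact figure.

```python
def estimate_readership(logged_requests):
    """Sum claimed request counts over an iterable of header dicts.

    Requests with no X-Proxy-Count header, or with an unparseable or
    non-positive value, are counted as a single request.
    """
    total = 0
    for headers in logged_requests:
        try:
            count = int(headers.get("X-Proxy-Count"))
        except (TypeError, ValueError):
            count = 1
        total += max(count, 1)
    return total


# Example log: one direct reader, one supporting proxy, one garbled header.
sample_log = [
    {},
    {"X-Proxy-Count": "100"},
    {"X-Proxy-Count": "bogus"},
]
estimate = estimate_readership(sample_log)
```

Note that only the aggregate number ever reaches the server; nothing about individual subscribers is in the header.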