Real Sceptic

Ventures into scepticism

Why You Shouldn’t Use Alexa Traffic Statistics

12th August 2013

Watts is known for using Alexa web traffic statistics to show how well his website is doing compared to other blogs, often to boast that he’s doing far better than, for example, Skeptical Science or Real Climate.

Via the comment section of WottsUpWithThat, the user @vitaminCSS pointed to a tweet in which he joked around a bit about the graphs, in response to Watts’ latest use of Alexa data. Because I saw his comment, I responded to his tweet, saying that “Alexa is notoriously unreliable with the type of statistics it gives. You can’t do any comparisons with it.”

It was just me giving an opinion on how inaccurate the Alexa data is and that you shouldn’t use it. Watts did respond to my remark, and before I address his response to me I’ll explain why I think Alexa data is unreliable.

One of the reasons is that I graduated from university with a diploma in Information Technology. It makes me a software engineer and allows me to use the protected title of Engineer in my country. I work for a software development company where we develop and maintain, for example, complex web retail software, which is almost always coupled to online campaign and tracking software. This makes me well aware of the limitations of certain technologies and products.

That’s why I know that Alexa data is basically worthless if you’re trying to do any serious analysis of visitor numbers to websites. You just don’t use it as you will almost always get something that isn’t remotely close to reality (although some businesses do use this data).

What you need to know about how Alexa gathers their data is that they are dependent on users installing their toolbar (or a toolbar that passes information to Alexa). It’s this toolbar that provides data for their statistics. This makes you reliant on visitors to your site having this toolbar installed for their visit to be counted by Alexa. It’s something they state on their website:

Alexa’s traffic estimates are based on a diverse sample of millions of worldwide internet users using thousands of different types of toolbars and add-ons for Google Chrome, Firefox, and Internet Explorer.

This can introduce big biases in the data collected for websites. It’s something they try to correct for, but also something they say they can’t fully correct for (emphasis mine):

Alexa’s ranking methodology corrects for a large number of potential biases in our sample and calculates the ranks accordingly. We normalize based on the geographic location of site visitors. We correct for biases in the demographic distribution of site visitors. We correct for potential biases in the data collected from all the various browser extensions to better represent those types of site visitors who might not be in Alexa’s measurement panel. However, biases still exist, and to the extent that our sample of users differs from the set of all internet users, our traffic estimates may over- or under-estimate the actual traffic to any particular site.

This means that the demographics, the browsers used, and even the country your visitors are from all affect the statistics Alexa reports about your website. It’s why you should only put any serious stock in statistics derived from direct measurement of visitors by the website itself (and even then you need to know about the weaknesses and how to interpret the data). Indirect measurements can give you a hint, but they should be taken with a big grain of salt, as they tend to give you the wrong answer about visitors and how they end up on sites.
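The effect of a toolbar-based panel can be illustrated with a small sketch. All numbers and names below are hypothetical and chosen only for illustration; they are not measurements of any real site, and this is not how Alexa actually computes its estimates. The sketch assumes two sites with identical real traffic whose audiences differ in how likely they are to have the toolbar installed.

```python
def panel_counts(actual_visitors, toolbar_rate):
    """Visitors a toolbar panel would observe for each site.

    Only visitors who have the toolbar installed are seen by the panel.
    """
    return {
        site: actual_visitors[site] * toolbar_rate[site]
        for site in actual_visitors
    }


def shares(counts):
    """Each site's share of the total count."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


# Hypothetical: both sites really get the same number of visitors...
actual = {"site_a": 10_000, "site_b": 10_000}
# ...but site_a's audience installs the toolbar four times as often.
toolbar_rate = {"site_a": 0.02, "site_b": 0.005}

observed = panel_counts(actual, toolbar_rate)

print("real share of traffic: ", shares(actual))
print("panel share of traffic:", shares(observed))
```

Under these assumed install rates the panel sees four times as many visitors for site_a as for site_b, even though the real traffic is identical. Unless the difference in install rates is known and corrected for, a comparison based on the panel points in the wrong direction.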

All this is known in the industry. That’s why for websites you always use your own web statistics software. It’s the reason I use multiple types for my site, each with a specific goal. They show that the statistics Alexa has for my website both underestimate and overestimate the real figures. Some do get close, but they tend to vary a lot, and I’ve often seen statistics that don’t make any sense compared to the actual data.

Currently the statistics for search engine keywords are the ones that are hilariously wrong. Alexa says that “heyruka” is the most common search term visitors end up with on my website and estimates it at 64%. The actual number is 1%.

The other numbers like “pages per visitors” and “time on website” are off between 20% and 50%. That’s significant.

Alexa seems to suggest that WUWT is indeed more visited than, for example, Skeptical Science, but with the inherent inaccuracies you don’t know this for sure. You could only determine this if Watts and Cook publicly released their direct traffic measurements (although these would have to be in a comparable format and measure the right metrics to make a good comparison).

Watts should know this, considering how often it has been pointed out to him that Alexa isn’t reliable. He should also know this because WordPress can and does track visitor statistics (if you host your own WordPress blog you need the JetPack plugin to do this, but it is standard functionality for a WordPress.com hosted blog).

He also uses sitemeter, which is a direct measurement of traffic on his website (it’s mentioned in the sidebar of his website). And if you look at the code of his blog you’ll notice that he’s also using Quantcast.

If you compare the Alexa statistics with his Quantcast statistics, you’ll notice that Alexa actually seems to be underestimating his statistics based on the page views per visitor (I don’t trust the Sitemeter statistics, as they deviate too much and there are complaints that Sitemeter undercounts visits and page hits). I don’t know, though, whether Alexa is overestimating or underestimating the total number of visitors to his site:

Statistic                       Alexa   Sitemeter   Quantcast
Daily Pageviews per Visitor     3.27    1.4         4.6
Average Visit Length (mm:ss)    8:32    0:19        -

That’s a deviation similar to the one I get with my website, even though Alexa should be more accurate for sites with more visitors. And as far as I can tell from his website code, he isn’t using Alexa Certified Site Metrics. If he were using that, it would mean Alexa was tracking his actual visitor statistics (it uses a script on the website to measure traffic).
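To put a number on the deviation in the page views per visitor, the figures from the table can be compared as relative differences. This is a minimal sketch; treating Quantcast as the reference value is an assumption made here for illustration, since it is the directly measured figure I trust most, and the function name is made up for this example.

```python
def relative_difference(estimate, reference):
    """Signed difference of an estimate from a reference, as a fraction."""
    return (estimate - reference) / reference


# Daily pageviews per visitor, taken from the table above.
pageviews_per_visitor = {"Alexa": 3.27, "Sitemeter": 1.4, "Quantcast": 4.6}

# Assumption: Quantcast is used as the reference measurement.
reference = pageviews_per_visitor["Quantcast"]

for source, value in pageviews_per_visitor.items():
    diff = relative_difference(value, reference)
    print(f"{source}: {diff:+.0%}")
```

With Quantcast as the reference, Alexa comes out roughly 29% lower and Sitemeter roughly 70% lower.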

But he does have a WUWT toolbar listed in the sidebar of his website, which is a custom Alexa Toolbar. If enough of his visitors have that toolbar installed it has the potential to introduce a bias into the Alexa statistics.

Alexa has a tendency to both overestimate and underestimate traffic to websites. It’s just too unreliable for any meaningful comparison between sites.

All this is why I said what I did, and this is how Watts responded when he noticed the tweet:

[Image: screenshot of Watts’ response on Twitter]

For one, his response is an ad hominem attack: he went after me personally instead of after the point I made. As I’ve said, I do my utmost to be civil, so I do not appreciate it when people do this, especially when Watts says he wants to be treated civilly.

It also misrepresents what my point was. I was talking about the Alexa data as that is what isn’t reliable. I have no reason to believe that the direct measurements from TIME, WordPress or Quantcast aren’t accurate (I’m running a test with Quantcast).

I’m also not “110% anti WUWT.” I sometimes agree with him on certain points and I’ve defended him in the past. One example of this was when Greg Laden published an article that in my opinion was unfair to Watts. Also Laden wasn’t helping with how he was responding to critics. The one interaction I had with Laden about this on Twitter led to him blocking me.

Because of that my response to Watts was a bit snarky with me stating:

Nice comeback, a personal attack… I work in IT, that’s why I said what I did about Alexa.

Even after this hint that I might have the relevant expertise he just dismissed me and ended up blocking me. I have no idea why this was such a sensitive topic to him that he didn’t even consider that I might have a valid point.
