Googlebot UserAgent Good For Something
One of my pet annoyances when browsing around on the net is sites where you have to register for no good reason. I have enough useless accounts as it is. What's even more annoying is when they return a different result to the search engine bots so that more than just the registration page is indexed.
A prime example of this is Unison.ie. When searching for current Irish news it usually ranks fairly high on Google, however all the pages require you to register before you can view them. The registration gives no advantage to people like me who just want to take a quick look at the latest news. I suspect that I'm not alone and that lots of people will just go back and look for another site.
Unison's simple user agent checking makes it very easy to get in unmolested though. The User Agent Switcher Plugin for Firefox allows you to easily set exactly what user agent you want your browser to appear as. Googlebot isn't in the list of user agents available, but it is easily added. Switch to Googlebot as your user agent, and magically you will have full access to the Unison site.
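To show why switching the user agent is enough, here is a minimal sketch of the kind of naive check described above. It is not Unison's actual code; the function names `is_crawler` and `choose_page` and the page labels are made up for illustration. The point is only that the server trusts whatever the client claims to be.

```python
# Hypothetical sketch: NOT Unison's real code, just the kind of naive
# user agent check the post describes. All names here are invented.

def is_crawler(user_agent: str) -> bool:
    """Trust the User-Agent header as sent by the client (easily spoofed)."""
    return "googlebot" in user_agent.lower()


def choose_page(user_agent: str, logged_in: bool) -> str:
    """Return which page a site using this naive check would serve."""
    if logged_in or is_crawler(user_agent):
        return "article"
    return "register"
```

Anyone who sends a user agent string containing "Googlebot" gets the article, registered or not, because nothing checks where the request actually came from.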
I know that Unison will probably close this hole within a few days now, but it's nice to be able to make a point. According to Google's Webmaster Help Center, "crawler only" pages are a thing to avoid. I would class pages that react differently to Googlebot as "crawler only" pages.
If Unison want to require people to register in order to get nice features such as customization, then grand, I have no problem with that. However, how much traffic are they missing out on by having the register page for everyone? And how many advertising impressions are they missing out on? I know that if I go to the BBC News site I will usually end up going to other stories which interest me, which means more page impressions on the BBC site. More impressions, more chance of clicking on ads, more money!
In this day and age it is senseless to have such stupid restrictions on a site like Unison that has enough content to be a massive earner on advertisements alone.
Update: I somehow managed to forget to include the user agent I'm using. It is:
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
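If you would rather do this from a script than from a browser plugin, the same idea can be sketched with Python's standard library. This is only an illustration: `build_request` and `fetch` are names made up here, and the URL you pass in is up to you.

```python
import urllib.request

# The user agent string from the update above.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> urllib.request.Request:
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a page while presenting the Googlebot user agent."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch()` with a page URL returns the raw bytes the server sends to a client that claims to be Googlebot.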

The BBC don't run ads do they? I get your point though. I never visit that unison site, always thought it was stupid having to register to just read a story.
Anthony,
They don't, however it is a site that I go to and tend to stay at for a while. Unison should have enough content with the Independent and all the local papers to be able to do the same.
I always knew they ran a subscription wall, but I've never seen their results in Google news.
This is cloaking. Plain and simple. It goes against the Google guidelines. In fact it's a ban-worthy offence. Considering all the hassle over WMW and NYT this is quite funny. Deserves some further highlighting I think.
Nice find
Rgds
Richard
Richard,
Feel free to highlight :)
Niall.
I've always used http://www.bugmenot.com/ to get through Unison.ie, but still this is interesting to see what they are doing to get higher rankings.
Tut Tut !
So how do you report offenders ?
Paul,
Lots of sites started getting good at blocking bugmenot logins so I just stopped installing it :(
Not sure where it should be reported, haven't really looked to be honest.
Great find, this will really force them to choose either Google cloaking (and leaving it open to us) or closing the hole. Ultimatum anyone?
They could of course verify by way of reverse DNS but that would be rather costly per page load (never mind atrociously black hat).
Nice trick that with Googlebot. Not thought of that.
I use Opera so I often have to "fool" a website that I'm an IE browser (just 2 keep them happy so they believe they are covering their a** 4 the masses!).
I use BugMeNot.com where I want 2 c a site but not prepared 2 signup. Works great if someone else has been there b4 - u just re-use (good 4 the environment :)
Lal
David,
Reverse DNS is easily changed. Is there any list of common IPs that the Googlebot comes from out there? This could be dangerous though, what happens if Google brings a new DC online with new IP space? All of a sudden some Googlebots won't be seeing the same as others, and then I'm presuming fun will follow :)
Lal,
That's what you get for using Opera ;)
Niall.
That is a great post. Thanks for the tip. :D
@Niall: (A bit late I know)
I would never recommend using Reverse DNS for something like this, total overkill when they should just use robots.txt
However, I have seen lots of examples of people reverse-DNSing Googlebots to check they are from a real Google IP. Of course new IPs get added but as long as people keep up on it they are relatively reliable.
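For anyone curious what that reverse DNS check looks like, here is a rough sketch. It does a reverse lookup on the IP, checks the hostname ends in a Google domain, then does a forward lookup on that hostname and confirms it maps back to the same IP, which is what stops someone who simply fakes their own reverse DNS record. The function names and the suffix list are assumptions for illustration, and the resolvers are passed in as parameters so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, List

# Assumed list of hostname suffixes for real Googlebot hosts; check
# Google's current documentation before relying on it.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """Reverse (PTR) lookup: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward (A) lookup: hostname to list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the IP we started with.
        return ip in forward(host)
    except OSError:
        return False
```

That is two DNS lookups per unknown visitor, so in practice the result would need caching per IP rather than being repeated on every page load.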
David,
Never too late! I wasn't looking at the robots.txt when browsing using the Googlebot user agent ;-) I must have a look at how WordPress blocks it though.
Niall.
What say you now ladies?
Hell of an improvement over the previous incarnation of Unison :)