Don’t take Google’s word as law
Posted: November 28, 2006
I’ve spent two to three years now working on my own ‘search engines’, meaning programming web spiders, writing algorithms to provide semantic matches to keywords, and all the gubbins that comes together to give a fully working engine. Along the way I’ve learnt a lot (often in the shape of a brand new programming language), and I’ve learnt that there are some things that don’t quite add up.
Google in particular is a prime candidate for what I’ll call… proactive marketing. They claim they are able to do certain things with their search engine, all whilst pulling matches out of a database of billions of pages in a second or less for a staggering number of concurrent users. They warn against black hat search engine techniques and claim they’ll catch ya with their clever traps.
Why doesn’t anyone ever seem to sit down and ask the simple question anymore? How the fuck can they do all that? And if they can do all that… why are all the humans still sifting websites?
And you say you’re NOT using PageRank at all anymore and it’s being phased out… right, how long ago did that start exactly? It’s really getting a bit long in the tooth now, surely. Why can’t you just… turn it off?
And keyword densities, obviously you’re tracking those when you cache the page into barrels, right? That’s fair enough. But then run it by me again how you’re able to distinguish between the page’s font weights, colours, titles etc… all whilst parsing HTML of anything up to 100,000 words into a DOM tree? That’s just not efficient, it’s highly impractical and not clever, and I’m having a hard time believing it’s true anymore. Sure, you can guess at simple relationships by matching keyword densities for words that occur a high number of times on a page… but that’s a guess and nothing better.
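To make the question a bit more concrete, here is a minimal sketch of what counting keyword density with some tag-based weighting could look like. It uses Python’s standard `html.parser` to stream through the markup in one pass rather than building a DOM tree. The names `DensityParser`, `keyword_density` and the numbers in `TAG_WEIGHTS` are all made up for illustration; none of this is a claim about how Google or anyone else actually does it, and it says nothing about doing it across billions of pages.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights, invented purely for illustration.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "b": 2.0, "strong": 2.0}


class DensityParser(HTMLParser):
    """Streams through HTML once, counting words without building a DOM tree."""

    def __init__(self):
        super().__init__()
        self.open_tags = []        # stack of currently open tags
        self.counts = Counter()    # raw word counts
        self.weighted = Counter()  # counts scaled by the heaviest enclosing tag

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Ignore text that is not visible page content.
        if "script" in self.open_tags or "style" in self.open_tags:
            return
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in re.findall(r"[a-z0-9']+", data.lower()):
            self.counts[word] += 1
            self.weighted[word] += weight


def keyword_density(html):
    """Return (density, weighted): word share of the page, and weighted counts."""
    parser = DensityParser()
    parser.feed(html)
    parser.close()
    total = sum(parser.counts.values())
    density = {w: c / total for w, c in parser.counts.items()} if total else {}
    return density, dict(parser.weighted)


page = (
    "<html><head><title>Cheap widgets</title></head>"
    "<body><p>Widgets are <b>cheap</b> widgets.</p></body></html>"
)
density, weighted = keyword_density(page)
```

Note what this does not do: font weights and colours set through CSS never show up in the tag stream, so anything beyond plain tags like `<b>` or `<title>` would need a good deal more work than this.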
If you ask me, Google hasn’t changed the way their fundamental search functions, and hasn’t even turned off PageRank, because they haven’t come up with anything better: no method efficient enough to surpass what was created back at Stanford. All the doctorates are being focused on other ways to conquer the internet, and search has taken a backseat until some bright genius comes up with a way to better the existing system. Maybe they’ve already started indexing for it, because if that genius is anywhere, he or she is likely to be at Google already.
Of course all of the above is pure conjecture. I’m not saying it’s true, and half of it probably isn’t; I’m just trying to get people to start asking the questions again. Before Google we’d look stuff up, we’d ask why and how. Now we’re content to let their word become gospel.