As usual I’ve been spending a horrendously long time without writing anything on my blog – and for that I apologise. However, I have spent some of my time writing an SEO (Search Engine Optimisation) handbook, covering the importance of next generation techniques and practices.
I’m sure there are those of you who are all too familiar with the increasingly backwards approaches used by a few ‘special’ SEO agents and individuals out there, and perhaps for you this will merely reinforce what you already knew to be true. For those of you who don’t know what I’m talking about – then please read the book and have a good laugh at yourself for being such a silly.
You can order print copies of the book – just not yet… more details on that coming soon! I’ll be publishing online chapter by chapter (honestly, I have finished writing it, but as an SEO, if I didn’t serialise it then it would look bad).
Enjoy the read and let me know what you think – and if the first edition is terrible, at least any copy you order is bound to be valuable in 200 years!
First off, I’d like to introduce myself. I’m a Search Engineer, a developer and programmer. I’ve worked with clients throughout the advertising industry at many different companies. My specialty is developing software that works with the search engines of companies like Google, Yahoo and MSN, attempting to influence the rankings of my clients’ websites as well as report on those ranking changes. I’ve never been to a lecture on computer science or read a book on development methodology, and yet I’m in demand. My skills lie in understanding the technology of a search engine and how to capitalise on its ranking algorithms, web crawlers and content filters, and it’s the ideas I generate in this area which have kept me in gainful employment.
SEO (Search Engine Optimisation) used to be a fairly simple task: you’d make sure every page on your client’s site had Meta tags, a description and content unique to that page. You might then try to analyse the keyword density of your key terms to keep them somewhere between 4 and 7 percent. More often than not, most SEO companies wouldn’t even attempt that.
What most SEO companies would never tell you – and this is the industry’s best-kept secret – is that they’re intrinsically lazy. If you had a good client, with good content and a product of interest, their SERs (Search Engine Rankings) would climb to the top spots entirely naturally; you’d have nothing to do but sit back and reap the benefits of your lack of work.
This is of course a sad state of affairs which no real SEO company would allow, and part of this book will help you to spot the difference between a professional outfit and rank amateurs, and define the widening gap between the two camps.
As the title suggests, I’m writing about the next generation of SEO. It’s becoming more difficult to increase the rankings of a particular website, and it will only get harder to manipulate a website’s ranking without any understanding of how new search engine technology works. Luckily for you, my field is semantics (essentially, how to correlate the relationship between one word and another) and you’re in for a whole chapter on manipulating a semantic index similar to those increasingly used by the major search engine players.
Chapter 1 – The Past
In order to proceed correctly in the future, the most important lesson is for us to understand what happened historically. There’s no shortage of information on the internet and amongst SEOs and webmasters about how Google’s original PageRank system worked. This is in large part thanks to a paper written by Google’s founders, Larry Page and Sergey Brin, whilst they were still studying for their PhDs at Stanford University. Not long after that they received their first investment – from Andy Bechtolsheim, a co-founder of Sun Microsystems – which enabled them to build upon the hardware they had in their university dorm room and create the international phenomenon we know today.
PageRank was essentially a very simple system. It counted each link from one site to another as a vote for the destination site, and by voting for another site the original gave away some of its own PageRank. The relevance side of the engine built on ideas like Salton’s Vector Space Model, a mathematical principle known to most Computer Science graduates today. This simple method of calculating which websites had the most votes, and therefore deserved higher rankings, is key to all search engine algorithms as it’s extremely fast to calculate. The most important factor in any search engine is its speed in returning and ranking results, especially when you’re dealing with an index of billions of pages.
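To make the “voting” idea concrete, here is a minimal, hypothetical sketch of the iterative PageRank calculation – a toy three-page web, nothing like the scale or sophistication of Google’s real implementation:

```python
def pagerank(links, damping=0.85, iterations=30):
    """Iteratively spread each page's rank as 'votes' to the pages it links to.
    `links` maps each page to the list of pages it links out to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base rank; the rest comes from votes.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            # A page splits its vote evenly among everything it links to.
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Page "c" receives votes from both "a" and "b", so it outranks "b".
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

Note how cheap each pass is – a handful of additions per link – which is exactly why this kind of calculation scales to an index of billions of pages.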
See The Anatomy of a Large-Scale Hypertextual Web Search Engine, the paper written by Larry Page and Sergey Brin whilst at Stanford.
If you understand that all calculations undertaken by a search engine must be as fast as possible, it allows you to draw logical conclusions:
· Thinking about a page as a machine would (one which struggles to actually understand rather than just read), rather than as a human would, is key to analysing your website’s content for SEO value.
· Is every single underlined heading, keyword colour, font size, image location, keyword relationship and page title length analysed when a page is crawled? Use some common sense here: it’s highly doubtful that anything too in-depth is going to be indexed when the crawler has another hundred thousand pages to visit and rank as quickly as possible. Of course, as processor speeds and bandwidth increase, more in-depth analysis will become possible in a shorter space of time.
· The search engine needs to maximise two things: the speed of its calculations and the relevancy of its results. Occasionally one is going to suffer at the expense of the other – if you had to choose between indexing a page poorly or not at all, which would you do?
SEOs in the past were able to capitalise on this speed issue by concentrating on areas of a page such as the Meta tags, description and page title. The content itself gradually became more important as time went on, but was still subject to the speed of indexing. SEOs quickly realised that keyword density (how many times a keyword appears on a page out of the total number of words) was a very quick way to determine some kind of relevancy – and that the search engines were using it too.
Once the search engines got wise, they implemented filters that stopped SEOs from flooding a page with keywords. Arguments followed in the SEO community over exactly what the ideal keyword density for a term was, and this usually settled somewhere between 4 and 7 percent.
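The density figure everyone argued about is trivial to compute, which is precisely why both SEOs and the engines leant on it. A quick sketch (the word-splitting regex is my own simplification, not any engine’s actual tokeniser):

```python
import re

def keyword_density(text, keyword):
    """Percentage of words in `text` that match `keyword`, case-insensitively."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)

# 3 occurrences of "cheap" in 12 words – a keyword-stuffed 25%,
# well over the 4–7 percent the community argued for.
page = "cheap flights to spain cheap hotels and cheap car hire in spain"
density = keyword_density(page, "cheap")
```

A filter of the kind described above is then just a threshold check on this number, which is why it was so easy for the engines to deploy.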
Of course, the PageRank model meant that agencies were keen to build as many links to their clients’ websites as possible. To make matters worse, they were after links that already had high PageRank values, to gain the maximum ranking boost as quickly as possible, and this spawned a cottage industry of people generating high-PageRank links purely to sell on. Google, of course, were unhappy about this and their anti-spam team began its work. Blacklisting of websites which ‘farmed links’ became fairly common, and this moved on to other aspects of ‘black hat’ SEO behaviour – where an unfair advantage was being gained by some nefarious companies and individuals.
Most SEO agencies at this stage relied heavily on staff who’d be subjected to some extremely tedious and repetitive labour. Going through page after page of a website and adjusting the number of keywords on a page, slightly changing each page title and Meta tag was a boring job and not well paid.
Directors and CEOs didn’t have a whole stack of problems though: if they kept building up link relationships with ranking websites and making sure their Meta tags were in place, their job was done. Often enough they’d have clients who already had an interesting product which did most of the work itself, spreading links around the internet as people registered their interest.
This natural traffic increase was what Google was looking for as they wanted sites which progressed on their own merits rather than trying to beat the system.
In previous versions (for those of you lucky enough to see the Alpha of the world’s first search engine to run directly from the user’s own desktop), Sunbeam would ask you to input your favourite websites as a starting point for its indexing routines. This was a problem for two reasons:
- Nobody ever wants to enter anything they don’t have to, especially when that information exists somewhere on their machine.
- It limited the ‘profile’ of the user initially available to Sunbeam and how quickly they’d be able to retrieve information actually relevant to them.
It also meant that the semantic engine in the earliest release was not capable of returning accurate matches until it had cranked up and indexed at least a few hundred pages.
I’d been musing over these problems for a while. I wanted an experience where the user could just install the program and let it do its work, without going through any configuration screens which they might not understand or which might put them off the install completely.
The solution, as it turned out, was fairly simple. Using the browsing history of the user, we can track down the URLs that are visited most frequently and most recently without damaging privacy – after all, these are just starting points to build a profile of interests. Data like this is a goldmine for Sunbeam’s advanced statistical algorithms and will enable it to deliver results that mimic the language used in the websites in your browsing history.
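Sunbeam’s actual algorithms are more involved, but the frequency-plus-recency idea can be sketched in a few lines. Everything here – the function name, the half-life decay, the data shape – is my own illustrative assumption, not Sunbeam’s real scoring:

```python
from collections import defaultdict

def seed_urls(history, now, top_n=10, half_life_days=30.0):
    """Score each URL by visit count, discounting older visits,
    and return the top candidates as indexing seed points.
    `history` is a list of (url, visit_time_in_days) pairs."""
    scores = defaultdict(float)
    for url, ts in history:
        age = now - ts
        # Each visit is worth less the older it is: half weight per 30 days.
        scores[url] += 0.5 ** (age / half_life_days)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Two visits (one recent) beat a single recent visit,
# which in turn beats a single month-old visit.
history = [("news.example", 0), ("news.example", 29),
           ("old.example", 0), ("blog.example", 30)]
seeds = seed_urls(history, now=30)
```

Only URLs and timestamps are touched – no page contents, no personal data – which is what lets the profile be built without the configuration screens mentioned above.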
It doesn’t stop there though: also added are routines that scan your Outlook sent messages, tracking the semantics of your own typed words. Again, these are not stored as complete messages anywhere in the system and are not tied to email addresses or even subject lines – privacy here is key. What is most important is that you as a user will never have to go through a slew of irritating questions when you install Sunbeam, inadequately attempting to locate and dissect your interests.
Seeing as I expect privacy to be such an issue here, let’s turn to another reason to use Sunbeam over Google or Yahoo:
- Your searches are your own.
- Your data will never be sent anywhere else (there isn’t the server space for it!).
- If you choose to share your search database with anyone else (as easy as emailing the one file), then that’s completely up to you and not something you have to ‘opt-in’ to.
This software is entirely your own to play with. These are the things I’m really loving about it:
- You can play with the open source search algorithm.
- You can swap, share and amalgamate databases with friends or download one from the web.
- There are no adverts, no pop ups and no interruptions.
- If you don’t remember the exact word you’re looking for, just put in a similar one, or a descriptive phrase.
- If you want to use the same database when you get home, just mail it to yourself.
- If you don’t like the results you’re getting, run a separate database for work and for home to match your corporate and downtime moods.
- If you have to do market research on teenagers, just use the database your nephew compiled.
I have just watched a video of the most exciting user interface I’ve ever seen. It’s not of the forthcoming iPhone, nor is it any kind of Apple product. This is Microsoft Surface, and it promises a revolution in how we interact with our computers and mobile devices. I’m completely blown away not by the technology behind the system, but by how well it’s used to produce a product that could potentially devastate Apple’s market share.
If you wondered why Bill Gates suddenly agreed to do an interview alongside Steve Jobs, then I’m pretty sure this is the reason. It doesn’t matter if he does badly in that discussion, because as soon as Surface was on show Steve Jobs had lost out anyway. Will Jobs have a rebuttal product that we haven’t heard about? I doubt it.
Pricing And Availability
You’ll be able to get Surface from winter 2007 for between $5,000 and $10,000. I know that’s a lot of money right now, but they aim to bring the price down to a consumer level quickly, and this is the first device I’ve seen that really will fit right into your living room instead of just attempting to hide in a corner. Designer coffee tables go for far more, and I know which I’d rather have.
The New Standard In Interaction
For me, as a search and user interface developer, this fits in extremely nicely with my view of tiling search results as images. An application using Windows Live Search in this way – for not just searching but RSS feeds and bookmarks too – would be highly intuitive and allow the user to see what they want straight off the mark.
One of the most ingenious features they’ve integrated right off the mark is the ability to interact with your mobile devices. We all have phones now; they started with IR, then Bluetooth, and now some feature WiFi. How many of you actually use these connection abilities regularly though? I’d guess it’s a low percentage, because in most cases the hardware and software we have to connect with doesn’t make it simple and easy enough to use frequently.
What Surface lets you do is put your mobile phone, PDA or digital camera directly on the table top, and a ring will appear around it to signify the connection. You can then drag media to and from the device with your finger and a bit of wrist movement – it’s so simple it makes me want to cry. I spend a lot of time shouting about the need for simple and intuitive user interfaces, and this is the model we should all start building from.
This is the new standard in user interfaces, keep up.
I just thought I’d drop you a message here on this lovely Monday. First off – hello to all the new readers, many of you from reddit and StumbleUpon, and thanks for reading.
Secondly, I’m working on two new articles at the moment:
- The New Science of Inbound Linking
- Practical Implementations of Semantic Technology
The first looks at new ways to generate links to your content and create interest in the wave of social networking that is Web 2.0 (at least that’s what they’re calling it – buzzword: synergy), and the second talks about ways to use semantic technology on the web other than just creating yet another search engine.
I may not necessarily release them in that order, and they may be released in multiple parts, as I’m trying to get things out in more readable bitesize chunks that won’t rot the very important neural pathways of my users. Apologies to the families of those now looking after vegetative ex-readers.
Finally, I’m working on putting together a video podcast of sorts. It’s mostly going to be me talking arse about tech (which I love and can talk arse about for a very long time), but there may be an occasional nugget in there where I get drunk and talk about technologies I really shouldn’t yet. The flat’s kitted out with an HD camera and mics now, so at least the picture will be clear even when I’m not.
Anyway, the fresh content will start to appear tonight/tomorrow so keep checking back. Have a good week all!