What Is A Search Engine? You Have No Idea Apparently

One of my favorite blogs, which I read just about every day, is ReadWriteWeb, a sterling tech, Web 2.0 and search blog. Not so long ago their regular AltSearchEngines article was turned into a fully fledged blog in its own right, headed by Charles Knight, who probably knows about more search engines than anybody else on the net.

I checked it out this morning and spotted an interesting article:

Today we launch Part I of our 3 Part Series

Part I: What is a Search Engine? by Nitin Karandikar (Mon)

Oh glominy! I thought, glibbily. This is right up my street so I settled in for a powerful, thought provoking read.

Alas, the writer was a complete nitwit and I felt compelled to post this raging comment:

You’re completely wrong, I don’t know why on earth you’d try to reclassify what a search engine is when we’ve known what search engines are for a long time.

A search engine is simply “an information retrieval system designed to help find information stored on a computer system” (Wikipedia).

1. It enhances findability of relevant web content for the user

It doesn’t need to have anything to do with the web. Findability is not a word, even in italics.

2. It searches the entire web or a large subset thereof
(this excludes publisher search engines that search only a single site or group of sites)

No search engine searches the entire web. Don’t listen to the Google PR machine so much, and again, it doesn’t need to touch the web to be a search engine. Plus you’re on AltSearchEngines here… how many verticals do you guys cover?

3. Searches are specified using a keyword, phrase or question, or using input parameters, without the need for undue navigation
(I don’t consider pure directories like dmoz to be Search Engines)

So you’re saying you need an input to get an output? That’s genius.

4. It provides search results on demand, not periodically

I don’t even know what the hell you’re trying to say this for. It’s still wrong. Why does it have to do as a person asks it?

5. It provides some kind of unique or special processing of its own: either in the search algorithm, or in UI improvements, or both
(this excludes pure Rollyo or Google Coop-based search engine subsets)

This is far and away the worst thing you’ve written, you’re clearly grasping at straws. That is until you said:

The criteria described above will not remain static; as technology progresses, Search Engines will need to support increasing levels of functionality to be taken seriously.

No, I’m afraid a search engine will always be a search engine. No matter how technology progresses it will still be a search engine.

The article you should have written is, “What search engines should have on my holidays”.

Yakov: A search engine doesn’t need to have its own index of the web or build it. A crawler of some description is responsible for building an index – that can take many forms and is often included in the search engine software itself. If you want examples of search engines without their own index, then take a look at the recent Digg API contest for some examples.

I’m hoping Charles gives you a massive kick up the backside and stops you writing what essentially is a load of bollocks.

Yes, it was a little scathing, but I get extremely irate when I see an article written by someone who is clearly just writing for the sake of saying something. That goes double on a source I have a lot of respect for, because I don’t want to see them letting it through to the front page. That’s their role as editors: to weed out the rubbish and go with the quality content, right?

Sunbeam Is Your Search Engine

In previous versions (for those of you lucky enough to see the Alpha of the world’s first search engine to run directly from the user’s own desktop), Sunbeam would ask you to input your favorite websites as a starting point for its indexing routines. This was a problem for two reasons:

  1. Nobody ever wants to enter anything they don’t have to, especially when that information exists somewhere on their machine.
  2. It limited the ‘profile’ of the user initially available to Sunbeam and how quickly they’d be able to retrieve information actually relevant to them.

It also meant that the semantic engine that appeared in the earliest release was not capable of returning accurate matches until the engine had cranked up and indexed at least a few hundred pages.

I’d been musing over these problems for a while. I wanted an experience where the user could just install the program and let it do its work without going through any configuration screens, which they might not understand or which might put them off the install completely.

The solution, as it turned out, was fairly simple. Using the user’s browsing history, we can track down the URLs that are visited most frequently and most recently without damaging privacy. After all, these are just starting points for building a profile of interests. Data like this is a goldmine for Sunbeam’s advanced statistical algorithms and will enable it to deliver results that mimic the language used in the websites in your browsing history.
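To make the "most frequently and most recently" idea a little more concrete, here is a minimal sketch in Python. It is not Sunbeam's actual code: the function name `rank_seed_urls`, the `(url, timestamp)` input format and the 30-day half-life are all assumptions made purely for illustration. Each visit scores 1.0 when fresh and decays over time, so a URL ranks highly only if it is visited both often and lately.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rank_seed_urls(visits, now, half_life_days=30.0, top_n=5):
    """Return up to top_n URLs ranked by a recency-weighted visit count.

    visits is an iterable of (url, visited_at) pairs; this input format
    and the half-life default are illustrative assumptions, not
    anything taken from Sunbeam itself.
    """
    scores = defaultdict(float)
    for url, visited_at in visits:
        age_days = (now - visited_at).total_seconds() / 86400.0
        # A visit counts for 1.0 when fresh and halves every half_life_days.
        scores[url] += 0.5 ** (age_days / half_life_days)
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:top_n]]


if __name__ == "__main__":
    now = datetime(2007, 5, 1)
    history = [
        ("http://example.com/daily", now - timedelta(days=1)),
        ("http://example.com/daily", now - timedelta(days=2)),
        ("http://example.com/old-favourite", now - timedelta(days=300)),
    ]
    print(rank_seed_urls(history, now))
```

Only the URLs and visit times go in and only a short list of starting points comes out, which fits the point above that these are seeds for a profile rather than a stored record of the history.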

It doesn’t stop there, though. Also added are routines that scan your Outlook sent messages, tracking the semantics of your own typed words. Again, these are not stored as complete messages anywhere in the system and are not tied to email addresses or even subject lines; privacy here is key. What is most important is that you as a user will never have to go through a slew of irritating install-time questions that inadequately attempt to locate and dissect your interests.

Seeing as I expect privacy to be such an issue here, let’s turn to some more reasons to use Sunbeam over Google or Yahoo:

  • Your searches are your own.
  • Your data will never be sent anywhere else (there isn’t the server space for it!).
  • If you choose to share your search database with anyone else (as easy as emailing the one file), then that’s completely up to you and not something you have to ‘opt-in’ to.

This software is entirely your own to play with. These are the things I’m really loving about it:

  • You can play with the open source search algorithm.
  • You can swap, share and amalgamate databases with friends or download one from the web.
  • There are no adverts, no pop ups and no interruptions.
  • If you don’t remember the exact word you’re looking for, just put in a similar one, or a descriptive phrase.
  • If you want to use the same database when you get home, just mail it to yourself.
  • If you don’t like the results you’re getting, run a separate database for work and for home to match your corporate and downtime moods.
  • If you have to do market research on teenagers, just use the database your nephew compiled.


I Can Has Likkle Written Contentz?

Hi Readers!

The internet is an odd place. As I look at wordpress.com right now, I see the top few blogs are I CAN HAS CHEEZBURGER?, passive-aggressive notes from roommates, neighbors, coworkers and strangers and, of course, Scobleizer.

For those of you yet to witness the phenomenon of icanhascheezburger, let me summarise by saying it’s a blog filled with cute/demonic pictures of animals, mostly feline in nature, with captions underneath. The passive-aggressive notes blog is exactly as it says in the title: pictures of amusing passive-aggressive notes.

As a further exercise in demonstrating to you the power of this medium let me give you an example of an icanhascheezburger image (taken of my girlfriend’s cat, yesterday):

f**k bucket, has pub

If you haven’t been to the site, you most likely won’t understand. The ‘bucket’ is an in-joke of the kind these websites often produce. Why exactly, though, is it so popular over the thousands of blogs that produce well-written, quality content?

It’s fast

There are many facets to the speed here. Firstly, it’s very quick for the authors to add a new post: all they need to do is get an image, put it in the WordPress editor, add a couple of lines about the submitter (and possibly the humorous content, if they can be bothered) and they’re done. This means they can generate hundreds of posts in the time it takes the rest of us to put out one or two (sorry, wasn’t talking about you Scoble, or you Winer). The other quick thing they can do when they add a WordPress post is select categories. This is essentially a very fast way of tagging, and it means that as well as quickly refreshed content they also have targeted keywords. Hello good SEO.

It’s also fast for users; if you don’t get the joke in the first pic you see, it’s a 1 click scroll to the next one. You laugh, it’s funny, you whack the link on an email and send it round the office. They even have a lolcats generator that lets you put a caption on your picture of a cat in about 20 seconds AND automatically submit it to the site. Auto generated content essentially, which is just gold.

If the site updates less often the search engines aren’t the only things that return less frequently. The same applies to all your human users as well. They’re far more likely to refresh if they think the content updates often, and even more if they think their cat might appear on the next post.

What next?

I think very soon, you’ll see an abundance of these kinds of websites arriving if people are smart (often they’re not).

All kinds of non-text media will benefit from this treatment, and a social voting style system for it will allow a much faster turnaround on content. You’ve seen it with Digg, and this is one of the reasons they really should add an images section; they’re losing out hugely there.

Other websites have also shown the advantage of fast content generation from any source: Twitter allows updates by mobile phone, for example. I can upload pictures to blogspot directly from my k800i; it’s a shame I don’t like the blogging software.

Urrr.

I completely lost my train of thought. I went and read some C# documentation and then all my post ideas ran away. I may finish this later when I regain my mind.


Fear Of Google: As Seen On Google Timeline!

UPDATE:

It would appear Valleywag’s Nick Denton is lacking a sense of irony, and unfortunately I seem to have had my commenting privileges revoked there now. Shame. He’s thoughtfully left this little nugget, seemingly ending the argument with a resounding slap to my pride:

“Hey, Phil, I don’t mind being slagged off. Comes with the job. But you didn’t do it very effectively. One could make the point that mentions of Google itself have become more frequent. But sensationalism? I don’t think you proved your point”

What’s that, Nick? You can’t hear my answer from all the way over there because you blocked my account? Never mind. Sensationalist articles, Nick, seeing as you are unaware, are those that are published without any proof behind them. So I put together my own sensationalist article on your sensationalist article, and it appears you lack a sense of humour. Fortunately you’re unable to prove to me you have one, because that’d mean you wrote something of substance. Unlike you, Nick, I won’t delete or remove negative comments, even though I rate my blog above a tabloid, so feel free to hurl insults from below if you wish.

THE ORIGINAL ARTICLE:

I saw over on Valleywag they’ve written yet another hack piece on the so-called Fear Of Google, with the standard sensationalism and lack of humour. They’ve even drawn a pretty graph from data they collated from the Nexis newspaper database, showing their spectacular lack of knowledge of current Google events.

Being a bit of a dry and sarcastic git I present to you Fear Of Google: As Seen On Google Timeline! which is a representation of how Google itself sees the phenomenon.

Don’t bother reading Valleywag’s article; if I were you, I’d go and read what Scoble says instead.

Personally I have no fear of Google (though I am typing this in the stationery cupboard, but that’s because of my love of pens) and instead feel an increasing need to criticise them rather than run in fear. Then again, people react in the same way to governments, and it’s surprising that a company can approach that level.


Search From Jason Calacanis? Valleywag Will Post Anything

Over at Valleywag they’ve ushered out a new speculative article on the next move of Jason Calacanis, the Silicon Valley entrepreneur who sold his blog network to AOL Time Warner for a multi-million-dollar sum:

“several people, in a position to know of his plans, say these schemes are at most hobbies, or pure disinformation; the next venture is a search engine.”

Valleywag, ever the innovators, pin this search engine venture down as a cross between Wikipedia and Google. If I recall, that would be Wikia Search, right?

Jason himself has been quick to respond in the comments:

“this is almost as good as the don imus stuff… i love you guys–you’ll print anything. :-)”

He certainly seems to have Valleywag down, to me. So-called insider tech sites these days seem to think they can invent whatever they like. I don’t know where the buck is going to stop, but people are going to have to start properly researching their stories if they want to keep their readerbase, just like any other division of serious journalism. There are those of you, I’m sure, who’d argue that a blog is just like a tabloid newspaper and that those rules don’t apply.

Go and read a tabloid then.

Jason also briefly discusses why Wikipedia would benefit from carrying advertising. He’s not wrong: just a single well-placed ad on each page would make all the difference to their mounting costs and make sure that this fantastic resource would be around for a long time to come.


Activists or Mob? The First Digg Riots

This morning I came into work, ran some SEO checks on a few sites and started up Firefox. As usual I browsed speculatively towards Digg for my hit of the overnight news that just wouldn’t be covered in the Metro newspaper I’d browsed through on the bus.

At first I couldn’t work out exactly what had happened. The following code was written everywhere:

09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-c0

Through my sleepy haze I realised it was the code for unlocking HD-DVD protection that I’d seen a couple of times on stories the previous day. It transpires that Digg were actively deleting the stories featuring this seemingly unthreatening code in response to a cease and desist letter.

Jay Adelson (Digg’s CEO) wrote on his blog at 1pm May 1st:

“This has all come up in the past 24 hours, mostly connected to the HD-DVD hack that has been circulating online, having been posted to Digg as well as numerous other popular news and information websites. We’ve been notified by the owners of this intellectual property that they believe the posting of the encryption key infringes their intellectual property rights. In order to respect these rights and to comply with the law, we have removed postings of the key that have been brought to our attention.”

Normally this would have been the end of the matter, but this is Digg after all which, “is all about user powered content. Everything is submitted and voted on by the Digg community.”

Digg users went on to stage nothing less than a full-out cyber riot.

The community flooded Digg with stories containing the code, making it virtually impossible for the moderating staff to keep up with deleting them all. Or at least that’s how it would appear to the mass of the Digg userbase. As a search engineer, though, I know how simple it would have been to remove any story containing the code, variations of the code, links to pages carrying the code and so on. Very few, if any, would have got through if Digg had really been intent on making sure it wouldn’t have a legal battle to fight.
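For the curious, here is a rough sketch in Python of the sort of filter I mean. It is purely illustrative and is not how Digg's moderation tools work, which I have no inside knowledge of. The names `build_code_pattern` and `filter_stories` are hypothetical, and the blocked code in the usage example is a made-up stand-in. The idea is to match the hex digits in order while tolerating any mix of spaces, dashes, colons and similar separators between them, in upper or lower case. Catching links to pages that carry the code would need a crawl of the linked page, which this sketch does not attempt.

```python
import re


def build_code_pattern(code):
    """Compile a regex matching the hex digits of code with optional separators.

    The separator set below is an assumption about the likely 'variations';
    a real filter would need tuning against what users actually post.
    """
    digits = re.sub(r"[^0-9a-fA-F]", "", code)
    separators = r"[\s\-:._,]*"
    # Allow zero or more separator characters between consecutive digits.
    return re.compile(separators.join(digits), re.IGNORECASE)


def filter_stories(stories, code):
    """Return only the stories that do not contain the blocked code."""
    pattern = build_code_pattern(code)
    return [story for story in stories if not pattern.search(story)]


if __name__ == "__main__":
    blocked = "de-ad-be-ef-12-34"  # made-up stand-in for a blocked code
    submissions = [
        "Spread this: DE AD BE EF 12 34",
        "deadbeef1234 is my new favourite number",
        "A story about cats",
    ]
    print(filter_stories(submissions, blocked))
```

Determined users could still dodge a filter like this by spelling the digits out or hiding them in an image, so "very few" getting through is about as far as the claim goes.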

As I’ve stated, though, Digg is no standard news website, and offending the userbase would be a poor marketing choice because the users are responsible for the revenue, by clicking on and viewing the adverts. With Digg’s users so fiercely protective of this story, giving way would seem like the only choice.

Just eight hours after Jay’s post, the founder of Digg and the main public figure for the company, Kevin Rose, posted this:

“Today was an insane day. And as the founder of Digg, I just wanted to post my thoughts…

In building and shaping the site I’ve always tried to stay as hands on as possible. We’ve always given site moderation (digging/burying) power to the community. Occasionally we step in to remove stories that violate our terms of use (eg. linking to pornography, illegal downloads, racial hate sites, etc.). So today was a difficult day for us. We had to decide whether to remove stories containing a single code based on a cease and desist declaration. We had to make a call, and in our desire to avoid a scenario where Digg would be interrupted or shut down, we decided to comply and remove the stories with the code.

But now, after seeing hundreds of stories and reading thousands of comments, you’ve made it clear. You’d rather see Digg go down fighting than bow down to a bigger company. We hear you, and effective immediately we won’t delete stories or comments containing the code and will deal with whatever the consequences might be.

If we lose, then what the hell, at least we died trying.

Digg on,

Kevin”

This post is on Digg’s blog, complete with a Digg submission linking to it by Kevin himself. Many have taken this as Kevin and Digg coming around to their readers’ point of view and allowing them, with good grace, to post the code as they see fit. I don’t agree. To me this post is a last-ditch attempt to reason with the Digg users and say: you lose the whole site, or you get to post your dumb code. It’s also the quickest way out for Digg: by allowing the posts they take away the reason for the protest, and they can remove them in a day or two when everyone has clean forgotten.

As always in these situations, I highly doubt that more than 5 or 10% of the Digg readership were actually involved in this. But if they shout and scream loud enough, that’s all that’s needed for them to get what they want over the reasoned arguments of everybody else. By allowing this, Digg has opened itself up to a hundred other groups who will want their own way on the most popular social news site on the internet in the future.

If you’re a Digg user or have any thoughts on this new mob dynamic, I’d like to hear from you in the comments. I’ll keep this story up to date as any more comes in.

Update at 12:35, 02 May 2007

I read this blog piece on Digg’s troubles. It’s an interesting bit of opinion from a woman, and women are in small supply on the popular social news website; I’d have to agree with a large portion of it.