Semantic Technology: A New Context

Posted: January 15, 2007 | Author: Phill | Filed under: personal, search, technology | 9 Comments

Semantic search is actually just one facet of what’s possible using semantic technology. There are many more uses and implementations that are generally not discussed and frequently passed over altogether.

This doesn’t mean those uses are any less valid. It’s just that the companies developing these technologies are, for the most part, search engine companies looking to apply them to online applications running over gargantuan databases of millions, if not billions, of websites.

So let’s have a look at some other practical uses of the technology, thinking slightly further outside the box.

404 Error Pages

How irritating is it when you hit a 404 page for an article that would have told you everything you wanted to know about the subject? It’s very annoying, and chances are the article hasn’t been deleted for good but simply moved and not yet re-indexed (if it ever will be). If the article is very old, it can be extremely difficult to find on a website with a mass of content.

By using semantic technology we can do a number of things to aid that lost user. If the URL has been rewritten to include the article title, and the referring page contains good content, we can work out the pages the user most likely should have been directed to. We simply compare the words in the broken link and the content of the referring page against the website’s database or XML site map. This helps to ensure that your users, even when lost, will rarely fail to find what they’re looking for.
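A minimal Python sketch of that comparison, assuming the site map has already been loaded into a dictionary of live URL paths and their titles. The function names, the stopword list and the weighting (words from the broken URL count double, since the link itself is the stronger clue) are hypothetical choices for illustration, and plain word overlap stands in for a fuller semantic comparison.

```python
import re
from urllib.parse import urlparse

# Illustrative stopword list; words this common say little about a page's topic.
STOPWORDS = {"the", "and", "for", "with", "how", "why", "what", "are"}


def tokens(text: str) -> set[str]:
    """Lower-case alphabetic words longer than two letters, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS}


def suggest_pages(missing_url: str, referrer_text: str,
                  site_map: dict[str, str], limit: int = 3) -> list[str]:
    """Rank live pages by word overlap with a broken URL and its referring page.

    site_map maps each live URL path to its title (e.g. read from sitemap.xml).
    """
    # Words from a rewritten URL such as /2006/semantic-search-engines.
    slug_words = tokens(urlparse(missing_url).path)
    referrer_words = tokens(referrer_text)
    scored = []
    for url, title in site_map.items():
        page_words = tokens(title) | tokens(urlparse(url).path)
        # Slug matches count double: the broken link is the stronger clue.
        score = 2 * len(slug_words & page_words) + len(referrer_words & page_words)
        if score > 0:
            scored.append((-score, url))
    scored.sort()  # highest score first, ties broken alphabetically
    return [url for _, url in scored[:limit]]


if __name__ == "__main__":
    site_map = {
        "/2007/01/semantic-technology-a-new-context": "Semantic Technology: A New Context",
        "/2006/12/semantic-search-engines": "Semantic Search Engines Compared",
        "/2006/11/ajax-tips": "AJAX Tips",
    }
    print(suggest_pages("/2006/semantic-search-engines",
                        "A review of semantic search engines", site_map))
    # ['/2006/12/semantic-search-engines', '/2007/01/semantic-technology-a-new-context']
```

On the 404 page itself, the returned paths would simply be rendered as a short "were you looking for..." list.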

Statistical Analysis

Increasingly, content is being tagged, and its structure is improving thanks to the advent of Web 2.0’s social standards and astute webmasters/SEOs. There is, in fact, a veritable goldmine of data available for analysis in your website or blog statistics package. Is it being used, though? Not so much (obligatory Borat quote dealt with).

Think of the data generally gathered by your statistics package:
Referring website pages.
Search engine referrals, with the keywords of the query used.

Both sources are rich material for semantic relationship analysis. The referring links are likely to come from articles or opinion pieces of some type, whilst the search engine referrals include the query the user typed to find your page.
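The two sources need different handling: a referring page has to be fetched and read, whereas a search referral carries the query in its own URL. A minimal Python sketch of pulling the keywords out of a search referral, assuming the referrer URLs come straight from your statistics package. The function name and the engine-to-parameter table are assumptions for illustration (check them against your own logs), not part of any statistics package’s API.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed mapping of search-engine domain labels to the query-string parameter
# that carries the visitor's search terms; verify against your own logs.
QUERY_PARAMS = {"google": "q", "bing": "q", "yahoo": "p"}


def search_terms(referrer: str) -> Optional[list[str]]:
    """Return the lower-cased search keywords from a search-engine referrer,
    or None when the referrer is an ordinary page such as an article."""
    parsed = urlparse(referrer)
    labels = (parsed.hostname or "").split(".")
    for engine, param in QUERY_PARAMS.items():
        if engine in labels:
            values = parse_qs(parsed.query).get(param)
            if values:
                # parse_qs has already decoded "+" and %-escapes into plain text.
                return values[0].lower().split()
    return None


if __name__ == "__main__":
    print(search_terms("http://www.google.com/search?q=Semantic+Web+404&hl=en"))
    # ['semantic', 'web', '404']
    print(search_terms("http://example.org/blog/semantic-post"))
    # None
```

Referrers that return None are the article and opinion-piece links, which would go on to have their page text analysed instead.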

If this data is properly focused, we can show not just where your traffic is arriving from but the context it is arriving from. We can use the referring pages and search engine queries to examine the context of those pages and their keyword densities, and break traffic down into categories and focuses. In the simplest case, we can estimate the proportion of traffic arriving from negative rather than positive responses. Tagging your articles and selecting keywords for SEO can be greatly eased by looking at this data, seeing which areas already perform well and strengthening those. I believe there are many other uses in this area, but as always I want my readers to think a bit for themselves and come up with other possibilities. The point is that knowing the context your traffic puts you in is an invaluable resource.
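To make that concrete, here is a minimal Python sketch of two of those measures: keyword density for a referring page, and a crude positive/negative split across referrers. It assumes the text of each referring page has already been fetched. The function names and the tiny word lists are hypothetical placeholders, and simple word counting stands in for proper semantic analysis.

```python
import re
from collections import Counter

# Illustrative word lists only; a real system would need a far larger lexicon
# (or a trained classifier) tuned to its own subject area.
POSITIVE = {"great", "useful", "clever", "excellent", "helpful", "agree"}
NEGATIVE = {"bad", "useless", "wrong", "broken", "awful", "disagree"}
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "is", "it", "this"}


def words(text: str) -> list[str]:
    """Lower-case alphabetic words with common stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def keyword_density(text: str, top: int = 5) -> list[tuple[str, float]]:
    """Fraction of the (non-stopword) words taken up by each top keyword."""
    ws = words(text)
    if not ws:
        return []
    return [(w, n / len(ws)) for w, n in Counter(ws).most_common(top)]


def tone(text: str) -> str:
    """Classify a referring page by counting positive and negative words."""
    ws = words(text)
    score = sum(w in POSITIVE for w in ws) - sum(w in NEGATIVE for w in ws)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def traffic_breakdown(referring_texts: list[str]) -> Counter:
    """Count referring pages by tone, giving the negative-to-positive split."""
    return Counter(tone(t) for t in referring_texts)


if __name__ == "__main__":
    pages = [
        "A clever and useful take on semantic search",
        "This idea is wrong and the code is broken",
        "Notes on semantic tagging",
    ]
    print(traffic_breakdown(pages))
    print(keyword_density("semantic search and semantic tags", top=1))
    # [('semantic', 0.5)]
```

Weighting each referrer by the number of visits it sent would turn the page counts into a traffic split.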

DySeTagging (Dynamic Semantic Tagging [Dice – Tagging])

Dice Tagging is kind of a joke, as my first commenter on the last article will realize. I’m making up crap acronyms for fun because terms like Web 2.0 tend to make me cringe (yes, I realize I’ve used it). The only recent acronym I actually use is probably AJAX.

Anyway, this Dice stuff is clever. There are reportedly a number of groups working on something similar to what I’m going to describe, including DARPA. The premise is that the web server itself has a semantic module: whenever any web page or document is loaded, the module analyses the context of the page and generates tags describing it, which are then added to the header information.
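A minimal Python sketch of such a module’s core, assuming “header information” means the HTML head and that simple word frequency is an acceptable stand-in for real context analysis (the groups mentioned above would presumably use something far richer). The class and function names are hypothetical; in a real server the function would be called from whatever hook runs as each page is served. The tags could equally be sent in an HTTP response header, which this sketch does not do.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Illustrative stopwords; words of three letters or fewer are dropped anyway.
STOPWORDS = {"this", "that", "with", "from", "have", "their", "about", "which", "your"}


class TextExtractor(HTMLParser):
    """Collects a page's text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def generate_tags(html: str, count: int = 5) -> list[str]:
    """Return the page's most frequent content words as candidate tags."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    found = re.findall(r"[a-z]+", " ".join(extractor.parts).lower())
    counts = Counter(w for w in found if len(w) > 3 and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(count)]


def add_tag_header(html: str, count: int = 5) -> str:
    """Insert the generated tags into <head> as a keywords meta element.

    Pages without a closing </head> tag are returned unchanged.
    """
    tags = generate_tags(html, count)
    meta = '<meta name="keywords" content="{}">'.format(", ".join(tags))
    # Tags are plain a-z words, so no attribute escaping is needed here.
    return html.replace("</head>", meta + "</head>", 1)


if __name__ == "__main__":
    page = ("<html><head><title>Semantic tagging</title></head><body>"
            "<p>Semantic tagging lets a server describe its own pages. "
            "Semantic modules generate tags.</p></body></html>")
    print(add_tag_header(page, count=2))
```

For the sample page, the inserted element is `<meta name="keywords" content="semantic, tagging">`, placed just before `</head>`.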

This takes a lot of load off the poor search engine at the other end and off you at your end, and it lets every site owner be responsible for their own tagging rather than having tags assigned to them by an illiterate engine programmed by a kid on an OLPC.

So... what?

Make your own mind up. As usual, I’m trying vainly to ignite some sparks in other developers and thinkers out there who can take the technology where it needs to go. I wish I had the time to spend on all the projects I think of, but unfortunately I don’t, which is half the reason I have this blog now. A lot of what I write is playing Devil’s Advocate and is meant to produce a reaction! So please give me some 🙂

9 Comments on “Semantic Technology: A New Context”

evolvingtrends says:

January 16, 2007 at 5:12 am

I have a clever new paradigm I’d like to share.

It has nothing to do with spin glasses or any of that silly stuff I’ve been having a kick out of lately… Those are a form of psychotherapy for me, not ideas intended to be realized. But the idea I’d like to tell you about is very realizable and could change the way people work.

It’s a piece of cake to understand and appreciate, a bit hard to develop, but I think you’d be intrigued enough to evaluate it.

Keep this blog going. I’ll be among your readers (it seems fun so far… I like the rhythm in your argument)

Katamarius!

🙂

marc d0t fawzi @ gmail

Phill says:

January 16, 2007 at 7:39 am

sounds like fun

Luke Breuer says:

January 18, 2007 at 3:55 pm

Perhaps we could return to having sites specify their own keywords, but include a human element: reputation. Sites that do keyword stuffing would lose reputation. Combine keywords with reputation and it seems that search engines could improve searches with this data, which would in turn drive webmasters to include the correct keywords in their sites.

The above does require a large investment of time and money: people must establish identities that the search engine(s) respect and some sort of standard must be agreed on with respect to what keywords mean what. I’m not optimistic on anyone doing this, especially Google, as they like to keep search rankings as algorithmic and human-devoid as possible. However, I’m not sure how else we can dramatically increase the quality of search results without a true AI.

Phill says:

January 18, 2007 at 4:03 pm

The time and money are an issue, but that’s mainly because there are too many websites… Google already has a large team of people who manually check websites and has done for years now, but they’ll never be able to check anywhere near all of them.

True AI, or at least a form of AI that can understand a website, looks set to be a driving force behind search over the next year and beyond. Whoever cracks it first is going to be happy.

Gnoletcom says:

April 13, 2007 at 9:15 pm

Please join for discussions on topics about The Virtual Reality.
Technology development of The Virtual Reality and its perspectives.
Gnolet.com

Mark says:

April 18, 2007 at 7:42 am

Thank You

Phill says:

April 18, 2007 at 8:04 am

no problem 🙂

Alex says:

April 25, 2007 at 3:24 pm

Thank You

Sanjay says:

May 19, 2007 at 6:32 pm

tx
