I am not a number!

Nov 30, 2007

Questions of identity and privacy are rather topical at the moment, especially here in Britain where last week a database dump including the names, addresses and bank details of half the country, along with our children’s names and dates of birth, got “lost in the post”.

So what better time to announce a new online identity metric? My PhD supervisor, Nigel Shadbolt, is the CTO of Garlik, so earlier in the week I got an invite to the launch of QDOS.

Like the online identity calculator that I wrote about before, QDOS gives you a score based on your online presence. However, this score isn’t just based on a Google search. It has four components (which are each represented by a different colour, and are combined to give a very pretty pictorial “fingerprint”; check out Tim Berners-Lee’s QDOS, for example).

Partial implementations #2: XSLT in Google Search Appliance

Nov 23, 2007

A Google Search Appliance (GSA) is a box that you plug into your network which crawls and indexes your data, and serves up the results of searches. Search results come in an XML format, and there’s a built-in XSLT engine, which means you can convert that XML into as many different views as you like. So you can have HTML-based search results, summaries, feeds, and so on.

My task recently was to debug some XSLT that transformed the GSA XML into an Atom feed. Easy enough, right? The GSA XML format is pretty hideous – most of the elements max out at three capital letters in length (whatever happened to human-readability?) – but logical enough, and the mapping is hardly complex.

But all was not as it seemed. The GSA’s XSLT implementation is… how can I put this politely?… “non-standard”. This post describes some of the problems and workarounds.

Incomplete implementations #1: Atom in IE7

Nov 23, 2007

IE7 gives you a really quite nice view of an Atom feed. Take a look at the one for this blog, for example. You can filter by category, sort by date or title or author, and search for particular words or phrases. Pretty neat.

But it’s only a partial implementation. I’ve been having to create some Atom feeds recently, and getting them to display nicely in IE7 has proven a bit tricky. I couldn’t find any documentation about this with a quick google, so thought I’d blog it for future reference.

Converting (people) to RDF

Nov 15, 2007

As I’ve mentioned before, I’ve been an RDF sceptic for a long time. Perhaps it’s precisely because of my knowledge engineering background: in my experience, the field is about equal parts academic optimism, sales-related exaggeration and plain old information management. In other (un-minced) words, unrealistic aims with unproven technologies that are sold as being much cleverer (and more innovative) than they are. It’s not just RDF, I should say, but the whole Semantic Web pitch (typified for me by the idea of halting global terrorism using the power of Topic Maps) that seemed ludicrous to me.

Time moves on, and I might be changing my mind.

Detecting streamability in XPath expressions and patterns

Nov 6, 2007

The XSL Working Group gave some comments recently on the Last Call Working Draft of XProc. One of the comments was about a bunch of standard steps that we’ve specified which do things you can do in XSLT, such as renaming certain nodes. These steps generally use XPath expressions or XSLT patterns to identify which nodes should be processed.

What bothers the XSL WG is that these steps aren’t guaranteed to be streamable. In a streamable process, an input document can be delivered to the processor as a stream of events (and an output similarly generated as a stream of events) rather than as an in-memory representation. Such processes will start producing results more quickly and require less memory than non-streamable ones. And, because they don’t need as much memory, they are able to work on larger documents.

If the processes we defined in XProc were streamable, they’d have a clear advantage over their XSLT equivalents, and therefore a purpose. However, since they’re not guaranteed streamable, it looks like we’re simply creating yet another transformation language.
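
To make the idea of a streamable step concrete, here is a minimal Python sketch of renaming elements as a stream of events, using the standard library's SAX support. It is only an illustration, not how XProc defines or implements its rename step: the class `RenamingGenerator`, the function `rename_streaming` and the sample element names are all made up for the example. Each event is rewritten and written out as soon as it arrives, so no tree is ever built in memory.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class RenamingGenerator(XMLGenerator):
    """Hypothetical helper: re-serialises SAX events, renaming one element.

    Each start/end event is handled on its own as it arrives; nothing
    about the rest of the document is stored.
    """

    def __init__(self, out, old_name, new_name):
        super().__init__(out, encoding="utf-8")
        self._old_name = old_name
        self._new_name = new_name

    def _map(self, name):
        # A plain name test needs no context, which is what keeps it streamable.
        return self._new_name if name == self._old_name else name

    def startElement(self, name, attrs):
        super().startElement(self._map(name), attrs)

    def endElement(self, name):
        super().endElement(self._map(name))


def rename_streaming(xml_text, old_name, new_name):
    """Rename every <old_name> element to <new_name>, event by event."""
    out = io.StringIO()
    handler = RenamingGenerator(out, old_name, new_name)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()


# Assumed sample input, purely for illustration.
result = rename_streaming("<doc><para>one</para><para>two</para></doc>", "para", "p")
print(result)
```

This works because the test "is this element called `para`?" can be answered from the start-element event alone. A selection such as `para[following-sibling::note]` cannot: the processor would have to look ahead past the element before deciding whether to rename it, so it has to buffer. That is why a step that accepts arbitrary XPath expressions can't promise to stream.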