Comment spam and feed format

May 14, 2007

I guess it’s a sign of something (such as having just been indexed by Google) when your blog first gets comment spam. Anyway, I really don’t want to insist that commenters create accounts here, so after several annoying days of repeatedly deleting spam comments, I installed the Drupal Spam module, and it has caught every spam comment since.

Reporting on the blogosphere

May 10, 2007

I noticed what I think is a new phenomenon earlier this week, while reading my daily paper. This is an extract from an Independent story about the abduction of a toddler from a holiday resort:

In the UK, the distraught parents were criticised in internet chat rooms for allowing their children to be out of their sight. [snip]

Some bloggers taking part in discussions threads on the internet since the news broke have claimed that as well-paid professionals the couple should have known better than to leave the children unsupervised. [snip]

Of course I’ve seen stories about blogging and internet use in newspapers before, but this is the first time that I’ve noticed a mainstream news article reporting on what internet users were saying about a mainstream news story.

Big XSLT applications just got easier to manage

May 10, 2007

I used to know how to arrange my XSLT modules. Each module had to be self-contained, with any common code imported into every module that used it. The reason? When your XSLT stylesheets are validated continuously as you edit them, a module that can’t stand alone produces all sorts of spurious errors. For example, if module A defines a variable and includes module B, which uses that variable, then the application as a whole works fine, but when you’re editing module B you get errors because the variable isn’t defined in that module.

That rationale just got blown out of the water.

Levenshtein distance on the diagonal

May 6, 2007

The big problem with the previous Levenshtein distance implementation is that it recurses a number of times roughly equal to the product of the lengths of the two strings you’re comparing. If your XSLT processor doesn’t recognise the function as tail recursive, you can’t compare two strings longer than about 20 characters (20 × 20 = 400 recursions).
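To see why the recursion depth grows with the product of the lengths, here is a rough Python sketch (not the original XSLT) of a recursive implementation that fills the dynamic programming table one cell per call, row by row. The function name and the depth counter are hypothetical, added only for illustration; like a processor without tail-call optimisation, Python keeps every one of these calls on the stack.

```python
def levenshtein_cell_recursive(s: str, t: str) -> tuple[int, int]:
    """Return (distance, depth of the deepest call).

    Hypothetical sketch: one recursive call per table cell, row by row,
    so the call depth grows with len(s) * len(t).
    """
    m, n = len(s), len(t)

    def fill(i, j, prev, cur, depth):
        # prev: completed row i-1; cur: row i filled up to column j-1
        if i > m:
            return prev[n], depth           # prev is now the last row
        if j > n:
            # Row i is complete: it becomes the previous row for row i+1.
            return fill(i + 1, 1, cur, [i + 1], depth + 1)
        cost = 0 if s[i - 1] == t[j - 1] else 1
        cur.append(min(prev[j] + 1,          # deletion
                       cur[j - 1] + 1,       # insertion
                       prev[j - 1] + cost))  # substitution or match
        return fill(i, j + 1, prev, cur, depth + 1)

    # Row 0 is the distance from "" to each prefix of t.
    return fill(1, 1, list(range(n + 1)), [1], 1)
```

For two 20-character strings the final call is 421 levels deep (one call per cell plus one per row). With CPython’s default recursion limit of 1,000, two 40-character strings (about 1,640 nested calls) would already raise a RecursionError.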

The problem is that the standard dynamic programming Levenshtein distance algorithm is written for procedural programming languages in which you can do useful things like updating variables. XSLT ain’t like that, so we need an alternative algorithm.
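One way to avoid recursion proportional to the product of the lengths, and my guess at what working “on the diagonal” could mean here, is to compute the table one anti-diagonal at a time: every cell (i, j) with i + j = k depends only on cells on diagonals k - 1 and k - 2. Each diagonal can be built in one go without updating any variables, so the recursion needs only one call per diagonal, about m + n deep. This Python sketch is an assumption, not necessarily the algorithm the post goes on to describe, and its names are hypothetical.

```python
def levenshtein_diagonal(s: str, t: str) -> tuple[int, int]:
    """Return (distance, recursion depth), computing one anti-diagonal per call.

    Hypothetical sketch: cells (i, j) with i + j == k form diagonal k, and
    each depends only on diagonals k - 1 and k - 2, so nothing is updated
    in place and the recursion is len(s) + len(t) + 1 calls deep.
    """
    m, n = len(s), len(t)

    def cell(i, j, d1, d2):
        # d1 and d2 map row index i to values on diagonals k-1 and k-2.
        if i == 0:
            return j                        # j insertions
        if j == 0:
            return i                        # i deletions
        cost = 0 if s[i - 1] == t[j - 1] else 1
        return min(d1[i - 1] + 1,           # from (i-1, j): deletion
                   d1[i] + 1,               # from (i, j-1): insertion
                   d2[i - 1] + cost)        # from (i-1, j-1): substitution or match

    def walk(k, d1, d2, depth):
        # Build the whole of diagonal k from the two previous diagonals.
        cur = {i: cell(i, k - i, d1, d2)
               for i in range(max(0, k - n), min(m, k) + 1)}
        if k == m + n:
            return cur[m], depth            # the only cell left is (m, n)
        return walk(k + 1, cur, d1, depth + 1)

    return walk(0, {}, {}, 1)
```

For two 20-character strings this recurses 41 deep rather than over 400, because the work per call grows (a whole diagonal) while the number of nested calls shrinks.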

Levenshtein distance in XSLT 2.0

May 3, 2007

[UPDATE: Added a link to the full stylesheet, and edited the code so it doesn’t overlap the right-hand column.]

Levenshtein distance is a measure of how many edits it takes to get from one string to another. In the basic algorithm, each addition, deletion and substitution counts as a single edit. So, for example, the distance between "XSLT 1.0" and "XSLT 2.0" is 1: the only difference is the substitution of 2 for 1, whereas the distance between "XSLT" and "XQuery" is 5: three substitutions and two additions.
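The definition translates almost directly into a recursive function. This Python sketch (hypothetical name, not from the original post) follows the three kinds of edit literally. It gives the right answers, but it recomputes the same sub-problems over and over, so its running time grows exponentially with the lengths of the strings.

```python
def levenshtein_naive(a: str, b: str) -> int:
    """Edit distance straight from the definition (exponential time)."""
    if not a:
        return len(b)                           # add every remaining character of b
    if not b:
        return len(a)                           # delete every remaining character of a
    return min(
        levenshtein_naive(a[:-1], b) + 1,       # delete the last character of a
        levenshtein_naive(a, b[:-1]) + 1,       # add the last character of b
        levenshtein_naive(a[:-1], b[:-1])       # substitute, free if the characters match
        + (0 if a[-1] == b[-1] else 1),
    )
```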

One of the interesting features of Levenshtein distance is that there’s a fairly straightforward dynamic programming algorithm for calculating it. I thought it would be interesting to see what an XSLT 2.0 implementation might look like.
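For comparison, here is a sketch of the standard bottom-up dynamic programming approach (usually called the Wagner–Fischer algorithm), written in Python rather than XSLT. It fills an (m + 1) × (n + 1) table in which cell [i][j] holds the distance between the first i characters of one string and the first j characters of the other; the function names are illustrative assumptions.

```python
def levenshtein_table(a: str, b: str) -> list[list[int]]:
    """Fill the full dynamic programming table for a and b."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i                 # delete all i characters of a
    for j in range(n + 1):
        table[0][j] = j                 # add all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,         # deletion
                              table[i][j - 1] + 1,         # addition
                              table[i - 1][j - 1] + cost)  # substitution or match
    return table


def levenshtein_dp(a: str, b: str) -> int:
    """The distance is the bottom-right cell of the table."""
    return levenshtein_table(a, b)[-1][-1]
```

Note that the loops update table cells in place, which is exactly the kind of thing a procedural language allows and XSLT doesn’t, so an XSLT 2.0 version has to be structured differently.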