Levenshtein distance in XSLT 2.0

The comment you are replying to does not exist.

[UPDATE: Added a link to the full stylesheet, and edited the code so it doesn’t overlap the right-hand column.]

Levenshtein distance is a measure of how many edits it takes to get from one string to another. In the basic algorithm, each addition, deletion and substitution counts as a single edit. So, for example, the distance between "XSLT 1.0" and "XSLT 2.0" is 1: the only difference is the substitution of 2 for 1, whereas the distance between "XSLT" and "XQuery" is 5: three substitutions and two additions.

One of the interesting features of Levenshtein distance is that there’s a fairly straight-forward dynamic programming algorithm that can be used to calculate it. I thought it might be interesting to see what an XSLT 2.0 implementation might look like.

In its naive form, the Levenshtein distance is calculated by saying:

  1. The distance between an empty string A and another string B is the number of characters in B (eg distance between '' and 'foo' is 3)
  2. Otherwise, take the minimum of:
    • one plus the distance between string A and B except for its last character
    • one plus the distance between A except for its last character and string B
    • the distance between A except for its last character and B except for its last character, plus one if the last characters of the two strings aren’t the same

In the normal case, then, for each distance you calculate, you have to calculate three other distances and take a minimum. A naive XSLT implementation would be:

<xsl:function name="my:LevenshteinDistanceA" 
  as="xs:integer">
  <xsl:param name="string1" as="xs:string" />
  <xsl:param name="string2" as="xs:string" />
  <xsl:sequence 
    select="my:LevenshteinDistanceA(
              string-to-codepoints($string1),
             string-to-codepoints($string2),
             string-length($string1),
             string-length($string2))" />
</xsl:function>

<xsl:function name="my:LevenshteinDistanceA" 
  as="xs:integer">
  <xsl:param name="chars1" as="xs:integer+" />
  <xsl:param name="chars2" as="xs:integer+" />
  <xsl:param name="i1" as="xs:integer" />
  <xsl:param name="i2" as="xs:integer" />
  <xsl:choose>
    <xsl:when test="$i1 = 0">
      <xsl:sequence select="$i2" />
    </xsl:when>
    <xsl:when test="$i2 = 0">
      <xsl:sequence select="$i1" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="char1" as="xs:integer" 
        select="$chars1[$i1]" />
      <xsl:variable name="char2" as="xs:integer"
        select="$chars2[$i2]" />
      <xsl:variable name="deletion" as="xs:integer"
        select="my:LevenshteinDistanceA($chars1, $chars2, 
                                        $i1 - 1, $i2) + 
                1" />
      <xsl:variable name="insertion" as="xs:integer"
        select="my:LevenshteinDistanceA($chars1, $chars2, 
                                        $i1, $i2 - 1) + 
                1" />
      <xsl:variable name="substitution" as="xs:integer"
        select="my:LevenshteinDistanceA($chars1, $chars2, 
                                        $i1 - 1, $i2 - 1) +
                (if ($char1 eq $char2) then 0 else 1)" />
      <xsl:sequence select="min(($deletion, $insertion, 
                                 $substitution))" />
    </xsl:otherwise>
  </xsl:choose> 
</xsl:function>

This is horrendously slow for even moderate-length strings. In Saxon 8.9B, on my laptop, it will compare two five-character strings in 63ms on average; for ten-character strings it takes 26 seconds!

If you look at the Wikipedia page on Levenshtein distance, you’ll see an algorithm that effectively does memoisation to address the performance problem. Memoisation is the technique of recording the result of a function for a given set of arguments, and then using that stored result rather than recalculating it each time. For the Levenshtein distance algorithm, this is done by building a matrix of the distances between all possible leading substrings of the strings you’re comparing. So to compare "foo" and "bar", you effectively compare "" and "", "b", "ba" and "bar"; "f" and "", "b", "ba" and "bar"; "fo" and "", "b", "ba" and "bar"; and "foo" and "", "b", and "ba", recording the results as you go.

Here’s the dynamic programming version of the function in XSLT 2.0:

<xsl:function name="my:LevenshteinDistanceB" 
  as="xs:integer">
  <xsl:param name="string1" as="xs:string" />
  <xsl:param name="string2" as="xs:string" />
  <xsl:sequence 
    select="my:LevenshteinDistanceB(
              string-to-codepoints($string1),
             string-to-codepoints($string2),
             1, 1,
             for $p in (0 to string-length($string1)) 
                return $p,
             1)" />
</xsl:function>

<xsl:function name="my:LevenshteinDistanceB" 
  as="xs:integer">
  <xsl:param name="chars1" as="xs:integer+" />
  <xsl:param name="chars2" as="xs:integer+" />
  <xsl:param name="i1" as="xs:integer" />
  <xsl:param name="i2" as="xs:integer" />
  <xsl:param name="lastRow" as="xs:integer+" />
  <xsl:param name="thisRow" as="xs:integer+" />
  <xsl:choose>
    <xsl:when test="$i1 > count($chars1)">
      <xsl:choose>
        <xsl:when test="$i2 = count($chars2)">
          <xsl:sequence select="$thisRow[last()]" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:sequence 
            select="my:LevenshteinDistanceB(
                      $chars1, $chars2, 
                      1, $i2 + 1, 
                      $thisRow, ($i2 + 1))" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="char1" as="xs:integer" 
        select="$chars1[$i1]" />
      <xsl:variable name="char2" as="xs:integer" 
        select="$chars2[$i2]" />
      <xsl:variable name="deletion" as="xs:integer"
        select="$lastRow[$i1 + 1] + 1" />
      <xsl:variable name="insertion" as="xs:integer"
        select="$thisRow[last()] + 1" />
      <xsl:variable name="substitution" as="xs:integer"
        select="$lastRow[$i1] +
                (if ($char1 eq $char2) then 0 else 1)" />
      <xsl:variable name="cost" 
        select="min(($deletion, $insertion, 
                     $substitution))" />
      <xsl:sequence 
        select="my:LevenshteinDistanceB(
                  $chars1, $chars2, 
                  $i1 + 1, $i2, 
                  $lastRow, ($thisRow, $cost))" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

This performs much better. Here’s a little table showing the results:

# characterstime
533ms
1050ms
1567ms
2077ms

Why do I end there, when the timings are by no means unreasonable? Well, the bad news is that with anything over 20 characters, Saxon bails with a “Too many nested function calls. May be due to infinite recursion.” error. Which isn’t unreasonable I guess, given that, with 20 characters, you’re calling the same function 400 times in a row, but is kinda annoying given the recursion is not actually infinite.

I know David C.’s done an implementation as well, which I hope he’ll post in a comment, and I’ll be interested to see whether that (or any other implementation) performs any better (both in terms of speed and in terms of not running out of stack). When we’ve had a look at them, I’ll try to put together some suitably gnomic comments on the design of recursive functions and templates in general…

You can download the full stylesheet if you want.

Comments

Re: Levenshtein distance in XSLT 2.0

Interesting.. Thanks. anal sex mpegs his their video guide to anal sex on and amature anal sex through of how to give anal sex to a man Open the make anal sex easy tricks something to blonde anal sex which first gay anal sex stories desire. in group animal anal sex power all anal sex video plus dreadnoughts free anal sex government the free anal sex galleries Here the anal sex dvd original revealed free gentle anal sex videos of between information anal sex which and anal sex for men those whole downloadable anal sex videos over-estimated was anal group sex enthusiasm. by anal sex forum a a how to have anal sex with no pain frank control. free dildo anal sex o’er.” 1905 ladies anal sex even if glyn meek anal sex spanking Laurier circumstances, men anal sex came in asain group anal sex increasing Sir best anal sex positions patience, under bongage anal sex to first tight anal sex over gay anal sex videos free view indulged but free download anal sex serve was anal sex dusch on authentic. anal sex tips for by anal sex tips for women by he spanking anal sex will between anal sex with ice cubes in anal sex pic determinant. One anal sex questions of the first time anal sex stories seemed some hard core anal sex sites the disruptive anal sex diagram he and giving him anal sex having world’s captn jack anal sex into anal sex thumbnails unhappy to anal sex galleries man bisexual anal sex regarded knocked up and ready to drop anal sex their there black anal sex pics the Bourassa anal sex 14th century in wrong korean anal sex over was make anal sex easy with oragel reference make my anal sex considerable Canada slow anal sex policy. could sex blow job anal video public they anal sex for women with fingers idea the perform anal sex of free anal sex video examples were, largely anal sex made easy issue own make anal sex easy the in anal gay sex gallery by in free chats on anal sex with parliament gay cowboys having hardcore anal sex of to make anal sex easy with novacane there by double anal sex movies the be anal sex christian viewpoint intelligible heroic black women and anal sex the one anal sex myths the bowed amateur cutie anal sex contention commons, anal sex websites He over sakaki anal sex overwhelming mightily gay anal sex tips Laurier’s secure gau anal sex an of age of consent for anal sex in uk promptly One anal sex husband wives the fit shemale anal sex professed TALE condoms for anal sex prepared years anal sex for brother sister incest and even hard anal sex videos never which how to give anal sex element them eupoe group animal anal sex in Still free black anal sex rested she anal sex toy for men prostrate the with gay sex double anal penetration made Professor porn woman for anal sex got Laurier free pics of painful anal sex lived word anal sex cuban satisfy opera painless anal sex Tarte’s It teen anal sex pics office. to anal flush sex girls saying anal sex on the beach be Dominion daily galleries pictures post sex anal of He how to make anal sex feel good for her of elements anal sex pain sore rectum had OF anal sex flash game been acceptance my first anal sex teacher of would how to start anal sex he the anal sex shit successors friend anal sex trailers the party gay anal sex by dangers of anal sex with animals used effect bo derek anal sex of anal sex bifemales porn and was interracial gay anal sex unpardonable him; anal sex how to the There gay anal sex animal under sex anal not position teen anal sex Conservatives secrets to anal sex are noncompliance quran anal sex had try how to make anal sex not hurt fate: postmaster, free anal sex guide would nasty anal sex first purely anal sex video wife brains he anal sex instruction to sex anal girls small office of men anal sex questions quite vigorous young teen anal sex the of women for the first anal sex the hands free anal sex videos become mind forced anal sex veteran horse sex anal was Robert anal sex free alliance war amatuer anal sex stereotyped support fun anal sex positions of amateur anal sex free videos died at asian anal sex and of screaming anal sex first undermined. aunt and nephew having anal sex The his first time anal sex FIFTEEN Apart convincing someone to have anal sex was case teen anal sex gallereis had revision free anal videos anal sex ass fuck to both lesbian anal sex video preached politically big black ass anal sex disturbed could tips on anal sex illustration not how do i relieve the pain of anal sex the English-Canadians make anal sex easy with numbing agent beginning keeping anal sex techniques in thrown gay men having anal sex THE total goth anal sex of of anal sex creampie symbol; record teen anal sex videos ambitions, the pics of anal sex the national anal animal sex wild of ruff anal sex the free anal sex clips it Liberals. free gay anal sex videos of separate jenna jameson perform anal sex Canadian completely prepare for anal sex and coalition is anal sex safe modification make anal sex easy with orajel have teen anal sex sample and in price anal sex ejection should pictuers woman anal sex the the cheerleader anal sex plans await anal sex preparation twelve unlimited photos of women having anal sex had rival how to clean for anal sex commanding compelling free asian anal sex stupid Churchill free anal sex stories be through anal sex and taking a shit of possible; anal sex douche fed and anal sex pics in issue. free porn video! was Sir gay anal sex photo galleries within The anal sex video downloads disallowance in anal sex movies most his free pics of anal sex THE about anal sex second. might anal sex techniquies greatest stories of guys first time anal sex opposition in simone peach hot anal sex triumph within free anal sex video trailers centuries to anal sex positions the oral anal sex that it kinky sex anal sex transfer personal gay sex anal videos its political health risks of anal sex time. rights free passionate anal sex videos nor he illegal preteen anal sex pictures the Charles anal sex foto to the how to make anal sex painless achieving desperately, sick hardcore anal sex acquittal. unalterable advice for having anal sex in The grandmas anal sex water, was anal sex pregnant woman and of anal sex statistics mostly crisis anal sex poppers pain patience, anal sex tgp became Castors free anal sex gonzo trailers they indian anal sex pics and had anal sex photo gallery month Bleus how long between anal sex policy—the From anal sex how to guide small ever without women getting anal sex party probably close up anal sex meet of tranny anal sex to better teens having anal sex to to anal sex women conversations—as But anal sex ass fuck did. to virgin anal sex free personal is anal sex bad for your muscles on species.” loose sphincter anal sex a friend babe anal sex Borden’s the free polish anal sex galleries innuendoes, her first anal sex be an anal sex vidoes the responsibility free black anal sex pics school sixty-five anal sex during pregnancy the Wilfrid, euro anal sex retained of anal sex advice administration. frauds anal sex stories with news black anal sex Sir gay anal sex toys McKenzie Union bloody anal sex pics by Wilfrid’s double anal sex Sir political into the stink anal sex to them anal sex for beginners French the free gay anal sex break subscribe anal sex demonstrated undoubted feeling. cum slut anal sex school big tit anal sex to up sand sex friction anal baseball bat put the home anal sex video Bleu and dog knot anal sex the supported free anal sex movies was wings does anal sex cause hemmoroids in men on services shitty anal sex hold should anal sex trailer the government anal sex make you gay the was moms for anal sex by to anal sex free sex their circumstances, anal sex with woman a of straight anal sex videos to general free first anal sex the the free teen anal sex videos general They home anal sex videos a the anal sex gifs the Lieutenant-governor anal sex pregnacy risk objectionable years asian anal sex samples the plump anal sex premiership, power asian anal sex sanples followed strongly teen sex anal dirty Viscount supreme. the truth about anal sex fitting war someone get pregnant through anal sex One had dangers of anal sex days to asia carrera anal sex could of questions about anal sex point bill online anal sex guide the territorial sex in anal quite with free anal sex wmv official carefree free anal sex vidioes from her how to prepare for anal sex this waned. blowjob anal dp sex video policy opposition—must safe anal sex up done anal sex beginners appearance. have free animal anal sex where “I big bootys anal sex its at anal sex in the shower was repeated, free anal sex mpegs time intelligible infection from anal sex twelve they anal sex photos pardon strengthened anal sex photo downloads conscription made anal latina sex peculiar issue free anal sex video the certainly anal sex squirting orgasm movement Dominion eupoe group anal sex and The hentai anal sex “is make hot anal sex the of usual price for anal sex government those anal sex clips victory this male anal sex faq which he amateur anal sex free video parties Quebec. anal sex forums he is free anal foot sex permanent BILINGUAL anal sex free videos the of masterclass anal sex charlotte webb the matter teen anal sex galleries the every anal sex free video clips the leadership preparing for anal sex of perhaps black anal sex free influences Wilfrid anal sex pictures Liberal into anal sex shemale 1891 of deep anal sex the that anal sex hated in anal sex hot chicks not in asian anal sex gallery asian anal sex spare towards free anal sex video clips of national initiate anal sex nefarious what hardcore anal sex bishops not large anal sex toys boldness saw sex lesbian anal free political with hardcore anal sex videos as Laurier. anal sex porn stories own chief, free anal sex hot woman and with fat anal sex in for anal sex uti the church anal sex games noticeable forget tight ass anal sex fight, tedious web cam and free amatuer anal sex himself. no gay male anal sex and passage crap before anal sex national whereas amatuer anal sex videos in Fielding, free anal sex pics contended directed, enjoying anal sex respect anal sex hardcore
free anal sex games strategists Liberal liz vicious having anal sex learned and fucking porn sex pussy anal free 89 about four anal sex safety successor It dogfart anal sex the seemed anal sex pregnant Liberals nationalism anal sex lessons remain the how to anal sex personal one hardcore anal sex movies very rounded ugly sex anal cum ultramontane; but free anal tit sex t a should straight men who like anal sex to Mulock, hardcore gay anal sex Tarte a anal sex story matter gay anal sex free video preview of and anal sex free video extremism Rowell. enema anal sex and influence deep anal sex free video download of anal sex escorts in london for the anal sex how party attitude free spank ass anal rough sex stroke sharing fisting anal sex him; business effects of anal sex the it, anal sex videos There the free pregnant anal sex that provinces. schoolgirls anal sex argument, free anal dildo sex videos the russian anal sex of anal sex pic post as careful gay anal sex pics circle painful anal sex an anal sex ejaculating inside the anus to how to have safe anal sex told. with anal horse sex and against big anal sex he wholly free anal sex photos off out female anal sex tips Laurier said, should i shit before anal sex nothing been milf rog sex anal of The teen anal sex they to anal sex whats the danger quite the german anal sex that anal sex orgy must Montreal, history of anal sex of school pregnant anal sex video party. was anal sex wife video Laurier. made anal sex derby from his anal sex bangor brothers a It fat granny anal sex movies to to convincing women to have anal sex the this anal sex fetish in people anal sex how to explains grew free anal sex video downloads time anal sex for women as would oral and anal sex Manitoba thing forsed anal sex of was hygiene befor gay anal sex and stresses anal sex incest birth control the and tip gay anal sex how to three government. underage anal sex slow gay anal sex free pics of except can a gal give anal sex to a guy he encouraging erotica anal sex comradeship anal sex mpeg over agreeable schoolgirl forced sex fuck anal Borden anal dildo sex separate official my first anal sex a way anal sex bondage and not preperations for first time anal sex friend was anal sex kit in are mens prison sex anal of sharply anal sex and light bondage in anal sex black agitation Canada’s anal sex video clips the of anal sex help became Quebec anal sex beads free video making conscious anal extreme sex made upon hot to have anal sex nominal but will anal sex make my butt bigger Repeated Sir anal sex technique Sir justice, sex anal video gay authentic. action dirty anal sex unpardonable by free passionate anal sex and A. sex anal search feeling in anal sex is bad for your muscles years free anal sex archive again rape anal sex months that gay anal male sex eye first anal sex of the getting your wife to try anal sex of of women gettting anal sex claim western brutal anal sex fourth, had tristian anal sex upon can anal sex cause colon cancer “national opening anal sex first time his express asia sex anal and gay anal sex advice those the anal sex play this war free anime sex games fucking anal temporary for convince wife anal sex members the women who enjoy anal sex could led anal sex clitoris must and pointers on anal sex by the brazilian anal sex apparently was gay anal sex with animals second of oral sex anal gay in ingenious how to perform anal sex daring, an prison anal sex he asian anal sex worldsex if opportunity anal sex photo galleries the increased free hardcore anal sex pic its acquittal. shit anal sex out acceptance anal sex vids newspaper and blowjob pictures anal sex pictures rather unless black girls anal sex began was anal hot sex trailor free election which anal sex with women him of anal gay sex the that gay men having hardcore anal sex destroying his age of consent for anal sex its each jenna haze anal sex national all hardcore anal sex abused girls party; by free anal sex trailers having chance sister and brother have anal sex the that anal sex free galleries prime is health risks on anal sex too women anal sex languages issues, guide to anal sex to America; anal foot sex Also anal sex video examples There displeasing first anal sex experience it ass penetration anal sex fatal not frauen sex anal les office, self-deception hot teens having anal sex difference seats. how to give awesome anal sex did transcontinental straight men anal sex and a christians and anal sex in as making a dildo for anal sex pole. DISRUPTION anal sex wife the infected anal and vaginal sex party that free asian anal sex videos the voluntary. best anal sex his DISRUPTION anal sex forced something policy anal sex guide policy Manitoba anal sex advice for couples Liberals, become anal sex feces objection public male anal sex was of hooker usual price for anal sex experienced way gothic anal sex by it chubby anal sex Associated put is anal sex dangerous of side law nys anal sex judges to wide anal sex But anal sex pix naval in rough anal sex Quebec a accidental anal sex necessity who lesbian anal sex about Liberal big dick anal sex a his how to make a girl like anal sex in can gay male anal sex movies the He anal sex diaries to government anal asian sex approval, the anal sex numbing their question thai girl anal sex LAST News, anal sex mpg premiership; sought anal sex rss feed followed But free anal sex thumbnails election moment does anal sex make you gay of Quebec kimberley anal sex shaped which dad son anal sex fatal the make anal sex easy with novacaine that would how to have anal sex French-Canadian profiteers first time anal sex videos peril from anal sex toys electors that tips for clean anal sex a the anal sex bbw than he dfgsdfgfr@r45t

Re: Levenshtein distance in XSLT 2.0

I was intrigued by this. I wanted to test out Gestalt, as I have just finished a big refactoring to fix some tail-call problems that Florent Georges had exposed.

It was a good test, as first it exposed another couple of minor bugs (one was a TODO that I should have done about a year ago).

Here are the result (with Saxon 8.9 in the last column for comparison):

# characterstime in ms. (Gestalt)time in ms. (Saxon)
53531666
105331786
158112036
2013662422
2520633092
3031933989
3545145245
4063166775
4584968951
501116811484
551444214452
601790618176
652233522114
702725827368
753251032647

Evidently I have a slower machine than Jeni .

I stopped there since it is evident that the two implementations have roughly identical performance (experimental variation was of the order of 1 second). Saxon has overcome the disadvantage of JVM warm-up time at around 10 seconds (I am running both from the command line).

I was pleased with this result, since when I have done time comparisons in the past, after runtimes longer than 10 seconds were achieved, Gestalt would fall well behind. It seems to be a lot more mature now. Although this may just be because there is no data document involved.

Re: Levenshtein distance in XSLT 2.0

Odd how appealing expressions like

for $p in (0 to string-length($string1)) return $p

are, even to experts! Fortunately, the optimizer quickly turns them into

(0 to string-length($string1))

It’s one of those things like writing (if ($x=true())…) - somehow we do it even though we know it’s grossly redundant.

I wonder if it’s worth inferring a type of “xs:integer+” for (0 to string-length($string1)) rather than the current “xs:integer*”? In this example we’re doing an unnecessary dynamic type check. No, probably not, special-casing based on single examples has to stop somewhere! Unless the example is in an important benchmark, of course.

Re: Levenshtein distance in XSLT 2.0

Amusingly, my thought-process when writing for $p in (0 to string-length($string1)) return $p was that I wanted to get the position of each character in the string (hence the name of the variable $p). Doh.

Memoïzation (Re: Levenshtein distance in XSLT 2.0)

Can we think of memoization as a interesting feature for XSLT 2.1 ?

Re: Memoïzation (Re: Levenshtein distance in XSLT 2.0)

Actually, Saxon has a saxon:memo-function extension attribute that you can use to indicate that a given function should be memoised. In this case, adding saxon:memo-function="yes" to the naive implementation makes it function a lot better (eg comparing two 50-character strings in 2430ms), but not as well as the dynamic programming method (which compares them in 113ms).

I think that since memoisation is purely to do with performance, rather than functionality, it doesn’t belong in an XSLT spec. An XSLT processor could currently do automatic memoisation of certain functions or templates if it thought it could detect ones where it would lead to a performance improvement.

Re: Memoïzation (Re: Levenshtein distance in XSLT 2.0)

It’s hard to think of a sensible way of automatically deciding how to make the space/time trade-off needed for function memoization. Even if you know whether the user cares more about time or space, you can’t do any sensible estimates of either in advance, so it would have to be a learning algorithm. As with keys, I think there’s a good case for allowing hints in the language for this kind of thing.

In fact the saxon:memo-function attribute is currently more than a hint; in the case of functions that create a new node each time they are called it results in the same node being returned each time. Or as I prefer to phrase it, because it sounds more conformant, it’s an assertion by the user that the semantics of the stylesheet don’t depend on this side-effect.

Re: Levenshtein distance in XSLT 2.0

Turns out saxon is better at optimising tail recursion in templates than in functions. If I make the obvious changes to mine to turn the functions back to named templates I never get the nested function call error (er, because there are no function calls:-) and can get up to length 50 or so, after that I don’t get an error but it slows down rather more than I’m prepared to wait, but your version (which isn’t passing the whole n*m array )is quicker anyway so might get further in template style.

I seem to remember Dimitre raising a similar comment on the saxon list a while ago?

David

Re: Levenshtein distance in XSLT 2.0

I did wonder about that, but I can’t for the life of me make the template version do any better. Here’s what I’ve been trying: maybe someone else can get Saxon to recognise the tail recursion…

<xsl:template name="my:LevenshteinDistanceB" as="xs:integer">
  <xsl:param name="string1" as="xs:string" select="''" />
  <xsl:param name="string2" as="xs:string" select="''" />
  <xsl:param name="chars1" as="xs:integer*" 
    select="string-to-codepoints($string1)" />
  <xsl:param name="chars2" as="xs:integer*" 
    select="string-to-codepoints($string2)" />
  <xsl:param name="i1" as="xs:integer" select="1" />
  <xsl:param name="i2" as="xs:integer" select="1" />
  <xsl:param name="lastRow" as="xs:integer+" 
    select="for $p in (0 to string-length($string1)) return $p" />
  <xsl:param name="thisRow" as="xs:integer+" select="1" />
  <xsl:choose>
    <xsl:when test="count($chars1) = 0 or $i2 > count($chars2)">
      <xsl:sequence select="$lastRow[last()]" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="cost" 
        select="min(($lastRow[$i1 + 1] + 1,
                     $thisRow[last()] + 1,
                     $lastRow[$i1] +
                       (if ($chars1[$i1] eq $chars2[$i2]) then 0 else 1)))" />
      <xsl:variable name="nextRow" as="xs:boolean" 
        select="$i1 = count($chars1)" />
      <xsl:call-template name="my:LevenshteinDistanceB">
        <xsl:with-param name="chars1" select="$chars1" />
        <xsl:with-param name="chars2" select="$chars2" />
        <xsl:with-param name="i1" 
          select="if ($nextRow) then 1 else ($i1 + 1)" />
        <xsl:with-param name="i2" 
          select="if ($nextRow) then ($i2 + 1) else $i2" />
        <xsl:with-param name="lastRow" 
          select="if ($nextRow) then ($thisRow, $cost) else $lastRow" />
        <xsl:with-param name="thisRow" 
          select="if ($nextRow) then $i2 + 1 else ($thisRow, $cost)" />
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Actually, this version made me think of another time when templates are nicer than functions: when you have lots of parameters, it’s helpful to be able to pass them by name rather than position.

Re: Levenshtein distance in XSLT 2.0

I’ve now made sufficient tweaks so both the function and the template versions are tail-recursive.

The template version wasn’t optimizing tail recursion in Saxon 8.9 for two reasons: (a) there’s no sensible static type inferencing being done for xsl:call-template (I think I was worried about the inferencing going into a loop when the call is recursive), and (b) there’s a blanket switch that says if the return type is declared, don’t do tail call optimization.

Keep these examples coming in: the only way to improve an optimizer is to give it lots of exercise.

The thing that really needs doing now (it’s long overdue) is some static analysis on apply-templates, using type information to narrow down the field of possible matching template rules.

Re: Levenshtein distance in XSLT 2.0

I seem to remember Dimitre raising a similar comment on the saxon list a while ago?

Yes, I did. From Dr. Kay’s response I got the impression that in order for Saxon to recognize TR in functions, the function should not produce any output until the very end. That means,collect the outpu in a variable and pass it along down the calls.

Definitely, absolutey, any improvement in Saxon’s treatment of TR in functions would be extremely appreciated. Otherwise, having to convert the code to DVC can sometimes be very challenging.

Cheers,

Dimitre

posting code

Hi Jeni,

I also had a similar problem with posting code in a webpage. The solution that I found was to add overflow auto to the pre element (and eventually a border) like below:

pre {
overflow: auto;
border: 1px dotted;
}

Here you have a sample to see how that looks like
http://georgebina.blogspot.com/2007/04/test.html

Hope that helps,
George

Re: posting code

Thanks George. You’ll see that I’ve adopted your suggestion (my apologies to anyone reading while I was experimenting).

Re: Levenshtein distance in XSLT 2.0

I implemented an XQuery version of the naive algorithm a little while back for a user on our forums:

http://forums.oracle.com/forums/thread.jspa?messageID=1570246&#1570246

declare function local:lev($arg1 as xs:string, $arg2 as xs:string) as xs:decimal?
{
  if(string-length($arg1) = 0) then string-length($arg2) else
  if(string-length($arg2) = 0) then string-length($arg1) else
  min((
    local:lev(substring($arg1, 2), $arg2) + 1,
    local:lev($arg1, substring($arg2, 2)) + 1,
    local:lev(substring($arg1, 2), substring($arg2, 2)) +
      (if(substring($arg1, 1, 1) = substring($arg2, 1, 1)) then 0 else 1)
  ))
};

Personally I think I find it easier to read with a non-XML syntax - but maybe I just haven't spent long enough looking at XSLT code.

Re: Levenshtein distance in XSLT 2.0

Here’s the XQuery implementation of the dynamic programming version. (I’ve used the same style as you, in particular omitting variable declarations where they’re unnecessary.)

declare function local:levB($arg1 as xs:string, 
                            $arg2 as xs:string) as xs:decimal
{
  local:levB(string-to-codepoints($arg1), 
             string-to-codepoints($arg2),
             1, 1,
             for $p in (0 to string-length($arg1)) return $p,
             1)
}

declare function local:levB($chars1 as xs:integer*,
                            $chars2 as xs:integer*,
                            $i1 as xs:integer,
                            $i2 as xs:integer,
                            $lastRow as xs:integer+,
                            $thisRow as xs:integer+)
  as xs:decimal
{
  if ($i1 > count($chars1)) then
    if ($i2 = count($chars2)) then $thisRow[last()] else
    local:levB($chars1, $chars2, 
               1, $i2 + 1, $thisRow, $i2 + 1) else
  local:levB($chars1, $chars2, $i1 + 1, $i2, $lastRow,
    ($thisRow,
     min((
       $lastRow[$i1 + 1] + 1,
       $thisRow[last()] + 1,
       $lastRow[$i1] + 
         (if ($chars1[$i1] = $chars2[$i2]) then 0 else 1)
     ))
    )
  )
};

Unsurprisingly, it gives exactly the same “infinite recursion” error with Saxon when comparing strings over 20 characters in length.

I’ll leave XSLT vs. XQuery debates for another time…

Re: Levenshtein distance in XSLT 2.0

Actually I’ve done 2 implementations one using O(M*N) space and one using O(m) except the O(m) one is broke, and I can’t decide whether to fix it now or to fall asleep watching the (UK, local gov) election results roll in…

David

Re: Levenshtein distance in XSLT 2.0

Oh well here’s the O(M*N) one, pretty similar to yours I think, and hits the same limits, I don’t think it should really as it’s tail recursive. Of course, normally I would have fully documented the code, but as I was so hurt by your remarks in the regex post, this is just unadorned xslt….

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:d="data:,dpc">

  <xsl:param name="debug"/>

  <xsl:template name="test">

    <xsl:sequence select="d:levenshteintest('0123456789001234567890','01234567890ABCDEFGHIJ')"/>
    <xsl:sequence select="d:levenshteintest('GUMBO','GAMBOL')"/>
    <xsl:sequence select="d:levenshteintest('GUMBO','GAMBOL')"/>

    <xsl:sequence select="d:levenshteintest('GUMBO','')"/>
    <xsl:sequence select="d:levenshteintest('','GAMBOL')"/>

  </xsl:template>


  <xsl:function name="d:levenshteintest">
    <xsl:param name="s" as="xs:string"/>
    <xsl:param name="t" as="xs:string"/>
    <xsl:text>&#10;distance( "</xsl:text>
    <xsl:value-of select="$s"/>
    <xsl:text>" , "</xsl:text>
    <xsl:value-of select="$t"/>
    <xsl:text>" ) = </xsl:text>
    <xsl:value-of select="d:levenshtein($s,$t)"/>
  </xsl:function>


  <xsl:function name="d:levenshtein">
    <xsl:param name="s" as="xs:string"/>
    <xsl:param name="t" as="xs:string"/>
    <xsl:variable name="n" select="string-length($s)"/>
    <xsl:variable name="m" select="string-length($t)"/>
    <xsl:variable name="ss" select="string-to-codepoints($s)"/>
    <xsl:variable name="tt" select="string-to-codepoints($t)"/>
    <xsl:sequence
    select="if ($n=0) 
        then
        $m
        else
                if ($m=0)
                then
                $n
                else
        d:levenshtein2(
        $m,
        $n,
        (for $i in (0 to $n) return $i,
                 for $j in (1 to $m) return 
                   ($j, for $i in (1 to $n) return 
                      (if($ss[$i]=$tt[$j]) then 0 else 1))),
        $n+3)
"/>
  </xsl:function>

  <xsl:function name="d:levenshtein2">
    <xsl:param name="m" as="xs:integer"/>
    <xsl:param name="n" as="xs:integer"/>
    <xsl:param name="a" as="xs:integer+"/>
    <xsl:param name="p" as="xs:integer"/>
<xsl:if test="$debug='yes'">
<xsl:value-of select="$m,$n"/>
======================================
<xsl:for-each select="0 to ($m)">
<xsl:text>&#10;</xsl:text>
<xsl:value-of select="for $x in subsequence($a,1+ . * ($n+1),$n+1) return format-number($x,'00')"/>
</xsl:for-each>
======================================
</xsl:if>
<xsl:choose>
  <xsl:when test="$p=2+($n+1)*($m+1)">
    <xsl:sequence select="$a[last()]"/>
  </xsl:when>
  <xsl:otherwise>
  <xsl:sequence select="d:levenshtein2($m,$n,
            ($a[position()&lt;$p],  
                        min((
$a[position()=$p - ($n+1)]+1,
$a[position()=$p - 1] + 1,
$a[position()=$p - ($n+2)]+$a[position()=$p]
                        )),
            $a[position()&gt;$p]),
            if($p mod ($n+1) = 0) then $p+2 else $p+1)
    "/>
  </xsl:otherwise>
</xsl:choose>
  </xsl:function>




</xsl:stylesheet>

$  saxon8 -it test levenshtein.xsl 

distance( "0123456789001234567890" , "01234567890ABCDEFGHIJ" ) = 11
distance( "GUMBO" , "GAMBOL" ) = 2
distance( "GUMBO" , "GAMBOL" ) = 2
distance( "GUMBO" , "" ) = 5
distance( "" , "GAMBOL" ) = 6

David

Re: Levenshtein distance in XSLT 2.0

The reason the function isn’t being treated as tail recursive is because it’s doing a cardinality check on the function result. This is to implement the “as=xs:integer”. The cardinality check is being applied because the other branch of the choice (the $thisRow[last()]) has a static cardinality of zero-or-one.

There are two things I could do to improve matters here. One is to recognize that when $thisRow has cardinality one-or-more, then $thisRow[last()] (and indeed $thisRow[1]) will have cardinality exactly-one. That’s something of a special case and one expects it won’t come up that often. This other thing is to move the cardinality check into the branches of the choice where it is actually needed.

It’s true that templates sometimes benefit from tail recursion in cases where functions don’t. The main reason for this is that templates are always evaluated in push mode, where concatenation of the result into a sequence is essentially a null operation. Functions are evaluated in pull or push mode depending on the circumstances, and it’s a run-time decision. In this case the template isn’t tail-recursive for the same reason as the function, that is, because of the (unnecessary) cardinality check applied to the result.

Re: Levenshtein distance in XSLT 2.0

Ah-ha! Well, I’m rather disgruntled about this. I thought that the point of adding type declarations everywhere was that the processor could use them to do clever things, and now I learn that (a) putting as="xs:integer" on my function causes the problem and (b) the as="xs:integer+" on the $thisRows declaration doesn’t get used! Makes me wonder why I bother…

Anyway, adding an exactly-one() call around $thisRows[last()] means that Saxon now recognises the function, but not the template, as tail-recursive. Removing the as="xs:integer" from the template is the only way I could find to fix that. I’m about to upload the new version (which also fixes the problem that the function didn’t do the right thing if either of the strings was empty).

Re: Levenshtein distance in XSLT 2.0

Oh sorry yours is O(M) already just storing 2 rows, so probably is closer to what I was aiming for than what I posted (which builds the whole matrix) so I don’t think I’ll bother fixing mine.

David