Re: Unicode database in XML

I do sometimes wonder if we’re the same person:-)

Most (or some, depending on how you count) of the data in UnicodeData.txt has been available as “unicode.xml” for years (er, nearly decades :-): first in Sebastian’s JadeTeX support for DSSSL, then in the MathML sources, and most recently in the sources for the entity set draft at http://www.w3.org/2003/entities/xml/unicode.xml

As part of the build-up to the MathML 3 drafts, I recently needed to update those files to Unicode 5 (which has some new characters specifically to support the entity sets). I got fed up with only having “most” of UnicodeData.txt, so I put together an XSLT 2.0 stylesheet that added all the missing information.
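For readers who haven't looked inside UnicodeData.txt: each line holds 15 semicolon-separated fields describing one code point (name, general category, combining class, bidi class, decomposition, numeric values, case mappings and so on). The sketch below shows the general idea of turning those lines into XML. It is written in Python rather than XSLT 2.0, it is not the stylesheet mentioned above, and the element and attribute names (`charlist`, `character`, `gc`, `slc`, ...) are hypothetical, not the schema actually used in the W3C unicode.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute names for the 15 semicolon-separated columns of
# UnicodeData.txt, in the column order documented in UAX #44.
FIELDS = [
    "code", "name", "gc", "ccc", "bc", "dt", "nv_decimal", "nv_digit",
    "nv_numeric", "bidi_mirrored", "unicode1_name", "isc",
    "suc", "slc", "stc",
]

def unicodedata_to_xml(lines):
    """Build a <charlist> tree with one <character> element per data line.

    Empty fields are omitted. Range entries such as
    '<CJK Ideograph, First>' are copied as-is rather than expanded.
    """
    root = ET.Element("charlist")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = line.split(";")
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields: {line!r}")
        # The code point becomes the id; the other fields become attributes.
        char = ET.SubElement(root, "character", id="U" + values[0])
        for field, value in zip(FIELDS[1:], values[1:]):
            if value:
                char.set(field, value)
    return root
```

To run it over the real file, something like `with open("UnicodeData.txt", encoding="utf-8") as f: root = unicodedata_to_xml(f)` should do, followed by `ET.tostring(root, encoding="unicode")` to serialise. Merging these attributes into an existing XML file keyed by code point, as described above, would be a further step not shown here.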

The draft is currently W3C-member only, but only because that’s a convenient CVS archive until it’s ready to update the public version. If you (or your readers) have W3C access, you may like to compare with http://www.w3.org/Math/Group/spec/xml/unicode.xml. (The HTTP view of that file uses a client-side stylesheet that doesn’t show all the information in the file, but the file does contain all of UnicodeData.txt plus a pile of other material about entity names, names in Mathematica, the AFII glyph register, TeX, etc.) I really should get the public part of the site updated…

Actually, though: while having UnicodeData as XML is clearly a good thing, isn’t the approved way of solving the stated example problem to use a collation to do the comparison? That probably needs to be written in some less interesting language than XSLT, and may not benefit so much from an XML version.
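To illustrate why plain code-point comparison isn't the same thing as collation, here is a small Python sketch. The `crude_primary_key` function is a hypothetical stand-in, NOT the Unicode Collation Algorithm: it only decomposes to NFD, drops combining marks and casefolds. A real collation is driven by the Default Unicode Collation Element Table (allkeys.txt) rather than by UnicodeData.txt, which is part of why an XML version of UnicodeData may not help much there; in practice you would reach for an existing UCA implementation such as ICU.

```python
import unicodedata

def crude_primary_key(s):
    """Very rough accent- and case-insensitive comparison key.

    Not the Unicode Collation Algorithm: it decomposes to NFD, drops
    characters with a non-zero canonical combining class (e.g. U+0301
    COMBINING ACUTE ACCENT), then casefolds what is left.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return base.casefold()

words = ["pechy", "péché", "Zebra", "peach"]

# Code-point order puts capitals before lower case and accented letters
# after every unaccented ASCII letter.
by_code_point = sorted(words)

# The crude key sorts "péché" next to its unaccented neighbours instead.
by_crude_key = sorted(words, key=crude_primary_key)
```

Here `by_code_point` is `["Zebra", "peach", "pechy", "péché"]` while `by_crude_key` is `["peach", "péché", "pechy", "Zebra"]`. Even this toy key ignores everything a real collation handles (secondary and tertiary strengths, contractions, language-specific tailoring), so treat it only as a demonstration of the difference.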

David
