Re: Things that make me scream: xml:space="preserve" in WordML
I’d definitely agree that whitespace handling is hard. The SketchPath project I’ve been working on certainly has issues with this, but it doesn’t face the same dilemma as Oxygen because it’s not an XML editor. SketchPath mercilessly (without regard to xml:space) removes whitespace if it is likely to affect auto-indenting, and preserves it otherwise. In practice, the affected whitespace is any run of consecutive linefeeds (a single one is OK) or any number of tabs; each such run is replaced by a single space character.
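To make that rule concrete, here is a minimal Python sketch of the replacement as described above. It is not SketchPath's actual code: the function name `normalize_whitespace` and the regular expression are assumptions for illustration, and it treats runs of linefeeds and runs of tabs separately.

```python
import re

# Assumed reading of the rule: two or more consecutive linefeeds,
# or one or more tabs, count as "affected" whitespace.
_AFFECTED = re.compile(r"\n{2,}|\t+")


def normalize_whitespace(text: str) -> str:
    """Replace each affected run with a single space.

    A lone linefeed is left untouched ("one's OK").
    """
    return _AFFECTED.sub(" ", text)


if __name__ == "__main__":
    # repr() makes the surviving whitespace visible.
    print(repr(normalize_whitespace("a\n\n\nb\tc\nd")))
```

Note that xml:space is deliberately never consulted here, which mirrors the ‘merciless’ behaviour described above.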
Perhaps XML editors should also have such a ‘read only’ auto-indented view? One thing I’m considering is colour-coding the space characters that replace other whitespace, so tabs could be ‘bluespace’ and linefeeds ‘redspace’. Or is this unnecessary?
I’ve used Oxygen and found it very useful, with many excellent features. I’m therefore really surprised that Oxygen experiences the problems you describe with very long lines; hopefully this is on the Oxygen people’s ‘to do’ list.
If the Oxygen auto-indent fix fails only because of WordML’s xml:space issue, then it should work with Word 2007’s OOXML, because this only uses xml:space on elements, and then only when they contain whitespace.
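One way to check how widely a given document actually uses xml:space is to list the elements that carry the attribute. The following is a rough Python sketch using the standard library’s ElementTree. The function name `preserved_elements` and the sample markup are made up for illustration; the sample is not real WordML or OOXML.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def preserved_elements(xml_text: str) -> list[str]:
    """Return the tags of all elements that set xml:space="preserve"."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if el.get(XML_SPACE) == "preserve"]


# Hypothetical sample: only the element with significant whitespace is marked.
SAMPLE = '<doc><t xml:space="preserve"> leading space</t><t>no-space</t></doc>'

if __name__ == "__main__":
    print(preserved_elements(SAMPLE))
```

If the list comes back short, and only for elements whose text really does start or end with whitespace, an indenter that honours xml:space should still be free to re-indent the rest of the document.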
I think of OOXML (and its predecessor) as ‘data centric’ rather than ‘document centric’, so I don’t judge it in the same light as, say, DocBook (keeping out of the OOXML vs ODF debate too). It would be interesting to see how well (or not) DocBook implementations handle whitespace, especially because DocBook uses ‘mixed content’ elements.