Datatype Library Language (DTLL)

Status

This document is a basic specification of the Datatype Library Language (DTLL). It includes, embedded within it, the RELAX NG Compact Syntax schema for DTLL. There are still many areas that require greater detail.

This version is a simplification of the previous version of DTLL which attempts to find the minimum required to support the definition of datatypes for the purposes of validation. In particular, the changes are:

Introduction

Unlike XML Schema, RELAX NG doesn't provide a mechanism for users to define their own types. If they're not satisfied with the two built-in types of string and token, RELAX NG users have to create a datatype library, which they then refer to from the schema.

Most RELAX NG validators provide built-in support for the XML Schema datatype library. Many also support an interface that allows you to plug in datatype modules, written in the programming language of your choice, to define extra datatypes. But the fact that these datatype libraries have to be programmed means that ordinary users find them hard to construct.

One option would be for RELAX NG validators to support datatype definition via XML Schema - using <xs:simpleType> elements to create new atomic types. However, there are several problems with this:

So the primary motivation for putting together a language for datatype libraries is to enable RELAX NG users to construct their own datatypes without having to resort to a procedural programming language or having to learn how to use XML Schema, which might not be suited for their needs.

Overview

datatypes xs = "http://www.w3.org/2001/XMLSchema-datatypes"
default namespace dt = "http://www.jenitennison.com/datatypes"
namespace local = ""

start = \datatypes

<datatypes> is the document element.

The version attribute holds the version of the datatype library language. The current version is 0.4.

If a DTLL version 0.4 processor encounters a datatype library with a version higher than 0.4, it must treat any attributes or elements that it doesn't understand (that are not part of DTLL 0.4) in the same way as it would treat extension attributes or elements found in the same location.

\datatypes = element datatypes {
               attribute version { "0.4" },
               ns?, 
               extension-attribute*,
               top-level-element*
             }

Top-Level Elements

top-level-element |= named-datatype
top-level-element |= top-level-map
top-level-element |= \include
top-level-element |= \div
top-level-element |= extension-top-level-element

<include> elements include datatype libraries from elsewhere. It is as if the content of the included document (the children of the <datatypes> element) is inserted into the datatype library in place of the <include> element.

\include = element include {
             attribute href { xs:anyURI },
             extension-attribute*
           }

It is an error for a datatype library to contain circular includes. If the datatype library A includes the datatype library B, then B must not include A or include any datatype library that (at any remove) includes A.
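The circular-include rule above can be checked with a depth-first walk over the include graph. The following is a minimal sketch, not part of DTLL itself: the function name and the dictionary that records which library includes which are hypothetical, and a real processor would build that information by resolving href attributes.

```python
from typing import Dict, List, Optional


def find_include_cycle(includes: Dict[str, List[str]],
                       start: str) -> Optional[List[str]]:
    """Return one chain of includes that leads back on itself, or None."""
    path: List[str] = []   # libraries on the include chain being explored
    done = set()           # libraries already shown to be free of cycles

    def visit(lib: str) -> Optional[List[str]]:
        if lib in path:
            # lib is included (at some remove) by itself
            return path[path.index(lib):] + [lib]
        if lib in done:
            return None
        path.append(lib)
        for target in includes.get(lib, []):
            cycle = visit(target)
            if cycle is not None:
                return cycle
        path.pop()
        done.add(lib)
        return None

    return visit(start)


# Hypothetical include graphs: in the first, C includes A again.
CIRCULAR = {"A": ["B"], "B": ["C"], "C": ["A"]}
ACYCLIC = {"A": ["B", "C"], "B": ["C"], "C": []}
CYCLE = find_include_cycle(CIRCULAR, "A")
NO_CYCLE = find_include_cycle(ACYCLIC, "A")
```

Including the same library twice by different routes (as in ACYCLIC) is not reported, since only a chain that returns to its own starting point is an error.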

<div> elements are simply used to partition a datatype library and to provide a scope for ns attributes.

\div = element div {
         ns?,
         extension-attribute*,
         top-level-element*
       }

Extension top-level elements can be used to hold data that is used within the datatype library (such as code lists used to test enumerated values), documentation, or information that is used by implementations. For example, an extension top-level element can be used by an implementation to define extension functions (using XSLT, for example) that can be used in the XPath expressions used within the datatype library.

extension-top-level-element = extension-element

Datatype Definitions

Named datatypes are given at the top level of the datatype library using <datatype> elements. Each named datatype has a qualified name that can be used to refer to it.

The name of the datatype is given in the name attribute. If this is unprefixed, the nearest ancestor ns attribute (including one on the <datatype> element itself) is used to provide the namespace for the datatype.

named-datatype = element datatype {
                   attribute name { xs:QName }, ns?,
                   extension-attribute*,
                   datatype-definition-element*
                 }

Anonymous datatypes are used to provide the datatype for a property or variable if that property or variable's type can't be referred to by name.

anonymous-datatype = element datatype { 
                       extension-attribute*,
                       datatype-definition-element*
                     }

Datatypes are referenced using qualified names. If the qualified name hasn't got a prefix, the nearest ancestor ns attribute (including one on the element that's referring to the datatype) is used to resolve the name.

datatype-reference = xs:QName
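The way an unprefixed name picks up its namespace from the nearest ns attribute can be sketched as a recursive walk that carries the ns value currently in scope. This is only an illustration: the example.org URIs and the function name are hypothetical, prefixed names are not handled, and treating a missing ns attribute as "no namespace" is an assumption, since the text above does not cover that case.

```python
import xml.etree.ElementTree as ET

DTLL_NS = "http://www.jenitennison.com/datatypes"

# Hypothetical library; the ns values are made up for the example.
LIBRARY = """
<datatypes xmlns="http://www.jenitennison.com/datatypes" version="0.4"
           ns="http://example.org/outer">
  <datatype name="colour" />
  <div ns="http://example.org/inner">
    <datatype name="length" />
    <datatype name="angle" ns="http://example.org/own" />
  </div>
</datatypes>
"""


def named_datatypes(element, inherited_ns=""):
    """Yield (namespace, local name) for each unprefixed named datatype."""
    # An ns attribute on the element itself overrides the inherited one.
    ns = element.get("ns", inherited_ns)
    if element.tag == "{%s}datatype" % DTLL_NS and "name" in element.attrib:
        yield (ns, element.get("name"))
        return
    for child in element:
        yield from named_datatypes(child, ns)


RESOLVED = list(named_datatypes(ET.fromstring(LIBRARY)))
```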

A datatype definition consists of a number of elements that test values and define variables. If a value passes the tests specified by these elements, then it's a valid value for the datatype.

datatype-definition-element |= property
datatype-definition-element |= parse
datatype-definition-element |= condition
datatype-definition-element |= except
datatype-definition-element |= variable
datatype-definition-element |= local-map
datatype-definition-element |= extension-definition-element

Extension definition elements can be used at any point within a datatype definition. If a processor doesn't recognise an extension definition element, it must ignore it and behave as if the value passed whatever test the extension definition element represented.

Using Extension Definition Elements for Documentation

Extension definition elements can be used to hold documentation about the datatype. For example, an <eg:example> element might be used to provide example legal values of the datatype:

<datatype name="RRGGBBColour">
  <eg:example>#FFFFFF</eg:example>
  <eg:example>#123456</eg:example>
  <parse name="RRGGBB">
    <regex>#(?[RR][0-9A-F]{2})(?[GG][0-9A-F]{2})(?[BB][0-9A-F]{2})</regex>
  </parse>
  ...
</datatype>

extension-definition-element = extension-element

Except

Certain aspects of a datatype definition can be negated by being placed in an <except> element. A value is only valid if it isn't valid according to any of the datatype definition elements held within an <except> element.

except = element except {
           extension-attribute*,
           negative-test+
         }
         
negative-test |= condition
negative-test |= variable
negative-test |= parse

Parsing

Parsing can perform two functions: it tests whether a value adheres to a particular format, and can assign a tree value to a variable to enable pieces of the string value to be extracted, tested, assigned to properties and so on.

The <parse> element holds one or more parsing methods, at least one of which must be satisfied in order for the value to be considered valid. The name attribute, if present, specifies the name of the variable to which the tree resulting from the parse is assigned. The first successful parse of those specified within the <parse> element is used to give the value of this variable (thus the processor does not have to attempt any further parses once one has been successful).

A datatype can specify as many <parse> elements as it wishes. All must be satisfied by a value for that value to be a legal value of the datatype.

parse = element parse {
          name?, preprocess*,
          extension-attribute*,
          parsing-method+
        }

Preprocessing

Before a value is parsed by a <parse> element, it can be preprocessed. This does not change the string value, but it may simplify the specification of the parsing method that's used.

The only built-in form of preprocessing is whitespace processing. The whitespace can be preserved ('preserve'), whitespace characters replaced by space characters ('replace'), or leading and trailing whitespace stripped and sequences of whitespace characters replaced by spaces ('collapse', the default).

preprocess |= attribute whitespace {
                "preserve" | "replace" | "collapse"
              }
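The three whitespace modes can be sketched as follows. This is an illustration rather than a normative definition: the function name is hypothetical, and the set of whitespace characters (tab, newline, carriage return and space, as in XML) is an assumption, because the text above does not list them.

```python
import re


def preprocess(value: str, whitespace: str = "collapse") -> str:
    """Return the string handed to the parsing methods of a <parse> element."""
    if whitespace == "preserve":
        return value
    if whitespace == "replace":
        # Each whitespace character becomes a single space.
        return re.sub(r"[\t\n\r]", " ", value)
    if whitespace == "collapse":
        # Runs of whitespace become one space; both ends are stripped.
        return re.sub(r"[\t\n\r ]+", " ", value).strip(" ")
    raise ValueError("unknown whitespace mode: %r" % whitespace)
```

For instance, preprocess(" a\t\tb \n") gives "a b", because 'collapse' is the default.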

Implementations may specify extension preprocessing methods with additional attributes. These must be ignored by implementations that don't support them.

preprocess |= extension-preprocess-attribute
extension-preprocess-attribute = extension-attribute

Parsing Methods

There are two core methods of parsing: via a regular expression, and by specifying a list. This set of methods can be supplemented by extension parsing elements.

parsing-method |= regex
parsing-method |= \list
parsing-method |= extension-parsing-element

Regex Parsing

The <regex> element specifies parsing via an extended regular expression. To be a legal value, the entire string value must be matched by the regular expression. (Although it's legal to use ^ and $ to mark the beginning and end of the matched string, it's not necessary.)

The tree value generated by parsing consists of a root (document) node with text node and element children. The string value of the root (document) node is the string value itself. There is one element for each named subexpression. Each element's name is the name of the subexpression, in the namespace indicated by the prefix used in that name. If no prefix is used, the element is in no namespace. The string value of each of these elements is the part of the string value matched by the corresponding subexpression.

Regular Expression Parsing

For example, the regex:

(?[year]-?[0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})

parsing the value:

2003-12-19

generates the tree:

(root)
   +- year
   |   +- "2003"
   +- "-"
   +- month
   |   +- "12"
   +- "-"
   +- day
       +- "19"

regex = element regex {
          regex-flags*,
          extension-attribute*,
          extended-regular-expression
        }

Regular Expression Flags

Four attributes modify the way in which regular expressions are applied. These are equivalent to the flags available within XPath 2.0.

By default, the "." meta-character matches all characters except the newline (#xA) character. If dot-all="true" then "." matches all characters, including the newline character.

regex-flags |= attribute dot-all { boolean }

By default, ^ matches the beginning of the entire string and $ the end of the entire string. If multi-line="true" then ^ matches the beginning of each line as well as the beginning of the string, and $ matches the end of each line as well as the end of the string. Lines are delimited by newline (#xA) characters.

regex-flags |= attribute multi-line { boolean }

By default, the regular expression is case sensitive. If case-insensitive="true" then the matching is case-insensitive, which means that the regular expression "a" will match the string "A".

regex-flags |= attribute case-insensitive { boolean }

By default, whitespace within the regular expression matches whitespace in the string. If ignore-whitespace="true", whitespace in the regular expression is removed prior to matching, and you need to use "\s" to match whitespace. This can be used to create more readable regular expressions.

Ignoring Whitespace in Regular Expressions

<regex ignore-whitespace="true">
  (?[year][0-9]{4})-
  (?[month][0-9]{2})-
  (?[day][0-9]{2})
</regex>

This is not the same as <parse whitespace="collapse">...</parse>, which preprocesses the string value itself.

regex-flags |= attribute ignore-whitespace { boolean }

Boolean values are 'true' or 'false', with optional leading and trailing whitespace.

boolean = xs:boolean { pattern = "true|false" }
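The regex parsing rules and the four flags described in this section can be approximated with Python's re module. The sketch below rests on several assumptions: the function name is hypothetical, the (?[name]...) syntax is rewritten to Python's (?P<name>...) form, only unprefixed names and non-nested named subexpressions are handled, and the pattern is assumed to stay within the syntax that XPath 2.0 and Python regular expressions share. The tree is represented as a flat list of text strings and (name, text) pairs.

```python
import re

NAMED_GROUP = re.compile(r"\(\?\[([A-Za-z_][A-Za-z0-9_]*)\]")


def parse_with_regex(pattern, value, dot_all=False, multi_line=False,
                     case_insensitive=False, ignore_whitespace=False):
    """Return the children of the root node, or None if value does not match."""
    if ignore_whitespace:
        # Whitespace in the expression itself is removed before matching.
        pattern = re.sub(r"\s+", "", pattern)
    flags = 0
    if dot_all:
        flags |= re.DOTALL
    if multi_line:
        flags |= re.MULTILINE
    if case_insensitive:
        flags |= re.IGNORECASE
    compiled = re.compile(NAMED_GROUP.sub(r"(?P<\1>", pattern), flags)
    match = compiled.fullmatch(value)   # the entire string must match
    if match is None:
        return None
    # Named subexpressions that took part in the match, in document order.
    spans = sorted((match.span(name), name) for name in compiled.groupindex
                   if match.start(name) != -1)
    children, position = [], 0
    for (start, end), name in spans:
        if start > position:
            children.append(value[position:start])   # text node
        children.append((name, value[start:end]))    # element
        position = end
    if position < len(value):
        children.append(value[position:])
    return children


DATE = "(?[year]-?[0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})"
TREE = parse_with_regex(DATE, "2003-12-19")
```

TREE mirrors the tree shown for the date example in this section: a year element, a "-" text node, a month element, another "-" text node and a day element.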

Lists

The <list> element specifies parsing of the string value into a list of values, simply using a separator attribute to provide a regular expression to break up the list into items.

The result of parsing the string value based on the <list> element is a node-set of sibling elements. The names of the item elements are implementation-defined.

Parsing Lists

For example, if you have:

<list separator="\s*,\s*" />

and the string value:

1, 2, 3, 45

then the variable is set to the elements in the tree:

(root)
   +- item
   |   +- "1"
   +- item
   |   +- "2"
   +- item
   |   +- "3"
   +- item
       +- "45"

These elements need not be named 'item'.

The separator attribute specifies a regular expression that matches the separators in the list. The default is "\s+" (one or more whitespace characters). It is an error if the regular expression matches an empty string (i.e. if it matches "").

\list = element list {
          attribute separator { regular-expression }?,
          extension-attribute*
        }
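List parsing amounts to splitting the string value on the separator expression. A minimal sketch, under stated assumptions: the function name is hypothetical, the value is taken to have been whitespace-preprocessed already, the separator is assumed to contain no capturing groups (Python's re.split would otherwise return the captured text as well), and the items are returned as plain strings rather than as elements.

```python
import re


def parse_list(value: str, separator: str = r"\s+") -> list:
    """Split value into items using the separator regular expression."""
    if re.fullmatch(separator, "") is not None:
        # A separator that matches "" is an error.
        raise ValueError("separator must not match the empty string")
    return re.split(separator, value)


ITEMS = parse_list("1, 2, 3, 45", r"\s*,\s*")
```

ITEMS holds the four items of the example above, in order. Calling parse_list with a separator such as \s* raises an error, because that expression matches the empty string.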

Extension Parsing Elements

Extension parsing elements can be used to parse values using methods other than the core methods explained above. Extension parsing elements can be used, for example, to parse a value using EBNF (Extended Backus-Naur Form) or PEGs (Parsing Expression Grammars).

If the extension parsing element isn't recognised, the value is considered to fail the parse. If the extension parsing element occurs in a <parse> element without any alternative parsing methods, this means no value can match the datatype, and the implementation must issue a warning. Usually, an extension parsing element will be used alongside a built-in parsing method.

Using Extension Parsing Elements

<parse name="path">
  <ext:ebnf ref="http://www.w3.org/1999/xpath" />
  <regex dot-all="true">.*</regex>
</parse>

extension-parsing-element = extension-element

Testing

Conditions define run-time tests that check values.

The <condition> element tests whether a particular condition is satisfied by a value. The value is not valid if the test evaluates to false.

condition = element condition {
              extension-attribute*,
              test
            }

Tests are done through a test attribute which holds an XPath expression. If the effective boolean value of the result of evaluating the XPath expression is true then the test succeeds and the condition is satisfied.

test = attribute test { XPath }

Variable Binding

Properties and variables declare variables for use in binding expressions (i.e. XPath expressions). Property variables are of the form $this.name where name is the name of the property; ordinary variables just use the name of the variable. The variable $this refers to the value itself (as does the XPath expression .).

Variable binding is carried out in the order the variables are declared. It is an error if a variable is referenced without being declared. The scope of a variable binding is limited to the following siblings of the variable declaration and their descendants.

Properties

The <property> element specifies a property of the datatype. The values of properties are available via the dt:property() extension function within XPath expressions in DTLL (or via other implementation-defined APIs). The value of a property for a value can be referenced using $this.name where name is the value of the name attribute on the <property> element.

Properties

For example, consider:

<datatype name="RRGGBB">
   <parse name="colour">
     <regex ignore-whitespace="true">
       #(?[red][0-9A-F]{2})
        (?[green][0-9A-F]{2})
        (?[blue][0-9A-F]{2})
     </regex>
   </parse>
   <property name="red" type="hexByte" select="$colour/red" />
   <property name="green" type="hexByte" select="$colour/green" />
   <property name="blue" type="hexByte" select="$colour/blue" />
   <property name="is-greyscale" select="$this.red = $this.green and 
                                         $this.green = $this.blue" />
</datatype>

property = element property {
             name, type?, binding,
             extension-attribute*
           }

Variables

The <variable> element binds a value to a variable. Variables are similar to properties except that their values aren't accessible via APIs; they are used for intermediate calculations. The value of a variable is accessed through $name, where name is the name of the variable. It is an error if the name of a variable starts with (or is) 'this'. For future use, it is also an error if the name of a variable starts with (or is) 'type'.

variable = element variable {
             name, type?, binding,
             extension-attribute*
           }

Type Specifiers

There are two ways to specify a type: via a type attribute or via an anonymous <datatype> element.

type |= attribute type { datatype-reference }
type |= anonymous-datatype

If there is a mapping specified from the type of the provided value to the required type, then that mapping is used to convert the value to the required type. Otherwise, if the value is a standard XPath 1.0 type (string, number, boolean or node-set), that value is converted to a string using the string() function and interpreted as the string value of the required type. Otherwise (there's no mapping and the value is not a standard XPath type), it's an error.

If no type is specified for a variable or property, then the supplied value is used directly. Note that this value can be a standard XPath type (string, number, boolean or node-set) as well as a value of a datatype defined in the datatype library.

Value Specifiers

There are two built-in ways to bind a value to a property or variable: through the value attribute, which holds a literal value, or through the select attribute, which holds an XPath expression. Implementations can also define their own extension binding elements.

binding = (literal-value | select), extension-binding-element*

If a value attribute is specified, its value is the string value of the value of the variable or property; the type of the variable or property is used to interpret that value.

literal-value = attribute value { text }

If a select attribute is specified, the XPath expression it contains is evaluated to give the value of the property or variable.

select = attribute select { XPath }

Extension binding elements are used where more power is needed to specify the value of a parameter, property or variable. They can be used to provide values using methods such as XSLT or MathML. If an implementation does not support any of the extension binding elements specified, then it must assign to the variable the value specified by the value or select attribute instead. If an implementation supports one or more of the extension binding elements, then it must use the first extension binding element it understands to calculate the value of the variable.

extension-binding-element = extension-element

Maps

Maps provide a way of converting a value of one datatype to another datatype. Maps are either strong or weak. If there's a strong map from datatype A to datatype B then every legal value of datatype A must map onto a legal value of datatype B. A weak map means that some of the values of datatype A can be mapped on to legal values of datatype B. In both cases, the mapping is uni-directional: often a strong map from A to B is coupled with a weak map from B to A.

The <map> element defines a map from one datatype to another. The attributes of the <map> element define how the mapping is done.

Note that it is possible for there to be maps to and from two datatypes, but it is not necessarily the case that a round-trip will result in the same string value.

Changes When Round-Tripping

For example, with the datatype definitions:

<datatype name="UKDate">
  <parse name="date">
    <regex ignore-whitespace="true">
      (?[day][0-9]{1,2})/(?[month][0-9]{1,2})/(?[year][0-9]{4})
    </regex>
  </parse>
  <property name="year" select="$date/year" />
  <property name="month" select="$date/month" />
  <property name="day" select="$date/day" />
</datatype>

<datatype name="ISODate">
  <parse name="date">
    <regex ignore-whitespace="true">
      (?[year][0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})
    </regex>
  </parse>
  <property name="year" select="$date/year" />
  <property name="month" select="$date/month" />
  <property name="day" select="$date/day" />
</datatype>
        
<map from="UKDate" to="ISODate"
  select="concat(format-number($this.year, '0000'), '-',
                 format-number($this.month, '00'), '-',
                 format-number($this.day, '00'))" />
        
<map from="ISODate" to="UKDate"
  select="concat($this.day, '/', $this.month, '/', $this.year)" />

the UKDate "5/1/1947" maps to the ISODate "1947-01-05", which maps back to the UKDate "05/01/1947".
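The round trip can be reproduced outside DTLL. The sketch below is only an analogue of the two maps: the function names are hypothetical, Python named groups stand in for the named subexpressions, and string formatting stands in for format-number() and concat().

```python
import re

UK_DATE = re.compile(
    r"(?P<day>[0-9]{1,2})/(?P<month>[0-9]{1,2})/(?P<year>[0-9]{4})")
ISO_DATE = re.compile(
    r"(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})")


def uk_to_iso(value: str) -> str:
    """Analogue of the UKDate to ISODate map: pad month and day to 2 digits."""
    m = UK_DATE.fullmatch(value)
    if m is None:
        raise ValueError("not a UKDate: %r" % value)
    return "%04d-%02d-%02d" % (int(m["year"]), int(m["month"]), int(m["day"]))


def iso_to_uk(value: str) -> str:
    """Analogue of the ISODate to UKDate map: reuse the matched text as is."""
    m = ISO_DATE.fullmatch(value)
    if m is None:
        raise ValueError("not an ISODate: %r" % value)
    return "%s/%s/%s" % (m["day"], m["month"], m["year"])


ISO = uk_to_iso("5/1/1947")
ROUND_TRIP = iso_to_uk(ISO)
```

The padding added on the way to ISODate is never removed on the way back, which is why the round trip changes the string value even though both strings are legal UKDate values.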

Local maps appear within a <datatype> element and define maps from or to the datatype in which they're defined to or from the datatype referenced in the to or from attribute. Top-level maps appear within the <datatypes> element and define maps from the datatype referenced in the from attribute to the datatype referenced in the to attribute.

local-map = element map {
              (from | to), kind?, mapping,
              extension-attribute*
            }
      
top-level-map = element map {
                  from, to, kind?, mapping,
                  extension-attribute*
                }

The to attribute holds a reference to a datatype, which is the datatype to which a value can be mapped, or a *. The value * indicates that the map describes how to map from the datatype specified by the from attribute to any other datatype.

to = attribute to { datatype-reference | "*" }

The from attribute holds a reference to a datatype, which is the datatype from which a value can be mapped, or a *. The value * indicates that the map describes how to map to the datatype specified by the to attribute from any other datatype.

from = attribute from { datatype-reference | "*" }

The kind attribute indicates whether the map is a strong or weak map. Strong maps are guaranteed to succeed; weak maps may fail, depending on the value. If the kind attribute is missing, the <map> element defines a strong map if both datatypes are specified and a weak map if the map is to/from any type.

kind = attribute kind { "strong" | "weak" }

A <map> element that specifies a map from datatype A to datatype B also implicitly defines weak maps from A to any type via B and from any type to B via A.

It is an error if there are two or more explicit maps defined between the same two datatypes, or from/to a datatype and any type. It is an error if there are two or more implicit maps from/to a datatype and any type unless there is an explicit map from/to that datatype and any type. There can only be one map defined to be from any type to any type.

Mapping Error

The following is an error:

1 | <map from="A" to="B" select="..." />
2 | <map from="A" to="C" select="..." />

1 sets up an explicit map from A to B, which in turn sets up implicit maps from A to any type via B and from any type to B via A. 2 sets up an explicit map from A to C, which in turn sets up implicit maps from A to any type via C and from any type to C via A. This is an error because there are two implicit maps from A to any type and no explicit map from A to any type. To fix the error, an explicit map from A to any type needs to be created:

<map from="A" to="*" as="B" />
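The implicit maps and the error condition described above can be sketched with a few lines of bookkeeping. This is an illustration under assumptions: maps are reduced to hypothetical (from, to) pairs with "*" standing for any type, and only the rule about two or more implicit maps without a matching explicit map is checked.

```python
from collections import defaultdict

ANY = "*"


def implicit_maps(explicit):
    """Return (from, to, via) triples implied by explicit (from, to) pairs."""
    result = []
    for source, target in explicit:
        if source != ANY and target != ANY:
            result.append((source, ANY, target))   # source to any, via target
            result.append((ANY, target, source))   # any to target, via source
    return result


def ambiguous_any_maps(explicit):
    """Return the from/to pairs that have two or more implicit maps
    but no explicit map."""
    counts = defaultdict(int)
    for source, target, _via in implicit_maps(explicit):
        counts[(source, target)] += 1
    declared = set(explicit)
    return sorted(pair for pair, n in counts.items()
                  if n >= 2 and pair not in declared)


ERRORS = ambiguous_any_maps([("A", "B"), ("A", "C")])
FIXED = ambiguous_any_maps([("A", "B"), ("A", "C"), ("A", ANY)])
```

ERRORS reports the pair ("A", "*") for the two maps in the example, and FIXED is empty once the explicit map from A to any type is added.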

The map itself is defined through a binding which creates a string which is a valid string value for the target datatype or through an as attribute. If an as attribute is provided, the mapping must be carried out via the intermediate datatype specified by the as attribute.

mapping |= binding
mapping |= attribute as { datatype-reference }

Mapping via other datatypes

In this example, hue-saturation-luminance (HSL) and red-green-blue (RGB) colours map to and from each other directly, and colour keywords can be mapped to RGB colours. To map from colour keywords to HSL colours, you first convert from the keyword to the RGB colour, then from that to the equivalent HSL colour.

<map from="HSLcolour" to="RGBcolour" select="..." />
<map from="RGBcolour" to="HSLcolour" select="..." />
<map from="colourkeyword" to="RGBcolour" select="..." />
<map from="colourkeyword" to="HSLcolour" as="RGBcolour" />

To work out how to convert from a source value of type S to a target value of a required type R, an application has to locate an appropriate mapping pathway to use. A mapping pathway can consist of several steps via intermediate types. To convert from S to R, the value is converted from S to an intermediate type I and then from I to R.

There is a mapping pathway from S to R if a mapping binding is specified for converting directly from S to R, or if there is a mapping pathway from S to I and a mapping pathway from I to R.

There may be multiple mappings specified from S to R. The first of the following list of available mappings that provides a mapping pathway from S to R is used.

Identifying Mapping Pathways

Consider the following mapping definitions and the mapping from A to B:

1 | <map from="A" to="*" as="C" />
2 | <map from="A" to="C" select="..." />
3 | <map from="A" to="D" select="..." />
4 | <map from="D" to="B" select="..." />

These explicit mappings generate the following implicit mappings:

5 | <map from="*" to="C" as="A" />
6 | <map from="*" to="D" as="A" />
7 | <map from="D" to="*" as="B" />
8 | <map from="*" to="B" as="D" />

There are two possible mappings from A to B: 1 (explicitly from A to any type, via C) and 8 (implicitly from any type to B via D). Since 1 is preferred over 8, we first try to find a mapping pathway via C. There's a mapping binding from A to C (2), and an implicit mapping (8) from any type to B via D, so we try to find a mapping from C to D and from D to B. There's an implicit mapping from any type to D (6) via A, so we need mappings from C to A and from A to D. There's no mapping from C to A, so there's no mapping pathway based on 1.

Since the mapping defined by 1 does not lead to a mapping pathway, we try to find a mapping pathway using the mapping defined by 8, via D. There's a mapping binding defined from A to D (3) and a mapping binding defined from D to B (4).

The final conversion used is from A to D to B, using the mapping bindings defined in 3 and 4.
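The recursive definition of a mapping pathway (a direct mapping binding, or a pathway through an intermediate type) can be sketched as a breadth-first search over the direct bindings. This is a deliberate simplification, with a hypothetical function name: it ignores as attributes and the order of preference between explicit and implicit maps, and simply returns a shortest chain of bindings if one exists.

```python
from collections import deque


def find_pathway(bindings, source, target):
    """bindings: set of (from, to) pairs that have a direct mapping binding.

    Returns the list of types visited from source to target, or None.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        for start, end in sorted(bindings):
            if start != path[-1] or end in seen:
                continue
            if end == target:
                return path + [end]
            seen.add(end)
            queue.append(path + [end])
    return None


# The mapping bindings numbered 2, 3 and 4 in the example above.
BINDINGS = {("A", "C"), ("A", "D"), ("D", "B")}
PATHWAY = find_pathway(BINDINGS, "A", "B")
```

For this example the search arrives at the same pathway as the walkthrough, from A to D to B, and finds none from C to A.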

Common Constructs

Common Types

XPath 1.0 expressions are used to bind values to variables or properties and to express tests in conditions.

XPath = text

Variable and property values are available within an XPath expression if the variable or property is declared prior to the XPath expression.

Within a datatype library, each datatype has a corresponding extension function named after the name of the datatype. This function takes a single argument, which can be of any type, and returns a typed value of the type specified by the name of the function. The supplied value is converted to the required type using the same rules as for type conversions for variables. Note that this works for all datatypes, including lists.

Other extension functions are:

dt:item(list-value, number)

returns the item in the list-value at the index given by the number (counting starts from 1); returns an empty string if the number is greater than the number of items in the list-value. Values that aren't of a list type are treated like list-type values with a single item.

dt:property(value, prop-name)

returns the value of the named property for the value

dt:if(test, true, false)

returns the true value if the test is true and the false value if the test is false. Note that both the true and false arguments are evaluated (unlike the if expression in XPath 2.0).

dt:default(value, default)

returns the first argument if the effective boolean value of the first argument is true, and the second argument otherwise
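The behaviour of dt:item(), dt:if() and dt:default() can be illustrated with plain functions. The following sketch makes several assumptions: lists are represented as Python lists, node-sets and strings are tested for emptiness, numbers follow the XPath 1.0 rule that zero and NaN are false, and an index below 1 returns an empty string (a case the description above does not cover).

```python
import math


def effective_boolean(value):
    """Approximate the XPath 1.0 boolean() conversion."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    return len(value) > 0   # strings and node-sets (lists here)


def dt_item(list_value, number):
    """Item at the 1-based index, or "" if the index is out of range."""
    items = list_value if isinstance(list_value, list) else [list_value]
    index = int(number)
    if index < 1 or index > len(items):
        return ""
    return items[index - 1]


def dt_if(test, true_value, false_value):
    """Both value arguments are already evaluated when this is called."""
    return true_value if effective_boolean(test) else false_value


def dt_default(value, default):
    """Return value unless its effective boolean value is false."""
    return value if effective_boolean(value) else default
```

As with dt:if(), the arguments of dt_if are evaluated before the call, so it cannot be used to guard an expression that would fail.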

Regular expressions are as defined in XPath 2.0.

regular-expression = text

Extended regular expressions can have named subexpressions. Named subexpressions are specified with the syntax (?[name]regex) where name is the name of the subexpression and regex is the subexpression itself.

Extended Regular Expression

(?[year]-?[0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})

extended-regular-expression = text

Common Attributes

name = attribute name { xs:NCName }
dt-name = attribute dt:name { xs:NCName }
ns = attribute ns { xs:anyURI }

Extension Elements and Attributes

Extension elements are any elements that aren't in the DTLL namespace. They can contain anything (including DTLL elements). Extension attributes are any attributes that are neither in the DTLL namespace nor in no namespace (unprefixed). They can have any kind of value.

extension-element = element * - dt:* { anything }

extension-attribute = attribute * - (local:* | dt:*) { text }

anything = attribute * { text }*,
           mixed { element * { anything }* }

Appendix A Extended Examples

Numbers

This example shows a way of defining the numeric datatypes from XML Schema, plus a hexadecimal byte datatype. It includes an extension function defined within the datatype library using XSLT 2.0.

<datatypes version="0.4" 
           xmlns="http://www.jenitennison.com/datatypes"
           xmlns:dt="http://www.jenitennison.com/datatypes"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
           xsl:version="2.0"
           xmlns:eg="http://www.jenitennison.com/datatypes/examples"
           ns="http://www.jenitennison.com/datatypes/examples">
  
<datatype name="double">
  <parse name="double">
    <regex>(?[mantissa](\+|-)?[0-9]+(\.[0-9]+)?)([eE](?[exponent][0-9]+))?</regex>
    <regex>(?[inf]\+?INF)</regex>
    <regex>(?[neginf]-INF)</regex>
    <regex>(?[nan]NaN)</regex>
  </parse>
  <variable name="mantissa" type="decimal" select="$double/mantissa" />
  <variable name="exponent" type="integer" 
    select="dt:default($double/exponent, 0)" />
  <property name="xpath-value" 
    select="dt:if($double/inf,
                  1 div 0,
            dt:if($double/neginf,
                  -1 div 0,
            dt:if($double/nan,
                  number('NaN'),
              $mantissa * eg:power(10, $exponent))))" />
</datatype>  

<datatype name="decimal">
  <parse>
    <regex>(\+|-)?[0-9]+(\.[0-9]+)?</regex>
  </parse>
  <map to="double" select="." />
  <map from="double" kind="weak" select="." />
</datatype>    
  
<datatype name="integer">
  <parse>
    <regex>(\+|-)?[0-9]+</regex>
  </parse>  
  <map from="decimal" select="round(.)" />
  <map to="decimal" select="." />
</datatype>

<datatype name="nonNegativeInteger">
  <parse>
    <regex>\+?[0-9]+</regex>
  </parse>    
  <map from="integer" select="dt:if(. >= 0, ., -.)" />
  <map to="integer" select="." />
</datatype>  
  
<datatype name="positiveInteger">
  <variable name="value" type="nonNegativeInteger" select="." />
  <condition test=". != 0" />
  <map to="nonNegativeInteger" select="." />
  <map from="nonNegativeInteger" kind="weak" select="." />
</datatype>
  
<datatype name="nonPositiveInteger">
  <parse>
    <regex>-[0-9]+|\+?0+</regex>
  </parse>  
  <map from="integer" select="dt:if(. > 0, -., .)" />
  <map to="integer" select="." />
</datatype>  
  
<datatype name="negativeInteger">
  <variable name="value" type="nonPositiveInteger" select="." />
  <condition test=". != 0" />
  <map to="nonPositiveInteger" select="." />
  <map from="nonPositiveInteger" kind="weak" select="." />
</datatype>  
  
<datatype name="long">
  <variable name="value" type="integer" select="." />
  <condition test=". >= -9223372036854775808" />
  <condition test=". &lt;= 9223372036854775807" />
  <map to="integer" select="." />
  <map from="integer" kind="weak" select="." />
</datatype>  

<datatype name="int">
  <variable name="value" type="long" select="." />
  <condition test=". >= -2147483648" />
  <condition test=". &lt;= 2147483647" />
  <map to="long" select="." />
  <map from="long" kind="weak" select="." />
</datatype>  

<datatype name="short">
  <variable name="value" type="int" select="." />
  <condition test=". >= -32768" />
  <condition test=". &lt;= 32767" />
  <map to="int" select="." />
  <map from="int" kind="weak" select="." />
</datatype>  

<datatype name="byte">
  <variable name="value" type="short" select="." />
  <condition test=". >= -128" />
  <condition test=". &lt;= 127" />
  <map to="short" select="." />
  <map from="short" kind="weak" select="." />
</datatype>  

<datatype name="unsignedLong">
  <variable name="value" type="nonNegativeInteger" select="." />
  <condition test=". &lt;= 18446744073709551615" />
  <map to="nonNegativeInteger" select="." />
  <map from="nonNegativeInteger" kind="weak" select="." />
</datatype>  

<datatype name="unsignedInt">
  <variable name="value" type="unsignedLong" select="." />
  <condition test=". &lt;= 4294967295" />
  <map to="unsignedLong" select="." />
  <map from="unsignedLong" kind="weak" select="." />
</datatype>  

<datatype name="unsignedShort">
  <variable name="value" type="unsignedInt" select="." />
  <condition test=". &lt;= 65535" />
  <map to="unsignedInt" select="." />
  <map from="unsignedInt" kind="weak" select="." />
</datatype>  
  
<datatype name="unsignedByte">
  <variable name="value" type="unsignedShort" select="." />
  <condition test=". &lt;= 255" />
  <map to="unsignedShort" select="." />
  <map from="unsignedShort" kind="weak" select="." />
</datatype>  
  
<datatype name="hexByte">
  <parse>
    <regex>[0-9A-F]{2}</regex>
  </parse>
  <variable name="hexDigits" select="'0123456789ABCDEF'" />
  <map to="unsignedByte" 
    select="string-length(substring-before($hexDigits,
                                           substring(., 1, 1))) * 16 +
            string-length(substring-before($hexDigits,
                                           substring(., 2, 1)))" />
  <map from="unsignedByte" 
    select="concat(substring($hexDigits, floor(. div 16) + 1, 1),
                   substring($hexDigits, . mod 16 + 1, 1))" />
</datatype>  
  
<xsl:function name="eg:power">
  <xsl:param name="number" />
  <xsl:param name="power" />
  <xsl:sequence select="eg:_power($number, $power, 1)" />
</xsl:function>

<xsl:function name="eg:_power">
  <xsl:param name="number" />
  <xsl:param name="power" />
  <xsl:param name="result" />
  <xsl:choose>
    <xsl:when test="$power = 0">
      <xsl:sequence select="$result" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="eg:_power($number, $power - 1, $result * $number)" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>  
  
</datatypes>
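
The hexByte mappings above rely on a lookup in the digit string: the number of characters that precede a digit in '0123456789ABCDEF' is that digit's value. The following Python sketch shows the same arithmetic outside DTLL. It is an illustration only, not part of the language; the function names are hypothetical, and note that XPath's substring() counts positions from 1 whereas Python indexes from 0.

```python
HEX_DIGITS = "0123456789ABCDEF"  # same digit string as the hexDigits variable


def hex_byte_to_unsigned_byte(text: str) -> int:
    """Convert a two-digit upper-case hex string to its numeric value (0-255)."""
    # Equivalent of the [0-9A-F]{2} parse step.
    if len(text) != 2 or any(ch not in HEX_DIGITS for ch in text):
        raise ValueError(f"not a hexByte: {text!r}")
    # A digit's index in HEX_DIGITS equals the number of characters before it,
    # which is the digit's value.
    high = HEX_DIGITS.index(text[0])
    low = HEX_DIGITS.index(text[1])
    return high * 16 + low


def unsigned_byte_to_hex_byte(value: int) -> str:
    """Convert a value in 0-255 back to a two-digit upper-case hex string."""
    if not 0 <= value <= 255:
        raise ValueError(f"not an unsignedByte: {value!r}")
    # Python indexing is 0-based, so no offset is needed here; an XPath
    # substring() call would need the position incremented by one.
    return HEX_DIGITS[value // 16] + HEX_DIGITS[value % 16]
```

Under these assumptions, hex_byte_to_unsigned_byte("0A") gives 10 and unsigned_byte_to_hex_byte(255) gives "FF"; converting any value from 0 to 255 to a hexByte and back returns the original value.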

Dates and Durations

This example illustrates the date, time and duration types from XML Schema and XPath 2.0.

<datatypes version="0.4" 
           xmlns="http://www.jenitennison.com/datatypes"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
           xsl:version="2.0"
           xmlns:eg="http://www.jenitennison.com/datatypes/examples"
           ns="http://www.jenitennison.com/datatypes/examples">

<datatype name="dateTime">
  <parse name="dateTime">
    <regex ignore-whitespace="true">
      (?[date]-?[0-9]{4,}-[0-9]{2}-[0-9]{2})
      T(?[time][0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?)
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="timezone" type="timezone" select="$dateTime/timezone" />
  <property name="date" type="date" 
    select="concat($dateTime/date, $this.timezone)" />
  <property name="time" type="time" 
    select="concat($dateTime/time, $this.timezone)" />
  <property name="year" type="year" select="dt:property($this.date, 'year')" />
  <property name="month" type="month" select="dt:property($this.date, 'month')" />
  <property name="day" type="day" select="dt:property($this.date, 'day')" />
  <property name="hour" type="hour" select="dt:property($this.time, 'hour')" />
  <property name="minute" type="minute" select="dt:property($this.time, 'minute')" />
  <property name="second" type="second" select="dt:property($this.time, 'second')" />
</datatype>  
  
<datatype name="date">
  <parse name="date">
    <regex ignore-whitespace="true">
      (?[year]-?[0-9]{4,})-
      (?[month][0-9]{2})-
      (?[day][0-9]{2})
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="year" type="year" select="$date/year" />
  <property name="month" type="month" select="$date/month" />
  <property name="day" type="day" select="$date/day" />
  <property name="timezone" type="timezone" select="$date/timezone" />
  <condition test="($this.month = 1 or $this.month = 3 or $this.month = 5 or
                    $this.month = 7 or $this.month = 8 or $this.month = 10 or
                    $this.month = 12) or $this.day &lt;= 30" />
  <condition test="$this.month != 2 or
                   $this.day &lt;= 28 or
                   ($this.day = 29 and
                    ($this.year mod 400 = 0 or
                     ($this.year mod 4 = 0 and
                      not($this.year mod 100 = 0))))" />
  
  <map from="dateTime" select="dt:property(., 'date')" />
  <map to="dateTime" 
    select="concat(dt:property(., 'year'), '-',
                   dt:property(., 'month'), '-',
                   dt:property(., 'day'), 'T00:00:00',
                   dt:property(., 'timezone'))" />
</datatype>
  
<datatype name="time">
  <parse name="time">
    <regex ignore-whitespace="true">
      (?[hour][0-9]{2}):
      (?[minute][0-9]{2}):
      (?[second][0-9]{2}(\.[0-9]+)?)
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="hour" type="hour" select="$time/hour" />
  <property name="minute" type="minute" select="$time/minute" />
  <property name="second" type="second" select="$time/second" />
  <property name="timezone" type="timezone" select="$time/timezone" />
  <map from="dateTime" select="dt:property(., 'time')" />
</datatype>
  
<datatype name="gYearMonth">
  <parse name="gYearMonth">
    <regex ignore-whitespace="true">
      (?[year]-?[0-9]{4,})-
      (?[month][0-9]{2})
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="year" type="year" select="$gYearMonth/year" />
  <property name="month" type="month" select="$gYearMonth/month" />
  <property name="timezone" type="timezone" select="$gYearMonth/timezone" />
  
  <map from="date"
    select="concat(dt:property(., 'year'), '-', 
                   dt:property(., 'month'),
                   dt:property(., 'timezone'))" />
  <map to="date" 
    select="concat(dt:property(., 'year'), '-',
                   dt:property(., 'month'), '-01',
                   dt:property(., 'timezone'))" />
</datatype>  

<datatype name="gYear">
  <parse name="gYear">
    <regex ignore-whitespace="true">
      (?[year]-?[0-9]{4,})
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="year" type="year" select="$gYear/year" />
  <property name="timezone" type="timezone" select="$gYear/timezone" />
  
  <map from="gYearMonth"
    select="concat(dt:property(., 'year'), dt:property(., 'timezone'))" />
  <map to="gYearMonth" 
    select="concat(dt:property(., 'year'), '-01', dt:property(., 'timezone'))" />
</datatype>  
  
<datatype name="gMonthDay">
  <parse name="date">
    <regex ignore-whitespace="true">
      --(?[month][0-9]{2})-
      (?[day][0-9]{2})
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="month" type="month" select="$date/month" />
  <property name="day" type="day" select="$date/day" />
  <property name="timezone" type="timezone" select="$date/timezone" />
  <condition test="($this.month = 1 or $this.month = 3 or $this.month = 5 or
                    $this.month = 7 or $this.month = 8 or $this.month = 10 or
                    $this.month = 12) or $this.day &lt;= 30" />
  <condition test="$this.month != 2 or
                   $this.day &lt;= 29" />
  
  <map from="date"
    select="concat('--', dt:property(., 'month'), '-', 
                   dt:property(., 'day'),
                   dt:property(., 'timezone'))" />
</datatype>

<datatype name="gMonth">
  <parse name="date">
    <regex ignore-whitespace="true">
      --(?[month][0-9]{2})
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="month" type="month" select="$date/month" />
  <property name="timezone" type="timezone" select="$date/timezone" />

  <map from="gMonthDay"
    select="concat('--', dt:property(., 'month'),
                   dt:property(., 'timezone'))" />
  <map to="gMonthDay"
    select="concat('--', dt:property(., 'month'), '-01',
                   dt:property(., 'timezone'))" />
</datatype>

<datatype name="gDay">
  <parse name="date">
    <regex ignore-whitespace="true">
      ---(?[day][0-9]{2})
      (?[timezone](Z|((\+|-)[0-9]{2}:[0-9]{2})))?
    </regex>
  </parse>
  <property name="day" type="day" select="$date/day" />
  <property name="timezone" type="timezone" select="$date/timezone" />

  <map from="gMonthDay"
    select="concat('---', dt:property(., 'day'),
                   dt:property(., 'timezone'))" />
</datatype>

<datatype name="year">
  <parse>
    <regex>-?[0-9]{4,}</regex>
  </parse>
  <condition test=". != 0" />
</datatype>
  
<datatype name="month">
  <parse>
    <regex>[0-9]{2}</regex>
  </parse>
  <condition test=". >= 1" />
  <condition test=". &lt;= 12" />
  <variable name="month-element"
    select="document('')/*/eg:months/eg:month[position() = $this]" />
  <property name="abbreviation" select="string($month-element/@abbr)" />
  <property name="name" select="string($month-element)" />
</datatype>

<eg:months>
  <eg:month abbr="Jan">January</eg:month>
  <eg:month abbr="Feb">February</eg:month>
  <eg:month abbr="Mar">March</eg:month>
  <eg:month abbr="Apr">April</eg:month>
  <eg:month abbr="May">May</eg:month>
  <eg:month abbr="Jun">June</eg:month>
  <eg:month abbr="Jul">July</eg:month>
  <eg:month abbr="Aug">August</eg:month>
  <eg:month abbr="Sep">September</eg:month>
  <eg:month abbr="Oct">October</eg:month>
  <eg:month abbr="Nov">November</eg:month>
  <eg:month abbr="Dec">December</eg:month>
</eg:months>  
  
<datatype name="day">
  <parse>
    <regex>[0-9]{2}</regex>
  </parse>
  <condition test=". >= 1" />
  <condition test=". &lt;= 31" />
</datatype>  
  
<datatype name="hour">
  <parse>
    <regex>[0-9]{2}</regex>
  </parse>
  <condition test=". >= 0" />
  <condition test=". &lt; 24" />
</datatype>
  
<datatype name="minute">
  <parse>
    <regex>[0-9]{2}</regex>
  </parse>
  <condition test=". >= 0" />
  <condition test=". &lt; 60" />
</datatype>
  
<datatype name="second">
  <parse>
    <regex>[0-9]{2}(\.[0-9]+)?</regex>
  </parse>
  <condition test=". >= 0" />
  <condition test=". &lt;= 60" />
</datatype>
  
<datatype name="timezone">
  <parse name="timezone">
    <regex>Z</regex>
    <regex>(?[hour](\+|-)[0-9]{2}):(?[minute][0-9]{2})</regex>
  </parse>
  <condition test="$timezone = 'Z' or 
                   ($timezone/hour >= -14 and $timezone/hour &lt;= 14)" />
  <condition test="$timezone = 'Z' or
                   ($timezone/minute >= 0 and $timezone/minute &lt; 60)" />
  <map to="dayTimeDuration" 
    select="dt:if($timezone =  'Z',
                  'PT0H0S',
            dt:if($timezone/hour >= 0,
                  concat('PT', $timezone/hour, 'H',
                               $timezone/minute, 'M'),
                  concat('-PT', -$timezone/hour, 'H',
                                $timezone/minute, 'M')))" />
  <map from="dayTimeDuration" kind="weak"
    select="dt:if(dt:property(., 'hours') = 0 and
                  dt:property(., 'minutes') = 0,
                  'Z',
            dt:if(dt:property(., 'hours') >= 0,
                  concat('+', format-number(dt:property(., 'hours'), '00'),
                         ':', format-number(dt:property(., 'minutes'), '00')),
                  concat('-', format-number(-dt:property(., 'hours'), '00'),
                         ':', format-number(-dt:property(., 'minutes'), '00'))))" />
</datatype>
  
<include href="numbers.dtl" />  
  
<datatype name="duration">
  <parse name="duration">
    <regex ignore-whitespace="true">
      (?[neg]-)?
      P(?[years][0-9]+Y)?
       (?[months][0-9]+M)?
       (?[days][0-9]+D)?
       (T(?[hours][0-9]+H)?
         (?[minutes][0-9]+M)?
         (?[seconds][0-9]+(\.[0-9]+)?S)?)?
    </regex>
  </parse>
  <condition test="$duration/years or $duration/months or $duration/days or
                   $duration/hours or $duration/minutes or $duration/seconds" />
  <variable name="neg" type="integer" select="dt:if($duration/neg, -1, 1)" />
  <property name="years" type="integer" 
    select="$neg * dt:default($duration/years, 0)" />
  <property name="months" type="integer" 
    select="$neg * dt:default($duration/months, 0)" />
  <property name="days" type="integer" 
    select="$neg * dt:default($duration/days, 0)" />
  <property name="hours" type="integer" 
    select="$neg * dt:default($duration/hours, 0)" />
  <property name="minutes" type="integer" 
    select="$neg * dt:default($duration/minutes, 0)" />
  <property name="seconds" type="decimal" 
    select="$neg * dt:default($duration/seconds, 0)" />
</datatype>  
  
<datatype name="canonical-duration">
  <parse>
    <regex ignore-whitespace="true">
      (?[neg]-)?
      P(?[months][0-9]+M)?
       (T(?[seconds][0-9]+(\.[0-9]+)?S)?)?
    </regex>
  </parse>
  <variable name="duration" type="duration" select="." />
  <property name="months" type="integer"
    select="dt:property($duration, 'months')" />
  <property name="seconds" type="decimal"
    select="dt:property($duration, 'seconds')" />
  <map from="duration"
    select="dt:if(dt:property(., 'years') >= 0,
                  concat('P', dt:property(., 'years') * 12 + 
                              dt:property(., 'months'), 'MT',
                              dt:property(., 'days') * 24 * 60 * 60 +
                              dt:property(., 'hours') * 60 * 60 +
                              dt:property(., 'minutes') * 60 +
                              dt:property(., 'seconds'), 'S'),
                  concat('-P', -dt:property(., 'years') * 12 +
                               -dt:property(., 'months'), 'MT',
                               -dt:property(., 'days') * 24 * 60 * 60 +
                               -dt:property(., 'hours') * 60 * 60 +
                               -dt:property(., 'minutes') * 60 +
                               -dt:property(., 'seconds'), 'S'))" />
  <map to="duration" select="." />
</datatype>
  
<datatype name="yearMonthDuration">
  <parse>
    <regex ignore-whitespace="true">
      (?[neg]-)?
      P(?[years][0-9]+Y)?
       (?[months][0-9]+M)?
    </regex>
  </parse>
  <variable name="duration" type="duration" select="." />
  <property name="years" type="integer"
    select="dt:property($duration, 'years')" />
  <property name="months" type="integer"
    select="dt:property($duration, 'months')" />
  
  <map from="duration"
    select="dt:if(dt:property(., 'years') >= 0,
                  concat('P', dt:property(., 'years'), 'Y',
                              dt:property(., 'months'), 'M'),
                  concat('-P', -dt:property(., 'years'), 'Y',
                               -dt:property(., 'months'), 'M'))" />
  <map to="duration" select="." />
</datatype>  
  
<datatype name="canonical-yearMonthDuration">
  <parse>
    <regex ignore-whitespace="true">
      (?[neg]-)?P(?[months][0-9]+M)
    </regex>
  </parse>
  <variable name="duration" type="canonical-duration" select="." />
  <property name="months" type="integer"
    select="dt:property($duration, 'months')" />
  <map from="canonical-duration"
    select="dt:if(dt:property(., 'months') >= 0,
                  concat('P', dt:property(., 'months'), 'M'),
                  concat('-P', -dt:property(., 'months'), 'M'))" />
  <map to="canonical-duration" select="." />
</datatype>
  
<datatype name="dayTimeDuration">
  <parse>
    <regex ignore-whitespace="true">
      (?[neg]-)?
      P(?[days][0-9]+D)?
       (T(?[hours][0-9]+H)?
         (?[minutes][0-9]+M)?
         (?[seconds][0-9]+(\.[0-9]+)?S)?)?
    </regex>
  </parse>
  <variable name="duration" type="duration" select="." />
  <property name="days" type="integer"
    select="dt:property($duration, 'days')" />
  <property name="hours" type="integer"
    select="dt:property($duration, 'hours')" />
  <property name="minutes" type="integer"
    select="dt:property($duration, 'minutes')" />
  <property name="seconds" type="decimal"
    select="dt:property($duration, 'seconds')" />

  <map from="duration"
    select="dt:if(dt:property(., 'days') >= 0,
                  concat('P', dt:property(., 'days'), 'DT',
                              dt:property(., 'hours'), 'H',
                              dt:property(., 'minutes'), 'M',
                              dt:property(., 'seconds'), 'S'),
                  concat('-P', -dt:property(., 'days'), 'DT',
                               -dt:property(., 'hours'), 'H',
                               -dt:property(., 'minutes'), 'M',
                               -dt:property(., 'seconds'), 'S'))" />
  <map to="duration" select="." />
</datatype>
  
<datatype name="canonical-dayTimeDuration">
  <parse>
    <regex ignore-whitespace="true">
      (?[neg]-)?PT(?[seconds][0-9]+(\.[0-9]+)?S)
    </regex>
  </parse>
  <variable name="duration" type="canonical-duration" select="." />
  <property name="seconds" type="decimal"
    select="dt:property($duration, 'seconds')" />
  <map from="canonical-duration"
    select="dt:if(dt:property(., 'seconds') >= 0,
                  concat('PT', dt:property(., 'seconds'), 'S'),
                  concat('-PT', -dt:property(., 'seconds'), 'S'))" />
  <map to="canonical-duration" select="." />
</datatype>

</datatypes>
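
Two pieces of arithmetic in the example above may be easier to follow outside XPath: the conditions that restrict the day of a date according to its month and year, and the folding of a duration into months and seconds for its canonical form. The Python sketch below restates both. It is an illustration only, not part of DTLL; the function names are hypothetical, and it assumes that all components of a negative duration are negative, taking the sign from any negative component where the example tests only the years property.

```python
def is_valid_date(year: int, month: int, day: int) -> bool:
    """Restate the year, month and day conditions of the date datatype."""
    # Range checks from the year, month and day datatypes.
    if year == 0 or not 1 <= month <= 12 or not 1 <= day <= 31:
        return False
    # Only the seven 31-day months may have a day greater than 30.
    if month not in (1, 3, 5, 7, 8, 10, 12) and day > 30:
        return False
    # February has 28 days, or 29 in a leap year.
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return day <= 28 or (day == 29 and leap)
    return True


def canonical_duration(years, months, days, hours, minutes, seconds) -> str:
    """Fold six duration components into a months-and-seconds string."""
    total_months = years * 12 + months
    total_seconds = (days * 24 * 60 * 60 + hours * 60 * 60
                     + minutes * 60 + seconds)
    # Assumption: components share one sign, so any negative total marks
    # the whole duration as negative.
    sign = "-" if total_months < 0 or total_seconds < 0 else ""
    return f"{sign}P{abs(total_months)}MT{abs(total_seconds)}S"
```

Under these assumptions, is_valid_date(1900, 2, 29) is false because 1900 is divisible by 100 but not by 400, and canonical_duration(1, 2, 3, 4, 5, 6) gives 'P14MT273906S'.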