<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>google</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/19</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Partial implementations #2: XSLT in Google Search Appliance</title>
 <link>http://www.jenitennison.com/blog/node/64</link>
 <description>&lt;p&gt;A &lt;a href=&quot;http://www.google.com/enterprise/gsa/&quot; title=&quot;Google Search Appliance&quot;&gt;Google Search Appliance&lt;/a&gt; (GSA) is a box that you plug into your network which crawls and indexes your data, and serves up the results of searches. Search results come in an XML format, and there&amp;#8217;s a built in XSLT engine which means you can convert that XML into as many different views as you like. So you can have HTML-based search results, summaries, feeds, and so on.&lt;/p&gt;

&lt;p&gt;My task recently was to debug some XSLT that transformed the GSA XML into an Atom feed. Easy enough, right? The GSA &lt;a href=&quot;http://code.google.com/apis/searchappliance/documentation/46/xml_reference.html#results_xml&quot; title=&quot;Google Search Appliance Documentation: XML Results Reference&quot;&gt;XML format&lt;/a&gt; is pretty hideous &amp;#8212; most of the elements max out at three capital letters in length (whatever happened to human-readability) &amp;#8212; but logical enough, and the mapping is hardly complex.&lt;/p&gt;

&lt;p&gt;But all was not as it seemed. The GSA&amp;#8217;s XSLT implementation is&amp;#8230; how can I put this politely?&amp;#8230; &amp;#8220;non-standard&amp;#8221;. This post describes some of the problems and workarounds.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;To get the GSA to use your own XSLT, you have to go through its web interface. Basically there&amp;#8217;s a form with a text field in which you can type your XSLT. Or you can upload a file that you develop offline. Naturally you&amp;#8217;re going to do the latter because it means you can use your favourite editor with helpful things like syntax highlighting and validation-as-you-type, but of course that means switching between web browser windows and your IDE as you develop.&lt;/p&gt;

&lt;p&gt;So I upload the transformation, point the browser at a relevant search page, and&amp;#8230; oh&amp;#8230;&lt;/p&gt;

&lt;p&gt;When the GSA doesn&amp;#8217;t like the XSLT that you use, you get a really helpful error message. It says:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So you know that there&amp;#8217;s been an error. With the server. Internally.&lt;/p&gt;

&lt;p&gt;Back to basics, I thought. Let&amp;#8217;s find out what processor the server&amp;#8217;s using. Then we can develop on that processor and be pretty sure the resulting XSLT will work. So I load up the default XSLT (which is used to create an HTML result) and add the line&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:value-of select=&quot;system-property(&#039;xsl:vendor&#039;)&quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Save the XSLT, reload the page, and&amp;#8230;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Okaaay&amp;#8230; so this is an XSLT processor that doesn&amp;#8217;t support the &lt;code&gt;xsl:vendor&lt;/code&gt; system property. If it doesn&amp;#8217;t support that, I&amp;#8217;m going to have to tread carefully. So let&amp;#8217;s start with something really simple:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;1.0&quot;
   xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;xsl:copy-of select=&quot;.&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Save the XSLT, reload the page, and&amp;#8230;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On a whim, I tried&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;xsl:copy-of select=&quot;.&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;instead. Save the XSLT, reload the page, and&amp;#8230; Success!&lt;/p&gt;

&lt;p&gt;Can you spot the difference? Yes, that&amp;#8217;s right: it&amp;#8217;s the order of the XSLT namespace declaration and the version attribute. Namespace declaration first, you&amp;#8217;re OK, version first, you&amp;#8217;re not.&lt;/p&gt;

&lt;p&gt;Okaaay&amp;#8230; so this is an XSLT processor that doesn&amp;#8217;t support the XML Recommendation (which says that attribute order doesn&amp;#8217;t matter). But heck, why split hairs? At least it&amp;#8217;s working! Now to create some Atom instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;
   xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;feed /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Save the XSLT, reload the page, and we&amp;#8217;re back to&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At least there&amp;#8217;s some &lt;a href=&quot;http://code.google.com/apis/searchappliance/documentation/46/xml_reference.html#results_xslt&quot; title=&quot;Google Search Appliance Documentation: Custom HTML&quot;&gt;documentation&lt;/a&gt; about this one:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;XSL stylesheets that include other files may not be used with the Google search engine. An XSL stylesheet that contains the following tags generates an error result:&lt;/p&gt;
  
  &lt;ul&gt;
  &lt;li&gt;&lt;code&gt;&amp;lt;xsl:import&amp;gt;&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;&amp;lt;xsl:include&amp;gt;&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;xmlns:&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;document()&lt;/code&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Read that again. Yes, the third bullet point. That&amp;#8217;s right, it&amp;#8217;s saying that an XSLT that contains a namespace declaration will generate an error result because it &amp;#8220;includes other files&amp;#8221;.&lt;/p&gt;

&lt;p&gt;But, but, but, namespace declarations in XSLT stylesheets (or elsewhere for that matter) do not indicate file inclusion. Namespace URIs are &lt;em&gt;identifiers&lt;/em&gt;, not &lt;em&gt;locations&lt;/em&gt;. They are strings. They are not resolved. You do not need to be connected to the &amp;#8216;net to use them.&lt;/p&gt;

&lt;p&gt;And how am I supposed to serve an Atom feed, since Atom documents use a namespace? Or XHTML for that matter? Fortunately, the GSA only goes so far in banning namespace declarations: you&amp;#8217;re OK as long as you don&amp;#8217;t put them on the &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt; element (the XSLT namespace declaration itself excepted). Move the Atom declaration to the &lt;code&gt;&amp;lt;feed&amp;gt;&lt;/code&gt; element, as in&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you&amp;#8217;re OK. Of course you have to repeat the namespace declaration in every template so you don&amp;#8217;t end up creating elements in no namespace. Tedious, oh so tedious, but workable.&lt;/p&gt;
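
&lt;p&gt;To make that concrete, here&amp;#8217;s a minimal sketch of a two-template stylesheet with the declaration repeated on each literal result element. The &lt;code&gt;GSP/RES/R&lt;/code&gt; path and the &lt;code&gt;T&lt;/code&gt; and &lt;code&gt;U&lt;/code&gt; child elements are my assumptions about the GSA results format, so check them against the XML reference. It isn&amp;#8217;t a valid Atom feed yet (there&amp;#8217;s no &lt;code&gt;id&lt;/code&gt; or &lt;code&gt;updated&lt;/code&gt;); it only shows where the declarations have to go:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;!-- Atom namespace declared here, not on xsl:stylesheet --&amp;gt;
  &amp;lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
    &amp;lt;!-- assumed path to the individual search results --&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;GSP/RES/R&quot; /&amp;gt;
  &amp;lt;/feed&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;xsl:template match=&quot;R&quot;&amp;gt;
  &amp;lt;!-- declaration repeated: this template cannot see the one on feed --&amp;gt;
  &amp;lt;entry xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
    &amp;lt;!-- T and U are assumed to hold the result title and URL --&amp;gt;
    &amp;lt;title&amp;gt;&amp;lt;xsl:value-of select=&quot;T&quot; /&amp;gt;&amp;lt;/title&amp;gt;
    &amp;lt;link href=&quot;{U}&quot; /&amp;gt;
  &amp;lt;/entry&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Leave the declaration off the &lt;code&gt;&amp;lt;entry&amp;gt;&lt;/code&gt; element and the entries (and everything inside them) come out in no namespace, even though they end up as children of the &lt;code&gt;&amp;lt;feed&amp;gt;&lt;/code&gt; element in the result.&lt;/p&gt;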

&lt;p&gt;(I have a vague suspicion that the idea behind banning namespace declarations is something to do with certain XSLT processors using namespace URIs to pull in Java classes. But addressing that problem by banning namespace declarations entirely isn&amp;#8217;t just throwing the baby out with the bathwater, it&amp;#8217;s throwing the whole bathroom suite out of the window. And if you then allow namespace declarations further down the stylesheet, you haven&amp;#8217;t actually solved the problem.)&lt;/p&gt;

&lt;p&gt;Amazingly enough, given the inauspicious beginning, everything else I tried actually worked. I suspect that it&amp;#8217;s some standard XSLT processor underneath with a regex based filter that (among other things) limits what&amp;#8217;s allowed in the &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt; start tag. They probably disallow &lt;code&gt;system-property(&#039;xsl:vendor&#039;)&lt;/code&gt; for security &amp;#8212; knowledge is power, after all.&lt;/p&gt;

&lt;p&gt;Anyway, my suggestions to others who might want to create a customised XSLT processor:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use a custom URL resolver to restrict access to documents.&lt;/li&gt;
&lt;li&gt;Restrict external function calls using something like the &lt;code&gt;ALLOW_EXTERNAL_FUNCTIONS&lt;/code&gt; property in JAXP.&lt;/li&gt;
&lt;li&gt;Document the restrictions you&amp;#8217;re placing on the stylesheets.&lt;/li&gt;
&lt;li&gt;Produce meaningful error messages that explain the extra restrictions when they&amp;#8217;re broken.&lt;/li&gt;
&lt;/ol&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/64#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/19">google</category>
 <pubDate>Fri, 23 Nov 2007 22:22:19 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">64 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech 2007: Thursday 17th May Afternoon</title>
 <link>http://www.jenitennison.com/blog/node/21</link>
 <description>&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; Dare Obasanjo has written &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx&quot; title=&quot;Why GData/APP Fails as a General Purpose Editing Protocol for the Web&quot;&gt;an interesting critique&lt;/a&gt; on using the &lt;a href=&quot;http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-15.html&quot; title=&quot;Atom Publishing Protocol (v15)&quot;&gt;Atom Publishing Protocol&lt;/a&gt; as the basis for general purpose sharing of data in the way that the &lt;a href=&quot;http://code.google.com/apis/gdata/index.html&quot; title=&quot;Google Data API&quot;&gt;Google Data API&lt;/a&gt; does.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Thursday afternoon had a few really interesting talks. I learned about the Google Data API (no longer called gData); Oracle&amp;#8217;s use of XLink to represent relationships between documents, and the requirements that entails; using XSLT to create JSON to use Exhibit widgets; and using XMPP to enhance instant messaging.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/33&quot; title=&quot;Google Data API (Talk)&quot;&gt;Google Data API&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Frank Mantek&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;http://code.google.com/apis/gdata/index.html&quot; title=&quot;Google Data API&quot;&gt;Google Data API&lt;/a&gt; is the unified API that Google offers to all its services, such as Google Base, Blogger, Google Calendar, Google Spreadsheets and so on.&lt;/p&gt;

&lt;p&gt;Frank talked about how awful SOAP/WSDL is, in particular how two services developed in different platforms can&amp;#8217;t talk to each other (which one might imagine is rather the point of Web Services). (Later, when challenged by a Microsoft guy about this claim, he revealed that he&amp;#8217;d been a major developer of the SOAP/WSDL stuff at Microsoft, so knew exactly what he was talking about from bitter experience.)&lt;/p&gt;

&lt;p&gt;So the Google Data API is a RESTful API, using the &lt;a href=&quot;http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-15.html&quot; title=&quot;Atom Publishing Protocol (v15)&quot;&gt;Atom Publishing Protocol&lt;/a&gt; with a few additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extra data model&lt;/li&gt;
&lt;li&gt;querying&lt;/li&gt;
&lt;li&gt;concurrency control&lt;/li&gt;
&lt;li&gt;extra authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What this basically means is that you can query any of the Google services using HTTP, and get back an Atom document. The URI can contain queries (the precise nature of which depends on the service; &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;, for example, uses a single URI request parameter that has a complex internal query syntax), and you get back the feed with the items that you&amp;#8217;d requested. The Atom items themselves have the basic Atom elements, but then a bunch of service-specific elements that provide the extra information you need.&lt;/p&gt;

&lt;p&gt;Listening to this talk I finally got what &lt;a href=&quot;http://www.tbray.org/ongoing/&quot; title=&quot;ongoing&quot;&gt;Tim Bray&lt;/a&gt; was talking about at the &lt;a href=&quot;http://www.xmlsummerschool.com/&quot; title=&quot;XML Summer School, Oxford&quot;&gt;XML Summer School&lt;/a&gt; a couple of years ago: REST gives us verbs and Atom gives us objects and lists of objects. I didn&amp;#8217;t get it before, because, after all, aren&amp;#8217;t all XML documents objects? But I think the point is that Atom has a lot of the mechanics that you need for talking about objects built into it, and the extensibility necessary for adding your own information to it (which is what each of Google&amp;#8217;s services is doing).&lt;/p&gt;

&lt;p&gt;The really interesting part of the talk was where Frank started talking about what the problems (still) are. The problems I noted were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atom&amp;#8217;s verbose&lt;/li&gt;
&lt;li&gt;Google have to use &lt;code&gt;&amp;lt;category&amp;gt;&lt;/code&gt; to indicate the kind of thing they&amp;#8217;re representing (as opposed to using the document element which is what you&amp;#8217;d do with normal XML documents)&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;rel&lt;/code&gt; attribute is too vague&lt;/li&gt;
&lt;li&gt;they made up their own markup languages, rather than reusing existing standards&lt;/li&gt;
&lt;li&gt;they should be using &lt;a href=&quot;http://en.wikipedia.org/wiki/HTTP_ETag&quot; title=&quot;Wikipedia: HTTP ETags&quot;&gt;ETags&lt;/a&gt; for concurrency control&lt;/li&gt;
&lt;li&gt;they haven&amp;#8217;t got any versioning (eek)&lt;/li&gt;
&lt;li&gt;incremental updates are a problem; they don&amp;#8217;t want to serve the whole Atom feed (to a mobile device) when only a small amount has changed, so what they do is have several feeds, each of which reveals a different part of the information&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/81&quot; title=&quot;From Trees to Graphs: Evolving XML for building enterprise applications&quot;&gt;From Trees to Graphs: Evolving XML for building enterprise applications&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Ravi Murthy&lt;/h3&gt;

&lt;p&gt;Ravi Murthy talked about the provision for defining links between documents in &lt;a href=&quot;http://www.oracle.com/&quot; title=&quot;Oracle&quot;&gt;Oracle&lt;/a&gt;&amp;#8217;s database, and their consequent requirements. Oracle&amp;#8217;s XML database has a file system abstraction (every XML &amp;#8216;object&amp;#8217; has a file path) with access control, versioning, metadata and protocol access. Within an XML &amp;#8216;object&amp;#8217; stored in the database, they use XLink to represent the relationships with other objects. When you export the XML, the XLinks get resolved to create the XML document.&lt;/p&gt;

&lt;p&gt;Using XLink to represent relationships between documents brings a whole new set of constraints that you might want to express in a schema language, or annotations that you can use to describe the links (depending on how you look at it):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;type&lt;/strong&gt; of the linked resource (eg the document element&amp;#8217;s name, substitution group or XSD type)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;scope&lt;/strong&gt; of a particular reference, similar to the scoping of XSD&amp;#8217;s identity constraints&lt;/li&gt;
&lt;li&gt;That a particular link is &lt;strong&gt;acyclic&lt;/strong&gt; (eg, given an XPath expression, keep evaluating it and make sure you never get back to where you started)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;kind&lt;/strong&gt; of a link, one of:
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;hard&lt;/strong&gt;: the target of the link must exist, and cannot be deleted while this resource exists (but can be renamed) &amp;#8212; these are similar to links in normal databases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;symbolic&lt;/strong&gt;: trust the file path specified by the link and only resolve it on demand&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;weak&lt;/strong&gt;: like a hard link, except the target can be deleted, in which case the link becomes symbolic&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;versioning&lt;/strong&gt; of a link, whether it points to the &amp;#8220;current&amp;#8221; version of a resource or a specific version&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These extra constraints are expressed as annotations on the definitions of &lt;code&gt;xlink:href&lt;/code&gt; attributes in XSD schemas for the documents held in the database.&lt;/p&gt;

&lt;p&gt;Ravi also talked a bit about expressing decomposition rules: how an XML document should be shredded when it gets put into the database. They use XPath to specify rules that indicate that particular elements should be placed at a particular filepath.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;I was really flattered in the tea break. Chatting with a guy called &lt;a href=&quot;http://philwilson.org/blog/&quot; title=&quot;Phil&#039;s Blog&quot;&gt;Phil&lt;/a&gt; working at the University of Bath, who politely asked about my presentation, and after I&amp;#8217;d explained how it was all to do with overlapping markup and that kind of hard-core theory he said: &amp;#8220;You don&amp;#8217;t &lt;em&gt;look&lt;/em&gt; like a markup geek&amp;#8221;. Me: &amp;#8220;What, because I&amp;#8217;m a girl?&amp;#8221;. Him: &amp;#8220;No, no, that&amp;#8217;s not what I meant. You just look more Web 2.0-ey.&amp;#8221; &lt;a href=&quot;http://lapin-bleu.net/riviera/&quot; title=&quot;Max&#039;s Blog&quot;&gt;Max&lt;/a&gt; was there at the time, and labelled me &amp;#8220;the Geekess of XSLT&amp;#8221;, which I think clarified things. (Actually most of the people at XTech this year were Web 2.0-ey rather than markup geeks, but I&amp;#8217;m glad I &lt;em&gt;looked&lt;/em&gt; as though I fitted in.) &lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/155&quot; title=&quot;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&quot;&gt;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://metacognition.info/&quot; title=&quot;Chimezie Ogbuji&#039;s Website&quot;&gt;Chimezie Ogbuji&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;&amp;#8220;What&amp;#8217;s &lt;a href=&quot;http://simile.mit.edu/wiki/Exhibit&quot; title=&quot;Exhibit Wiki&quot;&gt;Exhibit&lt;/a&gt;?&amp;#8221; I hear you ask. Or maybe you&amp;#8217;re more with-it than I am, but that&amp;#8217;s what I was asking. Chimezie never really explained, but I kinda gathered that it&amp;#8217;s a funky AJAX toolset for creating views of data by importing scripts and using magical IDs and extension attributes within web pages. The other phrase that Chimezie dropped in was &lt;a href=&quot;http://www.w3.org/TR/backplane/&quot; title=&quot;Rich Web Application Backplane&quot;&gt;Rich Web Application Backplane&lt;/a&gt;, which again I hadn&amp;#8217;t heard of. Even having read the W3C Note, I still don&amp;#8217;t get it. Ho hum.&lt;/p&gt;

&lt;p&gt;Anyway, Chimezie made the point that while entering data using XForms is great, it&amp;#8217;s too heavy-weight for viewing that data. Exhibit gives a lot more flexibility (take a look at the &lt;a href=&quot;http://simile.mit.edu/exhibit/examples/presidents/presidents.html&quot; title=&quot;US Presidents in Exhibit&quot;&gt;US presidents&lt;/a&gt; example), which enables users to explore data more freely. In Exhibit pages, you provide a JSON schema for your data, a number of lenses/views/widgets that you can use to view the data, then you embed the widgets in the HTML page and point it at the data source. The JSON schema indicates the type of a particular property (eg &amp;#8220;country&amp;#8221;), and gives labels for it (including a plural label (&amp;#8220;countries&amp;#8221;) and a reverse label (&amp;#8220;country of&amp;#8221;)) that it uses in the widgets.&lt;/p&gt;

&lt;p&gt;But that requires JSON, right? Chimezie showed how easy it is (and it&amp;#8217;s &lt;em&gt;really&lt;/em&gt; easy) to transform data-oriented XML into JSON using XSLT.&lt;/p&gt;
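
&lt;p&gt;This isn&amp;#8217;t Chimezie&amp;#8217;s stylesheet, but a minimal sketch of the general idea looks something like the following. The &lt;code&gt;items&lt;/code&gt; and &lt;code&gt;item&lt;/code&gt; element names and the shape of the JSON output are assumptions of mine for illustration, it treats every property as a string, and it makes no attempt to escape quotes or backslashes in the values:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;&amp;gt;

&amp;lt;!-- JSON is plain text as far as XSLT is concerned --&amp;gt;
&amp;lt;xsl:output method=&quot;text&quot; /&amp;gt;

&amp;lt;!-- hypothetical input: an items element holding item elements --&amp;gt;
&amp;lt;xsl:template match=&quot;/items&quot;&amp;gt;
  &amp;lt;xsl:text&amp;gt;{&quot;items&quot;: [&amp;lt;/xsl:text&amp;gt;
  &amp;lt;xsl:for-each select=&quot;item&quot;&amp;gt;
    &amp;lt;!-- comma before every object except the first --&amp;gt;
    &amp;lt;xsl:if test=&quot;position() &amp;gt; 1&quot;&amp;gt;, &amp;lt;/xsl:if&amp;gt;
    &amp;lt;xsl:text&amp;gt;{&amp;lt;/xsl:text&amp;gt;
    &amp;lt;!-- each child element becomes a name/value pair --&amp;gt;
    &amp;lt;xsl:for-each select=&quot;*&quot;&amp;gt;
      &amp;lt;xsl:if test=&quot;position() &amp;gt; 1&quot;&amp;gt;, &amp;lt;/xsl:if&amp;gt;
      &amp;lt;xsl:text&amp;gt;&quot;&amp;lt;/xsl:text&amp;gt;
      &amp;lt;xsl:value-of select=&quot;local-name()&quot; /&amp;gt;
      &amp;lt;xsl:text&amp;gt;&quot;: &quot;&amp;lt;/xsl:text&amp;gt;
      &amp;lt;xsl:value-of select=&quot;.&quot; /&amp;gt;
      &amp;lt;xsl:text&amp;gt;&quot;&amp;lt;/xsl:text&amp;gt;
    &amp;lt;/xsl:for-each&amp;gt;
    &amp;lt;xsl:text&amp;gt;}&amp;lt;/xsl:text&amp;gt;
  &amp;lt;/xsl:for-each&amp;gt;
  &amp;lt;xsl:text&amp;gt;]}&amp;lt;/xsl:text&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Anything real would need to escape special characters in the values and deal with numbers, repeated properties and nested elements, which is where the work goes once the data stops being purely flat.&lt;/p&gt;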

&lt;p&gt;You know, there are all these cool ways out there for viewing information, I just wish I had some really meaty data to use them on! &lt;a href=&quot;http://simile.mit.edu/timeline/&quot; title=&quot;SIMILE Timelines&quot;&gt;Timelines&lt;/a&gt; are one thing, but I&amp;#8217;d also love to find some data to employ in &lt;a href=&quot;http://www.gapminder.org/&quot; title=&quot;Gapminder&quot;&gt;Gapminder&lt;/a&gt; or even in an interface like the one for &lt;a href=&quot;http://www.philipglass.com/glassengine/&quot; title=&quot;Philip Glass Engine&quot;&gt;the music of Philip Glass&lt;/a&gt;. Perhaps I should just mine &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;, but I&amp;#8217;d like it to be something personally or collectively useful.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/97&quot; title=&quot;Real-time user-to-user web with Mozilla and XMPP&quot;&gt;Real-time user-to-user web with Mozilla and XMPP&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://blog.hyperstruct.net/&quot; title=&quot;Massimiliano Mirra&#039;s Website&quot;&gt;Massimiliano Mirra&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This talk was strong on motivation &amp;#8212; the requirement to enhance basic instant messaging functionality &amp;#8212; and strong on demonstration, with Massimiliano chatting and playing with a pre-programmed bot, but really weak on the technical details. It was only through the post-talk questions that we learned that what we&amp;#8217;d seen was based on &lt;a href=&quot;http://www.xmpp.org/&quot; title=&quot;XMPP Standards Foundation&quot;&gt;XMPP (the Extensible Messaging and Presence Protocol)&lt;/a&gt;, which allowed DOM events to be passed between clients. Have to read the paper if you want to learn more.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/21#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/19">google</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Sun, 27 May 2007 23:03:24 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">21 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech 2007: Thursday 17th May Morning</title>
 <link>http://www.jenitennison.com/blog/node/20</link>
 <description>&lt;p&gt;On Thursday morning, I was down to chair the first session in the &amp;#8220;Core Technologies&amp;#8221; track. Two interesting papers: one on XForms and one on Google Base. Then I snuck on to the &amp;#8220;Applications&amp;#8221; track to hear about scientific Wikis and the trials of managing schema repositories.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/114&quot; title=&quot;XForms, REST, XQuery... and skimming&quot;&gt;XForms, REST, XQuery&amp;#8230; and skimming&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://internet-apps.blogspot.com/&quot; title=&quot;Mark Birbeck&#039;s Blog&quot;&gt;Mark Birbeck&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;Mark Birbeck, one of the developers of &lt;a href=&quot;http://www.formsplayer.com/&quot; title=&quot;formsPlayer Website&quot;&gt;formsPlayer&lt;/a&gt; (and an invited expert on the XForms and XHTML WGs), discussed the rationale behind using &lt;a href=&quot;http://www.w3.org/MarkUp/Forms/&quot; title=&quot;XForms W3C Page&quot;&gt;XForms&lt;/a&gt;. The only thing that really stood out for me was the fact that he used an XML document to provide the &lt;em&gt;labels&lt;/em&gt; for the form controls (in just the same way as you can use XML documents to provide the &lt;em&gt;data&lt;/em&gt; in the form controls). That was quite neat, and made me think of the different requirements of data entry and data presentation: a topic that returned in &lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/155&quot; title=&quot;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&quot;&gt;Chimezie Ogbuji&amp;#8217;s talk&lt;/a&gt; later that afternoon.&lt;/p&gt;

&lt;p&gt;Another theme here, for me, was the use of declarative programming: you write a form, which is just some XML and leave all the technical stuff about submitting a PUT HTTP request to the XForms player. Mark talked about using &lt;a href=&quot;http://en.wikipedia.org/wiki/WebDAV&quot; title=&quot;Wikipedia: WebDAV&quot;&gt;WebDAV&lt;/a&gt; and &lt;a href=&quot;http://exist.sourceforge.net/&quot; title=&quot;eXist&quot;&gt;eXist&lt;/a&gt; on the server to store the XML documents, and demonstrated using &lt;a href=&quot;http://www.oxygenxml.com/&quot; title=&quot;oXygen XML editor&quot;&gt;&amp;lt;oXygen/&amp;gt;&lt;/a&gt; to load and save documents. Hmm&amp;#8230; I wonder if I should experiment with XForms and that Unicode database browser I was thinking about&amp;#8230;&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/104&quot; title=&quot;Google Base, a mashups database for the REST of us&quot;&gt;Google Base, a mashups database for the REST of us&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Jeffrey Scudder&lt;/h3&gt;

&lt;p&gt;A very popular, thought-provoking, and slightly disturbing, talk on &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;. So Google are asking us to upload data on &lt;em&gt;anything&lt;/em&gt; (jobs, personals, cars, etc.) into their huge databases. And then they&amp;#8217;ll serve us back that information (and other people&amp;#8217;s information) in formats such as &lt;a href=&quot;http://en.wikipedia.org/wiki/Atom_(standard)&quot; title=&quot;Wikipedia: Atom&quot;&gt;Atom&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/RSS_(file_format)&quot; title=&quot;Wikipedia: RSS&quot;&gt;RSS&lt;/a&gt; and &lt;a href=&quot;http://www.json.org/&quot; title=&quot;JSON&quot;&gt;JSON&lt;/a&gt;, as well as standard web pages.&lt;/p&gt;

&lt;p&gt;The thought-provoking bit, for me, was the fact that they don&amp;#8217;t have any particular schema for each of these kinds of items. Now, I come from a knowledge engineering background where we&amp;#8217;re very into ontologies and creating conceptual models and all that stuff. But Google don&amp;#8217;t bother: you create categories and structure your data the way you want to, and they&amp;#8217;ll serve it back in that way. But they look at &lt;em&gt;all&lt;/em&gt; the data they have their hands on in order to decide how to display and serve information. So, for example, if I define cars with the property &amp;#8216;shade&amp;#8217; but a hundred other people define them with the property &amp;#8216;colour&amp;#8217; then on a feed that includes all our items, we&amp;#8217;ll see the &amp;#8216;colour&amp;#8217; property.&lt;/p&gt;

&lt;p&gt;This is a kind of bottom-up ontology design: the properties of an item are the properties that other people think are important about an item. One thing that surprised me was that it looks like it&amp;#8217;s not very intelligent yet: simple differences in case (like &amp;#8216;color&amp;#8217; vs. &amp;#8216;Color&amp;#8217;) don&amp;#8217;t seem to be detected, so I guess nothing else is. Time to dig out my old research on automated comparison of ontologies&amp;#8230;&lt;/p&gt;

&lt;p&gt;The slightly disturbing part? Well, Google are trying to get us to upload our data to their servers. And they&amp;#8217;re not putting any limit on how much we upload. One member of the audience asked &amp;#8220;What&amp;#8217;s in it for you?&amp;#8221;; Jeffrey seemed to have a hard time understanding the question and said something like &amp;#8220;Better indexed information means we can give you better information&amp;#8221;, but that doesn&amp;#8217;t really answer the question. Presumably it&amp;#8217;s all about being able to advertise to us better: the more data we upload, the more They know about us, the better targeted Their adverts can be.&lt;/p&gt;

&lt;p&gt;What I found strange was the idea of &lt;em&gt;uploading&lt;/em&gt; data to a &lt;em&gt;central&lt;/em&gt; &lt;em&gt;server&lt;/em&gt;. Surely the whole point of the web is that I put my data on my machine. I don&amp;#8217;t have a problem putting the data together in a nice Atom feed so that Google can index it easily and pointing them at it, but I want to own it, y&amp;#8217;know?&lt;/p&gt;

&lt;p&gt;By the way, one thing that was apparent to me during this talk was how important it is that web pages look good with large font sizes, not just for people with poor eyesight, but also for when you&amp;#8217;re &lt;em&gt;demoing&lt;/em&gt; your cool web applications! The Google Base drop-down menus were impossible to see with increased font size because their height is fixed in pixels.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/134&quot; title=&quot;An Augmented Wiki for Interactive Scientific Visualization and Evolutionary Collaboration&quot;&gt;An Augmented Wiki for Interactive Scientific Visualization and Evolutionary Collaboration&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://csis.pace.edu/~marchese&quot; title=&quot;Frank Marchese&#039;s Website&quot;&gt;Frank Marchese&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;On to the less well-attended &amp;#8220;Applications&amp;#8221; track. This talk was about supporting scientists (specifically biochemists) in providing side-by-side visualisation (of complex molecules) and textual analysis. Frank talked about a Wiki in which &lt;a href=&quot;http://jmol.sourceforge.net/&quot; title=&quot;Jmol molecule viewer&quot;&gt;Jmol&lt;/a&gt; Java applets for visualising molecules are arranged side-by-side with standard journal articles. The articles themselves have links in them that animate the Jmol visualisation: highlighting particular groups of atoms, moving it to show a particular view, and so on.&lt;/p&gt;

&lt;p&gt;It was kind of neat, as pretty pictures of molecules often are, but I didn&amp;#8217;t think the Wikiness of the whole enterprise was really explored: I got the impression that the textual articles were basically static: you could add comments, but not collaboratively create an article about the molecule. Also, the link between the text and the animation of the molecule was through Javascript, as far as I could tell: I&amp;#8217;d expect a declarative method of defining animations would make it a lot more accessible.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/176&quot; title=&quot;Real-world metadata registries; sharing concepts, schemas and semantics&quot;&gt;Real-world metadata registries; sharing concepts, schemas and semantics&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://www.ukoln.ac.uk/&quot; title=&quot;UKOLN Website&quot;&gt;Emma Tonkin&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This talk took me back to the trials of creation of top-down conceptual models, focusing on the definition of metadata schemas. Unfortunately, there was a lot of philosophy and not many practical guidelines in the talk, and I didn&amp;#8217;t get a lot out of it. One thing that Emma touched on, though, was the way that the meaning of a term can change over time, through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extension or generalisation&lt;/li&gt;
&lt;li&gt;narrowing or specialisation&lt;/li&gt;
&lt;li&gt;amelioration (when a term gains approval)&lt;/li&gt;
&lt;li&gt;deterioration or pejoration (when a term gains disapproval)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latter two are particularly demonstrated by political correctness, whereby terms like &amp;#8220;Eskimo&amp;#8221; fall out of favour and &amp;#8220;Inuit&amp;#8221; becomes more acceptable (all highly culture-specific; see the &lt;a href=&quot;http://en.wikipedia.org/wiki/Eskimo&quot; title=&quot;Wikipedia: Eskimo&quot;&gt;Wikipedia Eskimo page&lt;/a&gt; for more discussion on what term to use).&lt;/p&gt;

&lt;p&gt;The advantage of a principled conceptual model is that the concept itself and the term(s) you use for that concept are loosely coupled, so if a given term falls out of favour or becomes inappropriate, you can always decouple it. On the other hand, bottom-up tagging tends (I think) to have a 1:1 relationship between term and concept, so if the use of terminology changes you might be left with inaccurate tagging of legacy data. Maybe.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/20#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/19">google</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/20">ontologies</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/21">wikis</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/17">xforms</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Fri, 25 May 2007 22:34:18 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">20 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>
