<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>rest</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/22</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Opaque URIs != Unreadable URIs</title>
 <link>http://www.jenitennison.com/blog/node/114</link>
 <description>&lt;p&gt;I&amp;#8217;ve been &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;talking about URIs&lt;/a&gt; a lot recently. One of the things that has bothered me about some of the conversations is the conflation of the concepts of &amp;#8220;opaque URIs&amp;#8221; and &amp;#8220;non-human-readable URIs&amp;#8221;. This is my argument for keeping the concepts separate.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://www.w3.org/DesignIssues/Axioms.html#opaque&quot;&gt;opacity of URIs&lt;/a&gt; is an important axiom in web architecture. It states that web applications must not try to pick apart URIs in order to work out information from them. Applications must not, for example, use the fact that a URI has &lt;code&gt;.html&lt;/code&gt; at the end to infer that it resolves to an HTML document. It&amp;#8217;s closely related to &lt;a href=&quot;http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven&quot;&gt;hypertext as engine of application state&lt;/a&gt;, in that opaque URIs should not be generated by web applications either: they must be discovered through links and the submission of forms.&lt;/p&gt;

&lt;p&gt;But this has nothing to do with readability or hackability, both of which are &lt;a href=&quot;http://www.useit.com/alertbox/990321.html&quot;&gt;extremely important for human users&lt;/a&gt;. Readable URIs help human users understand something about the resource that the URI is pointing to. Hackable URIs (by which I mean ones that people might manipulate by altering or removing portions of the path or query) enable human users to locate other resources that they might be interested in.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Before I go further, a couple of caveats:&lt;/p&gt;

&lt;p&gt;I am not saying that every URI must contain a natural language identifier. An example is the URI for a school, which could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the name of the school&lt;/li&gt;
&lt;li&gt;the unique reference number for the school&lt;/li&gt;
&lt;li&gt;the record number for the school in the database that is being published on the web&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the name of the school, as &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;I&amp;#8217;ve discussed&lt;/a&gt;, is probably a bad idea because of its lack of longevity. Using the record number for the school within the particular database that&amp;#8217;s being published is entirely non-human-readable because there is simply no way of finding out what that would be for a given school. The unique reference number for the school, on the other hand, may be an obscure series of digits, but it is a meaningful one which renders the URI readable and hackable.&lt;/p&gt;

&lt;p&gt;There are also times when uniquely identifying a resource using natural identifiers within the URI leads to incredibly long and complex URIs, in which case the &amp;#8216;human readable&amp;#8217; version isn&amp;#8217;t actually human readable. Introducing non-human-readable components is then the only option.&lt;/p&gt;

&lt;p&gt;Back to my argument:&lt;/p&gt;

&lt;p&gt;Why should URIs support humans doing things that applications must not? Because humans are intelligent. When humans hack a URI, they are aware that they are making a guess, taking a chance and might or might not end up at something useful. If they get a 404, or even more importantly if they get to information about something that they weren&amp;#8217;t expecting, they are intelligent enough to recognise that the chance they took didn&amp;#8217;t pay off. Applications aren&amp;#8217;t intelligent. They can&amp;#8217;t tell the difference between a right guess and a wrong guess, so it&amp;#8217;s best not to let them guess at all.&lt;/p&gt;

&lt;p&gt;Let me give an example. Let&amp;#8217;s say that I&amp;#8217;m creating a URI for a particular house. Here are two possible URIs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/house/NG9_3HZ/4
http://id.example.org/house/0aef0218
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first is readable and hackable. A human could change the house number or the postcode. They could remove the house number and expect a list of houses within the postcode. The second is not readable or hackable: there is no way to know what you would get if you changed the identifier within the URI.&lt;/p&gt;

&lt;p&gt;Now it is true that an application accessing a site that used URIs like the first could create those URIs programmatically, whereas it couldn&amp;#8217;t (perhaps) create a URI like the second. But if it did create the URIs programmatically, that would be the fault of the application, not the fault of the URI.&lt;/p&gt;

&lt;p&gt;As publishers, it is our responsibility to provide humans URIs that are meaningful and hackable, and to provide applications with the means of creating or identifying these URIs through forms and links. But it is not our responsibility to prevent applications from doing things that they should not do by deliberately obfuscating our URIs.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/114#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/48">uri</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Sat, 25 Jul 2009 20:41:34 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">114 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>My Perfect XML-Based Publishing Platform</title>
 <link>http://www.jenitennison.com/blog/node/105</link>
 <description>&lt;p&gt;For the last several months, I&amp;#8217;ve been working on a project at &lt;a href=&quot;http://www.tso.co.uk/&quot;&gt;TSO&lt;/a&gt; for publishing &lt;a href=&quot;http://www.opsi.gov.uk/legislation&quot;&gt;UK legislation&lt;/a&gt; using a native XML database (eg &lt;a href=&quot;http://www.exist-db.org/&quot;&gt;eXist&lt;/a&gt; or &lt;a href=&quot;http://www.marklogic.com/&quot;&gt;MarkLogic Server&lt;/a&gt;) with some middleware (eg &lt;a href=&quot;http://www.orbeon.com/&quot;&gt;Orbeon&lt;/a&gt; or &lt;a href=&quot;http://cocoon.apache.org/&quot;&gt;Cocoon&lt;/a&gt;). It&amp;#8217;s a powerful and flexible approach that&amp;#8217;s built on declarative languages like XQuery, XSLT, and XML pipelines; you can see it in action with the &lt;a href=&quot;http://sandbox.opsi.gov.uk/&quot;&gt;Command and House Papers demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But the killer platform isn&amp;#8217;t quite here yet, partly because the specs aren&amp;#8217;t quite done. Both Orbeon and Cocoon use XML pipelines, but they use different languages to define them; &lt;a href=&quot;http://www.w3.org/TR/xproc/&quot;&gt;XProc&lt;/a&gt; is just around the corner. XML databases are all over the place in their conformance to XQuery, its optional features and the not-quite-finalised specs for free-text searching and updating.&lt;/p&gt;

&lt;p&gt;People talk about how productive you can be using &lt;a href=&quot;http://rubyonrails.org/&quot;&gt;Ruby on Rails&lt;/a&gt; or &lt;a href=&quot;http://www.djangoproject.com/&quot;&gt;Django&lt;/a&gt;, and they work great for publishing data you can store in a relational database. What &lt;em&gt;we&lt;/em&gt; need is a similarly easy-to-use platform for document-oriented, XML-based content. This is my wish-list.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;The killer platform would have a configuration mechanism for mapping HTTP requests that it receives onto XProc pipelines. The pipeline that would be used could be based on one or more of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the HTTP method&lt;/li&gt;
&lt;li&gt;the requested URI&lt;/li&gt;
&lt;li&gt;any HTTP header&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pipelines would have a signature like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a primary &amp;#8216;source&amp;#8217; input that encodes the HTTP method, headers and body of the request; this would use the &lt;code&gt;&amp;lt;c:request&amp;gt;&lt;/code&gt; element used within the &lt;a href=&quot;http://www.w3.org/XML/XProc/docs/langspec.html#c.http-request&quot;&gt;&lt;code&gt;p:http-request&lt;/code&gt; XProc step&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;a parameter input populated by parsing the URI against a simplified version of &lt;a href=&quot;http://tools.ietf.org/html/draft-gregorio-uritemplate-03&quot;&gt;URI templates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;a primary &amp;#8216;result&amp;#8217; output that is an XML version of the response body&lt;/li&gt;
&lt;li&gt;a &amp;#8216;response&amp;#8217; output that encodes the HTTP status code and headers of the response; this would use the &lt;code&gt;&amp;lt;c:response&amp;gt;&lt;/code&gt; element used within the &lt;code&gt;p:http-request&lt;/code&gt; XProc step&lt;/li&gt;
&lt;li&gt;a &amp;#8216;serialize&amp;#8217; output that holds a &lt;code&gt;&amp;lt;c:parameters&amp;gt;&lt;/code&gt; element containing parameters for serializing the result body; possible serialisations would include serialising XSL-FO as PDF and SVG as JPEG, for example.&lt;/li&gt;
&lt;/ul&gt;
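
&lt;p&gt;The parameter input described above could be populated along these lines. This is only a rough sketch, written in Python rather than as part of an XProc engine, and it assumes the simplest possible template syntax, in which each &lt;code&gt;{name}&lt;/code&gt; matches exactly one path segment. The function names and the example template are invented for illustration and are not taken from the URI templates draft.&lt;/p&gt;

```python
import re

def template_to_regex(template):
    """Compile a simplified template such as /legislation/{year}/{number}."""
    # Collect the variable names in the order they appear in the template.
    names = re.findall(r"\{(\w+)\}", template)
    pattern = ""
    # Splitting on a capturing group keeps the {name} pieces in the result.
    for piece in re.split(r"(\{\w+\})", template):
        if piece.startswith("{"):
            pattern += "([^/]+)"  # a variable matches one path segment
        else:
            pattern += re.escape(piece)  # literal text must match exactly
    return re.compile("^" + pattern + "$"), names

def parse_uri(template, path):
    """Return a dict of parameters, or None if the path does not fit."""
    regex, names = template_to_regex(template)
    match = regex.match(path)
    if match is None:
        return None
    return dict(zip(names, match.groups()))

# Hypothetical template and request path, for illustration only.
print(parse_uri("/legislation/{year}/{number}", "/legislation/2007/42"))
```

&lt;p&gt;A path that does not fit the template yields no parameters at all, which a platform could treat as a signal to try the next configured mapping.&lt;/p&gt;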

&lt;p&gt;The pipeline engine would of course include efficient implementations of all the required steps, most importantly XSLT 2.0.&lt;/p&gt;

&lt;p&gt;The platform would have an easy mechanism for invoking queries on its XML store through an implementation-defined step that was similar to the &lt;a href=&quot;http://www.w3.org/XML/XProc/docs/langspec.html#c.xquery&quot;&gt;&lt;code&gt;p:xquery&lt;/code&gt; XProc step&lt;/a&gt;. The step might have the signature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a primary &amp;#8216;query&amp;#8217; input for the query itself (like the &amp;#8216;query&amp;#8217; input for &lt;code&gt;p:xquery&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;a parameter input for specifying the values of external variables within the query&lt;/li&gt;
&lt;li&gt;a &amp;#8216;database&amp;#8217; option for specifying the database to query&lt;/li&gt;
&lt;li&gt;a primary &amp;#8216;result&amp;#8217; output for the result of the query, this being a sequence of documents resulting from the query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The XML store itself would support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/TR/xquery/&quot;&gt;XQuery 1.0&lt;/a&gt;, with no extensions to the syntax except those permitted by that specification&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/TR/xpath-full-text-10/&quot;&gt;XQuery and XPath Full Text 1.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/TR/xquery-update-10/&quot;&gt;XQuery Update Facility 1.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It would also support setting up indexes on any expression for a particular kind of context node (usually an element); these would work like keys in XSLT, except that the XQuery engine would automatically detect when the index could be applied. For example, it would be possible to set up a key on a document for the expression &lt;code&gt;substring(//dc:identifier, 7, 2)&lt;/code&gt; and if the query used exactly this expression, the index would be used.&lt;/p&gt;

&lt;p&gt;The platform would provide an extensible architecture such that it would be possible to set up replicated XML store(s) on separate servers from the main pipeline engine. It would cache the results of queries against the XML store. It would serve up static content such as images and scripts bypassing the pipeline. It would be configured using files, so that it was easy to transfer a configuration between development and production platforms and to version control configurations through normal means.&lt;/p&gt;

&lt;p&gt;Have you used (or developed!) anything that comes close? What&amp;#8217;s on your wish-list?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/105#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/29">xquery</category>
 <pubDate>Fri, 29 May 2009 21:40:20 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">105 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Your Website is Your API: Quick Wins for Government Data</title>
 <link>http://www.jenitennison.com/blog/node/100</link>
 <description>&lt;p&gt;&lt;em&gt;This is the talk I prepared for the UKGovWeb Barcamp, in blog form. It&amp;#8217;s probably better this way. Most of what&amp;#8217;s written here seems blindingly obvious to me, and probably to most readers of this blog, but maybe Google will direct someone here who finds it useful.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Working with public-sector information on the web, one of the things that I take an interest in is making government data freely available for anyone to re-present, mash-up, analyse and generally do whatever they want to do. This post is born out of a feeling that the people who control data don&amp;#8217;t realise that the smallest changes can be beneficial: they don&amp;#8217;t need to do &lt;em&gt;everything&lt;/em&gt; right now, just &lt;em&gt;something&lt;/em&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;There are three fundamental things that you need to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;identify&lt;/strong&gt; the data that you control&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;represent&lt;/strong&gt; that data in a way that people can use&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;expose&lt;/strong&gt; the data to the wider world&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;but you can choose the degree to which you do each of these things.&lt;/p&gt;

&lt;h2&gt;Identify&lt;/h2&gt;

&lt;p&gt;Take a look at what data you have some kind of responsibility for or control over. You might have a PDF containing a table of schools in the local area and their intakes over the last couple of years. You might have a spreadsheet of the amount of money assigned to maintaining the playgrounds within the borough. You might have a database of company information. You might have a set of HTML agendas for court cases.&lt;/p&gt;

&lt;p&gt;The first step is simply to identify what the information is &lt;em&gt;about&lt;/em&gt;. Schools, playgrounds, companies, court cases &amp;#8212; each row in your table or spreadsheet or database, or each section in your document will be about something. We call this a &lt;strong&gt;resource&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To play nicely with the web, every resource should have an &lt;strong&gt;identifier&lt;/strong&gt;. A Uniform Resource Identifier. A URI. That URI tells us where we can find information about the resource (we&amp;#8217;ll get to what those look like later). So your second step is to work out URIs for each of your resources.&lt;/p&gt;

&lt;p&gt;Now, there are actually three levels of URIs that you can care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identifier URIs&lt;/li&gt;
&lt;li&gt;document URIs&lt;/li&gt;
&lt;li&gt;representation URIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You probably already have document and representation URIs on your web server. Representation URIs are URIs for particular formats and languages and views of the information that you make available. Document URIs are typically the same URI without an extension; web servers use &lt;strong&gt;content negotiation&lt;/strong&gt; to work out which representation to serve up when a web browser asks for the page at a particular document URI.&lt;/p&gt;

&lt;p&gt;So you already have a URI for the PDF that contains the table of schools, for the Excel spreadsheet about the playgrounds. You already have URIs for the results of a particular query on your database, and of course the HTML pages that you deliver have URIs already. That&amp;#8217;s all in place. You don&amp;#8217;t want to change it.&lt;/p&gt;

&lt;p&gt;But identifier URIs are what are really important when it comes to opening up your data. They shift the focus from the documents that you serve to the resources that they are about. &lt;strong&gt;By assigning URIs to resources, you enable other people to talk about them. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, if &lt;a href=&quot;http://www.companieshouse.co.uk/&quot; title=&quot;Companies House&quot;&gt;Companies House&lt;/a&gt; stated that companies could be referred to using URIs of the form &lt;code&gt;http://www.companieshouse.gov.uk/id/company/{registeredNumber}&lt;/code&gt; then other people who needed to talk about companies (websites containing customer feedback, monitoring companies going into receivership, displaying stock price information, whatever) could use these URIs whenever they referred to a company. If all websites that make data available about companies point to the same identifier for a company, then it&amp;#8217;s possible to pull that data together very easily.&lt;/p&gt;

&lt;p&gt;Now the URIs that you use should be short, clean, readable, hackable, hierarchical and so on. If you can, &lt;strong&gt;you should use a natural identifier for the resource within the URI for that resource&lt;/strong&gt;. So URIs for registered companies should use their registered number. URIs for schools should use the school&amp;#8217;s unique reference number (URN). URIs for playgrounds could use the name of the playground (scoped within the council responsible for the playground). URIs for court cases should include the court, the year, and the case number. And so on.&lt;/p&gt;

&lt;p&gt;Remember as you&amp;#8217;re creating these identifier URIs that they are nothing to do with the structure of your website or the user&amp;#8217;s experience of navigating through your website. For navigation, you might want to group schools into primary, secondary and sixth-form, but you shouldn&amp;#8217;t do that in the identifier URIs. To help decide, imagine someone wanting to construct a URI and the information that they need to do so. If any of the information they need can be derived from other information (as a school&amp;#8217;s type can be derived from its URN), leave it out.&lt;/p&gt;

&lt;p&gt;When you&amp;#8217;re doing this, you might realise that actually you shouldn&amp;#8217;t be the one in control of these URIs. If you&amp;#8217;re not the one assigning the registered number, URN or case number then there&amp;#8217;s probably a higher authority that does assign those (real-world) identifiers. Don&amp;#8217;t let that stop you creating URIs (you&amp;#8217;ll still find them useful for identifying &lt;em&gt;your&lt;/em&gt; information about that particular resource), but do check whether there are existing URIs that you could point to and, if there are, reuse whatever scheme they use.&lt;/p&gt;

&lt;h2&gt;Represent&lt;/h2&gt;

&lt;p&gt;So I said in the last section that assigning URIs to resources was useful. And it is. But it&amp;#8217;s even more useful if you provide some kind of response when someone &lt;strong&gt;requests&lt;/strong&gt; those URIs. A request for a URI can be done by a web browser or one of those search-engine-spider-things that crawls the web looking for data. Requests are done on the web using HTTP (hypertext transfer protocol), specifically using a &lt;strong&gt;GET&lt;/strong&gt; request, which means &amp;#8220;get this resource&amp;#8221;.&lt;/p&gt;

&lt;p&gt;When a web server receives a request, it sends back a &lt;strong&gt;response&lt;/strong&gt;. The first part of the response is a &lt;strong&gt;status code&lt;/strong&gt; that tells the browser, spider, or whatever issued the request, in general terms what kind of response it is. Now when a browser says &amp;#8220;get this company&amp;#8221; or &amp;#8220;get this school&amp;#8221; a web server should respond with either a &lt;code&gt;404 Not Found&lt;/code&gt; response or a &lt;code&gt;303 See Other&lt;/code&gt; response.&lt;/p&gt;

&lt;p&gt;If the company or school doesn&amp;#8217;t exist, a web server should respond with a &lt;code&gt;404 Not Found&lt;/code&gt; response. It&amp;#8217;s actually really useful to give appropriate &lt;code&gt;404 Not Found&lt;/code&gt; responses, because it tells whoever made the request that the resource (company/school/playground/court case) doesn&amp;#8217;t exist. This can act as simple validation: if I&amp;#8217;m building a site that parents can use to rate schools, and a parent enters a URN into a form, I can construct a URI based on that URN, try to GET the information about that school, and if I get a &lt;code&gt;404 Not Found&lt;/code&gt; response then I know that the parent has entered an invalid URN.&lt;/p&gt;
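
&lt;p&gt;The validation trick described above might look something like the following sketch. It assumes a made-up URI scheme for schools on an example host, since no real one is defined here, and it treats any status other than &lt;code&gt;404 Not Found&lt;/code&gt; as &amp;#8220;exists&amp;#8221;, which is cruder than a real site would want. The function and parameter names are invented for illustration.&lt;/p&gt;

```python
import urllib.error
import urllib.request

# Hypothetical identifier URI scheme for schools; not a real service.
SCHOOL_URI = "http://education.example.org/id/school/{urn}"

def http_status(uri):
    """GET the URI and return the HTTP status code of the final response."""
    try:
        # urlopen follows redirects, so a 303 See Other ends up as the
        # status of the document it points to.
        with urllib.request.urlopen(uri) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code

def is_valid_urn(urn, fetch_status=http_status):
    """Return False only when the school's identifier URI gives a 404."""
    uri = SCHOOL_URI.format(urn=urn)
    return fetch_status(uri) != 404

# fetch_status can be swapped out, for example when testing offline:
print(is_valid_urn("123456", fetch_status=lambda uri: 404))
```

&lt;p&gt;Passing the fetching function in as a parameter is just a convenience for trying the logic out without a network connection.&lt;/p&gt;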

&lt;p&gt;If the company or school exists, a web server should respond with a &lt;code&gt;303 See Other&lt;/code&gt; response that points the browser to a &lt;em&gt;document URI&lt;/em&gt; that contains information about the company or school. After all, the web server can&amp;#8217;t very well deliver the company or school itself into your lap; all it can do is give you &lt;em&gt;information&lt;/em&gt; about it. &lt;code&gt;303 See Other&lt;/code&gt; means &amp;#8220;if you want information about that, see that other thing over there instead&amp;#8221;. The &amp;#8220;other thing over there&amp;#8221; will be a document of some kind. It might be the PDF that contains information about the school, or the spreadsheet that contains information about the playground.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simply giving a yes-this-exists or no-this-doesn&amp;#8217;t-exist response is useful. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s even more useful, though, if you can make the information that you have about the school, playground, company, court case or whatever, available in a format that can be processed by a computer reasonably easily. PDFs are really really hard to extract information from, so do everything you can not to use PDFs. Word documents and Excel spreadsheets are the next worst; if you have to use them, keep them really really simple and definitely don&amp;#8217;t use Word Art or embed images to display your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You should always make your data available in HTML.&lt;/strong&gt; Try to make it as clean and regular as you can; use &lt;a href=&quot;http://www.microformats.org/&quot; title=&quot;microformats&quot;&gt;microformats&lt;/a&gt; to indicate information about people, places and events. If you want to push the boat out, use &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot; title=&quot;W3C: RDFa Primer&quot;&gt;RDFa&lt;/a&gt; to mark up the data in your page even more explicitly.&lt;/p&gt;

&lt;p&gt;The great thing about HTML is that it&amp;#8217;s human readable as well as (if you do it well) machine readable. You can also make your data available in explicitly machine-readable forms if you want: XML, JSON, RDF/XML, whatever floats your boat. If there are already standard formats or ontologies for the kind of data that you&amp;#8217;re making available, then use them, certainly, but it&amp;#8217;s very likely that there aren&amp;#8217;t. And in comparison to the nightmare of extracting anything useful from a PDF, it&amp;#8217;s easy to transform between different formats, so you only have to concern yourself with different formats if you want to.&lt;/p&gt;

&lt;p&gt;If you do provide multiple formats for your data, you should use server-driven content negotiation to deliver the data in an appropriate format to whatever&amp;#8217;s requesting it. So a web browser will request HTML; a semantic web crawler will request RDF/XML; a JavaScript program will request JSON and so on. The &lt;code&gt;200 OK&lt;/code&gt; response that the web server sends with your data should include a &lt;code&gt;Content-Location&lt;/code&gt; header that gives the representation URI of whichever format is being returned, and a &lt;code&gt;Vary&lt;/code&gt; header that tells caches how it&amp;#8217;s decided which representation to serve up.&lt;/p&gt;
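
&lt;p&gt;As a rough illustration of that negotiation, here is a deliberately simplified sketch. It assumes three made-up representation URIs for a single document, takes the first media type in the &lt;code&gt;Accept&lt;/code&gt; header that it recognises, and ignores quality values and wildcards, which a real server would have to honour. All of the names and paths are hypothetical.&lt;/p&gt;

```python
# Hypothetical representation URIs for one document, keyed by media type.
REPRESENTATIONS = {
    "text/html": "/doc/school/123456.html",
    "application/rdf+xml": "/doc/school/123456.rdf",
    "application/json": "/doc/school/123456.json",
}

def negotiate(accept_header, default="text/html"):
    """Choose a representation and return the headers for the 200 OK."""
    for item in accept_header.split(","):
        # Drop any parameters such as ;q=0.9 (q-values are ignored here).
        media_type = item.split(";")[0].strip()
        if media_type in REPRESENTATIONS:
            chosen = media_type
            break
    else:
        # Nothing recognised (including */*): fall back to the default.
        chosen = default
    return {
        "Content-Type": chosen,
        "Content-Location": REPRESENTATIONS[chosen],
        "Vary": "Accept",
    }

print(negotiate("application/json"))
```

&lt;p&gt;The &lt;code&gt;Vary: Accept&lt;/code&gt; header is what tells a cache that the same document URI can produce different bodies depending on what was asked for.&lt;/p&gt;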

&lt;h2&gt;Expose&lt;/h2&gt;

&lt;p&gt;All the good work identifying resources and representing them comes to naught if you don&amp;#8217;t expose it. You can (and should!) tell other people about the URIs that you&amp;#8217;ve developed, but the best way to give them exposure is to use them yourself, within your website. &lt;strong&gt;Simply using the URIs within your website gives them exposure. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt; People who are interested in linking to you will look at your site and they will learn about your URI scheme from your use of it.&lt;/p&gt;

&lt;p&gt;The identifier URIs that you&amp;#8217;ve created might not be particularly easy to generate. For example, with the URI scheme that I suggested above for Companies House, unless you happen to know that Tesco Plc&amp;#8217;s registered company number is &lt;code&gt;00445790&lt;/code&gt;, you&amp;#8217;re not going to be able to get to information about them. So &lt;strong&gt;you should have a way of searching&lt;/strong&gt; based on something that people &lt;em&gt;will&lt;/em&gt; know, such as the name of the company. Use an HTML search form that makes GET requests like&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.companieshouse.gov.uk/company?name=Tesco+Plc
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The response should be a &lt;code&gt;302 Found&lt;/code&gt; that redirects (using the &lt;code&gt;Location&lt;/code&gt; header) to the true identifier URI for the company (&lt;code&gt;http://www.companieshouse.gov.uk/id/company/00445790&lt;/code&gt;). If it&amp;#8217;s not possible to identify a single resource from the search string (for example, there are lots of companies with &amp;#8216;Tesco&amp;#8217; in their name), then the correct response is a &lt;code&gt;300 Multiple Choices&lt;/code&gt; that provides a list of links to the possible URIs (in HTML).&lt;/p&gt;
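
&lt;p&gt;The decision between those responses could be sketched as follows. This assumes a tiny in-memory register in place of a real database; apart from the Tesco Plc number quoted above, the company names and numbers are invented, and the matching rule (exact name first, then substring) is only one possible choice. The &lt;code&gt;404 Not Found&lt;/code&gt; case for no matches is also an assumption.&lt;/p&gt;

```python
# Toy register: company name mapped to registered number.
# Only the Tesco Plc entry comes from the text; the others are invented.
COMPANIES = {
    "Tesco Plc": "00445790",
    "Acme Trading Ltd": "09999991",
    "Acme Holdings Ltd": "09999992",
}

ID_BASE = "http://www.companieshouse.gov.uk/id/company/"

def search(name):
    """Return a (status code, payload) pair for a search by company name."""
    wanted = name.lower()
    # Prefer an exact name match; otherwise fall back to substring matches.
    exact = [n for n in COMPANIES if n.lower() == wanted]
    matches = exact or [n for n in COMPANIES if wanted in n.lower()]
    uris = [ID_BASE + COMPANIES[n] for n in matches]
    if len(uris) == 1:
        # One resource identified: redirect to its identifier URI.
        return 302, {"Location": uris[0]}
    if uris:
        # Several candidates: list them for the requester to choose from.
        return 300, {"links": uris}
    return 404, {}

print(search("Tesco Plc"))
print(search("Acme"))
```

&lt;p&gt;In a real deployment the list returned with the &lt;code&gt;300 Multiple Choices&lt;/code&gt; would be rendered as links in an HTML page.&lt;/p&gt;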

&lt;p&gt;There are other ways to help people find your data. If there aren&amp;#8217;t gazillions of resources, you can list the URIs within your &lt;strong&gt;sitemap&lt;/strong&gt;, which will make them discoverable by search engines. You can also list them on web pages and, especially for data that&amp;#8217;s constantly updating, in (Atom) &lt;strong&gt;feeds&lt;/strong&gt; which you link to from your HTML pages. Use metadata within the pages and feeds to help the consumers of your data work out what&amp;#8217;s relevant to them.&lt;/p&gt;

&lt;p&gt;To help even more, slice your Atom feeds into portions that different consumers of your data are going to be interested in. Slice by type, by area, by subject. That way people can stay up to date with just the resources that they&amp;#8217;re interested in, and not be bothered with information about those that are irrelevant to them.&lt;/p&gt;

&lt;h2&gt;That&amp;#8217;s It&lt;/h2&gt;

&lt;p&gt;What I&amp;#8217;ve tried to describe here is the minimum that you need to do to help people use the information you have, and some of the other things that you can do to make it even more useful. Here are some things that you shouldn&amp;#8217;t do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;don&amp;#8217;t wait for someone else to define a URI scheme for the things that you want to talk about&lt;/li&gt;
&lt;li&gt;don&amp;#8217;t wait for someone else to define an XML schema or RDF ontology for your data&lt;/li&gt;
&lt;li&gt;don&amp;#8217;t wait until you can find the time and money to do it all &amp;#8220;properly&amp;#8221;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just do what you can, now.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/100#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/43">ukgc09</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Sun, 01 Feb 2009 09:28:57 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">100 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>URL design for searches and queries</title>
 <link>http://www.jenitennison.com/blog/node/47</link>
 <description>&lt;p&gt;Another &lt;a href=&quot;http://www.dehora.net/journal/2007/08/web_resource_mapping_criteria_for_frameworks.html&quot; title=&quot;Bill de hÓra: Web resource mapping criteria for frameworks&quot;&gt;fascinating post from Bill de hÓra&lt;/a&gt;, this time on URL design for resources:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Let&amp;#8217;s take editing some resource, like a document, and let&amp;#8217;s look at browsers and HTML forms in particular, which don&amp;#8217;t a do a good job of allowing you to cleanly affect resource state. What you would like to do in this suboptimal environment is provide an &amp;#8220;edit-uri&amp;#8221; of some kind. There are basically 5 options for this; here they are going from most to least desirable&lt;/p&gt;
  
  &lt;ol&gt;
  &lt;li&gt;Uniform method. Alter the state by sending a PUT to the document&amp;#8217;s URL. The edit-uri is the resource URL. URL format: http://example.org/document/xyz&lt;/li&gt;
  &lt;li&gt;Function passing. Allow the document resource to accept a function as an argument. URL format: http://example.org/document/xyz?f=edit&lt;/li&gt;
  &lt;li&gt;Surrogate. Create another resource that will accept edits on behalf of the document. URL format: http://example.org/document/xyz/edit&lt;/li&gt;
  &lt;li&gt;CGI/RPC explicit: send a POST to an &amp;#8220;edit-document&amp;#8221; script passing the id of the document as a argument. URL format: http://example.org/edit-document?id=xyz&lt;/li&gt;
  &lt;li&gt;CGI/RPC stateful: send a POST to an &amp;#8220;edit-document&amp;#8221; script and fetch the id of the document from server state, or a cookie. URL format: http://example.org/edit-document&lt;/li&gt;
  &lt;/ol&gt;
&lt;/blockquote&gt;

&lt;!--break--&gt;

&lt;p&gt;My current task at work is to look at how to add &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot; title=&quot;W3C: RDFa Primer&quot;&gt;RDFa&lt;/a&gt; to a website that is completely driven by &amp;#8220;CGI/RPC explicit&amp;#8221; URLs. That includes the URLs for the resources themselves, by the way; we&amp;#8217;re not even talking about edit URLs here. Take a look at the URL for &lt;a href=&quot;http://www.statutelaw.gov.uk/content.aspx?LegType=All+Legislation&amp;amp;title=wine&amp;amp;Year=2007&amp;amp;searchEnacted=0&amp;amp;extentMatchOnly=0&amp;amp;confersPower=0&amp;amp;blanketAmendment=0&amp;amp;sortAlpha=0&amp;amp;TYPE=QS&amp;amp;PageNumber=1&amp;amp;NavFrom=0&amp;amp;parentActiveTextDocId=3032571&amp;amp;ActiveTextDocId=3032571&amp;amp;filesize=16218&quot; title=&quot;Statute Law Database Legislation&quot;&gt;this page&lt;/a&gt;, for example (this isn&amp;#8217;t the actual website that I&amp;#8217;m working on, but it&amp;#8217;s more or less the same in terms of URL design). The URL is&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/content.aspx?LegType=All+Legislation&amp;amp;title=wine&amp;amp;Year=2007&amp;amp;searchEnacted=0&amp;amp;extentMatchOnly=0&amp;amp;confersPower=0&amp;amp;blanketAmendment=0&amp;amp;sortAlpha=0&amp;amp;TYPE=QS&amp;amp;PageNumber=1&amp;amp;NavFrom=0&amp;amp;parentActiveTextDocId=3032571&amp;amp;ActiveTextDocId=3032571&amp;amp;filesize=16218
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So here I am trying to construct RDF examples, and all the URLs look like this mess. What URI am I supposed to use in RDF to talk about the resource itself, rather than a particular view (table of contents, actual content, etc) of that resource?&lt;/p&gt;

&lt;p&gt;In this case, the thing that identifies the resource in the URL is the value of the &lt;code&gt;ActiveTextDocId&lt;/code&gt; request parameter: you can do&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/content.aspx?ActiveTextDocId=3032571
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and see the same legislation; this could be mapped to a resource-oriented URL such as&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/legislation/3032571
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;very easily. It isn&amp;#8217;t, but it could be.&lt;/p&gt;
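
&lt;p&gt;Here&amp;#8217;s a rough sketch of that mapping in Python, just to show how little is involved. The &lt;code&gt;resource_url&lt;/code&gt; function and the &lt;code&gt;/legislation/&lt;/code&gt; path are my own invention for illustration; the site itself doesn&amp;#8217;t offer anything like them.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from urllib.parse import parse_qs, urlsplit, urlunsplit

def resource_url(cgi_url):
    """Map a content.aspx URL to the hypothetical /legislation/{id} form."""
    parts = urlsplit(cgi_url)
    params = parse_qs(parts.query)
    # ActiveTextDocId is the only parameter that identifies the resource;
    # everything else describes the search that led to it.
    doc_id = params["ActiveTextDocId"][0]
    return urlunsplit((parts.scheme, parts.netloc, "/legislation/" + doc_id, "", ""))

print(resource_url("http://www.statutelaw.gov.uk/content.aspx?ActiveTextDocId=3032571"))
&lt;/code&gt;&lt;/pre&gt;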

&lt;p&gt;However, doing that loses the context of the search that led you to this page. With the URI above, the fact that I searched for all 2007 legislation with &amp;#8220;wine&amp;#8221; in the title gets lost. And this is important because the &lt;a href=&quot;http://www.statutelaw.gov.uk/content.aspx?LegType=All+Legislation&amp;amp;title=wine&amp;amp;Year=2007&amp;amp;searchEnacted=0&amp;amp;extentMatchOnly=0&amp;amp;confersPower=0&amp;amp;blanketAmendment=0&amp;amp;sortAlpha=0&amp;amp;TYPE=QS&amp;amp;PageNumber=1&amp;amp;NavFrom=0&amp;amp;parentActiveTextDocId=3032571&amp;amp;ActiveTextDocId=3032571&amp;amp;filesize=16218#breadcrumb&quot; title=&quot;Breadcrumb on legislation page&quot;&gt;breadcrumb&lt;/a&gt; on the page has to take me back to that original search.&lt;/p&gt;

&lt;p&gt;Now, you could argue that this is bad website design: after all, you can navigate back to a search page using *gasp* the &lt;strong&gt;Back&lt;/strong&gt; button, and not doing so just adds unnecessary items to your history. But what about providing &lt;strong&gt;previous&lt;/strong&gt; and &lt;strong&gt;next&lt;/strong&gt; links for navigating through the items found in a search? There, surely, you do need some state information that indicates how we got to this particular item?&lt;/p&gt;

&lt;p&gt;Well, no. When you&amp;#8217;re navigating through the results of a search, the primary resource that you&amp;#8217;re viewing is the &lt;em&gt;collection&lt;/em&gt; of items that have been identified by the search. Even if you&amp;#8217;re just viewing one of the items in that collection, if the collection still matters then that item should be viewed as just a subresource of the collection.&lt;/p&gt;

&lt;p&gt;In this case, the search has three fields &amp;#8212; title, year and (legislation) number &amp;#8212; so the search URL has three parts after the initial one. The general scheme (using &lt;a href=&quot;http://bitworking.org/projects/URI-Templates/draft-gregorio-uritemplate-01.html&quot; title=&quot;URI Template Internet-Draft&quot;&gt;URI template syntax&lt;/a&gt;) is&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/{title}/{year}/{number}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So here, I could use&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/wine/2007/_
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;as a URL that would give me a list of all the 2007 legislation, with any number, whose title contained &amp;#8220;wine&amp;#8221;, and then&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/wine/2007/_/1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;would show me the first piece of legislation found in the context of that search, with a &lt;strong&gt;next&lt;/strong&gt; button taking me to&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/wine/2007/_/2
&lt;/code&gt;&lt;/pre&gt;
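
&lt;p&gt;To make the scheme concrete, here&amp;#8217;s a minimal Python sketch of how a server might pick apart these search paths and work out the &lt;strong&gt;next&lt;/strong&gt; link. The function names are hypothetical, and I&amp;#8217;m assuming that &lt;code&gt;_&lt;/code&gt; always means &amp;#8220;any value&amp;#8221; and that the optional last segment is a 1-based position in the results.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def parse_search_path(path):
    """Split /search/{title}/{year}/{number}[/{position}] into its parts."""
    segments = path.strip("/").split("/")
    if segments[0] != "search" or len(segments) not in (4, 5):
        raise ValueError("not a search URL: " + path)
    # "_" stands for "any value", as in the example URLs.
    title, year, number = (None if s == "_" else s for s in segments[1:4])
    position = int(segments[4]) if len(segments) == 5 else None
    return {"title": title, "year": year, "number": number, "position": position}

def next_path(path):
    """Path of the next item in the same search (no upper bound check here)."""
    search = parse_search_path(path)
    if search["position"] is None:
        raise ValueError("path names the whole collection, not an item in it")
    base = path.strip("/").rsplit("/", 1)[0]
    return "/" + base + "/" + str(search["position"] + 1)

print(parse_search_path("/search/wine/2007/_"))
print(next_path("/search/wine/2007/_/1"))
&lt;/code&gt;&lt;/pre&gt;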

&lt;p&gt;An individual resource itself could occur in many searches, and thus you would get many URLs for the resource in the context of those particular searches, but that&amp;#8217;s OK so long as there is one URI that identifies the resource itself. You should link to the actual resource from the page, of course, both in a &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; in the header and a &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; in the body (it feels like there should be a &amp;#8220;this is the real resource&amp;#8221; value for the &lt;code&gt;rel&lt;/code&gt; attribute of &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt;, but I don&amp;#8217;t know of one).&lt;/p&gt;

&lt;p&gt;The kind of URL above works fine when you have a fixed number of fields for searches, but what if you&amp;#8217;re doing more complicated searches: something that requires a proper query language? Well, you can stuff a query language into a URL. See &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/07/13/GoogleBaseDataAPIVsAstoriaTwoApproachesToSQLlikeQueriesInARESTfulProtocol.aspx&quot; title=&quot;Dare Obasanjo: Google Base Data API vs. Astoria: Two Approaches to SQL-like Queries in a RESTful Protocol&quot;&gt;Dare Obasanjo&amp;#8217;s comparison of Google and Astoria APIs for queries&lt;/a&gt; to see what that looks like.&lt;/p&gt;

&lt;p&gt;Alternatively, several years ago Paul Prescod introduced me to the notion that the query itself is a resource &amp;#8212; it&amp;#8217;s something that you&amp;#8217;ll probably want to save and edit &amp;#8212; and can be assigned a unique identifier in the same way as other resources. So you visit&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.example.com/queries/new
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;to create a new query, which gets assigned the URL&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.example.com/queries/4328
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you can then visit that page to see a list of the results of the query. Unlike with a simple search, the query parameters themselves don&amp;#8217;t get used in the URL: they&amp;#8217;re stored on the server. So you can&amp;#8217;t hack the URL to change the query, but you do have a simple URL that you can easily share with other people if you want to.&lt;/p&gt;
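
&lt;p&gt;A minimal sketch of the idea in Python, with an in-memory dictionary standing in for whatever storage the server really uses. &lt;code&gt;QueryStore&lt;/code&gt; and its methods are names I&amp;#8217;ve made up, and a real application would also need to run the query and let you edit it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import itertools

class QueryStore:
    """In-memory stand-in for the server-side storage of saved queries."""

    def __init__(self, first_id=1):
        self._ids = itertools.count(first_id)
        self._queries = {}

    def create(self, criteria):
        """Save the criteria and return the path of the new query resource."""
        query_id = next(self._ids)
        self._queries[query_id] = dict(criteria)
        return "/queries/" + str(query_id)

    def criteria_for(self, path):
        """Look up the stored criteria from a /queries/{id} path."""
        return self._queries[int(path.rsplit("/", 1)[1])]

store = QueryStore(first_id=4328)
path = store.create({"title": "wine", "year": "2007"})
print(path)                      # the criteria never appear in the URL
print(store.criteria_for(path))  # they are looked up on the server instead
&lt;/code&gt;&lt;/pre&gt;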
</description>
 <comments>http://www.jenitennison.com/blog/node/47#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <pubDate>Sun, 12 Aug 2007 22:00:00 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">47 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>View-Model-Template</title>
 <link>http://www.jenitennison.com/blog/node/45</link>
 <description>&lt;p&gt;I don&amp;#8217;t know anything about Struts 1, but &lt;a href=&quot;http://www.dehora.net/journal/2007/07/struts_1_problems.html&quot; title=&quot;Bill de hÓra: Struts 1 Problems&quot;&gt;Bill de hÓra&amp;#8217;s recent post&lt;/a&gt; has got some interesting web-application-design tips. There were two particular bits that spoke to me:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;struts-config.xml&lt;/strong&gt; struts-config tries to capture primarily the flow of application state on the server, by being an awkward representation of a call graph. In doing it misses a key aspect of the web - hypertext. In web architecture, HTML hypertext on the client is the engine of application state, not an XML file on the server.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words (I think), in web applications your state is the page you&amp;#8217;re on, and taking action is about following the links (or submitting the forms) on that page. Your actions (and therefore the transitions between different states) are determined by what links and forms are on the page. But in fact, URLs should be hackable, and transitions unlimited. When you design the application what you really need to think about are the tasks the users want to achieve (and therefore the transitions that they might &lt;em&gt;want&lt;/em&gt; to make) rather than the &lt;em&gt;possible&lt;/em&gt; state transitions.&lt;/p&gt;

&lt;!--break--&gt;

&lt;blockquote&gt;
  &lt;p&gt;On the web, a suitable pattern is View, Model, Template [rather than Model, View, Controller (MVC)]. A request to a URL is dispatched to a View. This View calls into the Model, performs manipulations and prepares data for output. The data is passed to a Template that is rendered an [sic] emitted as a response. ideally [sic] in web frameworks, the controller is hidden from view. Note that this framework style is often called MVC anyway, confusing matters somewhat; The key differences are that Views and Templates are cohesive and Controllers are pushed down into the framework infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I&amp;#8217;ve been thinking recently about whether and how XSLT might fit into a &lt;a href=&quot;http://www.rubyonrails.org/&quot; title=&quot;Ruby on Rails&quot;&gt;Ruby on Rails&lt;/a&gt; set-up. In &lt;abbr title=&quot;Ruby on Rails&quot;&gt;RoR&lt;/abbr&gt;, the controller usually either queries the database (via the model) to set up instance variables, and then renders a (template) view, or updates the database (via the model) and redirects to another view. The templates (for (X)HTML) use fairly standard &lt;code&gt;&amp;lt;% ... %&amp;gt;&lt;/code&gt; placeholders to hold code and insert values.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve spent most of my professional life cursing (X)HTML documents with &lt;code&gt;&amp;lt;% ... %&amp;gt;&lt;/code&gt; in, because they use unescaped less-than-signs and therefore can&amp;#8217;t be generated or processed by XML tools, particularly XSLT. There&amp;#8217;s an advantage to having templates that are themselves well-formed, not least that you can easily process the templates themselves (for example to generate, update or document them). Plus if your templates are declarative, rather than containing embedded code, you aren&amp;#8217;t tied to a particular framework: I could move templates from Ruby on Rails to Django and they wouldn&amp;#8217;t need modification. When I think &amp;#8220;declarative templates&amp;#8221;, I think &amp;#8220;XSLT&amp;#8221;.&lt;/p&gt;

&lt;p&gt;The other advantage of using XSLT is that it can be used on the client side as well as the server side. So there&amp;#8217;s the possibility of moving that rendering from the server to the client completely, or of using it on particular clients, perhaps in an AJAX set-up, while keeping the same stylesheets on the server for those browsers that don&amp;#8217;t support client-side XSLT.&lt;/p&gt;

&lt;p&gt;You still need a way of getting the data from the model into the stylesheet, which can be done through a combination of XML and parameters. The XML is itself a view of the model, of course, but if you&amp;#8217;ve got any kind of intention to make your web application mashable, you&amp;#8217;re going to want to generate XML, probably Atom, anyway (yeah, or JSON, but it&amp;#8217;s easy enough to get from XML to JSON using XSLT too). If you add caching to the equation, this approach might help reduce database requests.&lt;/p&gt;
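
&lt;p&gt;As an illustration of the XML being a view of the model, here&amp;#8217;s a small Python sketch that serialises a model record with the standard library&amp;#8217;s ElementTree. The record, the &lt;code&gt;item&lt;/code&gt; element and the &lt;code&gt;to_xml&lt;/code&gt; function are all invented for the example; a real application would more likely emit Atom, and the result would then be handed to the stylesheet as its source document.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import xml.etree.ElementTree as ET

def to_xml(record):
    """Serialise a model record (a plain dict here) as a small XML document."""
    root = ET.Element("item")
    for name, value in record.items():
        # One child element per field; text content is escaped by ElementTree.
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Made-up record, standing in for a row fetched through the model.
print(to_xml({"title": "An example title", "year": 2007}))
&lt;/code&gt;&lt;/pre&gt;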

&lt;p&gt;So I think that using XSLT as a templating language, even within a RoR framework, has at least something going for it. What I hope is that I&amp;#8217;m not falling into the &amp;#8220;when you&amp;#8217;ve got a hammer everything looks like a nail&amp;#8221; trap.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/45#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <pubDate>Fri, 27 Jul 2007 10:02:04 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">45 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech 2007: Thursday 17th May Afternoon</title>
 <link>http://www.jenitennison.com/blog/node/21</link>
 <description>&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; Dare Obasanjo has written &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx&quot; title=&quot;Why GData/APP Fails as a General Purpose Editing Protocol for the Web&quot;&gt;an interesting critique&lt;/a&gt; on using the &lt;a href=&quot;http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-15.html&quot; title=&quot;Atom Publishing Protocol (v15)&quot;&gt;Atom Publishing Protocol&lt;/a&gt; as the basis for general purpose sharing of data in the way that the &lt;a href=&quot;http://code.google.com/apis/gdata/index.html&quot; title=&quot;Google Data API&quot;&gt;Google Data API&lt;/a&gt; does.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Thursday afternoon had a few really interesting talks. I learned about the Google Data API (no longer called gData); Oracle&amp;#8217;s use of XLink to represent relationships between documents, and the requirements that entails; using XSLT to create JSON to use Exhibit widgets; and using XMPP to enhance instant messaging.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/33&quot; title=&quot;Google Data API (Talk)&quot;&gt;Google Data API&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Frank Mantek&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;http://code.google.com/apis/gdata/index.html&quot; title=&quot;Google Data API&quot;&gt;Google Data API&lt;/a&gt; is the unified API that Google offers to all its services, such as Google Base, Blogger, Google Calendar, Google Spreadsheets and so on.&lt;/p&gt;

&lt;p&gt;Frank talked about how awful SOAP/WSDL is, in particular how two services developed in different platforms can&amp;#8217;t talk to each other (which one might imagine is rather the point of Web Services). (Later, when challenged by a Microsoft guy about this claim, he revealed that he&amp;#8217;d been a major developer of the SOAP/WSDL stuff at Microsoft, so knew exactly what he was talking about from bitter experience.)&lt;/p&gt;

&lt;p&gt;So the Google Data API is a RESTful API, using the &lt;a href=&quot;http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-15.html&quot; title=&quot;Atom Publishing Protocol (v15)&quot;&gt;Atom Publishing Protocol&lt;/a&gt; with a few additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extra data model&lt;/li&gt;
&lt;li&gt;querying&lt;/li&gt;
&lt;li&gt;concurrency control&lt;/li&gt;
&lt;li&gt;extra authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What this basically means is that you can query any of the Google services using HTTP, and get back an Atom document. The URI can contain queries (the precise nature of which depends on the service; &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;, for example, uses a single URI request parameter that has a complex internal query syntax), and you get back the feed with the items that you&amp;#8217;d requested. The Atom items themselves have the basic Atom elements, but then a bunch of service-specific elements that provide the extra information you need.&lt;/p&gt;

&lt;p&gt;Listening to this talk I finally got what &lt;a href=&quot;http://www.tbray.org/ongoing/&quot; title=&quot;ongoing&quot;&gt;Tim Bray&lt;/a&gt; was talking about at the &lt;a href=&quot;http://www.xmlsummerschool.com/&quot; title=&quot;XML Summer School, Oxford&quot;&gt;XML Summer School&lt;/a&gt; a couple of years ago: REST gives us verbs and Atom gives us objects and lists of objects. I didn&amp;#8217;t get it before, because, after all, aren&amp;#8217;t all XML documents objects? But I think the point is that Atom has a lot of the mechanics that you need for talking about objects built into it, and the extensibility necessary for adding your own information to it (which is what each of Google&amp;#8217;s services is doing).&lt;/p&gt;

&lt;p&gt;The really interesting part of the talk was where Frank started talking about what the problems (still) are. The problems I noted were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atom&amp;#8217;s verbose&lt;/li&gt;
&lt;li&gt;Google have to use &lt;code&gt;&amp;lt;category&amp;gt;&lt;/code&gt; to indicate the kind of thing they&amp;#8217;re representing (as opposed to using the document element which is what you&amp;#8217;d do with normal XML documents)&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;rel&lt;/code&gt; attribute is too vague&lt;/li&gt;
&lt;li&gt;they made up their own markup languages, rather than reusing existing standards&lt;/li&gt;
&lt;li&gt;they should be using &lt;a href=&quot;http://en.wikipedia.org/wiki/HTTP_ETag&quot; title=&quot;Wikipedia: HTTP ETags&quot;&gt;ETags&lt;/a&gt; for concurrency control&lt;/li&gt;
&lt;li&gt;they haven&amp;#8217;t got any versioning (eek)&lt;/li&gt;
&lt;li&gt;incremental updates are a problem; they don&amp;#8217;t want to serve the whole Atom feed (to a mobile device) when only a small amount has changed, so what they do is have several feeds, each of which reveals a different part of the information&lt;/li&gt;
&lt;/ul&gt;
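
&lt;p&gt;For what it&amp;#8217;s worth, ETag-based concurrency control is easy to sketch. This is my own illustration in Python, not anything Frank showed: the &lt;code&gt;Resource&lt;/code&gt; class is hypothetical, and it mimics the way an HTTP server would compare the &lt;code&gt;If-Match&lt;/code&gt; header against the current ETag and answer 412 (Precondition Failed) on a mismatch.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib

class Resource:
    """A stored representation plus the ETag derived from it."""

    def __init__(self, body):
        self.body = body

    @property
    def etag(self):
        # Any value that changes whenever the body changes will do.
        digest = hashlib.sha256(self.body.encode("utf-8")).hexdigest()
        return '"' + digest[:16] + '"'

    def put(self, new_body, if_match):
        """Apply the update only if the client saw the current version."""
        if if_match != self.etag:
            return 412  # Precondition Failed: someone else changed it first
        self.body = new_body
        return 200

entry = Resource("draft one")
seen = entry.etag
print(entry.put("draft two", if_match=seen))    # first writer succeeds
print(entry.put("draft three", if_match=seen))  # stale ETag is rejected
&lt;/code&gt;&lt;/pre&gt;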

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/81&quot; title=&quot;From Trees to Graphs: Evolving XML for building enterprise applications&quot;&gt;From Trees to Graphs: Evolving XML for building enterprise applications&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Ravi Murthy&lt;/h3&gt;

&lt;p&gt;Ravi Murthy talked about the provision for defining links between documents in &lt;a href=&quot;http://www.oracle.com/&quot; title=&quot;Oracle&quot;&gt;Oracle&lt;/a&gt;&amp;#8217;s database, and their consequent requirements. Oracle&amp;#8217;s XML database has a file system abstraction (every XML &amp;#8216;object&amp;#8217; has a file path) with access control, versioning, metadata and protocol access. Within an XML &amp;#8216;object&amp;#8217; stored in the database, they use XLink to represent the relationships with other objects. When you export the XML, the XLinks get resolved to create the XML document.&lt;/p&gt;

&lt;p&gt;Using XLink to represent relationships between documents brings a whole new set of constraints that you might want to express in a schema language, or annotations that you can use to describe the links (depending on how you look at it):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;type&lt;/strong&gt; of the linked resource (eg the document element&amp;#8217;s name, substitution group or XSD type)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;scope&lt;/strong&gt; of a particular reference, similar to the scoping of XSD&amp;#8217;s identity constraints&lt;/li&gt;
&lt;li&gt;That a particular link is &lt;strong&gt;acyclic&lt;/strong&gt; (eg, given an XPath expression, keep evaluating it and make sure you never get back to where you started)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;kind&lt;/strong&gt; of a link, one of:
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;hard&lt;/strong&gt;: the target of the link must exist, and cannot be deleted while this resource exists (but can be renamed) &amp;#8212; these are similar to links in normal databases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;symbolic&lt;/strong&gt;: trust the file path specified by the link and only resolve it on demand&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;weak&lt;/strong&gt;: like a hard link, except the target can be deleted, in which case the link becomes symbolic&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;versioning&lt;/strong&gt; of a link, whether it points to the &amp;#8220;current&amp;#8221; version of a resource or a specific version&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These extra constraints are expressed as annotations on the definitions of &lt;code&gt;xlink:href&lt;/code&gt; attributes in XSD schemas for the documents held in the database.&lt;/p&gt;
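
&lt;p&gt;The acyclic check is the easiest of these to sketch. Here&amp;#8217;s a Python version that assumes the links have already been extracted into a dictionary mapping each file path to the paths it links to (the real thing would evaluate an XPath expression against each document, which I&amp;#8217;ve left out). The &lt;code&gt;is_acyclic&lt;/code&gt; function and the file paths are made up for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def is_acyclic(links, start):
    """Follow links from start; False if any path returns to a resource already on it."""
    on_path = set()  # resources on the path currently being followed
    done = set()     # resources already checked and found safe

    def visit(resource):
        if resource in on_path:
            return False  # we got back to somewhere we had already been
        if resource in done:
            return True
        on_path.add(resource)
        ok = all(visit(target) for target in links.get(resource, ()))
        on_path.discard(resource)
        done.add(resource)
        return ok

    return visit(start)

links = {"/a.xml": ["/b.xml"], "/b.xml": ["/c.xml"], "/c.xml": []}
print(is_acyclic(links, "/a.xml"))  # no link leads back to /a.xml
links["/c.xml"].append("/a.xml")    # now /c.xml points back to the start
print(is_acyclic(links, "/a.xml"))
&lt;/code&gt;&lt;/pre&gt;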

&lt;p&gt;Ravi also talked a bit about expressing decomposition rules: how an XML document should be shredded when it gets put into the database. They use XPath to specify rules that indicate that particular elements should be placed at a particular filepath.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;I was really flattered in the tea break. Chatting with a guy called &lt;a href=&quot;http://philwilson.org/blog/&quot; title=&quot;Phil&#039;s Blog&quot;&gt;Phil&lt;/a&gt; working at the University of Bath, who politely asked about my presentation, and after I&amp;#8217;d explained how it was all to do with overlapping markup and that kind of hard-core theory he said: &amp;#8220;You don&amp;#8217;t &lt;em&gt;look&lt;/em&gt; like a markup geek&amp;#8221;. Me: &amp;#8220;What, because I&amp;#8217;m a girl?&amp;#8221;. Him: &amp;#8220;No, no, that&amp;#8217;s not what I meant. You just look more Web 2.0-ey.&amp;#8221; &lt;a href=&quot;http://lapin-bleu.net/riviera/&quot; title=&quot;Max&#039;s Blog&quot;&gt;Max&lt;/a&gt; was there at the time, and labelled me &amp;#8220;the Geekess of XSLT&amp;#8221;, which I think clarified things. (Actually most of the people at XTech this year were Web 2.0-ey rather than markup geeks, but I&amp;#8217;m glad I &lt;em&gt;looked&lt;/em&gt; as though I fitted in.) &lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/155&quot; title=&quot;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&quot;&gt;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://metacognition.info/&quot; title=&quot;Chimezie Ogbuji&#039;s Website&quot;&gt;Chimezie Ogbuji&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;&amp;#8220;What&amp;#8217;s &lt;a href=&quot;http://simile.mit.edu/wiki/Exhibit&quot; title=&quot;Exhibit Wiki&quot;&gt;Exhibit&lt;/a&gt;?&amp;#8221; I hear you ask. Or maybe you&amp;#8217;re more with-it than I am, but that&amp;#8217;s what I was asking. Chimezie never really explained, but I kinda gathered that it&amp;#8217;s a funky AJAX toolset for creating views of data by importing scripts and using magical IDs and extension attributes within web pages. The other phrase that Chimezie dropped in was &lt;a href=&quot;http://www.w3.org/TR/backplane/&quot; title=&quot;Rich Web Application Backplane&quot;&gt;Rich Web Application Backplane&lt;/a&gt;, which again I hadn&amp;#8217;t heard of. Even having read the W3C Note, I still don&amp;#8217;t get it. Ho hum.&lt;/p&gt;

&lt;p&gt;Anyway, Chimezie made the point that while entering data using XForms is great, it&amp;#8217;s too heavy-weight for viewing that data. Exhibit gives a lot more flexibility (take a look at the &lt;a href=&quot;http://simile.mit.edu/exhibit/examples/presidents/presidents.html&quot; title=&quot;US Presidents in Exhibit&quot;&gt;US presidents&lt;/a&gt; example), which enables users to explore data more freely. In Exhibit pages, you provide a JSON schema for your data, a number of lenses/views/widgets that you can use to view the data, then you embed the widgets in the HTML page and point it at the data source. The JSON schema indicates the type of a particular property (eg &amp;#8220;country&amp;#8221;), and gives labels for it (including a plural label (&amp;#8220;countries&amp;#8221;) and a reverse label (&amp;#8220;country of&amp;#8221;)) that it uses in the widgets.&lt;/p&gt;

&lt;p&gt;But that requires JSON, right? Chimezie showed how easy it is (and it&amp;#8217;s &lt;em&gt;really&lt;/em&gt; easy) to transform data-oriented XML into JSON using XSLT.&lt;/p&gt;

&lt;p&gt;You know, there are all these cool ways out there for viewing information, I just wish I had some really meaty data to use them on! &lt;a href=&quot;http://simile.mit.edu/timeline/&quot; title=&quot;SIMILE Timelines&quot;&gt;Timelines&lt;/a&gt; are one thing, but I&amp;#8217;d also love to find some data to employ in &lt;a href=&quot;http://www.gapminder.org/&quot; title=&quot;Gapminder&quot;&gt;Gapminder&lt;/a&gt; or even in an interface like the one for &lt;a href=&quot;http://www.philipglass.com/glassengine/&quot; title=&quot;Philip Glass Engine&quot;&gt;the music of Philip Glass&lt;/a&gt;. Perhaps I should just mine &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;, but I&amp;#8217;d like it to be something personally or collectively useful.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/97&quot; title=&quot;Real-time user-to-user web with Mozilla and XMPP&quot;&gt;Real-time user-to-user web with Mozilla and XMPP&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://blog.hyperstruct.net/&quot; title=&quot;Massimiliano Mirra&#039;s Website&quot;&gt;Massimiliano Mirra&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This talk was strong on motivation &amp;#8212; the requirement to enhance basic instant messaging functionality &amp;#8212; and strong on demonstration, with Massimiliano chatting and playing with a pre-programmed bot, but really weak on the technical details. It was only through the post-talk questions that we learned that what we&amp;#8217;d seen was based on &lt;a href=&quot;http://www.xmpp.org/&quot; title=&quot;XMPP Standards Foundation&quot;&gt;XMPP (the Extensible Messaging and Presence Protocol)&lt;/a&gt;, which allowed DOM events to be passed between clients. Have to read the paper if you want to learn more.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/21#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/19">google</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Sun, 27 May 2007 22:03:24 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">21 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

