<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>atom</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/18</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Your Website is Your API: Quick Wins for Government Data</title>
 <link>http://www.jenitennison.com/blog/node/100</link>
 <description>&lt;p&gt;&lt;em&gt;This is the talk I prepared for the UKGovWeb Barcamp, in blog form. It&amp;#8217;s probably better this way. Most of what&amp;#8217;s written here seems blindingly obvious to me, and probably to most readers of this blog, but maybe Google will direct someone here who finds it useful.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Working with public-sector information on the web, one of the things that I take an interest in is making government data freely available for anyone to re-present, mash-up, analyse and generally do whatever they want to do. This post is born out of a feeling that the people who control data don&amp;#8217;t realise that the smallest changes can be beneficial: they don&amp;#8217;t need to do &lt;em&gt;everything&lt;/em&gt; right now, just &lt;em&gt;something&lt;/em&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;There are three fundamental things that you need to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;identify&lt;/strong&gt; the data that you control&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;represent&lt;/strong&gt; that data in a way that people can use&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;expose&lt;/strong&gt; the data to the wider world&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;but you can choose the degree to which you do each of these things.&lt;/p&gt;

&lt;h2&gt;Identify&lt;/h2&gt;

&lt;p&gt;Take a look at what data you have some kind of responsibility for or control over. You might have a PDF containing a table of schools in the local area and their intakes over the last couple of years. You might have a spreadsheet of the amount of money assigned to maintaining the playgrounds within the borough. You might have a database of company information. You might have a set of HTML agendas for court cases.&lt;/p&gt;

&lt;p&gt;The first step is simply to identify what the information is &lt;em&gt;about&lt;/em&gt;. Schools, playgrounds, companies, court cases &amp;#8212; each row in your table or spreadsheet or database, or each section in your document will be about something. We call this a &lt;strong&gt;resource&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To play nicely with the web, every resource should have an &lt;strong&gt;identifier&lt;/strong&gt;. A Uniform Resource Identifier. A URI. That URI tells us where we can find information about the resource (we&amp;#8217;ll get to what those look like later). So your second step is to work out URIs for each of your resources.&lt;/p&gt;

&lt;p&gt;Now, there are actually three levels of URIs that you can care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identifier URIs&lt;/li&gt;
&lt;li&gt;document URIs&lt;/li&gt;
&lt;li&gt;representation URIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You probably already have document and representation URIs on your web server. Representation URIs are URIs for particular formats and languages and views of the information that you make available. Document URIs are typically the same URI without an extension; web servers use &lt;strong&gt;content negotiation&lt;/strong&gt; to work out which representation to serve up when a web browser asks for the page at a particular document URI.&lt;/p&gt;

&lt;p&gt;So you already have a URI for the PDF that contains the table of schools, for the Excel spreadsheet about the playgrounds. You already have URIs for the results of a particular query on your database, and of course the HTML pages that you deliver have URIs already. That&amp;#8217;s all in place. You don&amp;#8217;t want to change it.&lt;/p&gt;

&lt;p&gt;But identifier URIs are what are really important when it comes to opening up your data. They shift the focus from the documents that you serve to the resources that they are about. &lt;strong&gt;By assigning URIs to resources, you enable other people to talk about them. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, if &lt;a href=&quot;http://www.companieshouse.co.uk/&quot; title=&quot;Companies House&quot;&gt;Companies House&lt;/a&gt; stated that companies could be referred to using URIs of the form &lt;code&gt;http://www.companieshouse.co.uk/id/company/{registeredNumber}&lt;/code&gt; then other people who needed to talk about companies (websites containing customer feedback, monitoring companies going into receivership, displaying stock price information, whatever) could use these URIs whenever they referred to a company. If all websites that make data available about companies point to the same identifier for a company, then it&amp;#8217;s possible to pull that data together very easily.&lt;/p&gt;

&lt;p&gt;Now the URIs that you use should be short, clean, readable, hackable, hierarchical and so on. If you can, &lt;strong&gt;you should use a natural identifier for the resource within the URI for that resource&lt;/strong&gt;. So URIs for registered companies should use their registered number. URIs for schools should use the school&amp;#8217;s unique reference number (URN). URIs for playgrounds could use the name of the playground (scoped within the council responsible for the playground). URIs for court cases should include the court, the year, and the case number. And so on.&lt;/p&gt;
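
&lt;p&gt;As a rough sketch of what building identifier URIs from natural identifiers might look like: the company pattern is the one suggested above, while the &lt;code&gt;/id/case/&lt;/code&gt; path, the function names and the example base are assumptions made up for illustration.&lt;/p&gt;

```python
from urllib.parse import quote

# Base taken from the company URI pattern suggested in the post.
COMPANY_BASE = "http://www.companieshouse.co.uk"


def company_uri(registered_number: str) -> str:
    """Identifier URI for a company, keyed on its registered number."""
    return f"{COMPANY_BASE}/id/company/{quote(registered_number, safe='')}"


def court_case_uri(base: str, court: str, year: int, case_number: str) -> str:
    """Identifier URI for a court case: court, then year, then case number.

    The /id/case/ path segment is a hypothetical choice, not an existing scheme.
    """
    parts = [court, str(year), case_number]
    return base + "/id/case/" + "/".join(quote(p, safe="") for p in parts)


print(company_uri("00445790"))
print(court_case_uri("http://example.org", "high court", 2009, "12"))
```

&lt;p&gt;Each function takes only the information someone needs to know in order to construct the URI, and nothing that could be derived from it.&lt;/p&gt;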

&lt;p&gt;Remember as you&amp;#8217;re creating these identifier URIs that they are nothing to do with the structure of your website or the user&amp;#8217;s experience of navigating through your website. For navigation, you might want to group schools into primary, secondary and sixth-form, but you shouldn&amp;#8217;t do that in the identifier URIs. To help decide, imagine someone wanting to construct a URI and the information that they need to do so. If any of the information they need can be derived from other information (as a school&amp;#8217;s type can be derived from its URN), leave it out.&lt;/p&gt;

&lt;p&gt;When you&amp;#8217;re doing this, you might realise that actually you shouldn&amp;#8217;t be the one in control of these URIs. If you&amp;#8217;re not the one assigning the registered number, URN or case number then there&amp;#8217;s probably a higher authority that does assign those (real-world) identifiers. Don&amp;#8217;t let that stop you creating URIs &amp;#8212; you&amp;#8217;ll still find them useful for identifying &lt;em&gt;your&lt;/em&gt; information about that particular resource &amp;#8212; but do look to see whether there are existing URIs that you could point to and, if there are, reuse whatever scheme they&amp;#8217;re using.&lt;/p&gt;

&lt;h2&gt;Represent&lt;/h2&gt;

&lt;p&gt;So I said in the last section that assigning URIs to resources was useful. And it is. But it&amp;#8217;s even more useful if you provide some kind of response when someone &lt;strong&gt;requests&lt;/strong&gt; those URIs. A request for a URI can be done by a web browser or one of those search-engine-spider-things that crawls the web looking for data. Requests are done on the web using HTTP (hypertext transfer protocol), specifically using a &lt;strong&gt;GET&lt;/strong&gt; request, which means &amp;#8220;get this resource&amp;#8221;.&lt;/p&gt;

&lt;p&gt;When a web server receives a request, it sends back a &lt;strong&gt;response&lt;/strong&gt;. The first part of the response is a &lt;strong&gt;status code&lt;/strong&gt; that tells the browser, spider, or whatever issued the request, generally what kind of response it is. Now when a browser says &amp;#8220;get this company&amp;#8221; or &amp;#8220;get this school&amp;#8221; a web server should either respond with a &lt;code&gt;404 Not Found&lt;/code&gt; response or a &lt;code&gt;303 See Other&lt;/code&gt; response.&lt;/p&gt;

&lt;p&gt;If the company or school doesn&amp;#8217;t exist, a web server should respond with a &lt;code&gt;404 Not Found&lt;/code&gt; response. It&amp;#8217;s actually really useful to give appropriate &lt;code&gt;404 Not Found&lt;/code&gt; responses, because it tells whoever made the request that the resource (company/school/playground/court case) doesn&amp;#8217;t exist. This can act as simple validation: if I&amp;#8217;m building a site that parents can use to rate schools, and a parent enters a URN into a form, I can construct a URI based on that URN, try to GET the information about that school, and if I get a &lt;code&gt;404 Not Found&lt;/code&gt; response then I know that the parent has entered an invalid URN.&lt;/p&gt;

&lt;p&gt;If the company or school exists, a web server should respond with a &lt;code&gt;303 See Other&lt;/code&gt; response that points the browser to a &lt;em&gt;document URI&lt;/em&gt; that contains information about the company or school. After all, the web server can&amp;#8217;t very well deliver the company or school itself into your lap; all it can do is give you &lt;em&gt;information&lt;/em&gt; about it. &lt;code&gt;303 See Other&lt;/code&gt; means &amp;#8220;if you want information about that, see that other thing over there instead&amp;#8221;. The &amp;#8220;other thing over there&amp;#8221; will be a document of some kind. It might be the PDF that contains information about the school, or the spreadsheet that contains information about the playground.&lt;/p&gt;
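
&lt;p&gt;Here is a minimal sketch of that decision, written as a plain function rather than a real web server. The &lt;code&gt;/id/school/&lt;/code&gt; and &lt;code&gt;/doc/school/&lt;/code&gt; paths, the lookup table and the URN in it are all invented for the example.&lt;/p&gt;

```python
# Hypothetical lookup: school URN -> document URI with information about it.
SCHOOL_DOCUMENTS = {
    "100000": "/doc/school/100000",
}

ID_PREFIX = "/id/school/"


def respond(path: str) -> tuple[int, dict[str, str]]:
    """Return (status code, headers) for a GET on an identifier URI path."""
    if not path.startswith(ID_PREFIX):
        return 404, {}
    urn = path[len(ID_PREFIX):]
    document = SCHOOL_DOCUMENTS.get(urn)
    if document is None:
        # The school does not exist: say so.
        return 404, {}
    # The school exists: point at a document about it.
    return 303, {"Location": document}


def is_valid_urn(urn: str) -> bool:
    """The simple validation described above: a 404 means an invalid URN."""
    status, _headers = respond(ID_PREFIX + urn)
    return status != 404


print(respond("/id/school/100000"))
print(is_valid_urn("999999"))
```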

&lt;p&gt;&lt;strong&gt;Simply giving a yes-this-exists or no-this-doesn&amp;#8217;t-exist response is useful. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s even more useful, though, if you can make the information that you have about the school, playground, company, court case or whatever, available in a format that can be processed by a computer reasonably easily. PDFs are really really hard to extract information from, so do everything you can not to use PDFs. Word documents and Excel spreadsheets are next worst; if you have to use them, keep them really really simple and definitely don&amp;#8217;t use Word Art or embed images to display your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You should always make your data available in HTML.&lt;/strong&gt; Try to make it as clean and regular as you can; use &lt;a href=&quot;http://www.microformats.org/&quot; title=&quot;microformats&quot;&gt;microformats&lt;/a&gt; to indicate information about people, places and events. If you want to push the boat out, use &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot; title=&quot;W3C: RDFa Primer&quot;&gt;RDFa&lt;/a&gt; to mark up the data in your page even more explicitly.&lt;/p&gt;

&lt;p&gt;The great thing about HTML is that it&amp;#8217;s human readable as well as (if you do it well) machine readable. You can also make your data available in explicitly machine-readable forms if you want: XML, JSON, RDF/XML, whatever floats your boat. If there are already standard formats or ontologies for the kind of data that you&amp;#8217;re making available, then use them, certainly, but it&amp;#8217;s very likely that there aren&amp;#8217;t. And in comparison to the nightmare of extracting anything useful from a PDF, it&amp;#8217;s easy to transform between different formats, so you only have to concern yourself with different formats if you want to.&lt;/p&gt;

&lt;p&gt;If you do provide multiple formats for your data, you should use server-driven content negotiation to deliver the data in an appropriate format to whatever&amp;#8217;s requesting it. So a web browser will request HTML; a semantic web crawler will request RDF/XML; a JavaScript program will request JSON and so on. The &lt;code&gt;200 OK&lt;/code&gt; response that the web server sends with your data should include a &lt;code&gt;Content-Location&lt;/code&gt; header that gives the representation URI of whichever format is being returned, and a &lt;code&gt;Vary&lt;/code&gt; header that tells caches how it&amp;#8217;s decided which representation to serve up.&lt;/p&gt;
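
&lt;p&gt;A simplified sketch of server-driven negotiation on the &lt;code&gt;Accept&lt;/code&gt; header might look like this. The extension mapping, the function names and the choice of &lt;code&gt;406 Not Acceptable&lt;/code&gt; when nothing matches are my assumptions; a real web server will normally do this for you, and will handle details (such as &lt;code&gt;text/*&lt;/code&gt; wildcards) that this ignores.&lt;/p&gt;

```python
# Hypothetical mapping from media type to representation-URI extension.
REPRESENTATIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "application/json": ".json",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((fields[0], q))
    # sorted() is stable, so equal q values keep the order they were sent in.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(document_uri: str, accept: str) -> tuple[int, dict[str, str]]:
    """Pick a representation of document_uri; return (status, headers)."""
    if not accept.strip():
        accept = "*/*"  # no preference stated: anything will do
    for media_type, q in parse_accept(accept):
        if q == 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            media_type = "text/html"  # default to the human-readable form
        extension = REPRESENTATIONS.get(media_type)
        if extension is not None:
            return 200, {
                "Content-Type": media_type,
                "Content-Location": document_uri + extension,
                "Vary": "Accept",
            }
    return 406, {"Vary": "Accept"}


print(negotiate("/doc/school/100000", "text/html;q=0.5, application/rdf+xml"))
```

&lt;p&gt;The &lt;code&gt;Vary: Accept&lt;/code&gt; header is what tells a cache that the response it stores for a document URI depends on the &lt;code&gt;Accept&lt;/code&gt; header of the request.&lt;/p&gt;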

&lt;h2&gt;Expose&lt;/h2&gt;

&lt;p&gt;All the good work identifying resources and representing them comes to naught if you don&amp;#8217;t expose it. You can (and should!) tell other people about the URIs that you&amp;#8217;ve developed, but the best way to give them exposure is to use them yourself, within your website. &lt;strong&gt;Simply using the URIs within your website gives them exposure. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt; People who are interested in linking to you will look at your site and they will learn about your URI scheme from your use of it.&lt;/p&gt;

&lt;p&gt;The identifier URIs that you&amp;#8217;ve created might not be particularly easy to generate. For example, with the URI scheme that I suggested above for Companies House, unless you happen to know that Tesco Plc&amp;#8217;s registered company number is &lt;code&gt;00445790&lt;/code&gt;, you&amp;#8217;re not going to be able to get to information about them. So &lt;strong&gt;you should have a way of searching&lt;/strong&gt; based on something that people &lt;em&gt;will&lt;/em&gt; know, such as the name of the company. Use an HTML search form that makes GET requests like&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.companieshouse.co.uk/company?name=Tesco+Plc
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The response should be a &lt;code&gt;302 Found&lt;/code&gt; that redirects (using the &lt;code&gt;Location&lt;/code&gt; header) to the true identifier URI for the company (&lt;code&gt;http://www.companieshouse.co.uk/id/company/00445790&lt;/code&gt;). If it&amp;#8217;s not possible to identify a single resource from the search string (for example, there are lots of companies with &amp;#8216;Tesco&amp;#8217; in their name), then the correct response is a &lt;code&gt;300 Multiple Choices&lt;/code&gt; that provides a list of links to the possible URIs (in HTML).&lt;/p&gt;
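
&lt;p&gt;To make the search behaviour concrete, here is a small sketch. The second company and its number are placeholders invented for the example, the matching rule (exact name first, then substring) is an assumption, and so is returning &lt;code&gt;404 Not Found&lt;/code&gt; when nothing matches.&lt;/p&gt;

```python
from urllib.parse import urlencode

ID_BASE = "/id/company/"

# Tesco Plc's number comes from the post; the second entry is a made-up placeholder.
COMPANIES = {
    "Tesco Plc": "00445790",
    "Tesco Example Ltd": "00000000",
}


def search(name: str) -> tuple[int, dict[str, str], list[str]]:
    """Return (status, headers, candidate identifier URIs) for a name search."""
    needle = name.strip().lower()
    exact = [n for n in COMPANIES if n.lower() == needle]
    matches = exact or [n for n in COMPANIES if needle in n.lower()]
    uris = [ID_BASE + COMPANIES[n] for n in sorted(matches)]
    if not uris:
        return 404, {}, []
    if len(uris) == 1:
        # One company identified: redirect to its identifier URI.
        return 302, {"Location": uris[0]}, []
    # Several candidates: the body would list these as HTML links.
    return 300, {}, uris


# What an HTML form using GET puts in the query string:
print(urlencode({"name": "Tesco Plc"}))  # name=Tesco+Plc
print(search("Tesco Plc"))
print(search("tesco"))
```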

&lt;p&gt;There are other ways to help people find your data. If there aren&amp;#8217;t gazillions of resources, you can list the URIs within your &lt;strong&gt;sitemap&lt;/strong&gt;, which will make them discoverable by search engines. You can also list them on web pages and, especially for data that&amp;#8217;s constantly updating, in (Atom) &lt;strong&gt;feeds&lt;/strong&gt; which you link to from your HTML pages. Use metadata within the pages and feeds to help the consumers of your data work out what&amp;#8217;s relevant to them.&lt;/p&gt;

&lt;p&gt;To help even more, slice your Atom feeds into portions that different consumers of your data are going to be interested in. Slice by type, by area, by subject. That way people can stay up to date with just the resources that they&amp;#8217;re interested in, and not be bothered with information about those that are irrelevant to them.&lt;/p&gt;

&lt;h2&gt;That&amp;#8217;s It&lt;/h2&gt;

&lt;p&gt;What I&amp;#8217;ve tried to describe here is the minimum that you need to do to help people use the information you have, and some of the other things that you can do to make it even more useful. Here are some things that you shouldn&amp;#8217;t do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;don&amp;#8217;t wait for someone else to define a URI scheme for the things that you want to talk about&lt;/li&gt;
&lt;li&gt;don&amp;#8217;t wait for someone else to define an XML schema or RDF ontology for your data&lt;/li&gt;
&lt;li&gt;don&amp;#8217;t wait until you can find the time and money to do it all &amp;#8220;properly&amp;#8221;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just do what you can, now.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/100#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/43">ukgc09</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Sun, 01 Feb 2009 09:28:57 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">100 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>The Distributed Web</title>
 <link>http://www.jenitennison.com/blog/node/90</link>
 <description>&lt;p&gt;XTech was subtitled &amp;#8220;the mobile web&amp;#8221;, but one of the major themes for me was that of &lt;strong&gt;the distributed web&lt;/strong&gt;. The &lt;a href=&quot;http://assets.expectnation.com/15/event/3/Why%20%22open%22%20matters%20—%20from%20innovation%20to%20commoditisation%20Paper%201.pdf&quot; title=&quot;XTech 2008: Why &amp;quot;open&amp;quot; matters — from innovation to commoditisation&quot;&gt;first keynote&lt;/a&gt;, by &lt;a href=&quot;http://www.gardeviance.org/about-me&quot; title=&quot;Simon Wardley&quot;&gt;Simon Wardley&lt;/a&gt;, gave a vision of a future in which hardware, frameworks and applications are services in the cloud rather than products on machines we own: where we use &lt;a href=&quot;http://www.flickr.com/&quot; title=&quot;flickr&quot;&gt;flickr&lt;/a&gt; to store our photographs, &lt;a href=&quot;http://code.google.com/appengine/&quot; title=&quot;Google App Engine&quot;&gt;Google App Engine&lt;/a&gt; to host our applications, and &lt;a href=&quot;http://www.amazon.com/gp/browse.html?node=16427261&quot; title=&quot;Amazon Simple Storage Service&quot;&gt;Amazon S3&lt;/a&gt; to store our data. In &lt;a href=&quot;http://www.davidrecordon.com/&quot; title=&quot;David Recordon&quot;&gt;David Recordon&lt;/a&gt;&amp;#8217;s keynote (&lt;a href=&quot;http://adactio.com/journal/1461/&quot; title=&quot;Adactio: David Recordon’s XTech keynote&quot;&gt;written up by Jeremy Keith&lt;/a&gt;), he talked about small, specific services provided by sites that aren&amp;#8217;t &amp;#8220;destination sites&amp;#8221;. The same picture was painted by &lt;a href=&quot;http://morethanseven.net/&quot; title=&quot;Gareth Rushgrove&quot;&gt;Gareth Rushgrove&lt;/a&gt; in his talk on &lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/549&quot; title=&quot;XTech 2008: Design Strategies for a Distributed Web&quot;&gt;Design Strategies for a Distributed Web&lt;/a&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;So I was surprised at how contentious &lt;a href=&quot;http://www.cwi.nl/~steven/&quot; title=&quot;Steven Pemberton&quot;&gt;Steven Pemberton&lt;/a&gt;&amp;#8217;s talk on &lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/545&quot; title=&quot;XTech 2008: Why you should have a Website&quot;&gt;Why you should have a Website&lt;/a&gt; (thankfully again &lt;a href=&quot;http://adactio.com/journal/1468/&quot; title=&quot;Adactio: Why you should have a Website&quot;&gt;documented by Jeremy Keith&lt;/a&gt;) proved to be. Because to me it seemed to be the logical extension to the distribution of hardware, frameworks and applications: the distribution of data. In fact, I&amp;#8217;ve &lt;a href=&quot;http://www.jenitennison.com/blog/node/60&quot; title=&quot;Jeni&#039;s Musings: A sketch: personal APP servers and feed-based web apps&quot;&gt;written about the same idea myself&lt;/a&gt;, &lt;a href=&quot;http://www.ldodds.com/blog/archives/000330.html&quot; title=&quot;Lost Boy: Google AppEngine for Personal Web Presence?&quot;&gt;as has Leigh Dodds&lt;/a&gt;, more recently.&lt;/p&gt;

&lt;p&gt;From the session, the main question seems to be &amp;#8220;how could we do flickr without them holding our data?&amp;#8221; I don&amp;#8217;t want to particularly pick on flickr, especially because it&amp;#8217;s not one of the worst offenders, but the problem of serving and sharing images does illustrate a whole range of issues, so I will use it as an example. I could just as easily be talking about ancestry.com. The way I see it, you need three levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;providers&lt;/strong&gt; which make information available in known formats&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;user interfaces&lt;/strong&gt; which provide the end-user with a way to access and manipulate the information&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;brokers&lt;/strong&gt; which locate information on the web and provide an aggregated interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(It occurs to me that this is similar to a model/view/controller architecture: the providers give the model, the user interfaces give the views and the brokers control the flow between the two.)&lt;/p&gt;

&lt;p&gt;Where flickr is at the moment is a conglomeration of the three: to have your photo appear on flickr, and to gain the advantages that it gives you in terms of tag-based aggregations and social networking, you have to upload it. They are then the provider of the image+metadata (perhaps the only place it is located on the web), the user interface on the image+metadata (the interface through which the image is annotated), and the broker (they provide keyword-based retrieval, for example).&lt;/p&gt;

&lt;p&gt;What would it look like to separate those functions?&lt;/p&gt;

&lt;p&gt;First, you, as the owner of the image+metadata, could put your data anywhere: on a home wireless network box, on a webserver hosted by an ISP of your choice, on a site specifically designed for hosting photos. Your data is exposed to the larger web through a standard read/write protocol (I&amp;#8217;m betting on &lt;a href=&quot;http://tools.ietf.org/html/rfc5023&quot; title=&quot;RFC 5023: The Atom Publishing Protocol&quot;&gt;AtomPub&lt;/a&gt;) that allows you to provide metadata both about resources and the links between resources. The point of it being read/write is that it allows other people to add metadata to or links from your resource to others, such as adding a comment on your image.&lt;/p&gt;

&lt;p&gt;Second, an information broker would locate your photos by crawling for them (or perhaps by you submitting the URL somewhere, but mostly that shouldn&amp;#8217;t be necessary). There are already information brokers around: Google provides a &lt;a href=&quot;http://code.google.com/apis/ajaxsearch/documentation/#fonje&quot; title=&quot;Google AJAX Search API&quot;&gt;RESTful API for general search results&lt;/a&gt;, &lt;a href=&quot;http://developer.yahoo.com/search/&quot; title=&quot;Yahoo Search Web Services&quot;&gt;as does Yahoo!&lt;/a&gt;; at XTech, &lt;a href=&quot;http://dowhatimean.net/&quot; title=&quot;Richard Cyganiak&quot;&gt;Richard Cyganiak&lt;/a&gt; talked about &lt;a href=&quot;http://sindice.com/&quot; title=&quot;Sindice&quot;&gt;Sindice&lt;/a&gt;, and &lt;a href=&quot;http://sw.deri.org/~aidanh/&quot; title=&quot;Aidan Hogan&quot;&gt;Aidan Hogan&lt;/a&gt; about the &lt;a href=&quot;http://www.swse.org/&quot; title=&quot;Semantic Web Search Engine&quot;&gt;Semantic Web Search Engine&lt;/a&gt;, both of which crawl for RDF triples and provide an API for querying the results. In an AtomPub-based environment, you&amp;#8217;d want an information broker that located Atom feeds and resources, indexed them, and provided an AtomPub-based API for publishers to use.&lt;/p&gt;

&lt;p&gt;Third, a user interface would provide an attractive and usable front-end that brought together many different sets of information. For example, flickr might combine your friends feed with an image search to provide a view of images recently made available by your friends. There&amp;#8217;s no requirement for your friends to use flickr for this to work: flickr queries a broker for a list of your friends, then queries a broker for images by a particular person; the broker searches its index and points the application to the original resources that are provided by your friends.&lt;/p&gt;

&lt;p&gt;A user interface has another role, though: to add to the web. Flickr wants to make it easy to add tags to photos, to create sets and collections that help you navigate your photos, for others to add comments and so on and on. And that&amp;#8217;s fine, because AtomPub is a read/write API. To add a tag to a photo, flickr simply edits the resource with PUT. To add a comment, it locates the comment feed (which would be referenced from the entry for the particular image) and POSTs to create a new resource. And everyone can see those changes &amp;#8212; the added value that you get from a social network.&lt;/p&gt;
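
&lt;p&gt;As a sketch of the comment case, this builds (but doesn&amp;#8217;t send) the POST that a user interface might make to a comment feed. The feed URI, the function name and the contents of the entry are assumptions; a complete Atom entry also needs an id and an updated timestamp, which this leaves for the server to fill in.&lt;/p&gt;

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def comment_request(comment_feed_uri: str, author: str, text: str) -> urllib.request.Request:
    """Build a POST that asks the comment feed to create a new entry."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "Comment"
    author_element = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author_element, f"{{{ATOM}}}name").text = author
    ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "text"}).text = text
    body = ET.tostring(entry, encoding="utf-8")
    return urllib.request.Request(
        comment_feed_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


# Hypothetical comment feed for one photo; nothing is sent over the network here.
request = comment_request("http://example.org/photos/1/comments", "Jeni", "Lovely photo")
print(request.get_method(), request.full_url)
```

&lt;p&gt;Passing the request to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; would actually send it; adding a tag would be the same idea with &lt;code&gt;PUT&lt;/code&gt; and the URI of the edited entry.&lt;/p&gt;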

&lt;p&gt;None of this is to say that a single application can&amp;#8217;t act as provider, broker and user interface at the same time, but I&amp;#8217;m certain that users will favour those applications that do &lt;em&gt;all&lt;/em&gt; of each role: provide to the whole web, broker the whole web, provide a user interface to the whole web. Flickr is almost there, but it doesn&amp;#8217;t do the whole brokering job because it only brokers the data it provides, and therefore it doesn&amp;#8217;t do the whole user interface job.&lt;/p&gt;

&lt;p&gt;This distributed web is a clear win, particularly for users, over walled gardens. They can switch from user interface to user interface, even use more than one at a time (perhaps one application is good for browsing while another is good for categorising), without any cost. They can choose who to use to serve their information on the basis of things that matter when you&amp;#8217;re serving information (low downtime, backups, security, etc.) rather than on how pretty an interface looks or how much functionality it gives you. On the other side of the equation, applications get to do one thing and do it well.&lt;/p&gt;

&lt;p&gt;It seems to me that this is simply how the web works, and the questions we should be asking are about privacy and trust and licensing and revenue models and standards development.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/90#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/39">xtech2008</category>
 <pubDate>Sun, 11 May 2008 21:07:29 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">90 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Partial implementations #2: XSLT in Google Search Appliance</title>
 <link>http://www.jenitennison.com/blog/node/64</link>
 <description>&lt;p&gt;A &lt;a href=&quot;http://www.google.com/enterprise/gsa/&quot; title=&quot;Google Search Appliance&quot;&gt;Google Search Appliance&lt;/a&gt; (GSA) is a box that you plug into your network which crawls and indexes your data, and serves up the results of searches. Search results come in an XML format, and there&amp;#8217;s a built in XSLT engine which means you can convert that XML into as many different views as you like. So you can have HTML-based search results, summaries, feeds, and so on.&lt;/p&gt;

&lt;p&gt;My task recently was to debug some XSLT that transformed the GSA XML into an Atom feed. Easy enough, right? The GSA &lt;a href=&quot;http://code.google.com/apis/searchappliance/documentation/46/xml_reference.html#results_xml&quot; title=&quot;Google Search Appliance Documentation: XML Results Reference&quot;&gt;XML format&lt;/a&gt; is pretty hideous &amp;#8212; most of the elements max out at three capital letters in length (whatever happened to human-readability?) &amp;#8212; but logical enough, and the mapping is hardly complex.&lt;/p&gt;

&lt;p&gt;But all was not as it seemed. The GSA&amp;#8217;s XSLT implementation is&amp;#8230; how can I put this politely?&amp;#8230; &amp;#8220;non-standard&amp;#8221;. This post describes some of the problems and workarounds.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;To get the GSA to use your own XSLT, you have to go through its web interface. Basically there&amp;#8217;s a form with a text field in which you can type your XSLT. Or you can upload a file that you develop offline. Naturally you&amp;#8217;re going to do the latter because it means you can use your favourite editor with helpful things like syntax highlighting and validation-as-you-type, but of course that means switching between web browser windows and your IDE as you develop.&lt;/p&gt;

&lt;p&gt;So I upload the transformation, point the browser at a relevant search page, and&amp;#8230; oh&amp;#8230;&lt;/p&gt;

&lt;p&gt;When the GSA doesn&amp;#8217;t like the XSLT that you use, you get a really helpful error message. It says:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So you know that there&amp;#8217;s been an error. With the server. Internally.&lt;/p&gt;

&lt;p&gt;Back to basics, I thought. Let&amp;#8217;s find out what processor the server&amp;#8217;s using. Then we can develop on that processor and be pretty sure the resulting XSLT will work. So I load up the default XSLT (which is used to create an HTML result) and add the line&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:value-of select=&quot;system-property(&#039;xsl:vendor&#039;)&quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Save the XSLT, reload the page, and&amp;#8230;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Okaaay&amp;#8230; so this is an XSLT processor that doesn&amp;#8217;t support the &lt;code&gt;xsl:vendor&lt;/code&gt; system property. If it doesn&amp;#8217;t support that, I&amp;#8217;m going to have to tread carefully. So let&amp;#8217;s start with something really simple:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;1.0&quot;
   xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;xsl:copy-of select=&quot;.&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Save the XSLT, reload the page, and&amp;#8230;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On a whim, I tried&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;xsl:copy-of select=&quot;.&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;instead. Save the XSLT, reload the page, and&amp;#8230; Success!&lt;/p&gt;

&lt;p&gt;Can you spot the difference? Yes, that&amp;#8217;s right: it&amp;#8217;s the order of the XSLT namespace declaration and the version attribute. Namespace declaration first, you&amp;#8217;re OK, version first, you&amp;#8217;re not.&lt;/p&gt;

&lt;p&gt;Okaaay&amp;#8230; so this is an XSLT processor that doesn&amp;#8217;t support the XML Recommendation (which says that attribute order doesn&amp;#8217;t matter). But heck, why split hairs? At least it&amp;#8217;s working! Now to create some Atom instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;
   xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;feed /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Save the XSLT, reload the page, and we&amp;#8217;re back to&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Internal server error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At least there&amp;#8217;s some &lt;a href=&quot;http://code.google.com/apis/searchappliance/documentation/46/xml_reference.html#results_xslt&quot; title=&quot;Google Search Appliance Documentation: Custom HTML&quot;&gt;documentation&lt;/a&gt; about this one:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;XSL stylesheets that include other files may not be used with the Google search engine. An XSL stylesheet that contains the following tags generates an error result:&lt;/p&gt;
  
  &lt;ul&gt;
  &lt;li&gt;&lt;code&gt;&amp;lt;xsl:import&amp;gt;&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;&amp;lt;xsl:include&amp;gt;&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;xmlns:&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;document()&lt;/code&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Read that again. Yes, the third bullet point. That&amp;#8217;s right, it&amp;#8217;s saying that an XSLT that contains a namespace declaration will generate an error result because it &amp;#8220;includes other files&amp;#8221;.&lt;/p&gt;

&lt;p&gt;But, but, but, namespace declarations in XSLT stylesheets (or elsewhere for that matter) do not indicate file inclusion. Namespace URIs are &lt;em&gt;identifiers&lt;/em&gt;, not &lt;em&gt;locations&lt;/em&gt;. They are strings. They are not resolved. You do not need to be connected to the &amp;#8216;net to use them.&lt;/p&gt;

&lt;p&gt;And how am I supposed to serve an Atom feed, since Atom documents use a namespace? Or XHTML for that matter? Fortunately, the GSA only goes so far in banning namespace declarations: you&amp;#8217;re OK as long as you don&amp;#8217;t put them on the &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt; element. Move the declaration to the &lt;code&gt;&amp;lt;feed&amp;gt;&lt;/code&gt; element, as in&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you&amp;#8217;re OK. Of course you have to repeat the namespace declaration in every template so you don&amp;#8217;t end up creating elements in no namespace. Tedious, oh so tedious, but workable.&lt;/p&gt;
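
&lt;p&gt;As a rough sketch of what that repetition looks like once there&amp;#8217;s more than one template (the &lt;code&gt;GSP&lt;/code&gt;, &lt;code&gt;RES&lt;/code&gt;, &lt;code&gt;R&lt;/code&gt;, &lt;code&gt;T&lt;/code&gt; and &lt;code&gt;U&lt;/code&gt; element names are my assumption about the shape of the results XML, and the output is deliberately not a complete, valid Atom feed):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; 
   version=&quot;1.0&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
  &amp;lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
    &amp;lt;title&amp;gt;Search results&amp;lt;/title&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;GSP/RES/R&quot; /&amp;gt;
  &amp;lt;/feed&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;!-- The declaration has to be repeated here: without it, &amp;lt;entry&amp;gt;
     and its children would be created in no namespace --&amp;gt;
&amp;lt;xsl:template match=&quot;R&quot;&amp;gt;
  &amp;lt;entry xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
    &amp;lt;title&amp;gt;&amp;lt;xsl:value-of select=&quot;T&quot; /&amp;gt;&amp;lt;/title&amp;gt;
    &amp;lt;link href=&quot;{U}&quot; /&amp;gt;
  &amp;lt;/entry&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The default namespace declared on a literal result element applies to the literal result elements inside it, but not to the XPath expressions, so the &lt;code&gt;select&lt;/code&gt; and &lt;code&gt;match&lt;/code&gt; attributes still address the no-namespace source elements.&lt;/p&gt;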

&lt;p&gt;(I have a vague suspicion that the idea behind banning namespace declarations is something to do with certain XSLT processors using namespace URIs to pull in Java classes. But addressing that problem by banning namespace declarations entirely isn&amp;#8217;t just throwing the baby out with the bathwater, it&amp;#8217;s throwing the whole bathroom suite out of the window. And if you then allow namespace declarations further down the stylesheet, you haven&amp;#8217;t actually solved the problem.)&lt;/p&gt;

&lt;p&gt;Amazingly enough, given the inauspicious beginning, everything else I tried actually worked. I suspect that it&amp;#8217;s some standard XSLT processor underneath with a regex based filter that (among other things) limits what&amp;#8217;s allowed in the &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt; start tag. They probably disallow &lt;code&gt;system-property(&#039;xsl:vendor&#039;)&lt;/code&gt; for security &amp;#8212; knowledge is power, after all.&lt;/p&gt;

&lt;p&gt;Anyway, my suggestions to others who might want to create a customised XSLT processor:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use a custom URL resolver to restrict access to documents.&lt;/li&gt;
&lt;li&gt;Restrict external function calls using something like the &lt;code&gt;ALLOW_EXTERNAL_FUNCTIONS&lt;/code&gt; property in JAXP.&lt;/li&gt;
&lt;li&gt;Document the restrictions you&amp;#8217;re placing on the stylesheets.&lt;/li&gt;
&lt;li&gt;Produce meaningful error messages that explain the extra restrictions when they&amp;#8217;re broken.&lt;/li&gt;
&lt;/ol&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/64#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/19">google</category>
 <pubDate>Fri, 23 Nov 2007 22:22:19 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">64 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Incomplete implementations #1: Atom in IE7</title>
 <link>http://www.jenitennison.com/blog/node/63</link>
 <description>&lt;p&gt;IE7 gives you a really quite nice view of an Atom feed. Take a look at the &lt;a href=&quot;http://www.jenitennison.com/blog/atom/feed&quot; title=&quot;Jeni&#039;s Musings: Atom feed&quot;&gt;one for this blog&lt;/a&gt;, for example. You can filter by category, sort by date or title or author, and search for particular words or phrases. Pretty neat.&lt;/p&gt;

&lt;p&gt;But it&amp;#8217;s only a partial implementation. I&amp;#8217;ve been having to create some Atom feeds recently, and getting them to display nicely in IE7 has proven a bit tricky. I couldn&amp;#8217;t find any documentation about this with a quick google, so thought I&amp;#8217;d blog it for future reference.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;The main things are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Do not use a prefix for the Atom namespace. IE7&amp;#8217;s Atom support is not namespace aware. (I gather this is a problem in a lot of Atom readers. Sigh.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you don&amp;#8217;t embed the content in the entry then it will display the summary; to get a clickable link to the content, you must provide a &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; element whose &lt;code&gt;href&lt;/code&gt; attribute points to that content. The &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; element mustn&amp;#8217;t have any other attributes on it, so don&amp;#8217;t add &lt;code&gt;rel=&quot;alternate&quot;&lt;/code&gt; or &lt;code&gt;type=&quot;application/xhtml+xml&quot;&lt;/code&gt;. Having a &lt;code&gt;src&lt;/code&gt; on &lt;code&gt;&amp;lt;content&amp;gt;&lt;/code&gt; will not give you a link.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
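
&lt;p&gt;A minimal sketch of a feed that follows both rules (the URLs, names and dates are invented for illustration, and I&amp;#8217;m assuming nothing about IE7 beyond the two points above):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
  &amp;lt;title&amp;gt;Example feed&amp;lt;/title&amp;gt;
  &amp;lt;id&amp;gt;http://www.example.com/feed&amp;lt;/id&amp;gt;
  &amp;lt;updated&amp;gt;2007-11-23T20:42:28Z&amp;lt;/updated&amp;gt;
  &amp;lt;author&amp;gt;&amp;lt;name&amp;gt;Example Author&amp;lt;/name&amp;gt;&amp;lt;/author&amp;gt;
  &amp;lt;entry&amp;gt;
    &amp;lt;title&amp;gt;An entry without embedded content&amp;lt;/title&amp;gt;
    &amp;lt;id&amp;gt;http://www.example.com/entries/1&amp;lt;/id&amp;gt;
    &amp;lt;updated&amp;gt;2007-11-23T20:42:28Z&amp;lt;/updated&amp;gt;
    &amp;lt;summary&amp;gt;Displayed in place of the content.&amp;lt;/summary&amp;gt;
    &amp;lt;!-- href only: no rel or type attributes --&amp;gt;
    &amp;lt;link href=&quot;http://www.example.com/entries/1.html&quot; /&amp;gt;
  &amp;lt;/entry&amp;gt;
&amp;lt;/feed&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Atom namespace is declared as the default namespace, so no element carries a prefix, and a &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; with no &lt;code&gt;rel&lt;/code&gt; is treated as &lt;code&gt;alternate&lt;/code&gt; by the Atom spec anyway.&lt;/p&gt;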

&lt;p&gt;If you want to support viewing the feed in IE6 as well, you need to provide a stylesheet to transform it into something nice. As long as the feed is served as XML, IE6 will interpret an &lt;code&gt;&amp;lt;?xml-stylesheet?&amp;gt;&lt;/code&gt; PI correctly, and both IE7 and Firefox will ignore it (which is probably what you want, since both Firefox and IE7 have decent native rendering of Atom).&lt;/p&gt;
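
&lt;p&gt;For reference, the PI sits between the XML declaration and the &lt;code&gt;&amp;lt;feed&amp;gt;&lt;/code&gt; element, something like this (the stylesheet path is a made-up example):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&amp;gt;
&amp;lt;?xml-stylesheet type=&quot;text/xsl&quot; href=&quot;/styles/atom-to-html.xsl&quot;?&amp;gt;
&amp;lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
  ...
&amp;lt;/feed&amp;gt;
&lt;/code&gt;&lt;/pre&gt;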
</description>
 <comments>http://www.jenitennison.com/blog/node/63#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <pubDate>Fri, 23 Nov 2007 20:42:28 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">63 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>A sketch: personal APP servers and feed-based web apps</title>
 <link>http://www.jenitennison.com/blog/node/60</link>
 <description>&lt;p&gt;OK, so I can&amp;#8217;t remain a Luddite for long. What&amp;#8217;s a technological solution to the &lt;a href=&quot;http://www.jenitennison.com/blog/node/59&quot; title=&quot;Jeni&#039;s Musings: Posterity&quot;&gt;posterity problem&lt;/a&gt;, in particular in regard to web applications that tuck away all your data in their databases, just waiting to be forgotten?&lt;/p&gt;

&lt;p&gt;Well, what if web applications accepted information as feeds rather than through forms? The original data would be distributed rather than centralised. Web applications would use the web as more than a distribution medium: they would be &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/10/20/IfYouFightTheWebYouWillLose.aspx&quot; title=&quot;Dare Obasanjo: If You Fight the Web You Will Lose&quot;&gt;&lt;em&gt;of&lt;/em&gt; the web rather than simply &lt;em&gt;on&lt;/em&gt; the web&lt;/a&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;How it would work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Keep your data on your computer (in XML where possible). Use whatever tool you like to create and edit it (by hand, using a dedicated standalone application, using a browser-based application in the manner of &lt;a href=&quot;http://www.tiddlywiki.com/&quot; title=&quot;TiddlyWiki&quot;&gt;TiddlyWiki&lt;/a&gt;, or however), in some common markup language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Serve your data through a web server that supports &lt;a href=&quot;http://www.ietf.org/rfc/rfc5023.txt&quot; title=&quot;The Atom Publishing Protocol&quot;&gt;APP&lt;/a&gt;, and provide a feed that exposes the data. I&amp;#8217;m not saying that this is easy for &lt;a href=&quot;http://en.wikipedia.org/wiki/Placeholder_name#People&quot; title=&quot;Wikipedia: Placeholder names for people&quot;&gt;Joe Bloggs&lt;/a&gt; to do now, but if we&amp;#8217;re talking about having &lt;a href=&quot;http://dubinko.info/blog/2006/06/04/would-you-run-a-web-server-on-your-phone/&quot; title=&quot;Micah Dubinko:  Would you run a web server on your phone?&quot;&gt;web servers on mobile phones&lt;/a&gt; then surely it&amp;#8217;s not long before having a personal web server is a matter of course, and why not with APP support? Feeds could be generated based on directory structures, or simply created as the main file format for a particular application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visit the web application that you want to use and point them at your data feed. They access, store and index your data, and relate it to the other data that they have stored from other people&amp;#8217;s feeds. It&amp;#8217;s their responsibility to keep their database up to date by doing a regular crawl of the feeds they know about; they don&amp;#8217;t have to store &lt;em&gt;all&lt;/em&gt; your data, just the bits that enable them to do their job.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Edit your data through the web application&amp;#8217;s interface; it can update your data for you using APP. (The web application is an APP &lt;em&gt;client&lt;/em&gt;, and you host an APP &lt;em&gt;server&lt;/em&gt;.) That can include adding comments or whatever other community-level annotations you might expect.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
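
&lt;p&gt;To make that a little more concrete, a personal APP server might advertise its feeds with a service document along these lines. This is only a sketch: the host name and the two collections are invented, and which collections a real server would offer is anyone&amp;#8217;s guess.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;service xmlns=&quot;http://www.w3.org/2007/app&quot;
         xmlns:atom=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
  &amp;lt;workspace&amp;gt;
    &amp;lt;atom:title&amp;gt;Personal data&amp;lt;/atom:title&amp;gt;
    &amp;lt;!-- one collection (feed) per kind of data --&amp;gt;
    &amp;lt;collection href=&quot;http://home.example.com/photos/&quot;&amp;gt;
      &amp;lt;atom:title&amp;gt;Photos&amp;lt;/atom:title&amp;gt;
      &amp;lt;accept&amp;gt;image/*&amp;lt;/accept&amp;gt;
    &amp;lt;/collection&amp;gt;
    &amp;lt;collection href=&quot;http://home.example.com/family-tree/&quot;&amp;gt;
      &amp;lt;atom:title&amp;gt;Family tree&amp;lt;/atom:title&amp;gt;
      &amp;lt;accept&amp;gt;application/atom+xml;type=entry&amp;lt;/accept&amp;gt;
    &amp;lt;/collection&amp;gt;
  &amp;lt;/workspace&amp;gt;
&amp;lt;/service&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The web application would read each collection&amp;#8217;s &lt;code&gt;href&lt;/code&gt; as the feed to crawl, and send its edits back to the same URLs.&lt;/p&gt;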

&lt;p&gt;Why bother?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You have all your data locally (although it might be mirrored elsewhere).&lt;/li&gt;
&lt;li&gt;You can edit your data in whatever tool you want to use, but still use a funky &amp;#8220;rich internet application&amp;#8221; to view it and share it.&lt;/li&gt;
&lt;li&gt;You can provide the same data feed to any number of web applications; you&amp;#8217;re not locked in.&lt;/li&gt;
&lt;li&gt;Because it&amp;#8217;s stored locally, your data is available for viewing and editing even when you&amp;#8217;re offline.&lt;/li&gt;
&lt;li&gt;Because the important parts are cached by a web application, your data is available for viewing and editing even when you&amp;#8217;re away from your normal computer.&lt;/li&gt;
&lt;li&gt;Your friends can access your data directly in a peer-to-peer network.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The obvious problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Setup: Web servers aren&amp;#8217;t that easy to set up and maintain at the moment, because they&amp;#8217;re designed for the use of sysadmins who are quite happy hacking text-based configuration files. That could change (look at WiFi router setup nowadays compared to how it was a few years ago).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security: you might not want everyone to be able to access the data you keep on your personal web server. So you&amp;#8217;ll need a way of assigning user names and passwords to feeds, and only handing those over to web applications that you trust.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Server load: it&amp;#8217;s not so much DoS attacks (which tend to have big juicy targets), but the fact you&amp;#8217;re more likely to experience large volumes of requests if, say, a picture you host suddenly gets popular, and might not be able to respond to them, or might have to start paying heavily for the bandwidth. So sites that offer hosting will still be really useful, particularly for media that&amp;#8217;s (a) large and (b) likely to be embedded in other people&amp;#8217;s pages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Schemas: you can&amp;#8217;t split creation of XML from processing of XML unless you have some kind of mutual understanding about the markup language that&amp;#8217;s used. This used to seem like a major stumbling block to me, but in fact people seem to be sensible enough to use standards where there are standards, write stylesheets to convert between languages, extend what exists with what they need in more-or-less acceptable ways and generally muddle through without spending years in meetings thrashing out a single consensual model. And Atom and APP are pretty good at supporting extensions and so on that would make that work.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My hope is that we&amp;#8217;ll get round to trying some of these ideas out in our &lt;a href=&quot;http://www.jenitennison.com/blog/node/54&quot; title=&quot;Jeni&#039;s Musings: Web 2.0 Project&quot;&gt;genealogy-based Web 2.0 project&lt;/a&gt;.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/60#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Fri, 19 Oct 2007 21:27:58 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">60 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Web 2.0 Project: Using Atom and XML with Graph Data Structures</title>
 <link>http://www.jenitennison.com/blog/node/54</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.louisecrow.com/blog/&quot; title=&quot;Louise Crow&#039;s blog&quot;&gt;A Ruby on Rails specialist friend&lt;/a&gt; and I are building a Web 2.0 application. I would say it&amp;#8217;s &amp;#8220;social networking for the dead&amp;#8221; except that I doubt that description would be attractive to most people (my ex-Goth &lt;a href=&quot;http://en.wikipedia.org/wiki/Domestic_partnership&quot; title=&quot;Wikipedia: domestic partner/common law husband/father of my children etc. etc.&quot;&gt;defacto&lt;/a&gt; being a rare exception), and it can be for the living too. It&amp;#8217;s a bit like &lt;a href=&quot;http://www.ancestry.com/&quot; title=&quot;ancestry.com&quot;&gt;all&lt;/a&gt; &lt;a href=&quot;http://www.familypursuit.com/&quot; title=&quot;familypursuit.com&quot;&gt;those&lt;/a&gt; &lt;a href=&quot;http://www.geni.com/&quot; title=&quot;geni.com&quot;&gt;genealogy&lt;/a&gt; websites, except that our focus is on people&amp;#8217;s social relationships as well as their familial ones.&lt;/p&gt;

&lt;p&gt;(I should say that this is all very casual. We&amp;#8217;re both fitting it in around our other responsibilities, and are mainly interested in working together, learning new things, and trying out all the best practices that everyone keeps talking about. So don&amp;#8217;t think I&amp;#8217;m becoming a dotcom entrepreneur or anything. It&amp;#8217;s got a very Web 2.0 name, and I&amp;#8217;m only not telling you in case you start hitting our servers. We&amp;#8217;re nowhere near ready for visitors.)&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;We&amp;#8217;re using the &lt;a href=&quot;http://www.ngsgenealogy.org/ngsgentech/projects/Gdm/Gdm.cfm&quot; title=&quot;GENTECH genealogical data model&quot;&gt;Gentech data model&lt;/a&gt; as the basis for the application (though I expect that we&amp;#8217;ll tweak it a bit). You don&amp;#8217;t really need to know anything about it to follow what I&amp;#8217;m talking about here. The Gentech data model is very much a relational model. They might call it a logical model, but for anyone who &lt;em&gt;isn&amp;#8217;t&lt;/em&gt; a database head, it&amp;#8217;s a physical model. That&amp;#8217;s fine; we&amp;#8217;re storing our data in a database, so a relational model for that is great.&lt;/p&gt;

&lt;p&gt;In the Rails world, the model that Rails works with is object-oriented rather than relational. So there&amp;#8217;s a certain amount of mapping from the relational world into the OO world, in particular eliding the tables that are created simply for normalisation purposes. Making that mapping is one thing that Rails is very good at, of course.&lt;/p&gt;

&lt;p&gt;Then we&amp;#8217;re into the worlds that I&amp;#8217;m particularly interested in. One of our goals is to use &lt;a href=&quot;http://en.wikipedia.org/wiki/Atom_(standard)&quot; title=&quot;Wikipedia: Atom&quot;&gt;Atom&lt;/a&gt; as an API, on the basis that it&amp;#8217;s a fairly generic way of packaging things (entries) and lists-of-things (feeds) with a bunch of metadata. Plus, the &lt;a href=&quot;http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-17.txt&quot; title=&quot;Atom Publishing Protocol&quot;&gt;Atom Publishing Protocol&lt;/a&gt; shows you how to do RESTful applications right.&lt;/p&gt;

&lt;p&gt;The trouble, &lt;a href=&quot;http://code.google.com/apis/gdata/overview.html&quot; title=&quot;Google Data (GData) API&quot;&gt;as others&lt;/a&gt; &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx&quot; title=&quot;Dare Obasanjo: Why GData/APP Fails as a General Purpose Editing Protocol for the Web&quot;&gt;have found&lt;/a&gt;, is that Atom is designed for a flattish structure, in which you have things, and a list of things. Like blog posts and feeds of posts, or pictures and feeds of pictures. But the model that we&amp;#8217;re starting from is relational, or object-oriented, or anyway it&amp;#8217;s a &lt;strong&gt;graph&lt;/strong&gt;. And that makes things more complicated.&lt;/p&gt;

&lt;p&gt;The first steps are pretty obvious. Objects are equivalent to entries, and lists of objects equivalent to feeds. So every object has its own URL, and every significant feed has its own URL too. There&amp;#8217;s the obvious &lt;code&gt;http://www.example.com/people/DarwinC01&lt;/code&gt; for a person, and &lt;code&gt;http://www.example.com/people/&lt;/code&gt; for a feed of people, but also &lt;code&gt;http://www.example.com/people/DarwinC01/events/&lt;/code&gt; for events that are related to a particular person. An entry&amp;#8217;s content is an XML document that describes the equivalent object. It has attributes and children to represent the properties from the OO model (columns in the database tables).&lt;/p&gt;
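
&lt;p&gt;As a sketch of how that mapping might look for the feed of people (the &lt;code&gt;&amp;lt;persona&amp;gt;&lt;/code&gt; content, the author name, the dates and the use of &lt;code&gt;rel=&quot;related&quot;&lt;/code&gt; to point at the events feed are all illustrative assumptions rather than settled design):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:feed xmlns:atom=&quot;http://www.w3.org/2005/Atom&quot;
           xml:base=&quot;http://www.example.com&quot;&amp;gt;
  &amp;lt;atom:id&amp;gt;http://www.example.com/people/&amp;lt;/atom:id&amp;gt;
  &amp;lt;atom:title&amp;gt;People&amp;lt;/atom:title&amp;gt;
  &amp;lt;atom:updated&amp;gt;2007-09-02T19:57:36Z&amp;lt;/atom:updated&amp;gt;
  &amp;lt;atom:author&amp;gt;&amp;lt;atom:name&amp;gt;A. Researcher&amp;lt;/atom:name&amp;gt;&amp;lt;/atom:author&amp;gt;
  &amp;lt;atom:link rel=&quot;self&quot; href=&quot;/people/&quot; /&amp;gt;
  &amp;lt;!-- one entry per object --&amp;gt;
  &amp;lt;atom:entry&amp;gt;
    &amp;lt;atom:id&amp;gt;http://www.example.com/people/DarwinC01&amp;lt;/atom:id&amp;gt;
    &amp;lt;atom:title&amp;gt;Charles Darwin&amp;lt;/atom:title&amp;gt;
    &amp;lt;atom:updated&amp;gt;2007-09-02T19:57:36Z&amp;lt;/atom:updated&amp;gt;
    &amp;lt;!-- the feed of events related to this person --&amp;gt;
    &amp;lt;atom:link rel=&quot;related&quot; href=&quot;/people/DarwinC01/events/&quot; /&amp;gt;
    &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
    &amp;lt;/atom:content&amp;gt;
  &amp;lt;/atom:entry&amp;gt;
&amp;lt;/atom:feed&amp;gt;
&lt;/code&gt;&lt;/pre&gt;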

&lt;p&gt;Atom defines a bunch of metadata that you can associate with the content in an entry. These are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;id&lt;/li&gt;
&lt;li&gt;title&lt;/li&gt;
&lt;li&gt;summary (optional, as long as there&amp;#8217;s textual or XML content)&lt;/li&gt;
&lt;li&gt;updated&lt;/li&gt;
&lt;li&gt;published (optional)&lt;/li&gt;
&lt;li&gt;category (multiple, optional)&lt;/li&gt;
&lt;li&gt;source (optional)&lt;/li&gt;
&lt;li&gt;author (multiple, optional as long as there&amp;#8217;s a source that specifies one or the entry&amp;#8217;s in a feed that specifies one)&lt;/li&gt;
&lt;li&gt;contributor (multiple, optional)&lt;/li&gt;
&lt;li&gt;link (multiple, optional as long as there&amp;#8217;s some content)&lt;/li&gt;
&lt;li&gt;rights (optional, defaults to the feed&amp;#8217;s rights)&lt;/li&gt;
&lt;li&gt;extension elements (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The metadata properties need to be used to indicate who created/updated the object and when. This gets confusing because some of the information in our system is likely to be &lt;em&gt;about&lt;/em&gt; content that has authors and publishing dates and so on: the Gentech data model is strong on documenting the sources of information about people you&amp;#8217;re researching. Even when documenting the source of some information, the Atom metadata should still be metadata about that object in our data model.&lt;/p&gt;

&lt;p&gt;The set of Atom metadata does indicate a place where we&amp;#8217;re going to want to tweak the Gentech data model though: every object should have metadata associated with it, at the very least an updated date, to populate the Atom metadata fields. Also, we need to identify the property of each object that is used in the &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt;, though the title can be something generic if there isn&amp;#8217;t an obvious one.&lt;/p&gt;
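
&lt;p&gt;Here&amp;#8217;s a sketch of how the two layers of metadata might sit side by side for a source object. The URL for the source object, the researcher&amp;#8217;s name and the &lt;code&gt;&amp;lt;author&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;published&amp;gt;&lt;/code&gt; children of &lt;code&gt;&amp;lt;source&amp;gt;&lt;/code&gt; are hypothetical; the point is only where each kind of information lives.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xmlns:atom=&quot;http://www.w3.org/2005/Atom&quot;&amp;gt;
  &amp;lt;atom:id&amp;gt;http://www.example.com/sources/Voyage01&amp;lt;/atom:id&amp;gt;
  &amp;lt;!-- a generic title, for an object with no obvious one --&amp;gt;
  &amp;lt;atom:title&amp;gt;Source: web page&amp;lt;/atom:title&amp;gt;
  &amp;lt;!-- who recorded this object in our system, and when --&amp;gt;
  &amp;lt;atom:updated&amp;gt;2007-09-02T19:57:36Z&amp;lt;/atom:updated&amp;gt;
  &amp;lt;atom:author&amp;gt;&amp;lt;atom:name&amp;gt;A. Researcher&amp;lt;/atom:name&amp;gt;&amp;lt;/atom:author&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot;&amp;gt;
      &amp;lt;!-- who wrote the thing being described, and when --&amp;gt;
      &amp;lt;author&amp;gt;...&amp;lt;/author&amp;gt;
      &amp;lt;published&amp;gt;...&amp;lt;/published&amp;gt;
    &amp;lt;/source&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;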

&lt;p&gt;Now the question that&amp;#8217;s vexing me: how should we represent relationships to other objects/entries? Let&amp;#8217;s take the example of documenting &lt;a href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; title=&quot;About Darwin: HMS Beagle Voyage&quot;&gt;Charles Darwin&amp;#8217;s voyage on HMS Beagle&lt;/a&gt;. It goes something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      ...
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;passenger&amp;gt;&lt;/code&gt; element needs to reference a person and a voyage (event), to say that Darwin was a passenger on the voyage.&lt;/p&gt;

&lt;p&gt;Here are the options, I think:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Use &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt; within the &lt;code&gt;&amp;lt;atom:entry&amp;gt;&lt;/code&gt;, with a URL in &lt;code&gt;rel&lt;/code&gt; that indicates the kind of relationship  &lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-persona&quot;
             href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-event&quot;
             href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use extension elements within the &lt;code&gt;&amp;lt;atom:entry&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;persona href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;event href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, referencing the URLs of the related objects&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona href=&quot;/persona/DarwinC01&quot; /&amp;gt;
      &amp;lt;event href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, embedding the related objects&amp;#8217; Atom entry or feed&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;atom:entry&amp;gt;
          ...
          &amp;lt;atom:title&amp;gt;Charles Darwin&amp;lt;/atom:title&amp;gt;
          &amp;lt;atom:content&amp;gt;
            &amp;lt;persona&amp;gt;
              &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
              ...
            &amp;lt;/persona&amp;gt;
          &amp;lt;/atom:content&amp;gt;
        &amp;lt;/atom:entry&amp;gt;
      &amp;lt;/persona&amp;gt;
      &amp;lt;event&amp;gt;
        &amp;lt;atom:entry&amp;gt;
          ...
          &amp;lt;atom:title&amp;gt;Beagle Voyage&amp;lt;/atom:title&amp;gt;
          &amp;lt;atom:content&amp;gt;
            &amp;lt;event&amp;gt;
              &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
              &amp;lt;date-range&amp;gt;
                &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
                &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
              &amp;lt;/date-range&amp;gt;
              ...
            &amp;lt;/event&amp;gt;
          &amp;lt;/atom:content&amp;gt;
        &amp;lt;/atom:entry&amp;gt;
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, embedding the related objects&amp;#8217; XML content&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
      &amp;lt;event&amp;gt;
        &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
        &amp;lt;date-range&amp;gt;
          &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
          &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
        &amp;lt;/date-range&amp;gt;
        ...
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I don&amp;#8217;t think that there&amp;#8217;s any point in using an extension element (#2), given that using &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt; (#1) situates the information in the same place in a more standard way.&lt;/p&gt;

&lt;p&gt;Embedding information (as in #4 and #5) is a good thing because it means fewer requests to the server in order to get some useful information. Providing access to Atom feeds (as in #1, #3 and #4) is a good thing because it means you can get metadata about who created the referenced objects, and additional information about them. So #4 is good, since it does both these things, but I don&amp;#8217;t like embedding Atom in the XML because it&amp;#8217;s a lot of extra weight in the XML (making it harder to read/process).&lt;/p&gt;

&lt;p&gt;In fact, #1, #3 and #5 aren&amp;#8217;t mutually exclusive. It&amp;#8217;s possible to add relevant &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt;s to the metadata, reference the URLs of the other objects &lt;em&gt;and&lt;/em&gt; embed their content at the same time:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-persona&quot;
             href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-event&quot;
             href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona src=&quot;/persona/DarwinC01&quot;&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
      &amp;lt;event src=&quot;/events/BeagleVoyage&quot;&amp;gt;
        &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
        &amp;lt;date-range&amp;gt;
          &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
          &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
        &amp;lt;/date-range&amp;gt;
        ...
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We embed the core information for easy access (#5), reference its original URI for more details (#3), and then we may as well add the &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt;s (#1) so that run-of-the-mill Atom readers who have no knowledge about our content can do something useful. We &lt;em&gt;don&amp;#8217;t&lt;/em&gt; get the metadata embedded in the XML, but it&amp;#8217;s retrievable: a client could use the entry as a kind of &amp;#8220;low resolution&amp;#8221; information set, which they can add to by retrieving the &amp;#8220;high resolution&amp;#8221; Atom for the referenced objects, via their URLs, as necessary.&lt;/p&gt;

&lt;p&gt;The problem with using an embedding method rather than a referencing method is that the object model is a graph, not a hierarchy. So you can&amp;#8217;t &lt;em&gt;always&lt;/em&gt; embed an object&amp;#8217;s XML: sometimes you have to only use a reference (#3 without #5) to avoid getting into an endless loop of repeated information. As a publisher, sometimes you might &lt;em&gt;want&lt;/em&gt; to only use a reference, because the information is only tangential to the main subject of the original entry. I&amp;#8217;m imagining that we might serve several different Atom entries for the same object, with different amounts of detail. Maybe.&lt;/p&gt;

&lt;p&gt;As an author, creating this XML, you can&amp;#8217;t include a reference if you&amp;#8217;re constructing XML (either in code or by hand) for new objects because they won&amp;#8217;t have URLs yet. Therefore, for the purpose of &lt;em&gt;creating&lt;/em&gt; objects as defined by the Atom Publishing Protocol, you&amp;#8217;ll use embedded XML (#5) with references to existing objects if necessary. The resource returned will include the references for all the created objects. When updating, you&amp;#8217;ll want to include as little as possible aside from the updated information, I imagine (small updates being less prone to clashes than large ones). &lt;/p&gt;
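
&lt;p&gt;So the body of a POST that creates a new passenger assertion might look something like this sketch, in which Darwin is assumed not to exist in the system yet while the voyage does:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xmlns:atom=&quot;http://www.w3.org/2005/Atom&quot;
            xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;!-- new object: no URL yet, so it is embedded with no src --&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
      &amp;lt;!-- existing object: referenced by its URL --&amp;gt;
      &amp;lt;event href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;APP has the server answer a successful create with a 201 status and a &lt;code&gt;Location&lt;/code&gt; header for the new entry; the entry that comes back is where I&amp;#8217;d expect the newly assigned &lt;code&gt;src&lt;/code&gt; URL for the persona to appear.&lt;/p&gt;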

&lt;p&gt;By the way, I&amp;#8217;m using &lt;code&gt;src&lt;/code&gt; attributes when the information is embedded and &lt;code&gt;href&lt;/code&gt; attributes when the information is purely referenced (or almost purely referenced; the referencing elements might still have some content equivalent to the &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt; element, in the interests of presenting a clickable link).&lt;/p&gt;

&lt;p&gt;So that&amp;#8217;s the plan at the moment, but we&amp;#8217;re open to suggestions. Anybody?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/54#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <pubDate>Sun, 02 Sep 2007 19:57:36 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">54 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>URL design for searches and queries</title>
 <link>http://www.jenitennison.com/blog/node/47</link>
 <description>&lt;p&gt;Another &lt;a href=&quot;http://www.dehora.net/journal/2007/08/web_resource_mapping_criteria_for_frameworks.html&quot; title=&quot;Bill de hÓra: Web resource mapping criteria for frameworks&quot;&gt;fascinating post from Bill de hÓra&lt;/a&gt;, this time on URL design for resources:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Let&amp;#8217;s take editing some resource, like a document, and let&amp;#8217;s look at browsers and HTML forms in particular, which don&amp;#8217;t a do a good job of allowing you to cleanly affect resource state. What you would like to do in this suboptimal environment is provide an &amp;#8220;edit-uri&amp;#8221; of some kind. There are basically 5 options for this; here they are going from most to least desirable&lt;/p&gt;
  
  &lt;ol&gt;
  &lt;li&gt;Uniform method. Alter the state by sending a PUT to the document&amp;#8217;s URL. The edit-uri is the resource URL. URL format: http://example.org/document/xyz&lt;/li&gt;
  &lt;li&gt;Function passing. Allow the document resource to accept a function as an argument. URL format: http://example.org/document/xyz?f=edit&lt;/li&gt;
  &lt;li&gt;Surrogate. Create another resource that will accept edits on behalf of the document. URL format: http://example.org/document/xyz/edit&lt;/li&gt;
  &lt;li&gt;CGI/RPC explicit: send a POST to an &amp;#8220;edit-document&amp;#8221; script passing the id of the document as a argument. URL format: http://example.org/edit-document?id=xyz&lt;/li&gt;
  &lt;li&gt;CGI/RPC stateful: send a POST to an &amp;#8220;edit-document&amp;#8221; script and fetch the id of the document from server state, or a cookie. URL format: http://example.org/edit-document&lt;/li&gt;
  &lt;/ol&gt;
&lt;/blockquote&gt;

&lt;!--break--&gt;

&lt;p&gt;My current task at work is to look at how to add &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot; title=&quot;W3C: RDFa Primer&quot;&gt;RDFa&lt;/a&gt; to a website that is completely driven by &amp;#8220;CGI/RPC explicit&amp;#8221; URLs. That includes the URLs for the resources themselves, by the way; we&amp;#8217;re not even talking about edit URLs here. Take a look at the URL for &lt;a href=&quot;http://www.statutelaw.gov.uk/content.aspx?LegType=All+Legislation&amp;amp;title=wine&amp;amp;Year=2007&amp;amp;searchEnacted=0&amp;amp;extentMatchOnly=0&amp;amp;confersPower=0&amp;amp;blanketAmendment=0&amp;amp;sortAlpha=0&amp;amp;TYPE=QS&amp;amp;PageNumber=1&amp;amp;NavFrom=0&amp;amp;parentActiveTextDocId=3032571&amp;amp;ActiveTextDocId=3032571&amp;amp;filesize=16218&quot; title=&quot;Statute Law Database Legislation&quot;&gt;this page&lt;/a&gt;, for example (this isn&amp;#8217;t the actual website that I&amp;#8217;m working on, but it&amp;#8217;s more or less the same in terms of URL design). The URL is&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/content.aspx?LegType=All+Legislation&amp;amp;title=wine&amp;amp;Year=2007&amp;amp;searchEnacted=0&amp;amp;extentMatchOnly=0&amp;amp;confersPower=0&amp;amp;blanketAmendment=0&amp;amp;sortAlpha=0&amp;amp;TYPE=QS&amp;amp;PageNumber=1&amp;amp;NavFrom=0&amp;amp;parentActiveTextDocId=3032571&amp;amp;ActiveTextDocId=3032571&amp;amp;filesize=16218
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So here I am trying to construct RDF examples, and all the URLs look like this mess. What URI am I supposed to use in RDF to talk about the resource itself, rather than a particular view (table of contents, actual content, etc) of that resource?&lt;/p&gt;

&lt;p&gt;In this case, the thing that identifies the resource in the URL is the value of the &lt;code&gt;ActiveTextDocId&lt;/code&gt; request parameter: you can do&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/content.aspx?ActiveTextDocId=3032571
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and see the same legislation; this could be mapped to a resource-oriented URL such as&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/legislation/3032571
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;very easily. It isn&amp;#8217;t, but it could.&lt;/p&gt;
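
&lt;p&gt;As a rough sketch of that mapping (in Python, purely for illustration: the &lt;code&gt;/legislation/&lt;/code&gt; path is the hypothetical one suggested above, and &lt;code&gt;resource_url&lt;/code&gt; is a made-up name, not something the site provides), you could pull out the &lt;code&gt;ActiveTextDocId&lt;/code&gt; parameter and discard everything else:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from urllib.parse import parse_qs, urlencode, urlsplit

def resource_url(cgi_url):
    """Map a content.aspx URL to the hypothetical /legislation/{id} form."""
    parts = urlsplit(cgi_url)
    doc_ids = parse_qs(parts.query).get("ActiveTextDocId", [])
    if not doc_ids:
        raise ValueError("URL has no ActiveTextDocId parameter")
    # Only the document id identifies the resource; the search state is dropped.
    return parts.scheme + "://" + parts.netloc + "/legislation/" + doc_ids[0]

# A cut-down version of the long URL above, rebuilt from a few of its parameters.
query = urlencode({"title": "wine", "Year": "2007", "ActiveTextDocId": "3032571"})
long_url = "http://www.statutelaw.gov.uk/content.aspx?" + query
print(resource_url(long_url))
&lt;/code&gt;&lt;/pre&gt;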

&lt;p&gt;However, doing that, you do lose some context about what the original search was that led you to this page. In the case of the above URI, the fact that I searched for all 2007 legislation with &amp;#8220;wine&amp;#8221; in the title gets lost. And this is important because the &lt;a href=&quot;http://www.statutelaw.gov.uk/content.aspx?LegType=All+Legislation&amp;amp;title=wine&amp;amp;Year=2007&amp;amp;searchEnacted=0&amp;amp;extentMatchOnly=0&amp;amp;confersPower=0&amp;amp;blanketAmendment=0&amp;amp;sortAlpha=0&amp;amp;TYPE=QS&amp;amp;PageNumber=1&amp;amp;NavFrom=0&amp;amp;parentActiveTextDocId=3032571&amp;amp;ActiveTextDocId=3032571&amp;amp;filesize=16218#breadcrumb&quot; title=&quot;Breadcrumb on legislation page&quot;&gt;breadcrumb&lt;/a&gt; on the page has to take me back to that original search.&lt;/p&gt;

&lt;p&gt;Now, you could argue that this is bad website design: after all, you can navigate back to a search page using *gasp* the &lt;strong&gt;Back&lt;/strong&gt; button, and not doing so just adds unnecessary items to your history. But what about providing &lt;strong&gt;previous&lt;/strong&gt; and &lt;strong&gt;next&lt;/strong&gt; links for navigating through the items found in a search? There, surely, you do need some state information that indicates how we got to this particular item?&lt;/p&gt;

&lt;p&gt;Well, no. When you&amp;#8217;re navigating through the results of a search, the primary resource that you&amp;#8217;re viewing is the &lt;em&gt;collection&lt;/em&gt; of items that have been identified by the search. Even if you&amp;#8217;re just viewing one of the items in that collection, if the collection still matters then that item should be viewed as just a subresource of the collection.&lt;/p&gt;

&lt;p&gt;In this case, the search has three fields &amp;#8212; title, year and (legislation) number &amp;#8212; so the search URL has three parts after the initial one. The general scheme (using &lt;a href=&quot;http://bitworking.org/projects/URI-Templates/draft-gregorio-uritemplate-01.html&quot; title=&quot;URI Template Internet-Draft&quot;&gt;URI template syntax&lt;/a&gt;) is&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/{title}/{year}/{number}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So here, I could use&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/wine/2007/_
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;as a URL that would give me a list of all the 2007 legislation with any number whose title contained &amp;#8220;wine&amp;#8221;, and then&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/wine/2007/_/1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;would show me the first piece of legislation found in the context of that search, with a &lt;strong&gt;next&lt;/strong&gt; button taking me to&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.statutelaw.gov.uk/search/wine/2007/_/2
&lt;/code&gt;&lt;/pre&gt;
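
&lt;p&gt;To make the scheme concrete, here is a minimal sketch in Python of building those search URLs. The function names are invented for illustration, and it assumes that &lt;code&gt;_&lt;/code&gt; always means &amp;#8220;any value&amp;#8221;, as in the examples above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from urllib.parse import quote

BASE = "http://www.statutelaw.gov.uk"
WILDCARD = "_"  # placeholder for a field the search leaves open

def search_url(title=None, year=None, number=None, position=None):
    """Build /search/{title}/{year}/{number}, plus an optional result position."""
    segments = [
        WILDCARD if field is None else quote(str(field), safe="")
        for field in (title, year, number)
    ]
    if position is not None:
        segments.append(str(position))
    return BASE + "/search/" + "/".join(segments)

def next_url(title, year, number, position):
    """URL behind the next link: the same search, one item further on."""
    return search_url(title, year, number, position + 1)

print(search_url(title="wine", year=2007))
print(next_url("wine", 2007, None, 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One wrinkle with this sketch is that a search for a title that is literally &lt;code&gt;_&lt;/code&gt; would look the same as a wildcard, so a real scheme would need some way of escaping it.&lt;/p&gt;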

&lt;p&gt;An individual resource itself could occur in many searches, and thus you would get many URLs for the resource in the context of those particular searches, but that&amp;#8217;s OK so long as there is one URI that identifies the resource itself. You should link to the actual resource from the page, of course, both in a &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; in the header and a &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; in the body (it feels like there should be a &amp;#8220;this is the real resource&amp;#8221; value for the &lt;code&gt;rel&lt;/code&gt; attribute of &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt;, but I don&amp;#8217;t know of one).&lt;/p&gt;

&lt;p&gt;The kind of URL above works fine when you have a fixed number of fields for searches, but what if you&amp;#8217;re doing more complicated searches: something that requires a proper query language? Well, you can stuff a query language into a URL. See &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/07/13/GoogleBaseDataAPIVsAstoriaTwoApproachesToSQLlikeQueriesInARESTfulProtocol.aspx&quot; title=&quot;Dare Obasanjo: Google Base Data API vs. Astoria: Two Approaches to SQL-like Queries in a RESTful Protocol&quot;&gt;Dare Obasanjo&amp;#8217;s comparison of Google and Astoria APIs for queries&lt;/a&gt; to see what that looks like.&lt;/p&gt;

&lt;p&gt;Alternatively, several years ago Paul Prescod introduced me to the notion that the query itself is a resource &amp;#8212; it&amp;#8217;s something that you&amp;#8217;ll probably want to save and edit &amp;#8212; and can be assigned a unique identifier in the same way as other resources. So you visit&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.example.com/queries/new
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;to create a new query, which gets assigned the URL&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.example.com/queries/4328
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you can then visit that page to see a list of the results of the query. Unlike with a simple search, the query parameters themselves don&amp;#8217;t get used in the URL: they&amp;#8217;re stored on the server. So you can&amp;#8217;t hack the URL to change the query, but you do have a simple URL that you can easily share with other people if you want to.&lt;/p&gt;
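
&lt;p&gt;A minimal sketch of the query-as-resource idea, in Python and with everything held in memory. The &lt;code&gt;QueryStore&lt;/code&gt; class, its method names and the numbering are all made up for illustration; a real site would persist the queries and actually run them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import itertools

class QueryStore:
    """Keeps saved queries on the server side, keyed by a short id."""

    def __init__(self, base="http://www.example.com/queries/", first_id=1):
        self._base = base
        self._ids = itertools.count(first_id)
        self._queries = {}

    def create(self, parameters):
        """Store the query parameters and return the new query URL."""
        query_id = next(self._ids)
        self._queries[query_id] = dict(parameters)
        return self._base + str(query_id)

    def lookup(self, url):
        """Recover the stored parameters from a query URL."""
        query_id = int(url.rsplit("/", 1)[1])
        return self._queries[query_id]

    def update(self, url, parameters):
        """Edit a saved query in place; its URL stays the same."""
        self.lookup(url).update(parameters)

store = QueryStore(first_id=4328)  # start at 4328 only to mirror the example URL
url = store.create({"title": "wine", "year": 2007})
store.update(url, {"year": 2006})
print(url)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point to notice is that editing the query changes what is stored on the server, not the URL that people have already shared.&lt;/p&gt;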
</description>
 <comments>http://www.jenitennison.com/blog/node/47#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <pubDate>Sun, 12 Aug 2007 22:00:00 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">47 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech 2007: Thursday 17th May Afternoon</title>
 <link>http://www.jenitennison.com/blog/node/21</link>
 <description>&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; Dare Obasanjo has written &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx&quot; title=&quot;Why GData/APP Fails as a General Purpose Editing Protocol for the Web&quot;&gt;an interesting critique&lt;/a&gt; on using the &lt;a href=&quot;http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-15.html&quot; title=&quot;Atom Publishing Protocol (v15)&quot;&gt;Atom Publishing Protocol&lt;/a&gt; as the basis for general purpose sharing of data in the way that the &lt;a href=&quot;http://code.google.com/apis/gdata/index.html&quot; title=&quot;Google Data API&quot;&gt;Google Data API&lt;/a&gt; does.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Thursday afternoon had a few really interesting talks. I learned about the Google Data API (no longer called gData); Oracle&amp;#8217;s use of XLink to represent relationships between documents, and the requirements that entails; using XSLT to create JSON to use Exhibit widgets; and using XMPP to enhance instant messaging.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/33&quot; title=&quot;Google Data API (Talk)&quot;&gt;Google Data API&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Frank Mantek&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;http://code.google.com/apis/gdata/index.html&quot; title=&quot;Google Data API&quot;&gt;Google Data API&lt;/a&gt; is the unified API that Google offers to all its services, such as Google Base, Blogger, Google Calendar, Google Spreadsheets and so on.&lt;/p&gt;

&lt;p&gt;Frank talked about how awful SOAP/WSDL is, in particular how two services developed on different platforms can&amp;#8217;t talk to each other (which one might imagine is rather the point of Web Services). (Later, when challenged by a Microsoft guy about this claim, he revealed that he&amp;#8217;d been a major developer of the SOAP/WSDL stuff at Microsoft, so knew exactly what he was talking about from bitter experience.)&lt;/p&gt;

&lt;p&gt;So the Google Data API is a RESTful API, using the &lt;a href=&quot;http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-15.html&quot; title=&quot;Atom Publishing Protocol (v15)&quot;&gt;Atom Publishing Protocol&lt;/a&gt; with a few additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extra data model&lt;/li&gt;
&lt;li&gt;querying&lt;/li&gt;
&lt;li&gt;concurrency control&lt;/li&gt;
&lt;li&gt;extra authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What this basically means is that you can query any of the Google services using HTTP, and get back an Atom document. The URI can contain queries (the precise nature of which depends on the service; &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;, for example, uses a single URI request parameter that has a complex internal query syntax), and you get back the feed with the items that you&amp;#8217;d requested. The Atom items themselves have the basic Atom elements, but then a bunch of service-specific elements that provide the extra information you need.&lt;/p&gt;

&lt;p&gt;Listening to this talk I finally got what &lt;a href=&quot;http://www.tbray.org/ongoing/&quot; title=&quot;ongoing&quot;&gt;Tim Bray&lt;/a&gt; was talking about at the &lt;a href=&quot;http://www.xmlsummerschool.com/&quot; title=&quot;XML Summer School, Oxford&quot;&gt;XML Summer School&lt;/a&gt; a couple of years ago: REST gives us verbs and Atom gives us objects and lists of objects. I didn&amp;#8217;t get it before, because, after all, aren&amp;#8217;t all XML documents objects? But I think the point is that Atom has a lot of the mechanics that you need for talking about objects built into it, and the extensibility necessary for adding your own information to it (which is what each of Google&amp;#8217;s services is doing).&lt;/p&gt;

&lt;p&gt;The really interesting part of the talk was where Frank started talking about what the problems (still) are. The problems I noted were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atom&amp;#8217;s verbose&lt;/li&gt;
&lt;li&gt;Google have to use &lt;code&gt;&amp;lt;category&amp;gt;&lt;/code&gt; to indicate the kind of thing they&amp;#8217;re representing (as opposed to using the document element which is what you&amp;#8217;d do with normal XML documents)&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;rel&lt;/code&gt; attribute is too vague&lt;/li&gt;
&lt;li&gt;they made up their own markup languages, rather than reusing existing standards&lt;/li&gt;
&lt;li&gt;they should be using &lt;a href=&quot;http://en.wikipedia.org/wiki/HTTP_ETag&quot; title=&quot;Wikipedia: HTTP ETags&quot;&gt;ETags&lt;/a&gt; for concurrency control&lt;/li&gt;
&lt;li&gt;they haven&amp;#8217;t got any versioning (eek)&lt;/li&gt;
&lt;li&gt;incremental updates are a problem; they don&amp;#8217;t want to serve the whole Atom feed (to a mobile device) when only a small amount has changed, so what they do is have several feeds, each of which reveals a different part of the information&lt;/li&gt;
&lt;/ul&gt;
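
&lt;p&gt;For anyone who hasn&amp;#8217;t met the ETag approach mentioned in that list: the server hands out a version tag with each representation, the client sends it back (in an &lt;code&gt;If-Match&lt;/code&gt; header) when it updates, and the server refuses the update with a 412 Precondition Failed if the tag no longer matches. A toy, in-memory sketch in Python follows. It has no real HTTP in it, and the &lt;code&gt;Entry&lt;/code&gt; and &lt;code&gt;PreconditionFailed&lt;/code&gt; names are invented for illustration, not taken from the Google Data API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib

class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""

class Entry:
    def __init__(self, body):
        self.body = body

    def etag(self):
        # Any value that changes whenever the body changes would do here.
        digest = hashlib.sha1(self.body.encode("utf-8")).hexdigest()
        return '"' + digest + '"'

    def put(self, new_body, if_match):
        """Apply the update only if the caller saw the current version."""
        if if_match != self.etag():
            raise PreconditionFailed("entry changed since it was fetched")
        self.body = new_body
        return self.etag()

entry = Entry("first draft")
seen = entry.etag()                             # both clients fetch this version
entry.put("edit from client A", if_match=seen)  # accepted, so the tag changes
try:
    entry.put("edit from client B", if_match=seen)
    outcome = "accepted"
except PreconditionFailed:
    outcome = "rejected"
print(outcome)
&lt;/code&gt;&lt;/pre&gt;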

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/81&quot; title=&quot;From Trees to Graphs: Evolving XML for building enterprise applications&quot;&gt;From Trees to Graphs: Evolving XML for building enterprise applications&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Ravi Murthy&lt;/h3&gt;

&lt;p&gt;Ravi Murthy talked about the provision for defining links between documents in &lt;a href=&quot;http://www.oracle.com/&quot; title=&quot;Oracle&quot;&gt;Oracle&lt;/a&gt;&amp;#8217;s database, and their consequent requirements. Oracle&amp;#8217;s XML database has a file system abstraction (every XML &amp;#8216;object&amp;#8217; has a file path) with access control, versioning, metadata and protocol access. Within an XML &amp;#8216;object&amp;#8217; stored in the database, they use XLink to represent the relationships with other objects. When you export the XML, the XLinks get resolved to create the XML document.&lt;/p&gt;

&lt;p&gt;Using XLink to represent relationships between documents brings a whole new set of constraints that you might want to express in a schema language, or annotations that you can use to describe the links (depending on how you look at it):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;type&lt;/strong&gt; of the linked resource (eg the document element&amp;#8217;s name, substitution group or XSD type)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;scope&lt;/strong&gt; of a particular reference, similar to the scoping of XSD&amp;#8217;s identity constraints&lt;/li&gt;
&lt;li&gt;That a particular link is &lt;strong&gt;acyclic&lt;/strong&gt; (eg, given an XPath expression, keep evaluating it and make sure you never get back to where you started)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;kind&lt;/strong&gt; of a link, one of:
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;hard&lt;/strong&gt;: the target of the link must exist, and cannot be deleted while this resource exists (but can be renamed) &amp;#8212; these are similar to links in normal databases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;symbolic&lt;/strong&gt;: trust the file path specified by the link and only resolve it on demand&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;weak&lt;/strong&gt;: like a hard link, except the target can be deleted, in which case the link becomes symbolic&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;versioning&lt;/strong&gt; of a link, whether it points to the &amp;#8220;current&amp;#8221; version of a resource or a specific version&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These extra constraints are expressed as annotations on the definitions of &lt;code&gt;xlink:href&lt;/code&gt; attributes in XSD schemas for the documents held in the database.&lt;/p&gt;

&lt;p&gt;Ravi also talked a bit about expressing decomposition rules: how an XML document should be shredded when it gets put into the database. They use XPath to specify rules that indicate that particular elements should be placed at a particular filepath.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;I was really flattered in the tea break. I was chatting with a guy called &lt;a href=&quot;http://philwilson.org/blog/&quot; title=&quot;Phil&#039;s Blog&quot;&gt;Phil&lt;/a&gt;, who works at the University of Bath. He politely asked about my presentation, and after I&amp;#8217;d explained how it was all to do with overlapping markup and that kind of hard-core theory he said: &amp;#8220;You don&amp;#8217;t &lt;em&gt;look&lt;/em&gt; like a markup geek&amp;#8221;. Me: &amp;#8220;What, because I&amp;#8217;m a girl?&amp;#8221;. Him: &amp;#8220;No, no, that&amp;#8217;s not what I meant. You just look more Web 2.0-ey.&amp;#8221; &lt;a href=&quot;http://lapin-bleu.net/riviera/&quot; title=&quot;Max&#039;s Blog&quot;&gt;Max&lt;/a&gt; was there at the time, and labelled me &amp;#8220;the Geekess of XSLT&amp;#8221;, which I think clarified things. (Actually most of the people at XTech this year were Web 2.0-ey rather than markup geeks, but I&amp;#8217;m glad I &lt;em&gt;looked&lt;/em&gt; as though I fitted in.) &lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/155&quot; title=&quot;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&quot;&gt;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://metacognition.info/&quot; title=&quot;Chimezie Ogbuji&#039;s Website&quot;&gt;Chimezie Ogbuji&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;&amp;#8220;What&amp;#8217;s &lt;a href=&quot;http://simile.mit.edu/wiki/Exhibit&quot; title=&quot;Exhibit Wiki&quot;&gt;Exhibit&lt;/a&gt;?&amp;#8221; I hear you ask. Or maybe you&amp;#8217;re more with-it than I am, but that&amp;#8217;s what I was asking. Chimezie never really explained, but I kinda gathered that it&amp;#8217;s a funky AJAX toolset for creating views of data by importing scripts and using magical IDs and extension attributes within web pages. The other phrase that Chimezie dropped in was &lt;a href=&quot;http://www.w3.org/TR/backplane/&quot; title=&quot;Rich Web Application Backplane&quot;&gt;Rich Web Application Backplane&lt;/a&gt;, which again I hadn&amp;#8217;t heard of. Even having read the W3C Note, I still don&amp;#8217;t get it. Ho hum.&lt;/p&gt;

&lt;p&gt;Anyway, Chimezie made the point that while entering data using XForms is great, it&amp;#8217;s too heavy-weight for viewing that data. Exhibit gives a lot more flexibility (take a look at the &lt;a href=&quot;http://simile.mit.edu/exhibit/examples/presidents/presidents.html&quot; title=&quot;US Presidents in Exhibit&quot;&gt;US presidents&lt;/a&gt; example), which enables users to explore data more freely. In Exhibit pages, you provide a JSON schema for your data, a number of lenses/views/widgets that you can use to view the data, then you embed the widgets in the HTML page and point it at the data source. The JSON schema indicates the type of a particular property (eg &amp;#8220;country&amp;#8221;), and gives labels for it (including a plural label (&amp;#8220;countries&amp;#8221;) and a reverse label (&amp;#8220;country of&amp;#8221;)) that it uses in the widgets.&lt;/p&gt;

&lt;p&gt;But that requires JSON, right? Chimezie showed how easy it is (and it&amp;#8217;s &lt;em&gt;really&lt;/em&gt; easy) to transform data-oriented XML into JSON using XSLT.&lt;/p&gt;

&lt;p&gt;You know, there are all these cool ways out there for viewing information, I just wish I had some really meaty data to use them on! &lt;a href=&quot;http://simile.mit.edu/timeline/&quot; title=&quot;SIMILE Timelines&quot;&gt;Timelines&lt;/a&gt; are one thing, but I&amp;#8217;d also love to find some data to employ in &lt;a href=&quot;http://www.gapminder.org/&quot; title=&quot;Gapminder&quot;&gt;Gapminder&lt;/a&gt; or even in an interface like the one for &lt;a href=&quot;http://www.philipglass.com/glassengine/&quot; title=&quot;Philip Glass Engine&quot;&gt;the music of Philip Glass&lt;/a&gt;. Perhaps I should just mine &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;, but I&amp;#8217;d like it to be something personally or collectively useful.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/97&quot; title=&quot;Real-time user-to-user web with Mozilla and XMPP&quot;&gt;Real-time user-to-user web with Mozilla and XMPP&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://blog.hyperstruct.net/&quot; title=&quot;Massimiliano Mirra&#039;s Website&quot;&gt;Massimiliano Mirra&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This talk was strong on motivation &amp;#8212; the requirement to enhance basic instant messaging functionality &amp;#8212; and strong on demonstration, with Massimiliano chatting and playing with a pre-programmed bot, but really weak on the technical details. It was only through the post-talk questions that we learned that what we&amp;#8217;d seen was based on &lt;a href=&quot;http://www.xmpp.org/&quot; title=&quot;XMPP Standards Foundation&quot;&gt;XMPP (the Extensible Messaging and Presence Protocol)&lt;/a&gt;, which allowed DOM events to be passed between clients. You&amp;#8217;ll have to read the paper if you want to learn more.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/21#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/19">google</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Sun, 27 May 2007 22:03:24 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">21 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech 2007: Thursday 17th May Morning</title>
 <link>http://www.jenitennison.com/blog/node/20</link>
 <description>&lt;p&gt;On Thursday morning, I was down to chair the first session in the &amp;#8220;Core Technologies&amp;#8221; track. Two interesting papers: one on XForms and one on Google Base. Then I snuck on to the &amp;#8220;Applications&amp;#8221; track to hear about scientific Wikis and the trials of managing schema repositories.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/114&quot; title=&quot;XForms, REST, XQuery... and skimming&quot;&gt;XForms, REST, XQuery&amp;#8230; and skimming&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://internet-apps.blogspot.com/&quot; title=&quot;Mark Birbeck&#039;s Blog&quot;&gt;Mark Birbeck&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;Mark Birbeck, one of the developers of &lt;a href=&quot;http://www.formsplayer.com/&quot; title=&quot;formsPlayer Website&quot;&gt;formsPlayer&lt;/a&gt; (and an invited expert on the XForms and XHTML WGs), discussed the rationale behind using &lt;a href=&quot;http://www.w3.org/MarkUp/Forms/&quot; title=&quot;XForms W3C Page&quot;&gt;XForms&lt;/a&gt;. The only thing that really stood out for me was the fact that he used an XML document to provide the &lt;em&gt;labels&lt;/em&gt; for the form controls (in just the same way as you can use XML documents to provide the &lt;em&gt;data&lt;/em&gt; in the form controls). That was quite neat, and made me think of the different requirements of data entry and data presentation: a topic that returned in &lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/155&quot; title=&quot;XML-powered Exhibit: A Case Study of JSON &amp;amp; XML Coexistence&quot;&gt;Chimezie Ogbuji&amp;#8217;s talk&lt;/a&gt; later that afternoon.&lt;/p&gt;

&lt;p&gt;Another theme here, for me, was the use of declarative programming: you write a form, which is just some XML and leave all the technical stuff about submitting a PUT HTTP request to the XForms player. Mark talked about using &lt;a href=&quot;http://en.wikipedia.org/wiki/WebDAV&quot; title=&quot;Wikipedia: WebDAV&quot;&gt;WebDAV&lt;/a&gt; and &lt;a href=&quot;http://exist.sourceforge.net/&quot; title=&quot;eXist&quot;&gt;eXist&lt;/a&gt; on the server to store the XML documents, and demonstrated using &lt;a href=&quot;http://www.oxygenxml.com/&quot; title=&quot;oXygen XML editor&quot;&gt;&amp;lt;oXygen/&amp;gt;&lt;/a&gt; to load and save documents. Hmm&amp;#8230; I wonder if I should experiment with XForms and that Unicode database browser I was thinking about&amp;#8230;&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/104&quot; title=&quot;Google Base, a mashups database for the REST of us&quot;&gt;Google Base, a mashups database for the REST of us&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;Jeffrey Scudder&lt;/h3&gt;

&lt;p&gt;A very popular, thought-provoking, and slightly disturbing, talk on &lt;a href=&quot;http://base.google.com/&quot; title=&quot;Google Base&quot;&gt;Google Base&lt;/a&gt;. So Google are asking us to upload data on &lt;em&gt;anything&lt;/em&gt; (jobs, personals, cars, etc.) into their huge databases. And then they&amp;#8217;ll serve us back that information (and other people&amp;#8217;s information) in formats such as &lt;a href=&quot;http://en.wikipedia.org/wiki/Atom_(standard)&quot; title=&quot;Wikipedia: Atom&quot;&gt;Atom&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/RSS_(file_format)&quot; title=&quot;Wikipedia: RSS&quot;&gt;RSS&lt;/a&gt; and &lt;a href=&quot;http://www.json.org/&quot; title=&quot;JSON&quot;&gt;JSON&lt;/a&gt;, as well as standard web pages.&lt;/p&gt;

&lt;p&gt;The thought-provoking bit, for me, was the fact that they don&amp;#8217;t have any particular schema for each of these kinds of items. Now, I come from a knowledge engineering background where we&amp;#8217;re very into ontologies and creating conceptual models and all that stuff. But Google don&amp;#8217;t bother: you create categories and structure your data the way you want to, and they&amp;#8217;ll serve it back in that way. But they look at &lt;em&gt;all&lt;/em&gt; the data they have their hands on in order to decide how to display and serve information. So, for example, if I define cars with the property &amp;#8216;shade&amp;#8217; but a hundred other people define them with the property &amp;#8216;colour&amp;#8217; then on a feed that includes all our items, we&amp;#8217;ll see the &amp;#8216;colour&amp;#8217; property.&lt;/p&gt;

&lt;p&gt;This is a kind of bottom-up ontology design: the properties of an item are the properties that other people think are important about an item. One thing that surprised me was that it looks like it&amp;#8217;s not very intelligent yet: simple differences in case (like &amp;#8216;color&amp;#8217; vs. &amp;#8216;Color&amp;#8217;) don&amp;#8217;t seem to be detected, so I guess nothing else is. Time to dig out my old research on automated comparison of ontologies&amp;#8230;&lt;/p&gt;

&lt;p&gt;The slightly disturbing part? Well, Google are trying to get us to upload our data to their servers. And they&amp;#8217;re not putting any limit on how much we upload. One member of the audience asked &amp;#8220;What&amp;#8217;s in it for you?&amp;#8221;; Jeffrey seemed to have a hard time understanding the question and said something like &amp;#8220;Better indexed information means we can give you better information&amp;#8221;, but that doesn&amp;#8217;t really answer the question. Presumably it&amp;#8217;s all about being able to advertise to us better: the more data we upload, the more They know about us, the better targeted Their adverts can be.&lt;/p&gt;

&lt;p&gt;What I found strange was the idea of &lt;em&gt;uploading&lt;/em&gt; data to a &lt;em&gt;central&lt;/em&gt; &lt;em&gt;server&lt;/em&gt;. Surely the whole point of the web is that I put my data on my machine. I don&amp;#8217;t have a problem putting the data together in a nice Atom feed so that Google can index it easily and pointing them at it, but I want to own it, y&amp;#8217;know?&lt;/p&gt;

&lt;p&gt;By the way, one thing that was apparent to me during this talk was how important it is that web pages look good with large font sizes, not just for people with poor eyesight, but also for when you&amp;#8217;re &lt;em&gt;demoing&lt;/em&gt; your cool web applications! The Google Base drop-down menus were impossible to see with increased font size because their height is fixed in pixels.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/134&quot; title=&quot;An Augmented Wiki for Interactive Scientific Visualization and Evolutionary Collaboration&quot;&gt;An Augmented Wiki for Interactive Scientific Visualization and Evolutionary Collaboration&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://csis.pace.edu/~marchese&quot; title=&quot;Frank Marchese&#039;s Website&quot;&gt;Frank Marchese&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;On to the less well-attended &amp;#8220;Applications&amp;#8221; track. This talk was about supporting scientists (specifically biochemists) in providing side-by-side visualisation (of complex molecules) and textual analysis. Frank talked about a Wiki in which &lt;a href=&quot;http://jmol.sourceforge.net/&quot; title=&quot;Jmol molecule viewer&quot;&gt;Jmol&lt;/a&gt; Java applets for visualising molecules are arranged side-by-side with standard journal articles. The articles themselves have links in them that animate the Jmol visualisation: highlighting particular groups of atoms, moving it to show a particular view, and so on.&lt;/p&gt;

&lt;p&gt;It was kind of neat, as pretty pictures of molecules often are, but I didn&amp;#8217;t think the Wikiness of the whole enterprise was really explored: I got the impression that the textual articles were basically static: you could add comments, but not collaboratively create an article about the molecule. Also, the link between the text and the animation of the molecule was through Javascript, as far as I could tell: I&amp;#8217;d expect a declarative method of defining animations would make it a lot more accessible.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/176&quot; title=&quot;Real-world metadata registries; sharing concepts, schemas and semantics&quot;&gt;Real-world metadata registries; sharing concepts, schemas and semantics&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://www.ukoln.ac.uk/&quot; title=&quot;UKOLN Website&quot;&gt;Emma Tonkin&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This talk took me back to the trials of creating top-down conceptual models, focusing on the definition of metadata schemas. Unfortunately, there was a lot of philosophy and not many practical guidelines in the talk, and I didn&amp;#8217;t get a lot out of it. One thing that Emma touched on, though, was the way that the meaning of a term can change over time, through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extension or generalisation&lt;/li&gt;
&lt;li&gt;narrowing or specialisation&lt;/li&gt;
&lt;li&gt;amelioration (when a term gains approval)&lt;/li&gt;
&lt;li&gt;deterioration or pejoration (when a term gains disapproval)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latter two are particularly demonstrated by political correctness, whereby terms like &amp;#8220;Eskimo&amp;#8221; fall out of favour and &amp;#8220;Inuit&amp;#8221; becomes more acceptable (all highly culture-specific; see the &lt;a href=&quot;http://en.wikipedia.org/wiki/Eskimo&quot; title=&quot;Wikipedia: Eskimo&quot;&gt;Wikipedia Eskimo page&lt;/a&gt; for more discussion on what term to use).&lt;/p&gt;

&lt;p&gt;The advantage of a principled conceptual model is that the concept itself and the term(s) you use for that concept are loosely coupled, so if a given term falls out of favour or becomes inappropriate, you can always decouple it. On the other hand, bottom-up tagging tends (I think) to have a 1:1 relationship between term and concept, so if the use of terminology changes you might be left with inaccurate tagging of legacy data. Maybe.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/20#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/19">google</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/20">ontologies</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/21">wikis</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/17">xforms</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Fri, 25 May 2007 21:34:18 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">20 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

