TAG F2F, June 2011

This post was imported from my old Drupal blog. To see the full thing, including comments, it's best to visit the Internet Archive.

As you may know, I accepted an appointment to the W3C’s Technical Architecture Group earlier this year. Last week was the first face-to-face meeting that I attended, hosted in the Stata Center at MIT. As you can tell from the agenda (which was in fact revised as we went along) it was a packed three days.

What I intend to do here is to briefly report on the major areas that we discussed and give a tiny bit of my own personal take on them. In no way should any of what I write here be judged as revealing the official opinion of the TAG, it’s just me saying what I think, and I’m not going to go into anything in depth because they’re all incredibly gnarly and contentious topics and I’d not only be here all year but also end up in a tar pit.

Role of the TAG

Usefully for me as a newcomer, our first session was about the ongoing role of the TAG. The TAG occupies a unique position within the W3C. According to its charter it was set up

To improve the effectiveness of Working Groups, to reduce misunderstandings and overlapping work, and to improve the consistency of Web technologies developed inside and outside W3C

The TAG ultimately has three routes to do this:

by providing specific advice on issues that are brought to its attention
by writing documents on basic web architecture principles that go through community review, particularly through the general review of the W3C standards track and become Recommendations
by advising the W3C Director (Tim Berners-Lee) about what he should do on the extremely rare occasions when there are issues that he is supposed to adjudicate on

In none of these cases is there anything that binds the people receiving the advice of the TAG, or reading Findings or Recommendations made by the TAG, to accept them or do anything about them. The power and authority of the TAG depends solely on the quality and utility of its arguments, which is how it should be in my opinion.

Client-Side Application State

The first technical session was about client-side application state and was a review of the Identifying Application State draft that T.V. Raman began before he left the TAG and that Ashok Malhotra has been working on since. This should in the next few months or so be published as a TAG Finding (though it is currently on the Recommendation track).

This work is essentially about documenting the different ways in which you can identify application state within a URI, why that’s a useful thing to do, and some of the pitfalls of using hash URIs to do so. Most of the discussion was about details to do with wording within the document. One thing I thought particularly interesting was the point that URI-based application state is relevant in all ‘active content’, not just in HTML; for example, scripting in SVG or in PDFs bring the same considerations.

Buffer Bloat

Over lunch on Monday we listened to and discussed a presentation by Jim Gettys on buffer bloat. Basically (and all the errors here are introduced by me), TCP/IP is designed to route around network blockages, but it can only do so if it detects them quickly. When you have big buffers in place, as in the case of all modern operating systems and hardware, blockages aren’t detected quickly; they’re only detected when the buffers fill up. Then buffers empty and the data has to be sent again. The net result is that connections get really slow, not just for upload or download but for both, not just for you but for everyone using the network.

Jim talked about how this is exacerbated by the large amount of web traffic and the design of HTTP, particularly the lack of use of HTTP pipelining (whereby several HTTP requests and responses are sent over one long-term connection), because it leads to lots of small messages which can’t be handled effectively. There’s lots more about this on his blog.

Jim also talked about the failure of certificate authorities and how we should be looking at distributed protocols using digitally signed data, pointing us in particular to CCNx.

Fragment ID Semantics

First thing Tuesday was a session that I led on fragids, in particular the problems that are arising out of the mime type registration of +xml types (3023bis) clashing with those that are used for, say, images, and what happens when these come together in something like SVG.

The same issues arise whenever you have documents with types that ‘inherit’ fragid semantics from two directions. For example, XHTML documents are XML documents, so constraints on +xml mean you shouldn’t use interpreted fragids (eg hash-bangs) on them, but they are also ‘active content’ which makes interpreted fragids useful. Similarly, in linked data you shouldn’t really use a hash URI to mean a Person with a primary resource that provides as a response an XML document with embedded RDFa, because according to XML fragid semantics, such a URI should point to an XML element.

Basically the use of fragids has grown markedly outside their original scope and these situations aren’t really covered in the specs. I am now tasked to create a document that describes the issues and suggests ways forward. So that will be fun.

Telcon with IAB

The second session on Tuesday was a telcon with the IAB which has a similar role within the IETF as the TAG does within the W3C. This was a bit of a ‘getting to know you’ session, covering the work of the two groups on:

versioning and extensibility
security
privacy, including Do Not Track

and talking about opportunities to meet and work together on various topics like these.

URI Definition Discovery and Metadata Architecture

The afternoon session on Tuesday was spent on Jonathan Rees’s work on the Architecture of the World Wide Semantic Web, which covers, amongst other things, what people in semantic web circles call httpRange-14. At core, this is about the kinds of URIs we can use to refer to real-world things, what the response to HTTP requests on those URIs should be, and how we find out information about these resources.

Jonathan has put together a document called Providing and discovering definitions of URIs which covers the various ways that have been suggested over time, including the 303 method that was recommended by the TAG in 2005 and methods that have been suggested by various people since that time.

It’s clear that the 303 method has lots of practical shortcomings for people deploying linked data, and isn’t the way in which URIs are commonly used by Facebook and schema.org, who don’t currently care about using separate URIs for documents and the things those documents are about. We discussed these alongside concerns that we continue to support people who want to do things like describe the license or provenance of a document (as well as the facts that it contains) and don’t introduce anything that is incompatible with the ways in which people who have been following recommended practice are publishing their linked data. The general mood was that we need to support some kind of ‘punning’, whereby a single URI could be used to refer to both a document and a real-world thing, with different properties being assigned to different ‘views’ of that resource.

Jonathan is going to continue to work on the draft, incorporating some other possible approaches. It’s a very contentious topic within the linked data community. My opinion is while we need to provide some ‘good practice’ guides for linked data publishers, we can’t just stick to a theoretical ideal that experience has shown not to be practical. What I’d hope is that the TAG can help to pull together the various arguments for and against different options, and document whatever approach the wider community supports.

Can publication of hyperlinks cause copyright infringment?

The first session on Wednesday was another session that I led, discussing the Publishing and Linking on the Web draft that Dan Appelquist and I have been working on.

The aim of this document is to explain the tensions between terms that are commonly used in legal documents such as “possession”, “adaptation” and “distribution” and the way that publication works on the web, in which multiple servers may have copies of the same document (because they cache copies to make the ‘net go faster), automated agents may make changes to those documents (such as compressing or resizing documents, or merging Javascript) and people may refer others to those documents through linking.

We’re particularly keen to argue that linking to something is not the same thing as distributing it. The web’s power arises through its links, so it’s important that people are able to link to something without being worried about what happens when/if the domain they link to is taken over by something illegal.

Dan and I are going to continue to work on this document in response to various suggestions around organisation and terminology, with a view to getting some ‘friendly legal experts’ to look it over and then seeking wider review. The intention is for it to eventually become a Recommendation as this will give greater weight to it as a document for a legal audience.

API Minimisation and Client-Side Storage

There were then a couple of short sessions.

Dan talked about API Minimisation, which is the design principle that to increase privacy we should design APIs that enable people requesting information to say exactly what information they need, and return only that rather that everything known about a think. Dan’s put together an draft and should be calling for review for it soon.

Ashok then led discussion on client-side storage and we brainstormed around some of the architectural/design issues about which we might want to write if we were to put together a document. This work is at a very early stage.

TAG Priorities

After lunch, we had a session on TAG priorities where we discussed which of the various pieces of work that we’re doing should receive the most attention and had a quick review of who is doing what within the TAG.

Our basic problem is that a lot of this stuff feels quite urgent, and we want to be responsive, but with only 5-6 of us “actively involved” (which means 1 day/week) in drafting documents, and other TAG duties taking up our time, it feels like we have taken on too much work. Our focus for the next little while is going to be on responding to issues where our lack of response might either hold people up or cause longer term problems (for example the publication of contradictory mime type definitions), which means things like the document on publishing and linking on the web will need to bubble in the background rather than being the focus of activity.

HTML5 Last Call

Our final session, for which we were joined by Philippe Le Hégaret, was on the HTML5 Last Call documents. The TAG has raised various issues over the course of HTML5 development and want to follow up on how those issues have been addressed in the documents. Our role means that we’re responsible for making sure there’s consistency with other specifications, and that there isn’t anything that seems like it’s going to cause problems in the long term.

The part that we spent most discussion time on was the relationship between Microdata and RDFa. We talked about the precedents for having two specifications that do very similar things but with different approaches, such as CSS and XSL, and how this isn’t necessarily a bad thing so long as they don’t contradict each other and people can move between them easily (because they have the same conceptual foundations).

I’m going to save my opinion on this topic for another post. Suffice it to say that microdata and RDFa as currently specified don’t work well with each other but it’s not at all clear what the best path forward is. The TAG decided to recommend that the W3C set up a Task Force to look at what the best way forward might be.

Final Words

If you want links to the minutes of the TAG F2F, they’re available within the agenda or on separate pages for:

If you have anything to say on any of these topics, please send email to the TAG mailing list. Or you could comment here or email me directly if you like. Which leads me on to talking about what I’d like to do in the TAG.

One of the guidance notes for new members to the TAG says:

TAG members are elected or appointed not to represent their individual member organizations, but the Web community as a whole. We try to take that responsibility very seriously.

I do take that responsibility seriously. Web architecture has to be a combination of practice and theory, balancing approaches that work right now with a desire to not break anything long term. I do practical work developing web applications with HTML, CSS, Javascript, XML, RDF, XSLT, XQuery and so on and so on every day, but I know I don’t see all the difficult corners of the open web standard space: no one person can.

I can listen though, so that’s what I will try to do: listen, digest, reflect and act.

But I have limited resources. Unlike most of the members of the TAG, I am not employed by a large organisation that pays me for time I take on the work that I do for the TAG. The W3C kindly paid for my flights to and from F2Fs, but not hotels or expenses. I wouldn’t have taken this on if I wasn’t prepared to shoulder the financial burden, but if there is anyone out there who might sponsor my participation, I’d love to hear from you.