XTech was subtitled “the mobile web”, but one of the major themes for me was that of the distributed web. The first keynote, by Simon Wardley, gave a vision of a future in which hardware, frameworks and applications are services in the cloud rather than products on machines we own: where we use flickr to store our photographs, Google App Engine to host our applications, and Amazon S3 to store our data. In David Recordon’s keynote (written up by Jeremy Keith), he talked about small, specific services provided by sites that aren’t “destination sites”. The same picture was painted by Gareth Rushgrove in his talk on Design Strategies for a Distributed Web.
So I was surprised at how contentious Steven Pemberton’s talk on Why you should have a Website (thankfully again documented by Jeremy Keith) proved to be, because to me it seemed to be the logical extension of the distribution of hardware, frameworks and applications: the distribution of data. In fact, I’ve written about the same idea myself, as has Leigh Dodds, more recently.
From the session, the main question seems to be “how could we do flickr without them holding our data?” I don’t particularly want to pick on flickr, especially because it’s not one of the worst offenders, but the problem of serving and sharing images does illustrate a whole range of issues, so I will use it as an example. I could just as easily be talking about ancestry.com. The way I see it, you need three levels: providers that hold and serve the data, brokers that locate and index it, and user interfaces that present it and let people add to it.
(It occurs to me that this is similar to a model/view/controller architecture: the providers give the model, the user interfaces give the views and the brokers control the flow between the two.)
Where flickr is at the moment is a conglomeration of the three: to have your photo appear on flickr, and to gain the advantages that it gives you in terms of tag-based aggregations and social networking, you have to upload it. They are then the provider of the image+metadata (perhaps the only place it is located on the web), the user interface on the image+metadata (the interface through which the image is annotated), and the broker (they provide keyword-based retrieval, for example).
What would it look like to separate those functions?
First, you, as the owner of the image+metadata, could put your data anywhere: on a home wireless network box, on a webserver hosted by an ISP of your choice, on a site specifically designed for hosting photos. Your data is exposed to the larger web through a standard read/write protocol (I’m betting on AtomPub) that allows you to provide metadata both about resources and the links between resources. The point of it being read/write is that it allows other people to add metadata to or links from your resource to others, such as adding a comment on your image.
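As a rough illustration of what a provider might expose, here is a minimal sketch that builds an Atom entry for one photo. The function name `photo_entry`, the example URLs and the choice to model tags as Atom categories are assumptions made for the example, not something prescribed by the post.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def photo_entry(entry_id, title, author, updated, image_url, tags):
    """Build an Atom entry describing one photo and its metadata (hypothetical helper)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    # The image stays wherever its owner hosts it; the entry only points at it.
    ET.SubElement(entry, f"{{{ATOM_NS}}}content",
                  {"type": "image/jpeg", "src": image_url})
    # Assumption: each tag is represented as an Atom category.
    for tag in tags:
        ET.SubElement(entry, f"{{{ATOM_NS}}}category", {"term": tag})
    return entry


if __name__ == "__main__":
    entry = photo_entry(
        "http://alice.example/photos/1", "Beach", "alice",
        "2008-05-12T10:00:00Z", "http://alice.example/img/1.jpg",
        ["holiday", "sea"],
    )
    print(ET.tostring(entry, encoding="unicode"))
```

The same entry could be served from a home server, an ISP-hosted site or a dedicated photo host; nothing in it depends on where it lives.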
Second, an information broker would locate your photos by crawling for them (or perhaps by you submitting the URL somewhere, but mostly that shouldn’t be necessary). There are already information brokers around: Google provides a RESTful API for general search results, as does Yahoo!; at XTech, Richard Cyganiak talked about Sindice, and Aidan Hogan about the Semantic Web Search Engine, both of which crawl for RDF triples and provide an API for querying the results. In an AtomPub-based environment, you’d want an information broker that located Atom feeds and resources, indexed them, and provided an AtomPub-based API for publishers to use.
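The indexing step of such a broker can be sketched in a few lines. This assumes the crawler has already fetched a feed document; `SAMPLE_FEED` (trimmed to the elements the index needs) and `index_feed` are hypothetical, and a real broker would also record authors, dates and links.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed as a crawler might find it on someone's own site.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Alice's photos</title>
  <entry>
    <id>http://alice.example/photos/1</id>
    <title>Beach</title>
    <category term="holiday"/>
    <category term="sea"/>
  </entry>
  <entry>
    <id>http://alice.example/photos/2</id>
    <title>Harbour</title>
    <category term="sea"/>
  </entry>
</feed>"""


def index_feed(feed_xml, index):
    """Add every entry in an Atom feed to a tag -> set-of-entry-ids index."""
    root = ET.fromstring(feed_xml)
    for entry in root.findall(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        for category in entry.findall(f"{ATOM}category"):
            index[category.get("term")].add(entry_id)
    return index


if __name__ == "__main__":
    index = index_feed(SAMPLE_FEED, defaultdict(set))
    print(sorted(index["sea"]))
```

The index stores only identifiers that point back to the original resources, so the data itself never has to move to the broker.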
Third, a user interface would provide an attractive and usable front-end that brought together many different sets of information. For example, flickr might combine your friends feed with an image search to provide a view of images recently made available by your friends. There’s no requirement for your friends to use flickr for this to work: flickr queries a broker for a list of your friends, then queries a broker for images by a particular person, the broker searches its index and points the application to the original resources that are provided by your friends.
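The two-step lookup described above might look something like this. `InMemoryBroker` is a hypothetical stand-in for a real broker, which would answer the same questions from a crawled index over HTTP; all names and URLs here are invented for the example.

```python
class InMemoryBroker:
    """Hypothetical broker holding pre-built indexes instead of crawling."""

    def __init__(self, friends, images):
        self._friends = friends  # person -> list of friends
        self._images = images    # person -> list of (ISO date, image URL)

    def friends_of(self, person):
        return self._friends.get(person, [])

    def images_by(self, person):
        return self._images.get(person, [])


def recent_friend_images(broker, person, limit=10):
    """Combine a friends lookup with per-friend image searches, newest first."""
    found = []
    for friend in broker.friends_of(person):
        for date, url in broker.images_by(friend):
            found.append((date, friend, url))
    found.sort(reverse=True)  # ISO dates sort chronologically as strings
    return found[:limit]


if __name__ == "__main__":
    broker = InMemoryBroker(
        friends={"jeni": ["alice", "bob"]},
        images={
            "alice": [("2008-05-10", "http://alice.example/img/1.jpg")],
            "bob": [("2008-05-12", "http://photos.example/bob/7.jpg")],
        },
    )
    for date, friend, url in recent_friend_images(broker, "jeni"):
        print(date, friend, url)
```

Note that the image URLs point at different hosts: the user interface never needs to know or care where each friend keeps their photos.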
A user interface has another role, though: to add to the web. Flickr wants to make it easy to add tags to photos, to create sets and collections that help you navigate your photos, for others to add comments and so on and on. And that’s fine, because AtomPub is a read/write API. To add a tag to a photo, flickr simply edits the resource with PUT. To add a comment, it locates the comment feed (which would be referenced from the entry for the particular image) and POSTs to create a new resource. And everyone can see those changes — the added value that you get from a social network.
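In HTTP terms, the two write operations could be sketched as below. The requests are only constructed, not sent; the helper names and URLs are assumptions, and a real client would also need authentication and would normally fetch the current entry before editing it.

```python
import urllib.request

# Media type for a single Atom entry document.
ATOM_ENTRY_TYPE = "application/atom+xml;type=entry"


def edit_entry_request(edit_url, entry_xml):
    """PUT a revised entry (for example, with one more tag) to its edit URL."""
    return urllib.request.Request(
        edit_url,
        data=entry_xml.encode("utf-8"),
        headers={"Content-Type": ATOM_ENTRY_TYPE},
        method="PUT",
    )


def add_comment_request(comment_feed_url, comment_xml):
    """POST a new comment entry to the comment feed to create a resource."""
    return urllib.request.Request(
        comment_feed_url,
        data=comment_xml.encode("utf-8"),
        headers={"Content-Type": ATOM_ENTRY_TYPE},
        method="POST",
    )


if __name__ == "__main__":
    put = edit_entry_request(
        "http://alice.example/photos/1.atom",
        '<entry xmlns="http://www.w3.org/2005/Atom">...</entry>',
    )
    post = add_comment_request(
        "http://alice.example/photos/1/comments",
        '<entry xmlns="http://www.w3.org/2005/Atom">...</entry>',
    )
    print(put.get_method(), put.full_url)
    print(post.get_method(), post.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it; that step is left out here because the example URLs do not exist.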
None of this is to say that a single application can’t act as provider, broker and user interface at the same time, but I’m certain that users will favour those applications that do the whole of each role: provide to the whole web, broker the whole web, provide a user interface to the whole web. Flickr is almost there, but it doesn’t do the whole brokering job because it only brokers the data it provides, and therefore it doesn’t do the whole user interface job either.
This distributed web is a clear win, particularly for users, over walled gardens. They can switch from user interface to user interface, even use more than one at a time (perhaps one application is good for browsing while another is good for categorising), without any cost. They can choose who to use to serve their information on the basis of things that matter when you’re serving information (low downtime, backups, security, etc.) rather than on how pretty an interface looks or how much functionality it gives you. On the other side of the equation, applications get to do one thing and do it well.
It seems to me that this is simply how the web works, and the questions we should be asking are about privacy and trust and licensing and revenue models and standards development.
Comments
Re: The Distributed Web
“And that’s fine, because AtomPub is a read/write API. To add a tag to a photo, flickr simply edits the resource with PUT. To add a comment, it locates the comment feed (which would be referenced from the entry for the particular image) and POSTs to create a new resource. And everyone can see those changes — the added value that you get from a social network.”
There’s another way it might (also) work: a site like flickr could provide the “tag environment” (i.e. metadata framework) for your photos. So they still live on your own server space and even expose their own Atom metadata. But instead of flickr editing (PUTting to) a resource on the server, it provides a new feed or set of feeds for your stuff. It’s just like the current situation, but flickr saves a URI instead of an image. This is exactly what happens with a site like REDDIT: pages on the web are re-contextualized in the REDDIT setting. A set of images may have numerous “contexts” in which to be viewed and understood. Even if we move towards a canonicalized digital object, there will be little need for canonicalized metadata: an object (identified by its URI) will live many lives simultaneously. Atom/AtomPub will provide a means by which these contexts can “converse”.
In my own experience in the university setting, I see the “institutional repository” as providing little more than a persistent URI for a paper, data set, image, etc. and perhaps some boilerplate metadata. These assets may appear in various contexts on the web: faculty member’s departmental web site, a personal web site, a discipline-specific web site, etc. and may have different metadata in each case. There may be a means by which some of that metadata flows back to the institutional repository (atom/atompub), but it may or may not — and the owner of the various sets of metadata may or may not be the same as the owner of the digital asset.
—peter keane UT Austin
Re: The Distributed Web
Yes, it could certainly work like that. As long as flickr (or whoever) made those feeds available to the wider web, there is hardly any difference between new comment resources being hosted on your own site and them being hosted on a separate site.
So in the Atom Threading Extensions, you can either point from a resource to a comment feed with a “replies” link, or point from a comment to the resource with an “in-reply-to” element. The only difference between the two is that it’s slightly harder to identify all the comments on a resource if they’re distributed, but that’s precisely what brokers are for.
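The two directions of linking in the Atom Threading Extensions can be sketched as follows. The helper names and example URLs are hypothetical; the namespace URI and the `ref` and `href` attributes come from the Threading Extensions specification.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
THR_NS = "http://purl.org/syndication/thread/1.0"
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("thr", THR_NS)


def replies_link(comment_feed_url):
    """Link placed in the photo's entry, pointing at its comment feed."""
    return ET.Element(
        f"{{{ATOM_NS}}}link",
        {"rel": "replies", "type": "application/atom+xml",
         "href": comment_feed_url},
    )


def in_reply_to(photo_entry_id, photo_entry_url):
    """Element placed in a comment entry, pointing back at the photo."""
    return ET.Element(
        f"{{{THR_NS}}}in-reply-to",
        {"ref": photo_entry_id, "href": photo_entry_url},
    )


if __name__ == "__main__":
    # Comments hosted alongside the photo: the photo entry advertises them.
    print(ET.tostring(
        replies_link("http://alice.example/photos/1/comments"),
        encoding="unicode"))
    # Comments hosted elsewhere: each comment names the photo it answers.
    print(ET.tostring(
        in_reply_to("http://alice.example/photos/1",
                    "http://alice.example/photos/1.atom"),
        encoding="unicode"))
```

A broker that indexes `in-reply-to` references can reassemble the full set of comments on a photo even when they are scattered across several sites.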