Schema.org and the Responsibility of Monopoly

Update: This post has been translated to Italian on the Linked Open Data Italia blog.

In this post about schema.org I’m going to speculate about the economic drivers that affect how search engines use structured metadata on the web. I discuss how the technical features and choices within schema.org may cause wider long-term harm, and the role of open standards as a method for responsible companies to avoid the pitfalls of monopoly.

Lessons for Microdata from schema.org

There is (obviously, from the way my tweet stream, feed reader and email have filled up) a lot to say at many levels about schema.org, a new collaboration between Google, Microsoft and Yahoo! that describes the next phase in search engines’ extraction of semantics from web pages. In this post I’m going to focus on what we can learn from schema.org about the design of microdata and how it might be improved.

Three challenges for alpha.gov.uk

The new alpha.gov.uk website was launched recently, as a prototype for the “single Government website” described in Martha Lane Fox’s report Directgov 2010 and Beyond: Revolution Not Evolution. Apparently the real deal could go live “in about a year”.

The site is lovely, a far cry from the standard government fare. But this isn’t exactly surprising: it’s been developed by a top team using modern technologies and an Agile methodology, with a set of design rules far removed from those usually applied to government websites and a budget that’s not exactly tight. These factors mark it out from the majority of (though not all) government websites. And this is part of the point: to illustrate the gap between what we have and what a revolution could bring.

There are three challenges where I am, and have been, particularly interested to see the alpha.gov.uk approach. These are in balancing:

  • simplicity and complexity
  • centralisation and distribution
  • end-user and data re-user

It is not currently clear to me whether alpha.gov.uk has decided on an approach to any of these (whether the way the site works currently is the way that they have decided it should work) or whether these are areas that are still up in the air. I’m hoping it’s the latter.

Hash URIs

There’s been quite a bit of discussion recently about the use of hash-bang URIs following their adoption by Gawker, and the ensuing downtime of that site.

Gawker have redesigned their sites, including lifehacker and various others, such that all URIs look like http://{domain}#!{path-to-content}, where the #! is the hash-bang. The home page on the domain serves up a static HTML page that pulls in JavaScript, which interprets the path-to-content, requests that content through AJAX and slots it into the page. The sites all suffered an outage when, for whatever reason, the JavaScript couldn’t load: without working JavaScript you couldn’t actually view any of the content on the site.
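To make the mechanics concrete, here is a minimal sketch in Python of why the server cannot help when the script fails. The fragment (everything after the #) is never sent in the HTTP request, so the server only ever sees a request for the home page. The function name split_hash_bang, the example.com domain and the article path are all made up for illustration; this is not Gawker’s actual code.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_hash_bang(uri: str) -> Tuple[str, Optional[str]]:
    """Split a URI into (what the server is asked for, hash-bang content path).

    Hypothetical helper for illustration only. The second item is None
    when the URI has no fragment or the fragment does not start with "!".
    """
    parts = urlsplit(uri)

    # Browsers drop the fragment before making the HTTP request,
    # so this is the only URI the server ever sees.
    server_request = parts._replace(fragment="").geturl()

    # The content path lives entirely in the fragment, after the "!".
    # Only client-side script can read it and fetch the real content.
    if parts.fragment.startswith("!"):
        content_path = parts.fragment[1:]
    else:
        content_path = None

    return server_request, content_path
```

Calling split_hash_bang("http://example.com/#!/5784210/some-article") gives back "http://example.com/" as the server request and "/5784210/some-article" as the content path. Every article on the site therefore results in the same request to the server, which is why nothing could be displayed once the script was unavailable.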

This provoked a massive cry of #FAIL (or perhaps that should be #!FAIL) and a lot of puns along the lines of making a hash of a website and it going bang. For analysis and opinions on both sides, see:

While all this has been going on, the TAG at the W3C have been drafting a document on Repurposing the Hash Sign for the New Web (originally named Usage Patterns For Client-Side URI parameters in April 2009) which takes a rather wider view than just the hash-bang issue, and on which they are seeking comments.

All matters of design involve weighing different choices against some criteria that you decide on implicitly or explicitly: there is no single right way of doing things on the web. Here, I explore the choices that are available to web developers around hash URIs and discuss how to mitigate the negative aspects of adopting the hash-bang pattern.

Getting Started with RDF and SPARQL Using Sesame and Python

My previous post talked about how to install 4store as a triplestore, and use the Ruby library RDF.rb in order to process RDF extracted from that store. This was a response to Richard Pope’s Linked Data/RDF/SPARQL Documentation Challenge which asks for documentation of how to install a triplestore, load data into it, retrieve it using SPARQL and access the results as native structures using Ruby, Python or PHP.

I quite enjoyed writing the last one, so I thought I’d try again. As before, I am on Mac OS X, but this time I’m going to use Python, which I have not programmed in before. I like a challenge. You might not like the results!