This post was imported from my old Drupal blog. To see the full thing, including comments, it's best to visit the Internet Archive.
I’ve been talking about URIs a lot recently. One of the things that has bothered me about some of the conversations is the conflation of the concepts of “opaque URIs” and “non-human-readable URIs”. This is my argument for keeping the concepts separate.
The opacity of URIs is an important axiom in web architecture. It states that web applications must not try to pick apart URIs in order to work out information from them. Applications must not, for example, use the fact that a URI has `.html` at the end to infer that it resolves to an HTML document. It’s closely related to hypermedia as the engine of application state (HATEOAS), in that opaque URIs should not be generated by web applications either: they must be discovered through links and the submission of forms.
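As a rough illustration of what respecting opacity looks like in an application, here is a minimal Python sketch. It assumes the application already has the response headers in a dictionary; the function name `media_type_from_headers` and the `example.org` URI are made up for the example. The point is only that the media type comes from what the server declares, never from the suffix of the URI.

```python
from email.message import Message


def media_type_from_headers(headers: dict[str, str]) -> str:
    """Return the media type the server declared, ignoring the URI entirely."""
    msg = Message()
    # Assumption: fall back to a generic binary type when nothing is declared.
    msg["Content-Type"] = headers.get("Content-Type", "application/octet-stream")
    # get_content_type() strips parameters such as charset and lowercases the type.
    return msg.get_content_type()


# Hypothetical response: the URI ends in .html, but the server says it is a PDF.
uri = "http://example.org/report.html"
headers = {"Content-Type": "application/pdf"}
print(media_type_from_headers(headers))
```

An application that trusted the `.html` suffix here would try to parse a PDF as HTML; one that treats the URI as opaque would not.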
But this has nothing to do with readability or hackability, both of which are extremely important for human users. Readable URIs help human users understand something about the resource that the URI is pointing to. Hackable URIs (by which I mean ones that people might manipulate by altering or removing portions of the path or query) enable human users to locate other resources that they might be interested in.
Before I go further, a couple of caveats:
I am not saying that every URI must contain a natural language identifier. An example is the URI for a school, which could include:
- the name of the school
- the unique reference number for the school
- the record number for the school in the database that is being published on the web
Using the name of the school, as I’ve discussed, is probably a bad idea because of its lack of longevity. Using the record number for the school within the particular database that’s being published is entirely non-human-readable because there is simply no way of finding out what that would be for a given school. The unique reference number for the school, on the other hand, may be an obscure series of digits, but it is a meaningful one which renders the URI readable and hackable.
There are also times when uniquely identifying a resource using natural identifiers within the URI leads to incredibly long and complex URIs, in which case the ‘human readable’ version isn’t actually human readable. Introducing non-human-readable components is then the only option.
Back to my argument:
Why should URIs support humans doing things that applications must not? Because humans are intelligent. When humans hack a URI, they are aware that they are making a guess, taking a chance and might or might not end up at something useful. If they get a 404, or even more importantly if they get to information about something that they weren’t expecting, they are intelligent enough to recognise that the chance they took didn’t pay off. Applications aren’t intelligent. They can’t tell the difference between a right guess and a wrong guess, so it’s best not to let them guess at all.
Let me give an example. Let’s say that I’m creating a URI for a particular house. Here are two possible URIs:
The first is readable and hackable. A human could change the house number or the postcode. They could remove the house number and expect a list of houses within the postcode. The second is not readable or hackable: there is no way to know what you would get if you changed the identifier within the URI.
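To make the contrast concrete, here is a small Python sketch of the kind of hack a person might try by hand. Both URIs are invented for illustration: the `example.org` host, the postcode-like segment and the record identifier are all assumptions, not real identifiers. The function simply deletes the last path segment, which is the guess a human would make when looking for a list of houses in a postcode.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URIs, invented for illustration only.
READABLE = "http://example.org/house/AB1-2CD/10"
UNREADABLE = "http://example.org/house/8f3a91c7"


def drop_last_segment(uri: str) -> str:
    """Mimic a person hacking a URI by deleting its final path segment."""
    parts = urlsplit(uri)
    segments = parts.path.rstrip("/").split("/")
    parent = "/".join(segments[:-1]) or "/"
    # Query and fragment are dropped, as someone editing the address bar might do.
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))


print(drop_last_segment(READABLE))    # a plausible guess at "houses in this postcode"
print(drop_last_segment(UNREADABLE))  # tells the reader nothing about what to expect
```

Nothing guarantees that either result resolves to anything useful. A human can look at what comes back and judge whether the guess paid off; that judgement is exactly what an application lacks.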
Now it is true that an application accessing a site that used URIs like the first could create those URIs programmatically, whereas it couldn’t (perhaps) create a URI like the second. But if it did create the URIs programmatically, that would be the fault of the application, not the fault of the URI.
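What a well-behaved application does instead is follow the links that the publisher supplies. The sketch below is one way that might look in Python, using only the standard library. The page markup, the `rel="up"` link and the `example.org` URIs are assumptions made up for the example, and `LinkCollector` and `discover_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect links by their rel value instead of constructing URIs."""

    def __init__(self, base_uri: str) -> None:
        super().__init__()
        self.base_uri = base_uri
        self.links: dict[str, str] = {}  # rel value -> absolute URI

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href, rel = attributes.get("href"), attributes.get("rel")
        if href and rel:
            for relation in rel.split():
                # Keep the first link seen for each relation.
                self.links.setdefault(relation, urljoin(self.base_uri, href))


def discover_links(html: str, base_uri: str) -> dict[str, str]:
    collector = LinkCollector(base_uri)
    collector.feed(html)
    return collector.links


# Hypothetical page describing a house, with a link up to its postcode.
page = '<p><a rel="up" href="/house/AB1-2CD">Other houses in this postcode</a></p>'
links = discover_links(page, "http://example.org/house/AB1-2CD/10")
print(links.get("up"))
```

The application never needs to know how the publisher structures its URIs. If the publisher later changes the scheme, the link still works, whereas a programmatically constructed URI would silently break.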
As publishers, it is our responsibility to provide humans with URIs that are meaningful and hackable, and to provide applications with the means of creating or identifying these URIs through forms and links. But it is not our responsibility to prevent applications from doing things that they should not do by deliberately obfuscating our URIs.