I wrote this earlier in the year but for one reason or another never posted it.
I was at the excellent ORGCon yesterday. I was speaking about open data in my capacity as Technical Director at the Open Data Institute but I stayed for the rest of the conference in my capacity as an interested UK citizen. It is in the latter capacity that I write this.
When my 9-year-old daughter asked me what ORGCon was about, I explained that it was about the rights that we have when we are online. With the recent revelations about Prism in my mind, I asked her what she thought about people being able to read the emails or listen in on the Skype chats of people they thought might do harm.
She replied that she thought it was good, that it made her feel safe to think that the communications of “cyber bullies” (these are the most concerning “bad people” she thinks of) would be watched (and, presumably, punished if they were caught bullying).
From recent conversations I’ve had, this feels like a common reaction. I’d venture that you don’t need much technical knowledge or much paranoia to believe, when you stop and think about it, that the NSA and GCHQ can get access to much of our online communication. We shrug because we assume that the people they will want to listen in on are the baddies. And if it helps to stop those baddies doing bad things, so what?
In his opening keynote, Tim Wu talked about the need for us to have a visceral sense of our online rights — of our ownership of our data, of our privacy — in order to protect those rights.
Reflecting on my daughter, I’m struck that she does have a visceral sense of privacy, but it is not about privacy from the (benevolent, protecting) state. It is privacy from her (nosy, interfering) sister.
And I feel the same. I don’t have a visceral reaction to the NSA or GCHQ having access to my online (private) communication, but I certainly have that reaction when I think about it being seen by my coworkers, my friends, my family. This isn’t because of anything in particular that I’m worried about being discovered, just because I’d prefer to have some control over what I expose to the people with whom I interact most.
In his closing keynote, John Perry Barlow spoke about growing up in a town where everyone knew everything about everyone else, but no one brought up the past because everyone had skeletons. A type of mutually assured destruction. He said something I often hear the over-30s saying, that the young people growing up with Facebook are not concerned about their privacy.
If you read the focus group responses from the Pew Research Center’s Internet & American Life Project, you can see that this isn’t true. Teenagers might be sharing a lot of information through Facebook, but there are definitely some people they don’t want to see it (my emphasis):
[Friending my parents] sucks… Because then they [my parents] start asking me questions like why are you doing this, why are you doing that. It’s like, it’s my Facebook. If I don’t get privacy at home, at least, I think, I should get privacy on a social network.
In the open data world, we worry about publishing information about people who can be identified from that information. The ICO Anonymisation code of practice talks about a ‘motivated intruder’ test:
The ‘motivated intruder’ is taken to be a person who starts without any prior knowledge but who wishes to identify the individual from whose personal data the anonymised data has been derived. This test is meant to assess whether the motivated intruder would be successful.
The approach assumes that the ‘motivated intruder’ is reasonably competent, has access to resources such as the internet, libraries, and all public documents, and would employ investigative techniques such as making enquiries of people who may have additional knowledge of the identity of the data subject or advertising for anyone with information to come forward. The ‘motivated intruder’ is not assumed to have any specialist knowledge such as computer hacking skills, or to have access to specialist equipment or to resort to criminality such as burglary, to gain access to data that is kept securely.
The ‘motivated intruders’ that most people will be concerned about are those who are already known to them. (The main exception would be those people who for whatever reason have come to the attention of the press.) They are the suspicious spouse, the nosy neighbour, the interfering parents, the jealous colleague.
When I think about those people I know who have felt their privacy has been infringed, it has always been by people they know, behaving in extreme ways out of a desire to retain or regain control.
But those engaging in campaigns of harassment do not need additional personal information to make their target feel exposed. They just need to demonstrate knowledge of something.
“I know where you live.”
“I can hear you.”
It doesn’t have to be important. It doesn’t have to be private. The goal is to demonstrate to their victim that they are being monitored, constantly: their movements watched, their tweets read.
To invoke a visceral sense of your right to privacy, think of your friends and family reading your messages. To test anonymisation, think of a suspicious spouse aiming to prove infidelity. Even information we would never think of as private can be used against us.
It isn’t the state’s knowledge we fear, it’s that of those who already know us.
The new membership of the W3C’s Technical Architecture Group (TAG), and some of the recent discussions on the TAG list about polyglot markup, have made me think about what the TAG should stand for and the role the TAG should play.
Fundamentally, the web is for everyone, whatever gender, whatever race, whatever sexual orientation, whatever visual or mental ability and so on. The web community should fight to keep the web open to all. And it should try to be a community that is open to all.
With same-sex marriages shortly being voted on by the UK Parliament, I have been struck, reading the recent threads about polyglot, how similar the arguments against polyglot seem to those used against homosexuality:
- claiming that there is no use for polyglot, in the face of those who say they have use for it, is similar to denying that homosexuality exists, in the face of people saying that they are homosexual
- stating that you can see no use for polyglot and therefore no one else should use it is similar to saying that since you are heterosexual, everyone else must be too
- claiming that creating a Recommendation that describes polyglot will make people use it is similar to saying that talking about homosexuality will make people gay
- saying that you don’t want to implement polyglot in a validator or editor is similar to being a priest who declines to marry gay people
By this comparison, those who argue that polyglot must be the only output anyone generates also have an analogy: someone arguing that all churches must marry only gay people.
I want the web community to be a fair and good society. To me the question about whether there should be a polyglot Recommendation is just the latest example of a need to ensure that our community is equitable.
Just as in wider society, we need to find compromises that balance the needs and desires of different constituencies. We need to balance the rights that everyone has to code as they wish against the rights that everyone has to have a web that works. We need to make sure that the quiet voices are heard, and support the equal rights of those who tread the less worn paths.
When there are conflicts between technologies, developers necessarily think of them in terms of which they would use, given their experience, expertise, environment and so on. I think the TAG needs to judge technologies in a different way. We have to consider the extent to which standardising their use disrupts the fabric of the web and prevents others from operating as they wish to. And, because we want a web that is fair and open and free, if there is no or minimal risk to the fabric of the web, and it does not overly constrain how others act, I believe we should err on the side of supporting diversity.
Making and expressing these judgements is just one of the things that I hope the newly formed TAG will manage to do better.
The purpose of the group is to act as an “intelligent customer” to the government on the release of open data. This is a bit of a misnomer, as the word “customer” implies that the group will in some way buy data that should be made open, which it’s unlikely to do. Perhaps “intelligent consumer” would be more appropriate: our task is to advise the government about which data should be opened up, and (if the commitment has already been made to open it) which should be opened first or how access to it could be improved.
One of the tasks that we face, particularly for datasets that are currently being sold by government (mostly from the Public Data Group: Met Office, Ordnance Survey, Land Registry and Companies House), is making a strong economic argument for opening up data. To do that, it’s useful to understand two things:
- the ways in which open data can be used in the wider economy, to aid innovation and growth and thereby lift the country out of the economic doldrums
- the business models that are being used by open data publishers to support open data releases, to illustrate the benefits that they can bring to publishers themselves
During my keynote at XML Prague (the video might make more sense than the slides on their own; there are notes on the slides but Slideshare doesn’t do well with Keynote), I talked about how the disadvantages of using chimeras created from two formats with different underlying models are seldom outweighed by the advantages. RDF/XML gets knocked so frequently it’s not even much fun to do it any more, but I’ve applied the same arguments to JSON-LD in the past. My argument was that RDF, XML, JSON and HTML should each be used individually for their strengths rather than trying to find a middle ground that rarely satisfies anyone.
Leigh Dodds’ post on principled use of RDF/XML makes the point that RDF/XML can be useful when it is used in a regular, principled way. And in fact, I am using RDF/XML extensively in my work on Expert Participation for legislation.gov.uk, though slightly differently from how Leigh describes. What I want to explore in this post is when and how it makes sense to use RDF/XML and how that might translate into usage of JSON versions of RDF. The key point I want to make is that RDF chimeras are roads, not destinations, and when you’re choosing a road you have to think about the destination you’re aiming for.
As part of the TAG’s work on httpRange-14, Jonathan Rees has assessed how a variety of use cases could be met by various proposals put before the TAG. The result of the assessment is a matrix which shows that “punning” is the most promising method, unique in not failing on either ease of use (use case J) or HTTP consistency (use case M).
In normal use, “punning” is about making jokes based around a word that has two meanings. In this context, “punning” is about using the same URI to mean two (or more) different things. It’s most commonly used as a term of art in OWL but normal people don’t need to worry particularly about that use. Here I’ll explore what that might actually mean as an approach to the httpRange-14 issue.
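To make the idea of one URI carrying two meanings concrete, here is a minimal sketch in Python. It is only an illustration: the URI, the property names and the table saying which property applies to which kind of thing are all hypothetical, and nothing here is taken from the proposals put before the TAG.

```python
# Hypothetical URI, used both for a person and for the web page about her.
URI = "http://example.org/people/alice"

# Every statement below uses the same URI as its subject.
STATEMENTS = [
    (URI, "name", "Alice"),                 # about the person
    (URI, "birthDate", "1980-05-17"),       # about the person
    (URI, "lastModified", "2012-03-01"),    # about the page
    (URI, "contentType", "text/html"),      # about the page
]

# Assumption for this sketch: each property is known to describe
# either the real-world thing or the document, never both.
PROPERTY_APPLIES_TO = {
    "name": "thing",
    "birthDate": "thing",
    "lastModified": "document",
    "contentType": "document",
}


def split_by_sense(statements, uri):
    """Group the statements about `uri` by which sense of the URI they use."""
    senses = {}
    for subject, prop, value in statements:
        if subject != uri:
            continue
        # Properties missing from the table cannot be assigned to a sense.
        sense = PROPERTY_APPLIES_TO.get(prop, "ambiguous")
        senses.setdefault(sense, {})[prop] = value
    return senses


senses = split_by_sense(STATEMENTS, URI)
```

In this sketch the URI itself never says which meaning is intended; a consumer works it out from the property used in each statement, and a property that is not in the table is left marked as ambiguous.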