I was at UKGovCamp yesterday for I think my 11th(?) year. Massive thanks to James, Amanda and all the rest of the campmakers for making it a brilliant day.
ODI were a sponsor and there were a bunch of us around. For the first time, I didn’t pitch myself. I was really glad that I encouraged others to instead. I only went to four sessions (rather than five). These are just some of my random thoughts following them (I’m not trying to represent everything that was said; I’ve linked to the notes from the sessions so you can read those if that’s what you want).
A session about the thing I spend my day job doing: working out how to build, or persuade others to build, a better data infrastructure.
Infrastructure is boring. Despite the fact that government maintains so much of our physical infrastructure and understands how to invest in it, it doesn’t understand the link between the services, analysis, visualisations it wants and the data infrastructure that lies beneath. We need to motivate investment in data infrastructure through pointing at the more flashy, sexy, immediate stuff it enables (the websites, the apps). Think about the people who had to demonstrate why we need power lines or sewers or motorways. It’s not for their own sake, it’s to provide light, have flushable toilets, get around the country. We can’t just work on the data infrastructure level.
The flashy and sexy stuff like AI enabled services funded through Govtech catalysts rely on data infrastructure. You can’t expect those efforts to succeed if the data isn’t there to support them. So you can’t just work at the service layer either.
Building data infrastructure through delivering digital services is an art, a discipline, a cultural shift. Why doesn’t every digital service have an API? Why don’t service developers and designers think about all-of-government or even all-of-society needs as well as the immediate needs of their direct users?
We didn’t talk about this in the session, but I had more fundamental conversations around UKGovCamp about government’s attitude to data. There is a reversion from some quarters to the attitude of 10-15 years ago around how to get value from government’s data. If we Brexit, if the Reuse of Public Sector Information Regulations are repealed, there is a real risk of going back to the idea that government should sell access to data it holds. I’m worried.
I love spreadsheets and tabular data. Geek heaven. I spent the session occasionally pointing to things that already exist like Datasette.
We depend so much on spreadsheets for managing data, and they are both extremely well and extremely badly suited to the purposes we put them to, which are manifold. There are also many reinventions, with services like Airtable or Smartsheet, but they’re proprietary and come with risks about portability should the services fail.
Sometimes spreadsheets are used to collaborate in ways where people really need to stick to a schema handed down from on high. But what people really like about spreadsheets (as opposed to databases) is the ease of adding columns to suit their needs. But then too much flexibility breaks applications that have to ingest all the additional data but only care about some of it. It feels to me as if a transclusion mechanism - bringing data from one spreadsheet into another, such that editing it is reflected in the original but you can also add columns that won’t be reflected back to the original - could be a way through this tension.
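To make the transclusion idea a bit more concrete, here is a minimal sketch in Python. It is not how any existing spreadsheet tool works: the `Sheet` and `TranscludedView` classes, their methods and the example data are all made up for illustration. Edits to columns that belong to the original are written through to it, while columns added in the view stay local.

```python
from dataclasses import dataclass, field


@dataclass
class Sheet:
    """A hypothetical minimal table: rows keyed by an id, each row a dict of column -> value."""
    columns: list[str]
    rows: dict[str, dict[str, object]] = field(default_factory=dict)

    def set(self, row_id: str, column: str, value: object) -> None:
        # The original sheet sticks to its schema: unknown columns are rejected.
        if column not in self.columns:
            raise KeyError(f"{column!r} is not in the schema")
        self.rows.setdefault(row_id, {})[column] = value


class TranscludedView:
    """A hypothetical view that pulls in a source sheet and allows extra local columns."""

    def __init__(self, source: Sheet) -> None:
        self.source = source
        self.local_columns: list[str] = []
        self.local_values: dict[str, dict[str, object]] = {}

    def add_column(self, name: str) -> None:
        # Local columns must not clash with the source schema or each other.
        if name in self.source.columns or name in self.local_columns:
            raise ValueError(f"column {name!r} already exists")
        self.local_columns.append(name)

    def set(self, row_id: str, column: str, value: object) -> None:
        if column in self.source.columns:
            # Shared column: the edit is reflected in the original.
            self.source.set(row_id, column, value)
        elif column in self.local_columns:
            # Local column: the edit never reaches the original.
            self.local_values.setdefault(row_id, {})[column] = value
        else:
            raise KeyError(f"unknown column {column!r}")

    def row(self, row_id: str) -> dict[str, object]:
        # What the user of the view sees: source data plus their own columns.
        merged = dict(self.source.rows.get(row_id, {}))
        merged.update(self.local_values.get(row_id, {}))
        return merged


# Example data is invented for illustration.
schools = Sheet(columns=["name", "postcode"])
schools.set("s1", "name", "Hilltop Primary")
schools.set("s1", "postcode", "AB1 2CD")

view = TranscludedView(schools)
view.add_column("my_notes")
view.set("s1", "my_notes", "visited in March")  # stays in the view
view.set("s1", "postcode", "AB1 2CE")           # written through to the original
```

An application that ingests the original sheet only ever sees the agreed schema, while the person working in the view keeps the freedom to add whatever columns they need.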
It’s so powerful to be able to collaborate on the same data as others, as in Google Sheets, rather than passing Excel files back and forth by email. But not all data is shareable or designed to be shareable, and the ability to have space to add your own stuff without explanation is useful too.
I didn’t speak in this session. Although I have thoughts I have no settled Opinions and a niggling sense of unease.
I tried to explain this session, and how I felt about this session, to my 15yo daughter. She’s been learning about feminism for her sociology GCSE and said that it reminded her of the characterisation of radical, liberal and Marxist feminism she’s been learning about. Digitalists probably all agree that the web (and all it entails) is changing society and government has to change too. But while some radical digitalists believe that requires a wholesale reinvention of how government works, I think there are some important pieces of our current system that we should preserve.
I think trusted institutions are important. I think removing their identities and means of expressing their identities undermines them. It leaves us without things we can rely on. That’s scary and people who are scared are “not their best selves”, as the modern euphemism appears to be. Can we preserve institutions, grow people’s trust in them and reimagine the relationship between government and people?
I am afraid of forgetting the large percentage (not a majority, but a large percentage) who through choice or circumstance do not have smartphones or broadband or a working knowledge of how to interact with digital anything. I am afraid of digitalism that facilitates an essentially inhuman and exclusive government. I am afraid of digital supremacy.
It is ok for us to disagree about this. We should be disagreeing. In the session we talked about stories and visions and sci-fi for government. We should describe the futures we want to see, and we should critique the ones others describe, explain our fears, bang our drums. That’s how we’ll get there.
But these are essentially political questions, questions about the role of the state, of communities, of individuals, of the private sector, of the media, of academia. And they highlight questions about the role of politicians and of civil servants, or even of policy and delivery if you like, and the relative power they wield, and the level of accountability and governance there is around them. Has digital changed that power balance? Should it? I know a lot of brilliant civil servants I would trust entirely but I’m not sure I want to live in a technocracy.
I am all for discussing how to improve how government works, but how can we stop that conversation being dominated by white middle class male Londoners, however wonderful, insightful, inspirational, well meaning and right thinking I might find them? I’m part of the problem here, massively privileged, London centric. I want to hear other voices, outside the digital elite. Of course they won’t be at UKGovCamp. And I also recognise conversations have to start somewhere and gradually build coalitions. It’s just a concern that nags at me.
This one’s more personal for me, but the session helped me reconcile some conflicts inside myself and move on my thinking about how I can encourage more openness at ODI.
Communication is hard. So hard. The impact we intend to have, if we even think of our intent at all, is seldom the impact we do have. No one else is in the same context as you. Public communication is even harder, because the audience who sees what you write can be so varied. One-way or asynchronous communication is harder again because there’s no feedback until you’ve finished and posted that can help you adjust or explain or nuance. If you care at all about what people think or feel (and I care about that a lot, probably too much) every piece of communication is a risk.
It is a risk worth taking, and it was good today to be reminded of that. I remember many years ago posting about a civil servant I’d just encountered who I thought (and said) was a little crazy. Turned out he read my blog. Him bringing it up was the first time I encountered the intersection of my work and my online tribe, and when I started being more circumspect and intentional about what I write publicly. But that slightly crazy civil servant was of course John Sheridan, who would become a fantastic colleague and one of my closest friends. There are ways to put things, and sometimes you have to deal with people you have upset intentionally or not, but I cannot think of a time I have regretted writing authentically and letting people see inside my head.
But still it is hard for me to write candidly given where I am now. Because I am CEO at ODI what I say can easily be taken as an institutional rather than personal position. When I communicate I am not just taking risks for myself but for my team and organisation. We are dependent on government money, and on preserving or building good relationships with funders. We and I must be seen to be strong, certain and confident to retain the confidence of our stakeholders. (Everything’s great by the way.)
All this leaves me with is the recognition that anyone who cares about other people or about the organisation they work for will not post about everything, cannot be completely open. And that’s ok. At ODI we talk about data being as open as possible and the shades of grey between open and closed data. It was good to be reminded that as open as possible communication is better than nothing.
Last week I was at the Canada-UK Colloquium on AI in Toronto. These are some things I learned and thoughts I had while there, in no particular order.
On the role of “anchor firms”: Big tech firms help support a startup ecosystem by acting as a backstop for technologists, allowing them to take the risk of working for startups as they know they won’t be left completely high and dry if the startup fails. They also perform a useful role mapping academic approaches into the real world in the form of code, online services and so on that can be plugged together to build new applications quickly: no one else has the resources/capability/motivation to do this mapping. It’s interesting to think about the extent to which government should be doing this, or subsidising it, and the degree to which this mapping is done for data as well as code.
On the role of the third sector: We focus a lot when talking about AI and data on the role of the state, of business and of academia. But the third sector is important too. Consumer rights organisations have a role to play assessing and informing consumers about how services use data about them. Trade unions need to have a vision for how the demands on the workforce will change and how workplaces and conditions should adapt. It was striking to me that through all the discussion of bodies supporting good governance of AI and data, the Ada Lovelace Institute was not mentioned.
On the hype cycle: All the AI practitioners urged caution and were concerned about hyperbole in the media narrative about AI. They pointed out that deep learning and reinforcement learning are only suitable for particular tasks and that much of the AI vision we are being fed requires techniques that haven’t been invented yet. There’s a danger that when the current wave of AI (machine learning) fails to meet high expectations we will enter another AI winter of reduced funding for research that slows progress again.
On what your phone can sense about you: Well-intentioned academics in Canada are prototyping applications to monitor levels of social anxiety, in a bid to provide better mental health care. (With permission) they can do things like work out what kind of places you go to, listen to your conversations, monitor movement, light, how much you touch your screen and so on. It felt creepy and invasive but got through the university ethics board. Not news, but to me it highlighted that these APIs and data were available to other Android apps, with the only check being the permissions dialog everyone clicks through. We probably don’t need to worry too much about well-intentioned academics with ethics approval: how do we find out about everyone else?
On diversity: Canada has a strong commitment to increasing (particularly gender) diversity. There are warm words about diversity in the UK too. I have Opinions, highly influenced by Ellen Broad, that appear to be unusual:
Having a diverse team will not necessarily mean you avoid bias in your algorithms/products. Saying you need diversity to create products that work for everyone gives non-diverse teams an excuse for poor practices that they really shouldn’t be allowed to use. What about user research? What about empathy? It is impossible to represent everyone by having someone exactly like them within a team: we should focus on finding good ways of engaging with people outside development teams and hold those teams to a higher standard in using them.
We should be careful to quote local statistics, or statistics relevant to particular subfields, rather than make diversity out to be a general problem across technology. I also have a lingering concern that making a big deal about women being less prevalent in technology makes technology less attractive to women (no one likes to be in places where they’re a minority).
In contrast to software development, there are many women in the field of ethics and algorithmic accountability. Is ethics subtly being thought of as women’s work (emotional labour)? (In the UK, this is even spelled out in the names of our institutes: Alan Turing for computer science, Ada Lovelace for ethics.)
On geopolitics: Canada and the UK have a lot in common. This may become even more true if Brexit goes ahead and Britain becomes a third country to Europe, with similar values but needing to prove data adequacy while having strong surveillance powers. France was the other ally most often mentioned by Canadian representatives. The sense was that despite its strong investment in AI research and work by CIFAR, Canada was behind on thinking about data and data governance; there were also hints that its information commissioner’s office was not as helpful (to businesses) as the UK’s. As is common in these fora, there was a lot of talk about China, and state-led AI, but a general feeling that we need to engage and create international norms around AI rather than enter into a race.
On the stories we tell: Quite a lot of debunking went on in the room. There were requests never to treat or talk about Sophia as AI; never to use the trolley problem as if it had anything to do with the choices autonomous cars would make; not to believe Babylon’s figures about triage accuracy; not to spread the falsehood that a sexbot was manhandled at an Australian trade fair; not to mischaracterise how DeepMind Health use patient data in Streams. Even a room of “experts” needed to be corrected on occasion. It is good to challenge each other, the examples we repeat, and the evidence we quote.
On data trusts: Everyone is interested in data trusts. More precisely, everyone is interested in how to get data shared more readily while preserving privacy. When people say “data trusts” they mean very different things; they project their own notion of what well governed data sharing might look like. I really hope our work at ODI, and the concrete pilots we’ll be taking forward over the next few months help to make the notion more tangible, and highlight other models for sharing.
On regulation / government intervention: I find that whenever we start talking about how government should intervene around AI, we get sucked into a personal data ethics black hole. It is hard to see past what should or shouldn’t be done with personal data and into other issues such as public procurement, competition policy or worker rights. Particularly in the UK, where there’s already lots of activity around data & AI ethics, we should avoid the black hole by trying to create venues for discussions that don’t talk about personal data.
On populism & fear of technology: We listened to a fascinating presentation (similar to this recording) about the correlation between populism and fear of technology. Recent displacement from work is more likely to arise from technology than immigration, but immigration is more likely to get blamed. The good news is that those who fear automation, and particularly populists who fear automation, are happy with any policy response, including positive ones like supporting retraining. The lesson is to have a vision.
On the role of humans: Both humans and computers are biased and sometimes make poor decisions. (When people feel there’s too much emphasis on AI being good, they remind us of AI’s failings; when they feel there’s too much emphasis on it being bad, they remind us of human failings.) We are more concerned about the black boxes of silicon-based neural networks than we are about the ones in our heads, or perhaps in our organisations. I lazily insist that decisions are made by humans, informed by data, but that’s because my mental model is medical diagnosis or parole recommendations. In a battle, there’s no time for a system that detects and destroys incoming torpedoes to refer to a human. I have started to think that the same things are needed whatever the decision making entity: transparency, explanation, accountability (a means of recompense for harm and a correction for the future). The trap we need to avoid is thinking any system (human or machine) is faultless.
More on the role of humans: Robots are common in automobile manufacturing, but customers are now demanding more customisation in their cars, which robots aren’t as good at providing. So there are new roles for humans, working with machines. They call them “cobots”. On the railroad, there are now “portals” that photograph every outwardly visible inch of railcars as they drive through, and detect faults in minutes that used to take hours of inspection. Railcar engineers can concentrate on maintenance rather than finding faults. The current crop of AI is good at dull operational tasks, leaving the more interesting work for people (but do some humans like doing dull things some of the time? I know I do.).
On intelligence: People are building more expressive bots, whether physical or virtual, that mimic human emotions through their appearance or behaviour. They are also getting better at reading emotion. At some point the mimicry gets so good we start reacting as if it’s real; that’s the point of the Turing test. On the other hand, knowing that you are talking to a machine rather than a human may be liberating: we learned about a chatbot designed to help people decide to stop smoking - one of its benefits was that people could talk to it without feeling judged. If a bot could fake care, would you prefer to tell a machine your woes?
We live in a world where a few, mostly US-based companies hold huge amounts of data about us and about the world. Google and Facebook, and to a lesser extent Amazon and Apple, (GAFA) make money by providing services, including advertising services, that make excellent use of this data. They are big, rich, and powerful in both obvious and subtle ways. This makes people uncomfortable, and working out what to do about them and their impact on our society and economy has become one of the big questions of our age.
An argument has started to emerge against opening data, particularly government owned data, because of the power of these data monopolies. “If we make this data available with no restrictions,” the argument goes, “big tech will suck it up and become even more powerful. None of us want that.”
I want to dig into this line of argument, the elements of truth it contains, why the conclusion about not opening data is wrong, why the argument is actually being made, and look at better ways to address the issue.
More data disproportionately benefits big tech
It is true that big tech benefits from the greater availability of data, and benefits disproportionately more than smaller organisations do.
Big tech have great capacity to work with data. They are geared to getting value from data: analysing it, drawing conclusions that help them grow and succeed, creating services that win them more customers. They have an advantage in both skills and scale when it comes to working with data.
Big tech have huge amounts of data that they can combine. Because of the network effects of linking and aggregating data together, the more data they have, the more useful that data becomes. They have an advantage because they have access to more data than other organisations.
Not opening data disproportionately damages smaller organisations
It is also true that small organisations suffer most from not opening data. Access to data enables organisations to experiment with ideas for innovative products and services. It helps them make better decisions, faster, which is particularly important for small organisations who need to make good choices about where to direct their energies or risk failure.
If data is sold instead of opened, big tech can buy it easily while smaller organisations are less able to afford to. Big tech have cash to spare, in house lawyers and negotiators, and savvy developers used to working with whatever copy or access protection is put around data. The friction that selling data access introduces is of minimal inconvenience to them. For small organisations, who lack these things, the friction is proportionately much greater. So on top of the disproportionate benefits big tech get from the data itself, they get an extra advantage from the barriers selling data puts in the way of smaller organisations.
If data isn’t made available to them (for example because they can’t negotiate to acceptable licensing conditions or the price is too high), big tech have the money and user base that enable them to invest in creating their own versions. Small organisations simply cannot invest in data collection to anywhere near the same scale. The data that big tech collects is (at least initially) lower quality than official versions, but it usually improves as people use it and correct it. Unlike public authorities, big tech have low motivation to provide equal coverage for everyone, favouring more lucrative users.
An example is addresses in the UK. Google couldn’t get access to that data under licensing conditions they could accept, so they built their own address data for use in Google Maps. Professionals think it is less accurate than officially held records. It particularly suffers outside urban and tourist areas because fewer people live there and there’s less need for people to use Google’s services there, which means less data available for Google to use to correct it.
Using different terms & conditions for different organisations doesn’t help
“Ah,” I hear you say, “but we can use different terms & conditions for different kinds of organisations so smaller ones don’t bear the same costs.”
It is true that it is possible to construct licensing terms and differential charging schemes that make it free for smaller firms to access and use data and charge larger firms. You can have free developer licences; service levels that flex with the size of companies (whether in employees or turnover or terminals); non-commercial licences for researchers, not-for-profits and hobbyists.
These are all possible, but they do not eliminate the problems.
First, the barrier for smaller organisations is not just about cash but about time and risk. Differential licensing and charging schemes are inevitably complex. Organisations have to understand whether they qualify for a particular tier and whether they are permitted to do what they want to do with the data. This takes time and often legal fees. Whether a particular use is permitted is often hard to work out because legal restrictions on particular data processing activities tend not to be black and white. They require interpretation and create uncertainty. This means organisations have to protect themselves against litigation arising from unintended non-compliance with the terms, which adds the cost of insurance. The more complex the scheme, the greater this friction.
Second, the clauses within a free licence always include one that prevents the organisation undercutting the original supplier of the data and selling it on to large organisations. Necessarily, this will place restrictions on the services that an organisation offers and the business model they adopt. They might be unable, for example, to build an API that adds value by providing different cuts of data on demand, or if they do their price might be determined by additional fees from the original supplier. Licensing restrictions limit what kinds of organisations can benefit from the data, and their ability to make money. And, as above, uncertainty about the scope of the restrictions (and whether the originating organisation will ever actually act on them) introduces risk and costs.
Third, while these costs and barriers are bad enough with one set of data, add another from a different supplier with another set of conditions, and you have to make sure you meet both of them. Sometimes this will be impossible (for example combining OpenStreetMap data, available under a share-alike licence, with non-commercial data). Add a third or fourth and you’re dealing with a combinatorial explosion of T&C intersections to navigate.
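A rough way to see how quickly this grows is to count the groups of suppliers whose terms might have to be satisfied together. The sketch below assumes, as a worst case, that every group of two or more datasets you combine needs its own check; the function and supplier names are made up for illustration.

```python
from itertools import combinations


def licence_combinations(suppliers):
    """Yield every group of two or more suppliers whose terms must be met together.

    Assumes the worst case, where any subset of the datasets might be
    combined in a product and so needs its own compatibility check.
    """
    for size in range(2, len(suppliers) + 1):
        yield from combinations(suppliers, size)


for n in range(1, 5):
    # Hypothetical supplier names, purely for counting.
    suppliers = [f"supplier_{i}" for i in range(1, n + 1)]
    groups = list(licence_combinations(suppliers))
    print(f"{n} dataset(s): {len(groups)} combination(s) of terms to check")
```

Under that assumption, n datasets give 2^n - n - 1 combinations: one for two datasets, four for three and eleven for four.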
In part, the problems with the differential pricing approach for data arise from the unique characteristics of data and the data economy:
data is endlessly manipulable, which makes it necessarily complex to list all the ways in which it can be used, and which of those are allowed and which are not
the greatest opportunities for innovation and growth are within infomediaries who slice and dice and add value to datasets; they need freedom to thrive
added value usually comes from the network effects of combining multiple datasets; but if there’s friction inherent in bringing datasets together, those same network effects will amplify that friction as well
It’s not surprising that people who are used to selling other kinds of things than data reach for “free licences for startups” as a solution to lower costs for smaller organisations. It seems an obvious thing to do. It might work for other kinds of products. It doesn’t work for data.
Opening data is better than not opening data
So far I’ve focused almost exclusively on the impacts of opening and not opening data on innovation and the ability of small businesses to thrive in comparison to big tech. I’ve discussed why selling or restricting access to and use of data favours big tech over and above the advantages they already receive from amassing more data.
If you like to think of playing fields, it’s true that opening data lifts big tech’s end of the pitch, but overall, it lifts the startup’s end more.
There are a few other considerations it’s worth quickly touching on.
Do we want big tech to use high quality data?
Earlier I wrote about how big tech makes its own data when it can’t get hold of official sources. They stitch together information from remote sensors, from what people upload, from explicit corrections, use clever machine learning techniques and come out with remarkably good reconstructions.
But “remarkably good” is not comprehensive. It is often skewed towards areas of high user demand, whether that’s cities rather than countryside or favouring the digitally included.
When big tech uses its own data rather than official data to provide services to citizens, it favours the enfranchised. It exacerbates societal inequalities.
It can also cost lives. I talked about Google’s address data and the doubts about its accuracy particularly outside towns and cities. Ambulances have started using it. When they are delayed because they go to the wrong place, people can die. Restricting access to address data forced Google to spend a bunch of money to recreate it, but who is actually suffering the consequences?
Not all services require the same level of detail in data. The impact of data errors is higher for some products than for others. But in general, we should want the products and services we use to be built on the highest quality, most reliable, most authoritative, timely, and comprehensive data infrastructure that we can provide. When we restrict access to that by not permitting companies with massive user bases amongst our citizenry to use that data, we damage ourselves.
What about big tech’s other advantages with data?
I’ve focused much of this on the advantage big tech enjoys in having access to data. As I touched on earlier, they also have an advantage in capability. If there’s a real desire to equalise smaller companies with big tech, they need support in growing their capability. This isn’t just about skills but also about tool availability and the ease of use of data.
Anything that helps people use data quickly and easily removes friction and gives a disproportionate advantage to organisations who aren’t able to just throw extra people at a problem. Make it available in standard formats and use standard identifiers. Create simple guides to help people understand how to use it. Provide open source tools and libraries to manipulate it. These are good things to do to increase the accessibility of data beyond simply opening it up.
How do we make this benefit society as a whole?
I’ve also been focusing deliberately on the narrow question of how we level the playing field between small organisations and big tech. Of course it’s not the case that everything small organisations do is good and everything big tech does is evil. Making data more open and accessible doesn’t ensure that anyone builds what society as a whole needs, and may mean the creation of damaging tools as well as beneficial ones. There might even (whisper it) be issues that can’t be solved with tech or data.
That said, the charities, community groups, and social enterprises that are most likely to want to build products or produce insights with positive social impact are also likely to be small organisations with the same constraints as I’ve discussed above. We should aim to help them. We can also encourage people to use data for good through targeted stimulus funding towards applications that create social or environmental benefits, as we did in the Open Data Challenge Series that ODI ran with Nesta.
Making it fair
When you dig into why people actually cite increasing inequality between data businesses as a reason for not opening data, it usually comes down to it feeling unfair that large organisations don’t contribute towards the cost of its collection and maintenance. After all, they benefit from the data and can certainly afford to pay for it. In the case of government data, where the public is paying the upkeep costs, this can feel particularly unfair.
It is unfair. It is unfair in the same way that it’s unfair that big tech benefits from the education system that the PhDs they employ went through, the national health service that lowers their cost of employment, the clean air they breathe and the security they enjoy. These are all public goods that they benefit from. The best pattern we have found for getting them, and everyone else who enjoys those benefits, to pay for them is taxation.
Getting the right taxation regime so that big tech makes a fair contribution to public goods is a large, international issue. We can’t work around failures at that level by charging big tech for access to public data. Trying to do so would be cutting off our nose to spite our face.
What can be done from a data perspective, whether a data steward is in the public sector or not, is to try to lower the costs of collection and maintenance. Having mechanisms for other people and organisations to correct data themselves, or even just highlight areas that need updating by the professionals, can help to distribute the load. Opening data helps to motivate collaborative maintenance: the same data becomes a common platform for many organisations and individuals, all of whom also contribute to its upkeep, just like Wikipedia, Wikidata and OpenStreetMap. With government data, this requires government to shift its role towards being a platform provider; legislation.gov.uk’s Expert Participation Programme demonstrates how this can be done without compromising quality.
Big tech and data monopolies
I have focused on big tech as if all data monopolies are big tech. That isn’t the case. What makes a data monopoly a monopoly is not that it is big and powerful and has lots of users, it’s that it has a monopoly on the data it holds. These appear as much in the public sector as the private sector. Within the confines of the law, they get to either benefit exclusively or choose the conditions in which others can benefit from the data they hold.
Some of that data could benefit us as individuals, as communities and as societies. Rather than restricting what data of ours data monopolies can access, another way to level the playing field is to ensure that others can access the data they hold by making it as open as possible while protecting people’s privacy, commercial confidentiality and national security. Take the 1956 Consent Decree against Bell Labs as inspiration. That decree forced Bell Labs to license their patents royalty free. It increased innovation in the US over the long term, and particularly that by startups.
There are various ways of making something similar happen around data. At the soft, encouraging end of the spectrum there are collaborative efforts such as OpenActive or making positive noises whenever Uber Movement helps cities gain insights, or Facebook adds more data into OpenStreetMap or supports EveryPolitician. At the hard regulatory end of the spectrum, we see legislation: the new data portability right in GDPR; the rights given under the Digital Economy Act to the Office for National Statistics in the UK to access administrative data held by companies; the French Digital Republic Act’s definition of data of public interest; the Competition & Markets Authority Order on Open Banking.
We should be much more concerned about unlocking the huge value of data held by data monopolies for everyone to benefit from — building a strong, fair and sustainable data infrastructure — than about getting them to pay for access to public data.
Opening up authoritative, high quality data benefits smaller companies, communities, and citizens. There’s no doubt that it also benefits larger organisations. But attempts at ever more complex restrictions about who can use data are likely to be counterproductive. There are other ways of levelling these playing fields.
Earlier in the year I went to an OECD workshop on enhanced access to data. The workshop covered four general themes: open data, data sharing communities, data marketplaces and data portability. The discussion on the implications of data portability was particularly interesting.
Data portability is a new right under the EU-level General Data Protection Regulation (GDPR), which is due to come into force in May 2018. A version of it will be written into UK law through the Data Protection Bill currently going through parliament.
The data portability right is a version of the existing data access right (which gives you the right to get hold of data about you held by an organisation). It is both more powerful, in that it gives you the right to have that data given to you or a third party of your choice in a commonly used machine readable format, and has a narrower scope in that it doesn’t apply to everything the organisation captures about you. It only applies to data captured automatically, and when it is either explicitly provided by you (eg when you fill in a form on a website) or generated as part of your activity (eg the records of your bank transactions). It does not apply to data that is inferred about you based on this data (eg if they’ve guessed that you’re gay or pregnant) or that they’ve got about you from other sources (eg your credit rating).
Why should we care about data portability?
There are three main reasons for the data portability right:
Providing more transparency than is currently provided. At the moment, exercising your data access right can simply lead to receiving pages and pages of printed information. With data portability, people will be able to search within and analyse the data that organisations hold about them.
Helping people to switch service providers without losing their histories. For example, if I wanted to switch from tracking my physical activity using Strava to using RunKeeper, the data portability right would guarantee I could get hold of the data held about my activities by Strava for import into RunKeeper.
Supporting the growth of third party analytics services that provide insights based on data. These include services oriented around providing deeper insights into particular types of activity (eg helping you to reduce your energy usage) or that link together different types of activity (eg bringing together your transport spend with the routes that you travel).
Transparency is the main reason that the data portability right was originally put into place: it is, after all, an extension of data protection legislation. However it’s unknown whether many people will exercise the data portability right for transparency purposes. On the other hand, under GDPR people will no longer have to pay to exercise their data access right. It is likely that this change will have a larger impact on the number of people exercising their right to find out what information organisations hold about them.
Support for switching is seen as a secondary positive effect to reduce lock-in and increase competition. However we switch services only rarely and data portability is only one of the many barriers in place when switching. Analogies with mobile number portability (ie your ability to keep your mobile number when you move supplier) are ill founded: if you switch your bank account you still have to update the information of all those who have your old account details - data portability can only go so far with helping with this (eg in providing a list of standing orders and direct debits to recreate).
The growth of third party analytics services is likely to be the long-term large-scale side effect of the data portability right. The vision is that we could have applications that help us, both directly and through our carers and advisors, make better decisions by integrating data from across our lives. Imagine, for example, a grocery shopping app that takes into account your previous purchases, your travel plans, your current balance and your weight to suggest what to buy that week. Or a service that helps your doctor prescribe the right intervention based on accurate information about your diet, alcohol intake and activity.
It is worth exploring how these tools might manifest in a little more detail, but first let’s have a little reality check.
What makes data portability hard?
Getting the benefits of data portability won’t be quite as straightforward as might be imagined. The extent to which it’s useful depends a lot on how organisations choose to implement it.
First, organisations that receive a request under the data portability right have a month to respond. This is arguably a reasonable period to wait if the request is made for transparency reasons. It would cause some pain when switching suppliers (but people are likely to experience pain doing that anyway). But a delay of this length really undermines the ability of data analytics services to provide a timely and useful service. One could imagine, say, a telecoms company providing up-to-date information about your location and mobile usage on their own site while only providing data that is a month old to competitor analytics services. The month window for response is there to enable smaller organisations to respond in an ad hoc way to requests rather than needing to invest in end-to-end technology. Large companies who anticipate a lot of requests will want to invest in automating responses to them, which should enable timely access, but will some deliberately build in a lag to their responses to retain a competitive advantage?
Second, the data portability right covers your right to get hold of data about you from an organisation but it does not provide any guarantees that that data can be imported into other services. One would have thought that competitor services would invest in making it easy for users to move to them by porting data from elsewhere, but this requires investing in tracking many moving targets (as the export formats used by competitors change over time) for a small proportion of potential users (given other switching barriers), particularly in unsaturated markets. Will competitors find it more worthwhile to invest in developing features that retain their existing users and win newcomers to the market? Will new users lower their risks by first trying out services that they know they can’t switch to later?
Third, while the data portability right requires data to be provided in a commonly used format, this by no means guarantees standardisation in data formats across particular sectors. Organisations might reasonably interpret the right as requiring the use of the common syntaxes such as CSV, JSON or XML while leaving semantic interoperability completely untouched. For example, one supermarket might label a field in shopping transaction data “prodName” and another “PID”; each might use a completely different set of names for the same products, different categorisation schemes, different codes for suppliers and so on. Without standardisation, any service that wants to use data from a particular source will have to write a custom parser. Will organisations within particular sectors be motivated to collaborate on creating standards that provide greater interoperability?
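To make the cost of missing standards concrete, here is a minimal sketch in Python of what a third party service would have to maintain. Everything in it is an assumption made for illustration: the two supermarkets, their field names (beyond the “prodName” and “PID” mentioned above), the units and the target schema are all hypothetical.

```python
# Hypothetical export rows from two supermarkets. Both are "commonly used,
# machine readable" once parsed from CSV or JSON, but they share no field names.
shop_a_row = {"prodName": "Semi-skimmed milk 2L", "price": "1.25", "qty": "2"}
shop_b_row = {"PID": "MILK-SS-2000", "unit_cost_pence": 130, "count": 1}


def parse_shop_a(row: dict) -> dict:
    """Custom parser for the (assumed) shop A export: prices in pounds as text."""
    return {
        "product": row["prodName"],
        "unit_price_pence": round(float(row["price"]) * 100),
        "quantity": int(row["qty"]),
    }


def parse_shop_b(row: dict) -> dict:
    """Custom parser for the (assumed) shop B export: prices already in pence."""
    return {
        "product": row["PID"],
        "unit_price_pence": int(row["unit_cost_pence"]),
        "quantity": int(row["count"]),
    }


# One parser per source: this table grows with every new supplier and has to
# be revisited whenever a supplier changes its export format.
PARSERS = {"shop_a": parse_shop_a, "shop_b": parse_shop_b}


def normalise(source: str, row: dict) -> dict:
    """Convert a source-specific row into the service's own common schema."""
    try:
        parser = PARSERS[source]
    except KeyError:
        raise ValueError(f"no parser written for source {source!r}") from None
    return parser(row)
```

Even after normalising, the two “product” values still do not match: one is a free-text name and the other an internal code. Aligning field names is the easy part; agreeing on shared product identifiers and categories is where sector-wide standards would be needed.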
Fourth, there are questions about how the data portability right will be implemented securely. It is already common practice for third parties to access, and scrape, password-protected websites by asking users for their usernames and passwords. This is extremely bad practice from a security perspective as access can’t be limited or revoked easily, and because users frequently reuse passwords across multiple sites. Badly implemented, the data portability right could lead to a bonanza for phishers and identity thieves. Will organisations encourage their users to reveal their login details to get access to data under the portability right, or will they take the time to implement more sophisticated and secure ways of authenticating and authorising third party access such as OAuth?
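The difference between handing over a password and granting delegated access can be sketched in a few lines of Python. This is not an implementation of OAuth; it is a toy in-memory model, with hypothetical names such as GrantStore and the “transactions:read” scope, that shows only the two properties a shared password lacks: access that is limited to named scopes, and access that can be revoked without changing the password.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Grant:
    """One user's permission for one third party, limited to named scopes."""
    user: str
    client: str
    scopes: frozenset
    revoked: bool = False


class GrantStore:
    """Toy in-memory record of delegated access (illustrative only)."""

    def __init__(self) -> None:
        self._grants: dict[str, Grant] = {}

    def issue(self, user: str, client: str, scopes: set) -> str:
        # The third party receives a random token, never the user's password.
        token = secrets.token_urlsafe(32)
        self._grants[token] = Grant(user, client, frozenset(scopes))
        return token

    def allows(self, token: str, scope: str) -> bool:
        # Access is limited: only scopes the user granted are permitted.
        grant = self._grants.get(token)
        return grant is not None and not grant.revoked and scope in grant.scopes

    def revoke(self, token: str) -> None:
        # Access is revocable: the user can withdraw this one grant
        # without affecting their password or any other third party.
        grant = self._grants.get(token)
        if grant is not None:
            grant.revoked = True


store = GrantStore()
token = store.issue("alice", "budget-app", {"transactions:read"})
can_read = store.allows(token, "transactions:read")   # granted scope
can_pay = store.allows(token, "payments:write")       # never granted
store.revoke(token)
can_read_after_revoke = store.allows(token, "transactions:read")
```

A third party holding a password can do anything the user can, for as long as the password stays the same. A third party holding a token like this can only do what was granted, and only until the user withdraws it.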
Finally, the data portability right places control into the hands of individuals to decide with whom to share data about and from the services they use. If our experiences with the cookie law, privacy policies and website T&Cs teach us anything, it’s that many people are lazy and will simply click “I agree” on anything that stands in the way of accessing a service. Some of the products that request access to data under the data portability right will be bogus, actively created by identity thieves or to build marketing databases, or they may simply store data badly and thus increase the risks of security breaches. Will people be able to choose wisely which third party services to grant access to through the data portability right? Will existing or new consumer organisations build services to help them do so? Will regulators rise to the new challenges this creates?
How might data portability pan out?
Bearing these limitations in mind, there are a number of potential unintended consequences of the data portability right.
First, the data portability right may push towards a less innovative and competitive market. The creation of standards for data portability might push providers towards offering services that fit the “shape” defined by those standard data formats, but truly innovative services might not fit that shape. As a trivial example, traditional energy suppliers might not care about or provide information about who generated the energy they supply whereas innovative energy brokers might consider this a key piece of information for customers who want to buy from a local wind farm. The data portability right requires standards to be useful, but whatever standards get created will need to be flexible enough to accommodate the different kinds of products that services might provide.
Second, rather than promoting competition, the data portability right may place even more power in the hands of the big tech companies who have the capacity, in terms of knowledge and resources, to take most advantage of it. For example, Amazon is already a threat to traditional retailers; it is also well placed to take advantage of the data portability right to import people’s shopping lists to AmazonFresh. Google already infers things about you through your and millions of other people’s search patterns and clickstream; it will be able to give much more personalised insights on your travel habits than a startup that hasn’t got that vast amount of data to draw on. There are many opportunities for startups and SMEs in providing data brokerage and user facing services, but the data portability right isn’t going to suddenly put them on a level playing field.
Third, rather than increasing our privacy and control, the data portability right could make importing data from elsewhere a natural part of signing up for a new service, resulting in data about us proliferating onto multiple services and out of our control. Consumer and privacy rights groups need to combine forces to put pressure on businesses to minimise their data greed and to increase the ability of consumers to understand the implications of and make good choices about porting data into other services.
Fourth, while the European Data Protection Supervisor may think that “one cannot monetise and subject a fundamental right to a simple commercial transaction, even if it is the individual concerned by the data who is a party to the transaction”, the data portability right will undoubtedly lead to the development of personal data markets. People will be encouraged to port data about themselves into personal data brokers, with the promise of control over use and a financial return when it is sold on. This in turn may lead to a future where access to data is determined by who can pay for it, accelerating knowledge, power and financial inequalities.
Finally, on a more positive note, the data portability right could lead to more people making the positive choice to donate data about themselves for good causes such as medical research. Research on public attitudes to data use indicates that people are happy for personal data to be used for societal benefits. Data portability could provide a mechanism for some charities and civil society groups to engage people in collective action.
Where are the gaps in data portability?
Finally, there are a few areas that the data portability right doesn’t tackle, where legislation could perhaps be extended or clarified through guidance.
First, the data portability right applies to natural persons, and not to organisations. But organisations are heavy users of services; service providers capture data about them just as they do about people; and organisations would benefit just as much, if not more, from being able to switch suppliers or receive data analytics support. The Open Banking initiative, which has data portability at its heart, has focused on benefits to small businesses of being able to find suitable financing more easily. While organisations don’t have a data portability right, the individuals within them do - will organisations start using their staff to front data requests in order to achieve the same benefits?
Second, while the data portability right could result in data donation for societal benefit as described above, it would be far easier to realise those benefits if researchers and statisticians were able to access data from a representative sample of service users, not just a biased subset of those savvy (and generous) enough to donate data. The Digital Economy Act gives the UK’s Office for National Statistics the power to require data from some public, private and third sector bodies, as long as doing so is consistent with the Data Protection Act. It will be interesting to see how the expectation of individual control over data use granted by GDPR interacts with this.
Third, while many speak about data portability in terms of providing access to “your data”, in reality data shared with third parties may include personal data about other people too. This might include data directly about other people in your social graph, or in your household, or with whom you transact through a peer-to-peer service. Similarly, it may include commercially sensitive data about businesses you frequent or charities you donate to. When analysed in bulk, data about a sample of the population becomes information about people who were not included directly in the analysis. For example, data about my shopping habits may be used to make guesses about the shopping habits of other middle class, middle aged mothers of two. Data about us is never only about us.
As data analytics and machine learning reach further into our individual lives, the choices we make as individuals about how data about us is shared and used, and indeed what we do while that data is being collected, have wider repercussions. They do not just affect the decisions that are made about us individually, but those that are made about others like us.
The data portability right provides us with a powerful positive ability to take advantage of the data others collect about us and new opportunities for innovators and campaigners. But it also pushes ever wider the door to a more surveilled society. It is hard to predict how it will affect the power dynamics between individuals and organisations, between incumbents and new providers, or between big tech and startups. Companies will need to cooperate, particularly around standards, for consumers to benefit. Regulators will need to watch closely how the right is implemented and the effects on the market. And we will need to take an ever more active role in questioning and holding to account everyone who uses data about us.
The discussion here is based heavily on the insights provided by Marc MacCarthy, Ruth Boardman, Lenard Koschwitz, Randi Flesland, John Foster, Babak Jahromi, Robin Wilton and the audience at the session on data portability at the OECD workshop on enhanced access to data. Thanks in particular to Christian Reimsbach Kounatze for organising it.
My eldest daughter is now in secondary school and, while she enjoys and is good at Maths, what she really loves studying is History and English. Watching the critical thinking and analysis skills that she is learning and using for those subjects, I have started to wonder if we should be approaching data literacy from a different angle.
The need for children and adults to be equipped with data skills is well recognised. The Nesta paper Analytic Britain: Securing the Right Skills for the Data-Driven Economy contains some recommendations, for example. However, much of this work focuses on the development of what I would frame as data science skills: the basic skills like the ability to clean data, analyse it, display it in graphs and maps, and the more advanced skills of machine learning and interactive visualisations. Data literacy becomes equated with the ability to do things with data.
But for me, data literacy, and the skills we all need to have in our policymaking, businesses and lives, go beyond handling data. We need to know what data is capable of (even if we can’t do those things ourselves). We need to understand the limits of data, the ways it can be used for both good and ill, and the implications these have for our lives and society. Understanding these things would help us use data well in government, business and our day to day lives and have more informed debate about how we use data in society.
You may remember from your own childhood studying both English Language and English Literature. English Language focuses on reading and writing, the production of material, the manipulation of language. English Literature focuses on the study of English in use, the material produced by different authors, their use of different techniques, the context in which they produced their works and the impact their work had. The two areas of study feed on each other: producing poetry enables you to understand poetry as a form, and studying great poems improves your own technique. But the focus of each is distinct. We expect children to be able to read and write when they leave school. We also expect them to understand how others’ writing has contributed to our culture and society.
Could we apply the same approach to data? Children are already taught Data Language as part of the Maths curriculum. They are taught how to collect data, record it, create basic statistics, make charts and graphs from it, even in primary school. But what about Data Literature?
What if children were taught about Florence Nightingale’s use of data? They could unpick the method of collection, the birth of new forms of visualisation and the use of data for argument and persuasion and change. They could examine the context of Nightingale’s work at the time and the repercussions through to the present day. They could create new works from her data, put together new visualisations and invent modern-day newspaper stories.
They could examine the works of great modern day data visualisers and compare and contrast their works around particular key events, such as the Iraq war or the 2016 presidential election, or on thematic topics such as climate change. They could examine commonalities in form - citation of sources, provision of values - as well as differences in style and expression. They could produce their own visualisations in the style of one of the greats, or simply copy a work to see how it’s done.
They could look at the use of data in reports, from official statistical releases, through academic papers, to sports commentary. They could look at how these have evolved over time, and the varying ways in which numbers and statistics can be used to inform and substantiate a story that is being told. They could look at the choices made about what numbers get quoted in such stories, and have exercises where they select different numbers or use different rhetorical devices (eg “almost 20%” vs “less than 20%”) to reach a different conclusion.
Children could be taught the history of census taking, from the Roman census that reportedly led to at least one birth in a stable, through the Domesday Book that redistributed land, to the modern day. They could examine different forms of census taking and the way in which the data is used. But they could also examine the way in which census taking, or indeed the gathering and use of any data, can exert power and change reality.
There are many other topics that would make rich study material: the art of fact checking; the role of open data in government transparency and accountability; the data flows in adtech; conversational interfaces with data such as Siri and Alexa; surveillance and secret data; personalisation and data ownership in smart devices.
I am not an educationalist, but I think that these kinds of topics would equip children with a much better understanding of what data really means to society. And I think it taps into the skills that those who lean towards the arts and social sciences enjoy exercising: skills such as critical thinking, context awareness and artistic appreciation. There are people who are turned off data because they don’t enjoy maths. This provides a different route to reach them.
I am sure there must be people thinking of and doing this already. I know of the Calling Bullshit course, for example. What else is there? Does this idea have legs? How could we advance it? Let me know at firstname.lastname@example.org.