Earlier in the year I went to an OECD workshop on enhanced access to data. The workshop covered four general themes: open data, data sharing communities, data marketplaces and data portability. The discussion on the implications of data portability was particularly interesting.

Data portability is a new right under the EU-level General Data Protection Regulation (GDPR), which is due to come into force in May 2018. A version of it will be written into UK law through the Data Protection Bill currently going through parliament.

The data portability right is a version of the existing data access right (which gives you the right to get hold of data about you held by an organisation). It is both more powerful, in that it gives you the right to have that data given to you or a third party of your choice in a commonly used machine readable format, and has a narrower scope in that it doesn’t apply to everything the organisation captures about you. It only applies to data captured automatically, and when it is either explicitly provided by you (eg when you fill in a form on a website) or generated as part of your activity (eg the records of your bank transactions). It does not apply to data that is inferred about you based on this data (eg if they’ve guessed that you’re gay or pregnant) or that they’ve got about you from other sources (eg your credit rating).
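The scope rule above can be sketched as a simple filter. This is only an illustration: the provenance labels and the Record type are hypothetical names invented here, and a real organisation would need its own way of classifying where each piece of data came from.

```python
from dataclasses import dataclass

# Hypothetical provenance labels, invented for this illustration.
# "provided" and "observed" data fall within the right as described above;
# "inferred" and "third_party" data do not.
PORTABLE_SOURCES = {"provided", "observed"}


@dataclass
class Record:
    description: str
    source: str  # one of: provided, observed, inferred, third_party


def portable(records):
    """Return only the records that the portability right would cover."""
    return [r for r in records if r.source in PORTABLE_SOURCES]


records = [
    Record("address typed into a web form", "provided"),
    Record("bank transaction history", "observed"),
    Record("guess that the customer is pregnant", "inferred"),
    Record("credit rating from an agency", "third_party"),
]

for record in portable(records):
    print(record.description)
```

Everything in the list would be available under the existing data access right; only the first two records would have to be supplied in machine readable form under the portability right.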

Why should we care about data portability?

There are three main reasons for the data portability right:

  1. Providing more transparency than is currently provided. At the moment, exercising your data access right can simply lead to receiving pages and pages of printed information. With data portability, people will be able to search within and analyse the data that organisations hold about them.

  2. Helping people to switch service providers without losing their histories. For example, if I wanted to switch from tracking my physical activity using Strava to using RunKeeper, the data portability right would guarantee I could get hold of the data held about my activities by Strava for import into RunKeeper.

  3. Supporting the growth of data analytics third party services that provide insights based on data. These include services oriented around providing deeper insights into particular types of activity (eg helping you to reduce your energy usage) or that link together different types of activity (eg bringing together your transport spend with the routes that you travel).

Transparency is the main reason that the data portability right was originally put into place: it is, after all, an extension of data protection legislation. However it’s unknown whether many people will exercise the data portability right for transparency purposes. On the other hand, under GDPR people will no longer have to pay to exercise their data access right. It is likely that this change will have a larger impact on the number of people exercising their right to find out what information organisations hold about them.

Support for switching is seen as a secondary positive effect to reduce lock-in and increase competition. However we switch services only rarely and data portability is only one of the many barriers in place when switching. Analogies with mobile number portability (ie your ability to keep your mobile number when you move supplier) are ill founded: if you switch your bank account you still have to update the information of all those who have your old account details - data portability can only go so far with helping with this (eg in providing a list of standing orders and direct debits to recreate).

The growth of third party analytics services is likely to be the long-term large-scale side effect of the data portability right. The vision is that we could have applications that help us, both directly and through our carers and advisors, make better decisions by integrating data from across our lives. Imagine, for example, a grocery shopping app that takes into account your previous purchases, your travel plans, your current balance and your weight to suggest what to buy that week. Or a service that helps your doctor prescribe the right intervention based on accurate information about your diet, alcohol intake and activity.

It is worth exploring how these tools might manifest in a little more detail, but first let’s have a little reality check.

What makes data portability hard?

Getting the benefits of data portability won’t be quite as straightforward as might be imagined. The extent to which it’s useful depends a lot on how organisations choose to implement it.

First, organisations that receive a request under the data portability right have a month to respond. This is arguably a reasonable period to wait if the request is made for transparency reasons. It would cause some pain when switching suppliers (but people are likely to experience pain doing that anyway). But a delay of this length really undermines the ability of data analytics services to provide timely and useful services. One could imagine, say, a telecoms company providing up-to-date information about your location and mobile usage on their own site while only providing data that is a month old to competitor analytics services. The month window for response is there to enable smaller organisations to respond in an ad hoc way to requests rather than needing to invest in end-to-end technology. Large companies who anticipate a lot of requests will want to invest in automating responses to them, which should enable timely access, but will some deliberately build in a lag to their responses to retain a competitive advantage?

Second, the data portability right covers your right to get hold of data about you from an organisation but it does not provide any guarantees that that data can be imported into other services. One would have thought that competitor services would invest in making it easy for users to move to them by porting data from elsewhere, but this requires investing in tracking many moving targets (as the export formats used by competitors change over time) for a small proportion of potential users (given other switching barriers), particularly in unsaturated markets. Will competitors find it more worthwhile to invest in developing features that retain their existing users and win newcomers to the market? Will new users lower their risks by first trying out services that they know they can’t switch to later?

Third, while the data portability right requires data to be provided in a commonly used format, this by no means guarantees standardisation in data formats across particular sectors. Organisations might reasonably interpret the right as requiring the use of the common syntaxes such as CSV, JSON or XML while leaving semantic interoperability completely untouched. For example, one supermarket might label a field in shopping transaction data “prodName” and another “PID”; each might use a completely different set of names for the same products, different categorisation schemes, different codes for suppliers and so on. Without standardisation, any service that wants to use data from a particular source will have to write a custom parser. Will organisations within particular sectors be motivated to collaborate on creating standards that provide greater interoperability?
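To illustrate the custom parser problem, here is a minimal Python sketch that normalises exports from two supermarkets into one shared set of field names. Everything beyond the "prodName" and "PID" labels is an assumption made for the example: the supermarket names, the quantity fields, the sample data and the shared schema are all hypothetical.

```python
import csv
import io
import json

# Hypothetical mappings from each retailer's field names to a shared schema.
FIELD_MAPS = {
    "supermarket_a": {"prodName": "product_name", "qty": "quantity"},
    "supermarket_b": {"PID": "product_name", "count": "quantity"},
}


def normalise(rows, source):
    """Rename source-specific fields to the shared names, dropping unmapped ones."""
    mapping = FIELD_MAPS[source]
    return [
        {mapping[key]: value for key, value in row.items() if key in mapping}
        for row in rows
    ]


# Assume supermarket A exports CSV and supermarket B exports JSON.
export_a = "prodName,qty\nOat milk,2\n"
export_b = '[{"PID": "Oat drink 1L", "count": 1}]'

rows_a = normalise(list(csv.DictReader(io.StringIO(export_a))), "supermarket_a")
rows_b = normalise(json.loads(export_b), "supermarket_b")

print(rows_a)
print(rows_b)
```

Renaming fields is the easy part. In this sketch the two exports still disagree on what the product is called and on whether a quantity is text or a number, and a mapping table like this has to be maintained separately for every source and updated whenever an export format changes.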

Fourth, there are questions about how the data portability right will be implemented securely. It is already common practice for third parties to access, and scrape, password-protected websites by asking users for their usernames and passwords. This is extremely bad practice from a security perspective as access can’t be limited or revoked easily, and because users frequently reuse passwords across multiple sites. Badly implemented, the data portability right could lead to a bonanza for phishers and identity thieves. Will organisations encourage their users to reveal their login details to get access to data under the portability right, or will they take the time to implement more sophisticated and secure ways of authenticating and authorising third party access such as OAuth?
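As a rough sketch of the OAuth alternative, the Python below builds the first request of an OAuth 2.0 authorisation code flow. The user is sent to the data holder's own site to approve a narrowly scoped request, so the third party never sees a password. The endpoint, client ID, redirect address and scope name are all hypothetical; only the query parameter names come from the OAuth 2.0 specification.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical authorisation endpoint belonging to the data holder.
AUTHORIZE_ENDPOINT = "https://bank.example/oauth/authorize"


def build_authorization_url(client_id, redirect_uri, scope):
    """Return the URL to send the user to, plus the state value to check on return."""
    # A random state value lets the third party match the response to this request.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # ask for an authorisation code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,  # limits what the third party can access
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


url, state = build_authorization_url(
    client_id="analytics-app",
    redirect_uri="https://analytics.example/callback",
    scope="transactions:read",
)
print(url)
```

Because the data holder issues a token tied to a specific scope rather than handing over login details, access can be limited to, say, reading transactions, and can be revoked later without the user having to change their password.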

Finally, the data portability right places control into the hands of individuals to decide with whom to share data about and from the services they use. If our experiences with the cookie law, privacy policies and website T&Cs teach us anything, it’s that many people are lazy and will simply click “I agree” on anything that stands in the way of accessing a service. Some of the products that request access to data under the data portability right will be bogus, actively created by identity thieves or to build marketing databases, or they may simply store data badly and thus increase the risks of security breaches. Will people be able to choose wisely which third party services to grant access to through the data portability right? Will existing or new consumer organisations build services to help them do so? Will regulators rise to the new challenges this creates?

How might data portability pan out?

Bearing these limitations in mind, there are a number of potential unintended consequences of the data portability right.

First, the data portability right may push towards a less innovative and competitive market. The creation of standards for data portability might push providers towards offering services that fit the “shape” defined by those standard data formats, but truly innovative services might not fit that shape. As a trivial example, traditional energy suppliers might not care about or provide information about who generated the energy they supply whereas innovative energy brokers might consider this a key piece of information for customers who want to buy from a local wind farm. The data portability right requires standards to be useful, but whatever standards get created will need to be flexible enough to accommodate the different kinds of products that services might provide.

Second, rather than promoting competition, the data portability right may place even more power in the hands of the big tech companies who have the capacity, in terms of knowledge and resources, to take most advantage of it. For example, Amazon is already a threat to traditional retailers; it is also well placed to take advantage of the data portability right to import people’s shopping lists to AmazonFresh. Google already infers things about you through your and millions of other people’s search patterns and clickstream; it will be able to give much more personalised insights on your travel habits than a startup that hasn’t got that vast amount of data to draw on. There are many opportunities for startups and SMEs in providing data brokerage and user facing services, but the data portability right isn’t going to suddenly put them on a level playing field.

Third, rather than increasing our privacy and control, the data portability right could make importing data from elsewhere a natural part of signing up for a new service, resulting in data about us proliferating onto multiple services and out of our control. Consumer and privacy rights groups need to combine forces to put pressure on businesses to minimise their data greed and to increase the ability of consumers to understand the implications of and make good choices about porting data into other services.

Fourth, while the European Data Protection Supervisor may think that “one cannot monetise and subject a fundamental right to a simple commercial transaction, even if it is the individual concerned by the data who is a party to the transaction”, the data portability right will undoubtedly lead to the development of personal data markets. People will be encouraged to port data about themselves into personal data brokers, with the promise of control over use and a financial return when it is sold on. This in turn may lead to a future where access to data is determined by who can pay for it, accelerating knowledge, power and financial inequalities.

Finally, on a more positive note, the data portability right could lead to more people making the positive choice to donate data about themselves for good causes such as medical research. Research on public attitudes to data use indicates that people are happy for personal data to be used for societal benefits. Data portability could provide a mechanism for some charities and civil society groups to engage people in collective action.

Where are the gaps in data portability?

Finally, there are a few areas that the data portability right doesn’t tackle, where legislation could perhaps be extended or clarified through guidance.

First, the data portability right applies to natural people, and not to organisations. But organisations are heavy users of services; service providers capture data about them just as they do about people; and organisations would benefit just as much, if not more, from being able to switch suppliers or receive data analytics support. The Open Banking initiative, which has data portability at its heart, has focused on benefits to small businesses of being able to find suitable financing more easily. While organisations don’t have a data portability right, the individuals within them do - will organisations start using their staff to front data requests in order to achieve the same benefits?

Second, while the data portability right could result in data donation for societal benefit as described above, it would be far easier to realise those benefits if researchers and statisticians were able to access data from a representative sample of service users, not just a biased subset of those savvy (and generous) enough to donate data. The Digital Economy Act gives the UK’s Office for National Statistics the power to require data from some public, private and third sector bodies, as long as doing so is consistent with the Data Protection Act. It will be interesting to see how the expectation of individual control over data use granted by GDPR interacts with this.

Third, while many speak about data portability in terms of providing access to “your data”, in reality data shared with third parties may include personal data about other people too. This might include data directly about other people in your social graph, or in your household, or with whom you transact through a peer-to-peer service. Similarly, it may include commercially sensitive data about businesses you frequent or charities you donate to. When analysed in bulk, data about a sample of the population becomes information about people who were not included directly in the analysis. For example, data about my shopping habits may be used to make guesses about the shopping habits of other middle class, middle aged mothers of two. Data about us is never only about us.

As data analytics and machine learning reach further into our individual lives, the choices we make as individuals about how data about us is shared and used, and indeed what we do while that data is being collected, have wider repercussions. They do not just affect the decisions that are made about us individually, but those that are made about others like us.

The data portability right provides us with a powerful positive ability to take advantage of the data others collect about us and new opportunities for innovators and campaigners. But it also pushes ever wider the door to a more surveilled society. It is hard to predict how it will affect the power dynamics between individuals and organisations, between incumbents and new providers, or between big tech and startups. Companies will need to cooperate, particularly around standards, for consumers to benefit. Regulators will need to watch closely how the right is implemented and the effects on the market. And we will need to take an ever more active role in questioning and holding to account everyone who uses data about us.

Acknowledgements

The discussion here is based heavily on the insights provided by Marc MacCarthy, Ruth Boardman, Lenard Koschwitz, Randi Flesland, John Foster, Babak Jahromi, Robin Wilton and the audience at the session on data portability at the OECD workshop on enhanced access to data. Thanks in particular to Christian Reimsbach Kounatze for organising it.