January 2021 Month Notes

Jan 31, 2021

This is a month note, which is like a weeknote but for a month. You probably guessed that. My comms colleagues would question me: who is the audience for this? what’s the call to action? My response is: I don’t know and there isn’t one. I wrote it because I thought it would be good to reflect on the month, and not bad to have a record of it beyond scribbles in my bullet journal. I honestly debated whether to publish it, because it gets quite personal near the end, but I think that’s the point of week/month notes: showing others a bit of the person behind the screen. I make no promises about writing or publishing another. Read if you like; take what you wish from it.



Strategic work

Over the last few months we’ve been working through taking a more programmatic approach to our work at ODI. This has been prompted by the success of the data institutions programme; the greater flexibility we’re expecting to have around how we spend the money we receive from the UK government, which should allow us to be more self-directed; and feedback from our recent (successful!) funding process with Luminate.

We identified five programmes we’re going to focus on at the end of last year: data literacy; data assurance; data for challenges; data institutions; and evidence and foresight. My main strategic focus this month has been on helping the people leading those programmes to think through and articulate their goals and strategies, talk about them with the team, and start putting in place plans for work in Q2. The centrepieces this month were day-long “at-home offsites” with the senior leadership team and then the wider team, to discuss the approach and the programmes themselves, but I also had a number of great 1:1 chats with programme leads.

I’m really pleased with how this is progressing. The people leading the programmes are all really thoughtful and driven by the impact the programmes should have, and the discipline heads - eg in comms and business development - have readily engaged with structuring and focusing their activities around the programmes. Speaking externally about ODI’s work in terms of these programmes has also proved to be really helpful.

There’s still a lot of work to do on strategies and plans, on internal and external comms, and on operational implications. And the planning will surface the perennial challenge of balancing mission and sustainability (eg programme impacts are about changing the world, not creating new revenue streams, but hopefully we can do some of the latter along the way).

Personally, I’m facing a familiar “why didn’t we (I) do this sooner” feeling, given some people have been telling us to take a programme approach for years (it being very common across the non-profit sector). But I’m also reminding myself that context plays a big part in whether good ideas can land and progress successfully. Right now, happily, internal and external stars are aligning.

Other internal work

This month I’ve also been involved in:

  • recruiting a new Head of Consulting from a strong field of candidates - still a couple more steps to go but I’m excited about having new energy, perspectives and experience at a senior level in ODI (we do miss you though, Leigh)

  • reviewing a lot of case studies - we are contractually obliged to produce these as part of our old Luminate grant, but I hope we continue to do so as they are a great summary of work we’ve done and what we learned doing that work, which I think is valuable both for ourselves and others

  • completing our annual report, which should be published soon

  • starting the planning for this year’s ODI Summit - given the huge reach we had last year, and the continuing uncertainty about how much social distancing will be needed in November, we’re going virtual again. I’m not going to give any spoilers about the theme…

  • helping Milly work out how we prioritise our efforts around public policy work

ODI project work

This month I’ve been involved with:

  • getting some data analysis support for a census of UK data institutions and existing organisations that could take on a data institution role - I want to see what data analysis can add to the desk research we typically do, and this is a good opportunity to find out

  • helping the team with some work we’re doing around the World Bank’s upcoming World Development Report on data, examining how to operationalise support for new forms of data institution (eg data trusts, data unions) in low and middle income countries - there’s a lot to unpack, but my view is that it’s great to experiment with these new forms; they can and will go wrong, though, so providing safety and redress is essential, and that can be problematic in low-capacity settings

  • supporting some work with a big international company to explore policy and practical issues around access to health data - our calls for openness about the work and with the results are being met with internal support within the company, so I’m looking forward to seeing this work becoming more public over the next few months

  • supporting one of our ODI Research Fellows, Sue Chadwick, as she’s exploring issues to do with digital and data ethics in property development planning - she’s done an amazing survey of all legislation that mentioned data/information during 2020 - expect a blog and lunchtime lecture over the next few months, as well as her final report of course

  • we’ve had a frustrating run of rejections in a series of bids that I put a bunch of work into, so I’m feeling a little dejected about that, especially as they were really interesting projects - I’d love to work out whether the problem is our experience/expertise, approach/methodology, price and/or presentation, and adjust accordingly

Interesting conversations

This month, I’ve chatted with:

  • an international analytics firm about the importance of social scientists in doing real-life data analysis; how to strike the balance between fixing the plumbing and papering over the cracks; and how to push for openness within public sector contracts

  • XPRIZE about how challenge-focused initiatives can transition to longer lasting institutions

  • Natalie Byrom and others about the need for court data to help tackle the case backlog

  • Anouk Ruhaak about the work she’s doing as a Mozilla fellow in residence around their Data Futures programme, with particular overlaps around how we evaluate the results of experiments with new institutional forms and interests in data institutions to support worker rights

  • the National Data Strategy team within DCMS about their future plans

  • some lawyers about the likelihood of a positive data adequacy decision for the UK (answer: very likely)

I was also interviewed six times by researchers, and participated in a number of other conversations.


Global Partnership on AI

I’m co-chairing the Global Partnership on AI’s Data Governance Working Group with the wonderful Maja Bogataj and the incredible support of Ed Teather. The Working Group is made up of about 30 international experts from a non-representative mix of countries.

Since taking on the role I’ve been trying to work out how to ensure the work of the group is impactful and inclusive. I really don’t want it to be a once-a-month talking shop. We really are in a unique position to influence international research and development work (and policy) around data governance and AI and I want us to make the most of that opportunity.

The process I’ve championed is to spend the next six months developing concept notes for two-year-long international programmes of work that hopefully GPAI, governments, or other organisations might fund and support. I hope at least some of these might bolster (through funding, expertise and attention) existing projects and programmes.

This month, within the Data Governance Working Group we took a long list of around 30 potential concept notes and prioritised them down to seven ideas that we’re going to flesh out a bit more. They are around:

  • data justice
  • data trusts (etc)
  • balancing innovation and data protection in legal regimes
  • handling co-creation rights
  • international rules on text and data mining
  • dataset documentation and management
  • privacy enhancing technologies

I expect these to get whittled down more, both by the group and by the GPAI steering committee. If we end up with two or three good programmes to take forward by the end of June, I’ll be happy.

We’ve also been engaging with the co-chairs of the other working groups, some of which are taking similar approaches, and trying to find areas where there is overlap or where experts in one group might contribute to others. I’m particularly pleased and grateful that Kim McGrail has been helping the Pandemic Response Working Group shape their work on the governance of data for the pandemic response into something concrete and manageable.

Other work

I had three meetings where I was focused on giving advice around other people’s work:

  • a mentoring chat arranged through Digital Candle - free digital advice for charities - where I helped someone think through what level of structured data collection was really necessary for their goals, and how to go about doing a user needs assessment and building or buying a solution - it felt a little outside my regular expertise but it seems I knew enough to help, which was very satisfying. If you have any digital/data knowledge, you should join Digital Candle if you’re not already on it

  • a presentation to (some of) the Creative Commons team - I was asked to chat to them about anything by their CEO Catherine Stihler, so obviously I talked about data institutions (and their relationship to cultural institutions) but ended up chatting about the wonders of TikTok as a creative commons (yay Wellerman and the Ratatouille Musical) and the challenges of being organisations like CC and ODI, particularly in balancing being opinionated and constructive

  • a meeting of the GOV.UK advisory board, where I advised on some of their plans, which I can’t talk about but no doubt they will soon

I also attended GovCamp, and it was wonderful to have serendipitous and stimulating conversations with old and new friends. I would not be where or who I am today without GovCamp and the GovCamp tribe, so I was really pleased it ran again, though sad I could only attend on Saturday and missed what I gather were excellent sessions due to work clashes. Without going into too much detail, the sessions I went to were:

  • Simon’s session on data collaboratives and ways of supporting self-sovereign identity, particularly with the focus on the work being done on creating a digital service around lasting powers of attorney - really interesting work being done here and a great real example where digital identity is important and difficult

  • Sam and Alex’s session on working with organisations outside government when you’re creating services inside government, particularly those already providing digital and data infrastructure

  • my session on the design of data services - the features they need, things to do while designing them, the risks they raise and so on - I was indulging my inner nerd with this, drawing on my now ancient legislation.gov.uk experience - I hope to write it up in my off hours

  • John’s session on the rule of law, the difference between law and guidance and the need for digital services to refer to the legislation (and case law) that underpins them

  • Gavin’s session on how you would create an index assessing public sector / departmental data maturity/activities

Do take a look at the notes; they were great discussions and it was lovely to see people.

In a similar vein, I couldn’t really afford the time but as an intellectual treat to myself, I applied to join the Data and Society workshop on Trust and Doubt in Public-Sector Data Infrastructures at the end of March. But my application was rejected, which made me sad.

Thoughts I had

A random collection of things that have been in my mind, usually as a result of things I’ve read or the conversations above:

  • while data institutions steward and act as intermediaries in data flows, they can also act as useful monitors of those flows, which can reveal additional information - such as which datasets are most useful, what kinds of things people want to do with them, and how successful those uses are - that can then help to direct policy and investment; this should be factored into their design

  • with great help from Ed Parkes, we’ve been looking at the kinds of support that early stage data institutions need and where it’s useful for that to come from ODI (or organisations like ODI); it feels to me as if there’s a stage in the common transition from project to stand-alone institution where some operational support is useful, but it’s more like incubation (the focus being on enabling them to stand on their own two feet) than long term hosting

  • a few conversations have been interesting around data literacy - we published a blog about where we see the gaps at ODI - there’s a difference between the data capabilities that companies like DeepMind need and those required by more run-of-the-mill businesses and public sector / civil society organisations

  • it’s so easy to lose sight of the fact that what feels like accepted wisdom in data governance circles (eg about the inappropriateness of data ownership as a concept wrt personal data, or the limitations of consent as a mechanism for data governance) is new and revolutionary outside those circles and still needs explaining

  • in thinking through how we want to develop some of the tools we’ve been working on in our R&D projects, it’s been helpful to recognise that there needs to be a stage of experimenting and working out propositions around those tools before productising them

  • I learned from the team behind the Internet & Jurisdiction Policy Network framing study on data sovereignty a useful way of breaking down the concerns that lead countries to introduce data localisation (or data sovereignty) policies and laws, namely concerns about their security, about economic impacts, and about their citizens’ human rights


Work life

As is traditional, I tried to start some new habits in January in my work life, which have mostly been successful:

  • email: I inbox zero’d myself by marking everything as read at the beginning of the year; now I have two modes of email interaction: triage where I scan through and label everything I need to actually action; and action where I go through that list and respond as appropriate. I have mostly been able to stick to this (but sometimes cheat and respond when I should be triaging) and have started to set aside chunks of time specifically for actioning emails, because they really build up! I am currently on zero unread emails, but about five unactioned emails.

  • work pattern #1: I realised that I really like to get down to some proper work at the beginning of the day, so I have set aside 8-9am each day for heads-down work (not looking at emails), and I decide what I’m going to do in that slot ahead of time so I’m not dilly-dallying. This has been working great for me - it gives me some guaranteed daily time for writing, reviewing and thinking, and means I go into the rest of the day (which is often full of meetings) feeling I’ve achieved something. But it means an early start, and given that my day typically lasts until 6pm with only 30 minutes for lunch, my work days are very long (though being strict about stopping work at 6pm means they’re not as long as they could be)

  • work pattern #2: I have a lot of meetings, and as an introvert (and normal human being; I think we are all suffering from Zoom fatigue) I need time to recharge from them. The worst are days with multiple 30 minute back-to-back meetings (last Thursday my 12 noon meeting was the 6th meeting of the day). I’m trying to purposefully book heads down time into my calendar to break up the day and remind myself (and others) not to book meetings over them.

  • work pattern #3: I’m using a physical bullet journal for notes again, having tried using Google Docs and index cards during last year. I have a layout for each day that includes one priority task and a timeline for the day that I create at the end of the previous day, and four areas of reflection that I complete at the end of the day: what I achieved, what I learned, what I experienced and what I’m grateful for. These have been helpful in putting together this month note, but they also give just a little moment of reflection and recognition, which I think boosts my mood and helps me let go of things at the end of the day.

  • exercise: I did not exercise enough last year (which makes me realise how the daily grind of commuting, racing across London for different meetings, and walk-and-talks gave me a reasonably good activity baseline), especially after the kids returned to school and the weather turned nastier, so I started skipping morning walks. I now have a standing desk and wobble board, which I try to use for brief meetings (it’s a bit too tiring being on my feet all through longer ones). I’ve been trying to go on a one-hour walk every day with my 15yo, and managed that for the first couple of weeks, but the combination of their school day and my work commitments has made it hard in the last couple of weeks. I’ve now booked a mid-afternoon walk into my calendar, but I don’t think I’m going to be able to manage it every day.

The one thing that I’m not getting/making time for is proper reading (of things other than ODI reports etc that I’m reviewing, or scans through blog posts and articles). During February, I want to try to introduce a reading slot each week.

I was approached this month about another CEO role, which prompted a bit of career reflection with the conclusion that I’ve really got it good right now: work that interests me, a team I love working with, doing things aligned with my values and the impact I want to have in the world. I was really quite miserable when I was CEO of ODI, and while it’s hard to unpick how much of that was to do with the context and how much to do with the role, I (a) still feel in need of some recovery time to build up the energy and confidence to make another (hopefully more successful) attempt at being a CEO anywhere and (b) would need something pretty special to attract me away from ODI.

Home life

At home, we had a bit of stress at the beginning of January through uncertainty about whether the kids would be learning at home or in school. When it was clear they were going to be at home, my 17yo, who’s in sixth form now, was worried because she feared it would be like the first lockdown, when they just received a bunch of worksheets to do at the beginning of each day and got hardly any interaction with their teachers or friends. On the other hand, my 15yo was delighted at this prospect as they really like to learn at their own (rapid) pace.

But the school have decided to do things differently this time, and are running fully virtual full school days: Microsoft Teams meetings from registration at 8:40 to the end of day at 15:20, with 10 minute breaks between six 45 minute lessons and 30 minutes for lunch. 17yo (who also gets free periods so doesn’t have quite so intense a schedule) was delighted. 15yo was in despair, so anxious at the prospect of having their camera on during lessons that they couldn’t sleep and were having panic attacks in the middle of the night. Fortunately the school have been understanding, letting them attend without the camera on. They seem to be coping well and are particularly excited about the prospect of extra Further Maths lessons starting up again. What a nerd.

We are extremely lucky that the kids are old enough to mostly look after themselves - the biggest disruptions are that I’ve had to set up a desk in the corner of the living room so the 17yo can work from the dining room, and the occasional interruption to ask for help scanning something. I cannot imagine how difficult the lockdown must be for those with younger children.

News came through today that both my parents have now had their first dose vaccinations, as have Bill’s. There’s light at the end of the tunnel.

TV I’ve been watching:

  • How to Get Away with Murder - so insanely convoluted that we shout “plot twist!” at the end of every episode, but I love seeing the combination of competence and vulnerability in Viola Davis’ portrayal of Annalise Keating
  • The Expanse - love Amos
  • WandaVision - 1st episode hmm, 2nd episode huh, 3rd episode ok…, 4th episode oohhhh
  • Criminal Minds - completed series 14 with 15yo
  • Law & Order - needed a replacement for Criminal Minds, just started from the very first episode; “this really frosts my cookies” has entered into our lexicon

We also continued our weekly remote film nights with my family (we have had these since the start of the pandemic), and I enjoyed plenty of TikToks this month.

Video games I’ve been playing or watching others play:

  • Wandersong - my 17yo enjoyed playing this and it was a really lovely story
  • Wilmot’s Warehouse - great fun coop with 17yo
  • Eastshade - really enjoyed this, and loved how this made me think about light and framing
  • Morkredd - starts off as a fairly standard cooperative puzzle game and slowly descends into something amusingly disturbing
  • Yes, Your Grace - haven’t finished this yet, but I’m enjoying the mix of story and resource management, and the feeling of real choices

Non-video games I’ve been playing:

  • I bought Bill a subscription to Boxed Locks - escape rooms in a box - for his birthday just before Xmas; we skipped some clues in this one by decoding something before being given the key to do so, but we enjoyed it

  • Leigh’s started running Masks for me and a few others, and we had a hilarious first session talking through our characters (I’m playing a Transformed called Myco, who is entirely made of mycelium, and interfaces with plants and electronics through her hyphae) and working out how we came together as a team

Mental health

I did have a bit of a blip in the middle of the month when I got upset about some continuing (outside of work) interactions that land with me like trolling/stalking/harassment but probably aren’t intended as such and probably no one else would see as such. I chatted with my 17yo about it and she was very wise, kind and validating, urging me to see that it’s ok for me to feel how I feel regardless of how it’s meant or what others might perceive, and consider the options about what to do about it. This is one reason I prefer to spend time on TikTok than other social media platforms at the moment.

More happily, I won’t go too far into it, but my 15yo self-diagnosed as autistic at the end of 2020, and it has honestly been brilliant over the last month, seeing them be happier in themselves, helping them to explore how their autism manifests, and recognising what they need to thrive. We’ve sometimes chatted on our walks (and sometimes not), more often through Signal messages, and have a nice evening routine involving stilton, procedural crime dramas and Bananagrams. I’ve learned to not get concerned about their stimming or worried about their need to be alone a lot of the time. I spent a lot of last year being really worried about their mental health (and feeling like a rubbish mum for not being able to reach them) so this has been a big change for the better. I’m really proud of them and glad that we live in a world where there’s increasing recognition, acceptance and support of neurodiversity.

Individual, collective and community interests in data

Dec 27, 2020

At the start of 2020, I wrote about community consent. I detailed some of the ways in which individual consent is failing, and why there’s a need for a more community-oriented approach to consent around the processing of data about people.

In the conclusion, I wrote:

Currently the media narrative about the use of personal data is almost entirely negative - to the extent that Doctor Who bad guys are big tech monopolists. Sectors, like health, where progress such as new treatments or better diagnosis often requires the use of personal data, are negatively impacted by this narrative, and can’t afford the widespread information campaigns that would shift that dial.

2020 wasn’t exactly the information campaign for community interests in data about people that I had in mind, but it has turned out to be one: symptom tracking; testing, case, hospitalisation and death rates; research use of patient data; and mobility reports have become hugely significant as we navigate the Covid-19 pandemic. If nothing else, it has given us a set of examples to use when we talk about public benefit uses of data that everyone can relate to.

Community-level concerns are starting to be reflected at a policy level. For example, in July, an expert committee published a draft Non-Personal Data Governance Framework for India. Their definition of “non-personal” data includes anything that isn’t personally identifiable information (PII), which includes anonymised (de-identified) data about people. Their recommendations include defining data trustees to exercise community rights in this data.

The proposed European Data Governance Act is also significant in that it defines “data-altruism organisations” whose purpose is to make (usually personal) data available at scale for public good. The central idea is that people will donate data to these not-for-profits, who can then share it for research and other general interest purposes.

Population-level data

Salomé Viljoen’s paper Democratic Data: A Relational Theory For Data Governance really clarified for me why community interests are so important in an age of big data and machine learning. It tells a story of how existing data protection rights are built around the vertical relationship between ourselves and a data controller or processor. These made sense in a 1970s context, when an organisation (whether a bank deciding whether to grant you a loan or a government working out what tax you owe or benefits to give you) would primarily base their decision on what it knew about you and you alone.

[Diagram: Simple relationship between a person and an organisation - personal data goes from the person to the organisation; a decision by the organisation affects the person.]

But nowadays, organisations make a lot more decisions using data, including much more trivial ones like what film they recommend you watch or what news and ads they show you. And these decisions are made using data about lots of other people as well as you: population-level data. As Salomé describes it, this sets up a horizontal relationship between a person affected by a decision (let’s call them victims) and the people whose data has also contributed to the results of that decision (let’s call them donors).

[Diagram: Complex relationship between people providing data to an organisation - personal data goes from a donor into a database, creating population-level data that feeds into the organisation alongside personal data from a victim; there is a horizontal relationship between the donor and the victim.]

Economists have also started to recognise these effects. For example, Dirk Bergemann, Alessandro Bonatti and Tan Gan term this “social data” in their paper The Economics of Social Data. (I don’t particularly like “social data” as a term because it’s also frequently used to mean specifically data from social media usage, when these horizontal relationships can occur with data from other sources, such as financial or health information.) In economic terms, the use of data about me to make decisions about you is an externality: a cost or benefit imposed on a third party (you) without their (your) agreement.

The big drawback of individualistic models of governance over data is that the interests of donors and victims in the above diagram are not necessarily aligned, which means that individual donor decisions can adversely affect the victims and they have no say in the matter. In Salomé’s paper, she gives a couple of examples: a community maintained database of tattoos used to infer gang membership of someone who has never contributed to that database; and a water usage database set up to benefit poorer households suffering from water shortages, that richer households might refuse to contribute to because they won’t benefit from it. (In this latter case, the term “victim” is misleading, as they would benefit from data use rather than being damaged by it.)

It’s worth noting that in a lot of cases, personal data from victims is used both as input for a specific decision and stored within a database for use in future decision making about both them and other people. The two roles – donor and victim – get wrapped up with each other. But this is not always the case, and I think separating them clarifies what’s going on and the way suggested interventions might work.

Individual interests

Seen like this, there is a clear argument for data governance at a population level, to manage how population-level data is used to make decisions about victims. But there are still individual interests at play too.

First, while an organisation might make a decision based on data about lots of other people as well, most decisions about you will still involve it using some information about you, to personalise their results. For example, Netflix might recommend films to you that lots of other people like, but it will also use your viewing habits to work out which people have similar tastes to yours, and favour their preferences in the recommendations it makes. A government might use data about other people’s job seeking experiences to work out how much support to give you, but it will still factor in your characteristics, such as your profession and how long you’ve been out of work. Your rights over the data about you that is being used to personalise the outcome of this data processing are still significant and important.

Second, you will have an interest in and rights over data even if it’s only used to make decisions about other people. You are affected by what and how data is collected (the chilling effect of surveillance). You are affected if there is a data breach. You might also feel responsible for and a moral obligation around the downstream uses of that data (leading to both data altruism and an urge to delete data so that it can’t be used in damaging ways). You might feel exploited if an organisation profits excessively from its use, leading to a call for profit-sharing or monetisation of data.

Considering these different sets of individual interests and rights separately might be useful when thinking about the shape of data protection regulations for personal data.

Community and collective interests

Community interests kick in whenever a community (whether a small group or a whole society) is affected by the use of population-level data.

Consideration of community interests is not new, and they are recognised in existing legal frameworks such as GDPR, at least when it comes to data held by the public sector. Under GDPR, personal data processing is lawful when it is necessary to fulfil a public task: a task in the public interest, as specified in law. The fact that these tasks have to be defined in law provides democratic data governance over population-level data, in Salomé’s terms. Good examples are the census and data collected to monitor public health.

As Anouk Ruhaak described in her piece on collective consent earlier in the year, there are also a growing number of non-public-sector organisations that are specifically set up to steward and share population-level data. The data-altruism organisations of the EC’s Data Governance Act will be among them. At ODI, we would frame these as data institutions that steward personal data.

There are important decisions to make about how to govern these data institutions.

First, just as there are two sets of individual interests at play, as described above, there are also two sets of collective interests: those of the donors and those of the victims. These will not necessarily align with each other. For example, victims may want sufficiently detailed data to be collected to make decisions more accurate; donors may want to minimise the amount of data that is collected about them, to reduce intrusive surveillance.

Second, there are different potential relationships between the set of donors represented in a database and the set of victims who are affected by a particular use of that data:

  1. Donors and victims could be the same set of people, eg the use of data about Netflix customers by Netflix to customise its recommendations.

  2. Donors could be a subset of victims, eg a biobank containing genetic data about a sample of the population, that is then used to draw conclusions about the wider population.

  3. Donors could be a superset of victims, eg where population-level public health data is used specifically to identify and target vulnerable people with help and support.

  4. Donors could overlap with victims, eg if the mobility patterns of some users of Google Maps – perhaps those people that use public transport – are used to draw conclusions about everyone who uses public transport.

  5. Donors and victims could be different sets, eg if a facial recognition application trained on faces from one country is used to guess the emotions of people from another country.
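These five relationships are just the possible set relations between the two groups. A minimal Python sketch, using entirely hypothetical people, makes them concrete:

```python
# Hypothetical sets of people, illustrating the five donor/victim relations.
donors = {"ada", "ben", "cai", "dee"}

# 1. Same set: every donor is a victim and vice versa (eg Netflix customers).
victims_same = set(donors)
assert donors == victims_same

# 2. Donors are a subset of victims (eg a biobank sample vs the wider population).
victims_super = donors | {"eve", "fred"}
assert donors < victims_super

# 3. Donors are a superset of victims (eg targeting a vulnerable subgroup).
victims_sub = {"ada", "ben"}
assert victims_sub < donors

# 4. Overlapping sets: some donors are victims, some are not, and vice versa.
victims_overlap = {"cai", "dee", "eve"}
assert donors & victims_overlap and donors - victims_overlap and victims_overlap - donors

# 5. Disjoint sets (eg a model trained on one population, applied to another).
victims_disjoint = {"gus", "hal"}
assert not donors & victims_disjoint
```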

Third, within each set of donors and victims there are likely to be divergent interests. It is in the nature of collective decisions that they might not be in the interests of all the individuals belonging to that collective:

  • Donors will all naturally be concerned about the security of the data in the database. But they might have varying levels of concern about their obligations to contribute data over time (ie what data is collected about them, and how); the value they receive in exchange for donating data (whether monetary, services, or a feel-good factor); and the downstream uses of that data (different donors will find different causes more or less worthy). The mix of these concerns will change as new donors contribute data into the database.

  • The set of victims will vary based on what the data is being used for: the population impacted by a particular set of decisions. Different sets of victims, generated by different uses of the data, may have different interests. And within any given set of victims, there will likely be winners and losers – for example, if data is used to determine insurance premiums, some people will end up paying more and some less – leading to different individual interests.

Data institutions stewarding personal data have to decide which of these varying sets of interests are going to take priority. For example:

  • A personal data store platform might choose to ignore the interests of anyone but individual donors, giving each donor the choice to opt in or out of particular uses, with each application therefore getting access to only a subset of a wider database. Victims have no control under this regime, although the information provided to donors to help them decide what to do could include information about victim benefits and protections.

  • A data union might be set up such that donors are a subset of victims – for example those Uber drivers who choose to donate their ride data into a joint dataset – with donors voting on uses of that data and assumed to represent the wider interests of all Uber drivers (the victims). Sometimes these votes will result in outcomes that some donors are not satisfied with. The donors might not in fact be representative of the larger set of Uber drivers, and so make decisions about the use of that data that favour them over others.

  • A biobank might be set up to have some decisions – such as the data collection process and the charges made for access to data – being made by donor representatives, while other decisions – such as who gets access to the data and for what purposes – are made by patient and public (victim) representatives. Both donor and patient/public representatives will need to exercise judgement about how they mirror the interests of the sets of people they are supposed to represent.

  • Most censuses are in a situation where the donors and victims largely overlap (they’re not quite identical: as time goes on, some donors die and are no longer victims, while babies are born and decisions are made about them without their having contributed data, so there are victims who were never donors). The interests of both – in particular, weighing the intrusiveness and depth of the questions asked against the utility of the data – are balanced through democracy and consultation, and there will always be groups who are unhappy about the outcome.

Being explicit about whose interests are considered and prioritised through data governance is essential to avoid surprises. Promising both full individual control and public interest outcomes is likely misleading.

Final thoughts

A few random other thoughts that arise from this way of breaking down interests in personal data.

First, I think that recognising that individuals and sets of donors and victims have different, and sometimes conflicting, interests has implications for fiduciaries, such as the trustee of a Data Trust. As I understand it, a fiduciary is bound to act in the best interests of the beneficiaries that they represent. In many ways, they have less flexibility than we do as individuals to sometimes choose to sublimate our selfish interests in favour of others. So any fiduciary arrangement will have to be particularly specific about who is supposed to benefit. It would be useful, when discussing the benefits and drawbacks of fiduciary arrangements, to be explicit about the target beneficiaries as I believe different choices have different consequences.

Second, I wrote above that promising full individual control and public interest outcomes is likely misleading, but there are many fields – public health medicine and opinion polling, for example – where self-selected samples are sufficient and commonly used for drawing broader conclusions about a population. To make this work, you need to have a large enough sample; meet some minimum level of representativeness across important characteristics; collect enough demographic information to be able to weight and adjust data from different subgroups; and ensure anyone using the data has the statistical skills to do this adjustment and avoid faulty conclusions. (You also have to assume that the willingness to contribute data is not itself a correlate of the outcome you’re trying to understand or predict.) This implies that individual control and public interest outcomes can co-exist if you collect sufficiently detailed and sensitive demographic information (age, gender, ethnicity, location etc) to assess representativeness and do the necessary weighting. The characteristics that matter will vary by domain, as will the ease of collecting them. For example, demographic data is collected quite naturally in health, but is less likely to be gathered – though still important – when looking at energy usage.
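The weighting step described here can be sketched with a toy example: if one group is over-represented among donors relative to the wider population, its observations are down-weighted before estimating a population figure. (All the numbers and categories below are invented for illustration; this is textbook post-stratification, not a production method.)

```python
from collections import Counter

# Toy post-stratification: reweight a self-selected sample so its age-group
# mix matches the known population mix before estimating a mean outcome.

# Hypothetical donated data points: (age_group, outcome_value).
# Young people are over-represented: 3 of 4 records, vs 50% of the population.
sample = [
    ("young", 10), ("young", 12), ("young", 11),
    ("old", 20),
]

# Known population shares for each group (eg from a census).
population_share = {"young": 0.5, "old": 0.5}

# Share of each group within the sample itself.
counts = Counter(group for group, _ in sample)
sample_share = {g: c / len(sample) for g, c in counts.items()}

# Weight each observation by population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in counts}

naive_mean = sum(v for _, v in sample) / len(sample)
weighted_mean = sum(weights[g] * v for g, v in sample) / sum(
    weights[g] for g, _ in sample
)

# The naive mean (13.25) is pulled towards the over-represented young group;
# the weighted mean (15.5) matches a true 50/50 population mix.
```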

Third, in a fuller picture of rights and interests, there needs to be consideration of the interests of the organisations stewarding and using population-level personal data. There are costs involved in collecting, maintaining, securing and sharing data; there is income to be made from creating services and from automating decisions. These organisational interests aren’t necessarily aligned with the interests of donors or victims. I’ve focused on the interests of people, communities and society here, but it is necessary to think about organisational/private interests as well as public ones, and how they are balanced.

Finally, preferences around data governance arrangements are personal and political. I know I have a tendency to prefer data governance that favours the victims rather than the donors, and collective control over individual control. So, for example, I would look at the use of population-level data with society-level effects, such as the micro-targeting of political advertising, and believe we need society-level governance (eg regulation via democratic government). I know others weight individual control and choice much more highly than I tend to. This is one of the reasons why the design process for the data governance of a particular data institution (not just its operation) must itself be participative.

In summary, there seems to be growing recognition that our individualistic data rights frameworks are not on their own sufficient for dealing with population-level data, but there are still a lot of choices about what the alternatives look like. Thinking about patterns of data governance in terms of whose interests they promote – individuals or collectives, donors or victims – should help us to understand the consequences of those choices.

Missing data

Dec 22, 2020

I learned a lot this year about the impact of missing data or information about data from governments in a fraught environment.

Sometimes governments worry about making data or information public because they have concerns about the impact of doing so. Publishing data and information has a cost: the technical costs may be low, but the people costs frequently aren’t if you want to do it well. Governments might worry about the quality of data or the level of uncertainty around some figures. They might feel that the public or journalists will not be able to understand or communicate the reasons behind the numbers. Or the data might tell a story that they are themselves concerned about, and they know there will be follow-up questions that they’ll have to handle.

But when governments make those decisions, they frequently fail to consider the impact of not making that data or information available, or doing so after a delay or in response to a fuss. They tend to think that doing nothing is cost free. It isn’t.

How not sharing data lands

People take the lack of data and information about data as a signal. The interpretation of that signal varies depending on the person and context. Here are some examples of the assumptions people will make.

The incompetence assumption

Some people will assume governments aren’t sharing data because they aren’t collecting it, because they are incompetent. Data is essential to keep a handle on an evolving environment, so people interpret not sharing data as a sign governments don’t have basic facilities in place to handle the situation. This can apply to whole datasets (eg case data, vaccination data) or when there is a lack of granularity within data (eg to low level geographies, or split by significant characteristics such as gender, age or ethnicity).

Concern that governments don’t have their hand on the tiller leads people to distrust their judgement and to route around their direction and guidance. It may also entail firefighting comms from government to reassure people that they do have the situation under control.

Missing data is only one indicator of incompetence, of course, so publishing data is unlikely to entirely fix this problem. But minimising the number of incompetence indicators you’re displaying as a government is a good idea.

The prejudice assumption

When the data involved is most relevant to a particular community, particularly one that is historically neglected or maltreated, some people will assume governments aren’t sharing data because the needs and concerns of that community are not of interest to them, because they don’t care about that community due to prejudice.

People might assume governments aren’t sharing data because they’re not collecting it, which implies they don’t want to see what’s happening to that community. Or they might assume it’s because governments don’t think it’s worth sharing data with that community, because they do not think they are competent to be able to understand or use it. Both imply prejudice.

During the Covid-19 crisis in the UK we have seen this in relation to data about coronavirus in care homes (lack of care about old people), data about the impact on people of colour (racism), and data at local levels (London-centrism).

Concern that governments are prejudiced leads to people from affected communities distrusting their intentions in other areas. This can widen existing inequalities through differential public health and economic impacts when, for example, fewer people from those communities use an official app or they choose not to take up official economic or health support such as screening tests or vaccinations. Again, governments need to shoulder the cost of firefighting comms to counter these concerns when they hit the press.

And again, missing data is only one indicator of prejudice. That doesn’t mean governments shouldn’t try to address it.

The obfuscation assumption

Some people will assume that the reason data or information about data is missing is that there is something nefarious going on that governments are trying to hide, or that the data reflects poorly on them. We saw this in the UK when there were moves to stop publishing data daily early in the pandemic, and in reactions to the NHS’s relationship with Palantir and other big tech firms.

The concern that governments are hiding something leads people to assume the worst. This might be that the figures are really awful, or that they are sharing personal data with firms with poor reputations. Lack of data and information about data creates a Petri dish for misinformation and conspiracy theories, again increasing general distrust, reducing take up of official support, and leading to bad health outcomes and firefighting comms.

Maybe governments are actually doing something they feel ashamed of. It’s unlikely to be worse than the stories that people imagine in a fact vacuum, and governments would be able to manage the consequences a lot better with proactive, controlled comms.

How to manage not being able to share data

As I’ve described, people take the lack of sharing of data or information about data as a signal. So what should governments do if they can’t share data because they genuinely don’t have it, because it’s too difficult to publish, or because they have to prioritise other tasks?

If it’s not due to incompetence, prejudice, or a cover up, then governments need to communicate the real reason, rapidly and proactively. Ideally this should include a description of what they are doing about it. For example, how are they working to collect missing data? When do they think they will be able to share the data or information? If they are currently deprioritising this work, what are they prioritising instead, and when will they review that decision?

Missing data limits how well we can understand the world, and how informed our decisions can be. But the fact that it’s missing is also problematic because people interpret it as a signal. Governments, and other organisations, should take this into account when assessing when and whether to publish data and information about data, and invest in good proactive comms.

What to do about pro-bono data service offers

May 19, 2020

During the Covid-19 pandemic, many tech companies are offering pro-bono data services to public health organisations, governments, and communities. Their offers include free access to software and to people to help with managing data, analysing it, and building predictive models.

I don’t think it’s particularly worthwhile speculating on the internal motivations for these companies. I choose to believe they genuinely want to support the global effort to help societies and economies through this crisis, and are providing that help in the way they know how. Others believe it’s all about capturing markets, gaining access to personal data for other (nefarious) purposes, or developing their own intellectual property and capability. Probably the truth is somewhere between these two.

But it doesn’t matter: any organisation considering accepting such help has to consider the same set of things regardless, to retain public trust, safeguard its own future operations, and manage the market effects it generates.

Before I dive into detail, just to say that any such arrangement needs to satisfy the basic hygiene factor of being a clear contractual relationship. Informal “partnerships” will leave you extremely exposed on many of the issues discussed here. And be aware that a pro-bono project is not free to you: even if you’re not paying the supplier, you will be putting time, effort and resources into the project. You have to consider the project – and whether the help you’re being offered is the best way of achieving its goals – in the round.

Retaining public trust

Public trust is important in the best of times, but it’s particularly important in situations where you want the public to pay attention to what you say and do what you tell them (eg stay home, install an app, report their symptoms accurately).

Most companies large enough to offer you pro-bono data services will have a bad reputation around their use of data. This reputation might have arisen from actual security breaches, enforcement action by data protection regulators, or previous dodgy deals. It might simply be that people are frightened because they know those companies already hold huge amounts of data about them and they feel powerless. The bad reputation might be even more diffuse: about whether the company pays fair taxes or treats its workers well.

The point is you are not starting from a neutral position with the public: you’re starting from one in which the motivations of the companies offering support will be immediately questioned and treated with suspicion. The fact the services are offered for free makes this worse, not better: to the public and press this is a red flag about a hidden motive which probably involves little guys getting screwed over.

In making your decision about taking up an offer, you have to factor in the fact that countering this trust deficit will take time and effort on your part. This is a cost to weigh against the benefits of the services offered. The only way to counter the trust deficit, and protect your own reputation and the trust the public has in you as an institution, is clear, proactive, transparent communication and effective, representative, accountable governance – and even doing these isn’t guaranteed to work. You need an excellent comms team, proactively communicating about every aspect of the project. You need to draft in trusted external experts to oversee the work, and demonstrably listen and respond to their recommendations. All of this takes substantial effort and time. Don’t overlook or underestimate it.

You should operate under the assumption that every aspect of the deal you make will come to the surface eventually. The more you hide, and the more it feels like people have to fight to get hold of information about what you’re doing, the more time you will spend firefighting as they dig up dirt, and the more trust you will lose in the long term. Make sure you are completely satisfied that you can be open with the public about everything you’re doing. If you’re not comfortable doing that, it probably indicates that there’s an ethical problem somewhere in the deal that you need to resolve.

Deeper than these aspects of reputation management, though, is the genuine issue of the exploitation of data about the public, and of the efficacy and trustworthiness of the service being provided. Things to question here include:

Is this service genuinely useful? Is it something you would procure off your own bat, because you really need it? Or are you being offered a tech solutionist stab in the dark that might come with huge opportunity costs?

Are there any security issues, outside the control of the data services company, that you need to be aware of? For example, can you ensure data is stored in a way that provides protection against intelligence service intrusion?

How are the biases and discriminatory issues in the data service being handled? What other processes or practices will need to be in place to counter those biases and ensure that you’re not excluding people by relying on this solution?

How have you secured permission or authority for this arrangement? Is it within your standard organisational policies, good practices, or lawful constitution? If some of those constraints have been waived due to the crisis, do you still have a mechanism for consulting with affected people and communities (eg patient or public representatives) about the project?

Who will ultimately benefit from the data service and how? It is inequitable and unjust for data about one group of people to be used to build tools that are then used to benefit another group of people. This includes you as an organisation getting benefits from data that don’t somehow return to the community. How can you ensure that the people and communities the data is about also benefit from the intelligence and services that are built over that data?

Answering these questions should help you identify additional things you need to make clear in the contract (eg where data will be stored) and do as part of the project (eg provide access back to communities).

Ensuring sustainability

You need to think hard about what happens when the arrangement ends. If you become reliant on a data service that’s being offered to you for free now, you may be committing yourself to future costs. If you don’t construct the contract well, you might be left in the situation where your friendly pro-bono supplier suddenly has you over a barrel and can charge what they want for the service you now can’t do without. Don’t kid yourself that everything will disappear when the crisis lifts. Assume that it will stay in place, and create contractual protections around that assumption.

Make sure the contract contains provisions that mean you retain as much intellectual property (IP) as possible. You should get ownership of as much of the code, data and models that get created during the project as you can. Making that IP as open as possible (ie open source, open data or at least open codebooks and schemas, and open models and algorithms) and ensuring everything is publicly documented will help alternative suppliers to be able to understand the system before you even start tendering for it.

One area where data service suppliers may want to retain IP is in any AI or automated services they build. I think this is reasonable: you want alternative suppliers to offer better services, not necessarily exact replicas. Just make sure that you retain enough rights over things like training data such that an alternative supplier will be able to create their own equivalent or improved solutions. (And remember what I said above about thinking through who gets to benefit from what’s being built.)

The contract should also ensure that, at the end of the contract period, an alternative supplier (which could be you, if you take it in house) will have sufficient time and access to existing systems to be able to take over the service. Make allowances for a transition period during which the pro-bono supplier continues to run the service while the alternate supplier builds their solution. Include in the terms that the supplier needs to capture and supply any updates to the data they were originally furnished with. Include public documentation for the logic behind algorithms too, where that’s important.

In other words think deeply about your exit strategy before entering into the deal. Protect your future self.

Building the market

Any supplier who is offering pro-bono support is likely to already be in a good market position. Entering into an arrangement with them might well entrench that position. You’re giving them a reputational boost as well as building their internal capacity and possibly product set. Accepting an offer of pro-bono help from one company without having assessed their offer against those of other suppliers is not fair or open procurement.

However, in an emergency you might feel you don’t have time for a fair and open procurement process: you’re just choosing whether to take up this free offer or not. So it’s worth thinking of some ways to counter the market impact of that decision, and the costs of doing so, as you’re shaping the project and weighing up the deal.

Fortunately, if you’re doing the things described above you have a good foundation in place. To preserve trust, you’ve already dialled the transparency up to max, so other potential suppliers (which might range from smaller data companies through academics to civil society groups) know what you’re up to. You’ve already open sourced code, opened up the models and algorithms underpinning any solution, and made as much data as possible (or at least its descriptors) open for others. Now you need to actively encourage and enable other people to create alternatives to (the fun/innovative/interesting/capacity-building parts of) what your pro-bono supplier is giving you.

Here, I’d suggest employing open innovation techniques and, more specifically, putting out a data challenge. Describe what you’re doing and invite others to show you their best ideas and implementations. Provide a prize (perhaps from the money you’re saving because of that lovely pro-bono help) to give some motivation; or link up with a research council or philanthropic funder to provide supporting grants; or just rely on the fact that curious hackers love to challenge themselves to find better ways to do things with technology, particularly if that involves saving the world.

If developing the kinds of data services you’re getting for free requires access to personal data, create synthetic datasets that mirror the important characteristics of those datasets without containing any real information about real people. Make those available to the challenge participants.
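As a sketch of what such a synthetic dataset might involve: one simple (and deliberately naive) approach is to sample each field independently from the marginal distributions of the real data, so synthetic records preserve per-field frequencies without reproducing any real person’s record. All names, fields and values below are hypothetical; real synthetic-data generation also needs to preserve correlations between fields and guard against re-identification.

```python
import random

# Hypothetical "real" personal records (all values invented).
real = [
    {"age_band": "18-30", "region": "north", "uses_transit": True},
    {"age_band": "18-30", "region": "south", "uses_transit": False},
    {"age_band": "31-50", "region": "north", "uses_transit": True},
    {"age_band": "51+",   "region": "south", "uses_transit": False},
]

def synthesise(records, n, seed=0):
    """Sample each field independently from its marginal distribution.

    This preserves per-field value frequencies but deliberately breaks the
    links between fields, so no synthetic record corresponds to a real person.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    fields = list(records[0])
    columns = {f: [r[f] for r in records] for f in fields}
    return [{f: rng.choice(columns[f]) for f in fields} for _ in range(n)]

synthetic = synthesise(real, n=100)
```

A challenge participant can then develop and test their approach against `synthetic` without ever touching the underlying personal data.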

Showcase the best solutions. Build their reputation. Count it as a success if those solutions get incorporated into the offer of the competitors to your pro-bono supplier. Remember you’re building your future market through this process, as well as everyone else’s.

In summary, an offer of free help is never actually free, but it is possible to construct a project such that everyone gets to benefit from it. If it’s still worth going ahead with those additional costs taken into account, knock yourself out.

Community consent

Jan 17, 2020

If we accept that individual consent for handling personal data is not working, what are the alternatives?

I’m writing this because it’s an area where I think there is some emerging consensus, in my particular data bubble, but also areas of disagreement. This post contains exploratory thinking: I’d really welcome thoughts and pointers to work and thinking I’ve missed.

The particular impetus for writing this is some of the recent work being done by UK regulators around adtech. First, there’s the Information Commissioner’s Office (ICO) work on how personal data is used in real-time bidding for ad space, which has found that many adtech intermediaries are relying on “legitimate interests” as the legal basis for processing personal data, even when this isn’t appropriate. Second, there’s the Competition and Markets Authority (CMA) interim report on the role of Google and Facebook in the adtech market, which includes examining the role of personal data and how current regulation affects the market.

But there’s other related work going on. In health, for example, there are long running debates on how to gain consent to use data in ways to improve healthcare (by Understanding Patient Data, medConfidential, etc). In transport or “Smart Cities” more generally, we see local government wanting to use data about people’s journeys for urban planning.

Personal data is used both for personalising services for individuals and (when aggregated into datasets that contain personal data about lots of people) understanding populations. The things that data is used to do or inform can be both beneficial and harmful for individuals, for societies and for particular groups and communities. And regardless of what it is used for, the collection of data can in itself be oppressive, and its sharing or sale inequitable.

The big question of this data age is how to control and govern this collection, use and sharing of personal data.

Personal data use entails choices

Going back to first principles, the point of collecting, using and sharing data is ultimately to inform decision making and action. There are always a range of options about what data to use for any analysis, with different data providing more or less accuracy or certainty when answering different questions. For example, the decision about what ads to serve within a story in an online newspaper could be based on any combination of the content of the story; knowledge about the general readership of the newspaper; or specific personal data about the reader looking at the page, their demographics and/or browser history.

There are also options when it comes to the architecture of the ecosystem supporting that analysis and use: how data is sourced and by whom, who performs which parts of the analysis and therefore what data gets shared with whom and in what ways.

Curious hackers like me will always think any data analysis is worth doing on its own terms - just because it’s interesting - and aim to use data that gives as accurate an answer as possible to the question posed. But there are damaging consequences whenever personal data is collected, shared and used: risks of data breaches, negative impacts on already excluded groups, the chilling effect of surveillance, and/or reduced competition, for example.

So behind any analysis there is a choice - an assessment that is a value judgement - about which combination of data to use, and how to source, collect and share it, to get value from it.

Our current data protection regime in the UK/Europe recognises some purposes for using data as being inherently worthwhile. For example, if someone needs to use or share personal data to comply with the law, there is an assumption that there is a democratic mandate for doing so (since the law was created through democratic processes). So for example, the vast data collection exercise that is the census has been deemed worthwhile democratically, is enshrined in law, and has a number of accountable governance structures in place to ensure it is done well and to mitigate the risks it entails.

In many other places, though, because people have different values, preferences and risk appetites, our current data protection regime has largely delegated making these value judgements to individuals. If someone consents to data about them being collected, used and shared in a particular way, then it’s deemed ok. And that’s a problem.

There are some very different attitudes to the role individuals should play in making assessments about the use of data about them. At one extreme, individual informed consent is seen as a gold standard, and anything done with personal data about someone that is done without their explicit consent is problematic. At the other extreme, people see basing what organisations can do on individual informed consent as fundamentally broken. There are a number of arguments for this:

  1. In practice, no matter how simple your T&Cs, well designed your cookie notices, or accessible your privacy controls, the vast majority of people agree to whatever they need to in order to access a service they want to access and never change the default privacy settings.
  2. Expecting people to spend time and effort on controlling their data shadows is placing an unfair and unnecessary burden on people’s lives, when that burden should be on organisations that use data to be responsible in how they handle it.
  3. Data value chains are so complex that no one can really anticipate how data might end up being used or what the consequences might be, so being properly informed when making those choices is unrealistic. Similarly, as with the climate crisis, it is unfair to hold individuals responsible for the systemic impacts of their actions.
  4. Data is never just about one person, so choices made by one individual can affect others that are linked to them (eg your friends and family on Facebook) or that share characteristics with them (we are not as unique as we like to think: data about other people like me gives you insights about me).
  5. Data about us and our communities is collected from our ambient environment (eg satellite imagery, CCTV cameras, Wifi signals, Streetview) in ways that it is impractical to provide individual consent for.
  6. The biases in who opts out of data collection and use aren’t well enough understood to compensate for them, which may undermine the validity of analyses of data where opting out is permitted and exacerbate the issue of solutions being designed for majority groups.

The applicability of these arguments varies from case to case. Some of the issues may be mitigated in part through good design: requesting consent in straightforward ways, providing easy-to-understand privacy controls, finding ways to recognise the consent of multiple parties. All of these are important, both to allow for individual differences and to give people a sense of agency.

But even when consent is sought and controls provided, organisations handling personal data are the ones deciding:

  • What they never do
  • What they only do if people explicitly opt in (defaulting to “never do”)
  • What they do unless people opt out (defaulting to “always do”)
  • What they always do

Realistically, even with all the information campaigns in the world, and even with excellent design around consent and control over data, even with personal data stores and bottom up data trusts, the vast majority of people will neither opt in nor opt out but stick with the defaults. So in practice, even where a bit of individual choice is granted, organisations are making and will continue to make those assessments about the collection, use and sharing of data I talked about earlier.

Individual consent is theatre. In practice it’s no better than the most flexible (and abused) of legal bases for processing data under GDPR - “legitimate interests” - which specifically allows an organisation to make an assessment that balances their own, a third party’s, or broader public interest against the privacy interests of data subjects. Organisations that rely on consent for data processing don’t even have to demonstrate they carry out the balancing tests required for legitimate interests.

I do not think it’s acceptable for organisations to make decisions about how they handle data without giving those affected a voice, and listening and responding to it. But how can we make sure that happens?

Beyond individual consent, mechanisms for ensuring organisations listen to the preferences of people they affect are few and far between. When organisations rely on legitimate interests as their legal basis for processing data, the ICO recommends they carry out and record the results of a Legitimate Interests Assessment (LIA). This is a set of questions that prompts an organisation to reflect on whether the processing has a legitimate purpose, whether personal data is necessary to fulfil that purpose, and how that purpose balances against individual rights.

But there is no requirement for LIAs to be contributed to, commented on, or even seen by anyone outside the organisation. The only time they become of interest to the ICO is when it carries out an investigation.

I believe that for meaningful accountability, organisations should be engaging with people affected by what they’re doing whenever they’re making assessments about how they handle personal data. And I think (because data is about multiple people, and its use can have systemic, community and society-wide effects) this shouldn’t just include the direct subjects of the data who might provide consent but everyone in the communities and groups who will be affected. Carrying out this community-level engagement is completely compatible with also providing consent and control mechanisms to individuals, where that’s possible.

There are lots of different approaches for engagement:

  • Publishing and seeking comment on LIAs or similar
  • Ethics boards that include representatives from affected groups, or where appropriate elected representatives (eg letting local councils or parliament decide)
  • Carrying out targeted research through surveys, interviews, focus groups or other user research methods
  • Holding citizen juries or assemblies where a representative sample of the affected population is led through structured discussion of the options

I note that the Nuffield Foundation has just put out a call for a review of the evidence on the effectiveness of public deliberation like this, so I'm aware I'm not being rigorous here. But it seems to me that in all cases participation needs to be, and to be seen as, legitimate: people in the affected communities who are not directly involved should feel their opinions will have been stated and heard. It's vital that organisations follow through in using the results of the engagement, so that it isn't just an engagement-washing exercise. It's also important that this engagement continues after the relevant data processing starts: that there are mechanisms for reviewing objections and changing the organisation's approach as technologies, norms and populations change.

The bottom line is that I would like to see organisations being required to provide evidence that their use of data is acceptable to those affected by it, and to important subgroups that may be differentially affected. For example, if we’re talking about data about the use of bikes, the collection and use of that data should be both acceptable to the community in which the monitoring is taking place, and to the subset who use bikes.

I would like to see higher expectations around the level of evidence required for larger organisations than for smaller ones, and higher expectations for those in the advantageous position that people cannot switch to alternative providers with different approaches to personal data. This includes both government services and big tech platform businesses.

Perhaps, at a very basic and crude level, to make this evidence easier to create and easier to analyse, there could be some standardisation around the collection of quantitative evidence. For example, there could be a standard approach to surveying, with respondents shown a description of how data is used and asked questions such as “how beneficial do you think this use of data is?”, “how comfortable would you feel if data about you was used like this?”, “on balance, do you think it’s ok to use data like this?”. There could be a standard way of transparently summarising the results of those surveys alongside the descriptions of processing, and perhaps statistics about the percentage of people who exercise any privacy controls the organisation offers.
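To make this concrete, here is a minimal sketch of what producing such a standardised summary might look like. The question wording is taken from the text above, but the Likert scale, field names and percentage breakdown are all my own assumptions, not part of any existing standard:

```python
from collections import Counter

# The three standard questions suggested above. Assumed here: each is
# answered on a hypothetical 1-5 Likert scale (1 = not at all, 5 = very much).
QUESTIONS = [
    "how beneficial do you think this use of data is?",
    "how comfortable would you feel if data about you was used like this?",
    "on balance, do you think it's ok to use data like this?",
]

def summarise(responses):
    """Summarise Likert responses per question as the percentage of
    respondents answering positively (4-5), neutrally (3) and
    negatively (1-2), rounded to one decimal place."""
    summary = {}
    for question in QUESTIONS:
        scores = [r[question] for r in responses if question in r]
        n = len(scores)
        counts = Counter(scores)
        summary[question] = {
            "n": n,
            "% positive": round(100 * (counts[4] + counts[5]) / n, 1),
            "% neutral": round(100 * counts[3] / n, 1),
            "% negative": round(100 * (counts[1] + counts[2]) / n, 1),
        }
    return summary
```

A summary in this shape could be published alongside the description of processing the respondents were shown, together with statistics such as the percentage of people exercising any privacy controls on offer.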

Now I don’t think the results of such surveys should be the entirety of the evidence organisations provide to demonstrate the legitimacy of their processing of personal data, not least because reasoning about acceptability is complex and contextual. But the publication of the results of standard surveys (which could work a bit like the reporting of gender pay gap data) would furnish consumer rights organisations, the media and privacy advocates with ammunition to lobby for better behaviour. It would enable regulators like the ICO to prioritise their efforts on examining those organisations with poor survey results. (Eventually there could be hard requirements around the results of such surveys, but we’d need some benchmarking first.)

Whether this is realistic or not, I believe we have to move forward from critiquing the role of individual consent to requiring broader engagement and consent from people who are affected by how organisations handle data.

Possible effects

Let’s say that some of these ideas were put into place. What effects might that have?

The first thing to observe is that people are deeply unhappy with how some organisations are using data about them and others. This is particularly true of the tracking and profiling carried out by big tech and adtech, which underpins surveillance capitalism. ODI’s research with the RSA, “Data About Us”, showed people were also concerned about more systemic impacts such as information bubbles. Traditional market drivers are not working because of network effects and the lack of competitive alternatives. If the organisations involved were held to account against current public opinion (rather than being able to say “but our users consented to the T&Cs”), they would have to change how they operated fairly substantially.

It’s clear that both ICO and the CMA are concerned about the impact on content publishers and existing adtech businesses if the adtech market were disrupted too much and are consequently cautious about taking action. It is certainly true that if adtech intermediaries couldn’t use personal data in the way they currently do, many would have to pivot fairly rapidly to using different forms of data to match advertisers to display space, such as the content and source of the ad, and the content of the page in which it would be displayed. Facebook’s business model would be challenged. Online advertisers might find advertising spend provides less bang for buck (though it’s not clear to me what impact that would have on advertising spend or balance across different media). But arguments about how to weigh these potential business and economic consequences against the impacts of data collection on people and communities are precisely those we need to have in the open.

I can also foresee that a requirement for community consent would mean wider information campaigns from industry associations and others about the benefits of using personal data. Currently the media narrative about the use of personal data is almost entirely negative - to the extent that Doctor Who bad guys are big tech monopolists. Sectors, like health, where progress such as new treatments or better diagnosis often requires the use of personal data, are negatively impacted by this narrative, and can’t afford the widespread information campaigns that would shift that dial. If other industries needed to make the case that the use of personal data can bring benefits, that could mean the conversation about data starts to be more balanced.

Finally, and most importantly, people currently feel dissatisfied, powerless, and resigned to a status quo they are unhappy with. Requiring community consent could help provide a greater sense of collective power and agency over how organisations use data, which should increase the levels of trust people have in the data economy. I believe that is essential if we are to build a healthy and robust data future.