Mireille Hildebrandt. Professor of Smart Environments, Data Protection and the Rule of Law at the Institute for Computing and Information Sciences (iCIS) at Radboud University Nijmegen
Slaves of Big Data. Are we?
Big data says that n = all.
We worship big data, believing in it as if it were “Godspeech” that cannot be contested. Yet it is developed by governments, businesses and scientists.
Defining Big Data
“Things one can do at a large scale that cannot be done at a smaller scale” (Mayer-Schönberger & Cukier).
“The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad et al.).
We assume that machines can learn. Even granting that this is true, the question is whether we (humans) can anticipate this learning and make it useful.
Normally, quantitative research means formulating a hypothesis, testing it on a representative sample, and extrapolating the results from the sample to the general population. With big data, however, uncertainty shrinks as the sample grows. This has driven a movement towards ‘datafication’, where everything needs to be recorded so that it can be processed afterwards. We translate the flux of life into data, tracking and recording all aspects of life.
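The classical sampling logic behind that claim can be sketched in a few lines. This is a minimal illustration, not part of the talk: it uses the standard formula for the standard error of a sample proportion; the function name and the example values are my own.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p, for sample size n."""
    return math.sqrt(p * (1 - p) / n)

# As the sample grows, uncertainty shrinks in proportion to 1/sqrt(n):
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9}: standard error = {standard_error(0.5, n):.4f}")
```

Note that the decrease is only 1/sqrt(n): even enormous samples never reach zero uncertainty, which is one reason the “n = all” slogan deserves the scepticism Hildebrandt voices later.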
Exploiting these data is no longer about queries but about data mining, and about creating ‘data derivatives’: anticipations, present futures of the future present (Elena Esposito). It is also about a certain “end of theory” (Chris Anderson), where a pragmatist approach shifts us from causality (back) to correlation. We also move away from creating or defining concepts, which in turn shape the way we understand reality. We move from expertise to data science. Are we on the verge of a data dictatorship? Is this the end of free will?
There are novel inequalities due to new knowledge asymmetries between data subjects and data controllers.
What does it mean to skip the theory and to limit oneself to generating and testing hypotheses against ‘a’ population?
What does the opacity of computational techniques mean for the robustness of the outcomes?
Personal data management in the age of Big Data
Should we shift towards data minimisation? Towards blocking access to our data?
A new typology of personal data is needed for data protection: volunteered data, behavioural data, inferred data.
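For illustration only, the three categories could be modelled as a tagged data structure. The class names and example values below are my own assumptions, not part of the typology as presented:

```python
from dataclasses import dataclass
from enum import Enum

class DataOrigin(Enum):
    VOLUNTEERED = "volunteered"   # actively provided by the data subject
    BEHAVIOURAL = "behavioural"   # observed from the subject's activity
    INFERRED = "inferred"         # derived by profiling or data mining

@dataclass
class PersonalDatum:
    value: str
    origin: DataOrigin

email = PersonalDatum("alice@example.org", DataOrigin.VOLUNTEERED)
clicks = PersonalDatum("clicked ad #42", DataOrigin.BEHAVIOURAL)
risk = PersonalDatum("credit risk: high", DataOrigin.INFERRED)
```

The point of making the origin explicit is the same as in the talk: inferred data raise different protection questions than data the subject knowingly handed over.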
For data protection to be an effective right, the agents performing profiling on Big Data should provide full information on how this profiling is done, with special attention to procedures and outcomes.
If we cannot have “privacy by design”, we should have personal data management: context-aware data management.
Personal data management: sufficient autonomy to develop one’s identity; dependent on the context of the transactions; enabling considerations of constraints relevant to personal preferences (Bus & Nguyen).
A rising problem: we will eventually be able to market data and profit from it. Do we have any ethical approach to the way these data were obtained, or to how and when they were created?
Who are we in the era of Big Data?
“Imperfection, ambiguity, opacity, disorder, and the opportunity to err, to sin, to do the wrong thing: all of these are constitutive of human freedom, and any attempt to root them out will root out that freedom as well” (Morozov).
So, where is the balance between techno-optimism and techno-pessimism?
Luhmann’s concept of double contingency explains how behaviour and communication are doubly contingent: one can choose how to act from different options, but that action may in turn provoke a range of reactions from third parties. Are we preserving this double contingency in the era of Big Data? Do big data machines anticipate us so thoroughly that contingency disappears altogether?
What if it is machines that anticipate us? What if we anticipate how machines anticipate us?
- Datafication multiplies and reduces: N is not all, not at all.
- We create machines like us… are we increasingly becoming like machines?
- Monetisation as a means to reinstate double contingency. Total transparency by means of monetisation.
Can we move from overexposure to clair-obscur? Can we build data buffers to avoid overflow?
Chris Marsden: How can we make policy-makers take all these important issues into account? Hildebrandt: there are two big problems with Big Data: (1) the enthusiasm that Big Data will solve everything, and (2) the huge business opportunities around Big Data. There surely is a way to “tweak” business models so that privacy, business opportunities and innovation can actually coexist. Capitalism surely has ways to achieve a balance.
Hildebrandt: big data, monetisation, etc. will surely change who we are. The question is: should we always remain who we are? Instead of privacy, we should perhaps shift our focus to transparency, to knowing what happens and why it happens. For instance, can we pay more attention to what we need to solve rather than to what solutions are available? That, too, is transparency.