The Costs of Training AI and the Impact on Fundamental Rights

Reading Time: 6 minutes

1) Introduction

A US-based company, Everalbum, provided customers with a cloud storage service, “Ever.” The company was sanctioned after it was found to have deceived consumers about its use of facial recognition technology and about its retention of the photos and videos of users who had joined the service and created an account. The Federal Trade Commission (FTC) therefore issued an Order finding that Everalbum had violated Section 5 of the FTC Act by misrepresenting its privacy practices. The most important aspect of this decision is the remedy chosen by the Commissioners: the deletion of the illicitly collected biometric data was ordered (see also the FTC’s final decision against Cambridge Analytica, Section IV, ‘Required deletion of data’, p. 5 of Docket No. 9383/2019).

Furthermore, “algorithmic disgorgement” (or algorithmic destruction) was mandated. By this term, the Authority meant an order that tackled the illicitly collected data by requiring the organization to destroy the models and algorithms it had developed using the photos and videos uploaded by users who were unaware of the further processing (see pp. 4–5 of the Order). The FTC decision is not only meaningful for consumer protection; it also sheds light on several issues. The first is related to the training of this technology, a deep neural network, i.e. a complex machine learning architecture loosely inspired by the structure of the human brain; the second regards the actual deletion of data from the model. As this post aims to demonstrate, ensuring the data’s complete erasure is challenging. Due to technical obstacles, the enforcement of the right to erasure protected by Article 17 of the General Data Protection Regulation (GDPR) is put under strain. This aspect is highly relevant to the analysis of the Ever case and of many other Data Protection Authorities’ sanctions in which the deletion of wrongfully processed or illicitly obtained data is mandated (see the Italian Data Protection Authority’s Registro dei provvedimenti, Order No. 50 of 10 February 2022, against Clearview AI). In addition to the problem of practically realizing the request, a “lost in translation” issue looms, since the Authorities never specify which data set the companies should delete. There is therefore a substantial, unasked and unanswered problem in all these examples: what happens to the collected data? How can the machine unlearn the data – whether illicitly obtained or not? How much does training an AI with the “wrong” data cost?

2) “Lost in translation”: a primer on the limits of machine unlearning under Article 17 GDPR

Machine learning outcomes are the result of statistical inference. When machine learning systems are involved, a further aspect that makes a request for deletion difficult is identifying which data set should ultimately be deleted (see Thylstrup, 2022).

The developers of modern machine learning systems assemble data sets to be used as training data. On the basis of this data set – which may contain both personal and non-personal data – the machine is asked to run an algorithm over the training data and, by finding common patterns, to produce a model that can then be deployed to achieve the ultimate goal and outcome. Within this complex pipeline, several questions emerge. When the data processor receives a request for erasure from the data subject, which kind of data is to be deleted? From which data sets? And, more importantly, how is the erasure to be achieved? Notably, where different data sets are used, personal data may well be involved both in the training set and in the analysis set.
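To make this pipeline concrete, below is a minimal, purely illustrative sketch in Python: the toy feature values, labels, and the choice of a logistic regression model are hypothetical and stand in for the far more complex systems (such as facial recognition networks) discussed here.

# Hypothetical toy pipeline: a training set mixing personal features
# with labels, an algorithm run over it, and a resulting model.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[34, 120], [29, 45], [51, 300], [42, 80]])  # per-user feature vectors
y_train = np.array([1, 0, 1, 0])                                # target labels

# "Running the algorithm on the training data" produces a model ...
model = LogisticRegression().fit(X_train, y_train)

# ... whose parameters encode statistical patterns drawn from the whole set,
# not individually retrievable records; this is what later makes "erasure"
# of a single person's data so hard to pin down.
print(model.coef_, model.intercept_)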

It must be said at the outset that there are no crystal-clear answers to these questions. Some technical remedies have been proposed, such as anonymizing data, functional encryption, selective amnesia, and model breaking. However, none of them seems to tackle the core of the problem: clarifying the extent of the right to erasure from a machine learning perspective. A major issue lies in the current lack of legal certainty as to how AI can be designed to comply with the Regulation, given the specific features of data protection rights. Moreover, this tension between the GDPR and machine learning arises because such systems are designed in ways that make the (unilateral) modification of data difficult. This is hard to reconcile with the GDPR’s requirement that personal data be erased when specific circumstances apply. Hence, three main conceptual uncertainties threaten data subjects’ rights and processors’ obligations.
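As a purely illustrative example of one of the remedies mentioned above, the sketch below shows a naive pseudonymization step applied before training; the record fields and salt handling are hypothetical simplifications, and, as noted, such measures do not by themselves settle the Article 17 questions.

# Hypothetical pseudonymization of records before they reach the training pipeline.
import hashlib

records = [
    {"name": "Alice Rossi", "age": 34, "uploads": 120},
    {"name": "Bob Bianchi", "age": 29, "uploads": 45},
]

def pseudonymize(record, salt="rotate-me"):
    # Replace the direct identifier with a salted hash and keep only
    # the features needed for training.
    token = hashlib.sha256((salt + record["name"]).encode()).hexdigest()[:12]
    return {"id": token, "age": record["age"], "uploads": record["uploads"]}

training_set = [pseudonymize(r) for r in records]
print(training_set)  # no plain-text identifiers enter the training set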

First and foremost, many uncertainties revolve around the term “erasure.” Deleting data from machine learning data sets is burdensome, since it implies retraining the entire model, and it does not address the underlying problem of making sensitive data disappear or become untraceable. Secondly, it is challenging to demonstrate that the retrained model has been fully “corrected”, namely that it has been cleansed of the wrongfully obtained data and that biased outputs are not reproduced. Technical factors and governance design thus compound the difficulty of complying with Article 17 GDPR: even if there were a means of ensuring compliance from a technical perspective, reaching all the relevant data sets may be organizationally difficult. Thirdly, because of a certain degree of unpredictability and autonomy, it is frequently challenging to identify the liable party in the case of damage caused by artificial intelligence applications. In particular, the more challenging situations are those in which the outcome of the processing carried out by the artificial intelligence is not fully controllable a priori. Moreover, according to the principle of accountability, it falls to the controller and the processor, «taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons», to «implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk» (Article 32 GDPR).
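To illustrate why erasure “implies retraining the entire model”, here is a minimal sketch of exact unlearning by full retraining, reusing the toy set-up from the earlier example; the data and model are hypothetical, and at real-world scale this retraining step is precisely what makes the obligation so costly.

# Hypothetical baseline: honour an erasure request by dropping the record
# and retraining the whole model from scratch on the remaining data.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[34, 120], [29, 45], [51, 300], [42, 80]])
y = np.array([1, 0, 1, 0])
original_model = LogisticRegression().fit(X, y)

erase_index = 1  # the data subject whose record must be "forgotten"
keep = [i for i in range(len(X)) if i != erase_index]
retrained_model = LogisticRegression().fit(X[keep], y[keep])

# Even after retraining, demonstrating that the model is fully "corrected"
# (that the removed record left no trace in its behaviour) remains an open problem.
print(original_model.coef_, retrained_model.coef_)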

Hence, the legislator delegates to the data processor the burden of identifying how to fulfil the requirements dictated by the rule, translating them into the concrete case, and taking responsibility not only for implementation but also for evaluating the risks. These aspects emerge when the processing is not linear and involves data controllers, processors, and sub-processors, since their contracts often govern the execution of certain data subjects’ rights, including the right to erasure. The logic of accountability is therefore challenged not only by the crowded landscape of responsibilities arising from the Regulation but also where responsibilities must be assigned in the presence of automated decision-making. Lastly, Article 17 reflects a notion of data memory modelled on human memory, not on the rather different memory of machines.

All the aspects listed here entail uncertainties in interpreting and applying the GDPR, especially the right to erasure. This creates a deficiency in the enforcement of Articles 7 and 8 of the Charter and places a burden on data processors, typically private actors, which are required to find a way to achieve this goal both technically and legally. As previously mentioned, this opaque situation also affects the rights of the data processor. Particularly with regard to facial recognition systems, many European – and not only European – authorities have followed a trend of sanctioning companies that collected facial data and ordering its deletion. Since, as noted above, it is highly complex to ensure erasure in any meaningful sense, such sanctions seem to exacerbate the very opacities that already put individuals’ rights at risk, undermining the rationale for sanctions: namely, that they be effective, proportionate, and dissuasive (see Article 29 Data Protection Working Party, ‘Guidelines on the application and setting of administrative fines for the purposes of the Regulation 2016/679’, adopted on 3 October 2017).

3) Conclusion

The aggressive remedy applied by the FTC seems to create more problems than it solves, targeting developers and chilling innovation. It is not a viable solution, ultimately amounting to an aggressive act that benefits neither companies nor data subjects. The remedy is neither practical, nor dissuasive, nor proportionate. Transplanting it into the European context would sit in direct contrast with the European cultural model, which is firmly based on the proportionate protection of fundamental rights. The further examples of sanctions show that the deletion problem is certainly two-fold and that, at the moment, it neither fulfils nor protects the rights and interests of either data subjects or data processors. Hence, in light of the above, there is a need to develop new regulatory models that, alongside privacy, give weight to other values.

The issue of deleting data from AI systems does not only concern the application of one of the cornerstones of European data protection law, the right to erasure; it also sheds light on an overlooked problem: what was referred to above as “the costs of training data.” The massive collection of information driven by data capitalism leads to scarce consideration of the many costs of training machine learning systems: monetary costs and climate costs. Training a single AI system can emit over 250,000 pounds of carbon dioxide, and the use of AI technology across all sectors produces carbon dioxide emissions comparable to those of the aviation industry. Therefore, training AI systems with illicitly obtained data entails not only a data protection infringement but also a waste of energy and potential damage to the climate. The fine-tuning of AI training is a comprehensive fundamental rights matter, since it implicates the protection of data and privacy as well as of the ecosystem. However, the AI Act proposal fails to address the risks of wrongfully training a system and their climate consequences. Thus, sustainability must be placed at the centre of the Union’s market and regulatory choices regarding the training and the governance of AI.

Federica Paolucci
Ph.D. Candidate at Bocconi University

Federica Paolucci is a Ph.D. Student at Bocconi University, Milan. Her research interests revolve around regulation of AI, biometric technologies, and surveillance.
