“Ready or not, [A]I come[s]”: Web scraping vs extraterritoriality in the ChatGPT and Clearview AI sagas, a tale on Digital Constitutionalism

1. Prologue

Since the beginning of 2023, much attention has revolved around one of the most powerful and promising generative Artificial Intelligence (AI) tools accessible to the public. The large language model Chat Generative Pre-trained Transformer 4 (GPT-4) had just been released, but it was the capabilities of its predecessor, GPT-3.5, that had been explored more thoroughly, sparking expectations, curiosity, and concerns.

To the prompt ‘How would you describe “yourself”?’, a ChatGPT-generated answer reads ‘[t]hink of me as a sophisticated computer program designed to process natural language and provide responses to the best of my abilities […]’ (OpenAI, 2023). The response was accompanied by a disclaimer acknowledging potential imperfections in the text generated. In fact, despite being trained on a ‘large and diverse dataset of text […] collected [scraped] from various sources, such as books, articles, websites, and more’ (OpenAI, 2023), the model still suffers from “hallucinations” – i.e. incorrect, false, or simply “made up” information provided to users.

However, as a machine-learning (ML) system, ChatGPT continuously improves its performance through interactions with OpenAI’s users. During the conversation, ChatGPT maintains a record of previous dialogue strings, thus enhancing the adherence of its outputs to [1] explicit feedback provided by users to one or more of its replies or [2] the general context emerging from the conversation. In this sense, as with “Samantha”, the operating system (OS) that assists the protagonist Theodore in the sci-fi romance movie “Her” (Jonze, 2013), users often overlook that while interacting with the system “they are not alone”. ChatGPT’s computational power enables a potentially unlimited number of simultaneous interactions. Thus, much as it would for a curious human being interacting with a broad range of social and cultural contexts, [3] exposure to a plurality of language inputs, patterns, topics, and writing styles also ‘helps [ChatGPT to] improve [its] language understanding, generate more accurate and relevant responses, and learn to adapt to different communication styles’ (OpenAI, 2023).

Many average users would be surprised to learn that what they perceive as one-to-one “private conversations” are actually an entanglement of conversations with a super brain capable of chewing and digesting everything we type. This realisation does not differ much from a key scene of “Her”, where the protagonist questions the OS about the “exclusivity” of their romantic relationship. In this context, some commentators argue that ChatGPT’s users are little more than ‘human guinea pigs in a gigantic global public experiment’ (Eliot, 2023).

While it is true that interactions with ChatGPT might not be “exclusive”, the reverse holds as well. The general-purpose chatbot landscape encompasses competitors like Bard or personalised versions of Eliza. The latter recently faced severe negative media coverage after a Belgian user allegedly committed suicide, his eco-anxiety having been exacerbated by disheartening conversations with the chatbot. After some other controversies that left data subjects on Italian territory unable to access ChatGPT, even alternatives like “Pizza GPT” popped up.

The following sections, viewed through the lenses of “Digital Constitutionalism”, will try to shed light on some aspects of these and other recent events, inviting you to reflect on the role different actors play in regulating emerging technologies with a significant impact on individual rights and society at large.

Here, “Digital Constitutionalism” refers to establishing and implementing measures aimed at ‘limiting abuses of power [deriving from design, development, and use of digital technologies] in a complex system that includes many different governments, businesses, and civil society organisations’ (Suzor, 2019).

2. Main plot and characters

2.1. The good guys – DPAs in action

This tale on “Digital Constitutionalism” starts on March 30, 2023, when the Italian Data Protection Authority (or Garante/IT DPA) imposed urgent measures on OpenAI (or the Company). The grounds for these measures were potential infringements of Arts 5, 6, 8, 13, and 25 GDPR. The order prescribed an “immediate temporary limitation” on the processing of personal data concerning individuals established in Italy. Once again, as in the “toeslagenaffaire”, the current lack of specific regulations governing the design and use of AI proved the crucial role of data protection in safeguarding fundamental rights. The Garante’s actions paved the way for further reactions by other EU DPAs, resulting, in mid-April 2023, in the ‘launch [by the EDPB of] a dedicated task force to foster cooperation and to exchange information on possible enforcement actions conducted by data protection authorities’ (EDPB, 2023).

However, “web scraping” – involving the systematic collection of data from the internet based on pre-identified selectors – is not new to DPAs. In fact, the Garante itself had already adopted different measures against the US-based company Clearview AI in February 2022. The measures included orders not to process further data of subjects present on its territory, orders for the deletion of unlawfully processed data, and the imposition of significant administrative fines of several million euro. Clearview AI offers – mostly to law enforcement agencies – an “extraordinary” yet controversial Biometric Identification System (BIS) whose performance largely depends on a database of tens of billions of pictures scraped from the web, social media, and other platforms.

The ChatGPT and Clearview AI cases show similarities and differences, starting from the manner in which they were brought to public attention.

2.2. Some vital helpers – civil society and mixed reactions

In both cases, civil society played a crucial role in ringing the alarm bell and drawing attention to the potential implications of these tools. However, despite the positive engagement of “civic sentinels”, it is disheartening that out of 27 EU Member States, only a limited number of DPAs started investigations motu proprio and/or adopted measures to enforce current data protection standards.

From a slightly different angle, not all the bottom-up reactions these events sparked involved indignation and concerns about possible AI-based violations of fundamental rights. On this view, which also emerges from the approach of the forthcoming EU AI Act (AIA), a key point is striking a fair balance between protecting fundamental rights and fostering innovation in the ongoing “AI race”. Instances of this include partly “dissenting opinions” on the IT DPA’s measures in the ChatGPT case, expedients – like VPN use to bypass imposed limitations – adopted by Italian users, or arguments invoking public safety to “justify” serious and mass-scale violations of the right to data protection in the name of the fight against crime, as exemplified by Clearview AI.

Relevant to this analysis is that the implications generated by the deployment of such tools also depend largely on end users’ practices. To name a few examples, generative AI could assist us in performing various activities, but it could equally help (cyber)criminals in carrying out illicit endeavours. In a similar fashion, advanced BISs could serve as a crime-solving tool or contribute to victimisation, as in cases of harassment, stalking, or doxing.

2.3. What about the “bad guys”? – private corps, public commitments, and “hidden” agendas

As with civil society, the conduct of private tech companies does not consistently follow unambiguous patterns. Generally, tech players prioritise a “release first, let the others question later” approach, preferring speed over caution in technology design, development, and distribution. Yet, when potential adverse implications for fundamental rights and society arise, an inverse path based on a “precautionary approach” ought to be preferred. In practice, once the “good guys” and the “helpers” of this tale voice concerns about the latest product/service made available, some tech companies try to re-align their behaviour with the “feedback” they receive from public authorities, regulators, and the general public.

In the OpenAI case, there was a certain degree of cooperation with the Garante, recognising the necessity of precise rules for generative AI and demonstrating the willingness to engage with regulators in finding suitable solutions. However, news outlets suggest that the company sought to “water down” the AIA through lobbying efforts or by threatening to withdraw from the EU market.

On the other hand, Clearview AI repeatedly claimed ‘compliance with all the different privacy laws from around the world’ (CNN, 2020), disregarding administrative orders and fines – while continuing to scrape biometrics from the web. Meanwhile, as controversies about the use of BISs emerged, other companies in the field voiced opposition to facial recognition as an instrument for mass surveillance.

At this stage, the end of these sagas does not seem predictable. In both cases, without full, good-faith compliance, measures extraterritorially imposed by EU DPAs would likely be disregarded. Also, the complete deletion of unlawfully obtained data would be impossible when this data is “embedded” into the output of ML systems trained on it – unless the algorithm is re-trained from scratch (Paolucci, 2023).

The cases at hand highlight the limited deterrent effect and enforceability of fines and orders issued against companies based outside the EU for GDPR – and prospectively AIA – violations, calling into question the effectiveness of unilaterally imposed regulations with extraterritorial effect.

3. In the midst of it all: Taking (some) power back – Epilogue(?)

Legislative initiatives such as the GDPR and the AIA do not seem to effectively achieve the protection of our fundamental rights. Instead, market-based attraction and economic interests drive voluntary compliance with EU standards among non-EU countries and companies. While EU legislative efforts appear admirable in their declared intentions, the cross-border protection of EU citizens’ rights in this field “indirectly” also pursues other ends. The affirmation of digital sovereignty over third actors through “fundamental rights manifestos” could involve “victories” in a multiplicity of sub-battlefields with political, geopolitical, and economic relevance.

Yet it is essential to recognise the universality of the risks posed to international human rights standards by the ever-increasing development of AI systems. A genuine convergence towards values/rights-based regulations of emerging technologies would be preferable to the unilateral imposition of sub-regional standards. Pending the adoption of regulatory tools like the AIA and the Council of Europe Framework Convention on AI, Human Rights, Democracy and the Rule of Law, interim measures to protect our rights include moratoria on the development and use of certain instruments or the anticipation of the effects of specific provisions – e.g. of the AIA – through negotiated “pacts” and/or agreements with third states and companies.

Considering deficiencies in enforcing standards of fundamental rights protection vis-à-vis current and looming threats to democratic values, human rights, and the rule of law, it is time for a wake-up call to individual action. Drawing inspiration from Stefano Rodotà’s ideas on the “constitutionalisation of the internet”, engaging in boycotts and making thoughtful consumer choices can partially address shortcomings in enforcement and effectiveness. By emulating the stances taken by European Constitutional Courts during the process of constitutionalisation of fundamental rights at the EU level, individuals/consumers/citizens could avoid transactions with companies whose business models openly threaten our rights and values ‘as long as’ other solutions have yet to prove their value or become available. This could empower individuals with the direct enforcement of “economic sanctions” resulting from the loss of data flows, subscriptions, and corresponding profits for such companies.

For their part, states could at least improve the fulfilment of their positive obligations under international human rights law, promoting and ensuring cultural rights, specifically through “digital literacy”. In a similar vein, public authorities could limit their “complicity” in AI-based rights violations by implementing effective standards and protocols for ‘robust due diligence, monitoring, or transparency’ in AI procurement (Hickok, 2022), in-house development, and usage.

In this race, we had all better run: “Ready or not, [A]I come[s]”.

Francesco Paolo Levantino
Ph.D. Candidate in International and European Human Rights Law | AI, Modern Technologies, and Surveillance | Sant’Anna School of Advanced Studies (Pisa, Italy).