How should we regulate LLM chatbots? Lessons from content moderation


* This blog post adapts text from a recent submission to the Inquiry on Large Language Models by the House of Lords Digital & Communications Committee, prepared with Professor Marc Stears at the UCL Policy Lab.

One of the most common uses of Large Language Models (LLMs) is to power chatbots – an interface where users can receive AI-generated responses to queries, such as OpenAI’s GPT-4. While these LLM-powered chatbots have the potential to expand information access, accelerate knowledge dissemination, and (when curated well) promote a plurality of perspectives, this potential may be outweighed by the dangers they pose. There is a considerable risk that LLM chatbots will produce speech that causes or could cause harm—to individuals and to democratic institutions. To address this issue, both companies and governments must take responsibility for preventing the spread of harmful speech through LLM chatbots. To that end, core lessons from the regulation of content moderation on social media platforms are indispensable.

What are the risks posed by LLM chatbots?

Unconstrained LLMs risk producing many of the same varieties of harmful speech that have toxified social-media platforms. A key concern is misinformation. Notoriously, LLMs are not programmed to track the truth; they draw on the vast training data of fallible internet text – a process that can often result in inaccurate responses (including “hallucinations” in which LLMs fabricate new falsehoods). There is an obvious danger that as chatbots enter wider use, users will assume that their responses are necessarily true, or that chatbots will be maliciously prompted into generating material for disinformation campaigns – producing harmful content that is cheaper and potentially more believable than what humans can produce. Misinformation and disinformation can harm individuals (for example, with medical misinformation); it can undermine the fairness of political processes (as with electoral misinformation); and it can undermine democracies’ information ecosystems by subverting norms of truth and trustworthiness.

Additionally, LLMs can generate other forms of harmful speech with which social-media networks have grappled. Unconstrained LLMs, fed training data replete with prejudice and bigotry, are at risk of generating hate speech, reinforcing discriminatory attitudes and undermining democratic values of equal respect. Unconstrained LLMs can also be abused to produce various forms of dangerous guidance—from bomb-making instructions to tips on how to self-harm. These examples only scratch the surface of the potential harms posed by existing LLMs, and further harms are likely to emerge as these models evolve.

What are the ethical responsibilities of LLM firms?

The core philosophical insight underpinning our approach is this: Artificial speakers should be constrained by the same principles that ought to constrain human speakers. It is uncontroversial that human speakers have ethical responsibilities not to cause intolerable harm through their communications, either to individuals or to democratic institutions. It is this responsibility that explains why incitement to violence and hate speech, for example, fall outside the protective scope of freedom of expression.

These same responsibilities should guide and constrain the design and regulation of artificial speakers such as chatbots. Chatbots are not moral agents themselves, of course, but their designers are. Like any company releasing products, firms working at different stages of the AI value chain have an ethical duty of care toward their consumers—and to the broader public—to reduce the risk that their products will cause harm. They can do so in three main ways: improving training data; improving content moderation; and improving public understanding.

First, LLM firms can reduce the risk of harm by limiting the harmful content that their foundation models produce in the first place. LLM developers should work to exclude harmful content from their training data. More experimentally, firms might explore efforts to implement forms of “Constitutional AI” (as with Anthropic’s Claude chatbot), whereby models are explicitly programmed with clear constitutional values (although the effectiveness of this approach requires further study).
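To make this concrete, the sketch below shows in schematic form what a pre-training data filter might look like. All of the names here (BLOCKED_TERMS, looks_harmful, filter_training_corpus) are our own illustrative placeholders rather than a description of any firm’s actual pipeline; in practice the keyword check would be replaced by trained classifiers, and the curation decisions would need to be documented so that they can be scrutinised.

```python
from typing import Iterable, Iterator

# Crude stand-in for a trained harm classifier. Real curation pipelines use
# ML models that score documents for hate speech, dangerous instructions and
# similar categories; a keyword list is used here only to keep the sketch short.
BLOCKED_TERMS = {"bomb-making instructions", "how to self-harm"}


def looks_harmful(document: str) -> bool:
    """Return True if the document appears to contain harmful material."""
    text = document.lower()
    return any(term in text for term in BLOCKED_TERMS)


def filter_training_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Yield only documents that pass the harm screen, for use in pre-training."""
    for doc in documents:
        if not looks_harmful(doc):
            yield doc
```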

Second, LLM firms must also establish and enforce standards for what counts as unacceptable content, filtering their models’ outputs accordingly. Such a system is analogous to the private governance systems that major social-media firms have devised and implemented to regulate the content on their platforms. Just as social-media platforms have progressively built more powerful and precise machine learning classifiers to catch harmful content, so too must LLM firms. Moreover, given the impact these models may have on public discourse, firms have an obligation to be transparent about what their rules are and how they are enforcing them. The major firms have (embryonic) versions of these systems, as with OpenAI’s Usage Policies (though this framing misleadingly places the onus on users, as if depressed teenagers searching for self-harm guidance were to blame instead of the system’s designers).
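To illustrate the kind of output-side moderation described above, here is a minimal sketch of a filter applied to a chatbot’s draft response before it reaches the user. The moderate classifier, its single illustrative rule, and the respond wrapper are hypothetical; production systems rely on trained classifiers and far richer policy taxonomies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModerationResult:
    flagged: bool
    category: Optional[str] = None


def moderate(text: str) -> ModerationResult:
    """Placeholder for a trained output classifier covering a firm's content
    rules (hate speech, dangerous guidance, and so on)."""
    if "step-by-step instructions for self-harm" in text.lower():  # illustrative rule
        return ModerationResult(flagged=True, category="self_harm")
    return ModerationResult(flagged=False)


REFUSAL = "I can't help with that request."


def respond(draft_response: str) -> str:
    """Gate the model's draft output behind the moderation check."""
    result = moderate(draft_response)
    if result.flagged:
        # In a real system this event would also be logged for transparency
        # reporting and for auditing how the rules are enforced.
        return REFUSAL
    return draft_response
```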

Third, LLM firms can further reduce the risk of harm posed by their models by enhancing public understanding of how their systems work and what their appropriate uses are. This includes improved disclaimers and caveats, and refinement of their chatbots to reflect greater humility and less confidence on reasonably contestable issues (as in, for example, a responsible Wikipedia article). Models can also be better trained to provide answers that do not confidently assert a single response but instead “map the debate” on a given question, which is especially important for philosophical, moral, and religious issues.
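One simple way to nudge a chatbot towards this kind of humility is through its system instructions. The sketch below shows a hypothetical “map the debate” instruction prepended to each conversation; the wording and the build_messages helper are our own illustration, not any firm’s deployed configuration.

```python
# Illustrative only: the wording below is our own example of a "map the debate"
# instruction, not any firm's actual system prompt.
MAP_THE_DEBATE_PROMPT = (
    "When a question is reasonably contestable (philosophical, moral, religious, "
    "or politically disputed), do not assert a single answer as settled. "
    "Summarise the main positions, note the strongest arguments for each, "
    "and be explicit about uncertainty."
)


def build_messages(user_question: str) -> list:
    """Prepend the humility instruction to the conversation sent to the model."""
    return [
        {"role": "system", "content": MAP_THE_DEBATE_PROMPT},
        {"role": "user", "content": user_question},
    ]
```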

How should governments regulate LLM chatbots?

While it is crucial that firms live up to their ethical responsibilities and adopt self-governance measures, it is also the role of governments and lawmakers to adopt policies and introduce regulation that create the right incentives for companies to do so. LLM chatbot services (like ChatGPT) arguably fall outside the scope of content moderation regulation such as the EU Digital Services Act (DSA) and the upcoming UK Online Safety Act: as currently designed, these regimes address user-to-user services and search engines, categories that do not map onto the business model of chatbots. However, there are important lessons to be learnt from the evolution of social media regulation, and from the risk-based approach that has emerged in recent years.

While social media platforms have progressively developed comprehensive systems of private governance, in the LLM space we are seeing only the initial stages of this process. LLM firms are far from having put in place a fully-fledged content moderation system, and much more work needs to be done. While some platforms (such as OpenAI) have published cursory “usage policies,” we lack transparency on how these rules are enforced. There is nothing remotely akin to Meta’s Oversight Board to offer serious reflection and scrutiny. We also lack information about these systems’ effectiveness, especially given rampant anecdotal evidence of “jailbreaking”. Given that firms will not always have strong incentives to moderate content, or to explain to the public how they are doing so, democratic oversight of this process is essential.

LLM chatbot firms should be legally required to undertake risk assessments for their LLM products, publish their risk evaluations and proportionate mitigation strategies, and be subject to oversight and evaluation by a regulatory body, which would be responsible for evaluating the adequacy and effectiveness of firms’ efforts. This oversight, when pursued in a proportionate and cooperative manner, need not hamper innovation. As with social media, content moderation will never be perfect—false positives and false negatives are inevitable—but regulation can ensure LLM firms have a coherent approach and are sticking to it. (Because the goal of regulation is not to stifle innovation or chill the potential benefits of generative AI for our economy and democracy, firms’ duty is to reduce the risk of harm, rather than to eliminate it.)

Moreover, LLM risk assessments should transcend a myopic focus on currently illegal speech. While overzealous regulation of LLMs could still trigger free-speech concerns because of its impact on audiences and users, restricting regulation to illegal content would be unduly narrow. Even if humans should remain free to communicate misinformation (with some exceptions), the free-speech argument is comparatively weaker for chatbots. And even when LLMs produce text that would be criminal if communicated by a human, chatbots (and their designers) lack the requisite mens rea (the mental element, e.g. intention) for criminal speech, so the category of illegal speech maps poorly onto LLM outputs.

The risks of harmful speech we have flagged here are purpose-insensitive, in that they do not depend on the goal for which the LLM is deployed. Even if some governments are correct in their concern to future-proof AI regulation so that it is not confined to a specific type of technology or sector, regulation should not shy away from establishing rules that mitigate harms that are cross-sectoral and purpose-insensitive. Take the case of the UK. In order to emphasise an innovation-friendly approach to regulation, the Government’s AI White Paper states that it “will not assign rules or risk levels to entire sectors or technologies”. But this is far too limited; it does not account for the cross-sectoral risks of harmful speech produced by LLM chatbots. LLM technology engaging with public audiences for any purpose carries risks of serious harm. LLM outputs should never include hate speech or dangerous misinformation, regardless of whether they are being used as a search-engine chatbot, a customer-service chatbot, a medical adviser, or an email-writing assistant.

Timing also matters. Governments should not wait; they should act proactively to prevent the spread of harmful speech through LLMs. The delay in demanding proactive content moderation for harmful speech on social-media platforms illustrates the perils of a “wait and see” approach. It was a serious error in that domain, and we should not repeat that mistake in the LLM space.

Conclusion

Unconstrained LLMs may generate harmful speech, including misinformation, hate speech, and self-harm guidance. Such speech can endanger individuals, erode trust in democratic institutions, and undermine the overall health of the digital public sphere. Consequently, firms designing and releasing LLMs have ethical responsibilities to mitigate these risks, which requires firms to reduce harmful content in training data, and to define and enforce clear rules for the content generated by their systems.

For governments, the time is ripe to introduce legal measures for the safe development and use of this technology, while conferring on regulators the statutory powers needed to supervise these evolving technologies. Specifically, they should introduce legal responsibilities for firms that design and release LLM chatbots, requiring them to conduct risk assessments and develop proportionate mitigation strategies.

But constraining LLMs to prevent harmful content production is the beginning—and not the end—of a theory of LLM firms’ and governments’ responsibilities. There is a strong argument for redesigning LLMs to actively promote democratic values, ensuring the inclusion of diverse perspectives and voices. Democracies thrive on a diversity of sources of intelligence, and chatbot responses should align with this pluralistic approach (rather than reinforcing a single worldview). This mirrors the ongoing challenges for social-media platforms, where, even after determining which speech to disallow, the question remains of which perspectives or voices should be algorithmically amplified – a question that also arises with generative AI.

Jeffrey Howard
Associate Professor

Jeffrey Howard is an Associate Professor of Political Philosophy & Public Policy at UCL and Director of the Digital Speech Lab, which hosts research projects on the proper governance of online communications. He holds a DPhil from Oxford and an AB from Harvard.

Beatriz Kira
Lecturer in Law

Beatriz Kira is a Lecturer in Law at the University of Sussex and a Senior Research Associate at the Digital Speech Lab at UCL. She holds a PhD in Economic Law from the University of Sao Paulo and an MSc in Social Sciences of the Internet from the Oxford Internet Institute.

