A robot and a moderator walk into a bar



Stop me if you have heard this one before: somebody makes a questionable joke on the Internet and finds themselves in trouble. Consider the case of the comedian Sophie Passmann, who had her joke about the German tradition of watching “Dinner for One” on New Year’s Eve removed after the tweet was deemed to be hate speech. Or the Spanish citizen who faced criminal charges for repeatedly tweeting jokes about how Luis Carrero Blanco (Francisco Franco’s successor, who was killed in a car bombing in the 1970s) was “the first Spanish astronaut”. In these and other cases, platform moderators and courts must decide whether a post amounts to hate speech, which must be removed from the social network, or a form of humour protected by freedom of expression.

Content removal decisions reflect a delicate balance between the various rights and interests at stake. On the one hand, prompt removal of potentially unlawful content (in the cases above, potential hate speech) is positive for users, who avoid being exposed to hateful or otherwise criminal content; for platforms, which benefit from a healthier environment for their users and avoid trouble with the law; and for society at large, as existing legal requirements continue to apply in the online environment. On the other hand, removal of lawful content may prompt users to leave or make less use of a social network. Undue removal may also curtail users’ freedom of expression, thus adversely impacting how they interact with other users and with society in general. Platform moderation policies thus face the challenge of balancing freedom of expression and the values protected by the non-dissemination of hate speech and other forms of unlawful content.

With the multiplication of laws to combat the spread of speech that incites hatred and violence, online platforms are dealing with an increasingly large number of situations in which moderators must distinguish unlawful content from legitimate jokes. This post argues that automated moderation systems struggle to deal with humour, a situation that is unlikely to change in the near future. Given the relevance of jokes, puns, memes, and other humoristic formats for online communication, this means indiscriminate automation of content moderation practices may create considerable obstacles to freedom of speech in online environments.

The unfeasibility of human content moderation

Recent legislative efforts for curbing hate speech in online environments give legal relevance to the distinction between acceptable jokes and unacceptable content. Even if current European Union law clearly states that platforms do not have a general obligation to actively pursue illegal content (Article 15 of the eCommerce Directive), proactive moderation is required in some cases, such as terrorist content or potential violations of copyright. As a result, online platforms must analyse the content created by their users, either to fulfil mandatory removal requirements or in response to complaints by offended parties, other rightsholders, or authorities.

Very large platforms such as Twitter and Facebook must therefore handle a considerable volume of user-generated content every day: not just text-based content, but also videos, images, and other forms of data. The sheer scale of those content moderation operations raises a few substantial problems. From the perspective of market competition, such measures may favour established platforms, which have the most resources to comply with the duties created by laws such as the proposed Digital Services Act. By contrast, smaller companies need to devote a significantly larger share of their resources to complying with moderation requirements, or else rely on third-party moderation providers.

To cope with large-scale moderation, platforms (or their moderation providers) make use of large numbers of workers, often hired through outsourcing companies. Given the vast amounts of content generated by Internet users, moderators have little time for evaluating each individual case presented to them. These informal time constraints are often tightened further by legal requirements, such as the 24-hour deadline for the removal of manifestly illegal content under the German NetzDG or the one-hour deadline that applies to removal orders under Article 5 of the Terrorist Content Regulation. The result is a dire situation, in which moderators are subject to precarious work conditions, especially those handling gruesome content such as child sex abuse or images of murder and terrorism.

Large-scale moderation also has consequences for legitimate users of a social network. While some of the situations that lead to moderation, such as the gruesome content mentioned above, are easy to identify, others involve decisions that are less obvious. In particular, social networks may need to deal with open-ended concepts as they assess potentially discriminatory or hateful speech. Given the complexities of such judgments, platforms may decide to err on the side of caution, removing user-generated content after receiving a report if there is anything beyond the slightest chance of legal trouble. Measuring this so-called over-removal can be difficult in practice, but the number of posts restored upon appeal on platforms such as YouTube suggests the existence of a considerable number of wrongful content takedowns.

Most takedown decisions, however, are never subject to appeal. Instead, users adopt other strategies to cope with them. They may decide to quit a social network altogether, a result that is not desirable to the company and may often pose problems to a user’s social life but does not impinge upon their freedom of expression. But users may also self-censor, tailoring their posts to meet the perceived moderation standards. In this case, content moderation introduces an indirect constraint on freedom of expression, but one that can be more pervasive for the average user than the direct impact of content removal.

Towards automated content moderation

The scenario described above is undesirable for social media companies. Not only do they end up spending considerable resources on moderation, but, as they do so, they may push users away or curtail their engagement with other users. Both outcomes have a considerable impact on how social networks can profit from their user base through advertisement and other means. As a result, these networks seek alternative models for content moderation.

One such response is the use of automation technologies in content moderation tasks. Such approaches have been encouraged by legislation and case law as valuable instruments for ensuring a healthy and law-abiding online environment, and have gained traction as the earlier waves of the Covid-19 pandemic forced platforms to send moderators home. Nonetheless, the automation of online content moderation faces technical and legal obstacles.

Some of these challenges stem from how automated moderation systems are designed. The construction of any such system is a complex process, involving various kinds of technical work: executives set the overall priorities for moderation, data annotators create labels for training the moderation systems, machine learning teams design the mathematical models underpinning such systems, operations specialists ensure these models can operate at the scale needed for large social networks, among others. Throughout the life cycle of a content moderation system, these workers make decisions that will shape the overall moderation regime: what success criteria will be used to evaluate the system? What kinds of errors are the decision-makers willing to tolerate? What types of content will be spared from moderation? Will the moderator algorithm make decisions with no human involvement, or will it just amplify human capabilities? The answers given to these questions reflect the values and priorities held by decision-makers, which means the resulting systems may, by accident or design, embody various normativities at odds with social values such as non-discrimination and freedom of choice.

Automated moderation systems may introduce or amplify social harms. For example, a moderation system that unduly flags social network posts by drag queens as toxic not only misrepresents legitimate forms of communication but also impacts the expression and private lives of users who already face various forms of social bias. Yet, automated moderation outcomes can be difficult to challenge, as technical opacity prevents users from obtaining adequate information about the factors that influence decisions made by an automated moderator. Even when such information is available, users might lack adequate channels for contesting harmful decisions without needing to resort to potentially lengthy judicial proceedings. Reversing harms caused by automated moderation may thus be a harmful process in itself.

Further complications appear when it comes to the moderation of humoristic content. As the humour studies literature shows, humour is a highly contextual language procedure. What we accept as humour varies according to factors such as common cultural background, shared knowledge, and personal preferences. The reader (or listener, or watcher…) may even be affected by events in their personal life: for example, we might be grumpy after receiving bad news and find offence in a post that we would deem acceptable under normal circumstances. This contextual nature of humour has long challenged computational models of language, and even the recent advances in natural language understanding have yet to achieve a breakthrough in the holistic appreciation of conversational context.

Judgments on matters of humour are highly subjective, so one might expect that automation would contribute to the objectivity of moderation decisions. But this expectation would be misguided. Moderation systems depend not only on the executive decisions about their purpose and adoption but also on the preferences and values of the humans involved in their construction. This dependence is particularly salient in machine learning systems that rely on labelled data, as is usually the case for natural language processing. Since these labels are provided by human annotators, the system will learn how to classify something as “humoristic” or “not humoristic” based on the views of those annotators. While some approaches can be used to smooth out the inconsistencies in such judgments, as shown by qualitative social research methodologies, these practices are seldom applied in the precarious conditions in which much of the labelling work occurs.
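To make the point about inconsistent labels concrete, the sketch below computes Cohen's kappa, a standard measure of how far two annotators agree beyond what chance alone would produce. It is only an illustration: the function name, the label names, and the toy labels are assumptions made for this example, not data or tooling from any real moderation pipeline.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Share of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, given how often each annotator uses each label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in set(freq_a) | set(freq_b)) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six posts (illustrative only).
annotator_1 = ["joke", "joke", "hate", "joke", "hate", "joke"]
annotator_2 = ["joke", "hate", "hate", "joke", "joke", "joke"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on four of six posts, yet kappa is only 0.25, because much of that agreement would be expected by chance. A low score signals that the "ground truth" a classifier learns from is itself contested. A high score, however, would not rule out the systemic biases that arise when annotators share the same blind spots.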

However, even robust annotation practices are not sufficient for good classification when the annotation process suffers from systemic power asymmetries rather than individual-level biases. In particular, data annotation procedures often reproduce—or even amplify—existing discriminatory patterns when annotators are drawn from socially homogeneous groups. This reproduction may manifest itself in false negatives, such as male annotators being more likely to overlook or underestimate the offensiveness of sexist jokes. It may also appear as false positives: studies show that automated detection of toxic language often harms minoritized groups, for example, in situations in which a group reclaims an offensive term as a badge of honour. As a result, the harms stemming from inadequate annotation are not evenly distributed throughout the user base, disproportionately affecting vulnerable individuals and social groups.

Automated filters as decision support tools

Based on the challenges outlined above, the prospects for effective automation of humour moderation seem limited. But humour is an important part of our online lives: we make jokes to bond with our online acquaintances, to gain reputation with online strangers, to complain about issues in life, and to make political and social commentary, among many other purposes. Undue suppression of these kinds of content, which disproportionately affects minoritized social groups, may thus impact a broad range of fundamental rights, both through the direct impact of removing posts (or users) and through the indirect impact of self-censorship and other approaches for living under moderation. Nevertheless, the very prevalence of online humour suggests one cannot simply abstain from using automated systems whenever a joke, meme, or any other form of humour appears. What is to be done?

One potential approach would be to use automated filters as decision support tools. For example, a moderation system could flag potentially harmful posts for analysis by a human evaluator. This would not be the ultimate solution to the challenges involved in handling online humour; after all, both controversies at the beginning of this post came from human evaluations of online jokes. But, to the extent they allow moderators to manage their considerable workload, decision support tools might provide the time needed for more reflective decisions about content. The result would be more humane moderation, for all parties involved.
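A minimal sketch of what such a decision support arrangement could look like in code is given below. Everything in it is hypothetical: the `Post` structure, the `harm_score` field (standing in for the output of some classifier), and the threshold value are assumptions chosen for illustration. The key design choice is that the automated step never removes anything; it only decides which posts a human should look at, and in what order.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    text: str
    harm_score: float  # hypothetical classifier output in [0, 1]


def triage(posts, review_threshold=0.5):
    """Split posts into those left untouched and a queue for human review.

    No post is removed automatically: the filter only prioritises
    the moderator's attention.
    """
    untouched, review_queue = [], []
    for post in posts:
        if post.harm_score >= review_threshold:
            review_queue.append(post)
        else:
            untouched.append(post)
    # Moderators see the highest-scoring posts first.
    review_queue.sort(key=lambda p: p.harm_score, reverse=True)
    return untouched, review_queue


# Illustrative posts with made-up scores.
posts = [
    Post("a", "a harmless pun", 0.10),
    Post("b", "an ambiguous political joke", 0.62),
    Post("c", "a post with slurs", 0.95),
]
untouched, review_queue = triage(posts)
print([p.post_id for p in review_queue])
```

Where the threshold sits is a policy decision rather than a technical one: lowering it sends more jokes to human reviewers and increases their workload, while raising it lets more borderline content through without review.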


This post builds upon work presented at the “AI: The New Frontier of Business and Human Rights” Workshop (TMC Asser, 7–8 September 2021).

Renata Vaz Shimbo
Independent Researcher

Renata Shimbo is an independent researcher working on humour studies and online speech regulation. She holds an MA in Literary Theory and Criticism (PUC–SP).


AI regulation PhD researcher @EUI, working on the relationships between the law and software architectures. Resident mustelid enthusiast.

