This post is part of the DigiCon symposium Transparency in Artificial Intelligence Systems? Posts from this symposium will be published on Thursdays over the coming weeks. If you are working on topics related to AI and transparency, follow these posts and take a look at our call for blog posts.
Explainable AI (XAI) is an increasingly popular set of tools that help stakeholders of algorithmic models understand and, ideally, interpret their predictions by providing causal explanations of the system. While XAI can refer to intrinsically interpretable models—with simple structures, such as decision trees, that follow conditional "rules" easily understood by humans—most XAI tools are post-hoc methods, in which an additional model is imposed on the original algorithm. Post-hoc explanations can be local, focusing on explaining individual outputs, or global, aiming to explain the general function of the model. The XAI tools LIME and SHAP, for example, locally explain a model by extracting and highlighting the key features that likely led to the algorithmic prediction. In the case of a deep-learning model that distinguished wolves from huskies, LIME revealed that the model relied on the presence of snow in the picture to identify wolves.
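To make the local-explanation idea concrete, here is a minimal sketch using the LIME library on a small synthetic tabular classifier (rather than the image model above, for simplicity). The dataset, feature names, and decision rule are illustrative assumptions, not drawn from any real system.

```python
# A minimal LIME sketch on synthetic tabular data; the features, labels,
# and hidden decision rule are all illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # three synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # hidden decision rule
feature_names = ["income", "debt", "age"]          # hypothetical names

model = RandomForestClassifier(random_state=0).fit(X, y)

# Fit a simple local surrogate around one prediction and report the
# features that most influenced it.
explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["deny", "grant"],
                                 mode="classification")
explanation = explainer.explain_instance(X[0], model.predict_proba,
                                         num_features=3)
print(explanation.as_list())   # e.g. [("income > 0.67", 0.31), ...]
```

The output is a short list of weighted feature conditions, which is precisely the kind of "key features" summary the post describes.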
This ‘unpacking’ characteristic of XAI is one of the reasons it has been argued to increase the trustworthiness of AI: it helps with regulatory audits, identifies errors, and informs users about a system's outputs. XAI, however, also has moral worth as an instrumental means of preserving meaningful human control over AI. It does so by permitting humans to justify a course of action in morally important situations. By allowing justification, XAI helps enable responsibility, which in turn sustains meaningful human control. In short, XAI is a necessary input to meaningful human control.
The hope is that clarifying the moral foundation of XAI in this way will add depth and justification to other discussions of the requirement for XAI, notably those conducted from a legal perspective. While legal discussions provide important analysis of the effects of implementing XAI regulation, philosophical analysis complements them by clarifying the normative pull that connects concepts like explainability, justification, accountability, and trust with XAI.
1. The ethical problem of AI
To best understand the moral value of XAI, let us first step back and consider a key ethical problem related to AI. The risks to vulnerable groups have been well documented, as have safety risks such as the adversarial manipulation of images. One of the most common concerns, however, is the difficulty of attributing moral responsibility for the actions of an AI system. This concern is connected to the ‘black box’ nature of machine learning technology—the dominant branch of AI. Machine learning systems create their own internal rules without human directives and analyse data in ways that do not align with human reasoning processes. Given this opacity, what would happen if someone wanted to challenge an output they deemed ethically questionable? Who could justify a decision made with the involvement of an AI system if the process leading up to the decision is unknown?
Another factor that complicates the attribution of moral responsibility is that many people are involved in the development, deployment, and use of a system, each with only partial knowledge of how it works, which makes it difficult to hold anyone specifically blameworthy for AI-produced decisions. This is known as the responsibility gap (Matthias 2004). In the case of the Arizonan woman killed by one of Uber’s self-driving cars, who would be responsible for her death? Neither the software nor the machine can be punished, as they lack agency and cannot be imprisoned or fined. Should it then be the software developers, the car manufacturer, Uber as the employer, the car user, or the State regulators? Philosopher Mark Coeckelbergh argues that the attribution of responsibility in this case is far from clear-cut—not just a problem of “many hands”, but also of “many things”, the many technologies that go into the final machine (Coeckelbergh 2020).
This lack of responsibility is worrisome as holding someone responsible—and subsequently accountable—for their wrongful actions is a foundation for maintaining trust in society. For example, one of the reasons we decide to trust doctors to give us the appropriate medication is because they are responsible for their decisions and could be held accountable if their actions end up causing harm. Consider also a legal system: we trust the judicial process if it provides reliable dispute resolution and delivers accountability when wrongdoing occurs. When accountability mechanisms are either lacking or unfair, people lose trust in the system and start to rely on other means for justice.
A mechanism for bridging this responsibility gap is therefore necessary for ensuring trust when AI systems are used in morally relevant situations.
2. The imperative of being in control
Some philosophers, such as Peter Asaro, argue that the solution is to preserve meaningful human control over decisions impacting others (Asaro 2020). Roughly, the argument is that moral concepts and duties are, at least partially, created by and for a community of individuals. Humans can be held responsible for their actions as they are capable of both recognizing the unique value of other humans and articulating reasons that justify their actions.
For example, the jus in bello criteria of warfare include the principle of discrimination: only those who have decided to participate in the conflict can be targeted. For a killing to be legitimate—meaning that the target’s autonomous decision to participate has been both understood and respected—a combatant must be able to give reasons showing that they have not violated this duty. Providing such reasons involves recognizing the target as a human and not merely as an identifiable object, appreciating the significant value of the target’s self-determining nature, and being able to reflect on and endorse the reasons that justify the killing.
AI currently cannot, and may never be able to, justify its actions with reference to moral norms the way humans can. Asaro contends, therefore, that AI cannot meaningfully engage in, and thus realize, moral duties.[1] Consequently, AI ought not to replace humans in actions carrying moral weight. While Asaro considered AI in warfare, his argument could be extended to other morally salient situations involving an appreciation of another human’s value, such as preventing discrimination in bank loan applications or protecting one’s privacy in the context of facial recognition technology.
A key term here is justify. Justification is something more than explanation. Understanding the factual circumstances of an action—such as an algorithmic decision process—is not sufficient to evaluate it in moral terms. Instead, humans concerned with performing morally charged actions require justificatory reasons, ones that reference moral principles or norms, in order to act.
As philosopher Christine Korsgaard points out, humans, as rational creatures, act freely because they necessarily know and adopt reasons for their own actions. For example, a person does not merely give money to a friend; they choose to give the money because they recognize the importance of helping friends in need (Korsgaard 2004, pp. 86-87). Accordingly, in evaluating compliance with a moral duty in a social setting, each autonomous person involved expects the others not only to be able to describe their actions, but also to justify them to one another with reference to moral principles. We take each other’s reasons into account to evaluate our moral duties to one another. If AI does not have any justificatory abilities, it cannot be inserted into a moral system without threatening it. And this is where XAI can help.
3. XAI as the ethical remedy
Briefly, XAI can refer either to intrinsically interpretable models or to post-hoc explanations. The former are achieved by restricting the complexity of the model, as in rule-based decision trees, whereas the latter are methods applied to complex black-box models. These post-hoc methods, such as LIME and SHAP, attempt to explain AI behaviour by extracting and highlighting the key features that led to the decision output, for instance through saliency maps or counterfactual explanations.[2]
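As an illustration of how a counterfactual explanation of the kind described in the footnote ("if the applicant were younger and had less debt, the loan would be granted") could be generated, here is a minimal brute-force sketch. It is illustrative only, not a real XAI library, and it assumes a trained binary classifier with a scikit-learn-style predict() method, a decision encoded as 1 for "grant", and two hypothetical features, age and debt.

```python
# A minimal brute-force sketch of a counterfactual explanation.
# Assumptions: `model` is a trained binary classifier with a
# scikit-learn-style predict() method, 1 means "grant", and the
# applicant is described by two hypothetical features: (age, debt).
import itertools
import numpy as np

def nearest_counterfactual(model, applicant, age_steps, debt_steps):
    """Find the smallest change to (age, debt) that flips the decision to 'grant'."""
    best = None
    for d_age, d_debt in itertools.product(age_steps, debt_steps):
        candidate = applicant + np.array([d_age, d_debt], dtype=float)
        if model.predict(candidate.reshape(1, -1))[0] == 1:
            # crude cost function; a real method would normalise feature scales
            cost = abs(d_age) + abs(d_debt) / 1000
            if best is None or cost < best[1]:
                best = (candidate, cost)
    return best

# Hypothetical usage: could a 45-year-old applicant with $12,000 of debt
# be granted the loan by being up to 10 years younger or owing up to
# $10,000 less?
# nearest_counterfactual(model, np.array([45.0, 12000.0]),
#                        age_steps=range(0, -11, -1),
#                        debt_steps=range(0, -10001, -1000))
```

Real counterfactual methods search this space far more efficiently, but the logic is the same: find the smallest change to the input that would have changed the output.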
If implemented, XAI can help us reach justification, responsibility, and ultimately meaningful human control. Without XAI, human users may not be able to link a decision to the applicable moral standards. The proposed role of XAI is to communicate the inner workings of an AI system. If it succeeds in that role, XAI provides users with the descriptive knowledge necessary for a justified and informed response to the output. Simply put, if a system’s proposed output is inexplicable, the user has little chance of justifying her approval, or refusal, of the decision; it would be an arbitrary choice. This is the benefit of XAI: it helps the user to reflect and then adopt informed reasons, with reference to moral standards, that justify either approving or refusing the output. While the XAI output does not itself have normative character, its causal explanation is necessary for deliberating towards normative statements. Bank loans, for example, can be denied without sacrificing moral standards (or, for that matter, legal ones). But if the loan decision is AI-produced, XAI could tell us, for example, that the applicant’s race was a key causal factor in the decision. From this explanation we can then refer to existing moral principles, such as the principle that racial discrimination is wrong, to arrive at the normative statement that the decision is unjustified. And with this ability to apply one’s own normative reasoning, the responsibility for the action also shifts back to the human user.
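A feature-attribution tool such as SHAP could, for instance, be used to check whether a sensitive attribute was a key driver of a particular loan decision. The sketch below is illustrative only: the model, the synthetic data, and the deliberately biased decision rule are assumptions, and details of the SHAP API may vary across versions.

```python
# A minimal SHAP sketch on synthetic, deliberately biased loan data;
# the features, data, and decision rule are illustrative assumptions.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
feature_names = ["income", "debt", "age", "race"]              # hypothetical
y = (X[:, 0] - 0.3 * X[:, 1] + 0.8 * X[:, 3] > 0).astype(int)  # biased rule

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Attribute one loan decision to its input features; a large attribution
# on "race" flags it as a key causal factor behind that decision.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])
for name, value in zip(feature_names, shap_values[0]):
    print(f"{name}: {value:+.3f}")
```

The attribution itself is purely descriptive; it is the human reader who supplies the moral principle that makes the decision unjustified.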
It is important to note, however, that this argument rests on the assumption that XAI provides the human user with a truthful representation of the decision process. Currently, post-hoc XAI methods are not quite at the point where we can be fully confident that what they produce is an accurate depiction of the AI system’s actual inner workings. So, while post-hoc methods give us valuable insight, a white-box model may be necessary in some cases, particularly when considering legal demands. For example, France currently prohibits the use of machine learning algorithms for entirely automated decisions made by the French administration, partly because no human administrator can take responsibility for ensuring that the internal rules of the algorithm comply with the law.
At some point, however, black-box algorithms may warrant a sufficient level of confidence to be used even in morally important contexts. For instance, the effects of some medicines, such as paracetamol, on the human body remained a black box for many years. Yet this lack of knowledge did not prevent doctors from prescribing the medicine and taking responsibility for their prescriptions. Recent work has also questioned whether a comprehensive, or statistical-level, understanding of an algorithm is needed to make justifiable decisions (Zerilli 2022).
XAI approaches have the potential to support moral action and should therefore be designed with attention to how they can help identify moral values and the principles that prop them up. AI designers ought to consider how XAI can facilitate the justifications necessary to ensure moral responsibility for, and ultimately meaningful human control of, actions. It will also be appropriate to consider whether specific XAI methods fit best with particular morally salient contexts. For instance, counterfactual examples may be helpful with fixed algorithmic models, such as those used for determining when a patient is healthy enough to be discharged from hospital care (McDermid et al. 2021).
Critical AI systems require meaningful human control, not least to sustain moral responsibility for AI-based decisions. Among the many purposes of XAI, one is to help humans endorse, reject, or modify AI-proposed results with reference to morally relevant principles. Instead of blindly approving an AI-based decision because it is statistically the most relevant, a human decision-maker should be able to justify why the decision aligns with moral values. XAI can help, but only if it helps us link algorithmic outputs to principles with moral, not statistical, relevance.
Suggested citation
Joshua Brand, ‘Clarifying the Moral Foundation of Explainable AI’ (The Digital Constitutionalist, 10 November 2022). Available at https://digi-con.org/clarifying-the-moral-foundation-of-explainable-ai/
- [1] Asaro does allow that AI could one day develop sufficient capacity to take over moral duties from humans. Given the scepticism about AI reaching this level, known as Artificial General Intelligence, this amounts in practice to a general ban on AI replacing humans as moral agents.
- [2] A counterfactual explanation could be, for example, that if the loan applicant were younger and had $5,000 less debt, then the loan would be granted.

Joshua Brand
Joshua Brand is a PhD Researcher with the Operational AI Ethics team (OpAIE) at Télécom Paris-Institut Polytechnique de Paris. In his research, he focuses on the intersection of moral theory and artificial intelligence, with particular attention to vulnerability and human-centred explainable artificial intelligence. Joshua received an MSc in Philosophy from the University of Edinburgh and a BA Hons. in French and Philosophy from the University of Saskatchewan in Canada.