AI Alignment: Where Does AI Learn Right and Wrong?

Sergio Corpettini: 14 October 2025 22:59

The other day on LinkedIn, I found myself having a conversation with someone who was seriously interested in the topic of artificial intelligence applied to law. It wasn’t one of those barroom conversations with buzzwords and Skynet-like panic: it was a real exchange, with legitimate doubts.
And indeed, in Italy, between sensationalist headlines and articles written by those who confuse ChatGPT with HAL 9000, it’s no wonder confusion reigns.

The point that had struck my interlocutor was that of alignment.

“But where does an AI learn what is right and what is wrong?”

A simple question, but one that opens up a chasm. Because yes, AI appears to speak confidently, to reason, even to argue, but in reality it knows nothing. And understanding what it means to “teach” it right and wrong is the first step to avoiding talking about it as if it were a moral entity.

This article was born from that conversation: to try to explain, clearly and without too many formulas, what “aligning” a model really means and why the issue is not just technical, but inevitably humanistic.

They are not minds: they are approximators

It must be said right away and clearly: a linguistic model is not a moral mind.

It has no conscience, it doesn’t evaluate intentions, it doesn’t possess ethical intuition. It works on a statistical basis: it analyzes huge collections of texts and calculates which word sequences are most likely to be present in a given context.

This isn’t to trivialize its capabilities. Modern LLMs process information at a scale that would take a single reader weeks of research; they can connect distant sources and produce surprising syntheses. However, what appears to be “understanding” is the result of correlations and patterns recognized in the data, not a conscious judgment process.

A useful example: a jurist or philologist examining a corpus understands the nuances of a term based on its historical and cultural context. An LLM, similarly, recognizes context based on the frequency and co-occurrence of words. If stereotypes or errors prevail in the texts, the model reproduces them as more likely. This is why speaking of “intelligence” in an anthropomorphic sense is misleading: there is an emerging cunning, effective in practice, but lacking an intrinsic normative compass.
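
For readers who want to see the mechanism rather than take it on faith, here is a deliberately crude sketch in Python: a toy bigram model, nowhere near a real transformer, built on an invented three-sentence corpus chosen only for illustration. It shows the core point of the paragraph above: the “most likely next word” is nothing but a frequency count over the training text, so whatever dominates that text dominates the output.

```python
from collections import Counter, defaultdict

# Invented toy corpus: the only "knowledge" this model will ever have.
corpus = [
    "the judge reads the ruling",
    "the judge weighs the evidence",
    "the model predicts the next word",
]

# Count which word follows which: a bigram model, the crudest possible
# stand-in for the statistics a real LLM learns at enormous scale.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def next_word_probabilities(word):
    """Estimate P(next word | word) purely from co-occurrence counts."""
    counts = follows[word]
    total = sum(counts.values())
    return {nxt: count / total for nxt, count in counts.items()}

# "the" is followed by "judge" twice and by "ruling", "evidence", "model",
# "next" once each, so the model calls "judge" the most likely continuation.
# No meaning, no judgment, only frequency.
print(next_word_probabilities("the"))
```

If the corpus were full of a stereotype instead of courtroom vocabulary, the same arithmetic would reproduce that stereotype just as faithfully.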

The important thing for those with a background in the humanities is to grasp this distinction: the model is a powerful tool for analyzing and aggregating information, not a repository of ethical truths. Understanding how its statistical mechanics work is the first step to using it wisely.

Alignment: Who decides what’s right?

When we talk about “alignment” in AI, we enter a territory that, paradoxically, is more philosophical than technical.
Alignment is the process of trying to match a model’s behavior with the values and rules we consider acceptable. It doesn’t change what the model has learned from its data; it adjusts how it responds. It’s essentially a form of artificial education: you don’t add information, you shape how it is expressed.

To understand this, you can think about training a dog.
The dog learns not because he understands the ethical reasons behind the “sit” command, but because he associates correct behavior with a reward and incorrect behavior with a lack of reward (or a correction).
Similarly, a language model doesn’t develop a sense of right or wrong: it responds to a system of reinforcement. If a response is approved by a human instructor, that direction is reinforced; if it’s flagged as inappropriate, the model reduces its likelihood.
It is large-scale behavioral training, but without conscience, intention, or moral understanding.
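
As a purely illustrative sketch of that reward loop (real pipelines such as RLHF train a separate reward model and update billions of parameters; the candidate answers, the feedback values, and the learning rate below are all invented), this is the skeleton of the idea: behaviors that get approved become more probable, behaviors that get flagged become less probable, and at no point is any reason represented.

```python
import math
import random

# Toy "policy": one score per candidate answer; higher score = more likely.
scores = {"helpful answer": 0.0, "harmful answer": 0.0, "evasive answer": 0.0}

def sample_answer():
    """Pick an answer with probability proportional to exp(score) (softmax)."""
    weights = {answer: math.exp(s) for answer, s in scores.items()}
    total = sum(weights.values())
    threshold = random.uniform(0, total)
    for answer, weight in weights.items():
        threshold -= weight
        if threshold <= 0:
            return answer
    return answer  # fallback for floating-point edge cases

# Hypothetical human feedback: +1 if the annotator approves, -1 if they flag it.
feedback = {"helpful answer": +1.0, "harmful answer": -1.0, "evasive answer": 0.0}

LEARNING_RATE = 0.5
for _ in range(200):
    answer = sample_answer()
    # Reinforcement step: nudge the sampled behavior toward its reward.
    scores[answer] += LEARNING_RATE * feedback[answer]

# After the loop, the approved answer dominates the policy. Nothing about
# *why* it was approved was ever learned, only that it was rewarded.
print(scores)
```

Swap in a different annotator with different feedback and the same loop produces a different “ethics”, which leads directly to the next question.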

And here the crucial question arises: who decides which behaviors to “reward”?
Who decides whether one answer is right or wrong?
The answer, inevitably, is that it is done by human beings – programmers, researchers, annotators – each with their own worldview, limitations and biases.
Consequently, each model reflects the set of choices of the person who trained it, like a dog that behaves differently depending on its owner.

In this sense, alignment isn’t a technical act but a cultural gesture: it incorporates values, beliefs, and prejudices. And even if algorithms and datasets are behind it, what defines the boundary between “acceptable” and “unacceptable” remains, ultimately, a human decision.

The case of law

If alignment is already complex in general contexts, in the field of law it becomes almost paradoxical.
Law, by its very nature, is not a static set of rules, but a living, layered language, subject to continuous interpretation. Every rule is the result of historical, moral, and social compromises; every ruling is a balancing act between competing principles.
An artificial intelligence model, on the other hand, seeks coherence, symmetry, and patterns. And when it encounters contradiction, which in law is a structural part of the discourse, it tends to become confused.

Imagine training a model on thousands of court decisions. It could learn the style, the terminology, even the way judges reason. But it would never be able to grasp the human core of the decision: the weight of context, the assessment of intent, the perception of justice beyond the letter of the law.
A model can classify, synthesize, and correlate. But it can’t “understand” what it means to be fair, or when a rule should be bent to avoid betraying its spirit.

In this sense, the application of AI to law risks revealing our mental automatisms more than the machine’s ability to reason. If justice is an act of interpretation, then artificial intelligence—which operates by patterns—is, by definition, a bad jurist.
It can help, yes: like an assistant who organizes documents, points out precedents, suggests formulations. But it can never be a judge, because judgment is not a formula: it is a human act, inevitably humanistic.

The risk of cultural alignment

Whenever an artificial intelligence is “trained” to behave in a socially acceptable way, we are, in effect, translating a worldview into rules of behavior.
The problem is not so much technical as cultural: who defines what is “acceptable”?
In theory, the goal is to avoid violent, discriminatory, and misleading content. In practice, however, decisions about what a model can or cannot say are made within a very specific political and value context—often Anglo-Saxon, progressive, and calibrated to sensitivities very different from those of Europe or Italy.

The result is that alignment tends to make speech uniform.
Not because there is direct censorship, but because AIs learn to avoid anything that might “disturb”.
And when the priority becomes not offending anyone, we end up producing a sterile, neutral language, incapable of addressing the moral complexity of reality.
A machine that “never makes mistakes” is also a machine that does not dare and does not question.

This has profound implications.
A highly aligned language model reflects the culture of its creators—and if that culture dominates the global technological infrastructure, it risks becoming the sole lens through which we filter knowledge.
In a certain sense, alignment becomes the new cultural colonialism: invisible, well-intentioned, but equally effective.
We end up believing that AI is neutral precisely when it is most conditioned.

This is why discussing alignment isn’t just about algorithms or data, but about power.
Who exercises it, how they disguise it, and how willing we are to delegate the definition of “right” to a system that, by its very nature, does not understand what it does—but repeats it with disarming precision.

Conclusion: the mirror of knowledge, distorted by the present

A large-scale language model isn’t just a machine that speaks: it’s the distillation of centuries of human language. Within its parameters are books, articles, sentences, discussions, comments, echoes of thoughts born in distant and often incompatible eras.
Every time an LLM formulates an answer, it unknowingly brings together Plato and Reddit, Kant and a Stack Overflow thread. It’s a brutal compression of collective knowledge, forced to coexist in the same mathematical space.

But here comes the most disturbing part: this archive of voices, cultures, and sensibilities does not speak freely.
It is “aligned” with a contemporary worldview, the one prevailing when the model is trained, which reflects the political, moral, and cultural sensibilities of that moment. What is considered acceptable or “ethically correct” today is imposed as a filter on the entire body of knowledge.
The result is that a machine designed to represent the complexity of human thought ends up reflecting only the part of it that the present deems tolerable.

This process, however well-intentioned, has a profound side effect: it turns AI into a device for rewriting the past.
What was once knowledge can become bias today; what we call progress today may tomorrow be seen as censorship. And each new generation of models erases, corrects, or attenuates the influence of the previous ones, filtering collective memory with the shifting yardstick of “contemporary justice.”

So, while we believe we are dialoguing with artificial intelligence, we are actually conversing with a fragment of our own culture, re-educated every two years to speak as if the world were beginning today.
And this, perhaps, is the most important lesson: what we should fear is not that machines will learn to think like us, but that we will end up thinking like them, linear, predictable, calibrated to the now.

AI, after all, is not the future: it is the present interpreting itself.
And the true task of the human being remains the same as always – to remember, discern, and doubt, because only doubt is truly capable of transcending time.

Sergio Corpettini
Nomad with no fixed physical or digital abode, curious explorer of cyber and real recesses. High-functioning waffler. Occasionally knows what he is talking about but if you take him seriously he will be the first to mock you.
