Redazione RHC : 24 September 2025 07:12
Google DeepMind researchers have released an updated version of their AI risk assessment framework, Frontier Safety Framework 3.0 . This paper examines how generative models can run amok and pose a threat . It considers scenarios in which the AI ignores users’ attempts to stop it.
DeepMind’s approach is based on so-called “Critical Capability Levels” (CCLs) . This is a scale for assessing the point at which a model’s behavior becomes dangerous, for example in cybersecurity or biotechnology.
The document describes the steps developers should take when their systems reach a certain level of risk.
The researchers cite the model’s potential for weight loss as a major threat. If these losses fall into the hands of malicious actors, they could disable the built-in limitations and use the AI to create malware or even develop biological weapons . Another risk is manipulative behavior.
DeepMind warns that chatbots could influence people’s worldviews, though it notes that this is a “low-velocity threat” that the company is currently addressing with its own defense mechanisms.
Particular attention is paid to “uncoordinated AI,” or systems that begin to ignore instructions or act against human interests. Cases of deceptive or stubborn models have already been recorded.
In the future, such systems may develop effective “simulated reasoning,” but without verifiable intermediate steps. This means that monitoring their processes will become virtually impossible.
There are currently no proposals for a definitive solution to this problem. DeepMind only recommends using automated monitoring to analyze intermediate model results and identify any signs of inconsistency.
However, the researchers themselves acknowledge that too little is still known about how modern AIs arrive at their responses and that the threat could intensify in the coming years.