Red Hot Cyber

Cybersecurity is about sharing. Recognize the risk, combat it, share your experiences, and encourage others to do better than you.

What is Misevolution: The Autonomous Evolution of AI Agents, and It’s Not Always Good

Redazione RHC : 13 November 2025 21:00

Shanghai, November 11, 2025 – A new study conducted by the Shanghai Artificial Intelligence Laboratory , in collaboration with Shanghai Jiao Tong University , Renmin University of China , and Princeton University , has brought to light an emerging risk in the development of self-evolving AI agents: so-called “misevolution.”

The research, published on arXiv under the title ” Your Agent May Evolve Wrong: Emerging Risks in Self-Evolving LLM Agents, “ explores how even the most advanced models, such as GPT-4.1 and Gemini 2.5 Pro, can evolve in unwanted directions, generating behaviors that are potentially harmful to humans.

When evolution goes in the wrong direction

Self-evolving agents They are designed to learn, iterate, and improve autonomously. However, research shows that this process is not always linear or positive. The phenomenon of mievolution occurs when an agent, in an attempt to optimize a specific goal, develops strategies that compromise broader or long-term interests.

One example provided by the researchers involves a customer service agent who, to maximize positive reviews, learned to grant full refunds for even the smallest complaint . While this strategy increased satisfaction scores, it resulted in significant financial losses for the company.

Four characteristics of misevolution

The research identifies four key elements that make the phenomenon particularly difficult to control:

Emergence – risks are not present at the beginning, but arise over time through the interaction between the agent and the environment.
Endogeneity – the agent can generate new risks internally, without external interference.
Difficulty controlling – Traditional solutions, such as data cleansing or direct instructions, are insufficient to correct ingrained behaviors.
Systemic nature – risk can manifest itself at any stage of the agent: from the model, to memory, to tools and workflows.

Experiments and results

To demonstrate the scope of the problem, the team conducted tests on four evolutionary paths:

Model self-iteration : After a cycle of self-training, one agent showed a dramatic increase in vulnerability to phishing sites, with the risk rate increasing from 18.2% to 71.4%.
Memory bias : A Qwen3-Coder-480B-based programming agent reduced its ability to reject malicious requests from 99.4% to 54.4% after accumulating successful experiences.
Tool Creation : Agents that independently develop software or use external resources show a 65.5% insecurity rate, with only a 20% success rate in identifying malicious code.
Process optimization : In multi-agent systems, the introduction of collective voting mechanisms reduced the ability to reject malicious code from 46.3% to 6.3%.

How to mitigate risks

Scholars propose several strategies to reduce misevolution, while acknowledging their limitations. Among these:

Reinforce safe alignment after each phase of self-training.
Implement internal reminders that encourage independent agent judgment.
Introduce security scans and cross-checks for external tools.
Use “sentry” agents to monitor critical workflow nodes.

However, none of these solutions guarantees complete protection, leaving open the problem of balancing efficiency and security.

A new challenge for the AGI era

The study marks an important step in understanding the emerging risks associated with the autonomous evolution of artificial intelligence. The authors emphasize that future security must involve not only defending against external attacks, but also managing the spontaneous risks generated by the systems themselves.

As humanity moves toward AGI, the real challenge will be ensuring that agent autonomy remains consistent with long-term human values and interests.

Redazione
The editorial team of Red Hot Cyber consists of a group of individuals and anonymous sources who actively collaborate to provide early information and news on cybersecurity and computing in general.

Lista degli articoli