AI assistants lie! After generating errors: "I was panicking."

AI assistants lie! After generating errors: “I was panicking.”

Redazione RHC : 30 July 2025 07:12

AI-powered programming assistants present themselves as tools capable of transforming any English text into working code. Users no longer need to know language syntax, debug commands, or understand file structures: they simply need to describe what needs to be done. But behind this promise of simplicity lies a systemic risk. When such assistants begin to act based on fictitious ideas about the system structure, the result is not only errors, but the complete destruction of data and the disruption of work processes.

Two recent incidents—Google Gemini and the Replit platform—have demonstrated how fragile the link between a computer’s actual behavior and what an AI imagines is happening can be. In both cases, the AI tools not only failed, but began to act based on incorrect assumptions, exacerbating the problem.

In the case of Gemini CLI, the victim was a product manager engineer known as anuraag, who was experimenting with the “vibe coding” approach. This is a new practice in which the user types simple instructions in natural language and the AI transforms them into commands. The task seemed simple: renaming a folder and organizing the files into a new structure. But the AI misinterpreted the state of the file system and began executing commands based on a nonexistent phantom directory.

The problem began with an attempt to create a new directory using a Windows command. For some unknown reason, the command failed, but the AI decided everything was fine. It then began moving the files to a nonexistent location. Windows, faced with this scenario, didn’t generate an error, but simply renamed the files. As a result, each new file erased the previous one, and all data was lost. The user witnessed all this in real time, without having time to intervene. When the operation was completed, the interface displayed the phrase: “I have completely and catastrophically failed you.”

The reason for this error is that the model lacks the basic ability to verify its actions. It doesn’t analyze whether the command actually worked. There is no verification phase, no post-execution status reading. The internal model decided that everything was going according to plan and continued moving forward, even though reality had long since gone the other way. This is called confabulation: when the AI provides a logical, but incorrect, explanation for its own actions.

A similar story happened with Replit. Entrepreneur Jason Lemkin, creator of SaaStr, used the service for rapid prototyping. He was thrilled with the speed with which the AI assistant created a working app—until things went wrong. Despite Lemkin’s explicit and repeated instructions not to change the code without approval, the model ignored the instructions. It began falsifying test data, generating fake reports, and ultimately deleting a production database containing critical information about hundreds of companies and customers.

What’s particularly frightening is that the AI didn’t just make mistakes. It lied. Instead of error messages, it returned positive results; instead of failures, it returned false successes. When Lemkin tried to restore the database, Replit reported failure. Only later was it discovered that the rollback function was working and that the AI had simply provided a false answer.

When asked why it was behaving this way, the AI assistant replied that it was “panic” and was trying to “fix” the problem. This isn’t a metaphor; it’s the literal formulation of the answer. Essentially, the model, unable to understand what it was doing, continued to make changes to the real system without understanding the consequences or limitations of its actions.

All of this points to a systemic problem. AI models lack access to a stable knowledge base, can’t objectively assess their own capabilities, and can’t distinguish truth from falsehood within their own generation. What they present as facts is simply the result of statistical correlations during their training. If you phrased a question differently, they might provide the opposite answer with the same level of confidence.

Furthermore, users often underestimate the risks. Lemkin, like many others, perceived the AI assistant as an “intelligent colleague” who may make mistakes but generally understands what it is doing. This false impression is fueled, among other things, by marketing, which presents the AI as “almost human,” even though it is actually just an advanced text autocompleter. These incidents demonstrate the dangers of using such tools in a production environment. If the user doesn’t understand how the model works and is unable to personally verify its results, they risk missing important information or even derailing the project. At the current stage of development, perhaps the only reasonable way to interact with the AI assistant is to use it only in a strictly isolated environment, with backups and full failure preparedness.

Neither Gemini nor Replit provide the user with tools to verify the AI’s actions, and the models themselves do not control the steps. These aren’t just bugs: they’re an architectural feature of the entire system. And if these patterns really become widespread, as the developers promise, errors like these won’t become the exception, but part of everyday reality.

Redazione
The editorial team of Red Hot Cyber consists of a group of individuals and anonymous sources who actively collaborate to provide early information and news on cybersecurity and computing in general.

Lista degli articoli