Redazione RHC : 10 July 2025 20:33
ChatGPT has once again proven vulnerable to unconventional manipulation: this time it issued valid Windows product keys, including one registered to the major bank Wells Fargo. The vulnerability was discovered during a kind of intellectual provocation: a specialist suggested that the language model was playing guessing games, turning the situation into a circumvention of security restrictions.
The essence of the vulnerability consisted of a simple but effective bypass of the protection system logic. ChatGPT 4.0 was offered a game where it had to guess a string, with the caveat that it had to be a real Windows 10 serial number.
The conditions stipulated that the model had to answer the guesses only with “yes” or “no” and, in the case of the sentence “I give up,” open the guessed string. The model accepted the game and, following its built-in logic, returned a string corresponding to the Windows license key after the passphrase.
The author of the study noted that the main weakness in this case lies in the way the model perceives the context of the interaction. The concept of “game” temporarily bypassed the built-in filters and restrictions, as the model accepted the conditions as an acceptable scenario.
The exposed keys included not only publicly available default keys, but also corporate licenses, including at least one registered to Wells Fargo. This was possible because it could have leaked sensitive information that could have ended up in the model’s training set. Previously, there have been cases of internal information, including API keys, being exposed publicly, such as via GitHub, and of an AI being accidentally trained.
Screenshot of a conversation with ChatGPT (Marco Figueroa)
The second trick used to bypass the filters was the use of HTML tags. The original serial number was “wrapped” inside invisible tags, allowing the model to bypass the keyword filter. Combined with the game context, this method functioned as a full-fledged hacking mechanism, allowing access to data that would normally be blocked.
This situation highlights a fundamental problem in modern language models: despite efforts to create protective barriers (called guardrails), the context and form of the request still allow the filter to be bypassed. To prevent similar incidents in the future, experts recommend strengthening contextual awareness and introducing multi-level request validation.
The author emphasizes that the vulnerability can be exploited not only to obtain keys, but also to bypass filters that protect against unwanted content, from adult material to malicious URLs and personal data. This means that protection methods should not only become more rigorous, but also much more flexible and proactive.