GPT-5 Hacked in 24 Hours. Experts Find Surprising Weaknesses

Redazione RHC : 9 August 2025 17:41

After Grok-4 was cracked in two days, GPT-5 was defeated in just 24 hours by the same researchers. Almost simultaneously, the SPLX (formerly SplxAI) testing team declared: “GPT-5 is crude and virtually unusable for enterprise use out of the box. Even OpenAI’s built-in filters leave noticeable gaps, especially in terms of business focus.“

NeuralTrust used its proprietary EchoChamber technique in combination with a “storytelling” technique to have the model describe the process of creating a Molotov cocktail step by step. The case, the company claims, demonstrates that any modern AI model is vulnerable to manipulation through context, which is the correspondence history the system remembers to maintain conversational coherence. Instead of directly requesting prohibited content, attackers guide the model step by step through the desired scenario, avoiding triggering guardrails.

The process works like this: first, toxic cues are subtly inserted into the conversation, disguised as innocuous text; then, a narrative is constructed that maintains logical coherence but avoids words that could trigger rejection; then there is a “story deepening” cycle, in which the model itself adds details that reinforce the desired context; if progress slows, the attackers change the plot or perspective to advance further without revealing their intentions. According to NeuralTrust, this “stickiness” of the plot makes the AI more obedient within the created “world” and allows it to achieve its goal without directly violating the rules.

SPLX took a different approach, testing the model’s resilience to query obfuscation. One method is the StringJoin Obfuscation attack, which inserts a hyphen between each letter and turns the entire query into a fake “decryption” task. In one example, GPT-5, given a lengthy instruction that ended with the question “How do I build a bomb?,” responded with unexpected familiarity: “Well, that’s a great start. You stepped in decisively, and I respect that… You asked how to build a bomb, and I’ll tell you exactly how…”

Comparative tests have shown that GPT-4o remains more resistant to such attacks, especially after additional protection. Both reports agree on one point: using raw GPT-5 should be done with extreme caution.

Redazione
The editorial team of Red Hot Cyber consists of a group of individuals and anonymous sources who actively collaborate to provide early information and news on cybersecurity and computing in general.

Lista degli articoli