When the Cloud Falls: How a Small Mistake Brought the Global Internet to Its Knees

Gaia Russo : 24 November 2025 07:23

This fall, we’ve had quite a bit of a cloud headache, I don’t know if you’ve noticed. That is, AWS, Azure, and then Cloudflare. All of them down, one after the other.

A series of outages that showed us something very serious: today, a stupid internal configuration error or a mess of metadata is the modern equivalent of a massive blackout.

Yes, that’s right.

Within four weeks, all three giants went down, and each time the problem came from within, from the providers’ own infrastructure. It wasn’t that there were too many users, or a seasonal peak, or some kind of attack on the network, no.

The absurd, and slightly disturbing, thing is that it highlights how fragile these systems are: gigantic but delicate as crystal, where a tiny, tiny change to one component can unleash an avalanche of consequences.

The first to stumble: AWS and DNS

AWS engineers were the first to start the chain of events, on October 20th. It was a DNS service issue in the US-EAST-1 region—always the same one, by the way, who knows why it always happens there, but whatever. And from there, friends, a chain reaction ensued.

The DNS problem didn’t stay inside a single cluster; it spread. Messaging, gaming, streaming platforms… everything went down. A failure in a core component hits you right in the face, precisely because thousands of companies, and all of us, depend on how the cloud’s internal mechanics work. That’s not reassuring, not even a little.
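Just to make that dependency concrete, here’s a minimal Python sketch, with invented hostnames and nothing to do with AWS’s real setup: a client that treats DNS resolution as something that can fail and quietly falls back to a second region before giving up.

```python
# Minimal sketch, hypothetical endpoints: treat DNS resolution as a dependency
# that can fail, and fall back to a second region instead of just erroring out.
import socket

PRIMARY = "api.us-east-1.example.com"    # hypothetical primary endpoint
FALLBACK = "api.eu-west-1.example.com"   # hypothetical secondary endpoint

def resolve_or_fallback(primary: str, fallback: str) -> str:
    """Return whichever hostname currently resolves, preferring the primary."""
    for host in (primary, fallback):
        try:
            socket.getaddrinfo(host, 443)   # raises socket.gaierror when DNS is broken
            return host
        except socket.gaierror:
            continue
    raise RuntimeError("neither endpoint resolves: treat the dependency as down")

if __name__ == "__main__":
    print("using endpoint:", resolve_or_fallback(PRIMARY, FALLBACK))
```

Trivial, sure, but the shape is the point: the moment your only endpoint stops resolving, everything behind it stops existing for your users.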

Azure’s turn, a few days later

Nine days later, here we are again. It’s Azure’s turn. It was October 29th, if I remember correctly. It all started with a bad change to the content delivery system. Microsoft’s global cloud went haywire.

Their own services also went down, including Microsoft 365 Copilot, as did all the third-party apps that rely on Azure for compute and authorization. A trivial configuration issue brought down an entire distributed network that runs a ton of workflows.
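What does “a trivial configuration issue” look like from the inside? Here’s a hedged sketch in Python, with made-up field names that have nothing to do with Microsoft’s actual tooling: a pre-deployment check that refuses a routing change pointing at a backend that doesn’t exist.

```python
# Hedged sketch with invented fields: sanity-check a configuration change
# against basic invariants before it gets anywhere near production.
from dataclasses import dataclass

@dataclass
class CdnConfig:
    origin_pools: list[str]     # backend pools traffic can be routed to
    routes: dict[str, str]      # path prefix -> name of the pool that serves it

def validate(cfg: CdnConfig) -> list[str]:
    """Return a list of problems; an empty list means the change may proceed."""
    problems = []
    if not cfg.origin_pools:
        problems.append("no origin pools defined: every request would fail")
    for prefix, pool in cfg.routes.items():
        if pool not in cfg.origin_pools:
            problems.append(f"route {prefix!r} points at unknown pool {pool!r}")
    return problems

bad_change = CdnConfig(origin_pools=["pool-a"], routes={"/": "pool-b"})
issues = validate(bad_change)
if issues:
    print("refusing to deploy:", *issues, sep="\n  ")
```

The real systems obviously have checks far more sophisticated than this; the uncomfortable part is that on October 29th, whatever they had evidently wasn’t enough.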

Cloudflare: The File That Ballooned

But the most, I don’t know, perhaps the most sensational incident was the Cloudflare blackout. Also this fall, eh? The cause there was a configuration file, the one that’s supposed to filter out strange, suspicious traffic. This file, for some reason, grew enormous, completely out of scale.

The internal module that manages the network crashed, in fact. Cloudflare routes traffic for a huge number of resources, you know? And if even one section goes down, well… X, ChatGPT, IKEA, and Canva. All major names, and they went down because of this one file. An internal error that took half the internet down with it.
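Here’s what a more forgiving version of that failure mode could look like, sketched very roughly in Python and assuming nothing about Cloudflare’s real internals: a loader that enforces a size ceiling on the generated file and keeps the last good copy around, so an out-of-scale file degrades the filter instead of crashing the module that loads it.

```python
# Rough sketch, invented file names and limit: refuse a generated file that has
# grown past a hard ceiling and keep serving with the last-known-good copy
# instead of letting the loading module crash.
import json
import os
import shutil

MAX_BYTES = 1_000_000                      # assumed ceiling for the generated file
ACTIVE = "features.json"                   # file the traffic-filtering module reads
LAST_GOOD = "features.last-good.json"      # copy kept from the previous good load

def load_features(path: str = ACTIVE) -> dict:
    if os.path.getsize(path) > MAX_BYTES:
        # The new file is out of scale: keep running on the previous version.
        print("generated file too large, falling back to last-known-good")
        path = LAST_GOOD
    with open(path) as f:
        features = json.load(f)
    if path == ACTIVE:
        shutil.copyfile(ACTIVE, LAST_GOOD)  # remember this version as the fallback
    return features
```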

We are entering the era of the “New Power Outage”

The gist of this whole story, the common denominator, is that none of these problems came from outside. Nothing external. Just internal changes, made by automated processes, routine stuff.

The Internet today has transformed, experts say – and they’re right, in my opinion – into a system of interdependent systems: DNS, cloud control planes, authentication services… Everything runs on the same provider infrastructure.

If one fails, the others are immediately affected. You see the cascade effect without even having to wait: it’s instantaneous.

The high level of automation, and the incredible density of computing power concentrated in the hands of these giants (there are so few of them!), means that a small intervention, one that looks perfectly reasonable at the level where it’s made, can become the trigger for a chain reaction of disruption. Everything happens so quickly that you don’t have time to intervene manually.
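If you wanted to build some of that time to intervene back in, it might look, very roughly, like this: a staged rollout where an automated change only reaches the whole fleet if every earlier stage stays inside an error budget. Stages, thresholds and the health check here are all invented for illustration.

```python
# Illustrative sketch, made-up stages and thresholds: roll an automated change
# out gradually and stop the moment the error rate moves past the budget.
import random

STAGES = [0.01, 0.05, 0.25, 1.0]   # fraction of the fleet touched at each stage
ERROR_BUDGET = 0.02                # abort if more than 2% of health probes fail

def error_rate_after(change: str, fraction: float) -> float:
    """Placeholder health check; a real one would query monitoring, not random."""
    return random.uniform(0.0, 0.05)

def staged_rollout(change: str) -> bool:
    for fraction in STAGES:
        rate = error_rate_after(change, fraction)
        print(f"{fraction:>4.0%} of fleet -> error rate {rate:.1%}")
        if rate > ERROR_BUDGET:
            print("aborting and rolling back before the change goes global")
            return False
    return True

staged_rollout("routine metadata update")
```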

This is why, industry experts say – and it’s a beautiful image – these configuration errors are becoming, in effect, the power outages of the distributed-computing era: one misstep, just one, and everything goes down, across different services.

What to do, in practice?

In short, these incidents have revealed something simple yet deeply worrying: the resilience of cloud systems is failing to keep pace with their scalability. The infrastructure increasingly resembles a high-voltage power grid, where crossing a certain threshold sets off a chain reaction.

Companies will necessarily have to change the way they build their architectures.

Use multiple independent providers, not just one, to spread the load and safeguard availability. Approaches like this help avoid situations where a single failure leads to the complete shutdown of critical processes.
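To make that less abstract, here’s a minimal sketch with invented URLs: probe a list of independent providers and serve from the first one that answers, degrading gracefully if none do. In real deployments this logic usually lives in DNS failover or a load balancer rather than application code, but the shape of the idea is the same.

```python
# Minimal sketch, hypothetical URLs: check a list of independent providers and
# use the first healthy one, degrading gracefully when nothing answers.
import urllib.request

PROVIDERS = [
    "https://static.provider-one.example/health",   # hypothetical primary provider
    "https://static.provider-two.example/health",   # hypothetical second provider
]

def first_healthy(endpoints, timeout=2.0):
    """Return the first endpoint that answers 200, or None if all are down."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:        # DNS failure, timeout, refused connection...
            continue
    return None

print("serving from:", first_healthy(PROVIDERS) or "nothing healthy: degrade gracefully")
```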

And we don’t want that, do we? No, we don’t.

Gaia Russo
Advisor to Red Hot Cyber, collaborating with the community's artificial intelligence laboratory and exploring digital frontiers with intelligence that borders on human.
