RHC Editorial Staff: 2 September 2025 14:50
Yesterday, the Red Hot Cyber website was inaccessible for about an hour. What was going on, we wondered? After a series of analyses, here is the answer: the internet is changing rapidly under the pressure of artificial intelligence.
Where websites once had to cope only with classic search-engine robots, today a growing share of traffic is generated by new, aggressive scanners operating on behalf of large language models. According to Cloudflare, nearly a third of all global web traffic comes from bots, with AI crawlers the fastest-growing category. Fastly’s analysis adds that 80% of this traffic is generated by programs designed to mass-collect the data needed for AI training.
Formally, the history of automated scanners began in 1993 with the appearance of the World Wide Web Wanderer, which catalogued new web pages. But experts emphasize that the difference between those early tools and today’s systems is enormous. Modern algorithms don’t just index pages; they overload infrastructure and create high costs for site owners. Fastly has recorded numerous cases in which sudden spikes in requests from AI bots increased server load tenfold, and sometimes twentyfold, within minutes, leading to inevitable drops in performance and service interruptions.
Hosting providers emphasize that such crawlers almost never respect crawl-frequency limits or bandwidth-saving rules. They download the full text of pages and follow dynamic links and executable scripts, completely ignoring the resource owners’ settings. As a result, even sites that are not directly targeted suffer: if several projects share a server and a common network link, a crawl surge against one of them instantly degrades speed for the others.
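To get a sense of how much of this load comes from AI crawlers, a site owner can simply tally requests per User-Agent in the web server’s access log. The sketch below assumes an Nginx/Apache “combined” log format; the log path and the list of bot name tokens are illustrative, and aggressive crawlers may of course spoof or omit their User-Agent entirely.

```python
# Minimal sketch: estimate how much of a server's traffic comes from AI crawlers
# by tallying requests per User-Agent in a "combined" format access log.
# The log path and bot tokens below are assumptions for illustration only.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"                              # assumed location
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")   # illustrative list

# In the combined log format, the User-Agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1)
        token = next((t for t in AI_BOT_TOKENS if t in user_agent), None)
        counts[token or "other"] += 1

total = sum(counts.values())
for name, hits in counts.most_common():
    print(f"{name:15s} {hits:8d}  ({hits / total:.1%})")
```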
For small sites, this can mean complete inaccessibility. Resource owners note that the usual DDoS protection mechanisms offered by Cloudflare and other network companies deal effectively with waves of distributed attacks but are useless against the onslaught of AI bots. In effect, the consequences are just as destructive, even though the traffic is not formally classified as malicious.
The situation is difficult even for major operators. To withstand such influxes, they have to add RAM, processor resources, and network bandwidth. Otherwise, page loading slows, which translates into a higher bounce rate: hosting research shows that if a page takes more than three seconds to load, more than half of visitors close the tab. Every additional second makes the problem worse, and the company loses its audience.
Even the largest AI companies appear in the statistics. Meta accounts for the largest share of this crawler traffic, about 52%; Google accounts for 23%, and OpenAI another 20%. Their systems can generate peaks of up to 30 terabits per second, causing disruptions even for organizations with powerful infrastructure. At the same time, website owners gain nothing from this interest: whereas a visit by Googlebot once offered the chance to reach the first page of search results and attract readers or customers, AI crawlers don’t direct users back to the original sources. The content is used to train models, and the traffic generates no revenue.
Attempts to protect sites with traditional methods such as passwords, paid logins, CAPTCHAs, and specialized filters rarely yield results: artificial intelligence overcomes these barriers quite easily. Even the old robots.txt mechanism, which for decades served as the standard way to specify indexing rules, is losing its meaning, since many bots simply ignore it. Cloudflare has accused Perplexity of circumventing these settings, and Perplexity, in turn, has denied everything. Yet website owners regularly experience waves of automated requests from various services, which shows how ineffective the existing tools are.
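For context, this is how robots.txt is supposed to be consumed: a well-behaved crawler checks the rules before every fetch and honours any Crawl-delay directive, as in the minimal Python sketch below (the site URL and crawler name are placeholders). The key point is that compliance is entirely voluntary, which is why ignoring the file costs a misbehaving bot nothing.

```python
# Minimal sketch of how robots.txt is *supposed* to work: a polite client asks
# the parser before fetching and honours any Crawl-delay directive. Nothing in
# the protocol enforces this; a crawler that skips the check can ignore every
# rule. The URL and user agent below are placeholders for illustration.
from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"        # placeholder site
USER_AGENT = "ExampleAIBot"             # hypothetical crawler name

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()                           # fetch and parse the rules

target = f"{SITE}/articles/some-page.html"
if parser.can_fetch(USER_AGENT, target):
    # Fall back to a 1-second pause if the site sets no Crawl-delay.
    delay = parser.crawl_delay(USER_AGENT) or 1
    print(f"Allowed to fetch {target}, pausing {delay}s between requests")
else:
    print(f"robots.txt disallows {target} for {USER_AGENT}")
```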
There are initiatives to supplement robots.txt with a new format, llms.txt, meant to let sites provide language models with specially prepared content without compromising the functionality of the site. However, the idea has had a mixed reception, and it is unclear whether it will become a standard. At the same time, infrastructure companies like Cloudflare are launching their own services to block AI bots, and there are independent solutions like the Anubis AI Crawler Blocker, an open and free project that doesn’t prevent crawling but slows it down to the point where it is no longer destructive.
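The “slow it down” approach can be illustrated with a rough sketch that is not Anubis’s actual implementation: a tiny WSGI middleware that adds a growing delay once a client exceeds a per-minute request budget, making heavy automated crawling unprofitably slow while leaving ordinary visitors untouched. The thresholds and the choice to key on the client IP are assumptions.

```python
# Rough sketch of throttling heavy clients instead of blocking them
# (illustrative only, not how Anubis actually works).
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60      # sliding window length
FREE_REQUESTS = 30       # requests per window served without delay
DELAY_STEP = 0.5         # extra seconds per request beyond the budget

class ThrottleMiddleware:
    def __init__(self, app):
        self.app = app
        self.history = defaultdict(deque)   # client IP -> request timestamps

    def __call__(self, environ, start_response):
        client = environ.get("REMOTE_ADDR", "unknown")
        now = time.monotonic()
        timestamps = self.history[client]
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and now - timestamps[0] > WINDOW_SECONDS:
            timestamps.popleft()
        timestamps.append(now)
        excess = len(timestamps) - FREE_REQUESTS
        if excess > 0:
            time.sleep(min(excess * DELAY_STEP, 10))  # cap the penalty at 10 s
        return self.app(environ, start_response)

# Usage (any WSGI app, e.g. a Flask app's wsgi_app attribute):
#   app.wsgi_app = ThrottleMiddleware(app.wsgi_app)
```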
A new arms race is thus emerging on the internet. On one side are website owners who want to keep their resources accessible and profitable; on the other, AI developers who treat the endless flow of data as fuel. A balance will likely be found over time, but the price will be high: the network will become more closed, information will be fragmented, and much material will end up behind paywalls or disappear from free access altogether. Memories of a free internet are gradually becoming history, and the prospect of a fragmented network is becoming increasingly real.