
Redazione RHC: 26 October 2025 09:45
A survey conducted by the European Broadcasting Union (EBU), with support from the BBC, has highlighted that the most popular chatbots tend to distort news, changing its meaning, confusing sources, and providing outdated data.
The project, which involved 22 editorial teams from 18 countries, saw experts subject ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity to thousands of standardized queries, comparing the responses obtained with those actually published.

The results were quite disturbing: roughly half of the responses contained significant errors, and about eight out of ten contained at least minor inaccuracies.
According to the report, 45% of the responses contained significant problems, 31% had confused sourcing, and 20% contained serious errors such as fabricated data and incorrect dates.
Reference checking revealed that Gemini performed the worst: 72% of its responses contained incorrect or unverified sources. By comparison, ChatGPT had such errors 24% of the time, while Perplexity and Copilot each had 15%.
Meanwhile, reliance on AI assistants for news is growing. According to an Ipsos survey of 2,000 UK residents, 42% rely on chatbots for news summaries, and among users under 35 the figure rises to almost half. However, 84% of respondents said that even a single factual error drastically reduces their trust in such systems. For the media, this means one thing: the more the public relies on automated summaries, the greater the reputational risk from any inaccuracy.

The researchers also provided illustrative examples. Gemini insisted that NASA had never had astronauts stranded in space, even though two of them had spent nine months aboard the ISS awaiting return, while ChatGPT stated that Pope Francis was continuing his ministry weeks after his death.
In one case, a chatbot even specifically warned against mistaking fiction for reality, a clear illustration of how a confident tone can mask ignorance.
The project has become the largest study to date on the accuracy of journalistic AI assistants. Its scale (dozens of newsrooms, thousands of responses) rules out random coincidence and demonstrates that the problems are systemic. Different models make different errors, but they are fundamentally alike in one respect: they tend to “guess” the answer even when they are unsure.
The developers themselves partially acknowledge this. In September, OpenAI published a report stating that model training sometimes rewards guesswork rather than honest admissions of ignorance. And in May, Anthropic’s lawyers were forced to apologize to a court for documents containing false quotes generated by their Claude model. These stories make clear why fluent text does not guarantee veracity.
To reduce the incidence of such errors, the project’s participants have developed a set of practical recommendations for developers and editors. These cover requirements for transparent sourcing, principles for handling questionable data, and a review mechanism before publication. The core idea is simple: if the system is not sure, it should say so rather than invent an answer.
The European Broadcasting Union warns that when people can no longer distinguish reliable information from convincing fakes, trust in news as a whole collapses. To avoid this, newsrooms and technology companies will need to agree on common standards: accuracy must take priority over speed, and verification over impact.