
DeepSeek challenges AI giants: 50% cost and API cuts

Redazione RHC : 6 October 2025 06:54

The Chinese company DeepSeek has presented an experimental version of its language model, DeepSeek-V3.2-Exp, which for the first time implements its own version of sparse attention, a technique that significantly reduces the computational cost of processing long text sequences. The new mechanism, called DeepSeek Sparse Attention, is said to cut the model's running costs by nearly half. To demonstrate these savings, the company has reduced the price of its API by 50%.

The problem of computational overhead in large language models is particularly acute for long dialogues. The classic Transformer architecture, developed in 2017, compares every word in the input sequence with every other word, resulting in a quadratic increase in the number of operations. For a thousand words, this translates to a million comparisons, and for ten thousand words, to a hundred million. This overhead increases resource usage in long sessions and slows performance, as the system is forced to reanalyze the entire dialogue history for each new request.
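To put that scaling in concrete terms, the short Python snippet below reproduces the article's arithmetic, counting the pairwise comparisons a dense attention layer performs as the sequence grows:

```python
# Dense attention scores every token against every other token,
# so the number of comparisons grows with the square of the length.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {n * n:>18,} pairwise comparisons")
```

At 100,000 tokens the count reaches ten billion, which is why long sessions become disproportionately expensive for a dense Transformer.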

Sparse attention works differently. Instead of comparing every word with every other, it selects a limited set of the most significant connections. DeepSeek uses a proprietary mechanism called the Lightning Indexer, a small additional neural network that evaluates the significance of word pairs and selects up to 2,048 of the most relevant connections for each position. The company has not disclosed how the indexer makes its decisions, but says it does not compromise the quality of text understanding.
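Since DeepSeek has not published the indexer's internals, the sketch below is only a generic illustration of the top-k idea, not the company's actual design: a cheap learned projection (standing in for the Lightning Indexer) scores every query/key pair, and full attention is then computed only over the highest-scoring connections for each position. All names, dimensions, and the dot-product scorer are assumptions.

```python
# Minimal NumPy sketch of top-k sparse attention. The dot-product
# "indexer" below is an assumed stand-in for DeepSeek's undisclosed
# Lightning Indexer, not its actual mechanism.
import numpy as np

def sparse_attention(q, k, v, w_idx, top_k=4):
    """q, k, v: (seq_len, d) arrays; w_idx: (d, d_small) projection
    used by the lightweight indexer; top_k: connections kept per
    query (DeepSeek reportedly keeps up to 2,048)."""
    # Indexer pass: score every query/key pair in a small projected
    # space. Still quadratic, but far cheaper than full attention.
    qi, ki = q @ w_idx, k @ w_idx              # (seq_len, d_small)
    idx_scores = qi @ ki.T                     # (seq_len, seq_len)

    # Keep only the top_k most relevant key positions per query.
    keep = np.argsort(idx_scores, axis=-1)[:, -top_k:]

    out = np.zeros_like(q)
    scale = np.sqrt(q.shape[-1])
    for i, cols in enumerate(keep):
        scores = q[i] @ k[cols].T / scale      # attend over top_k keys only
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[cols]
    return out

rng = np.random.default_rng(0)
n, d, d_small = 16, 8, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
w_idx = rng.standard_normal((d, d_small))
print(sparse_attention(q, k, v, w_idx).shape)  # -> (16, 8)
```

Note that the indexer in this sketch still touches every pair, just in a much smaller space; the savings come from restricting the expensive full-dimension attention to top_k keys per query.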

Internal tests have shown that the new model delivers results comparable to the previous version, DeepSeek-V3.1-Terminus, while maintaining high accuracy and the ability to process long sequences. Notably, DeepSeek has open-sourced the components under the MIT license and published the model weights, allowing other researchers to test and build on the proposed solutions.
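With the weights public, loading them would follow the usual Hugging Face transformers pattern. The repository name in this sketch is an assumption based on DeepSeek's naming conventions (it is not stated in the article), and the full model is far too large to run on a single consumer GPU.

```python
# Hypothetical loading of the released weights via Hugging Face
# transformers. The repo id below is an assumption, and a full
# deployment would require a multi-GPU server.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "deepseek-ai/DeepSeek-V3.2-Exp"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
```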

DeepSeek first made headlines in January, when its R1 model matched the performance of OpenAI's o1 at a training cost of just $6 million. The company's chat app also briefly topped the iPhone App Store, surpassing ChatGPT. Since then, industry attention has focused on the Chinese lab, which has been forced to optimize its computations because export restrictions limit its access to modern GPUs and other specialized chips.

Sparse attention itself is not new: it was used in GPT-3 and several other models by Western developers, though it has received relatively little attention since. DeepSeek claims that its implementation enables precise tuning and a significant reduction in computational cost without any noticeable loss of quality. Independent experts have not yet confirmed these results, but if the company's conclusions hold, such methods could significantly change the economics of running AI models in the long term.

Redazione
The editorial team of Red Hot Cyber consists of a group of individuals and anonymous sources who actively collaborate to provide early information and news on cybersecurity and computing in general.
