Critical Bugs in NVIDIA Triton Allow Attackers to Compromise Servers and Steal AI Models

Redazione RHC: 5 August 2025 11:14

Critical vulnerabilities have been discovered in NVIDIA's Triton Inference Server that threaten the security of AI infrastructure on both Windows and Linux. The open-source server is designed to deploy and serve machine learning models at scale, and a chain of flaws in its Python backend can reportedly be used to take complete control of the server without authorization.

Triton Inference Server is open-source software that streamlines AI inference. It lets teams deploy models built with a wide range of deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, and RAPIDS FIL. Triton supports inference in the cloud, in data centers, at the edge, and on embedded devices, running on NVIDIA GPUs, x86 and ARM CPUs, or AWS Inferentia.

The Wiz research team reported three vulnerabilities that, chained together, can lead to remote execution of arbitrary code. The first, CVE-2025-23319 (CVSS 8.1), lets an attacker trigger an out-of-bounds write by sending a specially crafted request. The second, CVE-2025-23320 (CVSS 7.5), lets an attacker exceed the shared memory limit by sending an excessively large request. The third, CVE-2025-23334 (CVSS 5.9), causes an out-of-bounds read. Each flaw is limited on its own, but combined they open the way to complete server compromise.
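To make the out-of-bounds terms concrete, here is a minimal Python sketch of the check such bugs skip: before any read or write, the requested span must fit inside the buffer. `SharedRegion`, `fits`, `read`, and `write` are hypothetical names, and this is not Triton's actual code. Python itself is memory-safe (an unchecked slice would resize or truncate rather than corrupt memory), so the sketch only models the invariant that native code has to enforce explicitly.

```python
from dataclasses import dataclass


@dataclass
class SharedRegion:
    """Hypothetical stand-in for a fixed-size shared memory buffer."""

    buffer: bytearray

    def fits(self, offset: int, length: int) -> bool:
        # Valid only if offset and length are non-negative and the span
        # does not run past the end of the buffer.
        return 0 <= offset and 0 <= length and offset + length <= len(self.buffer)

    def write(self, offset: int, data: bytes) -> None:
        if not self.fits(offset, len(data)):
            raise ValueError(f"write of {len(data)} bytes at {offset} is out of bounds")
        self.buffer[offset : offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        if not self.fits(offset, length):
            raise ValueError(f"read of {length} bytes at {offset} is out of bounds")
        return bytes(self.buffer[offset : offset + length])


region = SharedRegion(bytearray(16))
region.write(0, b"hello")
print(region.read(0, 5))   # b'hello'
print(region.fits(14, 3))  # False: 14 + 3 > 16
```

Rejecting the span up front, rather than clamping it, keeps an oversized or crafted request from touching memory at all.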

The issue lies in Triton's Python backend, which serves models written in Python, including those that rely on PyTorch or TensorFlow. This backend handles inference requests through internal inter-process communication (IPC) mechanisms, and it is in these mechanisms that the vulnerabilities reside.

The attack begins with CVE-2025-23320, which can be used to leak the unique name of the shared memory region through which the backend's components communicate. This name is meant to stay private, but once attackers obtain it they can use it as a key. CVE-2025-23319 and CVE-2025-23334 then let them write and read data in that memory, bypassing the intended restrictions. The result is full control over the inference process: attackers can inject malicious code, steal AI models, alter their behavior, and intercept sensitive information.
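Why a leaked region name matters can be illustrated with Python's standard `multiprocessing.shared_memory` module: a named shared memory region can be opened by any code that knows its name and has OS-level access to it. This is only an analogy under that assumption; `demo_name_as_key` is a hypothetical helper, and the reported Triton attack is remote, so it does not simply attach locally as done here.

```python
from multiprocessing import shared_memory


def demo_name_as_key(payload: bytes) -> bytes:
    """Hypothetical demo: anyone holding a region's name can attach and write to it.

    The payload must fit in the 64-byte region created below.
    """
    # "Owner" side: create a named region, as an internal IPC channel might.
    owner = shared_memory.SharedMemory(create=True, size=64)
    try:
        region_name = owner.name  # the kind of value whose leak the article describes
        # "Intruder" side: the name alone is enough to open the same region.
        intruder = shared_memory.SharedMemory(name=region_name)
        try:
            intruder.buf[: len(payload)] = payload
        finally:
            intruder.close()
        # The owner now sees the intruder's bytes in its own buffer.
        seen = bytes(owner.buf[: len(payload)])
        return seen
    finally:
        owner.close()
        owner.unlink()


print(demo_name_as_key(b"tampered"))  # b'tampered'
```

The region's contents are protected only by who knows the name, which is why keeping such names secret matters.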

According to the experts, a compromised Triton server could become the entry point for a broader attack on an organization's entire network, including critical infrastructure.

In its August security bulletin, NVIDIA confirmed the issues described above and urges users to update immediately to release 25.07, which fixes them.
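Teams running the official NGC containers, which are tagged by release (for example `nvcr.io/nvidia/tritonserver:25.07-py3`), could screen their deployments with a small check like the following. It is a sketch under stated assumptions: `is_patched` is a hypothetical helper, it assumes `YY.MM` release tags, and it conservatively treats every release older than 25.07, and every tag it cannot parse, as unpatched. Builds installed any other way need to be checked against NVIDIA's bulletin directly.

```python
import re


def is_patched(image_ref: str, fixed: tuple[int, int] = (25, 7)) -> bool:
    """Hypothetical helper: is a Triton NGC image tag at or above the fixed release?"""
    tag = image_ref.rsplit(":", 1)[-1]  # keep only the tag, e.g. '25.07-py3'
    match = re.match(r"(\d{2})\.(\d{2})\b", tag)
    if match is None:
        return False  # unknown tags (e.g. 'latest') are treated as unpatched
    release = (int(match.group(1)), int(match.group(2)))
    return release >= fixed  # tuple comparison: year first, then month


print(is_patched("nvcr.io/nvidia/tritonserver:25.07-py3"))  # True
print(is_patched("nvcr.io/nvidia/tritonserver:24.12-py3"))  # False
```

A tag such as `latest` returns False on purpose: it says nothing about which release is actually deployed.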

NVIDIA also disclosed fixes for three other serious bugs: CVE-2025-23310, CVE-2025-23311, and CVE-2025-23317. These can likewise lead to code execution, data loss, server crashes, and memory tampering. All of them are fixed in the same release.

There is no evidence that these vulnerabilities have been exploited in the wild, but given the risk and the nature of the affected components, organizations running Triton are advised to update immediately and review the threat model for their AI infrastructure.

Redazione
The editorial team of Red Hot Cyber consists of a group of individuals and anonymous sources who actively collaborate to provide early information and news on cybersecurity and computing in general.
