VU#518910: Ollama GGUF Quantization Remote Memory Leak

Ollama’s model quantization engine contains a vulnerability that allows an attacker with access to the model upload interface to read, and potentially exfiltrate, heap memory from the server. Exploitation can result in unauthorized access to sensitive data and, in some cases, broader system compromise.

An out-of-bounds heap read/write vulnerability was identified in Ollama’s model processing engine. Ollama is an open-source tool for running large language models locally on macOS, Windows, and Linux, and it supports model quantization. By uploading a specially crafted GPT-Generated Unified Format (GGUF) file and triggering the quantization process, an attacker can make the server read beyond intended memory boundaries and write the leaked data into a new model layer.

CVE-2026-5757 is described as an unauthenticated remote information disclosure vulnerability in Ollama’s model quantization engine. It enables an attacker to read and exfiltrate the server’s heap memory, potentially leading to sensitive data exposure, further compromise, and stealthy persistence.

The vulnerability is attributed to three combined factors. First, the quantization engine trusts tensor metadata, such as the element count, from the user-supplied GGUF file header without verifying it against the actual size of the provided data. Second, Go’s unsafe.Slice is used to create a memory slice based on the attacker-controlled element count, so the slice extends far beyond the legitimate data buffer into the application’s heap. Third, the out-of-bounds heap data is processed and written into a new model layer, after which Ollama’s registry API can be used to “push” this layer to an attacker-controlled server for exfiltration.
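
The role of unsafe.Slice can be illustrated with a small Go sketch. It is not Ollama’s code: the helper tensorView, the arena layout, and the strings are hypothetical, and a single allocation stands in for the heap so that the example stays within valid memory while still showing a view that runs past the supplied payload.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tensorView mimics the flawed pattern (hypothetical helper, not Ollama's code):
// the slice length comes from the header-supplied count, not from len(data).
func tensorView(data []byte, claimedCount uint64) []byte {
	// unsafe.Slice does not compare claimedCount with the size of the
	// underlying buffer, so the result can cover memory past the payload.
	return unsafe.Slice(&data[0], claimedCount)
}

func main() {
	// One allocation stands in for the heap: 8 bytes of tensor payload
	// followed by unrelated data that happens to sit next to it.
	arena := make([]byte, 32)
	copy(arena[0:8], "TENSOR00")
	copy(arena[8:], "secret-api-token")

	// What the uploader really supplied: 8 bytes.
	payload := arena[0:8:8]

	// The header claims 24 one-byte elements, three times the real payload.
	view := tensorView(payload, 24)

	fmt.Printf("payload = %q\n", payload) // payload = "TENSOR00"
	fmt.Printf("view    = %q\n", view)    // view    = "TENSOR00secret-api-token"
}
```

In the scenario the advisory describes, the bytes beyond the payload would be whatever the server’s heap holds at that point, and they would end up in the new model layer rather than on standard output.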

With access to the model upload interface, an attacker can exploit the vulnerability to read from or write to heap memory. The advisory states this may result in exposure of sensitive data, data exfiltration, and potentially full system compromise.

No patch is available because coordination with the vendor could not be completed. The advisory says the underlying issue should be addressed by implementing proper bounds checking to validate tensor metadata against the actual size of the provided data before any memory operations are performed. As an interim mitigation, it recommends restricting or disabling access to the model upload functionality, especially where environments are exposed to untrusted users or networks, and limiting deployments to local or otherwise trusted network environments where possible. If model uploads are required for operational reasons, the advisory states that only models from trusted and verifiable sources should be accepted, with appropriate validation controls applied to reduce risk.
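
The recommended bounds check can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not a patch for Ollama: the function checkedTensorBytes, the error ErrTensorBounds, and the fixed bytes-per-element parameter are hypothetical, and a real GGUF reader would also have to account for block-quantized tensor types, where the byte size is not a simple product of element count and element size.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// ErrTensorBounds is a hypothetical sentinel error for rejected metadata.
var ErrTensorBounds = errors.New("tensor metadata exceeds supplied data")

// checkedTensorBytes returns the part of data that the tensor legitimately
// covers, or an error if the header claims more bytes than were supplied.
// elemCount and elemSize are treated as untrusted input.
func checkedTensorBytes(data []byte, elemCount, elemSize uint64) ([]byte, error) {
	// Multiply with overflow detection: hi is non-zero when the product
	// does not fit in 64 bits.
	hi, need := bits.Mul64(elemCount, elemSize)
	if hi != 0 {
		return nil, fmt.Errorf("%w: byte size overflows uint64", ErrTensorBounds)
	}
	if need > uint64(len(data)) {
		return nil, fmt.Errorf("%w: need %d bytes, have %d", ErrTensorBounds, need, len(data))
	}
	// Ordinary slicing keeps Go's bounds checks; no unsafe pointer is needed.
	return data[:need], nil
}

func main() {
	data := make([]byte, 16) // room for 4 four-byte elements

	if _, err := checkedTensorBytes(data, 4, 4); err == nil {
		fmt.Println("honest header accepted")
	}
	if _, err := checkedTensorBytes(data, 1<<20, 4); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The check runs before any memory operation on the tensor, so an inflated element count is rejected instead of being turned into an oversized slice.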

The advisory credits Jeremy Brown with detecting the vulnerability through AI-assisted vulnerability research and notes that the document was written by Timur Snoke. It gives 2026-04-22 as the publication date, a last-updated time of 2026-04-22 13:09 UTC, and a document revision of 1.