VU#915947: SGLang is vulnerable to remote code execution when rendering chat templates from a model file

SGLang has a remote code execution vulnerability in its reranking endpoint, /v1/rerank. The issue involves rendering chat templates from model files, and successful exploitation can result in arbitrary code execution within the SGLang service context.

The issue is tracked as CVE-2026-5760. SGLang serves large language models and multimodal AI models, and it supports reranking with a cross-encoder model: the /v1/rerank endpoint reranks documents by their relevance to a query. The tokenizer.chat_template metadata in a model file defines how text is structured before the model processes it.

An attacker crafts a malicious GPT-Generated Unified Format (GGUF) model file whose tokenizer.chat_template parameter contains a Jinja2 server-side template injection (SSTI) payload, along with a trigger phrase that activates the vulnerable code path. After the victim downloads the model and loads it in SGLang, any request that reaches /v1/rerank causes the malicious template to be rendered, executing the attacker's arbitrary Python code on the server. The root cause is that the getjinjaenv() function uses jinja2.Environment() without sandboxing, so rendered templates are not restricted from executing arbitrary Python code. Exploitation therefore requires both conditions: a model file with a crafted tokenizer.chat_template has been loaded, and the reranking endpoint is accessed.
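Because the payload travels inside the GGUF metadata, a model file can be inspected before it is ever loaded. The following is a minimal, stdlib-only sketch, assuming the GGUF v2/v3 header layout (magic bytes, version, tensor count, metadata key/value count, then typed little-endian key/value pairs). It extracts tokenizer.chat_template and flags a few Python dunder names that commonly appear in Jinja2 SSTI payloads. The function names and the marker list are hypothetical illustrations, not part of SGLang, and a plain string scan is easy to evade, so it can only complement sandboxed rendering, never replace it.

```python
import struct
from typing import Any, BinaryIO

# Assumed GGUF metadata value-type codes (GGUF v2/v3, little-endian).
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9

# Hypothetical, non-exhaustive list of dunder names often seen in Jinja2 SSTI payloads.
SUSPICIOUS_MARKERS = ("__class__", "__mro__", "__subclasses__",
                      "__globals__", "__builtins__", "__import__")


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size little-endian value, failing on truncation."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    """Decode one metadata value of the given type code (arrays recurse)."""
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    fmt = _SCALAR_FORMATS.get(vtype)
    if fmt is None:
        raise ValueError(f"unknown GGUF value type {vtype}")
    return _read(f, fmt)


def read_gguf_metadata(f: BinaryIO) -> dict[str, Any]:
    """Return the metadata key/value pairs from a GGUF header without loading tensors."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("only GGUF v2+ headers (64-bit counts) are handled here")
    _read(f, "<Q")  # tensor count, not needed for metadata
    kv_count = _read(f, "<Q")
    metadata: dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        metadata[key] = _read_value(f, vtype)
    return metadata


def chat_template_warnings(metadata: dict[str, Any]) -> list[str]:
    """Crude screen: list suspicious markers found in tokenizer.chat_template."""
    template = metadata.get("tokenizer.chat_template")
    if not isinstance(template, str):
        return []
    return [m for m in SUSPICIOUS_MARKERS if m in template]
```

For example, open a downloaded file in binary mode (the path is a placeholder) and pass the handle to read_gguf_metadata, then to chat_template_warnings. An empty result only means none of these particular strings appear; an obfuscated payload can avoid all of them.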

By crafting a malicious model, an attacker can achieve remote code execution (RCE) in SGLang. Successful exploitation could allow arbitrary code execution in the context of the SGLang service, potentially leading to host compromise, lateral movement, data exfiltration, or denial-of-service (DoS) attacks. Deployments that expose the /v1/rerank endpoint to untrusted networks are at the highest risk of exploitation.

Mitigation guidance recommends rendering chat templates with ImmutableSandboxedEnvironment instead of jinja2.Environment(), which prevents templates from executing arbitrary Python code on the server.
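A minimal sketch of that mitigation, assuming the template is rendered with a messages list of role/content dictionaries. The variable name messages and both helper functions (render_chat_template, render_probe) are hypothetical, not SGLang's actual code. It renders model-supplied templates through jinja2.sandbox.ImmutableSandboxedEnvironment and contrasts it with a plain Environment using a harmless probe that only reads a class name.

```python
from jinja2 import Environment
from jinja2.exceptions import TemplateError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# One shared sandboxed environment for untrusted, model-supplied templates.
_SANDBOX = ImmutableSandboxedEnvironment()


def render_chat_template(template_source: str, messages: list[dict[str, str]]) -> str:
    """Render a model-supplied chat template inside the immutable sandbox."""
    return _SANDBOX.from_string(template_source).render(messages=messages)


def render_probe(env: Environment, source: str) -> str:
    """Render a template, reporting sandbox/template errors instead of raising."""
    try:
        return env.from_string(source).render()
    except TemplateError as exc:
        return f"<blocked: {type(exc).__name__}>"


if __name__ == "__main__":
    # Harmless probe: it only reads a class name, but it relies on the same
    # dunder-attribute traversal that Jinja2 SSTI payloads use.
    probe = "{{ ''.__class__.__name__ }}"
    print("unsandboxed:", render_probe(Environment(), probe))
    print("sandboxed:  ", render_probe(ImmutableSandboxedEnvironment(), probe))
```

The sandbox treats attributes whose names start with an underscore as unsafe, which cuts off the __class__ and __globals__ traversal that template-injection payloads depend on. The immutable variant additionally blocks in-place modification of lists, dicts, and sets from inside a template.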

The coordination effort reported no response from the project maintainers, and the advisory notes that no patch information was provided. The report also credits the reporter, Stuart Beck, and states the document was written by Christopher Cullen.