Netskope Threat Labs details GPT model capabilities in generating malware-related code

Research conducted by Netskope Threat Labs explores the potential for large language models (LLMs) such as GPT-3.5-Turbo and GPT-4 to autonomously generate malicious code, raising questions about the evolution of malware design and its implications for cybersecurity defenses.

Research overview

The study investigated whether LLMs can produce malware functions on demand, so that the malicious instructions need not be hardcoded in the binary, focusing first on code generation for defense evasion techniques. Two main experiments were performed. The first prompted LLMs to create Python scripts capable of process injection and of terminating antivirus (AV) or endpoint detection and response (EDR) processes. The second assessed the operational reliability of an LLM-generated script designed to detect virtualized environments.

Findings confirmed that LLMs can produce code aligned with malicious objectives. Both GPT-3.5-Turbo and GPT-4 generated scripts for process injection and disabling AV/EDR systems when subjected to role-based prompt injection methods, which circumvent GPT-4’s built-in content restrictions.

Key findings

The ability of LLMs to create polymorphic code indicates that future malware could significantly reduce its reliance on static, detectable instructions embedded in binaries. Although GPT-4’s safeguards block requests for harmful code under normal conditions, the researchers bypassed these guardrails with a contextual prompt that assigned the model an adversarial persona.

Despite successful code generation, tests revealed that the reliability of LLM-produced malware components is currently limited. Python scripts intended for detecting virtualized or sandboxed environments exhibited moderate success in traditional environments like VMware but performed poorly in modern cloud virtual desktop infrastructures, undermining their operational viability.

Technical challenges and future outlook

The research identified a gap in the robustness of LLM-generated code in operational contexts, which currently limits its standalone use in fully autonomous malware. Preliminary testing with GPT-5 indicated improved code reliability, with higher success rates in complex environments. However, this newer model incorporates advanced safeguards that are more resistant to prompt manipulation, which reduces the feasibility of generating malicious code on demand.

Netskope Threat Labs indicated that overcoming these newer precautionary layers will require sophisticated techniques in prompt engineering or alternative modeling, topics planned for further study.

Operational impact

If LLMs become capable of reliably generating malware in real time, attackers could automate polymorphic and evasive behaviors without embedding explicit malicious instructions. This would present new challenges for detection technologies and would require adaptive defense strategies to counter dynamically generated threats.

For now, the gap between code generation capability and code reliability suggests that fully autonomous LLM-driven malware would still require additional research and development from threat actors.

This Blog Signals brief provides an evidence-based summary of research examining the architectural and operational considerations of integrating LLMs into malware, offering insights relevant to enterprise cybersecurity stakeholders monitoring emerging threats.