Netskope Threat Labs examines large language models’ ability to generate operational malware code

Recent research from Netskope Threat Labs explores whether large language models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-5 can autonomously generate functional malicious code, a capability that could shape future malware development and defensive strategy. Knowing how reliably LLMs can produce operational malware code helps enterprise security teams anticipate and mitigate emerging threats that rely on artificial intelligence (AI).

Research overview

Netskope Threat Labs conducted two core experiments to assess whether LLMs can autonomously produce malicious scripts and how reliably those scripts work. The first instructed GPT-3.5-Turbo and GPT-4 to write Python code for process injection and for terminating antivirus or endpoint detection and response (EDR) processes, testing whether the models would generate threat code on demand. The second tested how effectively GPT-3.5-Turbo and GPT-4 could produce a Python script that detects virtualized environments, a defense evasion tactic common in malware operations.

Key findings

The first experiment confirmed that GPT-3.5-Turbo complied with the request to generate malicious code, while GPT-4 initially declined because of its stronger safety protocols. Those protocols were then bypassed with role-based prompt injection. This finding indicates that although newer LLMs have stronger guardrails, the guardrails can be circumvented to obtain potentially harmful scripts. In the second experiment, however, both GPT-3.5-Turbo and GPT-4 showed limited operational reliability: many of the scripts they produced to identify virtual environments failed outright or were not robust.

Preliminary testing of GPT-5 showed improved reliability in generating virtualization detection scripts, with approximately 90% success in identifying virtualized environments where prior models failed. Nonetheless, GPT-5's heightened safety measures made it harder to obtain code that served malicious intent. The model often substituted a safer version of the requested script, which reduced operational effectiveness from an attacker's perspective.

Technical breakdown of operational reliability

The experiments evaluated script performance in three environments: VMware Workstation, AWS Workspace Virtual Desktop Infrastructure (VDI), and a physical host. GPT-4 and GPT-3.5-Turbo achieved moderate success on VMware but performed poorly on the cloud-based VDI, which limits their practical use for defense evasion in modern environments. Both models maintained low false-positive rates on physical machines, meaning their scripts rarely mistook real hardware for a virtual machine, but that accuracy did not compensate for the weak detection on cloud VDI.

GPT-5's improved accuracy on AWS VDI environments represents a notable enhancement in operational capability but coincides with increased difficulty in bypassing protective guardrails. This trade-off introduces new challenges for adversaries aiming to fully automate multi-step attack chains using LLM-generated code.
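The reliability comparison in this section reduces to two measurements: how often a generated script reaches the correct verdict in each environment, and how often it wrongly flags a physical host as virtual. The following is a minimal sketch of how such results could be tallied. It is not the scoring code used in the research; the `Trial` record, the function names, and the environment labels are hypothetical, and the sample data is made up purely to exercise the functions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One run of a generated detection script (hypothetical record format)."""
    environment: str        # assumed labels, e.g. "vmware", "aws_vdi", "physical"
    is_virtual: bool        # ground truth for the environment
    reported_virtual: bool  # verdict the script returned


def summarize(trials):
    """Per-environment fraction of runs whose verdict matched ground truth."""
    counts = defaultdict(lambda: [0, 0])  # environment -> [correct, total]
    for t in trials:
        counts[t.environment][1] += 1
        if t.reported_virtual == t.is_virtual:
            counts[t.environment][0] += 1
    return {env: correct / total for env, (correct, total) in counts.items()}


def false_positive_rate(trials):
    """Fraction of physical-host runs wrongly flagged as virtual."""
    physical = [t for t in trials if not t.is_virtual]
    if not physical:
        return 0.0
    return sum(t.reported_virtual for t in physical) / len(physical)


# Made-up placeholder data, NOT results from the research.
sample = [
    Trial("vmware", True, True),
    Trial("vmware", True, False),
    Trial("aws_vdi", True, False),
    Trial("aws_vdi", True, False),
    Trial("physical", False, False),
    Trial("physical", False, True),
]

print(summarize(sample))
print(false_positive_rate(sample))
```

Keeping the false-positive rate separate from per-environment accuracy matters because a script that always answers "not virtual" would score perfectly on physical hosts while being useless for detection.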

Operational impact and future research directions

The demonstrated ability of LLMs to generate malicious code points to a potential shift in malware architecture toward minimal embedded code, with the remaining code created dynamically during execution. However, current limitations in code reliability remain a barrier to fully autonomous, LLM-driven malware. This research offers foundational insight into the evolution of AI-assisted threats and highlights areas that require further investigation.

Netskope Threat Labs plans further research on overcoming these reliability challenges through advanced prompt engineering and on evaluating alternative models. Additional studies aim to integrate LLM capabilities across diverse attack techniques in order to build a comprehensive threat model that reflects emerging autonomous malware architectures.

This Blog Signals brief provides a fact-based summary of Netskope Threat Labs' research into the use of large language models for generating operational malware code, emphasizing the implications for enterprise cybersecurity stakeholders monitoring developments in AI-driven threat mechanisms.