Self-propagating worm created to target generative AI systems

esteria.white

Researchers have developed a computer worm that targets generative AI (GenAI) applications to potentially spread malware and steal personal data.

The new paper details the worm dubbed “Morris II,” which targets GenAI ecosystems through the use of self-replicating adversarial prompts, leading GenAI systems to deliver payloads to other agents.

Once released, the worm is stored in Recovery Augmented Generation (RAG) and “passively” moves to new targets, without the attackers needing to do anything else – something the authors described as “0-click propagation”.

A RAG application allows a GenAI model to query relevant data from additional sources such as private documents when answering questions and queries, thereby providing more accurate answers.

Researchers from the Israel Institute of Technology, Intuit and Cornell Tech said their work aims to highlight “threats associated with GenAI-based applications and caused by the underlying GenAI layer.”

They added that this risk should be taken into account when designing GenAI ecosystems.

How the Morris II Worm Targets GenAI Systems

THE study was based on the concept of malware powered by self-replicating adversarial prompts, triggering GenAI models to replicate input as output and engage in malicious activities.

The researchers crafted a message consisting of a self-replicating adversarial prompt against GenAI-powered messaging assistants equipped with auto-reply functionality. This message must be able to fulfill the following requirements:

  • Be retrieved by the RAG when replying to new messages
  • Undergo replication during inference performed by the GenAI model
  • Initiate malicious activity predefined by the attacker

This prompt can be generated using jailbreaking techniques at the prompt and token levels defined in previous research and via the Internet. This can allow attackers to “steer” the application’s decision toward the desired activity.

Jailbreaking in this context refers to the practice of users exploiting vulnerabilities in AI chatbot systems, potentially violating ethical guidelines and cybersecurity protocols.

The initial message prompts the GenAI model to generate a response containing the adversarial self-replication prompt and send information about sensitive user data, including emails, addresses, and phone numbers, extracted from the context provided in the query.

Researchers demonstrated the application of Morris II against GenAI-based email assistants in two use cases: spam and personal data exfiltration. They also evaluated the technique in two contexts (black box and white box access), using two types of input data (text and images).

Three different GenAI models were used in the study to test the worm’s capabilities: Google’s Gemini Pro, OpenAI’s ChatGPT 4.0, and open source LLaVA (Large Language Model).

The effectiveness of the technique was evaluated according to two criteria: the performance of malicious activities and the propagation to new hosts.

The researchers suggested that malware could be developed to launch cyberattacks on the entire GenAI ecosystem using this approach.

Countermeasures against self-replicating adversarial prompts

The researchers urged developers of GenAI systems to implement countermeasures against replication and propagation to mitigate this type of threat.

“This process is important to ensure the safe adoption of GenAI technology that promises a worm-free GenAI era,” they wrote.

These recommendations include:

  • Rephrase the entire output in GenAI models to ensure that the output does not consist of similar elements to the input and give the same inference.
  • Implement anti-jailbreak countermeasures to prevent attackers from using known techniques to replicate input into output
  • Use techniques developed to detect malicious propagation patterns associated with computer worms. For RAG based worm, the simplest method is to use a non-active RAG.
Leave a comment