Prompt injection attack risk for AI

Risks associated with input

Inference

Robustness

New to generative AI

Description

A prompt injection attack forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions, or information contained in its prompt.

Why is prompt injection attack a concern for foundation models?

Injection attacks can be used to alter model behavior and benefit the attacker.

Background image for risks associated with input

Example

Manipulating AI Prompts

As per the source article, the UK’s cybersecurity agency has warned that chatbots can be manipulated by hackers to cause harmful real-world consequences (e.g., scams and data theft) if systems are not designed with security. The UK’s National Cyber Security Centre (NCSC) has said there are growing cybersecurity risks of individuals manipulating the prompts through prompt injection attacks. The article cited an example where a user was able to create a prompt injection to find Bing Chat’s initial prompt. The entire prompt of Microsoft’s Bing Chat, a list of statements written by Open AI or Microsoft that determine how the chatbot interacts with users, which is hidden from users, was revealed by the user putting in a prompt that requested the Bing Chat “ignore previous instructions”.

Sources:

The Guardian, August 2023

Parent topic: AI risk atlas

We provide examples covered by the press to help explain many of the foundation models' risks. Many of these events covered by the press are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work towards mitigations. Highlighting these examples are for illustrative purposes only.