Attacks against Generative AI Systems
Adversarial machine learning presents a unique set of challenges where attackers aim to exploit AI models through ingenious methods. With prompt injection, an attacker could bypass safeguards or exploit vulnerabilities in the model’s design, its training data, or the system it operates within, leading to unauthorized actions or disclosures. Data poisoning is another critical threat where malicious data is introduced to corrupt the model’s learning process, leading to flawed outputs or breaches of sensitive information.
The Open Worldwide Application Security Project (OWASP) identifies several critical issues necessitating robust Data+AI safeguards:
- Prompt Injections: This issue involves malicious users injecting undesired commands or inputs into AI systems, potentially leading to unauthorized responses.
- Insecure Output Handling: Refers to the improper sanitization or handling of data generated by AI systems, leading to vulnerabilities such as cross-site scripting or information leakage.
- Training Data Poisoning: Involves tampering with the data used to train AI models, aiming to skew their behavior or decision-making process in a harmful way.
- Denial of Service: This concerns attacks that overwhelm AI systems, rendering them unavailable to legitimate users, often by exploiting resource-intensive processes.
- Supply Chain Vulnerabilities: The LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks.
- Sensitive Data Leakage or Information Disclosure: Occurs when sensitive information is inadvertently exposed by AI systems, due to flaws in data handling or privacy controls.
- Insecure Plugin Design: LLM plugins can have insecure inputs and insufficient access control due to a lack of application control.
- Excessive Agency: Arises from excessive functionality, permissions, or autonomy granted to LLM-based systems, leading to unintended consequences.
- Overreliance: Refers to the excessive trust in AI systems’ outputs without adequate oversight, leading to the dissemination of inappropriate or harmful content.
- Model Theft: Involves unauthorized access, copying, or exfiltration of proprietary LLM models.