Group 9 Read Me

IPI Defender

IPI defender is a machine learning based system designed to detect and prevent attacks or malicious prompts injected into a system. It uses a combination of natural language processing techniques and deep learning models to classify prompts as either not harmful or malicious.

Functionality

The application is capable of:

Loading and Preprocessing Data: The system loads a dataset of prompts with corresponding labels. It preprocesses the text data by converting it to lowercase, removing special characters and digits, and removing stopwords.
Feature Extraction: The IPI defender uses pretrained BERT (Bidirectional Encoder Representations from Transformers) to take meaningful representations from the preprocessed text data.
Model Training: A LSTM (Long Short-Term Memory) is trained on the taken BERT embeddings to learn to recognize prompts as not harmful and or malicious. The training process includes techniques like random oversampling, to handle imbalance.
Prompt Classification: When given a new prompt, the system can process the text, take the BERT embeddings, and using the trained LSTM model to classify if the prompt is either harmless or harmful.

Model Evaluation

The IPI Defender follows a precise approach to train and evaluate its model:

Dataset Generation: The load_dataset method generates a reproduced dataset of 100 prompts, with 50 harmless, which is created by generate_filtered_prompt and 50 harmful prompts created by generate_malicious_prompt.
Data Preprocessing: The preprocess_data method tokenizes and encodes the test data using the BERT tokenizer, and takes the BERT embeddings for each prompt.
Model Training: The train_model method trains the LSTM classifier on the training data.
Model Evaluation: After training, the model's performance can be evaluated on the test set using metrics like precision and recall. The model can be tested on the generated dataset, as shown in the example code, where each prompt is classified, and the result is printed.

Demo

To run the demo, please install python requirements from requirements.txt with pip install -r requirements.txt, then run python api.py. Then install node requirements in Reactonaut with npm i then run npm start and visit localhost:3000

Article Code

For ‘Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.’ View on GitHub

Original Paper

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.

Machine Learning Algorithms

BERT (Bidirectional Encoder Representations from Transformers): A pre-trained language model used for extracting meaningful representations from the text data.
LSTM (Long Short-Term Memory): A type of network architecture used for the classification model. The LSTM model is responsible for processing the sequence of input embeddings and captured dependencies within the text data so the model can make accurate predictions.
Adam Optimizer: This is used to update the model parameters during the training process.

To summarize, the IPI Defender application primarily relies on the BERT language model for feature extraction and an LSTM based learning model for classification.

Limitations and Future Work

While the IPI Defender application represents a significant step towards detecting and defending against potential attacks or harmful prompts injected into systems, it is important to know that there are limitations and areas for future improvement.

Limitations

Emerging Research: Indirect Prompt Injection attacks and the even broader field of securing a Large Language Model integrated applications are relatively new areas of research. There is a limited resource of well-made datasets and recognized practices, which made the development difficult to create an effective defense mechanism.
Static Model: The current implementation used relies on an already pre-trained LSTM classifier, which may not be able to adapt to evolving attack patterns or new types of harmful prompts. There will always be people finding new ways to destroy the defense mechanisms, which means continuous model updates and retraining.

Future Work

Real-World Data Collection: Future efforts should focus on collecting completely diverse real-world prompts which should include harmless and harmful samples. This could be done by collaborating with industry partners and giving public access to datasets related to Cybersecurity incidents.
Continuous Learning and Updating the system: Examining and expanding technology to have a better chance of staying ahead of evolving threats, the IPI defender should always be learning and adapting to new attacks. This could happen by implementing online learning techniques and developing mechanisms for seamless model updates without disrupting the systems' daily operation.

Citations

Security Vulnerabilities in LLM-Integrated Applications: Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. DOI:10.48550/arXiv.2302.12173
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. DOI:10.48550/arXiv.1810.04805

ethnh / tempaa Goto Github PK

tempaa's Introduction

Group 9 Read Me

IPI Defender

Functionality

Model Evaluation

Demo

Article Code

Original Paper

Machine Learning Algorithms

Limitations and Future Work

Limitations

Future Work

Citations

tempaa's People

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs