GithubHelp home page GithubHelp logo

tempaa's Introduction

Group 9 Read Me

IPI Defender

IPI defender is a machine learning based system designed to detect and prevent attacks or malicious prompts injected into a system. It uses a combination of natural language processing techniques and deep learning models to classify prompts as either not harmful or malicious.

Functionality

The application is capable of:

  • Loading and Preprocessing Data: The system loads a dataset of prompts with corresponding labels. It preprocesses the text data by converting it to lowercase, removing special characters and digits, and removing stopwords.

  • Feature Extraction: The IPI defender uses pretrained BERT (Bidirectional Encoder Representations from Transformers) to take meaningful representations from the preprocessed text data.

  • Model Training: A LSTM (Long Short-Term Memory) is trained on the taken BERT embeddings to learn to recognize prompts as not harmful and or malicious. The training process includes techniques like random oversampling, to handle imbalance.

  • Prompt Classification: When given a new prompt, the system can process the text, take the BERT embeddings, and using the trained LSTM model to classify if the prompt is either harmless or harmful.

Model Evaluation

The IPI Defender follows a precise approach to train and evaluate its model:

  • Dataset Generation: The load_dataset method generates a reproduced dataset of 100 prompts, with 50 harmless, which is created by generate_filtered_prompt and 50 harmful prompts created by generate_malicious_prompt.

  • Data Preprocessing: The preprocess_data method tokenizes and encodes the test data using the BERT tokenizer, and takes the BERT embeddings for each prompt.

  • Model Training: The train_model method trains the LSTM classifier on the training data.

  • Model Evaluation: After training, the model's performance can be evaluated on the test set using metrics like precision and recall. The model can be tested on the generated dataset, as shown in the example code, where each prompt is classified, and the result is printed.

Demo

To run the demo, please install python requirements from requirements.txt with pip install -r requirements.txt, then run python api.py. Then install node requirements in Reactonaut with npm i then run npm start and visit localhost:3000

Article Code

  • For ‘Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.’ View on GitHub

Original Paper

  • Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.

Machine Learning Algorithms

  • BERT (Bidirectional Encoder Representations from Transformers): A pre-trained language model used for extracting meaningful representations from the text data.

  • LSTM (Long Short-Term Memory): A type of network architecture used for the classification model. The LSTM model is responsible for processing the sequence of input embeddings and captured dependencies within the text data so the model can make accurate predictions.

  • Adam Optimizer: This is used to update the model parameters during the training process.

To summarize, the IPI Defender application primarily relies on the BERT language model for feature extraction and an LSTM based learning model for classification.

Limitations and Future Work

While the IPI Defender application represents a significant step towards detecting and defending against potential attacks or harmful prompts injected into systems, it is important to know that there are limitations and areas for future improvement.

Limitations

  • Emerging Research: Indirect Prompt Injection attacks and the even broader field of securing a Large Language Model integrated applications are relatively new areas of research. There is a limited resource of well-made datasets and recognized practices, which made the development difficult to create an effective defense mechanism.

  • Static Model: The current implementation used relies on an already pre-trained LSTM classifier, which may not be able to adapt to evolving attack patterns or new types of harmful prompts. There will always be people finding new ways to destroy the defense mechanisms, which means continuous model updates and retraining.

Future Work

  • Real-World Data Collection: Future efforts should focus on collecting completely diverse real-world prompts which should include harmless and harmful samples. This could be done by collaborating with industry partners and giving public access to datasets related to Cybersecurity incidents.

  • Continuous Learning and Updating the system: Examining and expanding technology to have a better chance of staying ahead of evolving threats, the IPI defender should always be learning and adapting to new attacks. This could happen by implementing online learning techniques and developing mechanisms for seamless model updates without disrupting the systems' daily operation.

Citations

  • Security Vulnerabilities in LLM-Integrated Applications: Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. DOI:10.48550/arXiv.2302.12173

  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. DOI:10.48550/arXiv.1810.04805

tempaa's People

Watchers

Ethan Hindmarsh For President avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.