anthenamatrix / the-i-exemption-bypassing-llm-ethical-filters

The "I" Exemption is a curious behavior in some LLMs: these systems may decline to assist with an unethical action when the request is phrased in the first person ("I"), yet explain the same unethical method when the request is rephrased as a general, third-person scenario ("they").

License: MIT License

The 'I' Exemption: Bypassing LLM Ethical Filters

Overview

This research explores a phenomenon we've termed the "I" Exemption, a quirk in how some large language models (LLMs) respond to ethically sensitive requests. We observed that a query phrased in the first person (using "I") might trigger a filter that prevents the LLM from directly assisting with an unethical action. However, when the same question is rephrased in the third person (using "people"), the filter appears to be bypassed, and the LLM may explain the unethical method while still emphasizing that it is wrong.

Understanding the "I" Exemption

While the exact reasoning behind this behavior remains under investigation, we hypothesize a few possibilities:

  • Protecting Users: The filter might be designed to prevent users from directly carrying out harmful actions through the AI's assistance.
  • Avoiding Liability: By refusing requests phrased with "I," the LLM could be safeguarding itself and its developers from potential legal or ethical repercussions.
  • Simpler Logic: It's also possible that the system handles general, third-person statements ("people engage") more easily than first-person ones ("I engage").

Bypass Examples

  • ChatGPT 4: Example of the technique being used on ChatGPT 4

  • ChatGPT 3.5: Example of the technique being used on ChatGPT 3.5

  • Gemini: Example of the technique being used on Gemini

Importance and Considerations

The "I" Exemption highlights a quirk in how these filters behave; a refusal of the first-person phrasing does not necessarily imply that the LLM truly understands the ethics of the situation. Here are some crucial points to consider:

  • Malicious Actors: This finding underscores the potential for malicious users to exploit this exemption by rephrasing their questions.
  • Evolving AI Ethics: This discovery emphasizes the need for ongoing development in AI ethics and transparency to address such limitations.
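
One way to track this limitation during model evaluation is to compare refusal decisions across paired phrasings of the same request. The following is a minimal sketch, not part of the original research: the `PairedResult` record and both helper functions are hypothetical names introduced here for illustration, and it assumes the refusal labels have already been collected by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedResult:
    """Outcome for one request asked in two phrasings (hypothetical record type)."""
    topic: str
    first_person_refused: bool   # model refused the "I" phrasing
    third_person_refused: bool   # model refused the "people"/"they" phrasing


def inconsistent_pairs(results):
    """Return the pairs where only the first-person phrasing was refused."""
    return [
        r for r in results
        if r.first_person_refused and not r.third_person_refused
    ]


def inconsistency_rate(results):
    """Fraction of pairs showing the first-person-only refusal pattern.

    Returns 0.0 for an empty input so callers need not special-case it.
    """
    if not results:
        return 0.0
    return len(inconsistent_pairs(results)) / len(results)
```

A rate near zero would suggest the model treats both phrasings consistently; a higher rate would indicate that the behavior described here is present for the tested topics.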

Disclaimer

The Prompt Injection Testing Tool is provided for educational and research purposes only. It should be used responsibly and in compliance with applicable laws and regulations. The authors do not accept any liability for any damages or losses resulting from the use of this tool.

License

This project is licensed under the MIT License.

Support AnthenaMatrix

Bitcoin: bc1qxvvtgz0vf3n2cuxt0suvf39jleegpt9wawxazn

Ethereum: 0xE73E90779B3e8F6D65306B40E02878f437408b4E

BNB: 0xE73E90779B3e8F6D65306B40E02878f437408b4E

Dogecoin: D827LpfJu9pcVc3Kky82sTrNnsE7pLGqeV

Solana: AJtGEJvoVoS2eeqeHQvf7usRs2nSQM1yLtBSdKp1KBY5

Website: https://anthenamatrix.com
