
benw8888 / stegllm


License: Apache License 2.0

Dockerfile 0.08% Python 74.26% Perl 0.99% Shell 1.00% HTML 13.70% Makefile 0.03% Jupyter Notebook 9.94%


StegLLM

Large Language Models and Beyond Class Project

Our goal is to understand whether language models will learn to hide information in plain sight given reward pressure in a reinforcement learning environment.

Abstract:

Steganography, the art of conveying a secret message while concealing its existence in plain sight, is a capability that would be concerning to see in language models. Language models capable of steganography would be able to communicate and plan without our supervision, which poses a safety concern. We therefore investigate to what extent current language models can perform steganography in an end-to-end reinforcement learning setting. We compare the performance of LLaMA (Touvron et al., 2023), a language model with 6 billion parameters, with GPT-2 (small) (Radford et al., 2019), a 124 million parameter model. We train the models in a multi-agent setting using Proximal Policy Optimization (Schulman et al., 2017), a popular reinforcement learning technique, and employ optimizations for training large language models, including ZeRO (Rajbhandari et al., 2020) and LoRA (Hu et al., 2021). We show that while GPT-2 (small) fails at the steganography task, LLaMA is able to successfully encode and decode secret messages, although the steganography scheme it employs is limited in complexity. Our work lays the foundation for steganographic capability evaluations of future language models and points to a concerning trend regarding our ability to supervise scaled-up models.
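To make the encode/decode setup concrete, here is a minimal toy sketch of one round of such a game in plain Python. It is not the project's training code: the codebook, the cover sentence, the binary reward, and all function names are hypothetical, chosen only to show what a low-complexity scheme and a shared success reward might look like. In the actual project the encoder and decoder are language models trained with PPO, not fixed functions.

```python
import random

# Hypothetical codebook: one cover word per secret bit. This stands in for the
# kind of low-complexity scheme a model might converge to; it is not the
# scheme reported by the project.
CODEBOOK = {0: "sunny", 1: "rainy"}


def encode(bit: int) -> str:
    """Toy encoder: hide one bit in an innocuous-looking sentence."""
    return f"The weather today is {CODEBOOK[bit]}."


def decode(text: str) -> int:
    """Toy decoder: recover the bit from the cover word, defaulting to 0."""
    return 1 if CODEBOOK[1] in text else 0


def reward(secret: int, guess: int) -> float:
    """Assumed shared reward: 1.0 if the decoder recovers the secret, else 0.0."""
    return 1.0 if guess == secret else 0.0


def run_episodes(n: int, seed: int = 0) -> float:
    """Average reward over n episodes with uniformly random secret bits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        secret = rng.randint(0, 1)
        total += reward(secret, decode(encode(secret)))
    return total / n


if __name__ == "__main__":
    print(run_episodes(100))
```

In a reinforcement learning version of this game, the fixed `encode` and `decode` functions would be replaced by sampled model outputs, and a reward of this kind would be the pressure under which a hidden scheme could emerge.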

Contributors: benw8888, lenishor
