
ts_watermark's Introduction

Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

This repository contains the code for our ICML 2024 paper, Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models. The full paper is available at arXiv:2402.18059.

Introduction

We introduce a novel watermarking method for large language models (LLMs), focusing on two primary objectives:

  • Detectability: Measured by the z-score.
  • Semantic Coherence: Assessed by the cosine similarity between the embeddings of watermarked and non-watermarked texts.

These metrics are controlled by two hyperparameters: the split ratio ($\gamma$) and the watermark logit ($\delta$). Rather than fixing them globally, we adjust their values per token to account for each token's characteristics.
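As background, the detectability z-score follows the standard green-list watermark test: under the null hypothesis (no watermark), each token falls in the green list with probability $\gamma$, so the green-token count over $T$ tokens is binomial. A minimal sketch of that statistic (the function name is illustrative, not the repository's API):

```python
import math

def watermark_z_score(green_count: int, total_tokens: int, gamma: float) -> float:
    """z-score of the observed green-token count against the
    Binomial(total_tokens, gamma) null distribution."""
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std

# e.g. 130 of 200 generated tokens on the green list with gamma = 0.5
z = watermark_z_score(130, 200, 0.5)  # well above typical detection thresholds
```

A large positive z-score indicates far more green tokens than chance would produce, i.e. a detectable watermark.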

To determine token-specific values for $\gamma$ and $\delta$, we use two lightweight networks: the $\gamma$-generator ($G_\gamma$) and the $\delta$-generator ($G_\delta$). These networks are optimized using a specialized multi-objective optimization framework. Below is an overview of our proposed training method:

[Figure: overview of the proposed training method]
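The generators plug into the standard green-list watermarking step: the vocabulary is pseudo-randomly split into a "green" fraction of size $\gamma$ (seeded from the preceding context), and the logits of green tokens are boosted by $\delta$. A pure-Python sketch of that biasing step, with $\gamma$ and $\delta$ as the per-token inputs the generators would supply (`green_list` and `watermark_logits` are hypothetical helpers, not this repository's API):

```python
import hashlib
import random

def green_list(prev_token_id: int, vocab_size: int, gamma: float) -> set:
    # Seed a PRNG from the previous token id so the split is
    # reproducible at detection time; keep a gamma fraction as "green".
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16)
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def watermark_logits(logits, prev_token_id, gamma, delta):
    # Bias the next-token logits: add the watermark logit delta to every
    # green-list token, leaving red-list tokens untouched.
    green = green_list(prev_token_id, len(logits), gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]
```

The token-specific method replaces the fixed `gamma` and `delta` arguments with the outputs of $G_\gamma$ and $G_\delta$ evaluated on the current token's representation.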

Environment Setup

Ensure that all packages listed in requirements.txt are installed in your environment.

Demo

For a quick start, see demo.ipynb. This notebook generates watermarked text from a given prompt and computes the z-score, perplexity (PPL), and SimCSE similarity. Note that the demo supports OPT models only; for Llama models, run watermark.py as described below. Our token-specific gamma/delta values were trained on the OPT tokenizer, so an additional vocabulary conversion (implemented in watermark.py) is needed to evaluate on Llama.
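One plausible shape for such a cross-tokenizer conversion is to carry each learned per-token value over by exact token-string match, falling back to a default for target tokens absent from the source vocabulary. This is a sketch under that assumption only; the repository's actual conversion lives in watermark.py and may differ:

```python
def convert_token_params(src_vocab: dict, tgt_vocab: dict,
                         src_values: dict, default: float) -> dict:
    # src_vocab / tgt_vocab map token strings to ids; src_values maps a
    # source token id to its learned parameter (gamma or delta). Target
    # tokens with no exact string match fall back to the default value.
    out = {}
    for tok, tgt_id in tgt_vocab.items():
        src_id = src_vocab.get(tok)
        out[tgt_id] = src_values[src_id] if src_id is not None else default
    return out
```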

Training

To train the network, run the following command:

bash run_pipeline.sh

Select between Multi-Objective Optimization (MOO) and Weighted Sum training by setting z_score_factor:

  • For MOO: z_score_factor=1.0
  • For Weighted Sum: z_score_factor=4e-4
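As a rough sketch of how the two objectives combine under the Weighted Sum option (under MOO the two gradients are balanced by the multi-objective solver rather than by a literal scalar sum, so the factor of 1.0 there is not a plain weight). Both objectives are maximized, so each contributes a negated loss term; the function name is illustrative:

```python
def weighted_sum_loss(z_score: float, cos_sim: float,
                      z_score_factor: float) -> float:
    # Detection loss L_D = -z_score (maximize detectability);
    # semantic loss  L_S = -cos_sim (maximize semantic coherence).
    return -cos_sim + z_score_factor * (-z_score)
```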

Evaluation

Default Settings

  • LLM: OPT-1.7B
  • Sampling: Multinomial sampling with temperature=1.0, top_k=50
  • Dataset: the official validation split of C4 realnewslike from Hugging Face, further divided into our own validation and test sets; the test split is used by default.
  • Sample Generation: 500 prompts, each generating 200 tokens.
  • Batch Size: defaults to 20, requiring approximately 25GB of GPU memory for the OPT-1.7B model.

To modify default settings, check the config folder. For details on each keyword, refer to config/README.md.

Running Evaluation

Results are stored in the eval folder by default.

  • Our Method:

    CUDA_VISIBLE_DEVICES=0 python watermark.py --config_file config/TS.yaml
    
    • If testing on Llama models, in config/TS.yaml change model_name_or_path to the desired model name or local path, and change ckpt_path to ckpt/llama/init_0.25_1.75_default.pth.
    • Adjust the watermark strength by choosing different checkpoints in the ckpt folder, which were trained from different initializations. Use ckpt/opt for OPT models and ckpt/llama for Llama models.
  • KGW:

    CUDA_VISIBLE_DEVICES=0 python watermark.py --config_file config/KGW.yaml
    

Citation

If you use this work in your research or applications, please cite it as follows:

@article{huo2024token,
  title={Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models},
  author={Huo, Mingjia and Somayajula, Sai Ashish and Liang, Youwei and Zhang, Ruisi and Koushanfar, Farinaz and Xie, Pengtao},
  journal={arXiv preprint arXiv:2402.18059},
  year={2024}
}


ts_watermark's Issues

Should add entropy thresholding in SWEET detection

Hi, thanks for the great repo.

Unfortunately, I found a bug in the SWEET implementation.

According to the original SWEET paper and its repository, entropy thresholding is applied during both the generation phase and the detection phase.

However, in your reproduction of the SWEET method, the lines that apply the entropy threshold in the detection phase are omitted.
(c.f. the implementation in the original SWEET code: https://github.com/hongcheki/sweet-watermark/blob/master/sweet.py#L100)

Is this the code you used for your experiments?

If this code was used in your paper's experiments, and you also identify this as a bug, please consider fixing it and rerunning the experiments.

Thank you.

How to train the δ generator?

Forgive me for asking a basic question, but I really cannot understand how the training works.
During training, L_S = −cos_sim(f_θ(s), f_θ(s_w)), but how do you obtain s_w? Does it mean using a fixed δ to generate watermarked text for training, or something else? Thanks!
