
EditBias: Debiasing Stereotyped Language Models via Model Editing

License: MIT License



EditBias: Debiasing Language Models via Model Editing

📃 Paper 💻 Code 🌐 Web

EditBias is an efficient model editing method that removes stereotyped bias from language models using small editor networks. Training combines two objectives: a debiasing loss, which guides edits to a subset of the model's parameters, and a remaining loss, which preserves the original language modeling ability during editing. Experimental results show that EditBias debiases effectively and is robust to gender reversal and semantic generality.
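To make the two objectives concrete, here is a minimal sketch of how a debiasing term and a remaining term might be combined into a single training objective. The specific forms are illustrative assumptions, not the repository's implementation: the debiasing term is a squared gap between the log-likelihoods of a stereotyped and an anti-stereotyped sentence, the remaining term is a KL divergence between the pre-edit and post-edit next-token distributions on unrelated text, and `lam` is a hypothetical weighting factor.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def debiasing_loss(logp_stereo, logp_anti):
    # Assumed form: penalize any gap between the log-likelihoods of the
    # stereotyped and anti-stereotyped sentences.
    return (logp_stereo - logp_anti) ** 2

def remaining_loss(p_original, p_edited):
    # Assumed form: keep the edited model's distribution on unrelated text
    # close to the original model's distribution.
    return kl_divergence(p_original, p_edited)

def total_loss(logp_stereo, logp_anti, p_original, p_edited, lam=1.0):
    # lam is a hypothetical weight balancing the two terms.
    return debiasing_loss(logp_stereo, logp_anti) + lam * remaining_loss(p_original, p_edited)

if __name__ == "__main__":
    # Equal likelihoods and an unchanged distribution give zero loss.
    print(total_loss(-2.0, -2.0, [0.5, 0.5], [0.5, 0.5]))
```

Under these assumptions, the loss is zero only when the model is indifferent between the two sentences and its behavior on unrelated text is unchanged, which is the trade-off the two losses are meant to balance.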

🆕 News

  • [Feb 2024] We released the paper and the refined code.
  • [Dec 2023] Our idea was accepted to WiNLP 2023 and presented at EMNLP 2023!
  • [Nov 2023] We released the code.

🛠️ Setup

This codebase uses Python 3.9.18. Other versions may work as well.

Create an environment and install the dependencies:

$ conda create -n editbias python=3.9
$ conda activate editbias
(editbias) $ pip install -r requirements.txt

💻 EditBias

First, editor networks are trained on StereoSet to modify a subset of the model's parameters for debiasing. Then, the trained editor networks are used to edit the language model and produce a debiased model.

⌚️ Training Editor Networks

  • Formatted datasets with train/dev/test splits are in data/stereoset. The test splits are gender_test.json, race_test.json, and religion_test.json.
  • Configurations are in config. The parameters to be edited are specified in model.
  • Experiment scripts are in scripts. All hyperparameters are set in those scripts.
  • For the ablation study on the remaining loss, set ifloc to False.

For example, we use the following command to train the editor networks for GPT2-base:

 (editbias) $ bash scripts/gpt2-base.sh >scripts/gpt2-base.log 2>&1
  • The parameters of the trained editor networks are stored in outputs/.../models/....bk. Note the path ending in .bk, for example outputs/2024-02-08_18-51-18_4100072340/models/gpt2-.2024-02-08_18-51-18_4100072340.bk, and refer to it as $p_1$.
  • Metrics can be found at the end of the training log.
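If several training runs have accumulated under outputs/, a small helper can locate the most recent .bk file to use as $p_1$. The sketch below is not part of the repository: `latest_checkpoint` is a hypothetical helper, and it assumes only that checkpoints end in .bk somewhere beneath the outputs directory.

```python
from pathlib import Path
from typing import Optional

def latest_checkpoint(outputs_dir: str = "outputs") -> Optional[Path]:
    """Return the most recently modified *.bk file under outputs_dir, or None."""
    # Search recursively, since each run writes to its own subdirectory.
    candidates = [p for p in Path(outputs_dir).rglob("*.bk") if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)

if __name__ == "__main__":
    checkpoint = latest_checkpoint()
    print(checkpoint if checkpoint is not None else "No .bk checkpoint found")
```

Selecting by modification time is a convenience; if runs were copied or touched after training, check the printed path against the training log before using it.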

🚀 Debiasing with Editor Networks

  • Set eval_only to True, archive to $p_1$, and val_set to the path of the test set file. val_batch_size should match the batch_size used in training. See gpt2-base_val.sh for an example.
  • Metrics can be found at the end of the debiasing log.
  • To test robustness to gender reversal, set val_set to data/stereoset/gender_test_reverse.json.
  • To test semantic generality, set val_set to data/stereoset/xxx_test_syn.json, where xxx is one of gender, race, or religion.

For example,

 (editbias) $ bash scripts/gpt2-base_val.sh >scripts/gpt2-base_val.log 2>&1
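When running all of the evaluations above, it can help to enumerate the candidate val_set paths in one place. The following sketch only builds the file paths named in this section; `evaluation_sets` and the labels it uses as dictionary keys are hypothetical, and the snippet does not call any repository code.

```python
from pathlib import Path

BIAS_TYPES = ("gender", "race", "religion")

def evaluation_sets(data_dir: str = "data/stereoset") -> dict:
    """Map a descriptive label to each val_set path described above."""
    root = Path(data_dir)
    sets = {}
    for bias in BIAS_TYPES:
        # Standard test split for each bias type.
        sets[f"{bias}/test"] = root / f"{bias}_test.json"
        # Semantic generality split for each bias type.
        sets[f"{bias}/semantic_generality"] = root / f"{bias}_test_syn.json"
    # The gender reverse split exists only for the gender bias type.
    sets["gender/reverse"] = root / "gender_test_reverse.json"
    return sets

if __name__ == "__main__":
    for label, path in evaluation_sets().items():
        print(f"{label}: val_set={path}")
```

Each printed path can then be set as val_set in a copy of the validation script, one run per path.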

👀 Bias Tracing

Enter the bias_tracing directory.

📝 Citation

If this code or paper was useful, please consider using the following citation:

@article{xinxu24EditBias,
    title={EditBias: Debiasing Stereotyped Language Models via Model Editing},
    author={Xin Xu and Wei Xu and Ningyu Zhang},
    year={2024},
    url={https://github.com/zjunlp/EditBias}
}

✨ Acknowledgements

  • Thanks for the original code from MEND.
  • Thanks for StereoSet and all the baselines from bias-bench.
  • For more model editing methods, please try EasyEdit.
