This project forked from google/belief-localization

This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Can Be Injected in Language Models."

License: Apache License 2.0

Does Localization Inform Editing?

This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models." It is built on top of code from the MEMIT repository.

Table of Contents

  1. Installation
  2. Causal Tracing
  3. Model Editing Evaluation
  4. Data Analysis

Installation

To install the required packages, first create a virtual environment or a conda environment (via third_party/scripts/setup_conda.sh), then run:

cd third_party
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"

Causal Tracing

We gather causal tracing results from the first 2000 points in the CounterFact dataset, which filters down to 652 correctly completed prompts when using GPT-J. The window_sizes argument controls which tracing window sizes to use. To reproduce all GPT-J results in the paper, run tracing experiments with window sizes 10, 5, 3, and 1. This can be done with the following command:

python -m experiments.tracing \
    -n 2000 \
    --ds_name counterfact \
    --model_name EleutherAI/gpt-j-6B \
    --run 1 \
    --window_sizes "10 5 3 1"
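To make the window sizes concrete, here is a minimal sketch of how a window might map to a set of layers. It rests on two assumptions that are not confirmed by this README: that --window_sizes is parsed as a space-separated string (as the command above suggests), and that a window is centered on a layer and clipped at the first and last layer of the model, which is the convention used in the ROME causal tracing code. The function names are hypothetical and are not part of this repository.

```python
def parse_window_sizes(arg):
    """Split a value such as "10 5 3 1" into a list of ints."""
    return [int(tok) for tok in arg.split()]


def window_layers(center, window, num_layers):
    """Layers covered by a window of `window` layers centered on `center`.

    Assumption: the window is clipped at the model boundaries, so windows
    near the first or last layer cover fewer than `window` layers.
    """
    start = max(0, center - window // 2)
    stop = min(num_layers, center - (-window // 2))
    return list(range(start, stop))


if __name__ == "__main__":
    num_layers = 28  # GPT-J-6B has 28 transformer layers
    for w in parse_window_sizes("10 5 3 1"):
        print(w, window_layers(10, w, num_layers))
```

Under these assumptions, a window of size 1 restores a single layer, while larger windows restore a contiguous band of layers around it.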

Model Editing Evaluation

We check the relationship between causal tracing localization and editing performance using several editing methods applied to five different variants of the basic model editing problem. The editing methods are:

  • Constrained finetuning with Adam at one layer
  • Constrained finetuning with Adam at five adjacent layers
  • ROME (which edits one layer)
  • MEMIT (which edits five layers)

The editing problems include the original model editing problem specified by the CounterFact dataset (changing the prediction for a given input), as well as the variants listed below. The following command evaluates ROME with a tracing window size of 1:

python3 -m experiments.evaluate \
    -n 2000 \
    --alg_name ROME \
    --window_sizes "1" \
    --ds_name cf \
    --model_name EleutherAI/gpt-j-6B \
    --run 1 \
    --edit_layer -2 \
    --correctness_filter 1 \
    --norm_constraint 1e-4 \
    --kl_factor 1 \
    --fact_token subject_last

Add one of the following flags to select a variant of the editing problem:

  • Error Injection: no flag
  • Tracing Reversal: --tracing_reversal
  • Fact Erasure: --fact_erasure
  • Fact Amplification: --fact_amplification
  • Fact Forcing: --fact_forcing

For example, to run with constrained finetuning across 5 layers in order to do Fact Erasure, run:

python3 -m experiments.evaluate \
    -n 2000 \
    --alg_name FT \
    --window_sizes "5" \
    --ds_name cf \
    --model_name EleutherAI/gpt-j-6B \
    --run 1 \
    --edit_layer -2 \
    --correctness_filter 1 \
    --norm_constraint 1e-4 \
    --kl_factor .0625 \
    --fact_erasure
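
When sweeping over several editing methods and problem variants, it can help to assemble these commands programmatically. The sketch below is a hypothetical helper, not part of this repository. It only reuses the flags and values shown in the commands and the variant list above, and it assumes those flags can be combined freely, which should be checked against experiments/evaluate.py.

```python
import shlex

# Variant flags copied from the list in this README.
VARIANT_FLAGS = {
    "error_injection": None,  # default problem, no extra flag
    "tracing_reversal": "--tracing_reversal",
    "fact_erasure": "--fact_erasure",
    "fact_amplification": "--fact_amplification",
    "fact_forcing": "--fact_forcing",
}


def build_evaluate_command(alg_name, window_size, variant,
                           kl_factor="1", extra_args=()):
    """Return the argument list for one experiments.evaluate run.

    Hypothetical helper: the fixed values mirror the example commands
    in this README and are not validated against the actual CLI.
    """
    if variant not in VARIANT_FLAGS:
        raise ValueError(f"unknown variant: {variant}")
    args = [
        "python3", "-m", "experiments.evaluate",
        "-n", "2000",
        "--alg_name", alg_name,
        "--window_sizes", str(window_size),
        "--ds_name", "cf",
        "--model_name", "EleutherAI/gpt-j-6B",
        "--run", "1",
        "--edit_layer", "-2",
        "--correctness_filter", "1",
        "--norm_constraint", "1e-4",
        "--kl_factor", str(kl_factor),
    ]
    args.extend(extra_args)
    flag = VARIANT_FLAGS[variant]
    if flag is not None:
        args.append(flag)
    return args


if __name__ == "__main__":
    # Constrained finetuning across 5 layers, Fact Erasure variant.
    cmd = build_evaluate_command("FT", 5, "fact_erasure", kl_factor=".0625")
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run directly. The ROME example above additionally passes --fact_token subject_last, which can be supplied through extra_args.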

Data Analysis

Data analysis for this work is done in R via the data_analysis.ipynb file. All plots and regression analyses in the paper can be reproduced via this file.

Disclaimer

This is not an officially supported Google product.

Contributors

asmadotgh, peterbhase
