This project forked from arthurconmy/automatic-circuit-discovery

License: MIT License

Shell 0.02% Python 6.56% TeX 0.02% Makefile 0.43% Jupyter Notebook 92.96% Dockerfile 0.01%

Automatic Circuit DisCovery

This is the accompanying code to the paper "Towards Automated Circuit Discovery for Mechanistic Interpretability" (NeurIPS 2023 Spotlight).

  • ⚡ To run ACDC, see acdc/main.py, or this Colab notebook
  • 🔧 To see how to edit edges in models' computational graphs, see notebooks/editing_edges.py or this Colab notebook
  • ❇️ To understand the low-level implementation of completely editable computational graphs, see this Colab notebook or notebooks/implementation_demo.py

This library builds upon the abstractions (HookPoints and standardised HookedTransformers) from TransformerLens 🔎
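
To give a feel for the hook-point abstraction mentioned above, here is a minimal pure-Python sketch. It is a toy illustration and not TransformerLens's actual API: `ToyHookPoint`, `add_hook`, and `toy_layer` are hypothetical names, and plain floats stand in for tensors. The idea is that a hook point acts as the identity on an intermediate activation, so that user-supplied functions can read or overwrite the value passing through it.

```python
from typing import Callable, List

Hook = Callable[[float], float]


class ToyHookPoint:
    """Identity on an activation unless hooks are attached (hypothetical toy class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hooks: List[Hook] = []

    def add_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def __call__(self, activation: float) -> float:
        # Apply hooks in the order they were added; with no hooks this is the identity.
        for hook in self.hooks:
            activation = hook(activation)
        return activation


def toy_layer(x: float, hook_point: ToyHookPoint) -> float:
    """A stand-in 'layer': doubles its input, exposes the result, then adds one."""
    hidden = hook_point(2.0 * x)
    return hidden + 1.0


hook_point = ToyHookPoint("toy.hook_hidden")
clean_output = toy_layer(3.0, hook_point)  # no hooks attached yet

# Overwrite the hidden activation with zero, a crude form of ablation.
hook_point.add_hook(lambda activation: 0.0)
ablated_output = toy_layer(3.0, hook_point)
```

Comparing `clean_output` with `ablated_output` shows the effect of overwriting one intermediate value, which is the basic operation that editing a computational graph builds on.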

Installation:

First, install the system dependencies for either Mac or Linux.

Then, you need Python 3.8+ and Poetry to install ACDC, like so:

git clone https://github.com/ArthurConmy/Automatic-Circuit-Discovery.git
cd Automatic-Circuit-Discovery
poetry env use 3.10      # Or be inside a conda or venv environment
                         # Python 3.10 is recommended but use any Python version >= 3.8
poetry install

System Dependencies

🐧 Ubuntu Linux

sudo apt-get update && sudo apt-get install libgl1-mesa-glx graphviz build-essential graphviz-dev

You may also need to run sudo apt-get install python3.x-dev, where x is your Python minor version (see also the issue and the pygraphviz installation troubleshooting).

🍎 Mac OS X

On Mac, you need to let pip (inside poetry) know about the path to the Graphviz libraries.

brew install graphviz
export CFLAGS="-I$(brew --prefix graphviz)/include"
export LDFLAGS="-L$(brew --prefix graphviz)/lib"

Reproducing results

To reproduce the Pareto frontier of KL divergence against number of edges for ACDC runs, run python experiments/launch_induction.py. Similarly, python experiments/launch_sixteen_heads.py and python subnetwork_probing/train.py were used to generate individual data points for the other methods; see each script's CLI help for its options. All three commands can produce wandb runs. We use notebooks/roc_plot_generator.py to process data from wandb runs into JSON files (see experiments/results/plots_data/Makefile for the commands) and notebooks/make_plotly_plots.py to produce plots from these JSON files.
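
As a rough illustration of what a Pareto frontier of KL divergence against edge count means, the sketch below keeps only the runs that no other run beats on both axes. This is not the repository's plotting code: `pareto_frontier` is a hypothetical helper and the data points are made up, assuming lower is better for both the number of edges and the KL divergence.

```python
from typing import List, Tuple

Run = Tuple[int, float]  # (number of edges, KL divergence); lower is better for both


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Return the runs not dominated by any other run, sorted by edge count."""
    frontier: List[Run] = []
    best_kl = float("inf")
    # Sort by edges, then KL, so each run only needs comparing to the best KL so far.
    for edges, kl in sorted(set(runs)):
        if kl < best_kl:
            frontier.append((edges, kl))
            best_kl = kl
    return frontier


# Made-up data points, purely for illustration.
example_runs = [(10, 2.5), (20, 1.0), (20, 1.4), (35, 1.2), (50, 0.3), (80, 0.3)]
frontier = pareto_frontier(example_runs)
```

Here `(80, 0.3)` is dropped because `(50, 0.3)` reaches the same KL divergence with fewer edges.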

Tests

From the root directory, run

pytest -vvv -m "not slow"

This selects only the tests not marked as slow. The slow tests take a long time; they are worth running occasionally, but not on every change.

You can run the slow tests with

pytest -s -m slow

Contributing

We welcome issues where the code is unclear!

If your PR affects the main demo, rerun

chmod +x experiments/make_notebooks.sh
./experiments/make_notebooks.sh

to automatically turn main.py into a working demo and check that no errors arise. It is essential that the notebooks converted here consist only of #%% [markdown] cells, which contain markdown only, and #%% cells, which contain code.
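
The cell-format requirement above can be checked mechanically. The following is a minimal sketch and not part of the repository: `split_cells` is a hypothetical helper that splits a script's source on lines starting with #%% and labels each cell as markdown or code, assuming the markers appear exactly as `#%% [markdown]` and `#%%`.

```python
from typing import List, Tuple


def split_cells(source: str) -> List[Tuple[str, str]]:
    """Split script text into (kind, body) cells, where kind is 'markdown' or 'code'."""
    cells: List[Tuple[str, List[str]]] = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#%%"):
            marker_rest = stripped[len("#%%"):].strip()
            if marker_rest == "[markdown]":
                kind = "markdown"
            elif marker_rest == "":
                kind = "code"
            else:
                raise ValueError(f"unexpected cell marker: {stripped!r}")
            cells.append((kind, []))
        elif cells:
            cells[-1][1].append(line)
        elif stripped:
            # Non-blank content before the first marker belongs to no cell.
            raise ValueError("content found before the first #%% marker")
    return [(kind, "\n".join(body).strip()) for kind, body in cells]


example = """#%% [markdown]
# Demo title
#%%
x = 1 + 1
print(x)
"""
cells = split_cells(example)
```

A script that raises `ValueError` here contains a marker or leading content outside the two allowed cell kinds.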

Citing ACDC

If you use ACDC, please reach out! You can reference the work as follows:

@inproceedings{conmy2023automated,
      title={Towards Automated Circuit Discovery for Mechanistic Interpretability}, 
      author={Arthur Conmy and Augustine N. Mavor-Parker and Aengus Lynch and Stefan Heimersheim and Adri{\`a} Garriga-Alonso},
      booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
      year={2023},
      eprint={2304.14997},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

TODO

Mostly finished TODO list

[ x ] Make TransformerLens install be Neel's code not my PR

[ x ] Add hook_mlp_in to TransformerLens and delete hook_resid_mid (and test to ensure no bad things?)

[ x ] Delete arthur-try-merge-tl references from the repo

[ x ] Make notebook on abstractions

[ ? ] Fix huge edge sizes in Induction Main example and change that occurred

[ x ] Find a better way to deal with the versioning on the Colabs installs...

[ ] Neuron-level experiments

[ ] Position-level experiments

[ ] Edge gradient descent experiments

[ ] Implement the circuit breaking paper

[ x ] tracr and other dependencies better managed

[ ? ] Make SP tests work (lots outdated so skipped) - and check SubnetworkProbing installs properly (no __init__.py files !!!)

[ ? ] Make the 9 tests also failing on TransformerLens-main pass

[ x ] Remove Codebase under construction
