Topic: mechanistic-interpretability
Something interesting about mechanistic-interpretability
mechanistic-interpretability,Solutions to ML assignments from the Alignment Research Engineering Accelerator (ARENA) in-person program
User: alejoacelas
mechanistic-interpretability,Interpretability on 1-layer Transformer models that converge on the Bayesian-optimal solution for statistical tasks
User: alejoacelas
mechanistic-interpretability,Reverse-engineered Transformer models as a benchmark for interpretability methods
User: alejoacelas
mechanistic-interpretability,Starting Kit for the CodaBench competition on Transformer Interpretability
User: alejoacelas
mechanistic-interpretability,Organizer's repository for the Transformer Interpretability CodaBench competition
User: alejoacelas
mechanistic-interpretability,🦠 DeepDecipher: An open source API to MLP neurons
Organization: apartresearch
Home Page: https://apartresearch.com
mechanistic-interpretability,🧠 Starter templates for doing interpretability research
Organization: apartresearch
Home Page: https://alignmentjam.com/jam/interpretability
mechanistic-interpretability,CausalGym: Benchmarking causal interpretability methods on linguistic tasks
User: aryamanarora
Home Page: https://arxiv.org/abs/2402.12560
mechanistic-interpretability,Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages"
Organization: batsresearch
Home Page: https://arxiv.org/abs/2406.16235
mechanistic-interpretability,Exploring length generalization in the context of the indirect object identification (IOI) task for mechanistic interpretability.
User: cx0
mechanistic-interpretability,Identifying Circuit behind Pronoun Prediction in GPT-2 Small
User: daspartho
mechanistic-interpretability,A mechanistic interpretability study investigating a sequential model trained to play the board game Othello
User: deanhazineh
mechanistic-interpretability,Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
Organization: epfl-dlab
mechanistic-interpretability,graphpatch is a library for activation patching on PyTorch neural network models.
User: evan-lloyd
mechanistic-interpretability,Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
User: francescortu
Home Page: https://arxiv.org/abs/2402.11655
mechanistic-interpretability,Interpreting how transformers simulate agents performing RL tasks
User: jbloomaus
Home Page: https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
mechanistic-interpretability,PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
User: koayon
mechanistic-interpretability,A project that simulates a game of shuffling cups with a ball hidden underneath one of them. It also trains a Transformer-based deep learning model to predict the final position of the ball after a series of swaps.
User: lejoon
mechanistic-interpretability,CoSy: Evaluating Textual Explanations
User: lkopf
mechanistic-interpretability,Visualising (self)-attention as a vector field: exploring and building intuition. Based on anvaka.github.io/fieldplay.
User: matthiasdellago
mechanistic-interpretability,Explain a black-box module in natural language.
Organization: microsoft
Home Page: https://arxiv.org/abs/2305.09863
mechanistic-interpretability,This repository contains the code used for the experiments in the paper "Discovering Variable Binding Circuitry with Desiderata".
User: nix07
Home Page: https://dcm.baulab.info/
mechanistic-interpretability,This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".
User: nix07
Home Page: https://finetuning.baulab.info
mechanistic-interpretability,For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
Organization: openmoss
mechanistic-interpretability,Mechanistically interpretable neurosymbolic AI (Nature Comput Sci 2024): losslessly compressing NNs to computer code and discovering new algorithms which generalize out-of-distribution and outperform human-designed algorithms
User: pauljblazek
Home Page: https://rdcu.be/dy2Go
mechanistic-interpretability,This repository collects all relevant resources about interpretability in LLMs
User: ruizheliuoa
mechanistic-interpretability,Physiological modeling into the metaverse of Mycobacterium tuberculosis beta CA inhibition mechanism
User: sagarss24
mechanistic-interpretability,Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Organization: stanfordnlp
Home Page: http://pyvene.ai
mechanistic-interpretability,Steering vectors for transformer language models in PyTorch / Hugging Face
Organization: steering-vectors
Home Page: https://steering-vectors.github.io/steering-vectors/
mechanistic-interpretability,Sparse and discrete interpretability tool for neural networks
User: taufeeque9
Home Page: https://huggingface.co/spaces/taufeeque/codebook-features
mechanistic-interpretability,Sparse probing paper full code.
User: wesg52
Home Page: https://arxiv.org/abs/2305.01610
mechanistic-interpretability,Universal Neurons in GPT2 Language Models
User: wesg52
mechanistic-interpretability,Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
User: yash-srivastava19
Home Page: https://arrakis-mi.readthedocs.io/en/latest/README.html
mechanistic-interpretability,Implementation for the paper "Understanding and Patching Compositional Reasoning in LLMs" @ ACL2024-Findings, Bangkok, Thailand.
User: zhaoyi-li21
Home Page: https://arxiv.org/abs/2402.14328
mechanistic-interpretability,A replication of "Toy Models of Superposition," a groundbreaking machine learning research paper published by authors affiliated with Anthropic and Harvard in 2022.
User: zroe1