fair_fairness_benchmark

FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods.



Update:

  • [08/23/2023]: update all running logs from our 45,079 experiments (14,428 GPU hours)!
  • [08/18/2023]: add a jupyter notebook tutorial for running FFB!
  • [08/18/2023]: add a step-by-step guideline for running FFB!
  • [08/18/2023]: add NLP task - Jigsaw Toxic Comment Classification!
  • [08/01/2023]: design a logo for FFB!
  • [07/12/2023]: update the datasets and downloading instructions!

1. Overview

The Fair Fairness Benchmark is a PyTorch-based framework for evaluating the fairness of machine learning models. The framework is designed to be simple and customizable, making it accessible to researchers with varying levels of expertise. The benchmark includes a set of predefined fairness metrics and algorithms, but users can easily modify or add new metrics and algorithms to suit their specific research questions. For more information, please refer to our paper FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods.
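As a flavor of what such a metric looks like, demographic parity compares positive-prediction rates across sensitive groups. The sketch below is our own illustration (the helper name is ours, not FFB's API):

```python
import numpy as np

def demographic_parity_gap(y_pred, sensitive):
    """Absolute difference in positive-prediction rates between two groups.

    y_pred: binary predictions (0/1); sensitive: binary group labels (0/1).
    Illustrative helper, not FFB's own implementation.
    """
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)
    rate_a = y_pred[sensitive == 0].mean()  # positive rate in group 0
    rate_b = y_pred[sensitive == 1].mean()  # positive rate in group 1
    return abs(rate_a - rate_b)

# A predictor satisfying demographic parity has gap 0.
print(demographic_parity_gap([1, 0, 1, 0], [0, 0, 1, 1]))  # → 0.0
```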

2. Our Goals

This benchmark aims to be

  • minimalistic
  • hackable
  • beginner-friendly
  • torch-idiomatic
  • reference implementation for researchers
  • ......

3. Fair Fairness Benchmark (FFB)

3.1 Datasets

Please refer to datasets/readme.md for dataset download instructions.

  • UCI Adult: U.S. census data predicting an individual's income over $50K using demographic and financial details.
  • COMPAS: Criminal defendants' records used to predict recidivism within two years.
  • German Credit: Information about credit applicants at a German bank used for credit risk rating prediction.
  • Bank Marketing: Data from a Portuguese bank used to predict whether a client will subscribe to a term deposit.
  • ACS: From the American Community Survey, used for multiple prediction tasks such as income and employment.
  • KDD Census: Like UCI Adult but with more instances, used to predict if an individual’s income is over $50K.
  • CelebFaces Attributes (CelebA): 200k+ celebrity face images, each annotated with 40 binary labels of specific facial attributes.
  • UTKFace: Over 20k face images from diverse ethnicities and ages, annotated with age, gender, and ethnicity.

The statistics of the datasets are as follows:

3.2 In-Processing Group Fairness Methods

  • ERM: Standard machine learning method that minimizes the empirical risk of the training data. Serves as a common baseline for fairness methods.
  • DiffDP, DiffEopp, DiffEodd: Gap regularization methods for demographic parity, equalized opportunity, and equalized odds. These fairness definitions cannot be optimized directly, but gap regularization is differentiable and can be optimized using gradient descent.
  • PRemover: Aims to minimize the mutual information between the model's predictions and the sensitive attributes.
  • HSIC: Minimizes the Hilbert-Schmidt Independence Criterion between the model's predictions and the sensitive attributes.
  • AdvDebias: Learns a classifier that maximizes the prediction ability and simultaneously minimizes an adversary's ability to predict the sensitive attributes from the predictions.
  • LAFTR: A fair representation learning method aiming to learn an intermediate representation that minimizes the classification loss, reconstruction error, and the adversary's ability to predict the sensitive attributes from the representation.
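To make the gap-regularization idea concrete, here is a minimal PyTorch sketch of a differentiable demographic-parity penalty added to the task loss. This is our illustration; the function name and details are assumptions, not FFB's exact code:

```python
import torch
import torch.nn.functional as F

def diff_dp_penalty(logits, sensitive):
    """Differentiable demographic-parity gap: absolute difference between
    the mean predicted probabilities of the two sensitive groups.
    Illustrative sketch only; FFB's implementation may differ.
    """
    probs = torch.sigmoid(logits)
    gap = probs[sensitive == 0].mean() - probs[sensitive == 1].mean()
    return gap.abs()

# Total loss = task loss + lam * penalty (cf. the --lam flag in the run examples)
logits = torch.randn(8, requires_grad=True)
labels = torch.randint(0, 2, (8,)).float()
sensitive = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])
loss = F.binary_cross_entropy_with_logits(logits, labels) \
       + 1.4 * diff_dp_penalty(logits, sensitive)
loss.backward()  # the penalty is differentiable, so plain gradient descent applies
```

Because the hard demographic-parity constraint is non-differentiable, this soft gap penalty is what makes gradient-based training possible.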

3.3 Our Results

1. Not all widely used fairness datasets stably exhibit fairness issues. We found that in some cases, the bias in these datasets is either not consistently present or its manifestation varies significantly. This finding indicates that relying on these datasets for fairness analysis might not always provide stable or reliable results.

2. The utility-fairness performance of current fairness methods exhibits trade-offs. We conducted experiments with various in-processing fairness methods and analyzed how well each can adjust the trade-off to cater to specific needs while maintaining a balance between accuracy and fairness.

4. How to Run

4.1 Setup

To install the Fair Fairness Benchmark, simply clone this repository and install the required dependencies by running the following command:

pip install -r requirements.txt

4.2 Run Example

python -u ./ffb_tabular_erm.py --dataset acs --model erm --sensitive_attr age --target_attr income --batch_size 32 --seed 89793 --log_freq 1 --num_training_steps 150
python -u ./ffb_tabular_diffdp.py --dataset acs --model diffdp --sensitive_attr race --target_attr income --batch_size 4096 --lam 1.4 --seed 89793 --log_freq 1 --num_training_steps 150

5. Step-by-Step Guideline for Running FFB

Step 1: System Preparation

Ensure you have Anaconda or Miniconda installed on your system. If not, download and install it from the official Miniconda site. The key Python packages are:

pandas==1.5.3
torch==1.13.1+cu116
wandb==0.14.0
scikit-learn==1.2.2
tabulate==0.9.0
statsmodels==0.13.5

Step 2: Clone the Repository

# Navigate to your preferred directory
cd path/to/your/directory

# Clone the repository from GitHub
git clone https://github.com/ahxt/fair_fairness_benchmark.git

Step 3: Setting Up Conda Environment

# Navigate to the cloned directory
cd fair_fairness_benchmark

# Create a new conda environment
conda create --name ffb_env python=3.8

# Activate the environment
conda activate ffb_env

# Install required packages
pip install -r requirements.txt

Step 4: Setting up wandb

Weights & Biases (wandb) is an experiment-tracking tool, and the code in this repository uses it. We highly recommend using wandb for tracking; if you prefer not to, simply delete every line of code that includes "wandb". Follow these steps to set up wandb:

# Install wandb
pip install wandb

# Login to your wandb account. If you don't have one, you'll be prompted to create it.
wandb login

Step 5: Running the Code

Before running, ensure you've downloaded the necessary datasets as per the instructions in datasets/readme.md.

# Run the first example
python -u ./ffb_tabular_erm.py --dataset acs --model erm --sensitive_attr age --target_attr income --batch_size 32 --seed 89793 --log_freq 1 --num_training_steps 150

# Run the second example
python -u ./ffb_tabular_diffdp.py --dataset acs --model diffdp --sensitive_attr race --target_attr income --batch_size 4096 --lam 1.4 --seed 89793 --log_freq 1 --num_training_steps 150
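To explore the utility-fairness trade-off, you can sweep the fairness strength. A sketch of such a sweep (flag values are taken from the examples above; adjust to taste):

```shell
# Illustrative sweep over the fairness-regularization weight --lam for DiffDP
for lam in 0.1 0.5 1.0 1.4 2.0; do
    python -u ./ffb_tabular_diffdp.py --dataset acs --model diffdp \
        --sensitive_attr race --target_attr income --batch_size 4096 \
        --lam "$lam" --seed 89793 --log_freq 1 --num_training_steps 150
done
```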

6. Running Logs

| Dataset (Method) | Name | wandb logs | #Runs | Wall time (GPU hours) |
|---|---|---|---|---|
| Tabular (ERM) | exp1.erm | https://wandb.ai/fair_benchmark/exp1.erm | 190 | 35 |
| Tabular (AdvDebias) | exp1.adv_gr | https://wandb.ai/fair_benchmark/exp1.adv_gr | 2660 | 863 |
| Image (ERM) | exp2.erm | https://wandb.ai/fair_benchmark/exp2.erm | 720 | 166 |
| Image (DiffDP) | exp2.diffdp | https://wandb.ai/fair_benchmark/exp2.diffdp | 5040 | 1213 |
| Image (DiffEopp) | exp2.diffeopp | https://wandb.ai/fair_benchmark/exp2.diffeopp | 5040 | 1215 |
| Tabular (PRemover) | exp1.pr | https://wandb.ai/fair_benchmark/exp1.pr | 2850 | 312 |
| Tabular (HSIC) | exp1.hsic | https://wandb.ai/fair_benchmark/exp1.hsic | 2850 | 749 |
| Image (DiffEodd) | exp2.diffeodd | https://wandb.ai/fair_benchmark/exp2.diffeodd | 5040 | 2638 |
| Tabular (LAFTR) | exp1.laftr | https://wandb.ai/fair_benchmark/exp1.laftr | 1902 | 199 |
| Tabular (DiffEopp) | exp1.diffeopp | https://wandb.ai/fair_benchmark/exp1.diffeopp | 2660 | 533 |
| Tabular (DiffEodd) | exp1.diffeodd | https://wandb.ai/fair_benchmark/exp1.diffeodd | 2664 | 529 |
| Image (PRemover) | exp2.pr | https://wandb.ai/fair_benchmark/exp2.pr | 5402 | 2860 |
| Tabular (DiffDP) | exp1.diffdp | https://wandb.ai/fair_benchmark/exp1.diffdp | 2660 | 523 |
| Image (HSIC) | exp2.hsic | https://wandb.ai/fair_benchmark/exp2.hsic | 5401 | 2593 |
| Total | total | https://wandb.ai/fair_benchmark/ | 45079 | 14428 |

7. Contributing

We welcome contributions from the research community to improve and extend the Fair Fairness Benchmark. If you have an idea for a new metric or algorithm, or would like to report a bug, please open an issue or submit a pull request.

8. License

The Fair Fairness Benchmark is released under the MIT License.


If you find our resources useful, please cite our paper:

@misc{han2023ffb,
      title={FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods}, 
      author={Xiaotian Han and Jianfeng Chi and Yu Chen and Qifan Wang and Han Zhao and Na Zou and Xia Hu},
      year={2023},
      eprint={2306.09468},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

