dssg / aequitas
Bias Auditing & Fair ML Toolkit

Home Page: http://www.datasciencepublicpolicy.org/aequitas/

License: MIT License

Python 72.71% HTML 5.43% Shell 0.52% CSS 4.18% JavaScript 14.68% Dockerfile 0.01% Jupyter Notebook 0.68% SCSS 1.78%
fairness bias machine-bias fairness-testing


aequitas's Issues

Documentation clarity

I'm working through using Aequitas for a bias audit and am noticing a few issues in the documentation:

  • The Python API preprocessing import statement is incorrect; it should be from aequitas.preprocessing import preprocess_input_df, i.e. preprocess_input_df should be imported as shown, not written as a call

  • The documentation says that attribute columns can be categorical or continuous, which is confusing with respect to when cleaning functions are called in the preprocessing step. For instance, I made age categories using the cut function, which returned a categorical dtype; that caused the discretize cleaning function to be called, which then threw an error. It should be made clear that categorical columns must be converted to strings.

  • Import statements are missing for the Plot and Bias modules; the docs should include from aequitas.bias import Bias and from aequitas.plotting import Plot in those example calls.

  • The call for the visualization of the disparity treemaps doesn't work as is; it should be j = p.plot_disparity_all(... rather than j = aqp.plot_disparity_all(..., to match the earlier use of Plot().
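On the categorical-columns point, here is a minimal pandas sketch (column names and bins are hypothetical) of converting pd.cut output to strings so the categorical dtype never reaches the discretize step:

```python
import pandas as pd

df = pd.DataFrame({"age": [19, 27, 40, 52, 60]})

# pd.cut returns a categorical dtype, which can trip the discretize
# cleaning function described above; casting to string avoids that.
df["age_cat"] = pd.cut(
    df["age"],
    bins=[18, 25, 35, 45, 55, 100],
    labels=["18-25", "26-35", "36-45", "46-55", "55+"],
).astype(str)

print(df["age_cat"].tolist())  # ['18-25', '26-35', '36-45', '46-55', '55+']
print(df["age_cat"].dtype)     # object (plain strings), not category
```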

Pandas Warnings in CLI

There is a warning associated with copying dataframes. See the bottom of the CLI output.

CLI OUTPUT:

############################################################################

Center for Data Science and Public Policy

http://dsapp.uchicago.edu

Copyright © 2018. The University of Chicago. All Rights Reserved.

############################################################################


                ___                    _ __            
               /   | ___  ____ ___  __(_) /_____ ______
              / /| |/ _ \/ __ `/ / / / / __/ __ `/ ___/
             / ___ /  __/ /_/ / /_/ / / /_/ /_/ (__  ) 
            /_/  |_\___/\__, /\__,_/_/\__/\__,_/____/  
                          /_/    

                  Bias and Fairness Audit Tool                           

/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/psycopg2/init.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
Welcome to Aequitas-Audit
Fairness measures requested: Statistical Parity,Impact Parity,FPR Parity,FDR Parity
model_id, score_thresholds 1 {'rank_abs': [150, 300, 500, 1000], 'rank_pct': [1.0, 2.0, 5.0, 10.0]}
COUNTS::: gender
F 82601
M 310418
dtype: int64
COUNTS::: race
Black 90769
Hispanic 206528
Other 42483
White 53239
dtype: int64
COUNTS::: age
18-25 138154
26-35 129956
36-45 75097
46-55 34431
>55 15381
dtype: int64
audit: df shape from the crosstabs: (88, 26)
get_disparity_predefined_group()
Any NaN?: True
bias_df shape: (88, 46)
Fairness Threshold: 0.8
Fairness Measures: ['Statistical Parity', 'Impact Parity', 'FPR Parity', 'FDR Parity']
get_group_value_fairness: No Parity measure input found on bias_df
{'Unsupervised Fairness': False, 'Supervised Fairness': False, 'Overall Fairness': False}
****************** 0 0.21
1 0.79
2 0.21
3 0.79
4 0.21
5 0.79
6 0.21
7 0.79
8 0.21
9 0.79
10 0.21
11 0.79
12 0.21
13 0.79
14 0.21
15 0.79
16 0.23
17 0.53
18 0.11
19 0.14
20 0.23
21 0.53
22 0.11
23 0.14
24 0.23
25 0.53
26 0.11
27 0.14
28 0.23
29 0.53
...
58 0.35
59 0.33
60 0.19
61 0.09
62 0.04
63 0.35
64 0.33
65 0.19
66 0.09
67 0.04
68 0.35
69 0.33
70 0.19
71 0.09
72 0.04
73 0.35
74 0.33
75 0.19
76 0.09
77 0.04
78 0.35
79 0.33
80 0.19
81 0.09
82 0.04
83 0.35
84 0.33
85 0.19
86 0.09
87 0.04
Name: group_size_pct, Length: 88, dtype: object
/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/pandas-0.21.0-py3.4-linux-x86_64.egg/pandas/core/indexing.py:194: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/aequitas-0.23.0-py3.4.egg/aequitas_cli/utils/report.py:126: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
aux_df.loc[idx, col] = 'Ref'

create tables with replace

Issue with writing new rows to a table when previous rows in the column had null values. Per the error message below, Postgres expects double precision where the column should be boolean.

  File "/home/ubuntu/.local/lib/python3.6/site-packages/ohio/ext/pandas.py", line 96, in to_sql_method_pg_copy_to
    cursor.copy_expert(sql, csv_buffer)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type double precision: "True"
CONTEXT:  COPY aequitas_group, line 1, column FOR Parity: "True"
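The error is consistent with pandas dtype inference: an all-null chunk comes out as float64, so the table gets created with a double precision column, and a later chunk with real booleans then fails to COPY. A minimal sketch of the inference (the column name mirrors the error message; no database involved):

```python
import numpy as np
import pandas as pd

# A chunk whose boolean column is entirely null is inferred as float64,
# so a table created from it gets a double precision column...
first_chunk = pd.DataFrame({"FOR Parity": [np.nan, np.nan]})
print(first_chunk["FOR Parity"].dtype)  # float64

# ...while a later chunk with real booleans is inferred as bool, and
# COPYing the text "True" into a double precision column then fails.
later_chunk = pd.DataFrame({"FOR Parity": [True, False]})
print(later_chunk["FOR Parity"].dtype)  # bool

# Casting every chunk to one consistent dtype before writing is one
# way to avoid the mismatch.
```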

[COMPAS Example] Inconsistency in "Visualizing a single absolute group metric across all population groups"

About https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb

In the part "Visualizing a single absolute group metric across all population groups" you say "We can see from the longer bars that across 'age_cat', 'sex', and 'race' attributes, the groups COMPAS incorrectly predicts as 'low' or 'medium' risk most often are 25-45, Male, and African American.".

image

You display the FNR, which evaluates which groups have incorrectly been assigned 'low' or 'medium' scores, as you say. But what you say about the bar lengths doesn't add up. Are you talking about the absolute numbers instead of the bar lengths? Or am I missing something?

[Resolved] get_disparity_predefined_group() raises TypeError

Hi everyone,

I am a CS student and I am trying out your Aequitas COMPAS analysis example.

When I run https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb on my local machine (Python 3), when trying to execute the cell with code

bdf = b.get_disparity_predefined_groups(xtab, original_df=df, 
                                        ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'}, 
                                        alpha=0.05, check_significance=True, 
                                        mask_significance=True)
bdf.style

I get the error

TypeError: Input must be Index or array-like

With the following details:

get_disparity_predefined_group()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-a7be4e5ab5de> in <module>
      2                                         ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'},
      3                                         alpha=0.05, check_significance=True,
----> 4                                         mask_significance=True)
      5 bdf.style

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in get_disparity_predefined_groups(self, df, original_df, ref_groups_dict, key_columns, input_group_metrics, fill_divbyzero, check_significance, alpha, mask_significance)
    370             # for predefined groups, use the largest of the predefined groups as
    371             # ref group for score and label value
--> 372             check_significance = df_cols.intersection(check_significance).tolist()
    373 
    374             # compile dictionary of reference groups based on bias-augmented crosstab

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in intersection(self, other, sort)
   2391         """
   2392         self._validate_sort_keyword(sort)
-> 2393         self._assert_can_do_setop(other)
   2394         other = ensure_index(other)
   2395 

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _assert_can_do_setop(self, other)
   2590     def _assert_can_do_setop(self, other):
   2591         if not is_list_like(other):
-> 2592             raise TypeError('Input must be Index or array-like')
   2593         return True
   2594 

Does this run on your own Notebook? If so, any guesses why it doesn't work for me?


EDIT:
The problem seems to be that when I installed Aequitas using pip it did not install the up-to-date version. My version is 38.0 and apparently there is a newer version 38.1.

EDIT2:
Yup, it works now, updating the version of aequitas was it :)

get_disparity_min_metric squashes any score_threshold configuration in groups model

If multiple 'score_thresholds' are configured in group crosstabs, they get squashed if passed to 'get_disparity_min_metric'.

from aequitas.preprocessing import preprocess_input_df
from aequitas.bias import Bias
from aequitas.group import Group
import pandas as pd

protected_df = pd.read_csv('compas_for_aequitas.csv')
g = Group()
score_thresholds = {'rank_abs': [25], 'rank_pct': [50]}
df, attr_cols = preprocess_input_df(protected_df)
groups_model, attr_cols = g.get_crosstabs(df, score_thresholds=score_thresholds, model_id=45, attr_cols=attr_cols)
bias = Bias()
bias_df = bias.get_disparity_min_metric(groups_model, df)
bias_df

The bias_df in this case does end up with more rows than if we were to only give it one score threshold, but they all have the same resulting score_threshold and k values.

This problem does not seem to apply to get_disparity_major_group.

specifying columns check_significance

When trying to specify the 'for' and 'fpr' columns for the check_significance argument of the get_disparity_predefined_groups method of the Bias class, I run into a number of KeyErrors for columns that have not been included ('fpr', 'precision', and 'tpr', to start). It seems that the user cannot specify a small number of columns for check_significance.

split out "extra" installation requirements

A basic installation of the aequitas distribution should only satisfy the requirements of the aequitas package:

pip install aequitas

Having done the above, import aequitas_cli should raise an ImportError (either because the package is not installed or because it raises this error due to unsatisfied dependencies).

To install the CLI, (etc.):

pip install aequitas[cli]

Implement individual fairness metrics

This issue is about creating a new class, maybe named "Individual", that implements individual notions of fairness based on label differences (impurities) for similar individuals. Each method of the class takes a list of dataframes as input (considering that in the future we might want to compare the labels of multiple train/test sets), finds similar data points, and then looks at the label distribution of each pair/cluster.

  1. Cynthia Dwork's notion of individual fairness (Lipschitz condition).
    sub methods:
  • create pairwise distance metric in feature space
  • create pairwise distance metric in output space
  • some sort of aggregator
    e.g. count the number of times the Lipschitz condition is not met for each point, normalize and average?
  2. Matching methods to find similar data points and then calculate label purity.
    sub methods:
  • Create clusters (start with k-means)
  • Calculate a purity metric of the labels within each cluster (output k metrics)
  • Visualize the clusters (if not 2-d, use principal components?)
  • Visualize the purity metric per cluster
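A rough sketch of the first item, assuming Euclidean distances in both spaces and a user-chosen Lipschitz constant L (the function name and the violation-count aggregation are hypothetical, not an existing aequitas API):

```python
import numpy as np

def lipschitz_violations(X, scores, L=1.0):
    """Count pairs (i, j) where |score_i - score_j| > L * ||x_i - x_j||,
    i.e. the 'similar individuals, similar outcomes' condition fails.
    Hypothetical helper sketch, not aequitas API."""
    n = len(X)
    # pairwise distances in feature space
    diffs = X[:, None, :] - X[None, :, :]
    feat_dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # pairwise distances in output space
    out_dist = np.abs(scores[:, None] - scores[None, :])
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # each pair once
    return int((out_dist[mask] > L * feat_dist[mask]).sum())

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
scores = np.array([0.1, 0.9, 0.9])
# points 0 and 1 are near-identical in features but far apart in score
print(lipschitz_violations(X, scores, L=1.0))  # 1
```

Normalizing this count per point and averaging would give the aggregator suggested above.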

[COMPAS Example] Inconsistency in "Visualizing default absolute group metrics across all population groups"

In https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
Under "Visualizing default absolute group metrics across all population groups"
Under "Default absolute group metrics"

You say "We can also see that the model is equally likely to predict a woman as 'high' risk as it is for a man (false positive rate FPR of 0.32 for both Male and Female)."

image

As far as I understood, the rate that measures the likelihood of someone being predicted as 'high' risk is the PPR (Predicted Positive Rate), and the FPR measures how likely it is that someone was incorrectly predicted as 'high' risk. Did you mean to say "equally likely to incorrectly predict"?

[Solved] meaning of get_crosstabs() param "score_thresholds", "score"

Hi everyone,

I am analyzing whether an API performs equally well for all groups. I have a data set with continuous "score" values between 0 and 1; they are predictions made by the API of its result being correct. Then there is the "label_value", which is either 0 (meaning the API result was incorrect) or 1 (meaning the API result was correct).

Now if I calculate the true positives by hand, for instance for women, I do:
f_tp = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] >= t))]
where t is my threshold of 0.8, i.e. I consider a score of at least 0.8 to be a prediction of being correct.

I do similar stuff to calculate tn, fp, fn:

f_tn = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] < t))]
f_fp = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] >= t))]
f_fn = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] < t))]

And I obtain:

  • tp = 1185
  • tn = 43
  • fp = 63
  • fn = 104

Now I would like to make these calculations automatically using Aequitas.

g = Group()
xtab, _ = g.get_crosstabs(df, attr_cols=["sex"], score_thresholds= {'score': [0.8]})

In the result, I get values similar to those I got by hand, but they are switched around:

  • tp = 104
  • tn = 63
  • fp = 43
  • fn = 1185

Here, tp is what I considered to be fn and fn is what I considered to be tp. What I considered to be tn is fp and what I considered to be fp is tn.

I can't quite wrap my mind around this. What am I missing? I think I might be confusing something? I'd be so happy if you could give me a hint.

Thanks!
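For what it's worth, exactly this tp↔fn / tn↔fp swap is what you get if the threshold is applied in the rank direction (score <= t, where smaller means "higher ranked") rather than the probability direction (score >= t). A toy pandas check with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "label_value": [1, 1, 0, 0, 1],
    "score":       [0.9, 0.3, 0.95, 0.2, 0.85],
})
t = 0.8

def confusion(pred_pos):
    """Confusion counts given a boolean predicted-positive mask."""
    y = df["label_value"] == 1
    return dict(tp=int((pred_pos & y).sum()),  fn=int((~pred_pos & y).sum()),
                fp=int((pred_pos & ~y).sum()), tn=int((~pred_pos & ~y).sum()))

ge = confusion(df["score"] >= t)   # score treated as a probability
le = confusion(df["score"] <= t)   # score treated as a rank (lower = better)

# The two conventions swap tp<->fn and tn<->fp
print(ge)  # {'tp': 2, 'fn': 1, 'fp': 1, 'tn': 1}
print(le)  # {'tp': 1, 'fn': 2, 'fp': 1, 'tn': 1}
```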

Separate Statistical Significance DF

To Explore

May want an option to return information based on the statistical significance parameters and outputs, such as:

  • Group size
  • P-value
  • T/F significant
  • Which method used for equal variance calculation
  • Which method used for t-test, equal variance with ref group Y/N
  • Which group it was compared to, if not the ref group.
    • Currently it can only compare to the ref group; may want to change that as well

Absolute vs. relative disparity and the implications for fairness results

According to Verma & Rubin 2018, PPV + FDR = 1, and thus if there is PPV Parity there is also FDR Parity.
However, Aequitas says there is no FDR Parity, but there is PPV Parity:
image
I understand that with Aequitas it looks like there is no FDR Parity because Aequitas looks at relative disparities, while Verma & Rubin talk about absolute differences. Why does Aequitas prefer relative disparities?
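A tiny numeric example (made-up PPV values) of how the two views diverge: the absolute PPV and FDR differences always match in magnitude, since FDR = 1 - PPV, while the ratio-based disparities aequitas compares against its 0.8–1.25 band need not:

```python
ppv_ref, ppv_grp = 0.90, 0.80                 # made-up precision values
fdr_ref, fdr_grp = 1 - ppv_ref, 1 - ppv_grp   # FDR = 1 - PPV

# Absolute differences have the same magnitude (0.10 each)...
print(round(abs(ppv_grp - ppv_ref), 2), round(abs(fdr_grp - fdr_ref), 2))

# ...but the relative disparities do not agree with each other:
ppv_disparity = ppv_grp / ppv_ref   # ~0.89, inside the 0.8-1.25 band
fdr_disparity = fdr_grp / fdr_ref   # 2.0, far outside the band
print(round(ppv_disparity, 2), round(fdr_disparity, 2))
```

So with a ratio test a small absolute gap in a small base rate (FDR) registers as a large disparity, which is what the screenshot shows.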

Test fails

Test: test_all_0_scores_4 fails.
The reason is that group.get_crosstabs contains a line calling df['score'].value_counts()[1.0]; however, the values are all 0s, so there is no 1.0 entry and the lookup fails.
Not sure if the fix is to change the test or the check in group.
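The failure mode reproduces without aequitas; a dict-style .get with a default is one possible direction for the fix in group:

```python
import pandas as pd

scores = pd.Series([0.0, 0.0, 0.0])  # all-zero scores, as in the test

counts = scores.value_counts()
# counts[1.0] raises KeyError here: no score equals 1.0.

# Defensive lookup instead of direct indexing:
n_pos = counts.get(1.0, 0)
print(n_pos)  # 0
```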

Confusion of Statistical Parity and Impact Parity?

Here: https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
Under "How do I assess model fairness?" > "Pairities Calcuated" it says

Predicted Positive Ratio k Parity Statistical Parity

According to your definitions, Predicted Positive Ratio k is PPR and Predicted Positive Ratio g is PPrev. So Aequitas defines Statistical Parity as PPR Parity. That is not the definition according to Verma & Rubin 2018, who say Statistical Parity is PPrev Parity. Then again, if Statistical Parity is taken as synonymous with Demographic Parity, your definition of it as PPR Parity makes more sense, I think.

So, any thoughts on this?

Details on how significance is calculated?

Hi, sorry to be bothering you again!

I am currently looking at src/aequitas/bias.py to find out how the significance is calculated. Here is what I think I understood:

  1. Check if sample group is normally distributed
  2. a. If sample group is normally distributed, calculate whether sample group and ref. group have equal variances using levene
    b. If sample group is not normally distributed, calculate whether sample group and ref. group have equal variances using bartlett
  3. a. if both groups have equal variances, perform independent 2 sample t-test
    b. if both groups have different variances, perform Welch's t-test

I have two questions:
1. Is this correct?
I didn't understand what you meant by "sample" group at first. Now I understand that you mean the lists of binary-encoded values that say, for each entry of each group, whether the entry counts toward whichever measure is relevant, fpr or fnr. It can also be a list of scores, right?

2. Do we not check whether the ref. group is normally distributed? If not, why not? If we do, where?
EDIT: I found it! It is in an if condition:

if attr_value == ref_group:
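Assuming the three steps read off above, the flow could be sketched with scipy as follows (this mirrors the description in this issue, not the aequitas source itself; the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def significance_sketch(sample, ref, alpha=0.05):
    """Sketch of the described flow: normality check, then an
    equal-variance test, then the matching two-sample t-test."""
    _, p_norm = stats.shapiro(sample)
    if p_norm > alpha:                       # sample looks normal
        _, p_var = stats.levene(sample, ref)
    else:                                    # sample looks non-normal
        _, p_var = stats.bartlett(sample, ref)
    equal_var = p_var > alpha
    # equal variances -> standard t-test, otherwise Welch's t-test
    _, p_val = stats.ttest_ind(sample, ref, equal_var=equal_var)
    return p_val

rng = np.random.default_rng(0)
p = significance_sketch(rng.normal(0.0, 1.0, 200), rng.normal(0.5, 1.0, 200))
print(p < 0.05)  # the 0.5 mean shift should register as significant
```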

[Error] get_disparity_predefined_group() raises AttributeError

I try executing the following code:

bdf = b.get_disparity_predefined_groups(xtab, original_df=df, 
                                        ref_groups_dict={'race':'Caucasian'}, 
                                        alpha=0.05, check_significance=True, 
                                        mask_significance=False)
bdf.style

but it raises an Attribute Error with the following details:

get_disparity_predefined_group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-8a5ae26f1e35> in <module>
      2                                         ref_groups_dict={'race':'Caucasian'},
      3                                         alpha=0.05, check_significance=True,
----> 4                                         mask_significance=False)
      5 bdf.style

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in get_disparity_predefined_groups(self, df, original_df, ref_groups_dict, key_columns, input_group_metrics, fill_divbyzero, check_significance, alpha, mask_significance, selected_significance)
    439             self._get_statistical_significance(
    440                 original_df, df, ref_dict=full_ref_dict, score_thresholds=None,
--> 441                 attr_cols=None, alpha=5e-2, selected_significance=selected_significance)
    442 
    443             # if specified, apply T/F mask to significance columns

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in _get_statistical_significance(cls, original_df, disparity_df, ref_dict, score_thresholds, attr_cols, alpha, selected_significance)
    745                 for name, func in binary_col_functions.items():
    746                     func = func(thres_unit, 'label_value', thres_val)
--> 747                     original_df.loc[:, name] = original_df.apply(func, axis=1)
    748 
    749         # add columns for error-based significance

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6485                          args=args,
   6486                          kwds=kwds)
-> 6487         return op.get_result()
   6488 
   6489     def applymap(self, func):

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    149             return self.apply_raw()
    150 
--> 151         return self.apply_standard()
    152 
    153     def apply_empty_result(self):

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    255 
    256         # compute the result using the series generator
--> 257         self.apply_series_generator()
    258 
    259         # wrap results

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    284             try:
    285                 for i, v in enumerate(series_gen):
--> 286                     results[i] = self.f(v)
    287                     keys.append(v.name)
    288             except Exception as e:

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in <lambda>(x)
    734 
    735         binary_score = lambda rank_col, label_col, thres: lambda x: (
--> 736                 x[rank_col] <= thres).astype(int)
    737 
    738         binary_col_functions = {'binary_score': binary_score,

AttributeError: ("'bool' object has no attribute 'astype'", 'occurred at index 0')

It works if I set check_significance=False.

My data frame:

entity_id        int64
race            object
score          float64
label_value    float64
rank_abs         int32
rank_pct       float64
dtype: object

Any ideas why this is? I have the up-to-date Aequitas version this time.
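For reference, the pattern at the bottom of the traceback reproduces without aequitas: a plain Python comparison yields a built-in bool, which has no .astype, while a NumPy scalar comparison yields numpy.bool_, which does. Wrapping with int(...) works for both:

```python
import numpy as np

py_result = (3 <= 5)             # built-in bool
np_result = (np.int32(3) <= 5)   # numpy.bool_

print(hasattr(py_result, "astype"))  # False -> .astype(int) would raise
print(hasattr(np_result, "astype"))  # True

# int(...) is dtype-agnostic and avoids the AttributeError:
print(int(py_result), int(np_result))  # 1 1
```

This suggests the crash depends on the element types pandas hands to the row-wise lambda, which would explain why it only appears for some data frames.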
