dssg / aequitas
Bias Auditing & Fair ML Toolkit

Home Page: http://www.datasciencepublicpolicy.org/aequitas/

License: MIT License

Python 72.71% HTML 5.43% Shell 0.52% CSS 4.18% JavaScript 14.68% Dockerfile 0.01% Jupyter Notebook 0.68% SCSS 1.78%
fairness bias machine-bias fairness-testing


aequitas's Issues

Documentation clarity

I'm working through using Aequitas for a bias audit and am noticing a few issues in the documentation:

  • The Python API preprocessing import statement is incorrect; it should be from aequitas.preprocessing import preprocess_input_df, i.e. preprocess_input_df should be imported as shown, not written as a call

  • The documentation says that attribute columns can be categorical or continuous, which is confusing with respect to when cleaning functions are called in the preprocessing step. For instance, I made age categories using the cut function, which returned a categorical dtype; that caused the discretize cleaning function to be called, which then threw an error. It should be made clear that categorical columns must be converted to strings.

  • Import statements are missing for the Plot and Bias modules; the docs should include from aequitas.bias import Bias and from aequitas.plotting import Plot in those example calls.

  • The call for the visualization of the disparity treemaps doesn't work as is; it should be j = p.plot_disparity_all(... rather than j = aqp.plot_disparity_all(..., to match the earlier use of Plot().
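On the categorical-columns point, here is a minimal pandas sketch (column names and bins are hypothetical) of converting pd.cut output to strings so the categorical dtype never reaches the discretize step:

```python
import pandas as pd

df = pd.DataFrame({"age": [19, 27, 40, 52, 60]})

# pd.cut returns a categorical dtype, which can trip the discretize
# cleaning function described above; casting to string avoids that.
df["age_cat"] = pd.cut(
    df["age"],
    bins=[18, 25, 35, 45, 55, 100],
    labels=["18-25", "26-35", "36-45", "46-55", "55+"],
).astype(str)

print(df["age_cat"].tolist())  # ['18-25', '26-35', '36-45', '46-55', '55+']
print(df["age_cat"].dtype)     # object (plain strings), not category
```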

Pandas Warnings in CLI

There is a warning associated with copying dataframes. See the bottom of the CLI output.

CLI OUTPUT:

############################################################################

Center for Data Science and Public Policy

http://dsapp.uchicago.edu

Copyright © 2018. The University of Chicago. All Rights Reserved.

############################################################################


                ___                    _ __            
               /   | ___  ____ ___  __(_) /_____ ______
              / /| |/ _ \/ __ `/ / / / / __/ __ `/ ___/
             / ___ /  __/ /_/ / /_/ / / /_/ /_/ (__  ) 
            /_/  |_\___/\__, /\__,_/_/\__/\__,_/____/  
                          /_/    

                  Bias and Fairness Audit Tool                           

/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/psycopg2/init.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
Welcome to Aequitas-Audit
Fairness measures requested: Statistical Parity,Impact Parity,FPR Parity,FDR Parity
model_id, score_thresholds 1 {'rank_abs': [150, 300, 500, 1000], 'rank_pct': [1.0, 2.0, 5.0, 10.0]}
COUNTS::: gender
F 82601
M 310418
dtype: int64
COUNTS::: race
Black 90769
Hispanic 206528
Other 42483
White 53239
dtype: int64
COUNTS::: age
18-25 138154
26-35 129956
36-45 75097
46-55 34431
>55 15381
dtype: int64
audit: df shape from the crosstabs: (88, 26)
get_disparity_predefined_group()
Any NaN?: True
bias_df shape: (88, 46)
Fairness Threshold: 0.8
Fairness Measures: ['Statistical Parity', 'Impact Parity', 'FPR Parity', 'FDR Parity']
get_group_value_fairness: No Parity measure input found on bias_df
{'Unsupervised Fairness': False, 'Supervised Fairness': False, 'Overall Fairness': False}
****************** 0 0.21
1 0.79
2 0.21
3 0.79
4 0.21
5 0.79
6 0.21
7 0.79
8 0.21
9 0.79
10 0.21
11 0.79
12 0.21
13 0.79
14 0.21
15 0.79
16 0.23
17 0.53
18 0.11
19 0.14
20 0.23
21 0.53
22 0.11
23 0.14
24 0.23
25 0.53
26 0.11
27 0.14
28 0.23
29 0.53
...
58 0.35
59 0.33
60 0.19
61 0.09
62 0.04
63 0.35
64 0.33
65 0.19
66 0.09
67 0.04
68 0.35
69 0.33
70 0.19
71 0.09
72 0.04
73 0.35
74 0.33
75 0.19
76 0.09
77 0.04
78 0.35
79 0.33
80 0.19
81 0.09
82 0.04
83 0.35
84 0.33
85 0.19
86 0.09
87 0.04
Name: group_size_pct, Length: 88, dtype: object
/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/pandas-0.21.0-py3.4-linux-x86_64.egg/pandas/core/indexing.py:194: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
/mnt/data/users/aanisfeld/venv/aequitas_env/lib/python3.4/site-packages/aequitas-0.23.0-py3.4.egg/aequitas_cli/utils/report.py:126: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
aux_df.loc[idx, col] = 'Ref'

create tables with replace

Issue with writing new rows to a table when previous rows in the column had null values. Per the error message below, Postgres expects double precision where the column should be boolean.

  File "/home/ubuntu/.local/lib/python3.6/site-packages/ohio/ext/pandas.py", line 96, in to_sql_method_pg_copy_to
    cursor.copy_expert(sql, csv_buffer)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type double precision: "True"
CONTEXT:  COPY aequitas_group, line 1, column FOR Parity: "True"
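The error is consistent with pandas dtype inference: an all-null chunk comes out as float64, so the table gets created with a double precision column, and a later chunk with real booleans then fails to COPY. A minimal sketch of the inference (the column name mirrors the error message; no database involved):

```python
import numpy as np
import pandas as pd

# A chunk whose boolean column is entirely null is inferred as float64,
# so a table created from it gets a double precision column...
first_chunk = pd.DataFrame({"FOR Parity": [np.nan, np.nan]})
print(first_chunk["FOR Parity"].dtype)  # float64

# ...while a later chunk with real booleans is inferred as bool, and
# COPYing the text "True" into a double precision column then fails.
later_chunk = pd.DataFrame({"FOR Parity": [True, False]})
print(later_chunk["FOR Parity"].dtype)  # bool

# Casting every chunk to one consistent dtype before writing is one
# way to avoid the mismatch.
```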

[COMPAS Example] Inconsistency in "Visualizing a single absolute group metric across all population groups"

About https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb

In the part "Visualizing a single absolute group metric across all population groups" you say "We can see from the longer bars that across 'age_cat', 'sex', and 'race' attributes, the groups COMPAS incorrectly predicts as 'low' or 'medium' risk most often are 25-45, Male, and African American.".

image

You display the FNR, which evaluates which groups have incorrectly been assigned 'low' or 'medium' scores, as you say. But what you say about the bar lengths doesn't add up. Are you talking about the absolute numbers instead of the bar lengths? Or am I missing something?

[Resolved] get_disparity_predefined_group() raises TypeError

Hi everyone,

I am a CS student and I am trying out your Aequitas COMPAS analysis example.

When I run https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb on my local machine (Python 3), when trying to execute the cell with code

bdf = b.get_disparity_predefined_groups(xtab, original_df=df, 
                                        ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'}, 
                                        alpha=0.05, check_significance=True, 
                                        mask_significance=True)
bdf.style

I get the error

TypeError: Input must be Index or array-like

With the following details:

get_disparity_predefined_group()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-a7be4e5ab5de> in <module>
      2                                         ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'},
      3                                         alpha=0.05, check_significance=True,
----> 4                                         mask_significance=True)
      5 bdf.style

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in get_disparity_predefined_groups(self, df, original_df, ref_groups_dict, key_columns, input_group_metrics, fill_divbyzero, check_significance, alpha, mask_significance)
    370             # for predefined groups, use the largest of the predefined groups as
    371             # ref group for score and label value
--> 372             check_significance = df_cols.intersection(check_significance).tolist()
    373 
    374             # compile dictionary of reference groups based on bias-augmented crosstab

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in intersection(self, other, sort)
   2391         """
   2392         self._validate_sort_keyword(sort)
-> 2393         self._assert_can_do_setop(other)
   2394         other = ensure_index(other)
   2395 

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _assert_can_do_setop(self, other)
   2590     def _assert_can_do_setop(self, other):
   2591         if not is_list_like(other):
-> 2592             raise TypeError('Input must be Index or array-like')
   2593         return True
   2594 

Does this run on your own Notebook? If so, any guesses why it doesn't work for me?


EDIT:
The problem seems to be that when I installed Aequitas using pip it did not install the up-to-date version. My version is 38.0 and apparently there is a newer version 38.1.

EDIT2:
Yup, it works now, updating the version of aequitas was it :)

get_disparity_min_metric squashes any score_threshold configuration in groups model

If multiple 'score_thresholds' are configured in group crosstabs, they get squashed if passed to 'get_disparity_min_metric'.

from aequitas.preprocessing import preprocess_input_df
from aequitas.bias import Bias
from aequitas.group import Group
import pandas as pd

protected_df = pd.read_csv('compas_for_aequitas.csv')
g = Group()
score_thresholds = {'rank_abs': [25], 'rank_pct': [50]}
df, attr_cols = preprocess_input_df(protected_df)
groups_model, attr_cols = g.get_crosstabs(df, score_thresholds=score_thresholds, model_id=45, attr_cols=attr_cols)
bias = Bias()
bias_df = bias.get_disparity_min_metric(groups_model, df)
bias_df

The bias_df in this case does end up with more rows than if we were to only give it one score threshold, but they all have the same resulting score_threshold and k values.

This problem does not seem to apply to get_disparity_major_group.

specifying columns check_significance

When trying to specify the 'for' and 'fpr' columns for the check_significance argument of the get_disparity_predefined_groups method of the Bias class, I run into a number of KeyErrors for columns that have not been included ('fpr', 'precision', and 'tpr', to start). It seems that the user cannot specify a small number of columns for check_significance.

split out "extra" installation requirements

A basic installation of the aequitas distribution should only satisfy the requirements of the aequitas package:

pip install aequitas

Having done the above, import aequitas_cli should raise an ImportError (either because the package is not installed or because it raises this error due to unsatisfied dependencies).

To install the CLI, (etc.):

pip install aequitas[cli]

Implement individual fairness metrics

This issue is about creating a new class, maybe named "Individual", that implements individual notions of fairness based on label differences (impurities) for similar individuals. Each method of the class takes a list of dataframes as input (considering that in the future we might want to compare the labels of multiple train/test sets), finds similar data points, and then looks at the label distribution of each pair/cluster.

  1. Cynthia Dwork's notion of individual fairness (Lipschitz condition).
    sub methods:
  • create pairwise distance metric in feature space
  • create pairwise distance metric in output space
  • some sort of aggregator
    e.g. count the number of times the Lipschitz condition is not met for each point, normalize and average?
  2. Matching methods to find similar data points and then calculate label purity.
    sub methods:
  • Create clusters (start with k-means)
  • Calculate a purity metric of the labels within each cluster (output k metrics)
  • Visualize the clusters (if not 2-d, use principal components?)
  • Visualize the purity metric per cluster
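A rough sketch of the first item, assuming Euclidean distances in both spaces and a user-chosen Lipschitz constant L (the function name and the violation-count aggregation are hypothetical, not an existing aequitas API):

```python
import numpy as np

def lipschitz_violations(X, scores, L=1.0):
    """Count pairs (i, j) where |score_i - score_j| > L * ||x_i - x_j||,
    i.e. the 'similar individuals, similar outcomes' condition fails.
    Hypothetical helper sketch, not aequitas API."""
    n = len(X)
    # pairwise distances in feature space
    diffs = X[:, None, :] - X[None, :, :]
    feat_dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # pairwise distances in output space
    out_dist = np.abs(scores[:, None] - scores[None, :])
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # each pair once
    return int((out_dist[mask] > L * feat_dist[mask]).sum())

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
scores = np.array([0.1, 0.9, 0.9])
# points 0 and 1 are near-identical in features but far apart in score
print(lipschitz_violations(X, scores, L=1.0))  # 1
```

Normalizing this count per point and averaging would give the aggregator suggested above.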

[COMPAS Example] Inconsistency in "Visualizing default absolute group metrics across all population groups"

In https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
Under "Visualizing default absolute group metrics across all population groups"
Under "Default absolute group metrics"

You say "We can also see that the model is equally likely to predict a woman as 'high' risk as it is for a man (false positive rate FPR of 0.32 for both Male and Female)."

image

As far as I understood, the rate that measures the likelihood of someone being predicted as 'high' risk is the PPR (Predicted Positive Rate), and the FPR measures how likely it is that someone was incorrectly predicted as 'high' risk. Did you mean to say "equally likely to incorrectly predict"?

[Solved] meaning of get_crosstabs() param "score_thresholds", "score"

Hi everyone,

I am analyzing whether an API performs equally well for all groups. I have a data set with continuous "score" values between 0 and 1; they are predictions made by the API of its result being correct. Then there is the "label_value", which is either 0 (meaning the API result was incorrect) or 1 (meaning the API result was correct).

Now if I calculate the true positives by hand, for instance for women, I do:
f_tp = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] >= t))]
where t is my threshold of 0.8, i.e. I consider a score of at least 0.8 to be a prediction of being correct.

I do similar stuff to calculate tn, fp, fn:

f_tn = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] < t))]
f_fp = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] >= t))]
f_fn = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] < t))]

And I obtain:

  • tp = 1185
  • tn = 43
  • fp = 63
  • fn = 104

Now I would like to make these calculations automatically using Aequitas.

g = Group()
xtab, _ = g.get_crosstabs(df, attr_cols=["sex"], score_thresholds= {'score': [0.8]})

In the result, I get values similar to those I got by hand, but they are switched around:

  • tp = 104
  • tn = 63
  • fp = 43
  • fn = 1185

Here, tp is what I considered to be fn and fn is what I considered to be tp. What I considered to be tn is fp and what I considered to be fp is tn.

I can't quite wrap my mind around this. What am I missing? I think I might be confusing something? I'd be so happy if you could give me a hint.

Thanks!
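For what it's worth, exactly this tp↔fn / tn↔fp swap is what you get if the threshold is applied in the rank direction (score <= t, where smaller means "higher ranked") rather than the probability direction (score >= t). A toy pandas check with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "label_value": [1, 1, 0, 0, 1],
    "score":       [0.9, 0.3, 0.95, 0.2, 0.85],
})
t = 0.8

def confusion(pred_pos):
    """Confusion counts given a boolean predicted-positive mask."""
    y = df["label_value"] == 1
    return dict(tp=int((pred_pos & y).sum()),  fn=int((~pred_pos & y).sum()),
                fp=int((pred_pos & ~y).sum()), tn=int((~pred_pos & ~y).sum()))

ge = confusion(df["score"] >= t)   # score treated as a probability
le = confusion(df["score"] <= t)   # score treated as a rank (lower = better)

# The two conventions swap tp<->fn and tn<->fp
print(ge)  # {'tp': 2, 'fn': 1, 'fp': 1, 'tn': 1}
print(le)  # {'tp': 1, 'fn': 2, 'fp': 1, 'tn': 1}
```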

Separate Statistical Significance DF

To Explore

May want an option to return information based on the statistical significance parameters and outputs, such as:

  • Group size
  • P-value
  • T/F significant
  • Which method used for equal variance calculation
  • Which method used for t-test, equal variance with ref group Y/N
  • Which group it was compared to, if not the ref group.
    • Currently it can only compare to the ref group; may want to change that as well

Absolute vs. relative disparity and the implications for fairness results

According to Verma & Rubin 2018, PPV + FDR = 1, and thus if there is PPV Parity there is also FDR Parity.
However, Aequitas says there is no FDR Parity, but there is PPV Parity:
image
I understand that with Aequitas it looks like there is no FDR Parity because Aequitas looks at relative disparities, while Verma & Rubin talk about absolute differences. Why does Aequitas prefer relative disparities?
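A tiny numeric example (made-up PPV values) of how the two views diverge: the absolute PPV and FDR differences always match in magnitude, since FDR = 1 - PPV, while the ratio-based disparities aequitas compares against its 0.8–1.25 band need not:

```python
ppv_ref, ppv_grp = 0.90, 0.80                 # made-up precision values
fdr_ref, fdr_grp = 1 - ppv_ref, 1 - ppv_grp   # FDR = 1 - PPV

# Absolute differences have the same magnitude (0.10 each)...
print(round(abs(ppv_grp - ppv_ref), 2), round(abs(fdr_grp - fdr_ref), 2))

# ...but the relative disparities do not agree with each other:
ppv_disparity = ppv_grp / ppv_ref   # ~0.89, inside the 0.8-1.25 band
fdr_disparity = fdr_grp / fdr_ref   # 2.0, far outside the band
print(round(ppv_disparity, 2), round(fdr_disparity, 2))
```

So with a ratio test a small absolute gap in a small base rate (FDR) registers as a large disparity, which is what the screenshot shows.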

Test fails

Test: test_all_0_scores_4 fails.
The reason is that group.get_crosstabs contains a line calling df['score'].value_counts()[1.0]; however, the values are all 0s, so there is no 1.0 entry and the lookup fails.
Not sure if the fix is to change the test or the check in group.
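The failure mode reproduces without aequitas; a dict-style .get with a default is one possible direction for the fix in group:

```python
import pandas as pd

scores = pd.Series([0.0, 0.0, 0.0])  # all-zero scores, as in the test

counts = scores.value_counts()
# counts[1.0] raises KeyError here: no score equals 1.0.

# Defensive lookup instead of direct indexing:
n_pos = counts.get(1.0, 0)
print(n_pos)  # 0
```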

Confusion of Statistical Parity and Impact Parity?

Here: https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb
Under "How do I assess model fairness?" > "Pairities Calcuated" it says

Predicted Positive Ratio k Parity Statistical Parity

According to your definitions, Predicted Positive Ratio k is PPR and Predicted Positive Ratio g is PPrev. So Aequitas defines Statistical Parity as PPR Parity. That is not the definition according to Verma & Rubin 2018, who say Statistical Parity is PPrev Parity. Then again, if Statistical Parity is taken as synonymous with Demographic Parity, your definition of it as PPR Parity makes more sense, I think.

So, any thoughts on this?

Details on how significance is calculated?

Hi, sorry to be bothering you again!

I am currently looking at src/aequitas/bias.py to find out how the significance is calculated. Here is what I think I understood:

  1. Check if sample group is normally distributed
  2. a. If sample group is normally distributed, calculate whether sample group and ref. group have equal variances using levene
    b. If sample group is not normally distributed, calculate whether sample group and ref. group have equal variances using bartlett
  3. a. if both groups have equal variances, perform independent 2 sample t-test
    b. if both groups have different variances, perform Welch's t-test

I have two questions:
1. Is this correct?
I didn't understand what you meant by "sample" group at first. Now I understand that you mean the lists of binary-encoded values that say, for each entry of each group, whether the entry counts toward whichever measure is relevant, fpr or fnr. It can also be a list of scores, right?

2. Do we not check whether the ref. group is normally distributed? If not, why not? If we do, where?
EDIT: I found it! It is in an if condition:

if attr_value == ref_group:
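Assuming the three steps read off above, the flow could be sketched with scipy as follows (this mirrors the description in this issue, not the aequitas source itself; the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def significance_sketch(sample, ref, alpha=0.05):
    """Sketch of the described flow: normality check, then an
    equal-variance test, then the matching two-sample t-test."""
    _, p_norm = stats.shapiro(sample)
    if p_norm > alpha:                       # sample looks normal
        _, p_var = stats.levene(sample, ref)
    else:                                    # sample looks non-normal
        _, p_var = stats.bartlett(sample, ref)
    equal_var = p_var > alpha
    # equal variances -> standard t-test, otherwise Welch's t-test
    _, p_val = stats.ttest_ind(sample, ref, equal_var=equal_var)
    return p_val

rng = np.random.default_rng(0)
p = significance_sketch(rng.normal(0.0, 1.0, 200), rng.normal(0.5, 1.0, 200))
print(p < 0.05)  # the 0.5 mean shift should register as significant
```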

[Error] get_disparity_predefined_group() raises AttributeError

I try executing the following code:

bdf = b.get_disparity_predefined_groups(xtab, original_df=df, 
                                        ref_groups_dict={'race':'Caucasian'}, 
                                        alpha=0.05, check_significance=True, 
                                        mask_significance=False)
bdf.style

but it raises an Attribute Error with the following details:

get_disparity_predefined_group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-8a5ae26f1e35> in <module>
      2                                         ref_groups_dict={'race':'Caucasian'},
      3                                         alpha=0.05, check_significance=True,
----> 4                                         mask_significance=False)
      5 bdf.style

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in get_disparity_predefined_groups(self, df, original_df, ref_groups_dict, key_columns, input_group_metrics, fill_divbyzero, check_significance, alpha, mask_significance, selected_significance)
    439             self._get_statistical_significance(
    440                 original_df, df, ref_dict=full_ref_dict, score_thresholds=None,
--> 441                 attr_cols=None, alpha=5e-2, selected_significance=selected_significance)
    442 
    443             # if specified, apply T/F mask to significance columns

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in _get_statistical_significance(cls, original_df, disparity_df, ref_dict, score_thresholds, attr_cols, alpha, selected_significance)
    745                 for name, func in binary_col_functions.items():
    746                     func = func(thres_unit, 'label_value', thres_val)
--> 747                     original_df.loc[:, name] = original_df.apply(func, axis=1)
    748 
    749         # add columns for error-based significance

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6485                          args=args,
   6486                          kwds=kwds)
-> 6487         return op.get_result()
   6488 
   6489     def applymap(self, func):

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    149             return self.apply_raw()
    150 
--> 151         return self.apply_standard()
    152 
    153     def apply_empty_result(self):

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    255 
    256         # compute the result using the series generator
--> 257         self.apply_series_generator()
    258 
    259         # wrap results

C:\Program_Files\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    284             try:
    285                 for i, v in enumerate(series_gen):
--> 286                     results[i] = self.f(v)
    287                     keys.append(v.name)
    288             except Exception as e:

C:\Program_Files\Anaconda3\lib\site-packages\aequitas\bias.py in <lambda>(x)
    734 
    735         binary_score = lambda rank_col, label_col, thres: lambda x: (
--> 736                 x[rank_col] <= thres).astype(int)
    737 
    738         binary_col_functions = {'binary_score': binary_score,

AttributeError: ("'bool' object has no attribute 'astype'", 'occurred at index 0')

It works if I set check_significance=False.

My data frame:

entity_id        int64
race            object
score          float64
label_value    float64
rank_abs         int32
rank_pct       float64
dtype: object

Any ideas why this is? I have the up-to-date Aequitas version this time.
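For reference, the pattern at the bottom of the traceback reproduces without aequitas: a plain Python comparison yields a built-in bool, which has no .astype, while a NumPy scalar comparison yields numpy.bool_, which does. Wrapping with int(...) works for both:

```python
import numpy as np

py_result = (3 <= 5)             # built-in bool
np_result = (np.int32(3) <= 5)   # numpy.bool_

print(hasattr(py_result, "astype"))  # False -> .astype(int) would raise
print(hasattr(np_result, "astype"))  # True

# int(...) is dtype-agnostic and avoids the AttributeError:
print(int(py_result), int(np_result))  # 1 1
```

This suggests the crash depends on the element types pandas hands to the row-wise lambda, which would explain why it only appears for some data frames.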
